Hello guys, welcome back to my blog. In this article, I will discuss what a CNN (Convolutional Neural Network) is, how a computer reads an image, why fully connected networks are not used for image recognition, how a convolutional neural network works, and more. I will try to explain everything in a simple way.
If you would like an article on another topic, let me know in the comment section below. You can also find me on Instagram: Chetan Shidling.
How Does a Computer Read an Image?
The above image shows how a computer reads an image: as a grid of numbers, one number per pixel per channel. A colour image has three channels, R (red), G (green), and B (blue), so an image that is A pixels high and B pixels wide is stored as an (A*B*3) array of numbers.
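To make the (A*B*3) layout concrete, here is a minimal sketch in plain Python that stores a tiny, made-up 2*3 colour image as nested lists. The image size and pixel colours are hypothetical; the point is only that every pixel holds three numbers, one per channel.

```python
# A hypothetical 2 x 3 colour image (A = 2 rows, B = 3 columns), stored as
# image[row][col] = [R, G, B], with each intensity in the range 0..255.
A, B = 2, 3
image = [[[0, 0, 0] for _ in range(B)] for _ in range(A)]

image[0][0] = [255, 0, 0]          # top-left pixel: pure red
image[A - 1][B - 1] = [0, 0, 255]  # bottom-right pixel: pure blue

# Total number of values the computer stores: A * B * 3
num_values = sum(len(pixel) for row in image for pixel in row)
print(num_values)  # 18

# Taking one channel out gives an A x B matrix, e.g. the red channel:
red_channel = [[pixel[0] for pixel in row] for row in image]
print(red_channel)  # [[255, 0, 0], [0, 0, 0]]
```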
Why Not Fully Connected Networks?
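Fully connected networks are not used for image recognition because their number of parameters grows very quickly with image size. In a fully connected network, every neuron in the first hidden layer is connected to every input value. As you can see in the above image, for an image of 28*28*3 pixels, each neuron in the first hidden layer needs 28*28*3 = 2,352 weights, and for an image of 200*200*3 pixels, each neuron needs 120,000 weights. A network with such a large number of parameters suffers from problems such as a higher chance of overfitting and slower training. A CNN avoids this because each filter is a small set of weights that is reused across the whole image, and because its pooling layers reduce the image matrix to a lower dimension.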
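The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names, the 1,000-neuron hidden layer, and the "32 filters of size 3*3*3" comparison are hypothetical values chosen only to show how the counts scale, and biases are ignored throughout.

```python
def weights_per_neuron(height, width, channels):
    """Each neuron in a fully connected first hidden layer has one weight per input value."""
    return height * width * channels

def first_layer_weights(height, width, channels, num_neurons):
    """Total weights (ignoring biases) when every neuron sees every input value."""
    return weights_per_neuron(height, width, channels) * num_neurons

print(weights_per_neuron(28, 28, 3))    # 2352
print(weights_per_neuron(200, 200, 3))  # 120000

# With a hypothetical hidden layer of 1,000 neurons, the total grows quickly:
print(first_layer_weights(200, 200, 3, 1000))  # 120000000

# A convolutional layer with, say, 32 filters of size 3 x 3 x 3 reuses each small
# filter at every position, so its weight count does not depend on the image size:
print(32 * 3 * 3 * 3)  # 864
```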
Why Convolutional Neural Networks?
Convolutional Neural Networks have been widely used for image analysis over the past few years. They are specially designed to process pixel data. Using CNN techniques, the image dimension can be reduced while the network's output stays accurate.
What Is a Convolutional Neural Network?
The Convolutional Neural Network is one of the most popular neural network architectures. A CNN consists of several kinds of layers: convolution, ReLU, pooling, and fully connected. With the help of a convolutional neural network, we can process large amounts of image data accurately and efficiently.
How Does a CNN (Convolutional Neural Network) Work?
A CNN has the following layers:
01. Convolution Layer
02. ReLU Layer
03. Pooling Layer
04. Fully Connected Layer
Let's take the example of images of X and O and learn how a CNN works.
Images of X and O will not always be the same; they can look quite different from one another, which makes recognizing them challenging, as you can see in the image below.
The CNN figures out whether an image is an X or an O through a series of steps. If the input image is an X, it should give X as the output.
Well, guys, let me show you one image of an X. A computer understands an image through the number stored at each pixel. In this example, black pixels have the value 1 and white pixels have the value -1.
The computer then compares these numeric images to decide which letter each one shows.
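To see this encoding in code, the sketch below builds an X using the 1 (black) / -1 (white) convention. The 9*9 grid size and the exact pixel layout are assumptions for illustration; the image in the article may differ.

```python
SIZE = 9  # assumed grid size for this illustration

def make_x(size=SIZE):
    """Return a size x size grid: 1 for black pixels on the two diagonals,
    -1 for white pixels. The outer border is left white."""
    grid = []
    for r in range(size):
        row = []
        for c in range(size):
            inside = 0 < r < size - 1 and 0 < c < size - 1
            on_diagonal = r == c or r + c == size - 1
            row.append(1 if inside and on_diagonal else -1)
        grid.append(row)
    return grid

x_image = make_x()
for row in x_image:
    # Show black pixels as "X" and white pixels as "." for readability.
    print(" ".join("X" if v == 1 else "." for v in row))
```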
A convolutional neural network compares images piece by piece. The pieces it looks for are called features. By finding rough feature matches in roughly the same positions in two images, a CNN gets a lot better at seeing similarity than schemes that try to match the whole image at once.
Features are small pieces of the bigger image. We choose a feature, place it on the input image, and check how well it matches; these match results are what the network later uses to classify the image.
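Here is a minimal sketch showing that features really are small pieces of the bigger image. It assumes the same hypothetical 9*9 X (black = 1, white = -1) and three 3*3 features (a diagonal, a cross, and an anti-diagonal); these specific features are an assumption for illustration, and the helper `patch` is a hypothetical name.

```python
# Hypothetical 9 x 9 X image: 1 = black pixel, -1 = white pixel.
N = 9
x_image = [[1 if 0 < r < N - 1 and 0 < c < N - 1 and (r == c or r + c == N - 1) else -1
            for c in range(N)] for r in range(N)]

def patch(image, top, left, size=3):
    """Cut a size x size piece out of the image, starting at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

# Three assumed 3 x 3 features, each a small piece of the X.
diagonal = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
cross = [[1, -1, 1], [-1, 1, -1], [1, -1, 1]]
anti_diagonal = [[-1, -1, 1], [-1, 1, -1], [1, -1, -1]]

print(patch(x_image, 1, 1) == diagonal)       # True: upper-left arm of the X
print(patch(x_image, 3, 3) == cross)          # True: centre of the X
print(patch(x_image, 1, 5) == anti_diagonal)  # True: upper-right arm of the X
```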
01. Convolution Layer
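Here we move the feature (filter) to every possible position on the image. At each position:
Step 1: Line up the feature and the image.
Step 2: Multiply each image pixel by the corresponding feature pixel.
Step 3: Add up the products.
Step 4: Divide the sum by the total number of pixels in the feature.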
Multiplying the corresponding pixel values.
Adding the products and dividing by the total number of pixels.
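The four steps at a single position can be sketched as one small function. This is a minimal illustration, assuming the 1/-1 pixel encoding used above; the name `match_score` and the slightly altered patch are hypothetical.

```python
def match_score(feature, image_patch):
    """Steps 1-4: line up, multiply pixel by pixel, add up, divide by the pixel count."""
    total = 0
    count = 0
    for f_row, i_row in zip(feature, image_patch):
        for f, p in zip(f_row, i_row):
            total += f * p  # step 2: multiply corresponding pixels
            count += 1
    return total / count    # steps 3-4: sum, then divide by the number of pixels

diagonal = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]

# A perfect match: every product is +1, so the score is 9 / 9 = 1.0
print(match_score(diagonal, diagonal))  # 1.0

# A hypothetical patch that differs from the feature in two pixels:
# seven products are +1 and two are -1, so the score is (7 - 2) / 9
patch = [[1, -1, -1], [-1, 1, 1], [-1, -1, -1]]
print(round(match_score(diagonal, patch), 2))  # 0.56
```

With this encoding, a score of 1.0 means a perfect match and -1.0 means the patch is the exact opposite of the feature.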
Now, take the same feature, move it to another location, and perform the filtering again.
We then write the result of the filtering into the output map at that position.
Similarly, we move the feature to every other position of the image and see how well the feature matches each area. Finally, we get the output map shown in the image.
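Sliding the feature over the whole image can be sketched as below. It assumes the hypothetical 9*9 X and diagonal feature from earlier, a step (stride) of 1, and no padding, so the feature is only placed where it fits entirely inside the image and a 9*9 image gives a 7*7 output map. The function names are hypothetical.

```python
# Hypothetical 9 x 9 X image: 1 = black pixel, -1 = white pixel.
N = 9
x_image = [[1 if 0 < r < N - 1 and 0 < c < N - 1 and (r == c or r + c == N - 1) else -1
            for c in range(N)] for r in range(N)]
diagonal = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]  # assumed 3 x 3 feature

def match_score(feature, image_patch):
    """Multiply corresponding pixels, add up the products, divide by the pixel count."""
    products = [f * p for f_row, p_row in zip(feature, image_patch)
                for f, p in zip(f_row, p_row)]
    return sum(products) / len(products)

def filter_image(image, feature):
    """Slide the feature over every position where it fits inside the image
    (stride 1, no padding) and record the match score at each position."""
    k = len(feature)
    feature_map = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            piece = [line[c:c + k] for line in image[r:r + k]]
            row.append(match_score(feature, piece))
        feature_map.append(row)
    return feature_map

feature_map = filter_image(x_image, diagonal)
for row in feature_map:
    print(" ".join(f"{v:5.2f}" for v in row))
```

Positions along the X's main diagonal, such as (1, 1), score a perfect 1.0, while other positions score lower.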
02. ReLU Layer
In the Rectified Linear Unit (ReLU) layer, we take every negative value in the filtered images and replace it with zero. This adds non-linearity to the network and keeps negative values from cancelling out the positive ones and summing up to zero.
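The ReLU step is a one-liner per value: keep positives, turn negatives into zero. The filtered values below are made up for illustration.

```python
def relu(feature_map):
    """Replace every negative value with zero; keep positive values unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]

filtered = [[0.77, -0.11, 0.11],
            [-0.11, 1.0, -0.11],
            [0.11, -0.11, 0.77]]  # hypothetical filtered values

print(relu(filtered))
# [[0.77, 0.0, 0.11], [0.0, 1.0, 0.0], [0.11, 0.0, 0.77]]
```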
03. Pooling Layer
In this layer, we shrink the image stack into a smaller size.
a. Pick a window size (usually 2 or 3).
b. Pick a stride (usually 2).
c. Walk your window across your filtered images.
d. From each window, take the maximum value.
Let's perform pooling with a window size of 2 and a stride of 2.
Moving the window across the entire image
Output After Passing Through Pooling Layer
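Steps a to d can be sketched as a small max-pooling function. One assumption to flag: windows at the right and bottom edges are allowed to be partial (they cover whatever pixels remain), so no values are dropped and a 7*7 map becomes 4*4. Many libraries instead drop incomplete windows by default, which would give 3*3. The example map is made up.

```python
def max_pool(feature_map, window=2, stride=2):
    """Walk a window across the map and keep the maximum value of each window.
    Windows at the right/bottom edge may be partial (an assumption here)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, stride):
        out_row = []
        for c in range(0, cols, stride):
            values = [feature_map[i][j]
                      for i in range(r, min(r + window, rows))
                      for j in range(c, min(c + window, cols))]
            out_row.append(max(values))
        pooled.append(out_row)
    return pooled

fmap = [[1, 3, 2, 0],
        [4, 6, 5, 1],
        [0, 2, 9, 7],
        [1, 8, 3, 4]]  # hypothetical filtered-and-rectified values
print(max_pool(fmap))  # [[6, 5], [8, 9]]
```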
Stacking up the layers
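Stacking the layers means feeding the output of one convolution, ReLU, and pooling stage into the next. The sketch below chains the earlier pieces on the hypothetical 9*9 X with three assumed 3*3 features. It is a simplification: here each map is filtered on its own, whereas filters in a real CNN's later layers typically span all of the incoming maps at once.

```python
# Hypothetical 9 x 9 X image (1 = black, -1 = white).
N = 9
x_image = [[1 if 0 < r < N - 1 and 0 < c < N - 1 and (r == c or r + c == N - 1) else -1
            for c in range(N)] for r in range(N)]

FEATURES = {  # assumed 3 x 3 features
    "diagonal": [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]],
    "cross": [[1, -1, 1], [-1, 1, -1], [1, -1, 1]],
    "anti_diagonal": [[-1, -1, 1], [-1, 1, -1], [1, -1, -1]],
}

def convolve(image, feature):
    """Average of pixel-wise products at every position where the feature fits (stride 1)."""
    k = len(feature)
    return [[sum(feature[i][j] * image[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
             for c in range(len(image[0]) - k + 1)]
            for r in range(len(image) - k + 1)]

def relu(fmap):
    """Replace negative values with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, window=2, stride=2):
    """Max pooling; edge windows may be partial (assumption)."""
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[i][j]
                 for i in range(r, min(r + window, rows))
                 for j in range(c, min(c + window, cols)))
             for c in range(0, cols, stride)]
            for r in range(0, rows, stride)]

def shape(fmap):
    return (len(fmap), len(fmap[0]))

# First stack: convolution -> ReLU -> pooling, once per feature (9 x 9 -> 7 x 7 -> 4 x 4).
stack = [max_pool(relu(convolve(x_image, f))) for f in FEATURES.values()]
print([shape(m) for m in stack])  # [(4, 4), (4, 4), (4, 4)]

# Second stack on top of the first (same features, purely for illustration): 4 x 4 -> 2 x 2 -> 1 x 1.
deeper = [max_pool(relu(convolve(m, f))) for m in stack for f in FEATURES.values()]
print(len(deeper), shape(deeper[0]))  # 9 (1, 1)
```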
04. Fully Connected Layer
This is the final layer, where the actual classification happens. Here we take our filtered and shrunken images and put them into a single list (a vector).
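Putting the shrunken images into a single list is often called flattening. Here is a minimal sketch; the two 2*2 pooled maps are made-up values.

```python
def flatten(pooled_maps):
    """Concatenate every value of every pooled map, row by row, into one list."""
    return [v for fmap in pooled_maps for row in fmap for v in row]

pooled_maps = [
    [[1.0, 0.33], [0.55, 0.33]],  # hypothetical pooled map from feature 1
    [[0.33, 0.55], [1.0, 0.77]],  # hypothetical pooled map from feature 2
]
vector = flatten(pooled_maps)
print(vector)       # [1.0, 0.33, 0.55, 0.33, 0.33, 0.55, 1.0, 0.77]
print(len(vector))  # 8
```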
When we feed in an 'X' or an 'O', some elements of this vector will be high. In the image below, you can see that for 'X' certain elements are high, and for 'O' a different set of elements is high.
Now consider the list below, produced from a new input image.
Let's compare it with the patterns for X and O and check the output.
The input image is classified as X.
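The comparison can be sketched as a simple vote: for each letter, average the new list's values at the positions that are usually high for that letter, and pick the letter with the higher average. The reference patterns and input values below are hypothetical. In a trained network these weights are learned rather than hand-set to 0 and 1, so treat this as an illustration of the idea, not the exact computation.

```python
def class_score(vector, reference):
    """Average of the input values at the positions the reference marks as important.
    `reference` holds hypothetical weights: 1 where that class is usually high, else 0."""
    weighted = sum(v * w for v, w in zip(vector, reference))
    return weighted / sum(reference)

# Hypothetical patterns: which list positions tend to be high for X and for O.
references = {
    "X": [1, 0, 0, 1, 1, 0, 0, 1],
    "O": [0, 1, 1, 0, 0, 1, 1, 0],
}

new_input = [0.9, 0.65, 0.45, 0.87, 0.96, 0.73, 0.23, 0.63]  # hypothetical list from a new image

scores = {label: class_score(new_input, ref) for label, ref in references.items()}
prediction = max(scores, key=scores.get)
print(scores)
print(prediction)  # X
```

Here X scores about 0.84 and O about 0.52, so the new image is classified as X.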
I hope this article helps you all. Thank you for reading.