Essay Example on Background: Convolutional Neural Networks (ConvNets or CNNs)








Background: Convolutional Neural Networks (ConvNets or CNNs)

Convolutional neural networks give vision to computers. In other words, CNNs are used to recognize images by transforming the original image through several layers into a class score. They consist of neurons with learnable weights and biases, which process data and optionally pass it through nonlinear functions. CNNs were inspired by the visual cortex in the brain. As in the visual cortex, a CNN recognizes an image when a series of layers of neurons is activated, with each layer detecting a set of features such as lines and edges.

A CNN is divided into two parts: feature learning and classification. The feature learning part has convolution layers, ReLU layers and pooling layers. The classification part has fully connected layers and a softmax function.

The objective of a convolution layer is to extract features from the input image. This layer accepts a three-dimensional input, typically an image with height, width and depth. It consists of a set of K three-dimensional filters whose side length is given by the kernel size (filter size). The filters are sets of neurons with weights and a bias, which handle data just as in an ordinary neural network and are responsible for detecting features. The output volume formed by sliding a filter over the image is called a feature map. With numerous filters working on the image, different filters can be trained to look for different features.
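The layer sequence described above (convolution, ReLU, pooling, then a fully connected layer with softmax) can be sketched in a few lines of plain Python. This is only an illustrative sketch under simplifying assumptions: a single-channel 2-D image instead of a three-dimensional volume, one filter instead of K, stride 1 with no padding, and hand-picked values for `image`, `kernel` and `fc_weights`, all of which are hypothetical and not taken from the essay.

```python
import math

def conv2d(image, kernel, bias=0.0):
    """Slide one 2-D filter over a 2-D image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Elementwise max(0, x): negative values become zero, size is unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d(image, kernel))   # 4x4 feature map
pooled = max_pool(feature_map, size=2)      # 2x2 after pooling

# Fully connected layer: every pooled value feeds every class score.
flat = [v for row in pooled for v in row]
fc_weights = [[0.5, 0.0, 0.5, 0.0],   # hypothetical weights for class 0
              [0.0, 0.5, 0.0, 0.5]]   # hypothetical weights for class 1
scores = [sum(w * x for w, x in zip(ws, flat)) for ws in fc_weights]
probs = softmax(scores)
print(pooled, probs)
```

In a trained network the filter and the fully connected weights would be learned rather than written by hand; the fixed values here only make the data flow easy to follow.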

By using different filter sizes in different layers, we can easily construct a model that detects features at different scales in the image.

The ReLU layer performs an elementwise activation function, max(0, x), which sets negative values to zero, in effect thresholding at zero. This layer does not change the size of its input.

The pooling layer reduces the spatial dimensions of the input and the computational complexity of the model. It operates independently on every depth slice of the input. Various pooling functions exist, but max pooling is the most widely used. It keeps only the largest value in each window of the input volume, which for an image is the value of the brightest pixel.

Fully connected layers connect every neuron in one layer to every neuron in the next layer. The final fully connected layer uses a softmax activation function to classify the features extracted from the input image into several classes based on the obtained scores.

ResNet

ResNet is a convolutional neural network developed by Microsoft Research Asia (MSRA) in 2015 (Kaiming He et al.). ResNet won the ImageNet Challenge (ILSVRC 2015) in every category by a large margin over the other entries. It is the first architecture to address the problem of training deeper convolutional neural networks with a residual learning structure.

The depth of a convolutional neural network has been shown to be critically important. During the development of larger and deeper models, however, a degradation problem was exposed: as the network grows deeper, its accuracy tends to saturate and then degrade quickly. ResNet addresses the degradation problem by using residual blocks, which yield better accuracy and are at the same time easy to optimize. One basic residual block (the bottleneck design) is a combination of three stacked layers. The first is a 1x1 convolution layer that reduces dimensionality by lowering the number of feature maps, the next is a 3x3 convolution layer, and the last is a 1x1 convolution layer that restores the original dimensionality so that the output can be summed with the identity mapping.

Dataset

A huge number of examples is needed to train a deep neural network, which makes preparing the dataset a challenging task. Since we are using a pre-trained network, we needed fewer examples than would be required to train the network from scratch. For our application, the model has to predict people who are closer to the subject more accurately than people farther away from the subject. The dataset was carefully chosen to contain examples that resemble what the model is most likely to be asked to predict after training.

The source images for the dataset come from Google Images, Flickr, Pascal VOC and the ImageNet database. We used keywords such as "person", "people" and "human" and downloaded all the results. From the downloaded images we kept those that the model is most likely to be asked to predict after training. After filtering, about 6000 images remained, and each image was hand-labelled in Pascal VOC format with a tool called LabelImg.

The images were then separated into three datasets: one for training, one for validation and one for testing the built model. About 75%, which is 4500 images, are in the training dataset. About 10%, which is 600 images, are in the validation dataset. The test set consists of about 15%, which is 900 images. The datasets, with labels in Pascal VOC format, are available for download.
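The 75/10/15 split described above can be checked with a short sketch. The essay does not say how the images were assigned to each set, so the seeded random shuffle, the `split_dataset` helper and the file names below are all assumptions made for illustration.

```python
import random

def split_dataset(items, train_pct=75, val_pct=10, seed=0):
    """Shuffle items and split them into train, validation and test lists.

    Percentages are integers so the split sizes are computed exactly;
    whatever is left after train and validation becomes the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical file names standing in for the roughly 6000 labelled images.
image_names = [f"image_{i:04d}.jpg" for i in range(6000)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 4500 600 900
```

Keeping the three lists disjoint matters because validation and test images that also appear in training would overstate the model's accuracy.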
