An Introduction to Neural Networks Using Google's TensorFlow

Visualization of a Convolutional Neural Network.

TensorFlow is a framework developed by the Google Brain team for dataflow programming across a range of tasks; it is also widely used for deep learning problems such as neural networks.

Why "Tensor"? It's the way data is represented in deep learning. A tensor is a multi-dimensional array. It could be 0-dimensional (a single number), 1-dimensional (a line of numbers), 2-dimensional (a square of numbers), 3-dimensional (a cube of numbers), or have as many dimensions as you can imagine.
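
This idea is easy to see with NumPy arrays, which follow the same convention as TensorFlow tensors (a minimal illustrative sketch):

```python
import numpy as np

scalar = np.array(5)                    # rank 0: a single number
vector = np.array([1, 2, 3])            # rank 1: a line of numbers
matrix = np.array([[1, 2], [3, 4]])     # rank 2: a square of numbers
cube   = np.zeros((2, 2, 2))            # rank 3: a cube of numbers

# .ndim reports the number of dimensions, i.e. the rank.
print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)  # 0 1 2 3
```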

The number of dimensions of a tensor is its rank.

In TensorFlow, we have constants, placeholders, and variables to define input data, class labels, weights, and biases. Constants take no input; they store constant values and output them when needed. Placeholders allow you to feed input at the run of your computational graph. And variables modify the graph such that it can produce new outputs with respect to the same inputs.

We are going to work with the Fashion-MNIST dataset by Zalando to put our understanding of Convolutional Neural Networks into context.

So, we begin by importing the necessary modules, especially the TensorFlow module.

And then, we import the dataset, noting the argument one_hot=True, which converts each categorical class label into a one-hot vector of numbers.
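
To make concrete what one_hot=True does, here is a small NumPy sketch (illustrative only; the TensorFlow dataset loader performs this conversion for you, and to_one_hot is a hypothetical helper name):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Convert integer class labels into one-hot row vectors."""
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot

# Label 9 ("Ankle boot" in Fashion-MNIST) becomes a 10-dimensional
# vector with a single 1 in position 9.
print(to_one_hot([9], 10))
```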

It’s always good to analyze our datasets to get some basic intuition. And so, we do that now, by studying the shapes of both the training and testing datasets.

From the results, we find that the training and testing sets have 55,000 and 10,000 samples respectively, each a 784-dimensional vector. The vector is simply a flattened-out 28 x 28 matrix, and it will be reshaped into a 28 x 28 x 1 matrix before being fed to the CNN model.
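
The flattening and un-flattening is a single reshape (sketched here with a small dummy batch of random values standing in for the real images):

```python
import numpy as np

# A dummy batch of 100 flattened 784-dimensional vectors.
flat_images = np.random.rand(100, 784)

# Reshape each 784-vector back into a 28 x 28 image with 1 channel;
# -1 lets NumPy infer the batch dimension on its own.
images = flat_images.reshape(-1, 28, 28, 1)
print(images.shape)  # (100, 28, 28, 1)
```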

We will now create a dictionary to hold the class names with the associated class labels.
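
The ten Fashion-MNIST categories are fixed by the dataset, so the dictionary looks roughly like this:

```python
# Fashion-MNIST class labels and their category names.
label_dict = {
    0: 'T-shirt/top', 1: 'Trouser', 2: 'Pullover', 3: 'Dress', 4: 'Coat',
    5: 'Sandal', 6: 'Shirt', 7: 'Sneaker', 8: 'Bag', 9: 'Ankle boot',
}
print(label_dict[9])  # Ankle boot
```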

And take a look at the images in the dataset.

The images are already rescaled between 0 and 1, so there’s no need to rescale them. We can visualize them, and also check out the maximum and minimum value of the matrix.

Reshape the images to size 28 x 28 x 1 in order to be fed as input into the network.

Put the training and testing labels in separate variables and print their respective shapes.

For the Convolutional Neural Network, we will use three convolutional layers: the first layer with 32 filters of size 3 x 3, the second with 64 filters of size 3 x 3, and the third with 128 filters of size 3 x 3. In addition, there will be three max-pooling layers, each of size 2 x 2.

We will start off by defining the learning rate, the number of training iterations, and the batch size, keeping in mind that these are hyperparameters and hence have no fixed values. The training iterations indicate the number of times you train your network; the learning rate controls how large a step the optimizer takes toward a (local) optimum of the loss; and the batch size is simply the fixed number of images to be trained on in every batch.
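
A typical configuration might look like the following; the specific values here are illustrative assumptions, not ones fixed by the text, and tuning them is part of the exercise:

```python
# Hyperparameters (illustrative starting values, not fixed rules).
learning_rate = 0.001   # step size the optimizer takes each update
training_iters = 200    # how many times the network sees the training set
batch_size = 128        # images processed per training step
```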

For the network parameters, we first define the number of inputs (the 28 x 28 x 1 matrix obtained by reshaping the 784-dimensional vector) and then the number of classes, which is simply the number of class labels.

We will define our placeholders. Recall that they allow us to perform operations and build our computational graph without feeding any data into it yet. So we will have an input placeholder x of shape None x 784, and an output placeholder y of shape None x 10, holding the labels of the training images. The row dimension is None so that the placeholders can accept any batch size; the actual number of rows is supplied only at the time we feed data to them.

It is good to define convolution and max-pooling functions which we can call multiple times whenever we need them in our network.

The convolution function, conv2d(), has four arguments: the input x, the weights W, the bias b, and the strides. The strides argument defaults to 1 but can take other values; within the four-element strides list, however, the first and last entries must always be 1, because the first corresponds to the image number and the last to the input channel. After applying the convolution, we add the bias and apply an activation function, the Rectified Linear Unit (ReLU).
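
As a plain-NumPy sketch of what such a wrapper computes (illustrative only; TensorFlow's tf.nn.conv2d does this far more efficiently over whole batches, and conv2d_same is a hypothetical name), a stride-1 'SAME' convolution followed by bias and ReLU looks like:

```python
import numpy as np

def conv2d_same(x, W, b):
    """Stride-1 'SAME' convolution over one image, plus bias and ReLU.
    x: (H, W, C_in) image; W: (kH, kW, C_in, C_out) filters; b: (C_out,)."""
    H, Wd, C_in = x.shape
    kH, kW, _, C_out = W.shape
    # 'SAME' padding: zero-pad so the output keeps the input's spatial size.
    ph, pw = kH // 2, kW // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros((H, Wd, C_out))
    for i in range(H):
        for j in range(Wd):
            patch = xp[i:i + kH, j:j + kW, :]          # receptive field
            out[i, j] = np.tensordot(patch, W, axes=3) + b
    return np.maximum(out, 0)                          # ReLU

image = np.random.rand(28, 28, 1)       # one dummy Fashion-MNIST image
filters = np.random.randn(3, 3, 1, 32)  # 32 filters of size 3 x 3
bias = np.zeros(32)
print(conv2d_same(image, filters, bias).shape)  # (28, 28, 32)
```

Note how 'SAME' padding keeps the spatial size at 28 x 28 while the channel count grows to the number of filters.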

The max-pooling function, maxpool2d(), takes the input x and a kernel size k, set to 2. This means the max-pooling filter is a 2 x 2 square matrix, and the stride by which the filter moves is also 2. The padding is set to 'SAME', which ensures that the boundary pixels of the image are not left out: it adds zeros at the boundaries of the input so that the filter can access the boundary pixels as well, for both the convolution and the max-pooling operations. Once the weights and biases are defined, we will see that an input of size 28 x 28 is downsampled to 4 x 4 after three max-pooling layers.
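
The downsampling arithmetic can be checked directly: with 'SAME' padding and stride 2, each pooling layer maps a spatial size of n to ceil(n / 2):

```python
import math

size = 28
for layer in range(3):
    # 'SAME' padding with a 2x2 pool and stride 2 gives ceil(n / 2).
    size = math.ceil(size / 2)
    print(size)
# 28 -> 14 -> 7 -> 4
```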

After defining the conv2d() and maxpool2d() wrappers, you can now define your weights and biases variables. You'll create two dictionaries, one for the weights, the other for the biases. Recall that the first convolution layer has 32 filters of size 3 x 3, so the first key in the weights dictionary, wc1, has a shape argument that takes a tuple of 4 values: the first and second are the filter size, the third is the number of channels in the input image, and the last is the number of convolution filters you want in that layer. The first key in the biases dictionary, bc1, has 32 bias parameters. Similarly, the second key, wc2, has a shape parameter taking a tuple of 4 values: the first and second refer to the filter size, and the third is the number of channels coming from the previous layer. Since we applied 32 convolution filters to the input image, the first convolution layer outputs 32 channels. The last value is the number of filters you want in this convolution layer. Note that the second key in the biases dictionary, bc2, has 64 parameters.

Repeat the same for the third convolution layer, but take care to understand the fourth key, wd1. After applying three convolution and max-pooling operations, the input image has been downsampled from 28 x 28 to 4 x 4, with 128 channels output by convolution layer 3, and you need to flatten this downsampled output to feed it as input to the fully connected layer. That is why the first element of the shape tuple is the multiplication 4 x 4 x 128, the flattened output of the previous layer. The second element of the tuple passed to shape is the number of neurons you want in the fully connected layer. Similarly, in the biases dictionary, the fourth key, bd1, has 128 parameters.
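
Spelled out as plain tuples (the 128 fully connected neurons and 10-class output layer follow the text; the dictionary names mirror the keys described above), the shapes line up like this:

```python
# Filter/weight shapes: (filter_h, filter_w, in_channels, out_channels)
weight_shapes = {
    'wc1': (3, 3, 1, 32),       # conv 1: 32 filters on the 1-channel image
    'wc2': (3, 3, 32, 64),      # conv 2: 64 filters on 32 input channels
    'wc3': (3, 3, 64, 128),     # conv 3: 128 filters on 64 input channels
    'wd1': (4 * 4 * 128, 128),  # fully connected: flattened conv3 -> 128
    'out': (128, 10),           # output layer: 128 neurons -> 10 classes
}
bias_shapes = {'bc1': 32, 'bc2': 64, 'bc3': 128, 'bd1': 128, 'out': 10}

print(weight_shapes['wd1'])  # (2048, 128)
```

The key constraint is that each layer's input-channel count equals the previous layer's output-channel count.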

Now, let us look at the network architecture. The conv_net() function takes 3 arguments: the input x and the dictionaries weights and biases. First, you reshape the 784-dimensional input vector into a 28 x 28 x 1 matrix; the -1 in reshape() means that the first dimension is inferred automatically while the rest are fixed at 28 x 28 x 1. Next, we define conv1, which takes the image as input along with the weights wc1 and biases bc1, then apply max-pooling on the output of conv1, and continue this pattern through conv3. After passing through all the convolution and max-pooling layers, we flatten the output of conv3 and connect every flattened conv3 neuron to every neuron in the next layer, then apply the activation function on the output of this fully connected layer, fc1. Finally, the last layer has 10 neurons since we have 10 labels to classify, so all the neurons of fc1 are connected to the 10 neurons of the output layer.

Then, we construct the model by calling the conv_net() function, passing in the input x, weights, and biases. Since the problem is multi-class classification, we use softmax activation on the output layer to give us probabilities for each class label, with cross entropy as the loss function. Why cross entropy as a loss function? Because its value is always positive and tends to zero as the network gets better at computing the desired output y for all training inputs x. It also avoids the problem of learning slowing down: even if the weights and biases are initialized badly, it helps the network recover faster and does not disturb the training phase much. In TensorFlow, both the softmax activation and the cross-entropy loss are computed in a single function, to which we pass two parameters: the predicted output and the ground-truth label y. We then take the mean over all batches with reduce_mean() to get a single loss (cost) value. Next, we define the Adam optimizer, a popular optimization algorithm, specify the learning rate, and tell it to minimize the cost we calculated in the previous step.
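
The two fused operations can be sketched in NumPy to see why the loss is always positive and shrinks as the network assigns more probability to the true class (an illustrative sketch, not TensorFlow's implementation):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(logits - np.max(logits))   # shift for numerical stability
    return e / e.sum()

def cross_entropy(probs, one_hot_label):
    """-log of the probability assigned to the true class."""
    return -np.sum(one_hot_label * np.log(probs))

label = np.zeros(10)
label[3] = 1                               # ground truth: class 3

confident = np.zeros(10); confident[3] = 5.0   # logits favouring class 3
uncertain = np.zeros(10)                       # logits with no preference

# The loss is smaller when the network is right and confident.
print(cross_entropy(softmax(confident), label))
print(cross_entropy(softmax(uncertain), label))   # log(10), about 2.30
```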

Whew! We have gone a long way already.

But we will soon be over with this. Trust me.

Now, we will test our model. We will define two more nodes, correct_prediction and accuracy. These will evaluate our model after every training iteration, which will help us keep track of its performance.
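
The logic behind those two nodes is simple, as this NumPy sketch with three dummy predictions shows (the graph versions operate on the placeholder tensors instead):

```python
import numpy as np

# Three dummy probability rows and their one-hot ground-truth labels.
predictions = np.array([[0.10, 0.80, 0.10],
                        [0.30, 0.30, 0.40],
                        [0.90, 0.05, 0.05]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [1, 0, 0]])

# correct_prediction: does the index of the highest predicted probability
# match the index of the 1 in the one-hot label?
correct_prediction = np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)
accuracy = correct_prediction.mean()
print(accuracy)  # 2 of 3 correct -> 0.666...
```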

Bear in mind that our weights and biases are variables, and that we have to initialize them before we can make use of them. So, let's do that.

Now, when training and testing our model in TensorFlow, we go through these steps:

- We begin by launching the graph in a session, the class that runs all the TensorFlow operations. All the operations have to be within its indentation.
- We run the session, which executes the variable initializer we defined in the previous step and evaluates the tensors.
- We define a for loop that runs for the number of training iterations we specified at the beginning.
- Right inside it, we start a second for loop over the number of batches, obtained by dividing the total number of images by the batch size. We load the images for the current batch into batch_x and their respective labels into batch_y.
- The crucial step: just as we ran the initializer after creating the graph, we now feed the placeholders x and y the actual data through a feed dictionary and run the session, passing in the cost and the accuracy that we defined earlier. It returns the loss (cost) and accuracy.
- We can choose to print the loss and the training accuracy after each epoch, that is, after each training iteration is completed.
- After each training iteration, we run only the accuracy node, passing in all 10,000 test images and labels, to get an idea of how accurately the model performs while it is training. It is good practice to test once the model is trained completely, and to validate after each epoch while it is in the training phase.
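
The batching arithmetic in those steps can be sketched on its own (small dummy arrays stand in for the real dataset; the comment marks where the session call from the real code would go):

```python
import numpy as np

num_train = 55000          # Fashion-MNIST training images
batch_size = 128
num_batches = num_train // batch_size   # total images / batch size
print(num_batches)  # 429

# Small dummy stand-ins for the real training arrays.
train_x = np.random.rand(batch_size * 3, 28, 28, 1)
train_y = np.eye(10)[np.random.randint(0, 10, batch_size * 3)]

for batch in range(len(train_x) // batch_size):
    batch_x = train_x[batch * batch_size:(batch + 1) * batch_size]
    batch_y = train_y[batch * batch_size:(batch + 1) * batch_size]
    # Here the real code would run the optimizer, cost, and accuracy
    # nodes with feed_dict={x: batch_x, y: batch_y}.
```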

From the output of the training and testing, the test accuracy looks impressive. Not that bad for a start.

This ends our first introduction to the world of Convolutional Neural Networks.

You can find the link to the code on my GitHub. Your feedback on improving the model, by reducing the overfitting and improving the testing accuracy, is highly appreciated.
