# An Introduction To Neural Networks Using Google’s Tensorflow

TensorFlow is a framework developed by the Google Brain team for dataflow programming across a range of tasks, and it is widely used for deep learning problems such as neural networks.

Why Tensor? It's the way data is represented in deep learning. A tensor is a multi-dimensional array. It could be 0-dimensional (a single number), 1-dimensional (a list of numbers), 2-dimensional (a square of numbers), 3-dimensional (a cube of numbers), or of as many dimensions as you can imagine.

The number of dimensions of a tensor is called its rank.
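To make rank concrete, here is a small illustrative sketch in NumPy (standing in for TensorFlow tensors here purely for illustration; both count rank as the number of dimensions):

```python
import numpy as np

# Rank of a tensor = its number of dimensions (ndim in NumPy terms)
scalar = np.array(5.0)              # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])  # rank 1: a list of numbers
matrix = np.ones((3, 3))            # rank 2: a square of numbers
cube   = np.zeros((3, 3, 3))        # rank 3: a cube of numbers

ranks = [t.ndim for t in (scalar, vector, matrix, cube)]
```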

In TensorFlow, we have constants, placeholders, and variables to define input data, class labels, weights, and biases. **Constants** take no input; they store fixed values and output them when needed. **Placeholders** allow you to feed input at the time the computational graph is run. **Variables** modify the graph so that it can produce new outputs for the same inputs.

We are going to work with the Fashion-MNIST dataset by Zalando to put our understanding of Convolutional Neural Networks in context.

So, we begin by importing the necessary modules, especially the Tensorflow module.

```python
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
%matplotlib inline

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # for training on GPU
```

And then, we import the dataset, noting the argument `one_hot=True`, which converts the categorical class labels into vectors of numbers.

```python
data = input_data.read_data_sets('data/fashion', one_hot=True)
```
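To see what `one_hot=True` does, here is an illustrative sketch in plain NumPy; the helper `to_one_hot` is our own stand-in, not a TensorFlow API:

```python
import numpy as np

def to_one_hot(labels, n_classes):
    """Convert integer class labels to one-hot vectors (illustrative helper)."""
    one_hot = np.zeros((len(labels), n_classes))
    one_hot[np.arange(len(labels)), labels] = 1.0
    return one_hot

# Labels 0 ('T-shirt/top'), 3 ('Dress'), 9 ('Ankle boot') become 10-dimensional vectors
encoded = to_one_hot([0, 3, 9], 10)
```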

It’s always good to analyze our datasets to get some basic intuition. So we do that now, by studying the shapes of both the training and testing datasets.

```python
# Shapes of training set
print("Training set (images) shape: {shape}".format(shape=data.train.images.shape))
print("Training set (labels) shape: {shape}".format(shape=data.train.labels.shape))

# Shapes of test set
print("Test set (images) shape: {shape}".format(shape=data.test.images.shape))
print("Test set (labels) shape: {shape}".format(shape=data.test.labels.shape))
```

From the results, we find that the training and testing data have 55,000 and 10,000 samples respectively, each a 784-dimensional vector. The vector is simply a stretched-out 28 x 28 matrix. It will be reshaped into a 28 x 28 x 1 matrix in order to be fed to the CNN model.
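The flattening and reshaping described above can be sketched with NumPy; the sample values here are made up purely for illustration:

```python
import numpy as np

# A batch of two fake "images", each flattened to 784 values
flat_images = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# -1 lets NumPy infer the batch dimension; the rest is fixed to 28 x 28 x 1
reshaped = flat_images.reshape(-1, 28, 28, 1)
```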

We will now create a dictionary to hold the class names with the associated class labels.

```python
# Create dictionary of target classes
label_dict = {
    0: 'T-shirt/top',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle boot',
}
```

And take a look at the images in the dataset.

```python
plt.figure(figsize=[5,5])

# Display the first image in training data
plt.subplot(121)
curr_img = np.reshape(data.train.images[0], (28,28))
curr_lbl = np.argmax(data.train.labels[0,:])
plt.imshow(curr_img, cmap='gray')
plt.title("(Label: " + str(label_dict[curr_lbl]) + ")")

# Display the first image in testing data
plt.subplot(122)
curr_img = np.reshape(data.test.images[0], (28,28))
curr_lbl = np.argmax(data.test.labels[0,:])
plt.imshow(curr_img, cmap='gray')
plt.title("(Label: " + str(label_dict[curr_lbl]) + ")")
```

The images are already rescaled between 0 and 1, so there’s no need to rescale them. We can visualize them, and also check out the maximum and minimum value of the matrix.

```python
print(data.train.images[0])
print(np.max(data.train.images[0]))
print(np.min(data.train.images[0]))
```

Reshape the images to size 28 x 28 x 1 in order to be fed as input into the network.

```python
# Reshape training and testing images
train_X = data.train.images.reshape(-1, 28, 28, 1)
test_X = data.test.images.reshape(-1, 28, 28, 1)
train_X.shape, test_X.shape
```

Put the training and testing labels in separate variables and print their respective shapes.

```python
train_y = data.train.labels
test_y = data.test.labels
train_y.shape, test_y.shape
```

For the Convolutional Neural Network, we will use three convolutional layers: the first with 32 filters of size 3 x 3, the second with 64 filters of size 3 x 3, and the third with 128 filters of size 3 x 3. In addition, there will be three max-pooling layers, each of size 2 x 2.

We will start off by defining the **learning rate**, the **training iterations**, and the **batch size**, keeping in mind that these are hyperparameters and hence have no fixed values. The *training iterations* indicate the number of times you train your network, the *learning rate* controls how quickly the loss is reduced as the network converges towards a local optimum, and the *batch size* is simply the fixed number of images trained in each batch.

```python
training_iters = 200
learning_rate = 0.001
batch_size = 128
```

For the network parameters, we first define the input size (28, the height and width of the 28 x 28 x 1 matrix that is the reshaped 784-dimensional vector), and then the number of classes, which is simply the number of class labels.

```python
# MNIST data input (img shape: 28*28)
n_input = 28

# MNIST total classes (0-9 digits)
n_classes = 10
```

We will now define our placeholders. Recall that they allow us to build our computational graph without feeding data into it. The input placeholder **x** will have dimension **None x 28 x 28 x 1**, matching the reshaped images, and the output placeholder **y**, holding the labels of the training images, will have dimension **None x 10**. The row dimension is None because we have defined the batch size; the placeholders will receive that dimension at the time we feed the data to them.

```python
# both placeholders are of type float
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])
```

It is good to define convolution and max-pooling functions which we can call multiple times whenever we need them in our network.

The convolution function, **conv2d()**, has four arguments: the input **x**, weights **W**, bias **b**, and **strides**. The strides argument defaults to 1 but can be any value; it sets the step by which the filter moves over the height and width of the image. The first and last entries of the strides list passed to TensorFlow must always be 1, because the first is for the image number and the last for the input channel. After applying the convolution, we add the bias and apply an activation function, the Rectified Linear Unit (ReLU).

The max-pooling function, **maxpool2d()**, takes the input **x** and a kernel size **k**, set to 2, meaning that the max-pooling filter will be a square matrix of dimensions 2 x 2, and the stride by which the filter moves is also 2. The padding is set to SAME. This makes sure that while performing the convolution operations, the boundary pixels of the image are not left out: it adds zeros at the boundaries of the input, allowing the convolution filter to access the boundary pixels as well. The same holds for the max-pooling operations. Once we define the weights and biases, we will notice that an input of size 28 x 28 is downsampled to 4 x 4 after applying three max-pooling layers.
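As a quick sanity check on those sizes: with SAME padding, a pooling layer with stride 2 produces an output of size ceil(input / 2). A small sketch (the helper `same_pool_output` is our own, for illustration):

```python
import math

def same_pool_output(size, stride=2):
    # With 'SAME' padding, output size = ceil(input size / stride)
    return math.ceil(size / stride)

# Track the spatial size through three 2x2/stride-2 max-pooling layers
size = 28
sizes = [size]
for _ in range(3):
    size = same_pool_output(size)
    sizes.append(size)
```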

```python
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and ReLU activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
```

After defining the **conv2d()** and **maxpool2d()** wrappers, you can now define your **weights** and **biases** variables. You'll create two dictionaries, one for weights and the other for biases. Recall that the first convolution layer has 32 filters of size 3 x 3, so the first key, **wc1**, in the **weights** dictionary has a shape argument that takes a tuple of 4 values: the first and second are the filter size, the third is the number of channels in the input image, and the last is the number of convolution filters you want in that layer. The first key in the **biases** dictionary, **bc1**, will have 32 bias parameters. Similarly, the second key, **wc2**, of the **weights** dictionary takes a tuple of 4 values: the first and second refer to the filter size, and the third is the number of channels coming from the previous layer. Since we passed 32 convolution filters over the input image, we will have 32 channels as output from the first convolution layer. The last value is the number of filters you want in the second convolution layer. Note that the second key in the **biases** dictionary, **bc2**, will have 64 parameters.

Repeat the same for the third convolution layer, but take care to understand the fourth key, **wd1**. After applying 3 convolution and max-pooling operations, the input image is downsampled from 28 x 28 x 1 to a 4 x 4 output with 128 channels, and you need to flatten this downsampled output to feed it as input to the fully connected layer. This is why the first element of the shape tuple is the multiplication 4*4*128: the spatial output of the third max-pooling layer times the number of channels it produces. The second element of the tuple is the number of neurons you want in the fully connected layer. Similarly, in the **biases** dictionary, the fourth key, **bd1**, has 128 parameters.
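The flattening that motivates the 4*4*128 multiplication can be sketched in NumPy with a dummy activation tensor:

```python
import numpy as np

# Fake output of the third max-pooling layer for a batch of 5 images:
# 4 x 4 spatially, 128 channels
conv3_out = np.zeros((5, 4, 4, 128))

# Flatten everything except the batch dimension, as tf.reshape will do
flattened = conv3_out.reshape(-1, 4 * 4 * 128)
```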

```python
weights = {
    'wc1': tf.get_variable('W0', shape=(3,3,1,32), initializer=tf.contrib.layers.xavier_initializer()),
    'wc2': tf.get_variable('W1', shape=(3,3,32,64), initializer=tf.contrib.layers.xavier_initializer()),
    'wc3': tf.get_variable('W2', shape=(3,3,64,128), initializer=tf.contrib.layers.xavier_initializer()),
    'wd1': tf.get_variable('W3', shape=(4*4*128,128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('W6', shape=(128,n_classes), initializer=tf.contrib.layers.xavier_initializer()),
}
biases = {
    'bc1': tf.get_variable('B0', shape=(32), initializer=tf.contrib.layers.xavier_initializer()),
    'bc2': tf.get_variable('B1', shape=(64), initializer=tf.contrib.layers.xavier_initializer()),
    'bc3': tf.get_variable('B2', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'bd1': tf.get_variable('B3', shape=(128), initializer=tf.contrib.layers.xavier_initializer()),
    'out': tf.get_variable('B4', shape=(10), initializer=tf.contrib.layers.xavier_initializer()),
}
```

Now, let us look at the network architecture. The **conv_net()** function takes 3 arguments: the input **x**, already reshaped to 28 x 28 x 1, and the dictionaries **weights** and **biases**. First, we define **conv1**, which takes the image as input along with weights **wc1** and biases **bc1**. Then we apply max-pooling on the output of **conv1**, and we continue this pattern up to **conv3**. After passing through all the convolution and max-pooling layers, we flatten the output of **conv3** and connect the flattened **conv3** neurons with each and every neuron in the next layer. Then we apply an activation function on the output of the fully connected layer, **fc1**. Finally, the last layer has 10 neurons, since we have to classify 10 labels; this means we connect all the neurons of **fc1** with the 10 neurons of the output layer.

```python
def conv_net(x, weights, biases):
    # Call the conv2d function defined above, passing the input image x,
    # weights wc1 and bias bc1
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max pooling (down-sampling): takes the max value in each 2x2 window
    # and outputs a 14x14 matrix
    conv1 = maxpool2d(conv1, k=2)

    # Convolution layer 2, with weights wc2 and bias bc2
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max pooling (down-sampling): outputs a 7x7 matrix
    conv2 = maxpool2d(conv2, k=2)

    # Convolution layer 3, with weights wc3 and bias bc3
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # Max pooling (down-sampling): outputs a 4x4 matrix
    conv3 = maxpool2d(conv3, k=2)

    # Fully connected layer
    # Reshape conv3 output to fit the fully connected layer input
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)

    # Output, class prediction:
    # multiply the fully connected layer by the output weights and add a bias term
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
```

Then, we construct the model by calling the **conv_net()** function, passing in the input **x**, the **weights**, and the **biases**. Since this is a multi-class classification problem, we use softmax activation on the output layer to give us the probabilities for each class label, with cross-entropy as the loss function. Why cross-entropy? Because its value is always positive and tends towards zero as the network gets better at computing the desired output **y** for all training inputs **x**. It also avoids the problem of learning slowing down: even if the weights and biases are initialized poorly, it helps the network recover faster and does not disturb the training phase much. In TensorFlow, the softmax activation and the cross-entropy loss are defined in a single function, to which we pass the predicted output and the ground-truth labels **y**. We then take the mean with **reduce_mean()** over all the batches to get a single loss/cost value. Next, we define the Adam optimizer, a popular optimization algorithm, specify the learning rate, and tell it to minimize the cost we calculated in the previous step.

```python
pred = conv_net(x, weights, biases)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
```
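To build intuition for what the softmax cross-entropy computes per example, here is a plain-NumPy sketch for a single example (an illustration only, without TensorFlow's fused, numerically optimized implementation):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def cross_entropy(logits, one_hot_label):
    # -sum(y * log(p)): positive, and tends to zero as p(true class) -> 1
    probs = softmax(logits)
    return -np.sum(one_hot_label * np.log(probs))

logits = np.array([2.0, 1.0, 0.1])
label = np.array([1.0, 0.0, 0.0])  # true class is index 0
loss = cross_entropy(logits, label)
```

A confident, correct prediction yields a small loss; putting the highest logit on the wrong class makes the loss grow.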

Whew! We have gone a long way already.

But we will soon be over with this. Trust me.

Now, we will test our model. We will define two more nodes, **correct_prediction** and **accuracy**, which will evaluate the model after every training iteration and help us keep track of its performance.

```python
# Check whether the index of the maximum value of the predicted image
# equals the actual label; both are column vectors
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))

# Calculate accuracy across all the given images and average them out
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
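The same argmax-based accuracy computation can be illustrated in NumPy with made-up predictions:

```python
import numpy as np

# Fake predicted scores and one-hot labels for four images, three classes
pred = np.array([[0.8, 0.1, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.1, 0.2, 0.7],
                 [0.6, 0.3, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [1, 0, 0]])

# A prediction is correct when the argmax of the scores matches the label's argmax
correct = np.argmax(pred, axis=1) == np.argmax(labels, axis=1)
acc = correct.astype(np.float32).mean()  # 3 of 4 correct
```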

Have in mind that our weights and biases are variables and that we have to initialize them before we can make use of them. So, let’s do that.

```python
# Initializing the variables
init = tf.global_variables_initializer()
```

When training and testing our model in TensorFlow, we go through the following steps. We begin by launching the graph in a session, the class that runs all the TensorFlow operations; all the operations have to be within its indentation. We then run the session on **init**, which executes the variable initialization from the previous step. After that, we start a for loop over the number of training iterations we specified in the beginning. Inside it, we start a second for loop over the number of batches, obtained by dividing the total number of images by the batch size. We slice out the images of each batch into **batch_x** and their respective labels into **batch_y**. The crucial step: just as we ran the initializer after creating the graph, we now feed the placeholders **x** and **y** the actual data through a dictionary and run the session, passing in the cost and accuracy we defined earlier; this returns the loss (cost) and accuracy. We can print the loss and training accuracy after each epoch (that is, each training iteration) is completed. After each training iteration, we also run only the accuracy and cost on all 10,000 test images and labels, to get an idea of how accurately our model is performing while it is training. It is good practice to test once the model is trained completely, and to validate only while it is in the training phase, after each epoch.

```python
with tf.Session() as sess:
    sess.run(init)
    train_loss = []
    test_loss = []
    train_accuracy = []
    test_accuracy = []
    summary_writer = tf.summary.FileWriter('./Output', sess.graph)
    for i in range(training_iters):
        for batch in range(len(train_X)//batch_size):
            batch_x = train_X[batch*batch_size:min((batch+1)*batch_size, len(train_X))]
            batch_y = train_y[batch*batch_size:min((batch+1)*batch_size, len(train_y))]
            # Run optimization op (backprop)
            opt = sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
        print("Iter " + str(i) + ", Loss= " + "{:.6f}".format(loss) +
              ", Training Accuracy= " + "{:.5f}".format(acc))
        print("Optimization Finished!")

        # Calculate accuracy for all 10000 mnist test images
        test_acc, valid_loss = sess.run([accuracy, cost],
                                        feed_dict={x: test_X, y: test_y})
        train_loss.append(loss)
        test_loss.append(valid_loss)
        train_accuracy.append(acc)
        test_accuracy.append(test_acc)
        print("Testing Accuracy:", "{:.5f}".format(test_acc))
    summary_writer.close()
```
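The batch-slicing arithmetic inside the loop above can be checked in isolation; the helper `batch_slices` below is our own illustrative stand-in:

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, end) index pairs for each batch (illustrative helper)."""
    for batch in range(n_samples // batch_size):
        start = batch * batch_size
        end = min((batch + 1) * batch_size, n_samples)
        yield start, end

# 55,000 training images with a batch size of 128
slices = list(batch_slices(55000, 128))
```

Note that, like the training loop, this drops the final partial batch when the dataset size is not an exact multiple of the batch size.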

From the output of the training and testing, the test accuracy looks impressive. Not that bad for a start.

This ends our first introduction to the world of Convolutional Neural Networks.

You can find the code on my GitHub. Your feedback on improving the model, by reducing overfitting and improving the testing accuracy, is highly appreciated.