Deep Learning With MNIST Dataset
This post is adapted from a tutorial by Lex Fridman, an AI research scientist at MIT. It applies deep learning to the MNIST dataset: we classify images of handwritten digits using a convolutional neural network (CNN) classifier. The dataset contains 70,000 images at a resolution of 28 by 28 pixels; the model takes one such image as input and predicts the most likely digit in it.
First, we make the necessary imports: TensorFlow, Keras, NumPy, and the libraries used for visualization.
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Commonly used modules
import numpy as np
import os
import sys

# Images, plots, display, and visualization
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import cv2
import IPython
from six.moves import urllib
We then load the dataset. The images are 28 x 28 NumPy arrays with pixel values ranging from 0 to 255; the labels are an array of integers ranging from 0 to 9.
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
# reshape images to specify that it's a single channel
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)
The pixel values are scaled to the range 0 to 1 before they are fed into the neural network, which is done by dividing them by 255. It is important that the training and testing datasets are preprocessed in the same way.
def preprocess_images(imgs):  # should work for both a single image and multiple images
    sample_img = imgs if len(imgs.shape) == 2 else imgs[0]
    assert sample_img.shape in [(28, 28, 1), (28, 28)], sample_img.shape  # make sure images are 28x28 and single-channel (grayscale)
    return imgs / 255.0
train_images = preprocess_images(train_images)
test_images = preprocess_images(test_images)
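As a quick sanity check of the reshaping and scaling steps, the sketch below applies the same operations to a small synthetic batch. The random array `fake_images` is only a stand-in for the real MNIST data (an assumption, so that the example runs without downloading anything), and `scale_to_unit` is a hypothetical helper that mirrors `preprocess_images`.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 grayscale images, 28x28, uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add an explicit channel axis, as is done for train_images and test_images
fake_images = fake_images.reshape(fake_images.shape[0], 28, 28, 1)

def scale_to_unit(imgs):
    """Hypothetical helper mirroring preprocess_images: map [0, 255] to [0, 1]."""
    return imgs / 255.0

scaled = scale_to_unit(fake_images)

print(scaled.shape)  # (5, 28, 28, 1)
print(scaled.dtype)  # float64
print(scaled.min() >= 0.0 and scaled.max() <= 1.0)  # True
```

Note that dividing an integer array by 255.0 produces a floating-point array, so the scaled images take more memory than the original uint8 data.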
We display the first 10 images from the training dataset, with the class label beneath each one, to verify that the data is in the correct format before we build and train the model.
plt.figure(figsize=(10,2))
for i in range(10):
    plt.subplot(1,10,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i].reshape(28, 28), cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])
plt.show()
We then build the model. This involves configuring the layers of the model, here by simply stacking them in sequence, and then compiling it.
model = keras.Sequential()
# 32 convolution filters used each of size 3x3
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# 64 convolution filters used each of size 3x3
model.add(Conv2D(64, (3, 3), activation='relu'))
# choose the best features via pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
# randomly drop 25% of the activations during training to reduce over-fitting
model.add(Dropout(0.25))
# flatten the feature maps into a 1-D vector for the dense layers
model.add(Flatten())
# fully connected to get all relevant data
model.add(Dense(128, activation='relu'))
# one more dropout
model.add(Dropout(0.5))
# output layer: softmax turns the 10 outputs into class probabilities
model.add(Dense(10, activation='softmax'))
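To see how the image shrinks as it passes through the stacked layers, the sketch below works out the tensor sizes and parameter counts by hand. It assumes the Keras defaults for `Conv2D` used above (stride 1, no padding); the helper names `conv2d_out`, `conv2d_params`, and `dense_params` are made up for this illustration. The totals should agree with what `model.summary()` reports.

```python
def conv2d_out(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1

def conv2d_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

size = 28
size = conv2d_out(size, 3)   # 26 after the first Conv2D
size = conv2d_out(size, 3)   # 24 after the second Conv2D
size = size // 2             # 12 after 2x2 max pooling
flat = size * size * 64      # 9216 values after Flatten

total_params = (
    conv2d_params(3, 1, 32)      # 320
    + conv2d_params(3, 32, 64)   # 18496
    + dense_params(flat, 128)    # 1179776
    + dense_params(128, 10)      # 1290
)
print(flat, total_params)  # 9216 1199882
```

Almost all of the parameters sit in the first dense layer, which is one reason the stronger dropout rate of 0.5 is placed right after it.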
Three settings are added in the model's compile step, before the model is trained, and each has a specific function. The loss function measures how far the model's predictions are from the true labels during training; this is the quantity to be minimized. The optimizer determines how the model is updated based on the data it sees and its loss function. The metrics are used to monitor training and testing; the metric used here is accuracy, the fraction of images that are correctly classified.
model.compile(optimizer='adam',  # string alias for Adam; tf.train.AdamOptimizer() exists only in TensorFlow 1.x
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
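To make the loss and the metric from the compile step concrete, here is a minimal NumPy sketch of both. It is a simplified illustration, not the Keras implementation (which, for example, also guards against taking the logarithm of zero), and the probabilities below are made up, with 3 classes instead of 10 to keep the example small.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs)))

def accuracy(probs, labels):
    """Fraction of rows whose most probable class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Made-up softmax outputs for 3 images over 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
labels = np.array([0, 1, 2])  # integer labels, hence "sparse"

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(round(loss, 4), round(acc, 4))  # 0.5946 0.6667
```

The third image is misclassified (class 1 is predicted instead of class 2), which lowers the accuracy and also contributes the largest term to the loss.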
We train the model. Training and checking it involves three steps: we feed the training data into the model; the model learns to associate images with labels; and we ask the model to make predictions on a test set and verify that the predictions match the labels in the test_labels array.
history = model.fit(train_images, train_labels, epochs=5)
As the model trains, both the loss and accuracy are displayed. The model reaches an accuracy of 98.6% on the training data.
Epoch 1/5
60000/60000 [==============================] - 7s 121us/step - loss: 0.1953 - acc: 0.9410
Epoch 2/5
60000/60000 [==============================] - 6s 100us/step - loss: 0.0842 - acc: 0.9753
Epoch 3/5
60000/60000 [==============================] - 6s 96us/step - loss: 0.0642 - acc: 0.9810
Epoch 4/5
60000/60000 [==============================] - 6s 94us/step - loss: 0.0526 - acc: 0.9835
Epoch 5/5
60000/60000 [==============================] - 6s 94us/step - loss: 0.0443 - acc: 0.9861
After that, we evaluate how well the model generalizes to the test dataset.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
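Often, the accuracy on the training set is a little higher than on the test set; that gap is a sign of over-fitting. Here, however, the test accuracy is higher, at 99.13% against 98.61% on the training data. This is likely because the dropout layers act as regularization: they are active while the training accuracy is being measured but are switched off during evaluation.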
10000/10000 [==============================] - 1s 50us/step
Test accuracy: 0.9913
With the trained model, we can now make predictions on new images. For this, we use high-resolution images generated by a mixture of CPPN, GAN, and VAE. See Hardmaru's blog post for the source data and a description of how the morphed animations are generated.
The above shows the network's predictions, obtained by choosing the neuron with the highest output. The full source code is available on my GitHub. Feedback on improving the model, for example by reducing over-fitting and improving the test accuracy, is highly appreciated.
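The rule of choosing the neuron with the highest output can be sketched in a few lines of NumPy. The vector `probs` below is a made-up softmax output for a single image, not a value produced by the trained model.

```python
import numpy as np

# Hypothetical softmax output for one image: one probability per digit 0-9
probs = np.array([0.01, 0.02, 0.05, 0.70, 0.02, 0.10, 0.03, 0.03, 0.02, 0.02])

predicted_digit = int(np.argmax(probs))     # index of the neuron with the highest output
confidence = float(probs[predicted_digit])  # probability assigned to that digit
print(predicted_digit, confidence)  # 3 0.7
```

With the real model, `model.predict(images)` returns one such row of 10 probabilities per image, so `np.argmax(..., axis=1)` would give one predicted digit per image.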