This notebook contains optional study material. You are not required to work through it in order to meet the learning objectives or complete the assessments associated with this module.

This notebook demonstrates the use of another neural network architecture: the convolutional neural network (CNN). Networks of this type often demonstrate far more robust behaviour than multi-layer perceptrons when it comes to classifying data such as images.

7 Using a convolutional neural network to recognise images (optional)

In the previous notebooks, you saw how a neural network could partition points arranged in a two-dimensional space into distinct groups. In this notebook, you will explore how a neural network can be trained to recognise a variety of images, where each image is represented as a set of pixel values arranged in a two-dimensional array.

The dataset you will use is the MNIST database that you met in an earlier notebook.

7.1 Convolutional neural networks

Whilst an MLP can cope with distinguishing digits as a result of training on the MNIST handwritten digits datasets, it starts to struggle with more complex image-recognition tasks.

In recent years, a different neural network model that is ideally suited to image-recognition problems has come to the fore. Known as a convolutional neural network (CNN), we will not explore its architecture in any formal way, other than to note a couple of differences to the MLP architecture.

In the first case, whereas the multi-layer perceptron is a fully connected network, the CNN is only sparsely connected.

In the second case, an MLP would typically present an image to a network in the form of a single column vector equal to the size of the image in pixels, with each input pixel connected to every node in the first hidden layer. This means that every node in the first hidden layer sees every pixel in the image, no matter how far apart they are.

In a CNN, a smaller grid is used to filter localised areas of the image, preserving information about the local relationship between certain pixels. As we go deeper into the CNN’s hidden layer, this filtering structure repeats, with each neuron only seeing outputs from a selection of neighbouring nodes in the previous layer.

Something else you may notice in what follows is the word ‘tensor’ appearing again. A tensor is a multidimensional array of numbers (often, a large multidimensional array) and it hints at how the data is passed into, and flows through, a network. That is about as much as I’m going to tell you for the purposes of this module. If you want to know more, T194 Engineering: mathematics, modelling, applications starts you on a gentle path by introducing the idea of matrices, ideas which are further developed in MST210 Mathematical methods, models and modelling (It’s not for no reason that a lot of people who work with neural networks have a strong background in mathematics.) The third-level module TM358 Machine learning and artificial intelligence looks in more detail about the machine-learning relevance.

7.2 Recognising handwritten digits provided by you

The handwritten digit recogniser application allows you to write a digit in a small interactive canvas and then see if a pre-trained network can recognise it. If the direct link does not work, then from the notebook home page New menu select the nb_handwritten_digit option.

Within the application:

  • click in the canvas area

  • with your mouse button held down, write a single digit in the range 0 to 9

  • click on the Predict button.

Does the application recognise the digit correctly? Press the Clear button to clear the canvas and have another go.

The application may take a minute or two to completely load because it needs to load a large model file into the browser. If it doesn’t recognise your digit at first, wait a few moments and then try again. If for some reason it doesn’t work at all, or the application doesn’t display, then try the original version on the web.

7.3 Training a handwritten digit recogniser

Several researchers have created interactive, browser-based demonstrations that are capable of training a neural network to recognise the MNIST digits.

The underlying code – and text – for these is licensed under an open license, which means we can redistribute them, and even edit the original versions, with due acknowledgement.

  • ConvNetJS MNIST demo: this is a fascinating example of training a network in the web browser using a JavaScript-based neural network package, and visualising the effects as the network trains. If the direct link does not work, from the notebook home page New menu, select the convnet_mnist option. The training should start automatically a moment or two after the page loads. If you scroll down in the page, you will see in real time how the performance of the network improves as it is trained. It was originally developed by Andrej Karpathy whilst he was a PhD student at Stanford University. If you would like to visit the original, it is still available here.

  • Walked-through training example: this example is taken from the tensorflow.js ‘tfjs-vis’ demos (code; original demo). If the direct link does not work, from the notebook home page New menu, select the tfjs_mnist option. It walks you through an example of training a neural network to recognise the MNIST digits, showing how the error value reduces over time.

Spend a few minutes looking through each of the demos. There is too much detail for us to review in this module, but it may give you a taste of what working with machine-learning models at a technical level involves.

7.4 Inside the mind of a convolutional neural network MNIST handwritten digit recogniser

This final demonstration is an example of a three-dimensional interactive visualisation that allows you to explore how a convolutional neural network filters certain features of an input image: TensorSpace Playground - interaction guide. If the direct link does not work, from the notebook home page New menu, select the nb_tensorspace_playground option.

The following activities are both available from the TensorSpace Playground home page.

  • TensorSpace Playground - trained MNIST demo: in this first example, you can explore a trained network, providing your own handwritten digit as an input to the network and then peering inside the network to see how it encodes, then decodes, various features in making its decision.

  • TensorSpace Playground - training demo: in this example (which could take a few seconds to load the necessary training data before it will run) you can load in a test example, then reset and train the network. You know the network is being trained if you can see the Training Metrics values updating. If you look at the output layer of the network then you can see the network start to home in on the correct output result as the network is trained.

To install the packages for running these demos locally in your own notebook environment, you can find the installers at the following locations:

7.5 Summary

In this notebook, you have seen how we can train a different sort of neural network to the MLP, known as a convolutional neural network (CNN), to recognise handwritten images from a set of training examples. You have also had a peak inside the mind of such a network, exploring how each of the nodes in each of the layers recognises (and essentially looks for and ‘sees’) different abstract patterns of features in the input, before the network as a whole uses these recognisable (though not necessarily visually meaningful, to us at least) patterns to come to a decision about what the input image represents.

If you had to write a set of explicit, handwritten rules to perform such a discrimination task, I think you would find it very challenging indeed. But whilst we can see that each node of the network is recognising something, it may not be obvious to us what it is in any human-meaningful way.

This completes the practical activities for this week.