This notebook contains optional study material. You are not required to work through it in order to meet the learning objectives or complete the assessments associated with this module.

Brief overview: inputs to a neural network are often described as ‘features’. Determining appropriate features for presenting to a network may require significant pre-processing of the original dataset. Pre-processing may often transform input data with large numbers of dimensions to richer features defined over fewer, but more meaningful, feature dimensions.

5 Feature engineering (optional)

In the previous notebook, you saw how a new training set created by cropping and translating the original MNIST images caused recall problems for the original MLP. In this notebook, you will explore one possible strategy for improving the network’s performance: feature engineering.

When presenting a raw image to a neural network, for example as a list of N × M values, one for each pixel in the image, each value represents a distinct feature that the network may use to help it generate a particular classification.

Feature engineering is the name given to the process of deriving new features from the original raw dataset. These new features can then be used either to complement the original dataset, or to be presented to the network for training, and recall, instead of the original data.

The aim of using derived features, rather than the original pixel features, is to try to improve the performance of the network.

5.1 Creating a baseline MLP using an original image training set

Let’s recap what happened when we trained an MLP on the original dataset.

First, load in the images and labels datasets:

from nn_tools.sensor_data import load_MNIST_images_and_labels
from sklearn.neural_network import MLPClassifier

# Load images
images_list, labels = load_MNIST_images_and_labels()

The quick_progress_tracked_training() function makes it even easier for us to train a network on the images data.

During training, we hold back 100 images that the network will not see.

Note that we can actually get quite a reasonable-looking performance over the training data from a relatively simple, lightly trained network:

from nn_tools.network_views import quick_progress_tracked_training

hidden_layer_sizes = (40,)
max_iterations = 150

held_back = 100

MLP = quick_progress_tracked_training(images_list[:-held_back], labels[:-held_back],
                                 hidden_layer_sizes=hidden_layer_sizes,
                                 # top up an existing pretrained MLP: MLP=MLP,
                                 max_iterations=max_iterations,
                                 loss=True, # show loss function
                                 structure=True # show network params
                                )

Record your own observations here about the performance of the network under training.

How does the network perform on the withheld images it wasn’t trained on and hasn’t previously seen?

from nn_tools.network_views import test_and_report_image_data
from nn_tools.sensor_data import get_images_features

testing_data = get_images_features(images_list[-held_back:], normalise=True)
test_and_report_image_data(MLP, testing_data, labels[-held_back:])
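Behind the scenes, the reporting helper is essentially measuring how often the network’s predictions match the true labels. Purely as an illustration, a rough plain-scikit-learn equivalent might look like the following, assuming quick_progress_tracked_training() returns a fitted MLPClassifier (an assumption about the nn_tools helper rather than something it guarantees):

from sklearn.metrics import accuracy_score

# Predict labels for the held-back feature vectors and compare with the true labels
predictions = MLP.predict(testing_data)
print("Accuracy on held-back images:", accuracy_score(labels[-held_back:], predictions))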

Record your observations about the network’s performance when tested using previously unseen images.

Let’s now create a jiggled dataset and see how well the network fares when tested against jiggled images:

Every time the test_and_report_random_images() function is run with jiggled=True, newly jiggled images are created and tested.

from nn_tools.network_views import test_and_report_random_images
from nn_tools.sensor_data import get_random_image

test_and_report_random_images(MLP, 
                              get_random_image, images_list[:-held_back], labels=labels[:-held_back],
                              num_samples=100, jiggled=True, cropzoom=False)

Record your observations about how well the network copes with the jiggled images. How does this compare to the previous tests you ran?

5.2 Improving the network performance on translated images

So how might we improve the network’s ability to classify jiggled (i.e. translated) images?

One way might be to train a network based on the cropped images. Then if we cropped each translated test image, the translation variance wouldn’t be an issue.

But one problem with this approach is that different images crop to different sizes, and the network requires the presentation of images with the same dimensions each time.

Run the following cell to display the sizes of 10 randomly selected images that have been cropped:

from nn_tools.sensor_data import trim_image                

for i in range(10):
    (test_image, test_label) = get_random_image(images_list, labels)
    cropped_test_image = trim_image( test_image, background=0, show=False, image=True)
    _size = cropped_test_image.size
    print("Cropped image size:", _size, f'= {_size[0] * _size[1]} pixels')

Record your own observations here regarding how the images are different sizes. What might make it difficult to use these raw images to train the network?

5.2.1 Identifying new features

One constraint we have is that we require a set of input features that has a fixed size and semantics (i.e. feature number 7 from one test pattern means the same thing as feature number 7 from another).

In the original MLP, the features comprised the 28 × 28 = 784 individual pixel values. Using cropped images gives varying numbers of features, so unless we scale the cropped image back to a standard image size, we can’t naively use cropped images as inputs.
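As a quick illustration of the rescaling workaround, a cropped image can be resized back to a fixed 28 × 28 size so that it again yields the same number of pixel features. This sketch assumes the cropped images are PIL Image objects (as the .size attribute used above suggests):

# Rescale the most recently cropped image back to a standard 28 x 28 size
standardised_image = cropped_test_image.resize((28, 28))
print("Rescaled image size:", standardised_image.size)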

An alternative approach is to somehow derive a fixed-size signature for each original image that encodes it in a more translation-invariant way.

5.2.2 Activity – What distinguishes one digit from another?

Let’s ponder a random image for a moment or two. What distinguishes this digit from another digit?

from nn_tools.sensor_data import zoom_img

(test_image, test_label) = get_random_image(images_list, labels, show=True)

zoom_img(test_image)

Write your thoughts here about what makes one handwritten digit different from another. How might we represent those differences in a concise and consistent way?

Example observations

Click the arrow in the sidebar or run this cell to reveal some example observations.

Some digits include loops (0, 6, 8, 9), some have flat bits at the top and bottom (2, 5, 7), some have vertical lines (1, 7), some have lines in the middle (3, 4, 5). Can we use that information somehow?

Reflecting on this a little more, it may occur to us that we could count the number of lines that appear in each row or column of the image array to try to capture some sense of its straight lines and loopiness.

One way of counting lines is to count the number of transitions from black to white, or from white to black, along each row (or along each row and each column) of the image dataframe.

5.2.3 Generating an image signature

I have created a simple function, generate_signature(), that can be used to generate a ‘signature’ for various sorts of input along the lines (pun intended!) of how lines appear across, and up and down, the image frame.

The signature is made up from sets of four values, one set per row of the image, image dataframe, or image array:

  • the number of black-to-white and white-to-black transitions in the row (that is, the number of edges in the row)

  • the value of the initial pixel in the row

  • a count of the longest run of white pixels in the row (that is, the width of the broadest white band in the row)

  • a count of the longest run of black pixels in the row (that is, the width of the broadest black band in the row).

Before the signature is generated, the image is converted to a binarised black-and-white image. The black-and-white threshold value is set by passing a parameter which defaults to a value of 127 (threshold=127).
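To make the idea concrete, here is a minimal sketch of how such a per-row signature might be computed. It is only an illustration of the idea, not the nn_tools generate_signature() implementation, and it assumes the input is a greyscale PIL image:

import itertools
import numpy as np

def row_signature_sketch(img, threshold=127):
    # Binarise the image: True where a pixel is 'white' (above the threshold)
    pixels = np.array(img.convert('L')) > threshold
    signature = []
    for row in pixels:
        # Number of black<->white transitions (edges) along the row
        transitions = int(np.count_nonzero(np.diff(row.astype(np.int8))))
        # Value of the first pixel in the row (0 = black, 1 = white)
        initial = int(row[0])
        # Longest runs of white (True) and black (False) pixels in the row
        runs = [(value, len(list(group))) for value, group in itertools.groupby(row)]
        longest_white = max((n for value, n in runs if value), default=0)
        longest_black = max((n for value, n in runs if not value), default=0)
        signature.extend([transitions, initial, longest_white, longest_black])
    return signature

For a 28 × 28 image this sketch returns 28 × 4 = 112 values, one set of four per row.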

Experiment with different threshold values to explore what effect they have on the binarisation of the image.

Run the following code cell to create a simple interactive application that provides an index slider to select the image from the dataframe and a threshold slider to set the threshold value.

Click the Run Interact button to see the effect of creating the black-and-white version of the selected image using the specified threshold value.

from ipywidgets import interact_manual
from nn_tools.sensor_data import make_image_black_and_white

@interact_manual(threshold=(0, 255),
                 index=(0, len(images_list)-1))
def bw_preview(index=0, threshold=200):
    _original_img = images_list[index]
    
    # Generate a black and white image
    _demo_img = make_image_black_and_white(_original_img,
                                           threshold=threshold)
    zoom_img( _original_img)
    zoom_img( _demo_img )
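If you just want a one-off, non-interactive check, the same sort of binarisation can also be done directly with Pillow’s Image.point() method. This sketch assumes the images are greyscale PIL images and is not necessarily how make_image_black_and_white() works internally:

# Binarise a single image at a chosen threshold using plain Pillow
threshold = 127
bw_image = images_list[0].convert('L').point(lambda p: 255 if p > threshold else 0)
zoom_img(bw_image)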

When you think you have a feel for a sensible threshold, use it in the following code cell to generate a sample signature:

from nn_tools.sensor_data import generate_signature

# Set your threshold value here
threshold = 127

(test_image, test_label) = get_random_image(images_list, labels, show=True)

# The fill value identifies the background colour
signature = generate_signature(test_image, fill=0, show=True, threshold=threshold)

We can preview the signature to see how it looks:

signature

Record your own observations here about how each line of the signature describes each row of the image.

What elements of the image can you still ‘see’ in its signature?

This signature gives us 28 × 4 = 112 distinct features. If we pass the linear=True argument to generate_signature() then it returns the values from the dataframe as a list of features that can be used to train the MLP:

generate_signature(test_image, fill=0, linear=True)
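As a quick sanity check, the flattened signature for a 28 × 28 image should contain 28 × 4 = 112 values:

# Expect 28 rows x 4 values per row = 112 features
len(generate_signature(test_image, fill=0, linear=True))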

Whilst it is not completely translation invariant, certain features do tend to be preserved by this signature.

As an example, consider the digit 1: the top and bottom rows and the left-most and right-most columns are typically all black, the number of horizontal black-to-white plus white-to-black transitions in a row is typically either 0 or 2, and so on.
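If you want to check that claim against the data, you could pull out an example of a ‘1’ and inspect its signature. The comparison via str() below is just a defensive sketch, since the labels may be stored as strings or integers depending on the loader:

# Find the first image labelled '1' and display its signature
idx = next(i for i, lbl in enumerate(labels) if str(lbl) == '1')
generate_signature(images_list[idx], fill=0, show=True, threshold=threshold)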

5.2.4 Creating signatures in bulk

To create the signature-based training data for the MLP, we can iterate through the list of images loaded in from the original images_list, create the signature for each image, and collect the signatures into a list that can be used to train the network:

from tqdm.notebook import tqdm

signatures_list = []
for image in tqdm(images_list[:500]):
    signatures_list.append(generate_signature(image, fill=0, linear=True)) 

Running that code may take some time – the signature-generating code is far from optimal! – so, as written above, it only generates signatures for the first 500 sample images.

To save time creating the signature lists, I ran the above code and saved a copy of the signatures using the Python pickle module:

import pickle

with open('data/signatures.pickle', 'wb') as outfile:
    pickle.dump(signatures_list, outfile)

If you choose to do so, you can run the signature-generating code above to create the signatures yourself. Alternatively, run the following code cell to load in a version I prepared earlier:

import pickle

with open('data/signatures.pickle', 'rb') as infile:
    signatures_list = pickle.load(infile)
    
with open('data/jiggled_signatures.pickle', 'rb') as infile:
    jiggled_signatures_list = pickle.load(infile)

We can also generate signatures for a set of jiggled images:

from nn_tools.sensor_data import jiggle

jiggled_signatures_list = []
for image in tqdm(images_list[:500]):
    jiggled_signatures_list.append(generate_signature(jiggle(image), fill=0, linear=True))

Once again, I created a version of these jiggled signatures that you can load in directly:

import pickle

with open('data/jiggled_signatures.pickle', 'rb') as infile:
    jiggled_signatures_list = pickle.load(infile)

5.2.5 Training an MLP using the line-crossing signature features

Let’s create a simple MLP network:

from sklearn.neural_network import MLPClassifier

hidden_layer_sizes = (20,)
max_iterations = 1500

MLP4 = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=max_iterations)

And train it on the jiggled signatures data:

from sklearn.preprocessing import normalize
from nn_tools.network_views import network_structure


training_signature_data = normalize(jiggled_signatures_list, axis=1)
training_labels = labels[:500]

MLP4.fit(training_signature_data, training_labels)

# Quick status report
network_structure(MLP4, loss=True)

Record your own observations here about the network’s performance under training. If it is performing badly, consider changing the hidden layer sizes and/or the maximum number of training iterations, and then retrain the network.

Let’s see how well it behaves on the training data:

from nn_tools.network_views import test_and_report

test_and_report(MLP4, training_signature_data, training_labels)
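For a per-digit breakdown, you could also call scikit-learn’s classification_report directly; this is just an alternative view of the same predictions, not part of the nn_tools helpers:

from sklearn.metrics import classification_report

# Per-digit precision, recall and F1 scores on the training signatures
predictions = MLP4.predict(training_signature_data)
print(classification_report(training_labels, predictions))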

Record your own observations here about how well the network performs on the training data.

How well does it perform if we present it with the original unjiggled image signatures?

test_and_report(MLP4, normalize(signatures_list, axis=1), training_labels)

Record your own observations here about how well the network performs on the original, previously unseen, unjiggled data.

How well does this network perform compared to the original network trained on the original image pixel data array?

My network was far from perfect, but it performed much better than the network trained on the original images and tested against jiggled images.

If we pass the use_signature=True parameter into the test_and_report_random_images() function, then we can randomly select images, find their signature, and then test those against the network:

test_and_report_random_images(MLP4,
                              get_random_image, images_list, labels,
                              num_samples=100, jiggled=True, use_signature=True)

Record your own observations here about how well the network performs on the randomly selected and jiggled data.

When I tried it, the network was getting the correct classification a bit more than half the time. Not ideal, but far better than when testing the naive classifier trained on the original image pixel data against jiggled images.

Experimenting with hidden layer sizes and the number of training iterations, and perhaps even a larger training set, might also improve matters.

5.3 Training and testing the network against smaller images

How well does the signature method work if we use it on smaller images, such as the original images reduced in size from 28 × 28 pixels to 14 × 14 pixels?

from nn_tools.network_views import resized_images_pipeline

resized_images = resized_images_pipeline(images_list, size=(14, 14))

The quick_progress_tracked_training() function has support for signature-based training if you pass the use_signature=True parameter. We can also choose to jiggle the images that are created for training:

training_samples = 500

hidden_layer_sizes = (20, 10)
max_iterations = 2000

MLP_small = quick_progress_tracked_training(resized_images[:training_samples], labels[:training_samples],
                                 hidden_layer_sizes=hidden_layer_sizes,
                                 # top up an existing pretrained MLP: MLP=MLP,
                                 max_iterations=max_iterations,
                                 loss=True, # show loss function
                                 structure=True, # show network params
                                 jiggled=True,
                                 use_signature=True,
                                )

Record your own observations here about the network’s performance under training. If it is performing badly, consider changing the hidden layer sizes and/or the maximum number of training iterations, and then retrain the network.

We can test the network on images sampled from the original list of unjiggled resized images, which are likely to differ from the jiggled images the network was trained on:

test_and_report_random_images(MLP_small,
                              get_random_image, resized_images, labels,
                              num_samples=100, jiggled=False, use_signature=True)

Record your own observations about how well the network performs on the original unjiggled images.

When I tried it, the performance wasn’t brilliant, but changing the network structure and number of training iterations may improve matters.

We can also test against randomly jiggled images:

test_and_report_random_images(MLP_small,
                              get_random_image, resized_images, labels,
                              num_samples=100, jiggled=True, use_signature=True)

Record your own observations about how well the network performs this time.

To save time, I have already generated some small image signature data:

import pickle
               
with open('data/small_signatures.pickle', 'rb') as infile:
    small_signatures_list = pickle.load(infile)

with open('data/small_jiggled_signatures.pickle', 'rb') as infile:
    small_jiggled_signatures_list = pickle.load(infile)

We can train and test an MLP using just the signature data rather than having to generate it from images each time if we pass the image_data=False parameter to the quick_progress_tracked_training() function:

training_samples = 500
hidden_layer_sizes = 20

max_iterations = 1500

MLP_small2 = quick_progress_tracked_training(small_jiggled_signatures_list, labels[:training_samples],
                                 hidden_layer_sizes=hidden_layer_sizes,
                                 # top up an existing pretrained MLP: MLP=MLP,
                                 max_iterations=max_iterations,
                                 loss=True, # show loss function
                                 structure=True, # show network params
                                 image_data=False # State that we are using signature data not image data
                                )

Record your own observations here about the network’s performance under training. If it is performing badly, consider changing the hidden layer sizes and/or the maximum number of training iterations, and then retrain the network.

We can test the network on the small jiggled image signatures dataset to see how it performs:

test_and_report(MLP_small2, small_jiggled_signatures_list, labels[:training_samples])

Record your own observations here about the network’s performance on the training data.

How about if we test it on the previously unseen original, unjiggled, small images signatures?

test_and_report(MLP_small2, small_signatures_list, labels[:training_samples])

Record your own observations here about the network’s performance on the previously unseen, unjiggled image data.

How does it compare with the behaviour of the network trained on the full-size image data?

5.4 Parameter sweeping

At this point, you may be thinking this is all so much voodoo: how on earth are you expected to find appropriate settings for the number and arrangement of the hidden layers and the number of training iterations, let alone select appropriate training images?

And you wouldn’t be wrong!

One approach to finding appropriate configuration setups is a brute-force approach known as parameter sweeping. Under this approach, you set up an automated experiment that creates and trains lots and lots of networks with increasing numbers of hidden nodes and increasing numbers of training cycles, and then you pick the best one.
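Purely for illustration, a minimal sketch of such a sweep using scikit-learn’s GridSearchCV, reusing the signature training data from earlier, might look something like this:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# A deliberately tiny grid of candidate settings to sweep over
param_grid = {
    'hidden_layer_sizes': [(10,), (20,), (40,), (20, 10)],
    'max_iter': [500, 1500],
}

# Train and cross-validate an MLP for every combination of settings
sweep = GridSearchCV(MLPClassifier(), param_grid, cv=3, n_jobs=-1)
sweep.fit(training_signature_data, training_labels)

print("Best settings:", sweep.best_params_)
print("Best cross-validated accuracy:", sweep.best_score_)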

Performing parameter sweeps, as well as other techniques for trying to find effective network architectures and training schedules, is beyond the scope of this module.

If you would like to learn more about training and developing neural networks, consider taking the OU module TM358 Machine learning and artificial intelligence.

5.5 Summary

In this notebook you have been introduced to the idea of feature engineering, in which the neural network design engineer tries to find ways of encoding the input data and identifying meaningful features in it that the network can then use to distinguish between the presented patterns.

As well as improving performance by reducing the number of features, and so reducing the size and complexity of the network, some feature selections may also improve performance by ‘masking’ different sorts of irrelevant variation in the presented pattern. For example, you saw how a signature-based encoding of the MNIST image data might provide translation-invariant features in certain respects.

In the next notebook, you will look at another sort of neural network architecture, the convolutional neural network (CNN), before going on to explore how well CNNs cope with the MNIST data.