2 Collecting digit image and class data from the simulator

If we wanted to collect image data from the background and then train a network using those images, we would need to generate the training labels somehow. We could do this manually, inspecting each image, recording the digit value by eye and associating it with the image's location coordinates. But could we also encode the digit value explicitly in the background somehow?

If you look carefully at the MNIST_Digits background in the simulator, you will see that alongside each digit is a solid coloured area. This area is filled with a greyscale value that encodes the value of the digit shown in the image. That is, it represents a training label for the digit.

The greyscale encoding is quite a crude method and is potentially subject to noise. Another approach might be to use a simple QR code to encode the digit value.

Load the simulator in the usual way:

from nbev3devsim.load_nbev3devwidget import roboSim, eds

%load_ext nbev3devsim

Clear the datalog to ensure we start with a clean log to work with:

%sim_data --clear

The solid greyscale areas are arranged so that when the left light sensor is over the image, the right sensor is over the training label area.

%%sim_magic_preloaded -b MNIST_Digits -O -R -AH -x 400 -y 50

# Sample the light sensor reading
sensor_value = colorLeft.reflected_light_intensity

# This print() call acts as a command to the simulator,
# asking it to log the image data from both light sensors,
# not just as a print statement!
print("image_data both")

We can retrieve the last pair of images from the roboSim.image_data() dataframe using the get_sensor_image_pair() function:

from nn_tools.sensor_data import zoom_img
from nn_tools.sensor_data import get_sensor_image_pair

# The sample pair we want from the logged image data
pair_index = -1

left_img, right_img = get_sensor_image_pair(roboSim.image_data(),
                                            pair_index)

zoom_img(left_img), zoom_img(right_img)

The image labels are encoded as follows:

greyscale_value = 25 * digit_value
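For example, a handwritten 3 is accompanied by a solid block of greyscale value 75, and a 9 by a block of value 225. As a quick sketch, the full mapping implied by this scheme is:

# Print the digit-to-greyscale mapping implied by the encoding scheme
for digit_value in range(10):
    greyscale_value = 25 * digit_value
    print(f'digit {digit_value} -> greyscale {greyscale_value}')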

One way of decoding the label is as follows:

  • divide each of the greyscale pixel values collected from the right-hand sensor array by 25

  • take the median of these values; even in the presence of a small amount of noise, the median should give a robust estimate of the dominant pixel value in the frame

  • round the median to the nearest integer and cast the result to an integer to give the digit class label.

The pandas package has some operators that can help us with that if we put all the data into a pandas Series (essentially, a single column of a dataframe):

import pandas as pd

def get_training_label_from_sensor(img):
    """Return a training class label from a sensor image."""
    # Get the pixel data as a pandas Series
    # (essentially, a single column of values)
    image_pixels = pd.Series(list(img.getdata()))

    # Divide each pixel value by 25
    image_pixels = image_pixels / 25

    # Find the median value
    pixels_median = image_pixels.median()

    # Find the nearest integer and return it
    return int(pixels_median.round(0))

# Try it out
get_training_label_from_sensor(right_img)

The following function grabs the left and right images from the datalog, decodes the training label from the right-hand image, and returns the label along with the handwritten digit image captured by the left light sensor:

def get_training_data(raw_df, pair_index):
    """Get training image and label from raw dataframe."""
    
    # Get the left and right images
    # at specified pair index
    left_img, right_img = get_sensor_image_pair(raw_df,
                                                pair_index)
    
    # Find the training label value as the median
    # value of the right-hand image.
    # Really, we should probably try to check that
    # we do have a proper training image, for example
    # by encoding a recognisable pattern
    # such as a QR code.
    training_label = get_training_label_from_sensor(right_img)
    return training_label, left_img
    

# Try it out
label, img = get_training_data(roboSim.image_data(),
                               pair_index)
print(f'Label: {label}')
zoom_img(img)

We’re actually taking quite a lot on trust in extracting the data from the dataframe in this way. Ideally, we would have unique identifiers that reliably associate the left and right images as having been sampled from the same location. As it is, we assume the left and right image datasets appear in that order, one after the other, so we can count back up the dataframe to collect different pairs of data.
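As a minimal sanity check (a sketch only, assuming each sample logs exactly one left-image row and one right-image row, as described above), we can at least confirm that the datalog contains an even number of rows and report how many sample pairs are available:

# Retrieve the raw logged image data
raw_df = roboSim.image_data()

# If left and right images are logged as consecutive rows,
# the datalog should contain an even number of rows...
assert len(raw_df) % 2 == 0, 'Expected left/right images to be logged in pairs'

# ...with each consecutive pair of rows corresponding to one sample
num_pairs = len(raw_df) // 2
print(f'{num_pairs} left/right sample pair(s) available')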

Load in our previously trained MLP classifier:

# Load model
from joblib import load

MLP = load('mlp_mnist14x14.joblib')

We can now test that image against the classifier:

from nn_tools.network_views import image_class_predictor

image_class_predictor(MLP, img)

2.1.1 Activity – Testing the ability to recognise images slightly off-centre in the image array

Write a simple program to collect sample data at a particular location and then display the digit image and the decoded label value.

Modify the x- or y-coordinates used to locate the robot so that it is offset by a few pixels from the sampling point origins, and test the ability of the network to recognise digits that are slightly off-centre in the image array.

How well does the network perform?

Hint: when you have run your program to collect the data in the simulator, call the get_training_data() function on the roboSim.image_data() dataframe to generate the test image and retrieve its decoded training label.

Hint: use the image_class_predictor() function with the test image to see if the classifier can recognise the image.

Hint: if you seem to have more data in the dataframe than you thought you had collected, did you remember to clear the datalog before collecting your data?

# Your code here

Record your observations here.

2.1.2 Optional activity – Collecting image sample data from the MNIST_Digits background

In this activity, you will need to collect a complete set of sample data from the simulator to test the ability of the network to correctly identify the handwritten digit images.

Recall that the sampling positions are arranged along rows 100 pixels apart, starting at x=100 and ending at x=2000; along columns 100 pixels apart, starting at y=50 and ending at y=1050.
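As a rough sketch (based only on the ranges quoted above), the full set of candidate sampling coordinates could be enumerated as follows:

# Enumerate the candidate sampling locations described above
x_locations = range(100, 2001, 100)  # x = 100, 200, ..., 2000
y_locations = range(50, 1051, 100)   # y = 50, 150, ..., 1050

sample_locations = [(x, y) for x in x_locations for y in y_locations]
len(sample_locations)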

Write a program to automate the collection of data at each of these locations.

How would you then retrieve the handwritten digit image and its decoded training label?

Hint: import the time package and use the time.sleep() function to provide a short delay between each sample collection. You may also find it convenient to import the trange() function to provide a progress bar indicator when iterating through the list of collection locations: from tqdm.notebook import trange.

Your program design notes here.

# Your program code

Describe here how you would retrieve the handwritten digit image and its decoded training label.

Example solution

Click on the arrow in the sidebar or run this cell to reveal an example solution.

To collect the data, I use two loops, one inside the other, to iterate through the x- and y-coordinate values. The outer loop (which uses trange() to provide a progress bar) generates the x-values and the inner loop generates the y-values:

# Make use of the progress-bar-enabled trange() iterator
from tqdm.notebook import trange
import time

# Clear the datalog so we know it's empty
%sim_data --clear


# Define the y-coordinate sampling range and step
min_value = 50
max_value = 1050
step = 100

for _x in trange(100, 501, 100):
    for _y in range(min_value, max_value+1, step):

        %sim_magic -R -x $_x -y $_y
        # Give the data time to synchronise
        time.sleep(1)

We can now grab and view the data we have collected:

training_df = roboSim.image_data()
training_df

The get_training_data() function provides a convenient way of retrieving the handwritten digit image and the decoded training label.

label, img = get_training_data(training_df, pair_index)
zoom_img(img), label
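We could go a step further and assemble all of the collected samples into arrays suitable for training or testing a classifier. The following is a sketch only: it assumes, as above, that each sample logs one left-image row and one right-image row, and that negative pair_index values count back from the end of the datalog; it reuses the get_training_data() function defined earlier.

import numpy as np

# Assuming each sample adds a left/right pair of rows to the datalog
num_pairs = len(training_df) // 2

images = []
labels = []

for i in range(num_pairs):
    # Count back from the end of the datalog, one pair at a time
    label, img = get_training_data(training_df, -(i + 1))
    # Flatten the greyscale digit image into a one-dimensional feature vector
    images.append(np.array(img).ravel())
    labels.append(label)

X = np.array(images)
y = np.array(labels)

X.shape, y.shape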

2.2 Summary

In this notebook, you have automated the collection of handwritten digit and encoded label image data from the simulator, and seen how this can be used to generate training data made up of scanned handwritten digit images and their associated label values. In principle, we could use the image and label data collected in this way as a training dataset for an MLP or convolutional neural network (CNN).

The next notebook in the series is optional and demonstrates the performance of a CNN on the MNIST dataset. The required content continues with a look at how we can start to collect image data using the simulated robot whilst it is on the move.