Image classification is a major topic in computer vision these days. It is useful for medical diagnosis, autonomous driving, optical character recognition, and much more. In this tutorial, we will walk through classifying the German Traffic Sign Recognition Benchmark (GTSRB). By the end of this tutorial, we will have done the following:
- Prepared the dataset.
- Defined the model using keras.
- Trained the model and monitored the training progress.
- Used the model.
First, let’s talk a little bit about the data set.
The German Traffic Signs Competitions
Years ago, the Institut für Neuroinformatik launched two competitions: one for sign recognition, i.e. classification, and the other for sign detection, which we will talk about in another tutorial. The data sets for these competitions were later used as benchmarks and for educational purposes, like what we are doing right now 🙂
For more information about the data set, you can refer to their website.
Enough talking, let’s get our hands dirty.
The data come in two parts, a training data set, and a testing one. We will use only the training set, as it is more than enough for this course.
First, we need to download the data set. You can do so using the following link, which contains the images and the annotations for the training data set.
I created a Jupyter notebook and placed the images on disk, in ‘../../GTSRB/Final_Training/Images/’ to be precise. The images are separated into 43 directories, each representing a different class, so our labels will come from the directory names. Directory names are always five digits long, left-padded with zeros, so we will use Python's zfill string method to build them: ’11’ becomes ‘00011’ and ‘1’ becomes ‘00001’ with no extra work.
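To make the padding concrete, here is a small sketch of how a class number maps to its directory. The helper name class_directory is made up for illustration; the base path is the one used in this tutorial.

```python
def class_directory(class_num, base='../../GTSRB/Final_Training/Images/'):
    """Return the image directory for a class (hypothetical helper)."""
    # str.zfill left-pads the string with zeros up to the given width
    return base + str(class_num).zfill(5) + '/'


print(class_directory(1))
print(class_directory(11))
```

Numbers that already have five digits pass through zfill unchanged, so the same call works for every class.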
The model we will develop reads images as 32 × 32 pixels with 3 channels, so we will resize all images to that size.
The following code snippet loads and prepares the images as discussed.
    import glob
    import cv2

    images_directory = '../../GTSRB/Final_Training/Images/'
    y = []  # class labels
    x = []  # resized images
    for class_num in range(43):
        # Directory names are zero-padded to five digits, e.g. 00011
        images_path = glob.glob(images_directory + str(class_num).zfill(5) + '/*.ppm')
        for image_path in images_path:
            image = cv2.imread(image_path, cv2.IMREAD_COLOR)  # OpenCV loads images as BGR
            resized_image = cv2.resize(image, (32, 32))
            x.append(resized_image)
            y.append(class_num)
Having the data prepared, I can assure you that we have already done a huge part of image classification. Sometimes it takes weeks or months just to prepare the data for training. It is not easy to get a labeled data set, so be thankful for ready-made data sets like ImageNet, COCO, and GTSRB.
Explore the data
Since I am using a Jupyter notebook, I will use Matplotlib to show images inline in the notebook. I would use the OpenCV imshow function if I were developing in another IDE like IDLE or Spyder. As I will use Matplotlib, it is important to note that Matplotlib expects images in RGB format, while OpenCV reads them in BGR, so a small conversion is needed.
%matplotlib inline configures Matplotlib to plot inside the notebook rather than in a separate window. I could not find a similar option for OpenCV, so I have to use Matplotlib.
    import matplotlib.pyplot as plt
    %matplotlib inline

    plt.axis('off')
    # Why 4000, I do not know : )
    image_rgb = cv2.cvtColor(x[4000], cv2.COLOR_BGR2RGB)
    plt.imshow(image_rgb)
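For a 3-channel image, the BGR to RGB conversion is nothing more than reversing the order of the channel axis. The sketch below shows this with plain NumPy on a made-up two-pixel image. It is only meant to illustrate what the conversion does, not to replace cv2.cvtColor.

```python
import numpy as np

# A tiny 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, turning BGR into RGB
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now stored as (R, G, B)
print(rgb[0, 1])  # the red pixel, now stored as (R, G, B)
```

If you forget the conversion, nothing crashes; red signs simply show up blue in the plot.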
Thinking about viewing more images? Me too 🙂 I will use cv2.hconcat and cv2.vconcat to join several sign images into a single image.
    from random import randint

    demo_images = []
    for i in range(10):
        demo_images.append(x[randint(0, len(x) - 1)])

    _, ax = plt.subplots(1, 1, figsize=(20, 10))
    # Two rows of five images each
    images = cv2.vconcat([cv2.hconcat(demo_images[0:5]), cv2.hconcat(demo_images[5:10])])
    ax.imshow(cv2.cvtColor(images, cv2.COLOR_BGR2RGB))
    ax.axis("off")
    ax.set_title("Sample Images")
    plt.show()
Each time this code runs, a different set of images will be drawn, since I used randint to pick the image indices. There are more powerful ways to draw a random sample from a set, but this will do for now.
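As one example of such a method, random.sample from the standard library draws distinct indices, so the same image cannot appear twice in the grid. The sketch below uses blank stand-in images (fake_images is a made-up name) and builds the grid with NumPy's hstack and vstack, which do the same job here as cv2.hconcat and cv2.vconcat.

```python
import random
import numpy as np

# Stand-in data: 100 blank 32x32 BGR images instead of the real signs
fake_images = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(100)]

# random.sample draws 10 distinct indices, so no image appears twice
indices = random.sample(range(len(fake_images)), 10)
picked = [fake_images[i] for i in indices]

# Two rows of five images, stacked into a single grid
grid = np.vstack([np.hstack(picked[0:5]), np.hstack(picked[5:10])])
print(grid.shape)  # (64, 160, 3)
```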
The grid of signs from my run will differ from yours. Mine is:
Data shuffling and splitting
Following machine learning standards, we tend to use three data sets: training, validation, and testing. For this tutorial, we will use only two, training and validation. Do not try this at home 🙂 Use three sets: a held-out test set gives you an honest estimate of how your system performs and guards against overfitting to the validation set.
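If you do want three sets, a minimal sketch of a three-way split with plain NumPy could look like the following. The function name three_way_split and the 70/15/15 proportions are my own assumptions for illustration, not something this tutorial uses later.

```python
import numpy as np

def three_way_split(x, y, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle x and y together and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # one shuffled index order shared by x and y
    n_test = int(round(len(x) * test_fraction))
    n_val = int(round(len(x) * val_fraction))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx]), (x[test_idx], y[test_idx])


# Toy data: 100 samples whose label equals the sample value
data = np.arange(100)
labels = np.arange(100)
train, val, test = three_way_split(data, labels)
print(len(train[0]), len(val[0]), len(test[0]))  # 70 15 15
```

Indexing x and y with the same shuffled order keeps every image paired with its label.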
Next, we will shuffle the data and split it into training and validation sets. The splitting function we will use can shuffle by itself, so calling a separate shuffle function is not mandatory in our case, but it might be useful in yours, so I thought I should introduce it to you. For these tasks, I am gonna use the sklearn package. The model I am gonna use accepts data as numpy arrays, so in this step I will convert the data to np.array.
    from sklearn.model_selection import train_test_split
    import numpy as np

    x = np.array(x, dtype='uint8')
    y = np.array(y, dtype='uint8')
    # train_test_split shuffles by default; X_test / y_test act as our validation set
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
The output will be:
(31367, 32, 32, 3) (7842, 32, 32, 3) (31367,) (7842,)
For the images, the first value is the sample count, the next two are the image dimensions, and the last is the number of channels. For the labels, y, there is a single label per sample, hence the single dimension. This will not be the final shape of the labels, as we will be using one hot encoding. Later 🙂
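The two counts follow from test_size=0.2. As a quick check, assuming the total of 39,209 images is simply the sum of the two counts printed above, and assuming the validation share is rounded up to a whole image (which matches the printed numbers), the arithmetic works out like this:

```python
import math

total_images = 31367 + 7842  # sum of the two counts printed above
test_size = 0.2

# Assumption: the validation share is rounded up to a whole number of images
n_validation = math.ceil(total_images * test_size)
n_training = total_images - n_validation

print(total_images, n_training, n_validation)  # 39209 31367 7842
```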
Defining the model
One easy-to-use library is keras; in a few lines we will have a complete model. keras is a wrapper library that can run on top of tensorflow, theano or CNTK. Since I am in love with Google, I will be using it on top of tensorflow.
One popular model is LeNet, introduced by Yann LeCun. It gives great results with a small number of classes and small input sizes, like classifying digits or, in our case, signs.
Since it is a popular model, a lot of ready implementations can be found online, which you can alter to produce your final model or use as is. As for me, I was inspired by LeNet and built something very close to it, also drawing on the following repo.
There is a lot of theory behind each layer, but I will keep things shallow here, as this tutorial cannot hold that much information. I will do something more detailed in following posts soon.
    from keras.layers import Conv2D, Dense, Flatten, Activation, MaxPooling2D
    from keras.models import Sequential

    model = Sequential()
    model.add(Conv2D(20, (2,2), padding='valid', input_shape=(32, 32, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='valid'))
    model.add(Conv2D(50, (2,2), padding='valid'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='valid'))
    model.add(Flatten())
    model.add(Dense(400))
    model.add(Activation('relu'))
    model.add(Dense(150))
    model.add(Activation('relu'))
    model.add(Dense(43))
    model.add(Activation('softmax'))
A quick overview of the stuff above:
- Sequential is the model type: a linear stack of layers.
- Conv2D is a convolution layer.
- Activation adds an activation layer.
- MaxPooling2D keeps the maximum value inside each pooling window, which downsamples the feature maps and helps reduce overfitting.
- Dense is the regular, fully connected neural network layer.
- Flatten converts the data into a one-dimensional array.
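To see how these layers reshape the data, the sketch below traces the tensor shape and counts the trainable parameters of the model defined above using plain arithmetic. It assumes stride 1 for the convolutions and a pooling stride equal to the pool size, which I believe are the keras defaults; conv_valid and pool_valid are helper names made up for this example.

```python
def conv_valid(size, kernel):
    # A 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_valid(size, pool):
    # 'valid' pooling with stride equal to the pool size keeps only complete windows
    return size // pool

side, channels = 32, 3
params = 0

# Two Conv2D (2x2 kernel) + MaxPooling2D (2x2) stages with 20 and 50 filters
for filters in (20, 50):
    params += (2 * 2 * channels + 1) * filters  # kernel weights plus one bias per filter
    side = conv_valid(side, 2)
    side = pool_valid(side, 2)
    channels = filters
    print('after conv + pool:', (side, side, channels))

# Flatten, then the three Dense layers
units = side * side * channels
print('flattened length:', units)
for out_units in (400, 150, 43):
    params += (units + 1) * out_units  # weights plus one bias per output unit
    units = out_units

print('trainable parameters:', params)
```

Almost all of the parameters sit in the first Dense layer, which is typical for this kind of small network. Calling model.summary() on the real model is the way to confirm these numbers.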
Training the model
The model is now ready for training. We need to define the loss and the optimizer. The Adam optimizer is great. Since we are dealing with classes, I will use the categorical_crossentropy loss, which requires the output to be one hot encoded.
One hot encoding means each sample's output has the same length as the number of classes. Suppose we have three classes: 1, 2, 3. If the first item is labeled as class 1, its encoding will be [1,0,0]: the index of the desired class is one, and the rest are zeros. To convert the output array to this form, we will use the keras function to_categorical.
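Here is a small sketch of the encoding itself in plain NumPy. The helper name one_hot is made up for illustration, and this is not how keras implements to_categorical. Note that the labels are treated as zero-based indices, which is also what to_categorical expects and what our class numbers 0 to 42 already are.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label with a 1 at the label's index (illustrative helper)."""
    # Row i of the identity matrix is the one hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]


# Labels are zero-based here: class indices 0, 2 and 1
encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)
```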
This will be it for model fitting in this tutorial. More information about the available options will come in upcoming posts.
    from keras.optimizers import Adam
    from keras.utils import to_categorical

    y_train_hot = to_categorical(y_train, num_classes=43)
    y_test_hot = to_categorical(y_test, num_classes=43)

    # Newer keras releases name this argument learning_rate instead of lr
    model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.0007), metrics=["accuracy"])
    history = model.fit(X_train, y_train_hot, epochs=3, batch_size=256,
                        validation_data=(X_test, y_test_hot))
The output will initially be a progress bar, with the training loss and accuracy printed alongside. Once an epoch is finished, the model is evaluated against the validation set. If you have a GPU, each epoch will take less time than mine did, as I was training on a CPU, and a somewhat old one 😀
    Train on 31367 samples, validate on 7842 samples
    Epoch 1/3
    31367/31367 [==============================] - 35s 1ms/step - loss: 6.8984 - acc: 0.3277 - val_loss: 0.8157 - val_acc: 0.7701
    Epoch 2/3
    31367/31367 [==============================] - 42s 1ms/step - loss: 0.4126 - acc: 0.8883 - val_loss: 0.3286 - val_acc: 0.9129
    Epoch 3/3
    31367/31367 [==============================] - 37s 1ms/step - loss: 0.1480 - acc: 0.9634 - val_loss: 0.2227 - val_acc: 0.9425
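The loss and acc columns in the log above can be computed by hand. The sketch below applies the standard definitions of categorical cross-entropy and accuracy to a few made-up predictions. It is an illustration in plain NumPy, not the keras implementation, and the function names and numbers are my own.

```python
import numpy as np

def categorical_crossentropy(y_true_hot, y_prob, eps=1e-7):
    """Mean over the samples of -sum(true * log(predicted))."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true_hot * np.log(y_prob), axis=1)))

def accuracy(y_true_hot, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true_hot, axis=1)))


# Made-up predictions for three samples over three classes
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2]], dtype=np.float32)

print('loss:', categorical_crossentropy(y_true, y_pred))
print('acc:', accuracy(y_true, y_pred))
```

The third sample is misclassified, so it lowers the accuracy and also contributes the largest share of the loss, because the true class received a low probability.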
Reaching a validation accuracy of about 94% by the third epoch is good enough for me. But what does that mean?
Using the model
Next, I will test the model against some images from the validation set. Please note that this is not an accurate measure, as we should test on a test set that was never used before.
I keep saying that testing the model against the validation set is not accurate, because the model might overfit the validation set. Someone might ask: the training set is used for training the model, while the validation set is not, so why would the results be inaccurate? Well, to answer that, let me ask you: what exactly makes you pick the model or the hyperparameters? It is the validation accuracy! You keep bending the model and your work to get higher accuracy on the validation set. This way, your model ends up overfitting the validation set, only indirectly. That is why we are advised to keep a test set, and to use it only once we are completely satisfied with our model.
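A toy simulation can make this indirect overfitting visible. In the sketch below, every "model" is a pure random guesser over 43 classes, and we pick the one with the best validation accuracy, just as we would pick hyperparameters. All names and sizes here are made up for illustration, and the labels are random rather than real GTSRB data.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, n_val, n_test, n_candidates = 43, 200, 200, 500

# True labels for a validation set and a separate test set
val_labels = rng.integers(0, num_classes, size=n_val)
test_labels = rng.integers(0, num_classes, size=n_test)

val_accs, test_accs = [], []
for _ in range(n_candidates):
    # Each "model" guesses at random: it has learned nothing
    val_accs.append(np.mean(rng.integers(0, num_classes, size=n_val) == val_labels))
    test_accs.append(np.mean(rng.integers(0, num_classes, size=n_test) == test_labels))

best = int(np.argmax(val_accs))  # model selection looks only at validation accuracy
print('chance level:            ', 1 / num_classes)
print('best validation accuracy:', val_accs[best])
print('its test accuracy:       ', test_accs[best])
```

The selected guesser typically scores well above chance on the validation set simply because it was chosen for that score, while its accuracy on the untouched test set tends to stay close to chance.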
Back to practice. We will use this model to predict a sign. First, I have some code to load sign names into a list, called y_labels.
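A minimal sketch of what such loading code could look like is given below. It assumes a hypothetical CSV with ClassId and SignName columns; the file layout, the function name load_sign_names and the few illustrative rows are assumptions for this example, so adapt them to wherever your sign names come from.

```python
import csv
import io

# Hypothetical file contents; in practice you would open your own CSV file instead
csv_text = """ClassId,SignName
0,Speed limit (20km/h)
1,Speed limit (30km/h)
2,Speed limit (50km/h)
"""

def load_sign_names(csv_file):
    """Return sign names ordered so that names[class_id] is that class's name."""
    rows = list(csv.DictReader(csv_file))
    rows.sort(key=lambda row: int(row['ClassId']))
    return [row['SignName'] for row in rows]


y_labels = load_sign_names(io.StringIO(csv_text))
print(y_labels[1])
```

Sorting by ClassId means the list index lines up with the class number the model predicts, assuming every class from 0 upward has a row.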
Now, regarding the prediction.
    plt.axis('off')
    img_to_test = X_test[0]  # any validation image works; index 0 is an arbitrary pick
    image_rgb = cv2.cvtColor(img_to_test, cv2.COLOR_BGR2RGB)
    plt.imshow(image_rgb)

    # predict expects a batch, so wrap the single image in an array
    max_index = np.argmax(model.predict(np.array([img_to_test])))
    print('Index of the maximum probability:', max_index)
    print(y_labels[max_index])
This code will print:
What is next?
The next step is to detect the presence of a sign in an image. Suppose you are developing an autonomous vehicle and you want it to comply with the speed limit, so you need to identify speed limit signs, for example. To do so, you need to scan frames for potential signs. There are several methods available for that, and we will talk about them in an upcoming post. For now, try the code in this tutorial, and play around with more parameters or optimizers.
If you have any questions, please feel free to ask in the comments. If you have a contribution to this article, I will be glad to hear from you.
Have fun 🙂