Introduction

There are three main forces that drive my passion for software development.

  1. A desire to learn how things work and how they fit together.
  2. Passion to get puzzles solved regardless of how complex they are.
  3. Sharing what I have learnt with those who are interested in learning new things.

This blog post covers one of the coolest projects that I have worked on in my career so far, and it neatly ties together all three of my passions.

In 2017, I took a sabbatical from my full-time job to learn about self-driving cars from the course that Sebastian Thrun himself had just launched at Udacity. The curriculum looked very intense and was full of buzzwords for technologies I hadn’t had a chance to work with, including Machine Learning, Deep Neural Networks, Artificial Intelligence, Computer Vision, Sensor Fusion and many more. Plus, it was taught by the Godfather of Self-Driving Cars! A pure genius whom I was fortunate to meet in person.

The course included multiple exciting projects covering different topics, but automating a self-driving car simulator was the one that stood out to me the most. It was based on a paper by Nvidia titled “End to End Learning for Self-Driving Cars”. The idea is that a computer can learn how a human drives a car just by watching them, and then apply the acquired skill to drive the same car on any other road. Cool, right!? Obviously, using a real car would be very expensive, so Udacity built a car simulator using a popular video game engine called Unity. But the rest of the logic is just like the paper describes. Let’s take it apart.

A few words on Machine Learning

According to Wikipedia, Machine Learning…

… is the study of computer algorithms that improve automatically through experience.

Wikipedia

What does that actually mean? I’ll describe it in more detail in a separate post (subscribe or follow me on social media if you’d like to see it). To simplify, Machine Learning is an algorithm that becomes really good at recognising patterns by observing the data itself instead of being explicitly coded to search for specific patterns.

For example, say we want to understand what animal is shown in this photo:

Before Machine Learning, we could have designed a very clever Computer Vision algorithm that looks through the image, trying to count the number of limbs, the number of eyes, the presence and size of the tail, the size of the ears relative to the head, the presence and length of whiskers, etc. We could achieve very accurate results with such an algorithm, but it would take many hours of writing and debugging code. With a Machine Learning algorithm, all we need to do is show it a lot of images of different cats and tell it explicitly that “this is a cat”. Not too different from how humans learn. The Machine Learning algorithm takes care of the rest, and the programmer would not even know how exactly the algorithm decided that a given image looks like a cat. In fact, the algorithm doesn’t even understand that it is looking at images. As far as it is concerned, it’s looking at a lot of numbers and performing mathematical operations on them. Mind-blowing!
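To make the “it’s just numbers” point concrete, here is a minimal NumPy sketch. The 4×4 image and its pixel values are made up purely for illustration:

```python
import numpy as np

# A tiny hypothetical 4x4 greyscale "image" -- to a learning algorithm
# it is nothing more than a grid of brightness values between 0 and 255.
image = np.array([
    [  0,  50, 120, 255],
    [ 10,  60, 130, 240],
    [ 20,  70, 140, 230],
    [ 30,  80, 150, 220],
], dtype=np.uint8)

# The algorithm only ever sees the numbers, typically flattened or
# normalised before any maths is applied to them.
flattened = image.flatten()      # 16 plain numbers
normalised = flattened / 255.0   # scaled into the 0..1 range

print(flattened.shape)   # (16,)
print(normalised.max())  # 1.0
```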

Using Machine Learning to figure out the steering angle of a car

Now that we understand how Machine Learning works in general, let’s talk about the idea behind Nvidia’s paper. To simplify, what they did was install three different cameras onto their car and connect a device that measures the steering wheel’s current angle, to “watch” how a human drives the car under different road conditions.

One camera is installed right at the centre of the car itself. The second camera is installed at the furthest left edge of the car. And the last camera is installed at the furthest right edge of the car. All three are looking at the road right in front of the car itself.

This allowed them to collect the data to train the machine learning algorithm to recognise what steering angle to use given a single video frame from the camera. It’s very similar to what we did with the picture of the cat. Instead of showing a photo of a cat, we show a video frame taken from the central camera and say, “given the road looks like this, the angle of the steering wheel should be X degrees”.

The central camera was used to teach the Machine Learning algorithm how to stay in the centre of the lane. Engineers used the left and right cameras to simulate the car drifting off track. Effectively, the training data told the algorithm: “If the road looks like this, then add 15 degrees to the current steering angle to return yourself to the centre of the road”.
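The trick can be sketched like this. The correction constant and the function name are assumptions for illustration; in practice its value is tuned by experimentation:

```python
# For every moment in time there are three frames (left, centre, right)
# but only one recorded steering angle, taken while driving normally.
# The side cameras become extra "recovery" examples by offsetting that
# angle. CORRECTION is a hypothetical tuning constant.
CORRECTION = 0.15  # steering units, chosen by experimentation

def expand_sample(centre_frame, left_frame, right_frame, angle):
    """Return three (frame, angle) training pairs from one recorded moment."""
    return [
        (centre_frame, angle),              # drive as recorded
        (left_frame, angle + CORRECTION),   # seen from the left: steer right
        (right_frame, angle - CORRECTION),  # seen from the right: steer left
    ]

samples = expand_sample("centre.jpg", "left.jpg", "right.jpg", 0.0)
print([angle for _, angle in samples])  # [0.0, 0.15, -0.15]
```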

Once the engineers collected enough data and trained the Machine Learning algorithm, they could use the central camera and the algorithm to simulate a human driver. The algorithm observed the road through the camera and set the steering angle based on what it saw.

My job was to replicate the paper results using the simulator provided by Udacity.

Data collection

The simulator provided by Udacity could run in two different modes. The first one was fully controlled by a human driver (i.e. myself) and recorded all car details into a folder for further processing. The collected data included the recordings from the three virtual cameras and a spreadsheet listing the video frame name, the observed steering wheel angle, and the car speed. To ensure that I had enough data to train the algorithm, I first needed to drive the car manually around the track and collect as much information as possible. Simple enough, but I faced several problems that I had to resolve.

Problem 1: Rough changes in the steering angle

At first, I used the keyboard, but it turned out that the steering angle collected from the keyboard was too rough. For example, pressing the left arrow key would immediately set the steering angle to 90 degrees left, and releasing the button would reset the angle back to zero. The keyboard was fine for driving around the track, but it confused the hell out of my algorithm, as it couldn’t understand why two frames that look almost the same had such different steering angles. Take a look at the video recording and keep an eye on the Angle value and how drastically it changes.

I attempted to re-record my driving multiple times, but the result always came out rough and confusing to the algorithm. I resolved this problem by recording with a video game steering wheel similar to the one below, which allowed me to collect much gentler changes to the steering angle.

Problem 2: Driving straight bias

Visualising the collected steering wheel angles, I realised that for a disproportionate number of the collected values, the steering angle was either zero or very close to it. The problem is that the Machine Learning algorithm learns from the provided data: if the steering angle is near zero in the vast majority of samples, the algorithm will tend to produce low values for any given image. To minimise this bias, I removed over half of the images and associated values where the angle was close to zero.
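The de-biasing step can be sketched as follows. The threshold and drop rate here are illustrative, not the exact values from my project:

```python
import random

ZERO_THRESHOLD = 0.05  # angles below this count as "driving straight"
DROP_RATE = 0.5        # discard roughly half of the near-zero samples

def drop_straight_bias(samples, rng=random.random):
    """samples: list of (frame_name, angle) pairs; returns the kept pairs."""
    kept = []
    for frame, angle in samples:
        if abs(angle) < ZERO_THRESHOLD and rng() < DROP_RATE:
            continue  # throw this "driving straight" sample away
        kept.append((frame, angle))
    return kept

samples = [("a.jpg", 0.0), ("b.jpg", 0.3), ("c.jpg", -0.01)]
# With an rng that always returns 0.0, every near-zero sample is dropped:
print(drop_straight_bias(samples, rng=lambda: 0.0))  # [('b.jpg', 0.3)]
```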

Problem 3: Steering left bias

After I removed the close-to-zero values, I realised that the next major portion of the steering angles was steering left. That is because the lap itself was a loop with a few turns, and there were more left turns than right turns. I worked around this issue in two ways.

First, I drove the same track in the opposite direction the same number of times, i.e. two laps turning left, then two laps turning right. Then I also doubled the amount of data by flipping the images horizontally, just like one would do in Photoshop, and multiplying the steering angle by negative one, e.g.:

Original Photo. Steering angle -30°
Flipped Photo. Steering angle 30°
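The flipping step is a one-liner with NumPy. The tiny dummy frame below is just for illustration:

```python
import numpy as np

# Doubling the data set: mirror each frame horizontally and negate the
# angle, so every left turn also becomes a right turn.
def flip_sample(image, angle):
    """image: HxWxC numpy array, angle: steering angle in degrees."""
    return np.fliplr(image), -angle

frame = np.arange(12).reshape(2, 2, 3)  # dummy 2x2 RGB "frame"
flipped, mirrored_angle = flip_sample(frame, -30.0)

print(mirrored_angle)  # 30.0
print(np.array_equal(flipped[:, 1], frame[:, 0]))  # True: columns swapped
```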

Data Processing

Now that I had collected many training images, I was ready to start training the algorithm. But, as I mentioned above, the Machine Learning algorithm doesn’t know what it is looking at; it doesn’t even know that it is looking at a photo, let alone a photo of a road. I needed to make sure that the algorithm focused on the relevant parts of the image and not on those that should not affect the steering angle at all. For example, the video frames contained the simulated car’s bonnet, the sky, trees, hills, the lake, and many other elements that should not affect the steering angle. The easiest way to make the model focus on the road itself and not anything else is to remove all irrelevant details from the photos. I managed to achieve some of it by cropping the top and the bottom of the image, and for the rest, I had to use more creative algorithms. Here is what I tried.

Before
After
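Cropping itself is just an array slice. The row counts and the 160×320 frame size below are assumptions for illustration; the right values depend on the camera position in the simulator:

```python
import numpy as np

SKY_ROWS = 60     # rows to cut off the top (sky, trees, hills)
BONNET_ROWS = 25  # rows to cut off the bottom (the car's bonnet)

def crop_frame(image):
    """image: HxWxC numpy array -> the road-only slice of the frame."""
    return image[SKY_ROWS:-BONNET_ROWS]

frame = np.zeros((160, 320, 3), dtype=np.uint8)  # dummy camera frame
print(crop_frame(frame).shape)  # (75, 320, 3)
```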

Converting the images to Greyscale

I figured that colour might not be important to the algorithm, and that it would be easier to process the images with the colour removed from each photo.

Before
After
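Dropping colour can be sketched with the standard luminosity weights; libraries such as OpenCV apply the same formula under the hood. The pure-white test frame is just for illustration:

```python
import numpy as np

def to_greyscale(image):
    """image: HxWx3 RGB numpy array -> HxW greyscale array."""
    # Weight each colour channel by how bright it appears to the eye.
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(image @ weights).astype(np.uint8)

frame = np.full((2, 2, 3), 255, dtype=np.uint8)  # a pure white frame
grey = to_greyscale(frame)
print(grey.shape)   # (2, 2) -- one value per pixel instead of three
print(grey[0, 0])   # 255
```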

Using Canny Edge Detector

Canny Edge Detector is a collection of techniques that processes an image by removing all irrelevant data and leaving just the edges. Australian scientist John F. Canny developed this technique back in 1986! It’s amazing how much computers have changed in recent years, yet we still use some of the fundamental science discovered decades ago.

Before
After
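The full Canny pipeline (blurring, gradients, non-maximum suppression, hysteresis thresholding) is usually a one-liner in libraries such as OpenCV. The NumPy sketch below shows only the core gradient step: an edge is simply a place where brightness changes sharply between neighbouring pixels.

```python
import numpy as np

def edge_strength(grey):
    """grey: HxW greyscale array -> gradient magnitude per pixel.

    Only the gradient step of Canny's method; the real detector also
    blurs first and thins and filters the result afterwards.
    """
    g = grey.astype(float)
    dx = np.zeros_like(g)
    dy = np.zeros_like(g)
    dx[:, 1:] = g[:, 1:] - g[:, :-1]  # horizontal brightness change
    dy[1:, :] = g[1:, :] - g[:-1, :]  # vertical brightness change
    return np.hypot(dx, dy)

# A frame that is black on the left half and white on the right half:
frame = np.zeros((4, 4), dtype=np.uint8)
frame[:, 2:] = 255
# The strength lights up only at the black/white boundary: [0, 0, 255, 0]
print(edge_strength(frame)[0])
```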

Data processing visualisation application

I had to experiment with a lot of different values; each algorithm comes with a bunch of different parameters, and I needed to make sure that whenever I changed a value, the output would still make sense. To solve this, I wrote a GUI application in Python that allowed me to experiment with different values. Here is a screen recording of this application.

Machine Learning Algorithm itself

The Machine Learning algorithm itself goes beyond this post’s topic, and I don’t want to go into too much detail. However, I would like to highlight how short and simple the end code looks. Once I had created the data processing algorithm, I only needed 23 lines of code to train the model (excluding empty lines and comments).

import numpy as np
from keras.models import Sequential
from keras.layers import Convolution2D, Dropout, Flatten, Dense
from preprocess import read_data

# Loading the data
filename = "driving_log.csv"

# y_train is the recorded steering angle for each frame
X_train, y_train = read_data(filename, pre_process=True, flip=True, dropSmallValuesWithRate=50)

# My model
model = Sequential()

def train_model(X_train, y_train):
    # Convolutional layers, matching the architecture from Nvidia's paper
    model.add(Convolution2D(24, 5, 5, subsample=(2, 2), activation='relu',
                            input_shape=(X_train[0].shape[0], X_train[0].shape[1], X_train[0].shape[2])))
    model.add(Convolution2D(36, 5, 5, subsample=(2, 2), activation='relu'))
    model.add(Convolution2D(48, 5, 5, subsample=(2, 2), activation='relu'))
    model.add(Convolution2D(64, 3, 3, activation='relu'))

    # Fully connected layers that reduce down to a single steering angle
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(100))
    model.add(Dense(50))
    model.add(Dense(10))
    model.add(Dense(1))

    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
    model.summary()

    # Continue training from the previously saved weights
    model.load_weights("model_with_color.h5")

    # Training
    model.fit(np.array(X_train), y_train, nb_epoch=13, validation_split=0.2, shuffle=True)

    model.save("model_with_color.h5")

train_model(X_train, y_train)

For those who are curious, I’m using the same neural network that Nvidia engineers used in their paper, built with a Python Machine Learning library called Keras.

Also, notice that this algorithm doesn’t say anything about “Cars” or “Steering Angles” or anything like that. 99% of the algorithm can be re-used to train a completely different model and solve a completely different challenge.

The end result

To put this all together, here is a demo of the car driving itself based on the trained model. Under the hood, my algorithm takes the video feed from the central camera, applies the same data processing as during training, and then asks the machine learning algorithm what angle to use given the processed frame. The predicted angle is applied directly to the steering wheel up to 24 times per second. Don’t you love watching computers do the hard work!?
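The drive loop can be sketched roughly like this. Here `predict_angle` is a stand-in for the real Keras `model.predict` call, and the crop values, frame size, and greyscale step mirror the assumptions from the sketches above:

```python
import numpy as np

def preprocess(frame):
    """Apply the same steps used during training: crop, then greyscale."""
    cropped = frame[60:-25]  # drop the sky and the bonnet
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(cropped @ weights).astype(np.uint8)

def predict_angle(processed):
    """Stand-in for the trained model's prediction on one processed frame."""
    return 0.0  # the real model returns the learnt steering angle

def drive_step(frame, set_steering_angle):
    """One iteration of the loop, run up to 24 times per second."""
    processed = preprocess(frame)
    set_steering_angle(predict_angle(processed))

applied = []
frame = np.zeros((160, 320, 3), dtype=np.uint8)  # one dummy camera frame
drive_step(frame, applied.append)
print(applied)  # [0.0]
```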

I hope this blog post helped you understand one of the many techniques utilised by self-driving cars. I have had an enormous amount of fun building and fine-tuning it. Reach out if you have any questions!
