Self-driving car through imitation learning and reinforcement learning

Guest article by Cegeka - Platinum Partner Techorama 2019

Recent advancements in AI have allowed us to solve previously unsolvable problems. These advancements also brought us the promise of self-driving cars. But what goes into creating a self-driving car? In this article we will discuss some of the ideas behind self-driving cars and AI in general. On top of that we will talk about the techniques we used to create our own self-driving car and share the results with you. Before you know it, you’ll be creating your own self-driving (mini-)car!

When talking about self-driving cars, one of the topics that often comes up is the level of autonomy. As the name suggests, this is a level assigned to a car based on how autonomously it can drive. The six levels were defined by SAE International (standard J3016) and adopted by the US Department of Transportation's National Highway Traffic Safety Administration (NHTSA). They range from no automation at all (level 0) to full automation without any human intervention (level 5). Most of the newer autonomous car companies, like Tesla or Comma.ai, currently reach level 2. Level 2 means that the system can take over both steering and acceleration/deceleration. The driver still needs to pay attention to the road at all times and take over when needed.

There is, however, one company that does even better than level 2. Waymo (owned by Alphabet, Google’s parent company) is able to achieve level 3 and arguably even level 4 of autonomy, which means that all “safety-critical” functions can be delegated to the system. The driver no longer needs to monitor the driving, but still needs to be able to react within a given time frame. Google claims that they stopped testing level 3 after their engineers literally fell asleep during testing, so you might say they have reached level 4 by now. An interesting side note is that from level 3 upwards, it’s sometimes the automaker that takes liability for accidents that happen.

At level 5 of automation, the driver is no longer even required and the vehicle can act completely by itself. There’s also another way of looking at automation levels, by measuring the number of miles driven per disengagement. In this regard, Tesla and Uber (amongst others) achieve around 100 miles/disengagement. Waymo does a lot better and claims to achieve more than 100 000 miles/disengagement, which is actually on par with another human driving the car.

So how do these companies achieve these incredible feats? The answer lies in the recent advancements in AI.

Interest in AI has been peaking ever since the ImageNet competition back in 2012. The full ImageNet dataset contains more than 14 million images in more than 20 000 categories; the competition itself used a subset of roughly 1.2 million training images that had to be classified into 1 000 categories. The clear winner of that competition was the so-called AlexNet, created by Alex Krizhevsky together with Ilya Sutskever and Geoffrey Hinton. AlexNet achieved a top-5 error of 15.3%, which was 10.8 percentage points lower than that of the runner-up. The reason this is so interesting is that AlexNet used a so-called convolutional neural net, while other competitors used more traditional (and more hand-crafted) classification techniques. Until then, neural networks had seemed to be a “nice idea” that wasn’t actually useful in practice. This result sparked the interest of the AI community and businesses in neural networks for solving AI problems.
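To make the “top-5 error” metric concrete: a prediction counts as correct when the true label is among the five classes the model scores highest. Below is a minimal sketch of that calculation. The function name and the toy scores are our own assumptions for illustration, not part of the competition tooling.

```python
def top5_error(scores_per_image, true_labels):
    """Fraction of images whose true label is NOT among the 5 highest-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score; keep the best five.
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(true_labels)


# Toy example with 6 classes (hypothetical scores, for illustration only).
toy_scores = [0.10, 0.20, 0.30, 0.15, 0.05, 0.20]
error = top5_error([toy_scores, toy_scores], [4, 0])
```

In this toy example, class 4 has the lowest of the six scores, so the first image counts as a miss, while the second image (true class 0) counts as a hit.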

Now fast-forward several years and neural networks have become omnipresent. We generally refer to using neural nets in machine learning as “deep learning”. Since 2012, not only have existing neural net architectures like AlexNet improved, but a lot of other deep learning architectures, such as LSTMs, VAEs and GANs, have been introduced or popularized as well. Deep learning also had a big impact on the field of reinforcement learning, which is concerned with training an agent to act appropriately in an environment (e.g. a self-driving car). It has to be noted, though, that deep learning (at least for now) is only effective when it’s accompanied by huge datasets and a lot of computing power. These same algorithms wouldn’t have been feasible to use 30 or even 20 years ago.

So, if someone asks why all these self-driving car companies are popping up now and not earlier, it’s safe to say we can attribute this to the progress made in the field of deep learning. Let’s take a look at how the big companies are actually using these techniques.

To sketch the AI self-driving car landscape, we’ll break the self-driving car problem down into some subproblems. Namely: localization, perception, planning and controls.

First there’s the problem of hardware controls. This is probably the easiest problem of the bunch. Although it definitely took effort and fine engineering to get to where we are today, hardware controls have become fine-grained enough that they generally no longer pose a problem when creating a self-driving car.

Localization is (obviously) the problem of the car determining its own position, down to a few centimeters. This problem is largely solved by using a lot of sensors, especially a LIDAR. A LIDAR is a sensor that fires laser pulses and measures the time it takes for the light to return to the sensor. By firing millions of pulses per second, a LIDAR can determine, down to a few centimeters, how near objects are in a full 360° field of view and at a range of 50+ meters, depending on the sensor. Furthermore, Google also uses region maps that are created in advance.
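The time-of-flight principle behind a LIDAR boils down to a single formula: the light travels to the object and back, so the distance is the speed of light times the round-trip time, divided by two. The sketch below illustrates this; the function names are our own, and a real sensor driver does far more (calibration, noise filtering, and so on).

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def lidar_distance_m(round_trip_time_s):
    """Distance to the object, given the time the laser pulse took to go there and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def to_xy(distance_m, angle_deg):
    """Convert one (distance, beam angle) reading into a 2D point around the sensor."""
    angle_rad = math.radians(angle_deg)
    return distance_m * math.cos(angle_rad), distance_m * math.sin(angle_rad)


# A pulse that returns after one microsecond hit something roughly 150 m away.
distance = lidar_distance_m(1e-6)
point = to_xy(10.0, 90.0)
```

Repeating the second step for every beam angle gives the 360° point cloud the car uses to position itself. Note how short the measured times are: an object at 50 meters corresponds to a round trip of only about a third of a microsecond.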

The third problem is perception: the car needs to recognize its surroundings. This was a hard problem, but Google was mostly able to solve it through the use of techniques like convolutional neural networks (as mentioned above). These neural nets allow us to detect people and objects around the car, as well as do semantic segmentation of the environment. Through this semantic segmentation they can differentiate which parts of the terrain are “drivable” and which parts to avoid.

The last problem is the planning problem: how should the car act given the information it gathered from its surroundings? Although solving the first three problems will get you pretty far (perhaps 1000 miles/disengagement) without a very sophisticated planning algorithm, you will need to solve planning in order to get to the next level of automation. Unfortunately, planning is also the hardest problem of them all, and it’s the differentiating factor between self-driving car competitors. Different companies use different tactics to tackle this problem. Let’s go over some of them.

Tesla is probably the most well-known company when it comes to self-driving cars. Their director of AI, Andrej Karpathy, gave a presentation back in 2018 on how Tesla’s Autopilot operates. Tesla uses only eight cameras, ultrasonic sensors, radar and an IMU to do localization and perception. The raw sensor data captured by the car is fed through a convolutional neural network, which in turn feeds its latent features into another neural network that does the planning. In the presentation, Mr. Karpathy stressed that people are increasingly moving away from traditional software stacks that require a lot of custom engineering to make things work. We are moving towards a form of meta-development, where an engineering pipeline measures its own performance and then updates its architecture to improve that performance. In the case of Tesla, this mostly means that they build tooling to capture and label data and then feed that labeled data into their pipeline, which uses a lot of computing power to search for the best-performing convolutional neural network architecture, as measured on a validation set. The work that needs to be done is thus mostly in creating good datasets, with a fair distribution of edge cases, so that the neural net also learns how to act in those edge cases. Correct labeling is also extremely important, and although that seems like a trivial task, there are many complex road situations where even humans might not be sure how to act.

Google Waymo might be the most interesting company to have a look at, since they are able to present the best results on self-driving cars at the moment. Google Waymo uses a set of neural networks to build feature maps from the sensor data that they get as input. They then use these feature maps as input to a reinforcement learning algorithm that decides what action to take. An important property of the policy used in this reinforcement learning algorithm is that it is able to estimate how certain it is of its own decision. This way, when the algorithm is too uncertain, they can let a so-called “expert algorithm” take over. These expert algorithms are simply hand-crafted algorithms that make sure the car also handles exceptional traffic situations well. Google Waymo calls these exceptional situations “the long tail”: getting a car to drive autonomously is relatively easy, but there are just so many exceptions that the algorithm simply can’t account for yet.
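The idea of falling back to an expert algorithm when the learned policy is unsure can be sketched in a few lines. This is only our own illustration of the principle: the function name, the use of the highest action probability as the “certainty”, and the threshold value are all assumptions, not details of Waymo’s actual system.

```python
def select_action(action_probs, expert_policy, observation, min_confidence=0.8):
    """Use the learned policy when it is confident, otherwise defer to the expert.

    action_probs: dict mapping each action to the probability the policy assigns it.
    expert_policy: hand-crafted function mapping an observation to an action.
    min_confidence: hypothetical threshold below which we distrust the policy.
    """
    best_action = max(action_probs, key=action_probs.get)
    if action_probs[best_action] >= min_confidence:
        return best_action, "learned"
    return expert_policy(observation), "expert"


def cautious_expert(observation):
    # Hypothetical hand-crafted rule: when in doubt, slow down.
    return "brake"


confident = select_action({"left": 0.05, "straight": 0.9, "right": 0.05},
                          cautious_expert, observation=None)
uncertain = select_action({"left": 0.4, "straight": 0.35, "right": 0.25},
                          cautious_expert, observation=None)
```

In the first call the policy is confident enough to drive straight by itself; in the second call no action clearly wins, so the hand-crafted rule decides.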

Having learned from the big companies, we tried to come up with our own solution. Our goal was to create a small self-driving car that was just able to stay within a lane. There are two approaches we pursued.

The first approach was to train an AI agent through reinforcement learning. Reinforcement learning is a general learning framework in which an agent takes actions in an environment. These actions can influence the environment and the agent’s own state. In the case of a self-driving car, the car is the agent, and we have to define a so-called reward function which the agent tries to optimize. We used the popular DQN algorithm, a Q-learning algorithm that uses a neural network to estimate the value of each action from the agent’s state. The state of our agent was a vector containing the pixel values of the car’s camera image, and the reward function was how close the car could stay to the middle of the lane. Q-learning algorithms define a policy by estimating the cumulative reward an agent will get by taking a certain action in a certain state. For our self-driving car this means that the algorithm learns how to steer given what the camera sees, and the agent will think it’s doing a good job if it manages to stay close to the middle of the lane. To also account for temporal features in our state encoding, we encode not only the current frame the agent sees but also some previous frames. This gives our agent a sense of speed.
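The three ingredients described above (a reward for staying near the lane center, the Q-learning estimate of cumulative reward, and stacking several frames into one state) can be sketched as follows. This is a simplified illustration under our own assumptions, not our actual training code: the function names, the linear reward shape and the constants are hypothetical, and the update is shown for a single Q-value, whereas DQN trains a neural network towards the same target.

```python
from collections import deque

GAMMA = 0.99  # discount factor for future rewards (assumed value)


def lane_reward(offset_from_center, half_lane_width):
    """1.0 in the middle of the lane, dropping linearly to 0.0 at the lane edge."""
    return max(0.0, 1.0 - abs(offset_from_center) / half_lane_width)


def td_target(reward, next_q_values, done):
    """Q-learning target: immediate reward plus discounted best future value."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)


def q_update(q_value, target, alpha=0.1):
    """Move the current estimate a small step (alpha) towards the target."""
    return q_value + alpha * (target - q_value)


class FrameStack:
    """Keeps the last few camera frames and concatenates them into one state vector."""

    def __init__(self, num_frames):
        self.frames = deque(maxlen=num_frames)

    def push(self, frame):
        if not self.frames:
            # At the start of an episode, fill the stack with copies of the first frame.
            for _ in range(self.frames.maxlen):
                self.frames.append(frame)
        else:
            self.frames.append(frame)
        return [pixel for f in self.frames for pixel in f]


stack = FrameStack(num_frames=3)
first_state = stack.push([1])
second_state = stack.push([2])
```

Because the state contains several consecutive frames, the network can pick up on how much the image changed between them, which is what gives the agent its sense of speed.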

We were lucky to find a GitHub project with a tiny car simulation. This way, we could train the car in simulation and then evaluate its performance in real life. Performance was already quite okay: our agent managed to stay within a lane fairly well, but the results were very dependent on the environment our agent was in. This is why we wanted to use a technique called domain randomization. Domain randomization is a very recent technique that has shown very promising results in transferring simulation results to real life. The idea behind the technique is pretty easy to explain: instead of training our agent in just one virtual environment, we train it in a lot of virtual environments, each time with different parameters. These parameters can range from changing colors in the environment to completely changing the environment’s physics engine. This forces our agent to learn the essence of the task (driving), even in environments it hasn’t encountered before. In the reinforcement learning setup, the agent learns how to act completely by itself, based on the reward function we defined. Another approach is to use imitation learning.
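Before moving on to imitation learning, here is a minimal sketch of the domain randomization loop described above. The parameter names and ranges are purely hypothetical and are not taken from the simulator we used; `train_one_episode` stands in for whatever function runs one training episode in an environment built from the given parameters.

```python
import random


def sample_environment_params(rng):
    """Draw one random variation of the simulated world (hypothetical parameters)."""
    return {
        "road_color": tuple(rng.randint(0, 255) for _ in range(3)),
        "light_intensity": rng.uniform(0.5, 1.5),
        "friction": rng.uniform(0.6, 1.0),
        "camera_tilt_deg": rng.uniform(-5.0, 5.0),
    }


def train_with_domain_randomization(train_one_episode, num_episodes, seed=0):
    """Train every episode in a freshly randomized environment."""
    rng = random.Random(seed)  # seeded so that experiments are reproducible
    used_params = []
    for _ in range(num_episodes):
        params = sample_environment_params(rng)
        train_one_episode(params)
        used_params.append(params)
    return used_params


# Usage with a stand-in training function that only records what it was given.
seen = []
used = train_with_domain_randomization(seen.append, num_episodes=5, seed=42)
```

Since the agent never sees the same world twice, it cannot rely on one particular road color or lighting condition and has to pick up on what stays constant: the shape of the lane.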

Imitation learning is often used in tandem with reinforcement learning. A human shows the agent how to act in a given situation. We used this idea in its simplest form: we recorded footage from the car’s camera while operating the car ourselves. By collecting this data, we essentially transform our reinforcement learning problem into a supervised learning problem, where we learn how to map a state (the camera input) to the action we took. This approach of course has many problems, like collecting imperfect data or coming across unseen situations. But the approach works fairly well, and when we used the same neural net as for our reinforcement learning algorithm, it actually provided us with a good weight initialization and significantly reduced the training time needed.
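To show what “turning it into a supervised learning problem” means in practice, here is a deliberately tiny sketch. Instead of a neural net on camera pixels, it fits a linear model that maps a state vector to a steering value from recorded (state, action) pairs. The function names, the single-feature state and the toy demonstrations are our own assumptions for illustration.

```python
def train_behavior_cloning(demonstrations, num_features, epochs=200, lr=0.1):
    """Fit steering = weights . state + bias to the demonstrations.

    Uses plain stochastic gradient descent on the squared error between the
    predicted action and the action the human driver actually took.
    """
    weights = [0.0] * num_features
    bias = 0.0
    for _ in range(epochs):
        for state, action in demonstrations:
            prediction = sum(w * x for w, x in zip(weights, state)) + bias
            error = prediction - action
            weights = [w - lr * error * x for w, x in zip(weights, state)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, state):
    """Steering command the cloned policy would give in this state."""
    return sum(w * x for w, x in zip(weights, state)) + bias


# Toy demonstrations: the state is the car's offset from the lane center and the
# human always steers back towards the center with twice that offset.
demos = [([x], -2.0 * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
weights, bias = train_behavior_cloning(demos, num_features=1)
steering = predict(weights, bias, [0.5])
```

The learned weights can then serve as the starting point for reinforcement learning, which is the weight initialization trick mentioned above.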

Our first attempts failed miserably. But after experimenting with different algorithms and tweaking them, we did manage to pull something off. If you want to see some of the problems we ran into and what our final results looked like, be sure to drop by our presentation!


