To sketch the AI self-driving car landscape, we’ll break the self-driving car problem down into some subproblems. Namely: localization, perception, planning and controls.
First there’s the problem of hardware controls. This is probably the easiest problem of the bunch. Although it definitely took effort and fine engineering to get to where we are today, hardware controls have become fine-grained enough that they generally no longer pose a problem when creating a self-driving car.
Localization is (obviously) the problem of the car determining its own position, down to a few centimeters. This problem is largely solved by using a lot of sensors, especially a LIDAR. A LIDAR is a sensor that fires a laser and measures the time it takes for the light to return to the sensor. By firing millions of light pulses per second, a LIDAR can measure the distance to surrounding objects to within a few centimeters, in a full 360° field of view and at a range of 50+ meters depending on the sensor. Furthermore, Google also uses detailed maps of the region that are created in advance.
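The time-of-flight principle behind LIDAR is simple enough to show in a few lines. This is a minimal sketch, not any vendor's actual firmware; the 333-nanosecond round-trip time is just an illustrative input.

```python
# Time-of-flight ranging: a LIDAR measures the round-trip time of a laser
# pulse and converts it to a one-way distance.

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def lidar_distance(round_trip_seconds: float) -> float:
    """One-way distance to the reflecting object, in meters."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A return pulse after roughly 333 nanoseconds corresponds to an object
# about 50 meters away -- the upper end of the range mentioned above.
print(round(lidar_distance(333e-9), 1))
```

Note how short the timescales are: resolving distance to a few centimeters requires timing the pulse to within a few hundred picoseconds, which is why LIDAR units are such precise (and expensive) pieces of hardware.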
The third problem is perception: the car needs to recognize its surroundings. This was a hard problem, but Google was largely able to solve it through techniques like convolutional neural networks (as mentioned above). These neural nets can detect people and objects around the car, as well as perform semantic segmentation of the environment. Through this semantic segmentation, the car can differentiate which parts of the terrain are “drivable” and which parts to avoid.
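The last step of that segmentation pipeline can be sketched concisely: a segmentation network outputs per-pixel class scores, and turning those into a drivable-area mask is just an argmax plus a class lookup. The class IDs and array shapes below are assumptions for illustration, not the ones Google uses.

```python
import numpy as np

# Hypothetical per-pixel class scores from a segmentation network,
# shaped (num_classes, height, width). Class IDs are made up here.
ROAD, SIDEWALK, PERSON = 0, 1, 2
DRIVABLE_CLASSES = [ROAD]

def drivable_mask(logits: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels whose most likely class is drivable."""
    labels = logits.argmax(axis=0)  # (H, W) class id per pixel
    return np.isin(labels, DRIVABLE_CLASSES)

# Toy example: a 2x2 image where only the top-left pixel scores as road.
logits = np.zeros((3, 2, 2))
logits[SIDEWALK] = 1.0       # sidewalk wins everywhere by default...
logits[ROAD, 0, 0] = 2.0     # ...except the top-left pixel
print(drivable_mask(logits))
```

In a real car this mask would be computed per camera frame and fused with the LIDAR point cloud before being handed to the planner.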
The last problem is planning: how should the car act given the information it has gathered about its surroundings? Although solving the first three problems will get you pretty far (perhaps 1,000 miles per disengagement) without a very sophisticated planning algorithm, you will need to solve planning in order to reach the next level of automation. Unfortunately, planning is also the hardest problem of them all, and it’s the differentiating factor between self-driving car competitors. Different companies tackle it with different tactics, so let’s go over some of them.
Tesla is probably the most well-known company when it comes to self-driving cars. Their AI director, Andrej Karpathy, gave a presentation back in 2018 on how Tesla’s Autopilot operates. Tesla uses only 8 cameras, ultrasonic sensors, radar, and an IMU to do localization and perception. The raw sensor data captured by the car is fed through a convolutional neural network, which in turn feeds its latent features into another neural network that does the planning. In the presentation, Mr. Karpathy stressed that people are increasingly moving away from traditional software stacks that require a lot of custom engineering to make things work. We are moving towards a form of meta-development, where an engineering pipeline measures its own performance and then updates its architecture to improve that performance. In the case of Tesla, this mostly means that they build new tooling to capture and label data, feed that labeled data into their pipeline, and then use a lot of computing power to find an optimal convolutional neural network architecture, measured on a validation set. The work that remains is thus mostly in creating good datasets, with a fair distribution of edge cases so that the neural net also learns how to act in those situations. Correct labeling is equally important, and although that seems like a trivial task, there are many complex road situations where even humans might not be sure how to act.
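The “measure your own performance, then update the architecture” loop boils down to model selection on a held-out validation set. Here is a minimal sketch of that idea, assuming stand-in candidate names and made-up validation scores; Tesla’s actual pipeline is of course far more elaborate.

```python
# Architecture search in miniature: score each candidate network on a
# held-out validation set and keep the best one. The candidates and
# scores below are purely illustrative stand-ins.

def select_best(candidates, validate):
    """Return the candidate with the highest validation score."""
    return max(candidates, key=validate)

candidates = ["small_net", "medium_net", "large_net"]
validation_scores = {"small_net": 0.81, "medium_net": 0.88, "large_net": 0.86}

best = select_best(candidates, validate=validation_scores.get)
print(best)  # medium_net
```

The point of the sketch is that once evaluation is automated, improving the system becomes a matter of feeding in better data and candidates rather than hand-tuning code, which is exactly the meta-development shift described above.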
Google Waymo might be the most interesting company to look at, since they present the best results on self-driving cars at the moment. Waymo uses a set of neural networks to build feature maps from the sensor data they get as input. They then use these feature maps as input to a reinforcement learning algorithm that decides which action to take. An important aspect of Waymo’s approach is that the policy used in this reinforcement learning algorithm can estimate how certain it is of its own decision. This way, when the algorithm is too uncertain, a so-called “expert algorithm” can take over. These expert algorithms are simply hand-crafted rules that make sure the car also handles exceptional traffic situations well. Waymo calls these exceptional situations “the long tail”: getting a car to drive autonomously most of the time is relatively easy, but there are just so many exceptions that the learned algorithm simply can’t account for all of them yet.
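The uncertainty-gated handoff described above can be sketched in a few lines: if the learned policy’s confidence falls below a threshold, a hand-crafted expert rule takes over. The policy, actions, and threshold below are all illustrative assumptions, not Waymo’s actual system.

```python
# Sketch of an uncertainty-gated fallback: a learned policy reports a
# confidence with its action, and a hand-crafted expert policy takes
# over when that confidence is too low. Everything here is a stand-in.

def learned_policy(observation: dict) -> tuple[str, float]:
    """Stand-in for the RL policy: returns (action, confidence)."""
    return "keep_lane", observation.get("confidence", 0.0)

def expert_policy(observation: dict) -> str:
    """Hand-crafted fallback rule for rare 'long tail' situations."""
    return "slow_down_and_yield"

def decide(observation: dict, threshold: float = 0.9) -> str:
    action, confidence = learned_policy(observation)
    if confidence < threshold:
        return expert_policy(observation)
    return action

print(decide({"confidence": 0.97}))  # keep_lane
print(decide({"confidence": 0.42}))  # slow_down_and_yield
```

The design choice worth noting is that the system never has to be uniformly good: the learned policy handles the common cases, while the expert rules only need to cover the rare situations it flags as uncertain.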