Q: How Do Self-Driving Cars See?
A: It’s a sunny day, and you’re biking along one of Mountain View’s tree-lined esplanades. You head into a left turn, and before you change lanes, you crane your head around for a quick look back. That’s when you see it. The robot. Chugging along behind you, in that left lane you’re aiming to call your own. Your pressing question—Does it see me?—is answered when the vehicle slows down, giving you plenty of space. And so now you wonder, how did it do that? How, exactly, do self-driving cars see?
Perhaps unwittingly, you’ve hit on a crackler of a question. Making a robot that perceives its surroundings—not just spotting that lumpy mass but understanding it’s a child someone has put actual time and effort into—is the main challenge of this young industry. Get the thing to understand what’s going on around it as well as humans do, and the process of deciding how to apply the throttle, brake, and steering becomes something like easy.
Dozens of companies are trying to build self-driving cars and self-driving car technology, and they all approach the engineering challenges differently. But just about everybody relies on three tools to mimic the human’s ability to see. Take a look for yourself. (Be careful—you’re on a bike, remember?)
Radar
We’ll start with radar, which rides behind the car’s sheet metal. It’s a technology that has been going into production cars for 20 years now, and it underpins familiar tech like adaptive cruise control and automatic emergency braking. Reliable and impervious to foul weather, it can see hundreds of yards and can pick out the speed of all the objects it perceives. Too bad it would lose a sightseeing contest to Mr. Magoo. The data it returns, to quote one robotics expert, are “gobbledegook.” It’s nowhere near precise enough to tell the computer that you’re a cyclist, but it should be able to detect the fact that you’re moving, along with your speed and direction, which is helpful when trying to decide how to avoid slicing your bike into a unicycle.
Cameras
Now, gaze upon the roof. Up here, and maybe dotting the sides and bumpers of the car too, you’ll find the second leg of this sense-ational trio.
The cameras—sometimes a dozen to a car and often used in stereo setups—are what let robocars see lane lines and road signs. They only see what the sun or your headlights illuminate, though, and they have the same trouble in bad weather that you do. But they’ve got terrific resolution, seeing in enough detail to recognize your arm sticking out to signal that left turn. That’s so vital that Elon Musk thinks cameras alone can enable a full robot takeover. Most engineers don’t want to depend on just cameras, but they’re still working hard on the machine-learning techniques that will let a computer reliably parse a sea of pixels. Seeing your arm is one thing. Distinguishing it from everything else is the tricky bit.
Lidar
If you spot something spinning, that’ll be the lidar. This gal builds a map of the world around the car by shooting out millions of light pulses every second and measuring how long they take to come back. It doesn’t match the resolution of a camera, but it should bounce enough of those infrared lasers off you to get a general sense of your shape. It works in just about every lighting condition and delivers data in the computer’s native tongue: numbers. Some systems can even detect the velocity of the things they see, which makes deciding what matters far easier. The main problems with lidar are that it’s expensive, its reliability is unproven, and it’s unclear whether anyone has found the right balance between range and resolution. The 50-plus companies developing lidar are working to solve all of these problems. (Oh, and they don’t always spin.)
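For the curious, the "measuring how long they take to come back" step boils down to one line of arithmetic: the pulse travels to you and back at the speed of light, so the distance is half the round trip. Here is a minimal Python sketch of that idea. The function name and the 200-nanosecond echo are made up for illustration, and a real lidar unit does far more than this (filtering noise, tracking beam angles, and so on).

```python
# Speed of light in a vacuum, in metres per second (close enough for air).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def lidar_range_m(round_trip_s: float) -> float:
    """Distance to whatever reflected the pulse, given its round-trip time.

    Hypothetical helper for illustration, not any vendor's API.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse goes out and comes back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# Assumed example: an echo that arrives 200 nanoseconds after the pulse fires.
print(f"{lidar_range_m(200e-9):.1f} m")  # prints 30.0 m
```

Repeat that millions of times a second, noting which direction each pulse was fired, and you get the cloud of dots the car uses as its map.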
Some outfits also use ultrasonic sensors for close-range work (those are what let your car beep you into madness when you’re backing into a tight space) and microphones to listen for sirens, but that’s just icing on the cake.
Once the sensors pull in their data, the car’s computer puts it all together and starts the hard part: identifying what’s what. Is that a toddler or a garbage can? A leaf or a pigeon? A teen riding a scooter or a Wacky Waving Inflatable Arm-Flailing Tubeman? Better hardware makes answering such questions easier, but the real work here relies on machine learning—the art of teaching a robot that this cluster of dots is an old man using a walker, and that swath of pixels is a three-legged dog. But once it knows how to see, the question of how to drive gets easy: Don’t hit either one of them.