It’s been a decade since I first wrote about Tesla’s approach to autonomous driving, comparing it to Google’s (now Waymo’s). At the time, my technical assessment based on my experience with both AI and robotics was that Tesla’s approach was superior.
It still is, but it may need to pivot. Both Tesla and the other firms pursuing full self-driving mostly have to, but Tesla has a smaller pivot to make and a better starting point, if it can manage a pivot at all given the current inclinations and distractions of its CEO.
Let’s step back in time. A couple of decades ago I trawled through the dissertations and theses of PhD and master’s students in robotics programs around the world. There was a clear split between the world map camp and the subsumption camp, each deriding the other.
Traditional robotics and AI approaches, the world map camp, rely on complex central planning systems that process vast amounts of data to create detailed models of the environment before making decisions. These systems often struggle in dynamic and unpredictable settings due to their reliance on precise inputs and heavy computational demands. They require very fine-grained 3D maps of the world in order to do route finding and obstacle avoidance.
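To make the route-finding piece concrete, here is a minimal sketch of the kind of grid-based planning the world map camp depends on: an A* search over an occupancy grid. The grid layout, coordinates, and toy map are all assumptions for illustration; the point is that the planner is only as good as the map is detailed and current.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* path planner over a 2D occupancy grid.

    grid[y][x] is True where an obstacle occupies that cell. The planner
    only works as well as the map is accurate, which is the crux of the
    world map approach.
    """
    def h(a, b):  # Manhattan distance heuristic
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    open_set = [(h(start, goal), 0, start, [start])]
    seen = set()
    while open_set:
        _, cost, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and not grid[ny][nx]:
                heapq.heappush(open_set,
                               (cost + 1 + h((nx, ny), goal), cost + 1,
                                (nx, ny), path + [(nx, ny)]))
    return None  # no route exists in the mapped world

# Toy map: a wall with a gap the planner must route around.
grid = [[False, True, False],
        [False, True, False],
        [False, False, False]]
print(astar(grid, (0, 0), (2, 0)))
```

If the map is wrong or stale, the plan is wrong too, which is exactly why this camp needs constant, centimeter-scale remapping.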
Subsumption robotics, pioneered by Rodney Brooks in the 1980s, introduced a revolutionary approach to robot control by emphasizing decentralized, layered behavior rather than complex central planning. Brooks, a former MIT professor and co-founder of iRobot and Rethink Robotics, developed this architecture to enable robots to respond adaptively to their environments through independent behavioral layers. Lower-level behaviors, such as obstacle avoidance, operate autonomously, sometimes relying on nothing more than basic physical robustness, while higher layers build on them to achieve more complex tasks. This approach, which challenged traditional AI’s reliance on symbolic reasoning, led to the creation of autonomous robots like Genghis and influenced modern applications in industrial automation, consumer robotics, and AI.
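A rough sketch of the layering idea, simplified here to priority arbitration rather than Brooks’ full suppression-and-inhibition wiring, might look like this; the behaviors and sensor fields are illustrative only.

```python
class Behavior:
    """One layer: decides whether it wants control and, if so, what to do."""
    def applicable(self, sensors): ...
    def act(self, sensors): ...

class AvoidObstacle(Behavior):
    def applicable(self, sensors):
        return sensors["range_cm"] < 30          # something is close
    def act(self, sensors):
        return "turn_away"

class Wander(Behavior):
    def applicable(self, sensors):
        return True                              # always willing to act
    def act(self, sensors):
        return "drive_forward"

def arbitrate(layers, sensors):
    """Layers listed first (safety reflexes) get first claim on the
    actuators; higher-level layers only act when nothing below them fires."""
    for layer in layers:
        if layer.applicable(sensors):
            return layer.act(sensors)
    return "stop"

controller = [AvoidObstacle(), Wander()]
print(arbitrate(controller, {"range_cm": 20}))   # -> turn_away
print(arbitrate(controller, {"range_cm": 200}))  # -> drive_forward
```

No world model anywhere: each layer reacts to raw sensor values, and complexity emerges from stacking simple behaviors.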
When I was reading all of those theses and dissertations, it was clear that a subsumption base with a much lower resolution world map layer to provide goal setting was the obvious strategy, and that the dichotomy between the two was artificial, a construct of academic camps more than a useful distinction. Exploratory efforts in robotics use cases like mine sweeping and lawn maintenance made it clear that both the toolkits for iterating in virtual simulation and the energy density of batteries were inadequate at the time. Both of those barriers have since been removed, but my collaborators and I had moved on. I worked professionally with AI in a global tech firm, but at one remove. I did do a global survey of machine learning and cleantech, as well as the various key intellectual aspects of the technology, and published a report on the subject in 2020.
As I pointed out a decade ago, Tesla was relying on layered subsumption approaches with a light world map from basic mapping software, while Google was relying on world map approaches. When Tesla introduced Autopilot in October of 2014, it did so in a car that was incredibly robust, not only in terms of acceleration, cornering, and braking, but also in terms of collision survival. Meanwhile, Google produced a four-wheeled soap bubble with a nipple on top, the lidar sensor. Tesla was making the right choice.
Tesla’s Autopilot could drive on any roads, albeit sometimes badly, while Google’s approach only worked on roads that had been mapped with lidar to centimeter scale. Initially the Google car only worked in Mountain View, California. Meanwhile, shortly after Tesla introduced its Autopilot software, a group of enthusiasts completed an unofficial Cannonball Run across the United States using the semi-autonomous driving system. The Tesla Model S traveled from Los Angeles to New York in a record time for an electric vehicle, with Autopilot handling much of the highway driving. Around 95% of the driving was done by the car and often at fairly high speeds.
A second differentiation was that Tesla had chosen not to use lidar, a laser sensing technology, and only had cameras, radar, and sonar, with the latter relegated to very short distances for parking use cases. Meanwhile, the nipple on the Google car was an $80,000 or so rotating lidar sensor, something most other autonomous vehicle firms chose to include in their sensor set. I assessed the sensor choices eight years ago and concluded that Tesla had made the right one: a simpler set of cameras and radar didn’t require lidar, as those two sensors provided all of the information necessary to be vastly superior to human drivers.
Among other things, solid-state cameras and radar sensors were a lot cheaper than the rotating mirrors and lasers of lidar at the time, and still cheaper than the less capable solid-state lidar sensors that were being introduced. Of course, the world has moved on incredibly rapidly and now iPhones come with tiny solid-state lidar units that enable apps to map individual rooms. This doesn’t necessarily mean that lidar on cars is the right choice. Simplicity is good, and if two sensors provide sufficient information to be vastly better than human senses, three is overkill.
Tesla’s approach relied on reinforcement learning, a machine learning technique where an agent learns to make decisions by interacting with an environment and receiving rewards or penalties based on its actions. Over time, the agent optimizes its behavior to maximize cumulative rewards, making the technique well suited for applications such as robotics, game playing, and autonomous systems. Tesla’s model used a neural net with a specific hierarchical structure that received feedback from human drivers as they steered out of bad spots in specific circumstances. This rich data set of users saying “Whoa, let’s not do that” was fed into new training sessions to iterate the model. The current large language model (LLM) AI hysteria is about the training of LLMs like ChatGPT, but they don’t have much new user input compared to full self-driving.
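As a toy illustration of that reward-and-penalty loop, here is tabular Q-learning on a five-cell corridor. It is nothing like Tesla’s hierarchical neural net and driver-intervention pipeline, just the same learn-from-feedback shape boiled down to a few lines; every name and number is illustrative.

```python
import random

# Toy corridor: states 0..4, the agent starts at 0, and the only reward
# is at state 4. The point is the shape of the loop, not the environment:
# act, observe a reward or penalty, nudge the value estimates, repeat.
N_STATES, ACTIONS = 5, (-1, +1)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.3  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Mostly follow the current best guess, occasionally explore.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Standard Q-learning update toward reward plus discounted future value.
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

# After training, the learned policy is "keep moving right" in every state.
print({s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)})
```

Even this trivial problem needs hundreds of episodes of feedback before the policy settles, which hints at why real-world driving needs so staggeringly much of it.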
Sensor integration has always been a challenge with machine learning. The more sensors, the more challenging it is to feed the data into a machine learning system and have coherent results emerge. Limiting the sensors to cameras, radar, and sonar had merit in that regard as well, and I thought Tesla had made the right choice.
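A simplified sketch of why each extra modality adds work: every sensor needs its own encoder, and the features have to be reconciled into one representation before a single driving head can act on them. This assumes PyTorch and entirely made-up input sizes; real stacks also have to handle calibration and time alignment upstream of anything like this.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Toy late-fusion model: one encoder per modality, concatenated into a
    single head. Every added sensor means another encoder plus the upstream
    work of calibrating and synchronizing it with the others."""
    def __init__(self):
        super().__init__()
        self.camera_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
        self.radar_enc = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
        self.head = nn.Linear(64 + 32, 2)   # e.g. steering and braking targets

    def forward(self, camera, radar):
        fused = torch.cat([self.camera_enc(camera), self.radar_enc(radar)], dim=1)
        return self.head(fused)

net = LateFusionNet()
out = net(torch.randn(1, 3, 32, 32), torch.randn(1, 16))
print(out.shape)   # torch.Size([1, 2])
```

Drop a modality and the architecture, the training data pipeline, and the failure analysis all get simpler, which is the appeal of a camera-heavy sensor set.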
Then in 2021, Tesla chose to remove radar from its sensor set. At the time I reserved judgement, as the pro and con arguments both had merit. Humans drive without radar, after all, and cameras and machine learning had evolved to the point where mere human eyes and wetware were unlikely to be able to compete. Adjusting the car’s behavior to slow down in limited visibility conditions made a lot of sense, in part because the other drivers sharing the road would also be slowing down when they couldn’t see well.
Over the years, I’ve been monitoring Full Self Driving’s incremental progress. The removal of the separation between city and highway driving was a good step, and the things Teslas can do by themselves now are remarkable. It’s still not full autonomy, though, and it’s long past the time when fully autonomous cars were promised.
Now Tesla has leaned even further into promising fully autonomous cars with its Cybercab, a two-passenger vehicle with no steering wheel, conceived to give the large majority of Americans who don’t have the option to bike, walk, or take transit for short hops a way to get around the sprawling cities that demand cars. It’s going to increase congestion in the country’s cities, as I pointed out nine years ago. Yet it’s still a long way from fully autonomous driving.
Tesla has all the potential conditions for success for making this approach work. A big one is that it has the most sensor data and feedback from drivers of any company in the world, as I pointed out seven years ago. So why isn’t it delivering Full Self Driving?
In 2018, reinforcement learning was the big thing. It had been demonstrated in the lab. It had been demonstrated in the real world. It was going to be transformative. It was the basis of Tesla’s and Google’s strategy, as well as most other autonomous driving approaches. Then it started running into a couple of difficulties.
The first was sensor integration. Eight to ten years ago, lidar was considered to be essential for autonomous driving and digital twins of existing infrastructure. However, mainstream machine learning wasn’t paying attention to lidar point clouds, but to camera image recognition. There was a divergence in sensor assessment, in part because there is an incredible amount of imagery with identifying metadata on the internet, and virtually no public lidar data to speak of. It was just a lot easier and cheaper to train models on images rather than lidar as a result, so everyone did that. As a result, all of the startups and OEMs depending on lidar had nothing to work with but their own data sets, while everyone working only with images had industrial strength technologies. Many of them are foundering as a result.
The second is that reinforcement learning has turned out to require rather absurd amounts of reinforcement and has been much slower to deliver dependable results. Despite Tesla’s extraordinary number of volunteer drivers sending signals that correct the neural net’s choices, it still has challenges with situations that humans don’t. Will it get there? Perhaps. It’s turned out to be like the thought experiment of moving halfway toward a destination with every step: every step gets shorter and you never arrive. My opinion remains that for the many, many use cases where it works, Tesla’s solution is still better than the statistical average human driver by quite a lot, but that doesn’t mean it’s arriving at autonomy.
Waymo and other options aren’t doing much better. They require absurdly detailed world maps and still end up doing remarkably inane things like honking at each other in parking lots and forming Waymo traffic jams in dead-end streets.
The machine learning community has moved on to large language models like ChatGPT and visual question answering, where an image is provided to a model and questions are asked about it. This leans heavily on absurdly accomplished image recognition neural nets that have been trained on massive numbers of images, and LLMs that have been trained on extraordinary amounts of data. Paste a picture of a streetscape into an LLM and ask it to count the people or whether there’s a bus stop, and it will. Paste a picture of a set of pipes into it and ask it to identify rust and other failure conditions, and it will. Paste a picture of a field into it and ask if there are cows or other ungulates in it, and it will tell you all about them.
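For a sense of how accessible this has become, here is a minimal sketch assuming the Hugging Face transformers library’s visual-question-answering pipeline and a hypothetical local photo named street_scene.jpg; any multimodal model that accepts an image plus a text prompt would serve the same purpose.

```python
# Minimal visual question answering sketch. Assumes the Hugging Face
# transformers library is installed; the image path is hypothetical.
from transformers import pipeline

vqa = pipeline("visual-question-answering")   # downloads a default VQA model

streetscape = "street_scene.jpg"              # hypothetical local image
for question in ("How many people are in the image?",
                 "Is there a bus stop in the image?"):
    answers = vqa(image=streetscape, question=question)
    print(question, "->", answers[0]["answer"], f"({answers[0]['score']:.2f})")
```

The catch for driving is not capability but latency and compute budget, which is exactly the point of the next paragraph.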
It won’t do that quickly enough for a car to avoid a cow in the road without a specialized LLM deployed in the car, something that’s possible but still might not be fast enough.
The primary use cases for machine learning have diverged from both the sensors and the speed requirements of autonomous driving, while reinforcement learning has proved to be much slower to achieve results and to require vastly more feedback than initially thought. That’s not a great combination for autonomous driving.
To be clear, I was equally wrong in my assumptions about how machine learning and reinforcement learning would play out. My assessments from ten and eight and seven and five years ago turned out to be imperfect, and in line with most other people’s in the space. Luckily for me, I guess, my couple of attempts to engage in startups with the technology didn’t click. I say luckily because there are innumerable startups founded five to ten years ago that promised reinforcement learning would do the trick after a brief period of human-assisted training using cheap labor from India and the like, and which still have big groups of people in low-labor-cost regions doing exactly the same thing they were doing five to ten years ago, taking just as long and costing just as much. Tesla isn’t the only firm with this particular challenge.
What does this mean for Tesla’s autonomous driving future? Well, it’s based on reinforcement learning, not the absurd advances in image recognition and visual question answering, so it’s not only behind the curve, it’s on a different curve entirely. Tesla has to shoulder all of the R&D itself. There’s probably a pivot that would be possible with a different CEO, but they’ve got Musk.
They don’t have radar, which is a pro and a con. Just as machine learning hasn’t been dealing with lidar, stranding everyone else, it hasn’t been dealing much with radar. Sensor integration remains a problem and humans do manage to drive without constantly crashing in the dark through a combination of savannah instincts and dumb luck.
Visual question answering approaches could probably be optimized for the real-time requirements of driving, narrowed to the subset of scenes and questions that are pertinent, if the organization were still able to pivot. Maybe it is, maybe it isn’t. Musk isn’t paying attention.