Tesla has unveiled the latest version of its Dojo supercomputer, and it is apparently so powerful that it tripped the power grid in Palo Alto.
Dojo is Tesla’s custom supercomputer platform, built from the ground up for AI machine learning, and more specifically for training on video data from its fleet of vehicles.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new custom-built Dojo computer uses chips and an entire infrastructure designed by Tesla.
The custom-built supercomputer is expected to augment Tesla’s ability to train neural networks using video data, which is essential to the computer vision technology that powers its self-driving effort.
At last year’s AI Day, the company unveiled its Dojo supercomputer, but it was still ramping up its efforts at the time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and a cluster, or “Exapod”.
At AI Day 2022 last night, Tesla unveiled the progress it has made with the Dojo program over the past year.
The company has confirmed that it has successfully moved from a chip and tile to a system tray and full cabinet.
Tesla claims it can replace 6 GPU enclosures with a single Dojo tile, which the company says costs less than one GPU enclosure. There are 6 of these tiles per system tray.
Tesla says a single tray is equivalent to “3-4 fully loaded supercomputer racks”.
The company integrates its host interface directly on the system tray to create one large, complete host assembly.
Tesla can install two of these system trays with host assembly in a single Dojo cabinet.
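The figures above can be multiplied out for a rough sense of scale. The short Python sketch below does only that, treating the six-tile unit as one system tray; the variable names are illustrative, and the results restate Tesla’s own claims rather than any measured performance.

```python
# Figures as stated by Tesla at AI Day 2022 (claims, not measurements).
GPU_ENCLOSURES_PER_TILE = 6  # one Dojo tile is said to replace 6 GPU enclosures
TILES_PER_TRAY = 6           # 6 tiles per system tray
TRAYS_PER_CABINET = 2        # 2 system trays per Dojo cabinet

# Multiply the stated figures through the hierarchy: tile -> tray -> cabinet.
tiles_per_cabinet = TILES_PER_TRAY * TRAYS_PER_CABINET
gpu_enclosures_per_tray = GPU_ENCLOSURES_PER_TILE * TILES_PER_TRAY
gpu_enclosures_per_cabinet = gpu_enclosures_per_tray * TRAYS_PER_CABINET

print(f"Tiles per cabinet: {tiles_per_cabinet}")
print(f"GPU enclosures replaced per tray (claimed): {gpu_enclosures_per_tray}")
print(f"GPU enclosures replaced per cabinet (claimed): {gpu_enclosures_per_cabinet}")
```

By this arithmetic, a single cabinet holds 12 tiles and, if Tesla’s per-tile claim holds, would stand in for 72 GPU enclosures.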
That is roughly where Tesla stands now: the automaker is still developing and testing the infrastructure needed to combine several cabinets into the first “Dojo Exapod.”
Bill Chang, Tesla’s lead systems engineer for Dojo, said during the presentation:
“We knew we had to re-examine every aspect of the data center infrastructure to support our unprecedented cooling and power density.”
Tesla had to develop its own high-power cooling and power supply systems for the Dojo cabinets.
Chang said Tesla tripped its local power grid substation during the infrastructure test earlier this year:
“Earlier this year, we started load testing our power and cooling infrastructure and were able to push it to over 2MW before we tripped our substation and got a call from the city.”
Tesla has released the main specifications for a Dojo Exapod: 1.1 EFLOPS, 1.3 TB of SRAM, and 13 TB of high-bandwidth DRAM.
The company used the event to try to recruit more talent, but it also shared that it was on schedule to have its first full cluster, or Exapod, in Q1 2023.
It currently plans to have 7 Dojo Exapods in Palo Alto.
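As a back-of-the-envelope illustration, the per-Exapod specifications can be scaled to the seven planned units. The Python sketch below assumes every Exapod matches the published figures and that the numbers add up linearly; Tesla did not state aggregate totals, so these are an extrapolation, and the dictionary keys are illustrative names.

```python
# Per-Exapod specifications as published by Tesla.
exapod_specs = {
    "compute_eflops": 1.1,
    "sram_tb": 1.3,
    "dram_tb": 13,
}
planned_exapods = 7  # number of Exapods Tesla says it plans for Palo Alto

# Assumption: identical units that scale linearly (not confirmed by Tesla).
# Rounding to one decimal hides floating-point noise such as 7.700000000000001.
totals = {name: round(value * planned_exapods, 1)
          for name, value in exapod_specs.items()}

for name, value in totals.items():
    print(f"{name}: {value}")
```

Under those assumptions, the seven units would total roughly 7.7 EFLOPS, 9.1 TB of SRAM, and 91 TB of high-bandwidth DRAM.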
Why does Tesla need a Dojo supercomputer?
It’s a good question. Why is a car manufacturer developing the most powerful supercomputer in the world? Well, Tesla would tell you that it’s not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy.
But more specifically, Tesla needs Dojo to automatically label the training videos from its fleet and to train the neural networks behind its self-driving system.
Tesla realized that its approach to developing a self-driving system, which relies on training neural networks on millions of videos from its customer fleet, required a lot of computing power, so it decided to develop its own supercomputer to provide that power.
That’s the short-term goal, but Tesla will have plenty of other uses for the supercomputer in the future, as it has big ambitions to develop other artificial intelligence programs.