Tesla, the American electric vehicle and clean energy company, has released a whitepaper describing a new standard for its Dojo supercomputing platform. The standard specifies arithmetic formats and methods for floating-point operations in computer programming environments used for deep learning neural network training.
Tesla handles an enormous amount of video data from its fleet of over 1 million vehicles to train its neural nets. Over the last two years, Tesla has been teasing the development of a new in-house supercomputer called ‘Dojo’ that is optimized for neural net video training.
The automaker found itself unsatisfied with the available hardware options for training its computer vision neural nets and believed it could do better internally. CEO Elon Musk has described the paper as “more important than it may seem.” Tesla unveiled Dojo at its AI Day in August, and it could potentially become the most powerful supercomputer in the world.
The whitepaper on Dojo technology primarily aims to define a standard that provides a method for computing floating-point numbers. This method yields similar results whether the processing is done in hardware, software, or a combination of both.
The developers drew their motivation from the original IEEE 754 standard (1985), which specified formats and methods for floating-point arithmetic in computer systems. More recently, Google Brain, an artificial intelligence research group at Google, developed the Brain Floating Point (BFloat16) format for use in its TPU (tensor processing unit) architecture for machine learning training systems. BFloat16 is now supported by hardware and software from several major vendors and projects, including Intel, ARM, AMD, Nvidia, and TensorFlow. It differs from the IEEE Float16 format in how its bits are divided between mantissa and exponent: IEEE Float16 uses 1 sign bit, 5 exponent bits, and 10 mantissa bits, while BFloat16 uses 1 sign bit, 8 exponent bits, and 7 mantissa bits.
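The difference in bit allocation between the two 16-bit formats can be illustrated with a minimal Python sketch. It is not taken from the whitepaper: the function names are hypothetical, and the BFloat16 conversion simply truncates a 32-bit float to its upper 16 bits, whereas real hardware typically rounds.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern for x by truncating its float32 encoding.

    Assumption: plain truncation (round toward zero), chosen for simplicity.
    """
    float32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return float32_bits >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def to_float16_bits(x: float) -> int:
    """Return the 16-bit IEEE Float16 pattern for x (struct format 'e')."""
    return struct.unpack(">H", struct.pack(">e", x))[0]


def bfloat16_fields(bits: int) -> tuple:
    """Split a BFloat16 pattern into (sign, exponent, mantissa): 1 + 8 + 7 bits."""
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F


def float16_fields(bits: int) -> tuple:
    """Split a Float16 pattern into (sign, exponent, mantissa): 1 + 5 + 10 bits."""
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


if __name__ == "__main__":
    value = -2.5
    print("BFloat16 fields:", bfloat16_fields(to_bfloat16_bits(value)))
    print("Float16 fields: ", float16_fields(to_float16_bits(value)))
```

The wider exponent field gives BFloat16 roughly the same dynamic range as a 32-bit float, at the cost of fewer mantissa bits and therefore lower precision than Float16.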
Tesla extended its precision support by introducing Configurable Float8 (CFloat8), an eight-bit floating-point format. The format reduces the memory storage and bandwidth needed for the weights, activations, and gradient values that are essential for training increasingly larger networks. The IEEE Float16 and BFloat16 formats each allocate a fixed number of bits to the mantissa and exponent, but eight bits can accommodate only a small number of mantissa and exponent bits. CFloat8 therefore requires some configurability to ensure high accuracy and convergence of the training models.
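To see why configurability matters in an eight-bit format, consider the following rough Python sketch of a generic 8-bit float decoder. It is an illustration under stated assumptions, not Tesla's specification: the function name `decode_float8`, the example field widths, and the bias values are all hypothetical, and the sketch treats every nonzero exponent as a normal number (no infinities or NaNs).

```python
def decode_float8(byte: int, exp_bits: int = 4, man_bits: int = 3, bias: int = 7) -> float:
    """Decode one byte as sign + exp_bits exponent + man_bits mantissa.

    Assumptions (illustrative only): IEEE-style subnormals when the exponent
    field is zero, and no special encodings for infinity or NaN.
    """
    if 1 + exp_bits + man_bits != 8:
        raise ValueError("sign, exponent and mantissa must total 8 bits")
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")

    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    fraction = (byte & ((1 << man_bits) - 1)) / (1 << man_bits)

    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed minimum exponent.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)


if __name__ == "__main__":
    # The same bit pattern decodes differently under each configuration.
    print(decode_float8(0x38, exp_bits=4, man_bits=3, bias=7))
    print(decode_float8(0x38, exp_bits=5, man_bits=2, bias=15))
    # Largest positive value representable under each configuration.
    print(decode_float8(0x7F, exp_bits=4, man_bits=3, bias=7))
    print(decode_float8(0x7F, exp_bits=5, man_bits=2, bias=15))
```

Moving a bit from the mantissa to the exponent widens the representable range but coarsens the steps between values, and changing the bias shifts that range up or down. This is the kind of trade-off a configurable format lets a training system tune per tensor.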
Last year, Musk teased that Tesla’s Dojo would have a capacity of over an exaflop, which means one quintillion (10^18) floating-point operations per second. Although the automaker already has a Dojo chip and tile, it is still building the full rack needed to create the supercomputer.