On November 30, 2021, Amazon introduced three new Amazon EC2 instance types powered by AWS-designed chips. The new chips are meant to help developers and customers improve the performance and energy efficiency of their ML models. At the AWS re:Invent event, Amazon announced instances built on Graviton3, Trainium (Trn1), and Nitro SSDs. Of these, Trn1 drew the most attention among ML enthusiasts, as Trn1 instances will be capable of delivering about 800 gigabits per second of networking bandwidth.
This level of bandwidth makes Trn1 well suited to large-scale, multi-node distributed training use cases such as natural language processing, object detection, recommendation engines, and image recognition. The company claims that its new processors are also optimized for high-performance computing, media encoding, batch processing, scientific modeling, ad serving, and distributed analytics.
In a traditional cloud ML workflow, as much as 90% of the cost of ML operations goes to running inference on trained models. To reduce this cost, in 2019, Amazon came up with a processor called Inferentia. According to Amazon, it delivers the performance and throughput needed for machine learning inference at a lower price than GPU-based instances.
Like inference, ML training is costly, since it requires high-performance computing with heavy parallel processing. To make training ML models cheaper and simpler, last year, Amazon introduced Trainium, a chip designed specifically for training machine learning models.
Yesterday, Amazon announced Trn1, EC2 instances built on the Trainium chip and regarded as a follow-up to the company's earlier Inferentia work. The critical feature of Trn1 is that it speeds up ML model training by performing highly parallel math operations in hardware. According to Amazon, the newly announced hardware provides 25 percent higher performance compared to previously launched chips.
In Trn1, the company doubled the networking bandwidth to 800 gigabits per second, up from the 400 gigabits per second of earlier instances. Amazon says the added bandwidth reduces latency and makes Trn1 the fastest option for ML training in the cloud. Trn1 instances can be clustered by the thousands to train even the most complicated machine learning models, with trillions of parameters.
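To put the doubling in perspective, here is a rough back-of-the-envelope sketch in Python. It treats the quoted 400 and 800 figures as link speeds in gigabits per second (the unit AWS uses for network bandwidth) and estimates the ideal time to move one full copy of a model's parameters over a single link. The one-trillion-parameter model, the 2 bytes per parameter, and the function name are illustrative assumptions, not figures from Amazon, and the estimate ignores protocol overhead, latency, and how distributed training actually shards and overlaps communication.

```python
def transfer_seconds(num_params: int, bytes_per_param: int, link_gbps: float) -> float:
    """Ideal time in seconds to send every parameter once over one link.

    Assumes the link runs at its full rated speed with no overhead,
    which real networks do not achieve.
    """
    total_bits = num_params * bytes_per_param * 8
    return total_bits / (link_gbps * 1e9)


# Hypothetical model: 1 trillion parameters at 2 bytes each (16-bit values).
params = 1_000_000_000_000
for gbps in (400, 800):
    secs = transfer_seconds(params, 2, gbps)
    print(f"{gbps} Gbps: {secs:.0f} s")
```

Under these assumptions, doubling the link speed halves the ideal transfer time. Real training jobs will see a smaller or larger benefit depending on how much of each step is spent communicating.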
To preview Trn1 instances, visit the link.