OpenAI released Triton 1.0, an open-source, Python-like programming language for GPUs. It enables researchers with no CUDA (Compute Unified Device Architecture) experience to write highly efficient GPU code. Triton reaches peak hardware performance with relatively little effort and can be up to 2x more efficient than equivalent Torch implementations.
In recent years, Deep Neural Network (DNN) models have delivered strong results across many domains, from natural language processing to computer vision. However, deep learning models involve heavy, highly parallel computation, and therefore require multi- and many-core processors. These High-Performance Computing (HPC) needs have increased the demand for GPUs that can process large volumes of data at high speed.
Modern deep learning research is typically implemented by composing a framework's native operators, which may require the creation of many temporary tensors. This approach lacks flexibility, is verbose, and can degrade the performance of the neural network. OpenAI's Triton mitigates this issue by providing an intermediate language and compiler.
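As a rough illustration of the temporary-tensor problem, consider a simple fused multiply-add. The sketch below uses NumPy (not Triton) and hypothetical variable names: the op-by-op version materializes a full intermediate array, while the second version reuses a preallocated output buffer, which is the kind of rewriting a compiler like Triton performs automatically when it fuses operations into a single GPU kernel.

```python
import numpy as np

a, b = 2.0, 3.0
x = np.arange(1_000_000, dtype=np.float32)

# Op-by-op, as a framework would evaluate "a * x + b":
tmp = a * x           # a full temporary tensor is allocated here
y_unfused = tmp + b   # a second pass over memory

# Hand-"fused" version: one preallocated buffer, no extra temporary.
y_fused = np.empty_like(x)
np.multiply(a, x, out=y_fused)
np.add(y_fused, b, out=y_fused)

assert np.allclose(y_unfused, y_fused)
```

The results are identical; the difference is the extra allocation and memory traffic of the temporary, which adds up quickly across a large model.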
Triton’s success lies in a modular system architecture centered on Triton-IR, which allows the compiler to automatically perform a wide variety of important program optimizations. The designers revisited the traditional “Single Program, Multiple Data” (SPMD) thread execution model for GPUs and proposed a blocked algorithm, which is particularly useful for sparse operations. This block-based approach aggressively optimizes programs for data locality and parallelism.
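The blocked SPMD model can be sketched in pure Python with NumPy (function and variable names here are hypothetical; real Triton kernels use `tl.program_id`, `tl.arange`, and masked `tl.load`/`tl.store` and run on the GPU). Instead of one thread per element, each program instance owns an entire block of elements, with a mask guarding the out-of-bounds tail of the last block.

```python
import numpy as np

BLOCK = 128  # each "program instance" processes one block of elements

def add_kernel_block(x, y, out, pid):
    # This instance's block of offsets, analogous to tl.arange in Triton.
    offsets = pid * BLOCK + np.arange(BLOCK)
    mask = offsets < x.shape[0]   # guard the ragged tail of the last block
    idx = offsets[mask]
    out[idx] = x[idx] + y[idx]    # masked load, compute, and store

n = 1000  # deliberately not a multiple of BLOCK
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.empty_like(x)

# Launch a 1-D "grid" of program instances, one per block.
for pid in range((n + BLOCK - 1) // BLOCK):
    add_kernel_block(x, y, out, pid)

assert np.allclose(out, x + y)
```

Because each instance sees a whole block rather than a single element, the compiler can reason about data locality within the block and vectorize or coalesce memory accesses accordingly.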
OpenAI’s Triton aims to provide an open-source environment for writing fast code with higher productivity than CUDA and more flexibility than other existing Domain-Specific Languages (DSLs). Currently, Triton runs only on Linux and supports NVIDIA GPUs (Compute Capability 7.0+). Future releases may add support for AMD GPUs and CPUs. The foundations of the project are described in the MAPL 2019 publication: Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations.