The speech AI team at NVIDIA has used its NeMo framework to build a new automatic speech recognition (ASR) model for Telugu, one of the most widely spoken Indian languages.
Telugu is the most widely spoken language in southern India. Even so, speech AI treats it as a low-resource language: more datasets are needed to build automatic speech recognition models for it.
The team built and trained the Telugu model with the NVIDIA NeMo framework. NeMo is used to build, train, and fine-tune GPU-accelerated speech and natural language models, including state-of-the-art conversational AI models, through a simple Python interface.
The Telugu ASR model by NVIDIA won first place in the ASR challenge for the Telugu language held by the International Institute of Information Technology (IIIT), Hyderabad, in October 2022.
The NeMo-based speech AI models outperformed all other entries built with popular frameworks such as ESPnet, Kaldi, and SpeechBrain, with error rates of approximately 13% on the closed track and 12% on the open track.
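For readers unfamiliar with how such figures are computed, the sketch below shows one common ASR metric, word error rate (WER). It rests on an assumption: the article says only "error rate", and the challenge's exact scoring method is not described here. The function name and the sample sentences are illustrative and are not taken from NeMo or from the challenge.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization and no text normalization;
    a real evaluation would define both explicitly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of eight reference words.
reference = "the cat sat on the mat by door"
hypothesis = "the cat sat on a mat by door"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 12.5%
```

Under this definition, an error rate of about 12% means roughly one word in eight is substituted, dropped, or inserted relative to the reference transcript.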
Nithin Koluguri, a senior research scientist on the conversational AI team at NVIDIA, said that NVIDIA NeMo is the only framework that supports scaling training to multi-GPU systems and multi-node clusters. He also noted that all models built with the NVIDIA NeMo framework are open source and available for users to fine-tune and use.