At its Speech AI Summit, NVIDIA announced that it will collaborate with Mozilla Common Voice on speech AI. Both NVIDIA and Mozilla Common Voice aim to accelerate the development of automatic speech recognition models.
Mozilla Common Voice is an open-source platform that provides multilingual voice datasets, which anyone can use to train speech-enabled applications. It has datasets for 34 languages, including Hindi, English, Bengali, Marathi, Tamil, and more.
NVIDIA noted that standard voice assistants such as Amazon Alexa and Google support very few of the world’s spoken languages. To address this gap, NVIDIA and Mozilla Common Voice aim to improve linguistic inclusion in speech AI.
At the same summit, Caroline de Brito Gottlieb, product manager at NVIDIA, said that demographic diversity is the key to capturing language diversity. She also stated that many factors, such as underserved dialects, pidgins, and accents, can affect speech variation. Through the partnership with Mozilla Common Voice, NVIDIA can create a dataset ecosystem for building speech datasets and models for any language.
NVIDIA has been developing speech AI for many use cases, such as artificial speech translation, automatic speech recognition, and text-to-speech. NVIDIA Riva is a GPU-accelerated speech AI SDK for building and deploying fully customizable, real-time AI pipelines.