NVIDIA unveils Artificial Intelligence Technology for Speech Synthesis

NVIDIA’s new RAD-TTS speech synthesis model can closely mimic human voice modulations and can be used to train voice bots and assistants, among other purposes.

By Dipayan Mitra

September 2, 2021

NVIDIA recently unveiled research on speech synthesis that could make voice assistants like Google Assistant and Siri sound far more human. Speech generation technology has improved manyfold in recent years, but it still lacks critical elements of human speech such as rhythm and intonation.

NVIDIA researchers are developing a new speech synthesis technology that would make voice assistants sound richer and let them produce voice modulations and dynamics closer to those of human speakers.

The complete research will be presented in a session at the Interspeech 2021 conference, which runs through September 3. Researchers from NVIDIA’s text-to-speech team have developed a model named RAD-TTS that can bring these qualities to a voice bot.

The developers drew on extensive research in conversational artificial intelligence, natural language processing, audio enhancement, automated speech recognition, and other fields to build the RAD-TTS model. The model runs efficiently on NVIDIA GPUs, and the company has open-sourced some of the research through its NVIDIA NeMo toolkit.

NVIDIA said in a blog post, “With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator’s voice.”

The blog post also noted that users can tweak the generated speech to improve the narration and flow of videos. Earlier speech synthesis models could not produce accurate voice modulations, so they could not convey the emotional aspect of speech in narrations.

NVIDIA’s RAD-TTS can also change a speaker’s voice to sound like someone else’s. “The AI model’s capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice,” said NVIDIA.
