NVIDIA unveils Artificial Intelligence Technology for Speech Synthesis

NVIDIA’s new RAD-TTS speech synthesis model can closely mimic human voice modulation and can be used to build voice bots, assistants, and other speech applications.

NVIDIA recently unveiled speech synthesis research that could make voice assistants such as Google Assistant and Siri sound far more human. Speech generation technology has improved considerably in recent years, but it still lacks critical elements of human speech such as rhythm and intonation.

NVIDIA researchers are developing speech synthesis technology that makes voice assistants sound richer and produces voice modulation and dynamics closer to those of human speakers.

The research is being presented in sessions at the Interspeech 2021 conference, which runs through September 3. Researchers on NVIDIA’s text-to-speech team have developed a model named RAD-TTS that brings human-like rhythm and intonation to synthesized voices.

To build RAD-TTS, the developers drew on extensive research in conversational artificial intelligence, natural language processing, audio enhancement, automated speech recognition, and related fields. The model runs efficiently on NVIDIA GPUs, and the company has open-sourced some of this work through its NVIDIA NeMo toolkit.

NVIDIA wrote in a blog post, “With this interface, our video producer could record himself reading the video script, and then use the AI model to convert his speech into the female narrator’s voice.”

The blog post also noted that users can tweak the generated speech to improve the narration and flow of videos. Earlier speech synthesis models could not produce accurate voice modulation, so they could not convey the emotional side of speech in narration.

RAD-TTS can also convert one speaker’s voice so that it sounds like someone else’s. “The AI model’s capabilities go beyond voiceover work: text-to-speech can be used in gaming, to aid individuals with vocal disabilities or to help users translate between languages in their own voice,” said NVIDIA.

Dipayan Mitra
Dipayan is a news-savvy writer who does not leave a single page of the newspaper unturned. He is also a professional vocalist who enjoys ghazals. Building a dog shelter is his lifelong dream.

