Thursday, December 19, 2024

Microsoft announces new text-to-speech AI model called VALL-E

Given a short sample of a person's voice, VALL-E can synthesize audio of that person saying anything, in a way that attempts to preserve the speaker's emotional tone.

Microsoft announced a new text-to-speech AI model called VALL-E on Thursday that can closely simulate a person’s voice when given a three-second audio sample.

Once the model learns a specific voice, it can synthesize audio of that person saying anything while preserving the speaker’s emotional tone. Its creators claim that VALL-E could be used for high-quality text-to-speech applications and, when combined with other generative AI models like GPT-3, for audio content creation.

Microsoft describes VALL-E as a neural codec language model that builds on a technology called EnCodec. Unlike other text-to-speech methods, which typically synthesize speech by manipulating waveforms, VALL-E uses EnCodec to generate discrete audio codec codes from text and acoustic prompts.


In essence, VALL-E analyzes how a person sounds, breaks that information into discrete components called tokens, and draws on its training data to predict how that voice would sound speaking phrases outside the three-second sample.
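The idea of turning sound into discrete tokens can be illustrated with a toy example. The Python sketch below is not VALL-E or EnCodec code: the four-entry codebook and the function names are made up for illustration, and a real neural codec learns much larger codebooks from data. It only shows the basic step of mapping each audio sample to the index of its nearest code and back.

```python
# Toy illustration only: a hand-made codebook, not EnCodec's learned one.
CODEBOOK = [-0.75, -0.25, 0.25, 0.75]  # hypothetical code values


def encode(samples):
    """Map each sample to the index (token) of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - s))
        for s in samples
    ]


def decode(tokens):
    """Map tokens back to approximate sample values."""
    return [CODEBOOK[t] for t in tokens]


samples = [-0.9, -0.2, 0.1, 0.8]
tokens = encode(samples)
print(tokens)          # [0, 1, 2, 3]
print(decode(tokens))  # [-0.75, -0.25, 0.25, 0.75]
```

A language model can then be trained to predict such token sequences, much as text models predict words, and the predicted tokens are decoded back into audio.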

Microsoft trained VALL-E’s speech synthesis capabilities on an audio library called LibriLight, which Meta assembled. It contains about 60,000 hours of English-language speech from more than 7,000 speakers, mostly drawn from LibriVox public domain audiobooks. For VALL-E to generate a good result, the voice in the three-second sample must closely match a voice in the training data.


Sahil Pawar
I am a graduate with a bachelor's degree in statistics, mathematics, and physics. I have been working as a content writer for almost 3 years and have written for a plethora of domains. Besides, I have a vested interest in fashion and music.
