Microsoft announces new text-to-speech AI model called VALL-E

VALL-E can synthesize audio of that person saying anything and do it in a way that attempts to preserve the speaker's emotional tone.

By Sahil Pawar

January 10, 2023

Microsoft announced a new text-to-speech AI model called VALL-E on Thursday that can simulate a person’s voice closely when given a three-second audio sample.

Once the model learns a specific voice, it can synthesize audio of that same person saying anything and preserves the speaker’s emotional tone. Its creators claim that VALL-E could be used for high-quality text-to-speech applications and audio content creation when brought together with other generative AI models like GPT-3.

Microsoft describes VALL-E as a neural codec language model which builds on a technology called EnCodec. Unlike other text-to-speech methods that usually synthesize speech by manipulating waveforms, VALL-E creates discrete audio codec codes from text and acoustic prompts using EnCodec.

It basically processes how a person sounds, breaks that information into discrete components called tokens, and uses training data to know how that voice would sound in other phrases outside of the three-second sample.

Microsoft trained VALL-E’s speech synthesis system on an audio library called LibriLight, which Meta assembled. It contains about 60,000 hours of English language speeches from over 7,000 speakers, mainly from LibriVox public domain audiobooks. For VALL-E to create a good result, the voice in the sample must closely match a voice in the training data.

Microsoft announces new text-to-speech AI model called VALL-E

LEAVE A REPLY Cancel reply

Most Popular

Microsoft announces new text-to-speech AI model called VALL-E

Subscribe to our newsletter

RELATED ARTICLES

Google Rolls Out Deep Think in Gemini App to Power Ultra‑Reasoning AI

Meta Unveils Vision for Personal Superintelligence

Google NotebookLM Video Overviews Launch Turns Research into AI‑Powered Explainer Videos

LEAVE A REPLY Cancel reply

Most Popular