SeamlessM4T by Meta is a Multimodal AI Model for Speech and Text Translations

SeamlessM4T can understand language from speech or text and generate translations into either or both.

August 28, 2023

Meta's SeamlessM4T, a multimodal AI model for translating speech and text. Image credits: AD

Meta has launched SeamlessM4T (Massively Multilingual & Multimodal Machine Translation), an AI translation model that is both multimodal and multilingual and is designed to support communication through speech and text across many languages.

According to the SeamlessM4T research paper, Meta used one million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. It then built a multimodal corpus named SeamlessAlign, which contains more than 470,000 hours of aligned translations. Adding human-labeled and pseudo-labeled data, totaling 406,000 hours, gave Meta the dataset used for training. The paper presents the resulting model, SeamlessM4T, as the first multitasking and multilingual system of its kind.

SeamlessM4T covers five tasks. It recognizes speech in nearly 100 languages, translates speech to text for nearly 100 input and output languages, translates speech to speech for nearly 100 input and 36 output languages, translates text to text in nearly 100 languages, and translates text to speech for nearly 100 input and 36 output languages.
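The five tasks listed above differ only in input modality, output modality, and whether translation is involved, so they can be written down as a small lookup table. The Python sketch below is illustrative only: it does not call any SeamlessM4T code, the class and function names are hypothetical, the short task labels (ASR, S2TT, S2ST, T2TT, T2ST) are common abbreviations rather than terms from this article, and the language counts are the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    code: str         # short label (an abbreviation assumed for this sketch)
    source: str       # input modality: "speech" or "text"
    target: str       # output modality: "speech" or "text"
    translates: bool  # False for plain speech recognition


# The five tasks described in the article, in the order they are listed.
TASKS = [
    Task("ASR", "speech", "text", False),
    Task("S2TT", "speech", "text", True),
    Task("S2ST", "speech", "speech", True),
    Task("T2TT", "text", "text", True),
    Task("T2ST", "text", "speech", True),
]

# Approximate output-language counts quoted in the article, keyed by
# output modality (input coverage is "nearly 100" for every task).
APPROX_OUTPUT_LANGUAGES = {"text": 100, "speech": 36}


def tasks_for(source: str, target: str) -> list[str]:
    """Return the labels of all tasks with the given input and output modality."""
    return [t.code for t in TASKS if t.source == source and t.target == target]


def approx_output_languages(code: str) -> int:
    """Look up the approximate number of output languages for a task label."""
    task = next(t for t in TASKS if t.code == code)
    return APPROX_OUTPUT_LANGUAGES[task.target]
```

For example, `tasks_for("speech", "text")` returns both the recognition and the speech-to-text translation task, while `approx_output_languages("S2ST")` reflects the narrower coverage for speech output.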

Meta has publicly released SeamlessM4T under the CC BY-NC 4.0 license, inviting researchers and developers to build on this work. It is also sharing the metadata of SeamlessAlign, which it describes as the largest open multimodal translation dataset to date.

In addition, Meta has made it easier for the developer community to mine their own monolingual datasets using SONAR, a comprehensive collection of text and speech sentence encoders, and stopes, a Meta library for multimodal data processing and parallel data mining. All of these are built on fairseq2, Meta's sequence modeling library.

Users can explore the seamless communication translation demo through this link. This research demo supports translation for around 100 input and 35 output languages. It allows users to record a sentence, choose up to three languages for translation, and view transcriptions and translations.
