Meta has launched SeamlessM4T (Massively Multilingual & Multimodal Machine Translation), an AI translation model that is both multimodal and multilingual and is designed to support communication through speech and text across many languages.
According to the SeamlessM4T research paper, Meta used one million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. The company combined this with SeamlessAlign, a multimodal corpus of aligned speech translations covering more than 470,000 hours. By adding human-labeled and pseudo-labeled data totaling 406,000 hours, Meta assembled the training dataset that enabled it to develop SeamlessM4T, which it presents as the first multitasking and multilingual system of this kind.
SeamlessM4T covers five tasks in a single model. It performs speech recognition for nearly 100 languages, speech-to-text translation for nearly 100 input and output languages, speech-to-speech translation for nearly 100 input languages and 36 output languages (including English), text-to-text translation for nearly 100 languages, and text-to-speech translation for nearly 100 input languages and 36 output languages (including English).
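The five tasks listed above differ only in the input modality, the output modality, and whether a translation step is involved. The following is a minimal illustrative sketch of that task matrix, not Meta's API: the Task class and the pick_task helper are hypothetical names introduced here, and the short labels (ASR, S2TT, S2ST, T2TT, T2ST) are the abbreviations commonly used for these tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record describing one task (not part of any Meta library)."""
    name: str
    source: str      # input modality: "speech" or "text"
    target: str      # output modality: "speech" or "text"
    translate: bool  # False means the output stays in the input language


# The five tasks described in the article.
TASKS = [
    Task("ASR", "speech", "text", translate=False),   # speech recognition
    Task("S2TT", "speech", "text", translate=True),   # speech-to-text translation
    Task("S2ST", "speech", "speech", translate=True), # speech-to-speech translation
    Task("T2TT", "text", "text", translate=True),     # text-to-text translation
    Task("T2ST", "text", "speech", translate=True),   # text-to-speech translation
]


def pick_task(source: str, target: str, translate: bool = True) -> str:
    """Return the task label matching the requested modalities."""
    for task in TASKS:
        if (task.source, task.target, task.translate) == (source, target, translate):
            return task.name
    raise ValueError(f"no task for {source} -> {target} (translate={translate})")


if __name__ == "__main__":
    print(pick_task("speech", "text", translate=False))
    print(pick_task("text", "speech"))
```

In this sketch, a request that matches none of the five tasks, such as text to text without translation, raises a ValueError rather than silently falling back to another task.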
Meta has publicly released SeamlessM4T under the CC BY-NC 4.0 license, inviting researchers and developers to build on this work. The company is also sharing the metadata of SeamlessAlign, which it describes as the largest open multimodal translation dataset to date.
In addition, Meta has made it easier for the developer community to mine their own monolingual datasets using SONAR, a collection of text and speech sentence encoders, and stopes, a Meta library for multimodal data processing and parallel data mining. All of these tools are built on fairseq2, Meta's sequence modeling library.
Users can try the seamless communication translation demo online. This research demo supports translation for around 100 input and 35 output languages. It lets users record a sentence, choose up to three target languages, and view the resulting transcriptions and translations.