Tuesday, July 23, 2024

SeamlessM4T by Meta is a Multimodal AI Model for Speech and Text Translations

SeamlessM4T can understand language from speech or text and generate translations into either or both.

Meta has launched SeamlessM4T (Massively Multilingual & Multimodal Machine Translation), an AI translation model that is both multimodal and multilingual, handling speech and text across a wide range of languages within a single system.

According to the SeamlessM4T research paper, Meta used one million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. It then built SeamlessAlign, a multimodal corpus of automatically aligned speech translations covering more than 470,000 hours. Filtered and combined with human-labeled and pseudo-labeled data, this yielded 406,000 hours of training data. With it, Meta developed what the paper describes as the first multilingual system capable of translating from and into English for both speech and text.

SeamlessM4T covers five tasks in one model. It performs speech recognition in nearly 100 languages, speech-to-text translation for nearly 100 input and output languages, speech-to-speech translation for nearly 100 input and 36 output languages, text-to-text translation for nearly 100 languages, and text-to-speech translation for nearly 100 input and 36 output languages.
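The task list above can be read as a small lookup table keyed by input and output modality. The following is a minimal Python sketch of that table, not Meta's API: the `Task` class and `tasks_for` helper are hypothetical names introduced here for illustration, the task abbreviations follow the naming used in the SeamlessM4T paper, and the language counts are the approximate figures quoted in this article ("nearly 100" is rounded to 100).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One SeamlessM4T task as described in the article (illustrative only)."""
    name: str                      # abbreviation, following the paper's naming
    source: str                    # input modality: "speech" or "text"
    target: str                    # output modality: "speech" or "text"
    translates: bool               # False for plain transcription (ASR)
    approx_output_languages: int   # approximate figure from the article


# Approximate counts: "nearly 100" is rounded to 100 for illustration.
TASKS = (
    Task("ASR", "speech", "text", False, 100),
    Task("S2TT", "speech", "text", True, 100),
    Task("S2ST", "speech", "speech", True, 36),
    Task("T2TT", "text", "text", True, 100),
    Task("T2ST", "text", "speech", True, 36),
)


def tasks_for(source: str, target: str, translate: bool = True) -> list[str]:
    """Return the names of tasks matching the given modalities."""
    return [
        t.name
        for t in TASKS
        if t.source == source and t.target == target and t.translates == translate
    ]


if __name__ == "__main__":
    print(tasks_for("speech", "speech"))                 # speech-to-speech translation
    print(tasks_for("speech", "text", translate=False))  # transcription only
```

The table makes the asymmetry in the article easy to see: tasks that output speech support far fewer target languages than tasks that output text.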


Meta has publicly released SeamlessM4T under the CC BY-NC 4.0 license, inviting researchers and developers to build on this work. It is also sharing the metadata of SeamlessAlign, which Meta describes as the largest open multimodal translation dataset to date.

In addition, Meta has made it easier for the developer community to mine their own monolingual datasets using SONAR, a collection of text and speech sentence encoders, and stopes, a Meta library for multimodal data processing and parallel data mining. All of these components are built on fairseq2, Meta's sequence modeling library.

Users can try the seamless communication translation demo online. This research demo supports translation for around 100 input and 35 output languages. It lets users record a sentence, choose up to three languages for translation, and view the transcriptions and translations.


Tejaswini Kasture
I'm a computer engineer with a passion for app development and writing. I currently write content for the Analytics Drift content team. My passion for technology helps me adapt to new waves of innovation that affect every aspect of our lives. Apart from my technical interests, I also like hiking and dancing.

