Wednesday, April 17, 2024
Amazon releases 51-language dataset for language understanding

The release of MASSIVE will help Amazon expand Alexa’s support for numerous regional languages.

Amazon has released MASSIVE, an open-source dataset covering 51 languages, to encourage developers to create more third-party Alexa apps and services.

The release is a step toward Amazon’s goal of making its voice assistant Alexa available to diverse users in their local languages. Alexa currently supports languages including English, Hindi, Spanish, German, Portuguese, French, Japanese, and Arabic.

Developers can now build models for massively multilingual natural-language understanding (MMNLU), a paradigm in which a single machine learning model can parse and understand inputs from many typologically diverse languages.

The MASSIVE dataset enables models to learn shared representations of utterances with the same intents regardless of language, allowing for cross-linguistic training on NLU tasks. 

According to Amazon, by learning a shared data representation that spans languages, such a model can transfer knowledge from languages with abundant training data to those with limited training data.

Along with MASSIVE, Amazon is also releasing open-source code that shows how to do multilingual NLU modeling and allows practitioners to re-create the baseline results for intent classification and slot filling reported in the company’s research. 
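To make the two tasks concrete, here is a minimal sketch of what intent classification and slot filling labels can look like for the same request in two languages. The bracketed annotation style, the example records, and the helper parse_annotated are assumptions made for illustration; they are not taken from Amazon's released code.

```python
import re
from typing import List, Tuple

# Assumed annotation style for this sketch: "[slot_name : slot value]".
SLOT_PATTERN = re.compile(r"\[(\w+)\s*:\s*([^\]]+)\]")


def parse_annotated(annotated: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the plain utterance and a (token, slot label) pair per token.

    Tokens outside any bracket get the label "O" (no slot).
    """
    tagged: List[Tuple[str, str]] = []
    pos = 0
    for match in SLOT_PATTERN.finditer(annotated):
        # Text before the bracket carries no slot.
        for token in annotated[pos:match.start()].split():
            tagged.append((token, "O"))
        # Every token inside the bracket carries the slot name.
        for token in match.group(2).split():
            tagged.append((token, match.group(1)))
        pos = match.end()
    for token in annotated[pos:].split():
        tagged.append((token, "O"))
    plain = " ".join(token for token, _ in tagged)
    return plain, tagged


# Hypothetical records: the same intent and slot names in two languages.
examples = [
    {"locale": "en-US", "intent": "alarm_set",
     "annotated": "wake me up at [time : nine am] on [date : friday]"},
    {"locale": "de-DE", "intent": "alarm_set",
     "annotated": "wecke mich am [date : freitag] um [time : neun uhr]"},
]

for record in examples:
    utterance, slots = parse_annotated(record["annotated"])
    print(record["locale"], record["intent"], utterance, slots)
```

Because both records share one intent label and one set of slot names, a single model can be trained on all languages at once: intent classification predicts the label for the whole utterance, and slot filling predicts a label for each token.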

Prem Natarajan, Vice President of Alexa AI, Natural Understanding at Amazon, said, “We are very excited to share this large multilingual dataset with the worldwide language research community.”

He added that the company anticipates that sharing the dataset will help researchers around the world make new advances in multilingual language understanding, expanding the availability and reach of conversational AI systems.

Amazon is also launching a new competition based on the MASSIVE dataset, called Massively Multilingual NLU 2022 (MMNLU-22). The accompanying workshop will be held in conjunction with EMNLP 2022 on December 7 or 8 in Abu Dhabi, both in person and online.

Dipayan Mitra
Dipayan is a news-savvy writer who does not leave a single page of the newspaper unturned. He is also a professional vocalist who enjoys ghazals. Building a dog shelter is his lifelong dream.

