Amazon has released MASSIVE, an open-source natural-language understanding dataset covering 51 languages, to encourage developers to build more third-party Alexa apps and services.
The release is a step towards the company’s goal of making its voice assistant Alexa available to people around the world in their local languages. Currently, Alexa supports languages including English, Hindi, Spanish, German, Portuguese, French, Japanese, and Arabic.
Developers will now be able to build models for massively multilingual natural-language understanding (MMNLU), a paradigm in which a single machine learning model can parse and understand inputs from many typologically diverse languages.
The MASSIVE dataset enables models to learn shared representations of utterances with the same intents regardless of language, allowing for cross-linguistic training on NLU tasks.
According to Amazon, a model trained this way can transfer knowledge from languages with abundant training data to those with little, because it learns a shared data representation that spans languages.
Along with MASSIVE, Amazon is also releasing open-source code that shows how to do multilingual NLU modeling and allows practitioners to re-create the baseline results for intent classification and slot filling reported in the company’s research.
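To make the two baseline tasks concrete, here is a minimal Python sketch of what intent classification and slot filling operate on. It is not Amazon's released code. The record layout (the `locale`, `intent`, and `annot_utt` fields), the inline `[slot : text]` annotation style, and the sample utterance are assumptions made for illustration, and `parse_annotated_utterance` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed annotation style: slots are marked inline as "[slot_name : slot text]".
SLOT_PATTERN = re.compile(r"\[\s*([^\]:]+?)\s*:\s*([^\]]+?)\s*\]")


def parse_annotated_utterance(annotated: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split an annotated utterance into plain text and (token, slot_label) pairs.

    Tokens outside any bracket get the label "O" (no slot).
    """
    tagged: List[Tuple[str, str]] = []
    pos = 0
    for match in SLOT_PATTERN.finditer(annotated):
        # Text before this slot carries no slot label.
        for token in annotated[pos:match.start()].split():
            tagged.append((token, "O"))
        slot_name = match.group(1)
        # Every token inside the brackets gets the slot's name as its label.
        for token in match.group(2).split():
            tagged.append((token, slot_name))
        pos = match.end()
    # Trailing text after the last slot.
    for token in annotated[pos:].split():
        tagged.append((token, "O"))
    plain = " ".join(token for token, _ in tagged)
    return plain, tagged


# Illustrative record, not taken from the dataset.
record = {
    "locale": "en-US",
    "intent": "alarm_set",  # the target for intent classification
    "annot_utt": "wake me up at [time : nine am] on [date : friday]",
}

text, token_labels = parse_annotated_utterance(record["annot_utt"])
print(text)
print(token_labels)  # the targets for slot filling
```

In this framing, intent classification predicts one label per utterance, while slot filling predicts one label per token. A multilingual model would be trained on records like this from many locales that share the same intent and slot label sets.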
Prem Natarajan, vice president of Alexa AI Natural Understanding at Amazon, said, “We are very excited to share this large multilingual dataset with the worldwide language research community.”
He added that Amazon expects the dataset to help researchers around the world make new advances in multilingual language understanding, expanding the availability and reach of conversational AI systems.
Amazon is also launching a competition based on the MASSIVE dataset, called Massively Multilingual NLU 2022 (MMNLU-22). The accompanying Massively Multilingual NLU 2022 workshop will be held in conjunction with EMNLP 2022 on December 7 or 8 in Abu Dhabi, both in person and online.