Last September, Facebook unveiled Dynabench, an AI data collection and benchmarking tool that creates challenging test datasets by putting humans and models “in the loop.” It is now introducing ‘Dynatask,’ a new feature that unlocks Dynabench’s full potential for the AI community.
By letting human annotators interact organically with NLP models, Dynatask helps researchers identify weaknesses in those models. The result is an AI model benchmarking system that aims to be more accurate and impartial than previous approaches. Researchers can take advantage of the Dynatask platform’s features and compare models on a dynamic leaderboard, with metrics for fairness, robustness, compute, and memory in addition to accuracy.
Dynabench analyses how easily humans can deceive AI using a dynamic adversarial data collection technique, which Facebook says is a stronger indicator of a model’s quality than current benchmarks. Last year, researchers from Facebook and University College London showed evidence that 60 to 70 percent of the answers produced by models evaluated on open-domain benchmarks appear somewhere in the training sets.
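To make the technique concrete, here is a minimal sketch of one dynamic adversarial collection round, assuming toy stand-ins for the model and the annotator; none of the names below come from Dynabench’s actual codebase.

```python
# Illustrative sketch of dynamic adversarial data collection.
# The toy "model" and "annotator" below are stand-ins, not Dynabench's API.
import random

def toy_sentiment_model(text: str) -> str:
    # Stand-in model: guesses "positive" whenever it sees the word "good".
    return "positive" if "good" in text.lower() else "negative"

def toy_annotator() -> tuple[str, str]:
    # Stand-in annotator: returns (text, intended_label) pairs,
    # some of which are crafted to trip up the keyword model.
    candidates = [
        ("The food was good", "positive"),
        ("The food was anything but good", "negative"),  # adversarial
        ("Hardly a good experience", "negative"),        # adversarial
    ]
    return random.choice(candidates)

def collect_adversarial_round(n_examples: int = 100) -> list[dict]:
    """Keep every example, flagging the ones that fool the model in the loop."""
    collected = []
    for _ in range(n_examples):
        text, intended = toy_annotator()
        predicted = toy_sentiment_model(text)
        collected.append({
            "text": text,
            "label": intended,
            "model_prediction": predicted,
            "fooled_model": predicted != intended,
        })
    return collected

if __name__ == "__main__":
    round_data = collect_adversarial_round()
    fooled = sum(ex["fooled_model"] for ex in round_data)
    print(f"{fooled}/{len(round_data)} examples fooled the model")
```

The model-fooling examples are the valuable ones: they expose blind spots and become training data for the next, stronger model.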
Meanwhile, in the past few years, the AI community has taken an interest in open-domain question answering for its practical uses and, more recently, as a way to assess language models' grasp of factual knowledge. However, a thorough understanding of which types of questions models can actually answer is still lacking. Further, unknowns about how questions and answers are distributed in benchmark corpora make the results difficult to interpret.
At the same time, benchmarks have been saturating quickly in recent years, particularly in NLP. For perspective, it took the research community 18 years to attain human-level performance on MNIST and roughly six years to surpass humans on ImageNet, while beating humans on the GLUE language understanding benchmark took only about a year.
Despite these impressive feats, we are still a long way from machines that can truly understand natural language.
When Dynabench was first released, it included four tasks: natural language inference, question answering, sentiment analysis, and hate speech identification. Since then, the Facebook AI research team has used it to power the multilingual translation challenge at the Workshop on Machine Translation (WMT). These dynamic data collection efforts have produced eight published papers and more than 400,000 raw examples.
Dynabench produces challenging new datasets by combining people and models to test NLP models more rigorously. This approach exposes gaps in current models, allowing the next generation of AI models to be trained in the loop, and it measures how easily people can fool AI systems in a dynamic rather than static setting. Dynatask takes these capabilities a step further.
The Facebook Dynatask platform's potential is considerable thanks to its highly flexible and configurable design, which opens up a whole new universe for task designers. They can easily create annotation interfaces that let annotators interact with the models in the loop, and they can set up their own challenges without writing any code. This enables researchers to collect dynamic adversarial data.
Each task in Dynatask has one or more owners who define the task parameters. Owners choose the evaluation metrics they need from a list of options, including accuracy, robustness, fairness, compute, and memory. According to the Facebook newsroom blog, anyone can upload models to the task's evaluation cloud, which computes scores and other metrics on the specified datasets. Once uploaded and evaluated, models can be placed in the loop for dynamic data collection and human-in-the-loop evaluation. Task owners can gather data through the dynabench.org web interface or through crowdworkers (for example, on Mechanical Turk).
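As a rough illustration of the choices a task owner makes, the sketch below shows a hypothetical task definition and a weighted leaderboard score; the field names and weights are assumptions for illustration only, not Dynatask's real configuration schema.

```python
# Hypothetical sketch of a task owner's choices; the field names are
# illustrative and do not reflect Dynatask's actual configuration schema.
task_config = {
    "name": "Natural Language Inference",
    "owners": ["owner@example.org"],
    # Datasets the evaluation cloud scores submitted models against.
    "evaluation_datasets": ["nli-dev", "nli-test", "nli-adversarial-r1"],
    # Metrics chosen from the platform's list of options, with weights
    # that determine a model's position on the dynamic leaderboard.
    "metrics": {
        "accuracy": 0.4,
        "robustness": 0.2,
        "fairness": 0.2,
        "compute": 0.1,
        "memory": 0.1,
    },
    # How dynamic adversarial data is gathered for this task.
    "data_collection": {
        "interface": "dynabench.org",
        "annotators": "mechanical-turk",
    },
}

def weighted_score(model_scores: dict, metrics: dict) -> float:
    """Combine per-metric scores into a single leaderboard number."""
    return sum(weight * model_scores.get(name, 0.0)
               for name, weight in metrics.items())
```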
Let's say you wanted to start a natural language inference task, but one didn't exist yet. The process would look like this:
Step 1: Log into your Dynabench account and go to your profile page to fill out the “Request new task” form.
Step 2: Once approved, you will have a dedicated task page and a corresponding admin dashboard that you control as the task owner.
Step 3: From the dashboard, select the existing datasets you want to use to evaluate models, as well as the metrics you want to use.
Step 4: Next, upload baseline models or solicit them from the community.
Step 5: To gather a new round of dynamic adversarial data, upload fresh contexts to the system and begin collecting data through the task owner interface, where annotators are instructed to produce examples that fool the model.
Step 6: Once you have collected enough data and established that training on it improves the system, upload the improved models and put them into the data collection loop to create even stronger ones (a rough sketch of this check follows these steps).
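The check in step 6, namely that training on the newly collected round actually helps, might look roughly like the sketch below; the train and evaluate callables are hypothetical stand-ins rather than Dynabench functions.

```python
# Rough sketch of the step 6 check: does adding the new adversarial round
# to the training data actually improve the model? The train() and
# evaluate() callables are hypothetical stand-ins, not Dynabench functions.

def should_promote_new_model(train, evaluate, base_data, new_round, held_out):
    """Train with and without the new round and compare held-out accuracy."""
    baseline_model = train(base_data)
    improved_model = train(base_data + new_round)

    baseline_acc = evaluate(baseline_model, held_out)
    improved_acc = evaluate(improved_model, held_out)

    # Only move the stronger model into the next data-collection loop.
    return improved_acc > baseline_acc, improved_model
```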
The Facebook AI Research team used the same fundamental process to create dynamic datasets such as Adversarial Natural Language Inference. Now that these tools are available to the broader AI community, the team believes anyone can build datasets with people and models in the loop.