At the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Hugging Face’s paper “Datasets: A Community Library for Natural Language Processing” won the Best Demonstration Paper award. It is the company’s second consecutive award at EMNLP: its paper “Transformers: State of the Art Natural Language Processing” won the same award at EMNLP 2020.
EMNLP 2021 presented awards in five categories: best long paper, best short paper, main conference papers, outstanding papers, and best demo paper. Hugging Face won in the best demo paper category.
The award-winning paper describes Hugging Face’s Datasets project, which has more than 300 contributors. It is a community project that gives researchers easy access to hundreds of datasets. It has enabled new use cases in cross-dataset NLP and offers advanced features for tasks such as indexing and streaming large datasets.
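To make the streaming feature concrete, here is a minimal sketch of how a large dataset might be previewed without downloading it in full. It assumes the `datasets` package is installed and uses its `load_dataset` function with `streaming=True`. The helper names `take` and `preview_streamed` and the default dataset name `imdb` are illustrative choices for this sketch and do not come from the paper.

```python
from itertools import islice


def take(examples, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(examples, n))


def preview_streamed(name="imdb", split="train", n=3):
    """Stream a dataset and return its first n examples."""
    # Imported lazily so that `take` works even where the library is not installed.
    from datasets import load_dataset

    # streaming=True yields examples lazily instead of downloading everything first.
    # The default dataset name is only an illustrative assumption.
    streamed = load_dataset(name, split=split, streaming=True)
    return take(streamed, n)
```

Calling `preview_streamed()` requires network access. Only the first few examples are fetched, which is the point of streaming for datasets too large to store locally.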
The project is a lightweight library that provides two main features: one-line data loaders for many public datasets and efficient data pre-processing.
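The two features can be sketched roughly as follows, assuming the `datasets` package is installed. The example uses `load_dataset` as the one-line loader and `Dataset.map` with `batched=True` for pre-processing. The helper names `add_word_count` and `load_and_preprocess`, the `n_words` column, and the choice of the `imdb` dataset (which has a `text` column) are assumptions made for illustration.

```python
def add_word_count(batch):
    """Batched pre-processing step: compute a word count for each text."""
    return {"n_words": [len(text.split()) for text in batch["text"]]}


def load_and_preprocess(name="imdb", split="train"):
    """Load a public dataset in one line, then pre-process it in batches."""
    # Imported lazily so the helper above stays usable without the library.
    from datasets import load_dataset

    # One-line loader: downloads the dataset on first use and caches it locally.
    dataset = load_dataset(name, split=split)

    # map() applies the function to batches of rows and adds the new column.
    return dataset.map(add_word_count, batched=True)
```

Calling `load_and_preprocess()` needs network access on the first run. Later runs can typically reuse the locally cached files.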
In addition, it provides access to more than 15 evaluation metrics and is designed to let users quickly add and share new datasets and metrics. Other features include smart caching and built-in interoperability. Datasets originated as a fork of TensorFlow Datasets and has since been developed further.
For more information about the paper “Datasets: A Community Library for Natural Language Processing,” see the official documentation from Hugging Face.