Wednesday, May 29, 2024

Meta developed a metric for pruning AI datasets

The newly developed dataset pruning technique could change how model error scales with dataset size, from a power law to exponential decay.

Researchers from Meta and Stanford University have developed a new metric for pruning AI datasets. Model performance currently tends to follow a power-law relationship with dataset size, where a large number of additional data samples is required to improve performance by only a few percentage points. The new metric aims to improve on that scaling.

The pruning techniques in use at present are either inefficient or severely compute-intensive. The new pruning metric requires much less computation and is self-supervised, so it does not depend on labeled data.

Using statistical mechanics, the researchers showed that with proper dataset pruning, model error can fall off exponentially with dataset size instead of following a power law. Under an exponential-decay relationship, far fewer additional samples are needed to reach the same performance.
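To make the difference between the two scaling regimes concrete, here is a minimal sketch that compares how much data each one needs to halve the error. The functional forms, the exponent `nu`, and the scale `n0` are illustrative assumptions, not values reported by the researchers.

```python
import math

def power_law_error(n, a=1.0, nu=0.1):
    """Error under an assumed power law: a * n^(-nu)."""
    return a * n ** (-nu)

def exponential_error(n, a=1.0, n0=1000.0):
    """Error under an assumed exponential decay: a * exp(-n / n0)."""
    return a * math.exp(-n / n0)

def samples_to_halve_power_law(n, nu=0.1):
    """Dataset size at which the power-law error is half its value at n.

    The size must be multiplied by 2^(1/nu).
    """
    return n * 2 ** (1.0 / nu)

def samples_to_halve_exponential(n, n0=1000.0):
    """Dataset size at which the exponential error is half its value at n.

    Only a fixed increment of n0 * ln(2) samples is added.
    """
    return n + n0 * math.log(2)

if __name__ == "__main__":
    start = 1000.0
    print("power law:  ", samples_to_halve_power_law(start))
    print("exponential:", samples_to_halve_exponential(start))
```

With the assumed exponent of 0.1, halving the error under the power law takes about 1,024 times as much data, while under exponential decay it takes a fixed increment of roughly 0.69 times `n0` samples.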

Meta’s researchers started by developing a theoretical model of data pruning and defining a ‘margin’ for each training example, where an “easy” example has a large margin and a “hard” example has a small one.
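The idea of a margin can be sketched as the distance from an example to a decision boundary. The snippet below assumes a linear boundary given by a hypothetical weight vector `w` and bias `b`; the researchers' theoretical model may define the margin differently, so treat this as an illustration only.

```python
import math

def margin(x, w, b=0.0):
    """Distance from point x to the hyperplane w.x + b = 0 (assumed linear boundary)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def rank_by_difficulty(points, w, b=0.0):
    """Return example indices ordered from hardest (smallest margin) to easiest."""
    return sorted(range(len(points)), key=lambda i: margin(points[i], w, b))

if __name__ == "__main__":
    examples = [(3.0, 0.0), (0.5, 1.0), (-2.0, 4.0)]
    boundary = (1.0, 0.0)  # hypothetical boundary: the line x = 0
    print(rank_by_difficulty(examples, boundary))
```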


For the metric itself, they ran k-means clustering in an embedding space. The pruning metric is the distance between a dataset example and its nearest cluster centroid. The researchers observed that the best pruning strategy depends on the initial dataset size. They concluded that as the dataset grows, a larger share of it has to be pruned to achieve exponential-decay scaling.
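The clustering-based metric described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the researchers' implementation: it assumes the embeddings are already available as plain tuples of floats, uses a basic Lloyd's-style k-means with Euclidean distance, and assumes that examples far from their nearest centroid count as "hard". The function names and the `keep_fraction` parameter are hypothetical.

```python
import math
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's algorithm; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def pruning_scores(points, centroids):
    """Pruning metric: distance from each example to its nearest centroid."""
    return [math.sqrt(min(squared_distance(p, c) for c in centroids)) for p in points]

def prune(points, scores, keep_fraction, keep_hard=True):
    """Return the sorted indices of the examples to keep.

    keep_hard=True keeps the examples farthest from their centroid
    (assumed to be the hard ones); keep_hard=False keeps the closest.
    """
    n_keep = max(1, round(keep_fraction * len(points)))
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=keep_hard)
    return sorted(order[:n_keep])

if __name__ == "__main__":
    embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 13.0)]
    centers = kmeans(embeddings, k=2)
    scores = pruning_scores(embeddings, centers)
    print(prune(embeddings, scores, keep_fraction=0.5))
```

The `keep_hard` switch reflects the observation that the best strategy depends on the initial dataset size, so whether hard or easy examples are kept is left as a choice here and not fixed in the sketch.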

This is not the first time that the scaling of model performance has been the focus of a research project. In 2020, OpenAI published research on accuracy trends in NLP models, which also identified dataset size as a factor affecting model performance.

Disha Chopra
Disha Chopra is a content enthusiast! She is an Economics graduate pursuing her PG in the same field along with Data Sciences. Disha enjoys the ever-demanding world of content and the flexibility that comes with it. She can be found listening to music or simply asleep when not working!

