MIT researchers have recently shown that, in some cases, machine learning models trained on synthetic data can outperform models trained on real data. Training on synthetic data could also eliminate some of the privacy, copyright, and ethical concerns that come with using real data.
Researchers teach machines to recognize human actions using massive video datasets that show people performing those actions. However, building such datasets is expensive, and the footage often contains sensitive personal information, such as people’s faces and license plate numbers. To avoid these problems, researchers have started turning to synthetic datasets.
Synthetic datasets are generated artificially rather than captured from real-world events. They are created by computers using 3D models of scenes, objects, and humans. MIT researchers built a synthetic dataset of 150,000 video clips showing a wide range of human actions and used it to train machine learning models. They then tested how well these models recognized human actions in six datasets of real-world videos.
The researchers found that, for videos with fewer background objects, the synthetically trained models performed even better than models trained on real data. This finding could help researchers use synthetic datasets to build more accurate machine learning models.
Rogerio Feris, a principal scientist and manager at the MIT-IBM Watson AI Lab and a co-author of the research, said that the ultimate goal of the work is to replace pretraining on real data with pretraining on synthetic data. Creating an action in synthetic data has a cost, he noted, but once that is done, researchers can generate an unlimited number of images or videos by changing the pose, the lighting, and other factors.