Facebook AI has announced SEER (SElf-supERvised), a billion-parameter self-supervised computer vision model that achieves strong results without relying on labeled image data. Facebook’s head of AI and its chief AI scientist have been vocal about self-supervised learning as the way forward for the artificial intelligence industry, and the company has been actively working to advance the technique.
Over the years, self-supervised learning has driven progress in natural language processing tasks such as machine translation, natural language inference, and question answering. With the release of Facebook AI’s SEER, self-supervised learning is now making inroads into computer vision.
Although there is no shortage of images in the world, the research community has struggled to obtain labeled image data at scale. Labeling images by hand takes considerable effort and drives up the cost of research, development, and applications that rely heavily on labeled datasets to achieve state-of-the-art results.
Facebook AI built the billion-parameter SEER model with the open-source VISSL library, a PyTorch-based library that lets developers apply self-supervised learning to image datasets. According to the researchers, however, vision models do not scale as easily as large language models, which have grown to billions and even trillions of parameters. In natural language processing, sentences can be broken into words that each carry meaning, but in computer vision it is not straightforward to determine which pixels belong to which concept. Another challenge is that the same object can look very different when it is captured from different angles.
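As a rough illustration of how self-supervised learning obtains a training signal without labels, and of why the same content seen in different ways matters, the minimal sketch below generates several randomly cropped and flipped "views" of one unlabeled image. Many self-supervised methods treat views of the same image as sharing content and train the network to produce features that agree across them. This is plain Python with a toy 8x8 grayscale image stored as nested lists; it does not use or reproduce VISSL's API, and the helper names (`random_crop`, `maybe_flip`, `make_views`) are hypothetical.

```python
import random

# Hypothetical helpers for illustration only; not part of VISSL.
# An "image" here is a grayscale grid: a list of rows of pixel values.


def random_crop(img, size, rng):
    """Cut a size x size window from a random position in the image."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]


def maybe_flip(img, rng):
    """Mirror the image left-to-right with probability 0.5 (always returns a copy)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def make_views(img, n_views, size, seed=0):
    """Produce several augmented views of one unlabeled image.

    No label is attached: the only supervision is the assumption that
    all views come from the same image and should map to similar features.
    """
    rng = random.Random(seed)
    return [maybe_flip(random_crop(img, size, rng), rng) for _ in range(n_views)]


# Toy 8x8 "image" whose pixel values are 0..63.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
views = make_views(image, n_views=2, size=5)
for v in views:
    print(len(v), "x", len(v[0]), v[0])
```

Real pipelines typically add color jitter, blur, and crops at several resolutions; random crops and flips are kept here only to keep the example short.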
Learning from unlabeled images at this scale requires large convolutional networks, along with algorithms that can learn from images without any metadata. Facebook AI’s researchers used SwAV, an algorithm developed in-house, to cluster images based on their similarities. To ensure the model architecture could handle the workload, the researchers used RegNets, a family of convolutional network architectures that can scale to billions and potentially trillions of parameters.
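SwAV's core idea can be sketched in a few lines: features from two augmented views of the same image are scored against a set of "prototype" vectors, the Sinkhorn-Knopp algorithm turns those scores into soft cluster assignments ("codes") spread evenly across the prototypes, and each view is trained to predict the other view's code. The sketch below is a minimal pure-Python illustration of that swapped-prediction loss, assuming tiny random feature vectors in place of real network outputs. It is not Facebook AI's implementation; the function names and the parameter values (`eps=0.05`, three Sinkhorn iterations, temperature 0.1) are illustrative assumptions, and no gradients or training loop are included.

```python
import math
import random

# Illustrative sketch of SwAV's swapped-prediction loss; not Facebook AI's code.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Map a batch x prototypes score matrix to soft codes (rows sum to 1)
    while pushing the batch to spread evenly over the prototypes."""
    n_batch, n_protos = len(scores), len(scores[0])
    # q[k][b]: affinity of sample b to prototype k (prototypes x batch layout)
    q = [[math.exp(scores[b][k] / eps) for b in range(n_batch)] for k in range(n_protos)]
    total = sum(map(sum, q))
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        for k in range(n_protos):  # each prototype row sums to 1/K
            s = sum(q[k])
            q[k] = [v / (s * n_protos) for v in q[k]]
        for b in range(n_batch):  # each sample column sums to 1/B
            s = sum(q[k][b] for k in range(n_protos))
            for k in range(n_protos):
                q[k][b] /= s * n_batch
    # Transpose back to batch x prototypes and make each row a distribution.
    codes = []
    for b in range(n_batch):
        col = [q[k][b] for k in range(n_protos)]
        s = sum(col)
        codes.append([v / s for v in col])
    return codes


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def swapped_loss(z1, z2, prototypes, temperature=0.1):
    """Cross-entropy where view 1 predicts view 2's code and vice versa."""
    scores1 = [[dot(z, c) for c in prototypes] for z in z1]
    scores2 = [[dot(z, c) for c in prototypes] for z in z2]
    q1, q2 = sinkhorn(scores1), sinkhorn(scores2)
    loss = 0.0
    for b in range(len(z1)):
        logp1 = log_softmax([s / temperature for s in scores1[b]])
        logp2 = log_softmax([s / temperature for s in scores2[b]])
        loss -= dot(q1[b], logp2) + dot(q2[b], logp1)
    return loss / (2 * len(z1))


rng = random.Random(0)
dim, n_protos, batch = 8, 4, 6
prototypes = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n_protos)]
view1 = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(batch)]
# A slightly perturbed copy stands in for a second augmented view of each image.
view2 = [normalize([x + 0.1 * rng.gauss(0, 1) for x in v]) for v in view1]
loss = swapped_loss(view1, view2, prototypes)
print(f"swapped-prediction loss: {loss:.4f}")
```

The even-spread constraint from Sinkhorn-Knopp is what keeps this kind of clustering from collapsing every image into a single cluster, which would otherwise make the prediction task trivial.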