Machine learning (ML) has gained considerable traction in recent years owing to the disruption and development it brings in enhancing existing technologies. Every month, hundreds of ML papers from various organizations and universities are uploaded to the internet to share the latest breakthroughs in this domain. As the year ends, we bring you the top 22 ML research papers of 2022 that made a significant impact on the industry. The list is not a ranking; the papers were selected on the basis of the recognition and awards they received at international machine learning conferences.
- Bootstrapped Meta-Learning
Meta-learning is a promising field that investigates ways to enable machine learners or RL agents (including their hyperparameters) to learn how to learn more quickly and robustly, and it is a crucial area of study for enhancing the efficiency of AI agents.
This 2022 ML paper presents an algorithm that lets the meta-learner teach itself in order to overcome the meta-optimization challenge and myopic meta-objectives. The paper focuses on gradient-based meta-learning and establishes conditions under which performance is guaranteed to improve. It also examines the potential benefits of bootstrapping. The authors highlight several interesting theoretical aspects of the algorithm, and the empirical results achieve a new state of the art (SOTA) on the Atari ALE benchmark as well as increased efficiency in multitask learning.
- Competition-level code generation with AlphaCode
One of the exciting uses for deep learning and large language models is programming. The rising need for coders has sparked a race to build tools that can increase developer productivity and give non-developers a way to create software. However, current models still perform poorly on more challenging, unseen problems that require more than translating instructions into code.
This popular 2022 ML paper introduces AlphaCode, a code generation system that achieved an average ranking in the top 54.3% in simulated evaluations of programming competitions on the Codeforces platform. The paper describes the architecture, training, and evaluation of the deep-learning model.
- Restoring and attributing ancient texts using deep neural networks
Epigraphic evidence from ancient Greece (inscriptions on durable materials such as stone and pottery) is often already damaged when it is discovered, rendering the inscribed text illegible. Machine learning can help restore damaged inscriptions and identify their chronological and geographical origins, helping us better understand our past.
This ML paper presents Ithaca, a deep neural network built by DeepMind for the textual restoration and geographical and chronological attribution of ancient Greek inscriptions. Ithaca was trained on a database of just under 80,000 inscriptions from the Packard Humanities Institute. On its own, it achieved 62% accuracy, compared with 25% on average for historians working alone. When historians used Ithaca, their accuracy rose to 72%.
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Tuning hyperparameters for large neural networks is expensive, since each candidate setting has to be evaluated by training the network. This groundbreaking ML paper of 2022 proposes a zero-shot hyperparameter tuning paradigm for tuning massive neural networks more effectively. The research, co-authored by Microsoft Research and OpenAI, describes a method called µTransfer that leverages µP to transfer hyperparameters zero-shot from small models, producing near-optimal hyperparameters on large models without tuning them directly.
This method has been found to reduce the amount of trial and error necessary in the costly process of training large neural networks. By drastically lowering the need to guess which training hyperparameters to use, this approach speeds up research on massive neural networks like GPT-3 and perhaps its successors.
- PaLM: Scaling Language Modeling with Pathways
Large neural networks trained for language generation and understanding have demonstrated outstanding results on various tasks in recent years. This trending 2022 ML paper introduced the Pathways Language Model (PaLM), a 540-billion-parameter dense, decoder-only autoregressive transformer trained on 780 billion tokens of high-quality text.
PaLM is based on a standard decoder-only transformer architecture, with modifications such as SwiGLU activation, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, no bias terms, and a changed vocabulary. The paper describes Google's latest flagship model surpassing several human baselines while achieving state-of-the-art results on numerous zero-, one-, and few-shot NLP tasks.
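To make one of these modifications concrete, here is a minimal pure-Python sketch of a SwiGLU activation, Swish(xW) multiplied element-wise by xV, with no bias terms. The helper names (`swish`, `matvec`, `swiglu`) and the tiny weight matrices are illustrative assumptions, not code from the paper.

```python
import math

def swish(z):
    # Swish (SiLU) activation: z * sigmoid(z).
    return z / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    # Bias-free linear projection: one output value per weight row.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def swiglu(x, w_gate, w_value):
    # SwiGLU: Swish(x W) multiplied element-wise by (x V).
    # w_gate and w_value are hypothetical weights chosen for illustration.
    gate = [swish(z) for z in matvec(w_gate, x)]
    value = matvec(w_value, x)
    return [g * v for g, v in zip(gate, value)]

if __name__ == "__main__":
    print(swiglu([1.0, 2.0], [[0.5, -0.5], [1.0, 0.0]], [[1.0, 1.0], [0.0, 2.0]]))
```

In a full transformer block this would replace the usual single activation in the feed-forward layer, followed by an output projection that the sketch leaves out.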
- Robust Speech Recognition via Large-Scale Weak Supervision
Machine learning developers have found it challenging to build speech-processing systems that learn from the vast volume of audio transcripts available on the internet. This year, OpenAI released Whisper, a new state-of-the-art (SOTA) speech-to-text model that can transcribe audio and translate it into English from several languages. It was trained on 680,000 hours of audio data gathered from the internet. According to OpenAI, the model is robust to accents, background noise, and technical terminology. It supports transcription in 99 different languages as well as translation from those languages into English.
The OpenAI paper notes that the authors ensured about one-third of the audio data is non-English. Maintaining this diverse dataset helped the team outperform other supervised state-of-the-art models.
- OPT: Open Pre-trained Transformer Language Models
Large language models have demonstrated extraordinary performance on numerous tasks (e.g., zero- and few-shot learning). However, these models are difficult to replicate without considerable funding because of their high computing costs. Although the public can occasionally interact with these models through paid APIs, complete research access is still limited to a select group of well-funded labs. This limited access has hindered researchers' ability to understand how and why these language models work, which has stalled progress on initiatives to improve their robustness and reduce ethical problems such as bias and toxicity.
This popular 2022 ML paper introduces Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125 million to 175 billion parameters, which the authors aim to share freely and responsibly with interested researchers. The largest model, OPT-175B (not included in the code repository but accessible upon request), is shown to perform comparably to GPT-3, which also has 175 billion parameters, while requiring only about 15% of GPT-3's carbon footprint to develop and train.
- A Path Towards Autonomous Machine Intelligence
Yann LeCun is a prominent and respected researcher in the field of artificial intelligence and machine learning. In June, his much-anticipated paper “A Path Towards Autonomous Machine Intelligence” was published on OpenReview. In it, LeCun offers a number of approaches and architectures that might be combined and used to create self-supervised autonomous machines.
He presents a modular architecture for autonomous machine intelligence that combines various models to operate as distinct elements of a machine's brain, mirroring the animal brain. Because all the models are differentiable, they can be interconnected to power certain brain-like activities, such as identification and environmental response. The architecture incorporates ideas like a configurable predictive world model, behavior driven by intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.
- LaMDA: Language Models for Dialog Applications
Despite tremendous advances in text generation, many of the chatbots available are still rather irritating and unhelpful. This 2022 ML paper from Google describes LaMDA (short for “Language Models for Dialog Applications”), the system that caused an uproar this summer when a former Google engineer, Blake Lemoine, alleged that it was sentient. LaMDA is a family of large language models for dialog applications built on Google's Transformer architecture, which is known for its efficiency and speed in language tasks such as translation. Its most intriguing features are that it can be fine-tuned on human-annotated data and that it can consult external sources.
The model, which has 137 billion parameters, was pre-trained on 1.56 trillion words of publicly accessible conversation data and online documents. It is also fine-tuned on three metrics: quality, safety, and groundedness.
- Privacy for Free: How does Dataset Condensation Help Privacy?
One of the primary proposals in the award-winning ML paper is to use dataset condensation methods to retain data efficiency during model training while also providing membership privacy. The authors argue that dataset condensation, which was initially created to increase training effectiveness, is a better alternative to data generators for producing private data since it offers privacy for free.
Though existing data generators are used to produce differentially private data for model training to minimize unintended data leakage, they result in high training costs or subpar generalization performance for the sake of data privacy. This study was published by Sony AI and received the Outstanding Paper Award at ICML 2022.
- TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data
The use of a model that converts time series into anomaly scores at each time step is essential in any system for detecting time series anomalies. Recognizing and diagnosing anomalies in multivariate time series data is critical for modern industrial applications. Unfortunately, developing a system capable of promptly and reliably identifying abnormal observations is challenging. This is attributed to a shortage of anomaly labels, excessive data volatility, and the expectations of modern applications for ultra-low inference times.
In this study, the authors present TranAD, a deep transformer network-based anomaly detection and diagnosis model that leverages attention-based sequence encoders to perform inference quickly while staying aware of the broader temporal patterns in the data. TranAD employs focus score-based self-conditioning to enable robust multi-modal feature extraction, and adversarial training to achieve stability. The paper reports that extensive empirical experiments on six publicly available datasets show TranAD outperforming state-of-the-art baseline methods in detection and diagnosis, with data- and time-efficient training.
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
In the last few years, generative models called “diffusion models” have become increasingly popular. This year saw these models capture the excitement of AI enthusiasts around the world.
Going beyond earlier text-to-image technology, this outstanding 2022 ML paper introduced Imagen, the viral text-to-image diffusion model from Google. The model achieves a new state-of-the-art FID score of 7.27 on the COCO dataset by combining the deep language understanding of transformer-based large language models with the photorealistic image-generating capabilities of diffusion models. A frozen text-only language model provides the text representation, and a diffusion model followed by two super-resolution upsampling stages, up to 1024×1024, produces the images. It employs several training techniques, including classifier-free guidance, which trains the model on both conditional and unconditional generation. Another important feature of Imagen is dynamic thresholding, which keeps pixels from saturating in specific areas of the picture during the diffusion process, a behavior that reduces image quality, particularly when the weight placed on text conditioning is large.
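The two sampling-time ideas mentioned above can be sketched in a few lines of pure Python. This is a simplified illustration on flat lists of numbers rather than image tensors; the function names, the default percentile, and the simple percentile lookup are assumptions made for the example, not Imagen's actual implementation.

```python
def classifier_free_guidance(eps_cond, eps_uncond, weight):
    # Move the unconditional noise prediction toward the text-conditional one.
    # weight = 1.0 returns the conditional prediction; larger values push further.
    return [u + weight * (c - u) for c, u in zip(eps_cond, eps_uncond)]

def dynamic_threshold(x0_pred, percentile=0.995):
    # Choose s as a high percentile of the absolute predicted pixel values.
    magnitudes = sorted(abs(v) for v in x0_pred)
    index = min(len(magnitudes) - 1, int(percentile * len(magnitudes)))
    s = max(1.0, magnitudes[index])
    # Clip to [-s, s] and rescale by s so every value lands in [-1, 1].
    return [max(-s, min(s, v)) / s for v in x0_pred]

if __name__ == "__main__":
    guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], weight=3.0)
    print(guided)
    print(dynamic_threshold([0.5, -2.0, 4.0], percentile=0.5))
```

With a large guidance weight the guided prediction can drift far outside the valid pixel range, which is the situation the thresholding step is meant to handle.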
- No Language Left Behind: Scaling Human-Centered Machine Translation
This ML paper introduced one of the most popular Meta projects of 2022: NLLB-200. It describes how Meta built and open-sourced this state-of-the-art AI model at FAIR, which is capable of translating between 200 languages. It covers every aspect of the technology: language analysis, ethical considerations, impact analysis, and benchmarking.
No matter what language a person speaks, accessibility via language ensures that everyone can benefit from the growth of technology. Meta claims that several languages that NLLB-200 translates, such as Kamba and Lao, are not currently supported by any translation systems in use. The tech behemoth also created a dataset called “FLORES-200” to evaluate NLLB-200 and show that it delivers accurate translations. According to Meta, NLLB-200 offers an average of 44% higher-quality translations than its prior model.
- A Generalist Agent
AI pundits believe that multimodality will play a huge role in the future of Artificial General Intelligence (AGI). One of the most talked-about ML papers of 2022, by DeepMind, introduces Gato, a generalist agent. This agent is a multi-modal, multi-task, multi-embodiment network, which means that the same neural network (i.e., a single architecture with a single set of weights) can perform all tasks while integrating inherently diverse types of inputs and outputs.
DeepMind claims that the general agent can be improved with new data to perform even better on a wider range of tasks. They argue that having a general-purpose agent reduces the need for hand-crafting policy models for each domain, increases the volume and diversity of training data, and enables continuous advances in data, compute, and model scale. A general-purpose agent can also be viewed as a first step toward artificial general intelligence.
Gato demonstrates the versatility of transformer-based machine learning architectures by exhibiting their use in a variety of applications. Unlike previous neural network systems, each tailored to a single task such as playing games, stacking blocks with a real robot arm, reading words, or captioning images, Gato is versatile enough to perform all of these tasks on its own, using only a single set of weights and a relatively simple architecture.
- The Forward-Forward Algorithm: Some Preliminary Investigations
AI pioneer Geoffrey Hinton is known for his work on backpropagation and deep convolutional neural networks. In his latest paper, presented at NeurIPS 2022, Hinton proposed the “forward-forward algorithm,” a new learning algorithm for artificial neural networks based on our understanding of neural activations in the brain. The approach draws inspiration from Boltzmann machines (Hinton and Sejnowski, 1986) and noise contrastive estimation (Gutmann and Hyvärinen, 2010). According to Hinton, forward-forward, which is still in its experimental stages, can replace the forward and backward passes of backpropagation with two forward passes, one with positive data and the other with negative data that the network itself could generate. Further, the algorithm could map more efficiently onto hardware and provide a better explanation for the brain's cortical learning process.
Without employing complicated regularizers, the algorithm obtained a 1.4 percent test error rate on the MNIST dataset in an empirical study, suggesting that it can be roughly as effective as backpropagation on this task.
The paper also suggests a novel “mortal computing” model that the forward-forward algorithm could enable and that may help explain the brain's energy-efficient processes.
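As a rough illustration of the two-forward-pass idea, the sketch below trains a single bias-free ReLU layer with a purely local rule. It assumes the "goodness" of a layer is the sum of its squared activations and that the layer should push goodness above a threshold for positive data and below it for negative data. The toy data, fixed initial weights, threshold, and learning rate are all made up for the example, and the negative sample is hand-picked rather than generated by the network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, x):
    # ReLU activations of a single bias-free layer.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def goodness(weights, x):
    # Goodness of the layer for input x: sum of squared activations.
    return sum(y * y for y in layer_forward(weights, x))

def ff_update(weights, x, is_positive, theta=1.0, lr=0.1):
    # One local update using only a forward pass through this layer.
    activations = layer_forward(weights, x)
    p = sigmoid(sum(y * y for y in activations) - theta)  # P(input is positive)
    label = 1.0 if is_positive else 0.0
    for j, y in enumerate(activations):
        for i, xi in enumerate(x):
            # Logistic-loss gradient: d(loss)/d(w_ji) = (p - label) * 2 * y_j * x_i.
            weights[j][i] -= lr * (p - label) * 2.0 * y * xi
    return weights

def train_toy_layer(steps=200):
    weights = [[0.5, 0.5], [0.5, -0.5]]          # fixed toy initialisation
    positive, negative = [1.0, 0.0], [0.0, 1.0]  # hypothetical data
    for _ in range(steps):
        ff_update(weights, positive, is_positive=True)   # positive pass
        ff_update(weights, negative, is_positive=False)  # negative pass
    return weights, positive, negative

if __name__ == "__main__":
    w, pos, neg = train_toy_layer()
    print(goodness(w, pos), goodness(w, neg))
```

No gradient flows between layers here; each layer would apply the same local rule to its own input, which is what removes the need for a backward pass.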
- Focal Modulation Networks
In humans, the ciliary muscles alter the shape of the eye's lens, and hence its radius of curvature, to focus on near or distant objects. Changing the shape of the lens changes its focal length. Mimicking this behavior of focal modulation in computer vision systems can be tricky.
This machine learning paper introduces FocalNet, an iterative information extraction technique that employs the premise of foveal attention to post-process Deep Neural Network (DNN) outputs by performing variable input/feature space sampling. Its attention-free design outperforms SOTA self-attention (SA) techniques on a wide range of visual benchmarks. According to the paper, focal modulation consists of three parts:
a. hierarchical contextualization, implemented using a stack of depth-wise convolutional layers, to encode visual contexts from close-up to a great distance;
b. gated aggregation to selectively gather contexts for each query token based on its content; and
c. element-wise modulation or affine modification to inject the gathered context into the query.
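The three parts listed above can be sketched on a one-dimensional sequence of scalar tokens. This toy version is only meant to show how the steps fit together: a moving average stands in for the learned depth-wise convolutions, a sigmoid of the token value stands in for the learned gates, and the kernel sizes and gate weights are arbitrary assumptions. Linear projections and nonlinearities that a real layer would use are omitted.

```python
import math

def depthwise_conv1d(values, kernel_size):
    # Stand-in for a learned depth-wise convolution: zero-padded moving average.
    half = kernel_size // 2
    out = []
    for i in range(len(values)):
        window = [values[j] for j in range(i - half, i + half + 1)
                  if 0 <= j < len(values)]
        out.append(sum(window) / kernel_size)
    return out

def focal_modulation(tokens, kernel_sizes=(3, 5), gate_weights=(1.0, 0.5)):
    # (a) Hierarchical contextualization: stacking convolutions widens the
    #     context seen at each successive level.
    contexts, current = [], tokens
    for k in kernel_sizes:
        current = depthwise_conv1d(current, k)
        contexts.append(current)
    outputs = []
    for i, query in enumerate(tokens):
        # (b) Gated aggregation: one gate per level, computed from the query token.
        gates = [1.0 / (1.0 + math.exp(-w * query)) for w in gate_weights]
        modulator = sum(g * ctx[i] for g, ctx in zip(gates, contexts))
        # (c) Element-wise modulation: inject the gathered context into the query.
        outputs.append(query * modulator)
    return outputs

if __name__ == "__main__":
    print(focal_modulation([0.2, 1.0, -0.5, 0.8, 0.1]))
```

Note that no token-to-token attention scores are computed: each query only interacts with its own aggregated context, which is what makes the design attention-free.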
- Learning inverse folding from millions of predicted structures
The field of structural biology is being fundamentally changed by cutting-edge technologies in machine learning, protein structure prediction, and innovative ultrafast structural aligners. Time and money are no longer obstacles to obtaining precise protein models and extensively annotating their functions. However, determining a protein sequence from its backbone atom coordinates has remained a challenge for scientists. To date, machine learning approaches to this challenge have been constrained by the number of experimentally determined protein structures available.
In this ICML Outstanding Paper (Runner Up), the authors tackle this problem by increasing the training data by almost three orders of magnitude, using AlphaFold2 to predict structures for 12 million protein sequences. With this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones and 72% recovery for buried residues, an improvement of almost 10 percentage points over previous techniques. The model also generalises to a range of more difficult tasks, including the design of protein complexes, partially masked structures, binding interfaces, and multiple states.
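For readers unfamiliar with the metric, native sequence recovery is the fraction of positions at which a designed sequence carries the same amino acid as the native one. A minimal sketch, with made-up five-residue sequences and a hypothetical helper name:

```python
def sequence_recovery(native, designed):
    # Fraction of positions where the designed residue matches the native one.
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(native, designed) if a == b)
    return matches / len(native)

if __name__ == "__main__":
    # Toy example: four of five residues match.
    print(sequence_recovery("MKTAY", "MKSAY"))
```

A recovery figure for buried residues would apply the same calculation restricted to positions in the protein core.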
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Within the AI research community, using video games as a training medium for AI has gained popularity. Autonomous agents have had great success in Atari games, StarCraft, Dota, and Go. Despite these successes, the agents do not generalize beyond a narrow range of activities, in contrast to humans, who continually learn from open-ended tasks.
This thought-provoking 2022 ML paper proposes MineDojo, a new framework for embodied agent research built on the well-known game Minecraft. In addition to an internet-scale knowledge base of Minecraft videos, tutorials, wiki pages, and forum discussions, MineDojo provides a simulation suite with thousands of open-ended tasks. Using MineDojo data, the authors propose an agent learning methodology that employs large pre-trained video-language models as a learned reward function. Without requiring a manually designed dense shaping reward, the MineDojo agent can perform a wide range of open-ended tasks specified in free-form language.
- Is Out-of-Distribution Detection Learnable?
Supervised machine learning models are frequently trained under the closed-world assumption, which holds that the distribution of the test data will resemble that of the training data. This assumption often fails in real-world use, causing a considerable decline in performance. While this performance loss is acceptable for applications like product recommendations, out-of-distribution (OOD) detection is crucial for preventing ML systems from making inaccurate predictions in settings where the data distribution drifts over time, such as self-driving cars.
In this paper, the authors explore the probably approximately correct (PAC) learning theory of OOD detection, which researchers have posed as an open problem, to study when OOD detection is learnable. They first identify a necessary condition for the learnability of OOD detection. They then prove several impossibility theorems for the learnability of OOD detection in a handful of distinct scenarios.
- Gradient Descent: The Ultimate Optimizer
Gradient descent is a popular optimization approach for training machine learning models and neural networks. The aim of training is to optimize the model's parameters, but selecting the ideal step size for an optimizer is difficult since it entails lengthy and error-prone manual work. Many strategies exist for automated hyperparameter optimization; however, they often introduce additional hyperparameters to govern the hyperparameter optimization process. In this study, MIT CSAIL and Meta researchers offer an approach that allows gradient descent optimizers like SGD and Adam to tune their hyperparameters automatically.
They propose learning the hyperparameters by gradient descent itself, then learning the hyper-hyperparameters by gradient descent as well, and so on. The paper describes an efficient way to let gradient descent optimizers adjust their own hyperparameters, which can be stacked recursively to many levels. As these towers of gradient-based optimizers grow, they become substantially less sensitive to the choice of top-level hyperparameters, reducing the burden on the user to search for optimal values.
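A single level of this idea can be sketched by hand on a toy problem. The example below runs SGD on the one-dimensional loss f(w) = (w - 3)^2 and updates the step size with its own gradient step, using the hand-derived fact that the derivative of the current loss with respect to the previous step size is minus the product of the current and previous gradients. The paper obtains such derivatives through automatic differentiation and stacks multiple levels; the loss, the starting values, and the hyper-step-size `kappa` here are illustrative assumptions.

```python
def loss_grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def sgd_with_learned_step_size(w=0.0, alpha=0.01, kappa=0.001, steps=200):
    prev_grad = 0.0
    for _ in range(steps):
        grad = loss_grad(w)
        # The previous update was w = w_old - alpha * prev_grad, so
        # d f(w) / d alpha = -grad * prev_grad. Descend on alpha first.
        alpha = alpha + kappa * grad * prev_grad
        # Ordinary SGD step on the parameter, using the updated step size.
        w = w - alpha * grad
        prev_grad = grad
    return w, alpha

if __name__ == "__main__":
    w, alpha = sgd_with_learned_step_size()
    print(w, alpha)
```

Starting from a deliberately small step size, the optimizer raises it on its own while successive gradients agree in sign. The remaining hyperparameter, `kappa`, is the kind of top-level value that could in turn be learned by a further level.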
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Embodied AI is a developing field of study that has been influenced by recent advancements in artificial intelligence, machine learning, and computer vision. It studies how artificial agents can learn by interacting with their environment. The paper proposes ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR allows researchers to sample arbitrarily large datasets of diverse, interactive, customisable, and performant virtual environments in order to train and assess embodied agents across navigation, interaction, and manipulation tasks.
According to the authors, models trained on ProcTHOR using only RGB images and without any explicit mapping or human task supervision achieve state-of-the-art results on 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the ongoing Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. The paper received the Outstanding Paper award at NeurIPS 2022.
- A Commonsense Knowledge Enhanced Network with Retrospective Loss for Emotion Recognition in Spoken Dialog
Emotion Recognition in Spoken Dialog (ERSD) has recently attracted a lot of attention due to the growth of open conversational data. Integrating emotional states into intelligent spoken human-computer interaction has also helped improve speech recognition algorithms. Additionally, it has been demonstrated that recognizing emotions makes it possible to track how a human-computer interaction develops, allowing conversational strategies to change dynamically and influencing the outcome (e.g., customer feedback). However, the limited size of current ERSD datasets restricts model development.
This ML paper proposes a Commonsense Knowledge Enhanced Network (CKE-Net) with a retrospective loss to carry out dialog modeling, external knowledge integration, and historical state retrospect hierarchically.