Google has recently announced a new approach to Reinforcement Learning called Reincarnating Reinforcement Learning (RRL). This article provides an overview of the RRL approach.
Reinforcement Learning is a machine learning technique that trains intelligent agents through experience so that they learn to solve decision-making problems such as playing video games, designing hardware chips, and flying stratospheric balloons.
Due to the generality of Reinforcement Learning, researchers have focused on developing intelligent agents that can learn efficiently tabula rasa, that is, from scratch without prior knowledge. The term tabula rasa describes a blank slate, an opportunity for a fresh start. For example, when a student’s family moves to a different city, the student begins the year at a new school as a blank slate, with no historical record following them.
In practice, tabula rasa RL systems are the exception rather than the norm for solving large-scale RL problems. Large-scale systems like OpenAI Five, which achieved human-level performance on Dota 2, underwent multiple algorithmic changes during their development cycle. Incorporating each such change by retraining the system from scratch can be very challenging and expensive.
The inefficiency of training agents from scratch in tabula rasa RL research also puts computationally demanding problems out of reach for many researchers. For example, the standard benchmark of training a deep RL agent on 50+ Atari 2600 games in the Arcade Learning Environment (ALE) for 200 million frames each requires over 1,000 GPU-days. As deep RL algorithms move toward more complex problems, this computational barrier to entering RL research will become even higher.
To address these inefficiencies of tabula rasa training, Google has introduced an approach called ‘Reincarnating Reinforcement Learning’ (RRL) and will present the complete research on it at the NeurIPS 2022 conference. In this research, Google proposes an alternative workflow for RL research in which prior work, such as learned models, logged data, and policies, can be reused or transferred between design iterations of an RL agent or from one agent to another. Although RL makes use of prior computation in some cases, most RL agents are still trained from scratch, and there has been no concerted effort to make reuse of prior computational work a standard part of the RL training workflow.
Reincarnating Reinforcement Learning (RRL) is a more computationally efficient workflow based on reusing prior computational work when training new RL agents or improving existing ones, even in the same environment. RRL can broaden access to RL research by allowing researchers to tackle complex RL problems without requiring excessive computational resources. Moreover, RRL enables a benchmarking paradigm in which researchers continually improve and update existing trained RL agents, particularly on problems with real-world impact such as balloon navigation and chip design. Real-world uses of RL are, in fact, likely to arise in scenarios where prior computation is already available.
RRL is an alternative research workflow that does not train RL agents from scratch but instead builds on existing ones. Suppose a researcher has trained an agent named A1 for some time but now wishes to experiment with a better algorithm. The tabula rasa workflow requires retraining another agent from scratch, discarding A1’s training entirely. The RRL workflow instead offers the option of transferring A1’s knowledge to another agent and training that agent, or simply fine-tuning A1 itself.
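To make the contrast concrete, here is a minimal toy sketch of the fine-tuning flavor of this workflow using tabular Q-learning on a tiny chain environment. This is my own illustration, not Google's implementation: the environment, the Q-learning hyperparameters, and the "teacher"/"reincarnated" naming are all assumptions made for the example. The idea is simply that the reincarnated agent starts from the prior agent's learned values instead of zeros.

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0=left / 1=right, reward 1 on reaching state 4.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def q_learning(q, episodes, rng, alpha=0.5, gamma=0.9, eps=0.1):
    """Train (or fine-tune) a Q-table in place with epsilon-greedy Q-learning."""
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(q[s]))
            s2, r, done = step(s, a)
            q[s, a] += alpha * (r + gamma * np.max(q[s2]) * (not done) - q[s, a])
            s = s2
    return q

rng = np.random.default_rng(0)

# "A1": a previously trained agent whose computation we want to reuse.
teacher = q_learning(np.zeros((N_STATES, N_ACTIONS)), episodes=200, rng=rng)

# Tabula rasa workflow: a new agent starts from zeros.
scratch = q_learning(np.zeros((N_STATES, N_ACTIONS)), episodes=5, rng=rng)

# Reincarnation workflow: the new agent starts from the teacher's Q-table
# and only needs a brief fine-tuning phase.
reincarnated = q_learning(teacher.copy(), episodes=5, rng=rng)

print(np.argmax(reincarnated[:GOAL], axis=1))  # greedy policy after fine-tuning
```

In a deep RL setting the reused artifact would be network weights, a logged dataset, or a teacher policy to distill from rather than a Q-table, but the workflow is the same: initialize from prior computation, then continue training.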
Reinforcement Learning traditionally assumes that agents learn by interacting with an online environment and gathering their own experience. Such algorithms are challenging to deploy in real-life applications like robotics or autonomous driving, where agents would need to be trained in every situation they might encounter. Google expects RRL to be most helpful in exactly these settings, where training from scratch is costly and time-consuming and prior computation can be put to use instead of retraining RL agents from the beginning.