On Tuesday, DeepMind released a preprint, “Open-Ended Learning Leads to Generally Capable Agents,” outlining its initial efforts toward training an agent capable of playing a variety of games without relying on human interaction data. The team built a large training environment called XLand, which procedurally generates multiplayer mini-games within stable, “human-relatable” 3D digital scenarios. This environment enables new learning algorithms that dynamically control how an agent learns and which games it trains on, allowing AI agents to be trained at scale on tasks of variable complexity.
The objective behind creating XLand is to overcome the limitations of training artificial intelligence models (and robots) using reinforcement learning. A reinforcement learning algorithm learns by trial and error: the agent makes a series of decisions based on the feedback it receives from its computational environment, is rewarded for good actions and penalized for bad ones, and over time arrives at an effective strategy by attempting to maximize its cumulative reward.
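The reward-maximization loop described above can be sketched with tabular Q-learning on a toy problem. The five-state chain environment and all hyperparameters here are illustrative assumptions, not DeepMind's setup.

```python
import random

# Minimal tabular Q-learning sketch of the reinforcement learning loop:
# the agent acts, receives a reward signal, and updates its value
# estimates to maximize cumulative reward.
N_STATES = 5          # states 0..4; reaching state 4 yields the reward
ACTIONS = [-1, +1]    # move left or right along the chain
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment: reaching the final state is rewarded, the rest gives nothing."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # Update toward immediate reward plus discounted future value.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt

# The learned greedy policy moves right (toward the reward) from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

As the article notes, an agent trained this way masters only the scenario it was trained in; the Q-table above is useless in even a slightly different environment.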
However, the problem with this form of training is that the models are generally trained on a limited set of scenarios. If the same models are then presented with even a slightly different environment, they may struggle to adapt to it or to produce satisfactory outcomes.
Therefore, rather than training agents on a narrow set of activities, the DeepMind research team created a universe of scenarios that can be generated procedurally. Each AI player’s aim is to maximize its reward, and each game defines its own reward structure. DeepMind also used population-based training (PBT) to avoid training dead ends. Population-based training is a neural-network training approach that lets an experimenter efficiently select the best set of hyperparameters and models for the job.
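The exploit-and-explore idea behind population-based training can be sketched in a few lines. The quadratic objective, population size, and perturbation factors below are illustrative assumptions, not the paper's configuration.

```python
import random

# Toy sketch of population-based training (PBT): a population of "workers",
# each with its own hyperparameter (here, a learning rate), is evaluated in
# parallel; periodically, poor performers copy a strong performer's settings
# ("exploit") and then perturb them ("explore").

def performance(lr):
    """Hypothetical score that peaks at lr = 0.1 and falls off on either side."""
    return -(lr - 0.1) ** 2

random.seed(0)
population = [{"lr": random.uniform(0.0, 1.0)} for _ in range(10)]

for generation in range(20):
    # Evaluate each worker and rank the population.
    for worker in population:
        worker["score"] = performance(worker["lr"])
    population.sort(key=lambda w: w["score"], reverse=True)
    # Exploit: the bottom half copies hyperparameters from the top half...
    for loser, winner in zip(population[5:], population[:5]):
        loser["lr"] = winner["lr"]
        # ...then explore: perturb the copied hyperparameter.
        loser["lr"] *= random.choice([0.8, 1.2])

best = max(population, key=lambda w: performance(w["lr"]))
print(round(best["lr"], 3))
```

Because the top performers are never overwritten, the population steadily drifts toward well-performing hyperparameters without an exhaustive grid search.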
The AI agents in DeepMind’s XLand inhabit a simple body in continuously changing digital surroundings, which each agent views from a first-person perspective. The procedurally generated game tasks include simple goals such as “Stand next to the purple cube” and “Bring the yellow cube onto the white corridor,” as well as compound conditions like “Stand next to the purple cube or in the red hallway.” By performing these tasks, the agents train themselves and accumulate experience.
The agents sense their surroundings by observing RGB images and receive a text description of their goal; direct feedback on success or failure follows at defined time intervals. Many of the generated games also contain other AI agents with similar or opposing aims. An agent can additionally manipulate objects such as spheres, cubes, and ramps using tools that allow it to pick them up or freeze them.
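The per-step interface described above, an RGB view plus a text goal plus a reward signal, might look roughly like the following. All names and shapes here are hypothetical placeholders for illustration, not DeepMind's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of what an XLand-style agent observes each step:
# a first-person RGB image, a text description of its goal, and a
# scalar reward giving feedback on progress toward that goal.

@dataclass
class Observation:
    rgb: List[List[Tuple[int, int, int]]]  # H x W grid of RGB pixel values
    goal_text: str                         # e.g. "Stand next to the purple cube"
    reward: float                          # feedback on success or failure

def dummy_observation(height: int = 4, width: int = 4) -> Observation:
    """Build a placeholder observation with a black image and a sample goal."""
    black = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    return Observation(
        rgb=black,
        goal_text="Stand next to the purple cube",
        reward=0.0,
    )

obs = dummy_observation()
print(obs.goal_text, obs.reward)
```

The key point is that the goal arrives as text rather than a hard-coded reward function, which is what lets one agent be trained across millions of distinct tasks.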
According to DeepMind, after five generations the AI agents exhibited continual improvements in learning and in skills that were previously unseen. Across those five generations, each AI agent completed over 200 billion training steps on 3.4 million distinct tasks and played nearly 700,000 games in 4,000 different XLand environments.
After only 30 minutes of intensive training on a new task, the AI agents demonstrated general behavioral tendencies such as exploration, for example changing the state of the environment until they reached a rewarding condition. DeepMind reported that these agents were aware of the fundamentals of their bodies, the passage of time, and the high-level structure of the games they were playing.
Last month, DeepMind claimed that reinforcement learning was enough to achieve general AI. Now it acknowledges that the above feat would not have been possible using the reinforcement learning method alone, paving the way for zero-shot learning. In its blog, it wrote, “Instead of learning one game at a time, these [systems] would be able to react to completely new conditions and play a whole universe of games and tasks, including ones never seen before.”