At Meta AI’s Inside the Lab event on 23 Feb 2022, Yann LeCun, Meta AI’s chief scientist, proposed that AI’s ability to approach human-like capability depends on learning internal models of how the world works. He noted that a teenager can learn to drive in about 20 hours, whereas an autonomous driving system requires billions of labeled training samples and millions of reinforcement learning trials, and still falls short of a human’s ability to drive. During the event, he proposed a six-module architecture intended to give machines common sense and, ultimately, autonomous intelligence.
LeCun believes that the next AI revolution will come when AI systems no longer rely on supervised learning. He hypothesizes that humans and nonhuman animals learn about the world through observation and a small number of interactions, and that the background knowledge they accumulate this way is what is often called common sense. AI systems, he said, will likewise have to learn from the world itself with minimal help from humans.
“Human and nonhuman animals seem able to learn enormous amounts of background knowledge about how the world works through observation and through an incomprehensibly small amount of interactions in a task-independent, unsupervised way,” LeCun says. “It can be hypothesized that this accumulated knowledge may constitute the basis for what is often called common sense.”
LeCun proposed an architecture of six separate, differentiable modules: each can readily compute gradient estimates of an objective function with respect to its own input and propagate that gradient information to upstream modules. This common-sense architecture is intended to help AI systems achieve autonomous intelligence. The six modules are the configurator, perception, world model, short-term memory, actor, and cost.
Image Source: Facebook AI
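To make the idea of differentiable modules concrete, here is a minimal sketch of how a gradient of an objective can be passed backward through a chain of modules. The classes `Scale` and `SquaredError` and the function `objective_and_input_gradient` are hypothetical names invented for this illustration; they are not part of LeCun's proposal or of any Meta AI code.

```python
# Toy illustration only: these module classes are hypothetical, not Meta's code.

class Scale:
    """A differentiable module computing out = w * x."""

    def __init__(self, w):
        self.w = w

    def forward(self, x):
        return self.w * x

    def backward(self, grad_out):
        # Chain rule: d(objective)/dx = d(objective)/d(out) * w
        return grad_out * self.w


class SquaredError:
    """Objective at the end of the chain: (x - target) ** 2."""

    def __init__(self, target):
        self.target = target

    def forward(self, x):
        self.x = x  # remember the input for the backward pass
        return (x - self.target) ** 2

    def backward(self, grad_out):
        return grad_out * 2.0 * (self.x - self.target)


def objective_and_input_gradient(modules, x):
    """Run the chain forward, then send the gradient back upstream."""
    for module in modules:
        x = module.forward(x)
    grad = 1.0  # d(objective)/d(objective)
    for module in reversed(modules):
        grad = module.backward(grad)
    return x, grad


value, grad = objective_and_input_gradient(
    [Scale(2.0), Scale(3.0), SquaredError(6.0)], 0.5
)
print(value, grad)
```

Each module only needs to know its own local derivative; the gradient with respect to the original input falls out of multiplying those local terms in reverse order.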
The configurator module performs executive control. Given a task to execute, it pre-configures the perception, world model, cost, and actor modules by modulating their parameters.
The perception module receives signals from sensors and estimates the current state of the world. For a given task, only a small subset of the perceived state of the world is relevant and useful.
The world model module is the most complex piece of the architecture and has two roles. The first is to estimate missing information about the state of the world that perception does not provide. The second is to predict plausible future states of the world, including its natural evolution. The world model acts as a simulator for the task at hand and should be able to represent multiple possible predictions.
The cost module predicts the agent’s level of discomfort and has two submodules: the intrinsic cost and the critic. The intrinsic cost is immutable and computes discomfort such as damage to the agent or violation of hard-coded behavioral constraints. The critic is trainable and predicts future values of the intrinsic cost.
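The split between a fixed intrinsic cost and a trainable critic can be sketched as follows. This is a toy example under stated assumptions: the state keys `damage` and `speed`, the speed limit of 30, the linear form of the critic, and its learning rule are all invented for illustration and do not come from LeCun's proposal.

```python
from dataclasses import dataclass, field
from typing import Dict

State = Dict[str, float]


def intrinsic_cost(state: State) -> float:
    """Hard-wired discomfort; training never changes this function.

    The 'damage' and 'speed' keys and the limit of 30 are hypothetical.
    """
    damage = state.get("damage", 0.0)
    over_limit = max(0.0, state.get("speed", 0.0) - 30.0)
    return damage + 0.1 * over_limit


@dataclass
class Critic:
    """Trainable predictor of future intrinsic cost (assumed linear here)."""

    lr: float = 0.01
    weights: Dict[str, float] = field(default_factory=dict)

    def predict(self, state: State) -> float:
        return sum(self.weights.get(k, 0.0) * v for k, v in state.items())

    def update(self, state: State, observed_future_cost: float) -> None:
        # One gradient step on 0.5 * (prediction - observed) ** 2
        error = self.predict(state) - observed_future_cost
        for k, v in state.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v


def total_cost(state: State, critic: Critic) -> float:
    """Discomfort now plus the critic's estimate of discomfort to come."""
    return intrinsic_cost(state) + critic.predict(state)
```

The intrinsic cost plays the role of fixed drives, while the critic is the part that improves as the agent observes which states tend to lead to discomfort later.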
The actor module computes proposals for action sequences. “The actor can find an optimal action sequence that minimizes the estimated future cost and output the first action in the optimal sequence, in a fashion similar to classical optimal control,” LeCun says.
The short-term memory module keeps track of the current and predicted world state and associated costs.
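How the actor, world model, cost, and short-term memory might fit together can be sketched with a toy planning loop. Everything here is an assumption made for illustration: the world is a single number, the world model is exact, the cost is the distance to a hypothetical goal, and the actor uses exhaustive search over short action sequences rather than the gradient-based optimization the architecture is designed to allow.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical action set
GOAL = 3.0                  # hypothetical target state


def world_model(state, action):
    """Predict the next state (assumed to be exact in this toy world)."""
    return state + action


def cost(state):
    """Discomfort grows with the distance from the goal."""
    return abs(state - GOAL)


def plan(state, horizon=3):
    """Actor: pick the action sequence with the lowest predicted total cost."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for action in seq:
            s = world_model(s, action)  # imagined rollout, no real action taken
            total += cost(s)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq, best_cost


def run_episode(state=0.0, steps=5):
    """Execute only the first planned action each step, then replan."""
    memory = []  # short-term memory: (state, cost) pairs
    for _ in range(steps):
        first_action = plan(state)[0][0]
        state = world_model(state, first_action)  # stand-in for the real world
        memory.append((state, cost(state)))
    return memory


print(run_episode())
```

Executing only the first action of the best sequence and then replanning mirrors the classical optimal control pattern mentioned in the quote above.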
The center of this architecture is the predictive world model. Since the real world is not entirely predictable, it is critical to represent it with multiple plausible predictions. The challenge is to design a model that can learn abstract representations of the world, ignore irrelevant details, and still make plausible predictions.
Meta AI has introduced JEPA, or joint embedding predictive architecture, which captures dependencies between two inputs. JEPA can produce informative abstract representations while eliminating irrelevant details when making predictions. The idea is that JEPA will be able to learn how the world works by observation, much as a newborn does.
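The core idea of comparing two inputs in an abstract representation space, rather than detail by detail, can be sketched in a few lines. This is a hand-written toy, not Meta AI's implementation: in an actual JEPA the encoders and predictor are learned, whereas here the functions `encode`, `predict`, and `energy`, the two-field observation format, and the fixed step of 1.0 are all assumptions chosen for illustration.

```python
def encode(observation):
    """Map an observation to an abstract representation.

    Each observation is a hypothetical (position, noise) pair; the encoder
    keeps the position and discards the noise as an irrelevant detail.
    """
    position, _noise = observation
    return (position,)


def predict(s_x, step=1.0):
    """Predict the representation of y from the representation of x.

    The assumed dynamics: position advances by a fixed step.
    """
    return (s_x[0] + step,)


def energy(x, y):
    """Low energy means y is compatible with x; high energy means it is not."""
    s_x, s_y = encode(x), encode(y)
    s_y_pred = predict(s_x)
    return sum((a - b) ** 2 for a, b in zip(s_y, s_y_pred))


compatible = energy((1.0, 0.3), (2.0, -0.7))    # noise differs, position fits
incompatible = energy((1.0, 0.3), (5.0, 0.3))   # noise matches, position does not
print(compatible, incompatible)
```

Because the prediction error is measured between representations, the unpredictable noise component never has to be predicted at all, which is the property the paragraph above attributes to JEPA.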