Creating intelligent AI takes more than finding patterns in data. Such systems must also understand humans’ intuitive decision-making, because we make decisions based on the intentions, beliefs, and desires of others. At the 2021 International Conference on Machine Learning (ICML), researchers from IBM, MIT, and Harvard University released a common-sense AI dataset. It is part of a multi-year project with the U.S. Department of Defense’s Defense Advanced Research Projects Agency (DARPA). The Machine Common Sense project aims to develop models of intuitive psychology and to see whether AI can learn to reason the way human infants do.
At ICML, the researchers unveiled AGENT (Action, Goal, Efficiency, coNstraint, uTility), a benchmark for assessing whether machines grasp the core concepts of intuitive psychology. AGENT comprises a large dataset of 8,400 3D animations organized into four scenarios: Goal Preferences, Action Efficiency, Unobserved Constraints, and Cost-Reward Trade-offs.
The way this dataset is used to train and test AI models is similar to how psychologists evaluate an infant’s intuitive abilities. The researchers also introduced two baseline machine learning models, BIPaCK and ToMnet-G, based on Bayesian inverse planning and the Theory of Mind neural network, respectively.
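To give a rough feel for what Bayesian inverse planning means, here is a toy sketch in Python. It is not BIPaCK and is not taken from the paper: the one-dimensional world, the goal names, the `beta` rationality parameter, and the functions `action_likelihood` and `infer_goal` are all assumptions made up for illustration. The idea is simply to watch an agent act and update a belief about which goal it is most likely pursuing.

```python
import math

# Hypothetical toy world: an agent walks along a line toward one of two objects.
# Goal names and positions are illustrative assumptions, not AGENT data.
GOALS = {"left_object": 0, "right_object": 10}


def action_likelihood(pos, action, goal_pos, beta=2.0):
    """P(action | goal) for a noisily rational agent.

    Steps that end closer to the goal get exponentially more weight;
    beta (an assumed parameter) controls how rational the agent is.
    """
    scores = {a: math.exp(-beta * abs((pos + a) - goal_pos)) for a in (-1, +1)}
    return scores[action] / sum(scores.values())


def infer_goal(start, actions, goals):
    """Posterior over goals after observing a sequence of +1 / -1 steps."""
    posterior = {name: 1.0 / len(goals) for name in goals}  # uniform prior
    pos = start
    for a in actions:
        for name, goal_pos in goals.items():
            posterior[name] *= action_likelihood(pos, a, goal_pos)
        pos += a
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}


# The agent starts in the middle and takes three steps to the right.
posterior = infer_goal(start=5, actions=[+1, +1, +1], goals=GOALS)
print(posterior)
```

After three steps to the right, nearly all of the probability mass lands on `right_object`, which is the kind of goal inference the AGENT scenarios probe in a much richer 3D setting.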
Commonsense reasoning has long been a bottleneck in natural language processing and other areas of artificial intelligence. Intuitive psychology lets us have meaningful social interactions by understanding and reasoning about other people’s states of mind. ML models, however, lack this intuition and need extensive datasets for training. AGENT aims to bridge this gap and help build AI that shows the same common sense as a young child.
“Today’s machine learning models can have superhuman performance. It is still unclear if they understand basic principles that drive human reasoning. For machines to successfully be able to have social interaction like humans do among themselves, they need to develop the ability to understand hidden mental states of humans,” said Abhishek Bhandwaldar, Research Engineer, MIT-IBM AI Lab.
Like other infant studies, each trial in this project has two phases: familiarization and test. The dataset contains 8,400 3D animations, each lasting between 5.6 s and 25.2 s at a frame rate of 35 fps. “With these videos, we constructed 3,360 trials, divided into 1,920 training trials, 480 validation trials, and 960 testing trials. All training and validation trials only contain expected test videos,” the researchers said.
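The figures quoted above can be cross-checked with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are ours, and the frame counts assume every video runs at exactly the stated 35 fps.

```python
# Figures as quoted in the article; variable names are illustrative.
SPLITS = {"train": 1_920, "validation": 480, "test": 960}
FPS = 35
MIN_SECONDS, MAX_SECONDS = 5.6, 25.2

# The three splits should add up to the 3,360 trials the researchers report.
total_trials = sum(SPLITS.values())

# Share of trials in each split.
fractions = {name: count / total_trials for name, count in SPLITS.items()}

# Approximate frame counts for the shortest and longest videos.
min_frames = round(MIN_SECONDS * FPS)
max_frames = round(MAX_SECONDS * FPS)

print(total_trials)
print({name: round(share, 3) for name, share in fractions.items()})
print(min_frames, max_frames)
```

The splits sum to 3,360 and work out to a 4:1:2 ratio of training, validation, and testing trials, and the shortest and longest videos come to roughly 196 and 882 frames.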
The researchers evaluated the two baseline models, which build on established approaches from human psychology, on AGENT and compared their results with human performance. “Overall, we find that BIPaCK achieves a better performance than ToMnet-G, especially in tests of strong generalization,” reads the paper.
The study suggests that AI models can be taught how humans make intuitive decisions. It also shows that current ML models generalize poorly and need pre-training or more advanced architectures. The researchers say AGENT can serve as a diagnostic tool for developing better models of common-sense AI.