Organizations such as Meta, OpenAI, and Google are actively working on language models and extending them to handle increasingly complex tasks. Language models transform and generate natural-language text in ways that resemble human writing. Fundamentally, they interpret their input using algorithms that process information in the context of natural language, and then produce new content. Working in this domain, Google AI has proposed a new approach for large language models called “ReAct.” In this research, the authors combine advances in reasoning and acting to make language models more effective.
Existing approaches usually rely on one of two techniques: chain-of-thought prompting, or mapping text to actions with a pre-trained model. Chain-of-thought is a prompting method that lets a model decompose a problem into a series of intermediate steps, and it works well. With this technique, language models of sufficient scale (roughly 100B parameters) can solve reasoning problems effectively. However, such reason-only models are not grounded in an external environment: they rely only on what the model already knows and have limited ability to explore.
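To make the chain-of-thought idea concrete, here is a minimal sketch of how a few-shot prompt with intermediate steps might be assembled. The `Example` class, the `build_cot_prompt` helper, and the "Q:/A:" layout are illustrative assumptions, not a format taken from the research.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One worked example: a question, its intermediate steps, and the answer."""
    question: str
    reasoning: List[str]
    answer: str


def build_cot_prompt(examples: List[Example], question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt as plain text.

    Each worked example spells out its intermediate steps before the answer,
    so the model is nudged to decompose the new question the same way.
    """
    parts = []
    for ex in examples:
        steps = " ".join(ex.reasoning)
        parts.append(f"Q: {ex.question}\nA: {steps} The answer is {ex.answer}.")
    # The new question ends with an open "A:" for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


demo = [
    Example(
        question="A shelf holds 3 boxes with 4 books each. How many books are there?",
        reasoning=["There are 3 boxes.", "Each box has 4 books.", "3 * 4 = 12."],
        answer="12",
    )
]
prompt = build_cot_prompt(
    demo, "A crate holds 5 bags with 6 apples each. How many apples are there?"
)
print(prompt)
```

Note that nothing in this prompt lets the model consult anything outside its own weights, which is the limitation of reason-only models described above.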
The other approach uses pre-trained language models to map text contexts to actions using the model’s internal knowledge. These are known as act-only models. However, these models do not reason explicitly and may not remain consistent in their actions, because they reproduce what they learned from their training data. If that data is not socially sound and consistent, the model learns from it and produces output in the same manner. Consequently, language models can reflect the social biases present in the text they were trained on.
With ReAct, the researchers show that the Reason+Act paradigm outperforms both the reason-only and act-only paradigms. According to the researchers, this holds when prompting large models and when fine-tuning smaller ones, and the approach also improves interpretability. To set up ReAct prompting, the PaLM-540B language model was prompted with domain-specific in-context examples. In tasks that call for both reasoning and acting, such as navigation, the model alternates between reasoning steps and actions. For instance, a navigation task in a room may involve “go to” actions, and solving it requires a task-solving trajectory made up of multiple reasoning-action-observation steps.
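The reasoning-action-observation trajectory can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the researchers' implementation: `ToyRoom`, `scripted_model`, and `react_loop` are hypothetical names, and the "model" is a scripted stand-in that replays fixed outputs instead of calling PaLM.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (kind, text); kind is task/thought/action/observation/finish


class ToyRoom:
    """Hypothetical text environment: an apple sits inside a closed drawer."""

    def __init__(self) -> None:
        self.location = "start"
        self.drawer_open = False
        self.holding = None

    def step(self, action: str) -> str:
        """Apply an action and return a textual observation."""
        if action == "go to drawer":
            self.location = "drawer"
            return "You are at the drawer. It is closed."
        if action == "open drawer" and self.location == "drawer":
            self.drawer_open = True
            return "The drawer is open. You see an apple."
        if action == "take apple" and self.drawer_open:
            self.holding = "apple"
            return "You pick up the apple."
        return "Nothing happens."


def scripted_model(script: List[Step]) -> Callable[[str], Step]:
    """Stand-in for a language model: ignores the context and replays a script."""
    outputs = iter(script)

    def model(context: str) -> Step:
        return next(outputs)

    return model


def react_loop(task: str, model: Callable[[str], Step], env: ToyRoom,
               max_steps: int = 10) -> List[Step]:
    """Interleave thoughts and actions, feeding observations back as context."""
    trajectory: List[Step] = [("task", task)]
    for _ in range(max_steps):
        context = "\n".join(f"{kind}: {text}" for kind, text in trajectory)
        kind, text = model(context)
        trajectory.append((kind, text))
        if kind == "finish":
            break
        if kind == "action":
            # Only actions touch the environment; thoughts just extend the context.
            trajectory.append(("observation", env.step(text)))
    return trajectory


script = [
    ("thought", "The apple is probably in the drawer, so I should go there first."),
    ("action", "go to drawer"),
    ("action", "open drawer"),
    ("thought", "The drawer is open and the apple is visible, so I can take it."),
    ("action", "take apple"),
    ("finish", "done"),
]
env = ToyRoom()
trajectory = react_loop("pick up the apple", scripted_model(script), env)
for kind, text in trajectory:
    print(f"{kind}: {text}")
```

In this sketch a thought is not required before every action: the scripted model issues two actions in a row before reasoning again. In a real system the scripted function would be replaced by a call to a language model that receives the accumulated context.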
What sets the ReAct approach apart is that, for tasks with a large number of actions, reasoning traces only need to appear sparsely along the trajectory. The model itself determines when reasoning traces and actions occur, so the two can be interleaved asynchronously rather than in strict alternation. The PaLM-540B model was also used to generate successful trajectories, which were later used to fine-tune smaller models such as PaLM-8B and PaLM-62B.
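One way to picture the fine-tuning step is as a filter over sampled trajectories: keep the successful ones and turn each into a training record. The sketch below assumes a `success` flag on every trajectory and an input/target record layout; both are illustrative choices, since the article does not describe the actual data format.

```python
import json
from typing import Dict, List


def to_finetune_records(trajectories: List[Dict]) -> List[Dict]:
    """Keep only successful trajectories and flatten each into one text target.

    Assumed (hypothetical) trajectory shape:
    {"task": str, "success": bool, "steps": [(kind, text), ...]}
    """
    records = []
    for traj in trajectories:
        if not traj["success"]:
            continue  # failed attempts are not used as training targets
        target = "\n".join(f"{kind}: {text}" for kind, text in traj["steps"])
        records.append({"input": traj["task"], "target": target})
    return records


sampled = [
    {
        "task": "pick up the apple",
        "success": True,
        "steps": [
            ("thought", "The apple is probably in the drawer."),
            ("action", "go to drawer"),
            ("observation", "You are at the drawer. It is closed."),
        ],
    },
    {
        "task": "pick up the apple",
        "success": False,
        "steps": [("action", "take apple"), ("observation", "Nothing happens.")],
    },
]
records = to_finetune_records(sampled)
print(json.dumps(records, indent=2))
```

Each resulting record pairs a task with the full thought, action, and observation text, which is the kind of example a smaller model could then be trained to reproduce.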
The researchers evaluated ReAct on four benchmarks, in part to see whether it could reduce the need for extensive human annotation. The benchmarks were HotpotQA (question answering), FEVER (fact-checking), ALFWorld (a text-based game), and WebShop (web page navigation). On HotpotQA and FEVER, the model was observed to overcome common errors and hallucinations that occur in chain-of-thought reasoning. On ALFWorld and WebShop, ReAct surpasses the reinforcement learning techniques it was compared against.
ReAct was also studied with a human inspector who could edit its reasoning traces, so that the researchers could evaluate human interaction with the model. ReAct successfully altered its behavior to match the revisions provided by the inspector. When the model produced a hallucinated trajectory, replacing the hallucinated reasoning with a hint from the inspector was enough to correct its course, which makes this form of human-machine interaction effective with very little human involvement.
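A human edit to a reasoning trace can be sketched as replacing one thought and discarding everything after it, so that the model continues from the corrected reasoning. The `edit_thought` helper and the `(kind, text)` step format are assumptions made for illustration; the article does not say how the edits were applied in practice.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (kind, text)


def edit_thought(trajectory: List[Step], index: int, new_text: str) -> List[Step]:
    """Return a new trajectory with the thought at `index` replaced.

    Steps after the edited thought are dropped, because they were produced
    from the old reasoning and the model should regenerate them.
    """
    kind, _ = trajectory[index]
    if kind != "thought":
        raise ValueError("only thought steps can be edited")
    return trajectory[:index] + [("thought", new_text)]


original = [
    ("task", "find the key"),
    ("thought", "The key is in the fridge."),
    ("action", "go to fridge"),
    ("observation", "The fridge is empty."),
]
edited = edit_thought(original, 1, "Keys are usually kept in the drawer.")
for kind, text in edited:
    print(f"{kind}: {text}")
```

The edited trajectory would then be passed back to the model as context, so a single corrected sentence steers every step that follows.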
Google has been actively working on language models, and ReAct is another step forward in that direction. As the paper shows, ReAct makes it feasible to describe a behavior or give feedback to a model while it flexibly handles the input and decides which actions to take. Whether the task is multi-hop question answering, fact verification, or interactive decision-making, ReAct performs well.