Imagine you have just come home from the gym. You want a smoothie, but you are too tired to make one. Then you remember you still have to wash the dishes from your last meal, vacuum the floor, and cook dinner, and you are sore from your intensive workout. Well, Google may be able to help. The company has revealed that it is working on an artificial intelligence (AI) system that can pick up on natural human communication and carry out human wishes, and it has demonstrated a robot in development outfitted with this AI, described in its paper ‘Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.’
You could hire a butler like Batman’s Alfred to handle daily chores (if you can afford one), or you could ask a robot butler. The problem with robots is that although they are adept at systematically carrying out short, hard-coded instructions, they tend to fail at ambiguous requests. For instance, if you tell a robot you are hungry, it may acknowledge the statement yet not know what to do next. However, a robot from the Everyday Robots project, a group under Alphabet’s experimental X labs, can offer you a bag of Doritos from your kitchen counter without receiving explicit instructions to do so. Thanks to training on millions of web-scraped text pages, the robot’s control software can convert spoken words into a sequence of physical movements.
With this technology and Google’s AI language model, a robot can now decipher ambiguous human commands and assemble a suitable sequence of responses. That contrasts sharply with the carefully programmed tasks most robots carry out under strictly controlled conditions, such as fitting windshields on a vehicle assembly line. It suggests we are a step closer to the robots of science fiction.
Google reveals that, unlike virtual assistants such as Alexa or Siri, this AI robot does not require a predefined set of approved wake-up words. The robot will try to fetch you something to drink if you say “I’m thirsty,” and it should return with a sponge if you say, “Whoops, I just spilled my drink.” This technological feat has been made possible by the most powerful large language model Google has developed. Dubbed the Pathways Language Model (PaLM), it is a dense, decoder-only Transformer with 540 billion parameters, trained using the Pathways system, which allowed Google to efficiently train a single model across several TPU v4 Pods.
PaLM was trained on a combination of English and multilingual datasets, including GitHub code, high-quality web documents, Wikipedia articles, and conversations. Additionally, Google developed a “lossless” vocabulary that splits numbers into separate tokens, one per digit, breaks out-of-vocabulary Unicode characters into bytes, and preserves all whitespace (which is crucial for code). At the time of its announcement, Google reported that PaLM performs impressively on a variety of BIG-bench tasks for natural language understanding and generation. The model can, for instance, distinguish cause and effect, comprehend conceptual combinations in context, and even identify a movie from a string of emoji.
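To make the digit-splitting and whitespace-preserving ideas concrete, here is a toy Python sketch of that tokenization behavior. It is only an illustration of the scheme described above, not Google’s actual SentencePiece-based implementation, and the function name is invented for this example.

```python
import re

def lossless_tokenize(text):
    """Toy tokenizer: every digit becomes its own token, runs of
    whitespace are kept as tokens, and remaining character runs are
    treated as word tokens. Because nothing is discarded, the
    original text can be reconstructed exactly ("lossless")."""
    return re.findall(r"\d|\s+|[^\d\s]+", text)

tokens = lossless_tokenize("pay 42 now")
print(tokens)            # ['pay', ' ', '4', '2', ' ', 'now']
print("".join(tokens))   # 'pay 42 now'  (round-trips exactly)
```

Splitting numbers into single digits helps a model reason about arithmetic, and keeping whitespace tokens means indentation-sensitive code survives tokenization unchanged.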
The robot butler was developed by Google researchers using new software that takes advantage of PaLM’s text-processing skills to transform a spoken command or phrase into a series of relevant actions that the robot can carry out, such as “open drawer” or “pick up chips.” Google has christened the resulting system PaLM-SayCan, a name that describes how the model blends the language comprehension of LLMs (“Say”) with the “affordance grounding” of its robots (“Can”: an estimate of which actions are actually feasible in the current situation).
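The “Say”/“Can” blend can be sketched in a few lines of Python. Per the paper, each candidate skill is scored by multiplying the language model’s estimate that the skill is useful for the instruction by the robot’s learned estimate that the skill can succeed in the current state. The numbers below are made-up placeholders for illustration, not real model outputs.

```python
def choose_skill(say_scores, can_scores):
    """Pick the skill maximizing p_say(skill | instruction) * p_can(skill | state)."""
    return max(say_scores, key=lambda s: say_scores[s] * can_scores.get(s, 0.0))

# Instruction: "Whoops, I just spilled my drink."
# "Say": how relevant the language model thinks each skill is.
say_scores = {"pick up sponge": 0.6, "pick up chips": 0.1, "open drawer": 0.3}
# "Can": how likely the robot is to succeed at each skill right now.
can_scores = {"pick up sponge": 0.9, "pick up chips": 0.8, "open drawer": 0.2}

print(choose_skill(say_scores, can_scores))  # pick up sponge
```

Multiplying the two scores keeps the language model from proposing actions the robot cannot physically perform, and keeps the robot from performing feasible actions that are irrelevant to the request.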
The robot also has audio and visual sensors, allowing it to independently explore a location and recognize the objects and places relevant to a command.
According to Everyday Robots, by incorporating a multitude of machine learning techniques such as reinforcement learning, collaborative learning, and learning from demonstration, the robots have progressively improved their understanding of their environment and their aptitude for performing common tasks.
The robot learned its library of physical actions in a separate training phase, in which people remotely operated it to demonstrate tasks such as picking up objects. It can carry out only a fixed set of actions in its surroundings, which helps keep language-model ambiguities from manifesting as wayward behavior. Google frames this as an ongoing research undertaking rather than a technology ready to go mainstream. Instead of testing it only in a controlled lab setting, Google has been trialing the system in its employee kitchen areas, aiming to create robots that are useful amid the unpredictable turmoil of everyday life, and the trials illustrate how butler robots might adapt to real-world uncertainty. The ability of robots to browse the internet and fulfill purchases is also progressing as Google Research and Everyday Robots collaborate to integrate the finest language models with robot learning.
However, because the assistants can respond to commands only in limited contexts, and because the announcement served merely as a preview of possible capabilities, these robotic butlers are not yet suitable for commercial deployment.
Meanwhile, although Google claims to be pursuing the research responsibly, fears that such robots could become surveillance machines, or could respond in inappropriate ways, could ultimately cause adoption to stagnate. To those who worry that things can go wrong, Google says it takes a proactive approach and adheres to its AI Principles while building helper robots.
According to Google, with PaLM-SayCan incorporated, the robots charted the right actions for 101 user instructions 84% of the time and carried them out successfully 74% of the time. Impressive as those statistics are, they should be interpreted cautiously. Since we don’t have access to all 101 commands, it is unclear how constrained these directives were. Were they tailored to capture the complexity of language a true robot butler would need to understand? Can the robots handle requests like ‘I want an orange soda instead of lime,’ ‘Can you organize the closet?’ or ‘Would you julienne the tomatoes instead of dicing them?’ Can their actions align with human expectations every time? For instance, when asked to ‘put on the TV,’ would the robot switch the TV on (the human intent) or put the TV somewhere (a machine’s literal reading)?
Some skeptics believe that once an AI system reaches a certain level of complexity and reacts to its surroundings in a manner resembling that of a human, we should consider it to be aware and, perhaps, to have rights. The controversy around permitting AI-powered robots in daily human life resurfaced recently at the Moscow Chess Open, when a chess-playing robot grabbed and broke the finger of its 7-year-old opponent, who had moved before the robot finished its turn.
From an architectural perspective, the majority of contemporary AI systems focus on one task, or a narrow band of tasks, at a time. In contrast, PaLM-SayCan will be expected to understand human conversation and deliver expected results that span multiple tasks, such as fetching you a bag of chips. Beyond that, PaLM-SayCan must also distinguish between literally carrying out human commands and applying cognitive-ethical reasoning. If asked to feed chocolate to a dog, would it promptly follow the instruction, or would it refuse because chocolate is toxic to dogs?