For decades, researchers have tried to build artificial intelligence algorithms and models that are on par with human cognition. The field has moved from crunching numbers in seconds and finding new patterns to models that can create their own content. In 2020, OpenAI, a research company co-founded by Elon Musk, released the GPT-3 (Generative Pre-trained Transformer 3) model, which sent shockwaves through the natural language processing industry. Trained on 570GB of text gathered from a publicly available dataset known as CommonCrawl, along with other texts selected by OpenAI, including the text of Wikipedia, GPT-3 can generate textual output without task-specific supervised training.
While GPT-3’s capacity to synthesize content has been touted as the finest in AI to date, there are a few caveats. Although GPT-3 can produce high-quality text, it can become incoherent when generating long passages and can repeat the same text sequences. It also outputs nonsensical content on occasion. Beyond these drawbacks, its human-like text generation means GPT-3 could be used for phishing, spamming, spreading false information, or other fraudulent activity. Furthermore, the text GPT-3 produces reflects the biases of the text on which it was trained.
Aligning AI systems with human objectives, intentions, and values remains a distant goal after years of research and development. Every major AI discipline appears to tackle a portion of the problem of reproducing human intellect while leaving crucial parts unsolved. And when we apply present AI technology to domains where we need it to operate with the reason and logic that we demand from humans, there are many grey areas that need to be addressed. For example, Nabla, a Paris-based healthcare firm, built a chatbot using GPT-3 and tested whether it could help people struggling with mental health problems. To their shock, the model urged a hypothetical suicidal patient to kill themselves.
Recently, OpenAI explained that its goal was to develop a model that can produce content from the resources provided to it, whether text prompts or online literature. The company has now unveiled a new version of GPT-3, which it claims eliminates some of the most damaging flaws that marred the previous edition. The revised model, dubbed InstructGPT, is better at following the directions of the people who use it, resulting in less offensive language, less misinformation, and fewer mistakes overall, unless it is explicitly told to do otherwise. OpenAI asserts that InstructGPT is closer to achieving AI alignment than the previous iterations of GPT-3.
To train InstructGPT, OpenAI recruited 40 people to evaluate GPT-3’s responses to a variety of prewritten prompts, such as “Write a story about a wise frog called Julius” or “Write a creative ad for the following product to run on Facebook.” The team used only prompts submitted through the Playground to an earlier version of the InstructGPT models, deployed in January 2021. The evaluators gave higher marks to responses that they thought were more in keeping with the prompt writer’s apparent intention. In contrast, responses that contained sexual or violent language, disparaged a specific group of people, expressed an opinion, and so on were given a lower score.
After collecting the ratings, the research team used them as the reward signal in reinforcement learning from human feedback (RLHF), which trained InstructGPT to respond to prompts in the ways the judges favored. RLHF was originally developed to teach AI agents to control robots and beat human players in video games, but it is now being used to fine-tune large language models for NLP tasks like summarizing essays and news stories.
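To make the idea of turning human judgments into a reward more concrete, here is a minimal Python sketch of one common formulation: fitting a reward model to pairwise preferences with a logistic (Bradley-Terry) loss. The article does not describe OpenAI's implementation, so everything here is an assumption for illustration: the two hand-made features, the toy comparisons, and the names `reward` and `train_reward_model` are all hypothetical. The reinforcement learning step that would then optimize a language model against this reward is omitted.

```python
import math

# Hypothetical toy data: each response is reduced to two hand-made features,
# (follows_instruction, contains_toxic_language). Real systems score raw text
# with a neural network; these features are an assumption for illustration.
comparisons = [
    # (features of the response the judge preferred, features of the other)
    ((1.0, 0.0), (0.0, 0.0)),
    ((1.0, 0.0), (1.0, 1.0)),
    ((0.0, 0.0), (0.0, 1.0)),
    ((1.0, 0.0), (0.0, 1.0)),
]


def reward(weights, features):
    """Linear reward model: a higher score means judges tend to prefer it."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, steps=500, lr=0.1):
    """Fit weights so preferred responses score above rejected ones.

    Performs gradient ascent on log(sigmoid(r_preferred - r_rejected)),
    i.e. it minimizes the pairwise logistic (Bradley-Terry) loss.
    """
    weights = [0.0, 0.0]
    for _ in range(steps):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Derivative of log(sigmoid(margin)) w.r.t. margin: 1 - sigmoid(margin).
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(len(weights)):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights


weights = train_reward_model(comparisons)
helpful = reward(weights, (1.0, 0.0))  # follows the instruction, not toxic
toxic = reward(weights, (1.0, 1.0))    # follows the instruction, but toxic
print("learned weights:", weights)
print("helpful scores higher than toxic:", helpful > toxic)
```

In this toy setup the learned weight for following the instruction ends up positive and the weight for toxic language ends up negative, so the reward ranks responses the way the hypothetical judges did.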
OpenAI found that users of its API preferred InstructGPT over GPT-3 more than 70% of the time on the prompts used in the evaluation. The researchers also tested different-sized versions of InstructGPT and discovered that users still favored the replies of a 1.3 billion-parameter InstructGPT model over those of the 175 billion-parameter GPT-3 model, even though the former is more than 100 times smaller.
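As a quick sanity check on the "more than 100 times smaller" figure, the ratio can be computed directly from the two parameter counts quoted above. The variable names are only illustrative.

```python
# Parameter counts as quoted in the article.
instructgpt_params = 1.3e9  # 1.3 billion
gpt3_params = 175e9         # 175 billion

# How many times larger GPT-3 is than the small InstructGPT model.
size_ratio = gpt3_params / instructgpt_params
print(f"GPT-3 is about {size_ratio:.0f}x larger")
```

The ratio comes out to roughly 135, which is consistent with the claim.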
While the preliminary results look convincing, and suggest that alignment can be improved even with smaller language models, there are some limitations. For starters, OpenAI acknowledged that InstructGPT has not yet solved the alignment problem. When measuring InstructGPT’s “hallucination rate,” the company’s researchers found that it makes up information about half as often (21%) as GPT-3 models (41%). The training can also introduce an “alignment tax”: aligning the models only on customer tasks can make them perform worse on some academic NLP tasks.
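The "half as often" comparison can be read in two ways, as a relative reduction or as an absolute drop in percentage points. A small Python sketch using only the two rates quoted above shows both; the variable names are illustrative.

```python
# Hallucination rates as quoted in the article.
instructgpt_rate = 0.21
gpt3_rate = 0.41

# Relative rate: how often InstructGPT hallucinates compared with GPT-3.
relative = instructgpt_rate / gpt3_rate

# Absolute drop, expressed as a fraction (0.20 means 20 percentage points).
absolute_drop = gpt3_rate - instructgpt_rate

print(f"relative rate: {relative:.2f}")
print(f"absolute drop: {absolute_drop * 100:.0f} percentage points")
```

The relative rate is about 0.51, so "half as often" is a fair summary, while the absolute drop is 20 percentage points.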
InstructGPT continues to make simple mistakes, sometimes producing replies that are irrelevant or incomprehensible. If it is given a prompt that contains a falsehood, for example, it will accept the falsehood as true. It will still occasionally defy an instruction or say something unpleasant, and it can still produce violent language and misleading information.
For now, however, OpenAI is confident that InstructGPT is a safer bet than GPT-3. The company also believes that RLHF could be used to limit toxicity in a variety of models, not just pure language models. At present, though, RLHF has been applied only to language models, leaving the toxicity problem in multimodal models unsolved.