OpenAI has developed a new model to study the alignment problem in AI. Published in the ‘Recursively Summarizing Books with Human Feedback’ paper, the model can summarize entire books: it first summarizes each chapter (or another small portion of the book), then summarizes those summaries, and repeats until it produces a high-level overview of the whole book.
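The summarize-then-summarize-the-summaries loop can be sketched in a few lines of Python. Everything here is illustrative: `summarize_passage` is a hypothetical stand-in for the learned summarization model (stubbed as "keep the first sentence"), and the fixed-size `chunk` function stands in for splitting the book into chapters or passages.

```python
def summarize_passage(text: str) -> str:
    """Stand-in for a learned summarizer: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."

def chunk(text: str, max_chars: int = 2000) -> list[str]:
    """Split the book into roughly fixed-size pieces (chapters, in practice)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def recursive_summarize(text: str, max_chars: int = 2000) -> str:
    """Summarize each chunk, then summarize the concatenated summaries,
    recursing until the text fits in a single chunk."""
    if len(text) <= max_chars:
        return summarize_passage(text)
    summaries = [summarize_passage(piece) for piece in chunk(text, max_chars)]
    return recursive_summarize(" ".join(summaries), max_chars)
```

With a real model in place of the stub, each recursion level produces a shorter, higher-level layer of summaries, ending in one book-level overview.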
To build safe general-purpose artificial intelligence, researchers have to ensure that their models behave as intended. The challenge of designing AI systems that do the right thing is called the alignment problem. The difficulty is not the AI figuring out what the right thing is; it is getting the AI system to actually choose to do it.
Summarizing entire books gives OpenAI a testbed for scalable solutions to the alignment problem. The current model was fine-tuned from the GPT-3 language model on a dataset of primarily fiction books, averaging over 100K words each. Researchers skipped non-narrative books and chose only narrative texts, whose prose is dense with low-level description and harder to summarize.
OpenAI’s new ML model builds on the company’s previous research, which used reinforcement learning from human feedback to train a model that aligned summaries of short posts and articles with people’s preferences. However, this method doesn’t scale to larger pieces of text, like books. To build a scalable version, OpenAI’s team combined reinforcement learning with recursive task decomposition, which breaks a complex task into smaller ones. This breakdown lets humans evaluate the model’s summaries faster, since they only need to check summaries of smaller parts of a book. Moreover, recursive task decomposition allows the AI system to summarize a book of any length, from hundreds of pages to thousands.
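The human-feedback half of the method can be illustrated with the standard pairwise preference loss used to train a reward model from human comparisons, as in OpenAI's earlier summarization-from-feedback work. This is a sketch of the loss function only, not OpenAI's actual training code:

```python
import math

def preference_loss(reward_preferred: float, reward_other: float) -> float:
    """Negative log-sigmoid of the reward gap: -log sigmoid(r_pref - r_other).

    Minimizing this pushes the reward model to score the summary the human
    labeler preferred above the one they rejected."""
    gap = reward_preferred - reward_other
    return -math.log(1.0 / (1.0 + math.exp(-gap)))
```

The trained reward model then supplies the scalar reward that RL optimizes; with task decomposition, comparisons are collected over summaries of individual book parts rather than whole books.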
To compare human-written and model summaries, OpenAI assigned two labelers to read and summarize 40 of the most popular books of 2020, according to Goodreads. The labelers then rated one another’s summaries as well as those of the AI models. On average, human-written summaries received a 6/7 rating, while the model’s summaries received a 6/7 rating 5% of the time and a 5/7 rating 15% of the time. Some model summaries even matched human-written ones, and the entire set is available here.
OpenAI’s team evaluated two sizes of the GPT-3 model, 175B and 6B parameters, trained with standard cross-entropy behavioral cloning (BC) and with reinforcement learning (RL). For each size, they evaluated three different training modes: RL on the whole tree, RL on the first subtree, and BC on the entire tree. For each policy, they also generated three summaries to reduce error bars.
OpenAI also tested the model on the BookSum dataset and the NarrativeQA Reading Comprehension Challenge. On BookSum, the 175B models beat all non-oracle baselines on ROUGE by 3-4 points, while the 6B models were comparable to the baselines on ROUGE. Both significantly outperformed all baselines on BERTScore, including an 11B T5 model.
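For reference, ROUGE scores n-gram overlap between a candidate summary and a reference. A minimal, pure-Python ROUGE-1 F1 (unigram overlap) can be written as follows; the official toolkit adds stemming, other n-gram orders, and longest-common-subsequence variants:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

BERTScore, by contrast, matches candidate and reference tokens in contextual-embedding space rather than by exact word overlap, which is why the two metrics can rank systems differently.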
A good way to check the accuracy of a book summary is to test whether it can answer questions about the original text. To do this, the model was applied to the NarrativeQA question-answering dataset, which comprises question/answer pairs derived from Wikipedia summaries of full book texts and movie transcripts. Researchers checked whether the model’s summary could be used as input, instead of the full book or movie text, to a question-answering (QA) model. The depth-1 summaries worked best, even though the model was never explicitly trained for question answering.
OpenAI’s primary interest in this work is empowering humans to give feedback on model outputs that are very difficult to evaluate. The researchers recognize that the lack of scalable human feedback is a critical obstacle to solving the alignment problem: unless humans can communicate their values to AI systems, those systems can’t take on societally relevant tasks. The research shows that tasks like abstractive book summarization can be trained with human feedback by leveraging task decomposition, and that summaries from the RL models outperformed those from the supervised behavioral cloning models.