Researchers at Google AI, in the paper ‘Unifying Language Learning Paradigms’, have presented a language pre-training paradigm called Unified Language Learner (UL2) that aims to improve the performance of language models universally across datasets and setups.
Some of the most common paradigms for building and training language models use autoregressive decoder-only architectures, such as PaLM or GPT-3, where the model is trained to predict the next word in a given phrase. Other models, such as T5 and ST-MoE, use span corruption-based encoder-decoder architectures. However, there remains an opportunity to create a practical unified framework for pre-training models.
According to the company’s blog, UL2 frames the different objective functions used to train language models as denoising tasks, in which the model has to recover missing sub-sequences of a given input.
Furthermore, a novel mixture of denoisers is used during pre-training, which samples from several objectives, each with a different configuration. The team then evaluates models trained with the framework across a variety of language domains, including models fine-tuned for downstream tasks and prompt-based few-shot learning.
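The mixture-of-denoisers idea can be sketched as follows: each denoising objective is essentially a span-corruption configuration (how much of the input to mask, and in what span lengths), and pre-training samples one objective per example. The minimal Python sketch below is an illustration only; the function name `span_corrupt`, the `MIXTURE` settings, and the `<extra_id_n>` sentinel convention (borrowed from T5) are assumptions for exposition, not UL2's actual code or hyperparameters.

```python
import random

def span_corrupt(tokens, corruption_rate, span_len, rng):
    """Mask contiguous spans covering roughly corruption_rate of the input.

    Returns (inputs, targets): inputs keep the uncorrupted tokens, with a
    sentinel marker where each span was removed; targets list each sentinel
    followed by the tokens it replaced. Illustrative sketch, not UL2's code.
    """
    n = len(tokens)
    budget = max(1, int(n * corruption_rate))  # tokens we may still mask
    inputs, targets = [], []
    i = sentinel = 0
    while i < n:
        if budget > 0 and rng.random() < corruption_rate:
            span = min(span_len, n - i, budget)
            marker = f"<extra_id_{sentinel}>"
            inputs.append(marker)
            targets.append(marker)
            targets.extend(tokens[i:i + span])
            i += span
            budget -= span
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

# Toy mixture of denoisers: each objective pairs a corruption rate with a
# typical span length (values are illustrative, not UL2's exact settings).
MIXTURE = {
    "R": (0.15, 3),   # regular denoising: low rate, short spans
    "X": (0.50, 12),  # extreme denoising: high rate and/or long spans
    "S": (0.25, 6),   # sequential denoising (greatly simplified here)
}

rng = random.Random(0)
tokens = "the quick brown fox jumps over the lazy dog".split()
name = rng.choice(sorted(MIXTURE))  # sample one objective per example
rate, span_len = MIXTURE[name]
inputs, targets = span_corrupt(tokens, rate, span_len, rng)
```

Because inputs and targets partition the original sequence around shared sentinels, a single encoder-decoder model can be trained on any of these configurations with the same loss, which is what makes mixing objectives straightforward.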
According to Google AI, UL2 demonstrates superior performance on many fine-tuning and few-shot tasks, excelling in generation, language understanding, retrieval, long-text understanding, and question answering. Google AI has publicly released checkpoints of its best-performing UL2 model, with 20 billion parameters, which could help the machine learning community make faster progress in developing better language models.