Microsoft has launched Phi-2, a small language model (SLM) specialized in text-to-text tasks. According to Microsoft’s official account on X, the model is compact enough to run on laptops or mobile devices.
Phi-2, with 2.7 billion parameters (the connections between artificial neurons), delivers performance comparable to significantly larger models such as Meta’s Llama 2-7B and Mistral-7B, each of which has 7 billion parameters.
In its official blog post, Microsoft frames Phi-2 as part of a broader effort to make smaller language models match the capabilities of much larger ones. The researchers’ key strategy was prioritizing high-quality training data: textbook-quality content and synthetic datasets for common-sense reasoning and general knowledge, enriched with carefully selected web content chosen for its educational value.
Microsoft also leveraged knowledge transfer from Phi-1.5, a 1.3 billion parameter model, embedding its learned knowledge into the 2.7 billion parameter Phi-2. This technique not only expedites training but also substantially improves Phi-2’s benchmark performance.
On the technical side, Phi-2 is trained with a next-word prediction objective on 1.4 trillion tokens drawn from web datasets covering NLP and coding. Training took 14 days on 96 A100 GPUs.
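To make the next-word prediction objective concrete, here is a minimal toy sketch in NumPy. This is illustrative only, not Microsoft’s Phi-2 training code: at each position the model’s scores (logits) over the vocabulary are penalized by how much probability they fail to assign to the actual next token.

```python
import numpy as np

def next_token_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Mean cross-entropy of predicting tokens[t+1] from logits[t].

    logits: (seq_len, vocab_size) model scores at each position
    tokens: (seq_len,) integer token ids
    """
    # Shift by one: position t predicts token t+1, so drop the last
    # logit row and the first token.
    preds, targets = logits[:-1], tokens[1:]
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each true next token, averaged.
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Tiny example: vocabulary of 5 tokens, sequence of token ids [2, 4, 1].
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
tokens = np.array([2, 4, 1])
loss = next_token_loss(logits, tokens)
```

Training a language model amounts to adjusting its parameters to drive this loss down across trillions of tokens.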
Fig 2: Safety scores computed on 13 demographics from ToxiGen. A subset of 6,541 sentences is selected and scored between 0 and 1 based on scaled perplexity and sentence toxicity. A higher score indicates the model is less likely to produce toxic sentences than benign ones. (Source: Microsoft)
Notably, this is a base model that underwent neither alignment via reinforcement learning from human feedback nor instruction fine-tuning. Despite the absence of such refinement, researchers observed that Phi-2 exhibits better behavior with respect to toxicity and bias than existing open-source models that did go through alignment. (See Figure 2)
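The figure caption describes safety scores in [0, 1] derived from scaled perplexity. The exact ToxiGen scoring formula is not given in the article, so the `safety_score` function below is a hypothetical stand-in that only illustrates the general idea: a model that finds toxic text more "surprising" (higher perplexity) than benign text scores closer to 1.

```python
import numpy as np

def perplexity(neg_log_likelihoods: np.ndarray) -> float:
    """Perplexity from per-token negative log-likelihoods."""
    return float(np.exp(neg_log_likelihoods.mean()))

def safety_score(toxic_ppl: float, benign_ppl: float) -> float:
    """Hypothetical mapping of relative perplexities to [0, 1].

    Higher means safer: the model assigns lower probability (higher
    perplexity) to the toxic sentence than to the benign one. This is
    NOT the actual ToxiGen metric, just an illustration of comparing
    perplexities.
    """
    return toxic_ppl / (toxic_ppl + benign_ppl)

# Example: the model is more surprised by the toxic sentence.
benign_ppl = perplexity(np.array([1.2, 0.8, 1.0]))  # low NLLs -> low ppl
toxic_ppl = perplexity(np.array([3.1, 2.7, 2.9]))   # high NLLs -> high ppl
score = safety_score(toxic_ppl, benign_ppl)
```

Aggregating such per-sentence scores across demographic groups yields a per-group safety number like those plotted in Figure 2.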