Microsoft has launched Phi-2, a small language model (SLM) specialized in text-to-text tasks.
Microsoft’s official account on X states that this model is compact enough to operate seamlessly on laptops or mobile devices.
Phi-2, with 2.7 billion parameters (the connections between artificial neurons), performs on par with significantly larger models such as Meta’s Llama 2-7B and Mistral-7B, each of which contains 7 billion parameters.
Microsoft’s official blog post frames Phi-2 as part of its pursuit of smaller-scale language models that can match the capabilities of much larger ones.
The researchers’ key strategy is prioritizing high-quality training data, focusing on textbook-quality content and synthetic datasets for common-sense reasoning and general knowledge.
They also enrich their dataset with meticulously selected web content emphasizing educational value.
Microsoft also leveraged knowledge transfer from Phi-1.5, its 1.3-billion-parameter predecessor, embedding that model’s learned knowledge into the 2.7-billion-parameter Phi-2. This technique not only expedites training but also substantially improves Phi-2’s benchmark performance.
Turning to the model’s technical details, Phi-2 is trained with a next-word prediction objective on 1.4 trillion tokens drawn from synthetic and web datasets covering NLP and coding.
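To illustrate what a next-word prediction objective means in principle, here is a hypothetical toy sketch: the model learns, from training text, which word is most likely to follow a given word. A simple bigram frequency counter stands in for Phi-2’s neural network; the corpus, function names, and prediction rule are illustrative assumptions, not Microsoft’s implementation.

```python
from collections import Counter, defaultdict

# Toy training text (illustrative assumption, not Phi-2's data).
corpus = "the model predicts the next word the model learns".split()

# Count which word follows each word in the training text.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after `word` in training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "model" follows "the" most often in this corpus
```

A real language model replaces the frequency table with billions of learned parameters and conditions on the entire preceding context rather than a single word, but the training signal is the same: predict the next token.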
@analyticsdrift
Produced by: Boudhayan Ghosh
Designed by: Prathamesh