NVIDIA Unveils Mistral-NeMo-Minitron 8B, a Miniaturized Version of MistralAI NeMo 12B Model

www.analyticsdrift.com

Image source: Analytics Drift

After releasing the Mistral NeMo 12B model in collaboration with Mistral AI, NVIDIA has now introduced Mistral-NeMo-Minitron 8B, a small language model that is a compressed version of the original.

Image source: NVIDIA

NVIDIA launches Mistral-NeMo-Minitron 8B model

Mistral NeMo 12B, a 12-billion-parameter LLM, was developed for enterprise applications such as chatbots, summarization, and multilingual tasks.


Mistral NeMo 12B: A cutting-edge enterprise AI model

An SLM is a specialized AI model trained on a smaller dataset than an LLM. It is suited to specific tasks, such as marketing automation or customer service.

Image source: Canva

What is a small language model (SLM)?

Bryan Catanzaro, NVIDIA's VP for applied deep learning research, said, “We have used pruning to shrink 12 billion parameters to 8 billion. Distillation was used to improve model accuracy.”


Pruning and Distillation for Model Optimization
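NVIDIA's approach prunes structural components of the network guided by importance scores; as a minimal illustration of the underlying idea, the sketch below (an assumption for illustration, not NVIDIA's actual method) shows simple magnitude pruning, which zeroes out the smallest-magnitude weights in a layer:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is removed."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to zero out
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, 0.5)  # roughly half of the 16 weights are zeroed
```

In practice, model compression like Minitron's removes entire neurons, attention heads, or layers rather than individual weights, so the pruned model stays dense and fast on real hardware.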

Mistral-NeMo-Minitron 8B can run on an NVIDIA RTX-powered workstation. It was distilled using NVIDIA NeMo, a platform for building generative AI applications, which transfers the larger model's learned capabilities to the smaller one.


Leveraging NVIDIA RTX and NeMo for optimized performance
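Distillation trains the smaller student model to match the larger teacher's output distribution rather than only the hard labels. A common formulation (a generic sketch, not NeMo's specific implementation) is the KL divergence between temperature-softened teacher and student predictions:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with optional temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, temperature) + 1e-12)
    # Standard KD scales the KL term by T^2 to keep gradient magnitudes comparable
    return float((p_t * (log_p_t - log_p_s)).sum(axis=-1).mean() * temperature ** 2)

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.0, 0.0, 0.0]])
loss = distillation_loss(student, teacher)  # positive: distributions differ
```

A higher temperature softens both distributions, exposing the teacher's relative preferences among unlikely tokens, which is where much of the transferred "dark knowledge" lives.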

Small enterprises can deploy SLMs with limited resources, achieving LLM-like accuracy at lower costs. Mistral-NeMo-Minitron 8B was developed with this goal in mind.


Why was Mistral-NeMo-Minitron 8B developed?

Developers can further compress the Mistral-NeMo-Minitron 8B model to run on smartphones using NVIDIA AI Foundry, which supports pruning and distillation.


Customization using NVIDIA AI Foundry