NVIDIA Introduces a Miniaturized Version of Mistral NeMo 12B

Mistral-NeMo-Minitron 8B is a scaled-down version of the Mistral NeMo 12B model. It was created by pruning the model from 12 billion parameters to 8 billion. The pruned model was then distilled to improve its accuracy using NVIDIA NeMo, a platform for developing generative AI applications.

By Analytics Drift

August 30, 2024

On August 21, 2024, NVIDIA announced the release of Mistral-NeMo-Minitron 8B, a small language model (SLM) and a miniature version of the previously released Mistral NeMo 12B model. NVIDIA had unveiled Mistral NeMo 12B, a cutting-edge LLM developed in collaboration with Mistral AI, on July 18, 2024. Mistral NeMo 12B can be deployed in enterprise applications to support chatbots, summarization, and multilingual tasks.

Mistral-NeMo-Minitron 8B has 8 billion parameters, compared with the 12 billion of Mistral NeMo 12B. As a small language model, it belongs to a class of specialized AI models trained on smaller datasets than those used for LLMs.

SLMs are usually tailored to specific tasks such as sentiment analysis, basic text generation, and classification. They can run in real time on workstations and laptops. Small organizations with limited resources for LLM infrastructure can deploy SLMs to use generative AI capabilities at lower cost.

Mistral-NeMo-Minitron 8B is small enough to run on an NVIDIA RTX-powered workstation. At the same time, it excels across various benchmarks for virtual assistants, chatbots, coding, and education applications.
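
As a rough illustration of why the smaller parameter count matters for workstation deployment, the sketch below estimates the memory needed just to hold the model weights. It assumes exactly 12 and 8 billion parameters and 16-bit weights; the function name and the precision table are hypothetical helpers for this example, and the estimate ignores activations, the attention cache, and framework overhead.

```python
# Rough estimate of weight storage only; real memory use is higher.
# Assumption: 2 bytes per parameter (16-bit weights), not stated in the article.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, precision="fp16"):
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name, params in [("Mistral NeMo 12B", 12e9), ("Mistral-NeMo-Minitron 8B", 8e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at fp16")
```

Under these assumptions the 8B model needs about 16 GB for its weights versus about 24 GB for the 12B model, and lower-precision formats would shrink both figures further.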

Bryan Catanzaro, NVIDIA’s vice president of applied deep learning research, said, “We have combined two AI optimization methods in this model. One is pruning to reduce parameters from 12 billion to 8 billion, and the other is distillation to transfer learnings of the Mistral NeMo 12B model to the Mistral-NeMo-Minitron 8B model. This helps the model to deliver accurate results similar to LLM at lower computational costs.”

The model development team first pruned the model, a process that shrinks a neural network by removing the weights that contribute least to its accuracy. The pruned model was then retrained through distillation to recover the accuracy lost to pruning.
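
The two steps can be sketched in a few lines of Python. This is a toy illustration, not NVIDIA's recipe: the article does not say how weight importance was measured, so the sketch uses weight magnitude as a stand-in, and it expresses distillation as a KL-divergence loss between softened teacher and student output distributions, which is one common formulation. All function names, the example weights, and the temperature value are assumptions made for this example.

```python
import math


def prune_by_magnitude(weights, keep_fraction):
    """Zero out the smallest-magnitude weights, keeping about `keep_fraction` of them."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(weights) * keep_fraction))
    # Rank indices by absolute weight value, largest first, and keep the top n_keep.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature that softens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy "layer": keep 8/12 of the weights, mirroring the 12B -> 8B reduction.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = prune_by_magnitude(weights, keep_fraction=8 / 12)
print(pruned)

# The student is trained to drive this loss toward zero.
print(distillation_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
```

In a real training loop the loss would be minimized by gradient descent over the student's remaining weights, so the pruned model learns to reproduce the larger model's output distribution.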

The distillation of Mistral-NeMo-Minitron 8B was performed using NVIDIA NeMo, a platform for developing generative AI applications. Developers can compress the model further for smartphone use by applying pruning and distillation through NVIDIA AI Foundry. The resulting compressed model is built using a fraction of the parent model's training data and infrastructure while still offering high accuracy.

NVIDIA has emerged as a significant player among companies offering AI services. Its products, especially its AI chips, are increasingly being adopted for a range of applications, contributing to a 170% surge in the company's share value so far this year. The launch of Mistral-NeMo-Minitron 8B adds further momentum to NVIDIA's strategy of diversifying its AI services.
