Meet OpenHathi, the first LLM Chatbot Superior in Hindi

Image source: Sarvam AI

Indian AI startup Sarvam AI has introduced OpenHathi-Hi-v0.1, the first release in its OpenHathi series of large language models.

The model builds on Llama2-7B and boasts performance on par with GPT-3.5, sometimes even surpassing it, and is tailored specifically for Indic languages.

Image source: Llama

OpenHathi extends the Llama2-7B tokenizer to a vocabulary of 48,000 tokens. The model is then taught to use the new tokens through a meticulous two-phase training process.
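As a rough illustration of what extending a tokenizer vocabulary involves, here is a minimal sketch in plain Python. It is not Sarvam AI's code: the function name `extend_vocab`, the toy vocabulary, the embedding size, and the random initialization scale are all assumptions made for this example.

```python
import random

def extend_vocab(vocab, embeddings, new_tokens, dim, seed=0):
    """Hypothetical helper: append unseen tokens and one random embedding row each."""
    rng = random.Random(seed)
    vocab = dict(vocab)                          # copy so the caller's mapping is untouched
    embeddings = [row[:] for row in embeddings]  # copy the existing rows unchanged
    added = []
    for tok in new_tokens:
        if tok in vocab:
            continue                             # already known: keep its id and row
        vocab[tok] = len(vocab)                  # next free id
        # Assumed initialization: small Gaussian noise, to be trained later.
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        added.append(tok)
    return vocab, embeddings, added

# Toy base vocabulary standing in for the original tokenizer.
base_vocab = {"<unk>": 0, "hello": 1, "world": 2}
base_emb = [[0.0] * 4 for _ in base_vocab]

new_vocab, new_emb, added = extend_vocab(
    base_vocab, base_emb, ["नमस्ते", "दुनिया", "hello"], dim=4
)
```

The new rows start out random, which is why they need further training before the model can use them well.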

The first phase focuses on embedding alignment, which aligns the randomly initialized embeddings of the new Hindi tokens.

Image source: Canva

Next comes the bilingual language modeling phase, which teaches the model to attend across tokens in different languages.

Image source: Canva

Sarvam AI’s rigorous assessments cover not just standard Natural Language Generation tasks but also practical, real-world challenges.

Image source: Canva

These evaluations, comparing OpenHathi against GPT-3.5 with GPT-4 as the referee, consistently highlight OpenHathi’s superior performance in Hindi.
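A judge-based comparison like this usually boils down to tallying verdicts. The sketch below shows one way to compute win rates, assuming each comparison has already been labelled "A", "B", or "tie" by a judge model. The function name `win_rates`, the labels, and the sample verdicts are made up for illustration and are not real evaluation results.

```python
from collections import Counter

def win_rates(verdicts):
    """Return the share of comparisons labelled 'A', 'B', or 'tie'."""
    total = len(verdicts)
    if total == 0:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    return {label: counts.get(label, 0) / total for label in ("A", "B", "tie")}

# Made-up verdicts for illustration only.
sample_verdicts = ["A", "B", "A", "tie", "A"]
rates = win_rates(sample_verdicts)
```

Here `rates["A"]` is the fraction of prompts on which the judge preferred model A.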

Sarvam AI teamed up with academic partners from AI4Bharat, who contributed crucial language resources and benchmarking expertise.

Image source: Canva

The model was also refined in collaboration with KissanAI, using conversational data from a bot that interacts with farmers in multiple languages.

Image source: Canva

Pratyush Kumar and Vivek Raghavan founded Sarvam AI in July 2023. The company raised $41 million in Series A funding.

Image source: LinkedIn

Produced by: Boudhayan Ghosh Designed by: Prathamesh