Tuesday, July 23, 2024

Koo Introduces BERT-based Pretrained Model KooBERT 

The model can be used to perform downstream tasks like content classification, toxicity detection, and more for supported Indic languages.

Koo has introduced KooBERT, a masked language model trained on data from the multilingual micro-blogging social media platform Koo India. This BERT-based pretrained model was built as a collaboration between Koo India and AI4Bharat.

In his LinkedIn post, Head of Machine Learning & AI Harsh Singhal said, “KooBERT is a testament to our commitment to inclusivity, diversity, and multilingualism in AI. Trained on a large corpus of Koos in 10+ Indian languages, it’s a significant leap forward in democratizing AI for millions of non-English speakers in India and around the world.”

On the Koo platform, microblogs (Koos) are limited to 400 characters and are available in multiple languages, including Assamese, Bengali, English, Gujarati, Marathi, Oriya, Punjabi, Tamil, Hindi, Kannada, Malayalam, and Telugu. The model was trained with a masked language modeling objective on a dataset of multilingual Koos posted from Jan 2020 to Nov 2022.
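For readers unfamiliar with masked language modeling, the idea can be sketched in a few lines of Python. The snippet below follows the masking recipe from the original BERT setup (pick roughly 15% of tokens; of those, replace 80% with `[MASK]`, 10% with a random token, and leave 10% unchanged). Whether KooBERT used exactly these ratios is an assumption, and the function name `mask_tokens` and the toy vocabulary are illustrative only.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) using a BERT-style masking recipe.

    labels[i] holds the original token where the model must predict it,
    and None where the position is not scored.
    The 15% / 80-10-10 split is the standard BERT default, assumed here.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # position is scored
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: random vocabulary token
            else:
                masked.append(tok)                # 10%: keep the original token
        else:
            masked.append(tok)                    # untouched position
            labels.append(None)                   # position is not scored
    return masked, labels

# Toy example: a made-up tokenized post and a tiny made-up vocabulary.
tokens = ["koo", "is", "a", "multilingual", "platform"]
vocab = ["koo", "is", "a", "multilingual", "platform", "india"]
masked, labels = mask_tokens(tokens, vocab, mask_prob=0.5, seed=1)
print(masked)
print(labels)
```

During pretraining, the model sees the masked sequence and is penalized only at the positions whose label is not `None`.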


This model can be used to perform downstream tasks like toxicity detection, content classification, and more for supported Indic languages. It can also be used with the sentence-transformers library to create multilingual vector embeddings for other uses.
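To make the embedding use case concrete: when a plain BERT-style model is wrapped by sentence-transformers without its own pooling configuration, the library's default is to average the token vectors into one sentence vector (mean pooling), and sentences are then typically compared by cosine similarity. The sketch below shows those two steps in plain Python. The token vectors are made-up numbers standing in for model output, and the helper names `mean_pool` and `cosine` are hypothetical, not part of any library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention_mask entry is 1 (padding is ignored)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up 2-dimensional token vectors; the last row of each is padding.
post_a = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
post_b = mean_pool([[2.0, 3.0], [2.0, 3.0], [0.0, 0.0]], [1, 1, 0])
print(post_a, post_b, cosine(post_a, post_b))
```

In practice the token vectors would come from the pretrained model, and the resulting sentence vectors could feed a similarity search or a lightweight classifier.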

As with any machine learning model, KooBERT has limitations and biases. It was trained on Koo social media data and may not generalize well to other domains. It may also have inherited biases from the data it was trained on, which can affect its predictions.


Sahil Pawar
I am a graduate with a bachelor's degree in statistics, mathematics, and physics. I have been working as a content writer for almost 3 years and have written for a plethora of domains. Besides, I have a vested interest in fashion and music.

