Koo has introduced KooBERT, a masked language model trained on data from the multilingual micro-blogging social media platform Koo India. The BERT-based pretrained model was built by Koo India in collaboration with AI4Bharat.
In his LinkedIn post, Harsh Singhal, Koo's Head of Machine Learning & AI, said, “KooBERT is a testament to our commitment to inclusivity, diversity, and multilingualism in AI. Trained on a large corpus of Koo's 10+ Indian languages, it’s a significant leap forward in democratizing AI for millions of non-English speakers in India and around the world.”
On the Koo platform, microblogs (called Koos) are limited to 400 characters and available in multiple languages, including Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. The model was trained on a masked language modeling objective over a dataset of multilingual Koos posted between January 2020 and November 2022.
The model can be used for downstream tasks such as toxicity detection and content classification in the supported Indic languages. It can also be used with the sentence-transformers library to create multilingual vector embeddings for other applications.
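Sentence-embedding libraries such as sentence-transformers typically derive a single sentence vector by mean-pooling a model's token embeddings under the attention mask. The article does not give KooBERT's checkpoint name or exact pipeline, so the sketch below illustrates only that generic pooling step, using dummy NumPy arrays in place of real encoder outputs:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    mask = attention_mask[:, None].astype(float)          # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)        # sum over real tokens
    return summed / mask.sum()                            # divide by token count

# Dummy stand-in for a 3-token, 2-dim encoder output; the last token is padding.
tokens = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([1, 1, 0])
embedding = mean_pool(tokens, mask)  # → array([2., 3.])
```

In practice, the per-token vectors would come from running KooBERT over tokenized Koos, and the resulting sentence embeddings could then feed similarity search or classification.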
As with any machine learning model, KooBERT has limitations and biases. It was trained on Koo social media data and may not generalize well to other domains. The model may also inherit biases present in its training data, which can affect its predictions.