
Top Robots in India in 2024

The Indian government has continuously invested in cutting-edge technologies to promote robotics and enhance industrial productivity. In 2021, the sales of industrial robots in India surged by 54% and reached a new record, with 4,945 units installed. As a result, the World Robotics Report, released by the International Federation of Robotics (IFR), ranked India 10th in annual robot installations in 2022.

In addition to industries, robots are used extensively in many other sectors, such as medicine, education, agriculture, and hospitality. Let us look at some of the top robots that are changing the picture of robotics in India in 2024. 

Here is a list of the top 12 robots in India in 2024:

1. IRIS

IRIS is India’s first AI-powered humanoid teacher robot. Developed by Makerlabs, it was first deployed at a school in Kerala in March 2024. The robot is created using generative AI technologies and delivers educational content from preschool to high school through an Android app. 

IRIS has a 4-wheel chassis and 5 degrees of freedom, allowing it to move freely and demonstrate various learning activities. It uses visual aids, games, and quizzes to make learning more interactive. It also has a voice assistant that can comprehensively answer questions asked by students.                            

2. KARMI-Bot

KARMI-Bot is a robot built by Asimov, a Kerala-based robotics company. It is used extensively in the healthcare sector to protect healthcare workers from viral infections. The robot can navigate in isolation wards independently to deliver food and medical kits to infected patients. This minimizes direct interaction between medical staff and patients. 

KARMI-Bot has a top speed of 1 m/s and can be monitored remotely through video streaming. It was used in some government hospitals in India during the COVID-19 outbreak. Asimov also developed a variant of the KARMI-Bot called KARMI-CLEAN to facilitate the disinfection of large areas using UV light. 

3. Manav

Manav is India’s first 3D-printed humanoid robot, developed by A-SET Training and Research Institute, Delhi. It is a two-foot-tall humanoid designed primarily for research purposes. The outer body of Manav is 3D-printed from acrylonitrile butadiene styrene (ABS). 

Manav has two degrees of freedom in the head and neck to facilitate sideways and up-and-down movement. It supports WiFi and Bluetooth connectivity and uses binocular vision processing for depth perception. It can also walk, talk, and dance in response to human voice commands. 

4. Athena

Athena is a surveillance robot developed by an Indian company named Kody Technolabs. It is highly vigilant and can be used for round-the-clock security of spaces such as shops, malls, and industrial sites. Its ultra-high-definition cameras give it powerful vision, enabling it to see even in low-light conditions.

Athena has cognitive security capabilities that allow it to differentiate threats from harmless events by analyzing its surroundings. It has facial recognition technology and can identify suspects or offenders. It also has an instant alert mechanism with a two-way intercom system that helps prevent crime before it is committed. 

5. DRDO Daksh

Daksh is a remote-controlled robot created by the Defence Research and Development Organization (DRDO). It can safely track, handle, and destroy hazardous objects such as bombs. Daksh can climb staircases and navigate steep slopes and narrow corridors to reach the target object. 

After reaching its target, it lifts the suspicious object and scans it using a portable X-ray device. If the object is a bomb, Daksh defuses it with its water-jet disrupter. It also carries a shotgun that can break open locked doors, and it can scan vehicles for explosives. With such extensive functionality, Daksh can be dubbed an anti-terror robot that helps the nation fight terrorism. 

6. Mitra

Mitra is a humanoid robot created by Invento Robotics, a Bengaluru-based startup. It was launched in 2017 at the Global Entrepreneurship Summit, where it became famous for greeting Indian PM Narendra Modi and then-US President Donald Trump. Mitra is five feet tall and was built in under a year. 

Since its appearance, Mitra has found applications in various settings, such as banks, movie theatres, malls, airports, and hospitals. After its success, Invento Robotics also launched upgraded versions, Mitra 2 and Mitra 3. These variants have more robust facial recognition features and can interact effectively with humans. 

7. RADA

RADA is an AI-powered robot developed by Vistara Airlines, a joint venture of Tata Sons and Singapore Airlines, to assist airport passengers. It was first deployed at Delhi’s Indira Gandhi International Airport in 2018. RADA can scan boarding passes and provide passengers with information about the weather conditions of the destination city and real-time flight status. 

RADA can rotate 360 degrees as it is built on a four-wheel chassis. It also has three built-in cameras and voice technology that enable it to interact naturally with air passengers. 

8. SSi Mantra

SSi Mantra is a surgical robotic system developed by SS Innovations. It was designed to make surgeries efficient and cost-effective. The first telesurgery performed with SSi Mantra was a robotic cholecystectomy conducted over a distance of five kilometers. 

Since its launch, SSi Mantra has assisted surgeons in several surgeries and has become a popular healthcare robotic system in other countries as well. SS Innovations recently unveiled SSi Mantra 3, an upgraded version of its predecessors. It has a 3D HD headset and 4K vision to enable surgeons to conduct and monitor surgeries efficiently from a distance.

9. IRA

IRA (Interactive Robotic Assistant) is a humanoid robot developed by Asimov Robotics. HDFC Bank first deployed it to assist bank staff in serving customers. The robot can greet customers and guide them to the relevant counters to perform their desired banking operations. 

The bank later launched IRA 2.0, an upgraded version of IRA, in collaboration with Invento Makerspaces and Senseforth Technologies. It can answer banking FAQs and has voice-based navigation capabilities to guide customers through various counters. It also recognizes customers using its facial recognition algorithm. 

10. PuduBot

PuduBot is a robot developed by Pudu Robotics, a service robotics company. It is used for smart delivery in the hospitality sector, makes intelligent voice announcements for marketing, and delivers essential amenities to patients in the healthcare sector. 

PuduBots can immensely improve the operational efficiency of any organization by handling the delivery of goods, allowing workers to direct their efforts toward product development and marketing. They are durable bots that can work 24 hours on just four hours of battery charging. Thus, PuduBot is a highly reliable and durable solution that can enhance the operability of any industry. 

11. BRABO

BRABO, short for ‘Brave Robot,’ is an articulated robot developed by the Tata Group with MSMEs as its focus market. It was designed by TAL, styled by Tata Elxsi, manufactured by Tata AutoComp, and financed by Tata Capital. BRABO was launched in 2017 and is the first ‘Made in India’ robot.

Articulated robots have rotary joints and can mimic human arm movements. Thus, BRABO can perform various industrial tasks, such as sorting with a vision system, press and machine tending, picking, packing, sealing, and welding. It is used extensively in electronics, logistics, food packaging, and the pharmaceutical industry. 

12. Milagrow Robots

Milagrow Humantech, a robotics company in India, manufactures Milagrow Robots at affordable prices. It provides a wide range of products, namely Milagrow iMap, Window Seagull, and RoboTiger, for various domestic purposes, such as floor cleaning, lawn mowing, and pool cleaning. 

Milagrow also manufactures educational robots that enhance students’ learning experiences by helping them learn STEM concepts and cognitive skills. Its body massaging robots facilitate health care. 

Future of Robotics in India

With the emergence of artificial intelligence, the Indian robotics industry is expected to be dominated by AI-powered robots in 2024. Generative AI, a subset of AI, is used globally to program robots. It will help developers focus more on research and development instead of investing much time in coding. 

Predictive AI helps analyze a robot’s performance trajectory, saving the time and resources required for infrastructure maintenance. In the coming years, you will also see increased human-robot collaboration through cobots that assist humans with repetitive or hazardous tasks in industries. 

A mobile manipulator, or MoMa, combines a mobile base, such as a wheeled platform, with a robotic arm that serves as the manipulator. MoMas excel at operations and infrastructure maintenance in heavy industries. 

Another significant trend that will see growth is the assimilation of humanoid robots into daily life. According to research by Goldman Sachs, the humanoid robot market may grow to $38 billion by 2035. The report cites rapid advances in AI and the falling cost of robotic components as the main reasons behind this positive trend. 

Way Forward

India’s robotic landscape is growing rapidly across various sectors, reflecting the country’s increasing aptitude for innovation. The twelve robots featured in this article highlight the numerous ways in which robots can be integrated into our lives. 

From defense to medicine to education, robots are becoming drivers of human and industrial development in India. With continuous AI and machine learning advancements, India will incorporate even more sophisticated robotic systems in different facets of life and industry.


OpenAI’s Partnership with the U.S. AI Safety Institute

OpenAI partners with the U.S. AI Safety Institute to provide early access to its next-generation AI model. This collaboration aims to address safety risks with AI by prioritizing responsible practices in AI development. 

OpenAI has taken a significant step to improve AI safety by providing the U.S. AI Safety Institute early access to its next AI model. OpenAI CEO Sam Altman recently announced on X that the organization will work with the government’s executive body for better safety evaluation. 

In May, OpenAI faced criticism over its internal safety protocols. The organization dissolved its safety team, a move that made headlines and prompted lawmakers to raise concerns.

Reports said the company prioritized launching new features over safety. These incidents led to the resignations of key OpenAI employees Jan Leike and Ilya Sutskever.

In response to these criticisms and allegations, OpenAI said it would remove the clauses in its guidelines that forbade employees from speaking out. The company also plans to set up a safety and security committee. 

The company had previously committed to dedicating 20% of its computing resources to safety research, but that commitment has not been fulfilled. Sam Altman has pledged to honor it and stated that the restrictive terms were removed for all current and former employees. 

OpenAI has also increased its spending on government policy and lobbying compared to the previous year: it spent $260,000 over the whole of last year, versus $800,000 in the first half of 2024.

The news of the partnership came as the proposed bill, “The Future of Innovation Act,” advanced. This bill would make the U.S. AI Safety Institute responsible for setting rules and regulations for AI safety. The executive body, under the Commerce Department, will now work with OpenAI to improve AI safety in the future.

This collaboration marks a significant step in addressing AI safety concerns. It highlights the commitment of both OpenAI and the U.S. AI Safety Institute to promoting responsible AI development now and in the future.


New Course Explores Dual Encoder Models for Semantic Search

Vectara and Ofer Mendelevitch collaborate to offer a short course on enhancing search relevance using advanced embedding techniques.

Semantic search is a technology that enables search engines to understand the underlying intent behind the queries. It is quickly transforming information retrieval by delivering more relevant and accurate search results. 

However, conventional semantic search often falls short in large language model (LLM) applications that rely on a single embedding model. This approach retrieves results that resemble the question rather than relevant answers.

To circumvent this, Vectara and Ofer Mendelevitch have partnered to offer a short course, Embedding Models: From Architecture to Implementation, for all data science enthusiasts. 

The course teaches building, training, and deploying dual encoder models using separate embedding models for questions and answers. This significantly improves matching questions with appropriate answers, enhancing overall search relevance.

Read More: Building High-Quality Datasets with LLMs 

The course also covers the concept of word embeddings and their evolution to BERT, where embeddings consider the surrounding context of each word. Learners will also gain hands-on experience using contrastive loss to build a dual encoder model with one encoder trained to embed questions and the other responses.
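For readers who want a concrete picture, here is a minimal sketch of the dual encoder training objective in Python with PyTorch and Hugging Face Transformers. The model name (bert-base-uncased), the [CLS] pooling, and the in-batch-negatives loss are illustrative assumptions, not the course’s actual code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Two separate encoders: one for questions, one for answers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
question_encoder = AutoModel.from_pretrained("bert-base-uncased")
answer_encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(encoder, texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # Use the [CLS] token's hidden state as the sentence embedding.
    return encoder(**batch).last_hidden_state[:, 0]

def contrastive_loss(questions, answers, temperature=0.05):
    q = F.normalize(embed(question_encoder, questions), dim=-1)
    a = F.normalize(embed(answer_encoder, answers), dim=-1)
    # Score every question against every answer in the batch.
    logits = q @ a.T / temperature
    # The correct answer for question i is answer i (the diagonal);
    # all other answers in the batch act as negatives.
    labels = torch.arange(len(questions))
    return F.cross_entropy(logits, labels)

loss = contrastive_loss(
    ["What is semantic search?", "How do dual encoders work?"],
    ["It retrieves results based on query intent, not keywords.",
     "They embed questions and answers with two separate models."],
)
loss.backward()
```

At inference time, only the answer embeddings need to be indexed ahead of time; incoming questions are embedded on the fly and matched by cosine similarity.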

Lastly, the participants will learn to analyze the impact of dual encoders on search relevance and compare it to retrieval processes using single encoders. This course provides a valuable opportunity for anyone looking to advance their understanding of embedding models and their application in modern search systems.

The best part is that applicants can enroll in this course for free. Here’s the link to apply now!


NVIDIA’s fVDB Transforms Spatial Intelligence for Next-Gen AI

Announced at the SIGGRAPH conference, NVIDIA introduces cutting-edge technology for creating high-fidelity real-time 3D modeling for autonomous systems and climate research.

At SIGGRAPH 2024, NVIDIA introduced fVDB, an advanced deep-learning framework designed to construct incredibly detailed and expansive AI-ready virtual representations of the real world. Built upon the foundation of OpenVDB, an industry-standard library for simulating and rendering complex volumetric data, fVDB has taken a significant leap forward in 3D generative modeling. 

This innovation has opened new doors for industries relying on accurate digital models to train their generative physical AI for spatial intelligence. fVDB effectively converts raw environmental data collected by LiDAR and neural radiance fields (NeRFs) into large-scale virtual replicas that can be rendered in real-time.

With applications spanning autonomous vehicles, urban infrastructure optimization, and disaster management, fVDB has a crucial role in transforming robotics and advanced scientific research.  

NVIDIA’s research team put in tremendous effort to develop fVDB. This framework is already being used to power high-precision models of complex real-world environments for NVIDIA Research, DRIVE, and Omniverse projects.

fVDB facilitates high-performance deep learning applications by integrating NVIDIA-powered AI operators, including convolution, pooling, and meshing into NanoVDB, a GPU-optimized data structure for 3D simulations. This enables the development of sophisticated neural networks tailored for spatial intelligence tasks, such as large-scale point cloud reconstruction and 3D generative modeling.
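To make the sparse-grid idea concrete, here is a toy Python sketch of the coordinate-list layout that sparse voxel frameworks such as fVDB and NanoVDB build on. This is deliberately not the fVDB API, whose actual entry points live in the project’s repository; it only illustrates why storing occupied voxels beats a dense grid at this scale.

```python
import torch

# A LiDAR scan might occupy well under 1% of a 1024^3 volume. A dense grid
# would need 1024**3 (about 1.07 billion) cells; a sparse layout stores
# only the occupied voxels and their features.
coords = torch.randint(0, 1024, (100_000, 3))   # (N, 3) integer ijk indices
features = torch.randn(100_000, 16)             # one feature vector per voxel

# Sparse 3D operators such as convolution need fast neighbor lookups.
# A hash map from coordinates to row indices is the simplest version.
table = {tuple(c.tolist()): i for i, c in enumerate(coords)}

def neighbor_index(voxel, offset):
    """Return the row index of voxel + offset, or -1 if that cell is empty."""
    key = tuple((voxel + torch.tensor(offset)).tolist())
    return table.get(key, -1)

idx = neighbor_index(coords[0], (1, 0, 0))      # the +x neighbor of voxel 0
print(idx, features[idx] if idx >= 0 else "empty cell")
```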

Key features of fVDB include:

  • Larger Scale: It can handle four times larger environments than previous frameworks.
  • Faster Performance: fVDB achieves 3.5 times faster processing speeds than its predecessors.
  • Interoperability: The framework seamlessly handles massive real-world datasets, converting VDB files into full-sized 3D environments.
  • Enhanced Functionality: With ten times more operators than previous frameworks, fVDB simplifies processes that once required multiple deep-learning libraries.

Read more: Harnessing the Future: The Intersection of AI and Online Visibility.

NVIDIA is committed to making fVDB accessible to a wide range of users. The framework will soon be available as NVIDIA NIM inference microservices, enabling seamless integration into OpenUSD workflows and the NVIDIA Omniverse platform. 

The upcoming microservices include:

  • fVDB Mesh Generation NIM: For generating digital 3D environments of the real world.
  • fVDB NeRF-XL NIM: To create large-scale NeRFs within the OpenUSD framework using Omniverse Cloud APIs.
  • fVDB Physics Super-Res NIM: It will be useful in performing super-resolution to create high-resolution physics simulations using OpenUSD.

These microservices will be crucial for generating AI-compatible OpenUSD geometry within the NVIDIA Omniverse platform, which is designed for industrial digitalization and generative physical AI applications.

NVIDIA’s commitment to advancing OpenVDB is evident through its efforts to enhance this open-source library. In 2020, the company introduced NanoVDB and brought GPU support to OpenVDB, boosting performance and simplifying development. This paved the way for real-time simulation and rendering. 

In 2022, NVIDIA launched NeuralVDB, which expanded NanoVDB’s capabilities by incorporating ML to compress the memory footprint of VDB volumes by up to 100 times. The addition allowed developers, creators, and other users to interact comfortably with extremely large datasets.

NVIDIA is making fVDB available through an early access program for its PyTorch extension. It will also be integrated into the OpenVDB GitHub repository, ensuring easy access to this technology.

To better understand fVDB and its potential impact, watch NVIDIA founder and CEO Jensen Huang’s fireside chats at SIGGRAPH. These videos provide further insights into how accelerated computing and generative AI drive innovation and create new opportunities across industries.


Google Launches Gemini 1.5 Flash: Evolving AI Interactions

Google upgraded Gemini by launching 1.5 Flash, which supports a wide array of languages and delivers faster performance and quicker responses. This upgrade will prove advantageous for teenagers and mobile app users.

A few months back, Google announced the release of Gemini 1.5 Flash. This chatbot’s latest iteration promises significant improvements in speed and performance. This feature upgrade is designed to enhance user experience by providing quicker responses. 

Although Gemini 1.5 Flash is a lighter version than Gemini 1.5 Pro, it became a notable upgrade because of its text summarization, image processing, and real-time analytics capabilities. 

With these new features, Gemini now helps users work with diverse data types, such as images, voice, text, PDFs, and others, within a single framework. It supports the handling of images through visual recognition, voice through audio-to-text conversion, and other data types with advanced capabilities.
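As a rough illustration of this multimodal handling, here is a minimal sketch using Google’s google-generativeai Python SDK; the API key and image file are placeholders, and model availability may vary by region.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

# Text-only prompt.
reply = model.generate_content("Summarize the benefits of fast, lightweight AI models.")
print(reply.text)

# Multimodal prompt: an image and a text question in the same request.
image = Image.open("ticket.png")                 # placeholder file
reply = model.generate_content([image, "What details are shown in this image?"])
print(reply.text)
```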

The most important feature of 1.5 Flash is its accessibility. It is now accessible to users in more than 240 countries and supports more than 40 languages, ensuring that many users can benefit from its advancements. 

Besides these linguistic and global-reach innovations, Gemini 1.5 Flash is also accessible to teenagers. Earlier, Google’s policies placed age restrictions on access. Now, teens can also use this version for their school and college subjects and projects. 

Google designed Gemini with a strong focus on responsibility and user safety. It has also introduced security policies to ensure the safe use of AI by teenagers and set policies to handle sensitive topics appropriately.

Google’s Gemini 1.5 Flash offers significant benefits in the current digital age. Offering the 1.5 Flash model to free users shows Google’s vision of making AI accessible to as many people as possible. This move supports innovation and boosts efficiency across various fields. 

The enhanced features, such as accessibility and performance of Gemini 1.5 Flash, will ensure that Google is ready to set new standards in the field of AI.


Google DeepMind Welcomes 2 Billion Parameter Gemma 2 Model

DeepMind took a major leap forward in AI innovation by launching a 2 billion-parameter model for its Gemma 2 family.

As the demand for advanced AI grows, there is a need for models that balance high performance with accessibility across various platforms. Many existing models are too resource-intensive for widespread use, limiting their application to high-end infrastructure.

To address this gap, Google DeepMind has introduced the Gemma 2 2B model to deliver outsized results. This article highlights the significance of the new addition to the Gemma 2 model family. 

Inside Google DeepMind: A Short Glimpse into the Future of Technology

Google DeepMind, a subsidiary of Google, is a cutting-edge AI research lab renowned for deep learning and reinforcement learning. It gained global recognition in 2016 when its AlphaGo program defeated a world champion at the board game Go. Following this notable achievement, DeepMind has continued to innovate with a series of AI models, including Gato, Sparrow, Chinchilla, Gemini, and more.  

Gemma: A Game Changer in AI-Language Models

On February 21st, 2024, DeepMind’s Gemma launched with a 7 billion parameter size suitable for desktop computers and small servers. Gemma is a family of lightweight, open-source large language models built on the same research and technology used to create Google Gemini. It is a text-to-text, decoder-only AI model available in English and comes with open weights for instruction-tuned and pre-trained versions. 

Read More: Harnessing the Future: The Intersection of AI and Online Visibility 

Gemma’s second generation, Gemma 2, was released in June. It includes two sizes: 9 billion (9B) parameters for higher-end desktop PCs and 27 billion (27B) parameters for large servers or server clusters. 

To mark a leap forward in AI innovation, DeepMind announced a new 2 billion (2B) parameter version of the Gemma 2 model on July 31st, 2024. The Gemma 2 series’ 2B parameter model is designed for CPU usage and on-device applications. It has a more compact parameter size than the 9B and 27B versions. Still, it can deliver best-in-class performance for various text generation tasks, including question answering, summarization, and reasoning. 

Top-Tier Performance of 2B Gemma 2 Parameter Model 

The 2B Gemma 2 parameter model offers powerful capabilities for the generative AI field. Here are some key highlights:

  • Flexible: The Gemma 2 model, with its 2 billion parameters, can run efficiently on a wide range of hardware platforms. These include data centers, local workstations, laptops, edge computing devices, and cloud platforms with Vertex AI and Google Kubernetes Engine (GKE). 
  • Integration for Streamlined Development: Gemma 2 2B allows you to integrate seamlessly with Keras, Hugging Face, NVIDIA Nemo, Ollama, and Gemma.cpp. It will soon support MediaPipe. 
  • Exceptional Performance: The company claims that Gemma 2 2B outperforms all GPT-3.5 models on the LMSYS Chatbot Arena leaderboard, a benchmark for evaluating AI chatbot performance. 
  • Open Standard: The 2B model is available under commercial-friendly Gemma terms for commercial and research use. 
  • Easily Accessible: The 2B Gemma 2 model’s lightweight design allows it to operate on the free tier of the NVIDIA T4 deep learning accelerator in Google Colab. This makes advanced AI accessible for experimentation and development without requiring high-end hardware. 
  • Improved Efficiency: Gemma 2 2B has been optimized using NVIDIA’s TensorRT-LLM library to improve efficiency and speed during inference. 
  • Continuous Learning through Distillation: The 2B model leverages knowledge distillation, learning from larger models by mimicking their behavior. This allows the smaller model to achieve impressive performance despite its size (a minimal sketch of the objective follows this list). 
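Here is a minimal Python sketch of the distillation objective mentioned in the last point, assuming generic teacher and student logits from any pair of language models; it is not Google’s training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then push the student
    # toward the teacher with KL divergence. Scaling by T^2 keeps gradient
    # magnitudes comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

# Toy example: a batch of 4 token positions over a 32-token vocabulary.
student_logits = torch.randn(4, 32, requires_grad=True)
teacher_logits = torch.randn(4, 32)              # frozen teacher predictions
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```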

A Quick Look At Gemma 2B Model Training, Preprocessing, and Evaluation

The dataset for training Gemma 2 models includes web documents, mathematical text, code, and more. The 2B parameter model was trained on 2 trillion tokens using Tensor Processing Unit (TPU) hardware, JAX, and ML Pathways. To ensure quality, rigorous preprocessing methods, such as CSAM filtering and sensitive data filtering, were applied. 

The 2B model was evaluated based on text generation benchmarks, such as MMLU, BoolQ, MATH, HumanEval, and more. It was also assessed for ethics and safety using structured evaluations and internal red-teaming testing methods. 

Gemma 2B Model Intended Usage

  • Text Generation: The 2B model helps in creating various types of content, including poems, scripts, code, marketing materials, email drafts, and so on.
  • Text Summarization: The 2B Gemma 2 model can produce concise summaries for research papers, articles, text corpus, or reports. 
  • Chatbots and Conversational AI: Enhance conversational interfaces for customer service, virtual assistants, and interactive applications.
  • NLP Research: The 2B model provides a foundation for researchers to test Natural Language Processing (NLP) techniques, develop algorithms, and advance the field.
  • Language Learning Tools: The model facilitates interactive language learning, including grammar correction and writing practice.
  • Knowledge Exploration: The Gemma 2B model enables researchers to analyze large text collections and generate summaries or answer specific questions.

New Additions to Gemma 2 Model

DeepMind is adding two new models to the Gemma 2 family. Let’s take a brief look at them:

  • ShieldGemma: It consists of safety classifiers designed to identify and manage harmful content in AI model inputs and outputs. ShieldGemma is available in various sizes; it targets hate speech, harassment, sexually explicit material, and dangerous content.
  • Gemma Scope: Gemma Scope is focused on transparency. It features a collection of sparse autoencoders (SAEs), specialized neural networks that clarify the complex inner workings of the Gemma 2 models. These SAEs help users understand how the models process information and make decisions. More than 400 freely available SAEs cover all layers of the Gemma 2 2B model (a toy sketch of an SAE follows this list).
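As a toy illustration of what a sparse autoencoder does, here is a minimal PyTorch sketch; the layer sizes and the L1 penalty are illustrative assumptions, not Gemma Scope’s actual architecture.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=256, d_features=4096):
        super().__init__()
        # Expand each activation vector into many candidate "features"...
        self.encoder = nn.Linear(d_model, d_features)
        # ...and reconstruct the original activation from the few that fire.
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations):
        features = torch.relu(self.encoder(activations))
        return self.decoder(features), features

sae = SparseAutoencoder()
activations = torch.randn(8, 256)   # toy stand-in for model activations
reconstruction, features = sae(activations)

# Training minimizes reconstruction error plus an L1 sparsity penalty, so
# only a handful of features activate for any input; that sparsity is what
# makes individual features interpretable.
loss = ((reconstruction - activations) ** 2).mean() + 1e-3 * features.abs().mean()
loss.backward()
```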

How to Get Started?

To get started, download Gemma 2 2B from Kaggle, Hugging Face, or the Vertex AI Model Garden, or try its features through Google AI Studio.
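For example, a minimal loading sketch with Hugging Face Transformers might look like the following; it assumes you have accepted the Gemma license on the Hub and uses the instruction-tuned checkpoint published at launch, google/gemma-2-2b-it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"   # instruction-tuned 2B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Explain knowledge distillation in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```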

Key Takeaways

Google DeepMind has upgraded the Gemma 2 model with a new 2 billion parameter version. Released on July 31st, 2024, this model is designed for on-device applications, offering efficient performance in tasks like text generation, summarization, and reasoning. It operates well on diverse hardware platforms, including local workstations and cloud services. The Gemma 2 2B model is optimized with NVIDIA’s TensorRT-LLM library and utilizes model distillation for improving performance.


OpenAI Enhances ChatGPT with Advanced Voice Mode: Talk and Explore

OpenAI’s new voice mode feature transforms ChatGPT for intuitive, real-time interactions with interruption capabilities. It is a significant advancement that enhances the generative AI experience. 

On July 31st, 2024, OpenAI introduced a notable update to its widely adopted generative AI technology, ChatGPT. Once known for its ability to respond to text prompts, ChatGPT now offers more natural, real-time voice conversations with its advanced voice mode feature.

The new voice mode capabilities allow you to talk to ChatGPT and get instant responses to your voice prompts without delay. This feature detects and responds based on your emotions and non-verbal cues.

To improve the flow of the conversation, you can even interrupt ChatGPT responses while it is speaking. These significant advancements make your conversations with ChatGPT feel more genuine and engaging than before.  

The new voice mode feature started rolling out to only a small group of ChatGPT-4o (Plus) users. However, Mira Murati, OpenAI’s Chief Technology Officer, announced on X that ChatGPT’s voice mode will be made available to all Plus users very soon.


Read More: OpenAI Unveils DALL-E 3, Latest Version of its Text-to-image Tool DALL-E

The company has tested ChatGPT-4o’s voice features with over 100 external experts across 45 languages. Advanced voice mode is currently available in the ChatGPT apps for Android and iOS. A comprehensive report on GPT-4o’s capabilities, limitations, and safety assessment is expected to be released in the first week of August. 

The company postponed the launch of the realistic voice conversation feature from late June to July. OpenAI said the delay was to improve the AI model’s ability to recognize and decline some content while enhancing user experience and preparing its infrastructure for wider use. 

As the AI industry grows, OpenAI’s effort to integrate advanced voice features aligns with its strategy to stay ahead in the competitive generative AI market.


Meta Unveils SAM 2 to Enhance AI-enabled Object Segmentation Experience

SAM 2 is trained on the SA-V dataset that contains 51,000 real-world videos and more than 600,000 masklets. This dataset also consists of annotations for whole and partial objects to overcome challenges such as object occlusion, disappearance, or reappearance.

On July 29, 2024, Meta announced the release of its new AI-powered Segment Anything Model 2 (SAM 2) for object segmentation in images and videos. Backed by the success of its predecessor, SAM, which was designed for image segmentation, SAM 2 can detect and segment objects in images and videos.


Object segmentation is a computer vision technique that separates images and video frames into distinct groups of pixels or segments to identify objects. It is most commonly used for image processing in self-driving vehicles, remote sensing, medical imaging, and document scanning. 

Released under the Apache 2.0 license, SAM 2 can be prompted to segment any object in images and videos, including objects it has not seen previously. It has been trained on the SA-V dataset, which contains 51,000 real-world videos and more than 600,000 masklets. The ability to track fast-moving and dynamic objects makes SAM 2 suitable for object segmentation in videos. 
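A minimal sketch of prompting SAM 2 on a single image might look like this, assuming Meta’s open-sourced sam2 Python package; entry points can differ between releases, and the checkpoint name, image path, and click coordinates are placeholders.

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load a pretrained checkpoint (name assumed from Meta's release).
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("street_scene.jpg").convert("RGB"))  # placeholder
predictor.set_image(image)

# Prompt with a single foreground click at (x, y); SAM 2 returns candidate
# masks with confidence scores, even for objects it has never seen.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[420, 310]]),
    point_labels=np.array([1]),   # 1 marks a foreground point
)
print(masks.shape, scores)
```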

Read More: Safeguarding Digital Spaces: The Imperative of Image Moderation

Meta announced that SAM 2 will revolutionize image and video-based content creation as it will simplify editing by automating segmentation using artificial intelligence. It is six times faster than its predecessor and will give users a better immersive experience in augmented reality (AR) and virtual reality (VR) applications.

Keeping up with its vision of open-source AI, Meta has open-sourced SAM 2 and the SA-V dataset on which the model was trained. 

SAM was first introduced in 2023 as an AI model for image object segmentation. It was trained on the SA-1B dataset, which contains 1.1 billion segmentation masks collected from nearly 11 million licensed, privacy-preserving images.

Since its launch, SAM has become highly popular as a segmentation tool in content creation, medicine, marine sciences, and satellite imagery. The success of SAM motivated Meta to unveil its upgraded version.

The AI landscape is advancing rapidly, and the release of SAM 2 will provide a much-needed push toward developing more efficient media processing tools. Meta’s vision of open-source AI has further raised the expectations of having easier access to more sophisticated AI solutions in the future.


Torchchat: PyTorch’s Library Transforming LLM Inference Across Devices

Torchchat, an advancement from PyTorch, enhances capabilities for deploying large language models such as Llama across various devices.

PyTorch introduced Torchchat, a cutting-edge library designed to revolutionize the deployment of large language models (LLMs) like Llama 3 and 3.1. It supports deployment across multiple platforms, including laptops, desktops, and mobile devices.

Torchchat extends its support for additional environments, models, and execution modes and offers functions for export, quantization, and evaluation in an intuitive manner. It delivers a comprehensive solution for developing local inference systems.

This development enables PyTorch to provide a more versatile and comprehensive toolkit for AI deployment. Torchchat provides a well-structured LLM deployment approach that is organized into three key areas. 

For Python, Torchchat features a REST API accessible through a Python CLI or web browser, simplifying how developers manage and interact with LLMs. In a C++ environment, Torchchat creates high-performance desktop binaries using PyTorch’s AOTInductor backend. For mobile devices, it exports .pte binaries for efficient on-device inference.
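As an illustration of the REST mode, a Python client might look like the sketch below; the port, route, and model alias are assumptions based on the project’s OpenAI-style server, so check the Torchchat README for the exact server command and endpoint in your version.

```python
import requests

# Assumes a Torchchat server is already running locally, started with the
# project's server command and exposing an OpenAI-style chat route.
response = requests.post(
    "http://localhost:5000/v1/chat/completions",   # assumed local endpoint
    json={
        "model": "llama3.1",                        # assumed model alias
        "messages": [{"role": "user", "content": "Hello from Torchchat!"}],
    },
    timeout=60,
)
print(response.json())
```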

Read More: Zuckerberg announces PyTorch Foundation to Accelerate Progress in AI Research.

Torchchat has impressive performance metrics across various device configurations. 

On laptops like the MacBook Pro M1 Max, Torchchat achieves up to 17.15 tokens per second for Llama 2 using MPS eager mode with the int4 data type. This demonstrates Torchchat’s efficiency on premium laptops. 

On desktops with an A100 GPU on Linux, Torchchat reaches speeds of up to 135.16 tokens per second for Llama 3 in int4 mode. It leverages CUDA for optimal performance on powerful desktop systems. 

For mobile devices, Torchchat delivers over 8 tokens per second on devices like Samsung Galaxy S23 and iPhone. Torchchat also uses 4-bit GPTQ through ExecuTorch, bringing advanced AI capabilities to mobile platforms. 

These performance metrics highlight Torchchat’s capabilities of efficiently running LLMs across various devices, ensuring that advanced AI technologies are accessible and effective on different platforms.


Bollywood Singer Arijit Singh Wins Copyright Case Against AI Platforms

The growing misuse of artificial intelligence technologies and the exploitation of famous personalities’ identities for self-serving motives lead to economic and social losses.

On July 31st, 2024, the Bombay High Court granted interim relief to Bollywood singer Arijit Singh in his copyright lawsuit against artificial intelligence platforms copying his voice.

Arijit’s name, image, voice, and personality traits were being exploited without authorization. Justice R.I. Chagla noted that these traits are protectable under personality and publicity rights.

The court also discussed the misuse of artificial intelligence technologies, which can take away individuals’ control over their own image and likeness. This makes it difficult for people to stop others from using their identity for malicious purposes.

Read More: Swaayatt Robots Raises $4M

The Bombay High Court noted that AI technologies threaten the jobs of original artists. Using a performer’s identity to attract mass attention to websites and events puts individuals’ rights in danger.

Apart from this incident, Arijit’s identity was used without authorization by various other parties. A pub in Bangalore advertised an event using his name and photos, and a business owner sold merchandise online featuring the artist’s photo.

Singh’s lawyer argued that the artist has exclusive rights to control how his personal information is used. He also stated that the unauthorized use of Arijit’s name could harm his reputation and violate his moral rights under Section 38-B of the Copyright Act, 1957.
