NVIDIA has released Jarvis and NeMo 1.0 for developers and researchers looking to build state-of-the-art machine learning solutions on top of pre-trained models. At NVIDIA GTC 2020 and GTC 2021, NeMo and Jarvis were among the top announcements, drawing interest from natural language processing enthusiasts.
While NVIDIA Jarvis is a speech recognition framework that delivers 90 percent accuracy out of the box, NVIDIA NeMo 1.0 is an open-source toolkit for developing Automatic Speech Recognition (ASR) models.
Jarvis can be fine-tuned with the NVIDIA Transfer Learning Toolkit (TLT), a tool for building production-ready machine learning models without writing code, to support a wide range of domain-specific conversations in healthcare, telecommunications, finance, automotive, and neuroscience. According to NVIDIA, Jarvis was trained on noisy data, multiple sampling rates (including 8 kHz for call centers), a variety of accents, and dialogue, all of which contribute to the model's accuracy.
Today, Jarvis supports English, Japanese, German, and more, enabling products that transcribe different languages in real time. NVIDIA is committed to adding support for additional languages in Jarvis. While you can download Jarvis from the NGC catalog, you need prerequisites such as an NVIDIA Ampere architecture-based GPU and an NVIDIA GPU Cloud (NGC) account.
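To give a sense of how an application would consume Jarvis's real-time transcription service, here is a rough Python sketch of an offline recognition request against a running Jarvis server over gRPC. The module, stub, and message names below are assumptions modeled on Jarvis's protobuf-generated client bindings (the exact identifiers live in the Jarvis client examples), and `sample.wav` is a placeholder audio file.

```python
# Rough sketch of offline speech recognition against a Jarvis server via gRPC.
# NOTE: the jarvis_api module, stub, and message names are assumptions based on
# Jarvis's protobuf-generated Python bindings; consult the official client
# examples for the exact identifiers.
import grpc
import jarvis_api.jarvis_asr_pb2 as jasr           # assumed module name
import jarvis_api.jarvis_asr_pb2_grpc as jasr_srv  # assumed module name

channel = grpc.insecure_channel("localhost:50051")  # default Jarvis server port
client = jasr_srv.JarvisASRStub(channel)            # assumed stub name

with open("sample.wav", "rb") as f:                 # placeholder audio file
    audio_bytes = f.read()

config = jasr.RecognitionConfig(                    # assumed message name
    language_code="en-US",   # Jarvis also ships Japanese and German models
    sample_rate_hertz=16000,
    max_alternatives=1,
)
request = jasr.RecognizeRequest(config=config, audio=audio_bytes)
response = client.Recognize(request)
print(response.results[0].alternatives[0].transcript)
```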
To further support conversational AI, NLP, and Text-to-Speech (TTS) workflows, NVIDIA has released NeMo 1.0, which enables building new NLP models on top of existing ones. Built on PyTorch, PyTorch Lightning, and Hydra, NeMo 1.0 lets researchers leverage one of the most widely used deep learning frameworks.
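For a sense of what that looks like in practice, the minimal sketch below loads one of NeMo's pretrained ASR checkpoints from NGC and transcribes a local audio file. It assumes the NeMo 1.0 toolkit is installed (for example via `pip install nemo_toolkit[asr]`), that `sample.wav` is a placeholder for a 16 kHz mono recording, and that the checkpoint name follows NeMo's NGC naming, which may vary by release.

```python
# Minimal sketch: load a pretrained NeMo ASR model and transcribe a file.
# Assumes nemo_toolkit[asr] is installed; "sample.wav" is a placeholder path.
import nemo.collections.asr as nemo_asr

# Download a pretrained Citrinet checkpoint from NGC; the model name here
# follows NeMo's NGC naming convention and may differ between releases.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_citrinet_512"
)

# transcribe() accepts a list of audio file paths and returns text hypotheses.
transcriptions = asr_model.transcribe(paths2audio_files=["sample.wav"])
print(transcriptions[0])
```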
NeMo 1.0 includes support for bidirectional machine translation between English and five other languages: Spanish, Russian, Mandarin, German, and French. It also comes with new Citrinet and Conformer-CTC ASR models. With NeMo, developers can export most models to NVIDIA Jarvis for production deployment.
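As a hedged illustration of the translation API, the sketch below loads a pretrained English-to-German NeMo model and translates a sentence. The checkpoint name follows NeMo's NGC catalog naming for the 1.0 release and may differ in later versions.

```python
# Minimal sketch: English-to-German translation with a pretrained NeMo model.
# Assumes nemo_toolkit[nlp] is installed; the checkpoint name follows NeMo's
# NGC catalog naming and may vary by release.
from nemo.collections.nlp.models import MTEncDecModel

nmt_model = MTEncDecModel.from_pretrained(model_name="nmt_en_de_transformer12x2")

# translate() takes a list of source sentences and returns translated strings.
translations = nmt_model.translate(
    ["NeMo 1.0 adds bidirectional machine translation."],
    source_lang="en",
    target_lang="de",
)
print(translations[0])
```

Because NeMo models implement an Exportable interface, a trained model can typically be serialized (for example to ONNX with `model.export("model.onnx")`) as the first step of the Jarvis deployment path.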
Check out NeMo 1.0 on GitHub.