NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS). The primary objective of NeMo is to help researchers in industry and academia reuse prior work (code and pretrained models) and to make it easier to create new conversational AI models.
A NeMo model is composed of building blocks called neural modules. The inputs and outputs of these modules are strongly typed with neural types, which enable automatic semantic checks between connected modules.
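The idea of typed module connections can be illustrated with a minimal conceptual sketch. Note that the `NeuralType`, `TypedModule`, and `connect` names below are illustrative stand-ins, not NeMo's actual API:

```python
# Conceptual sketch (not the real NeMo API) of typed module composition:
# each module declares the neural type of its input and output, and a
# semantic check runs automatically when two modules are connected.

class NeuralType:
    def __init__(self, axes, element_type):
        self.axes = axes                  # e.g. ("batch", "time")
        self.element_type = element_type  # e.g. "AudioSignal"

    def compatible_with(self, other):
        return (self.axes == other.axes
                and self.element_type == other.element_type)

class TypedModule:
    """A module that declares the neural types of its input and output."""
    def __init__(self, name, input_type, output_type):
        self.name = name
        self.input_type = input_type
        self.output_type = output_type

def connect(producer, consumer):
    """Semantic check: producer's output type must match consumer's input type."""
    if not producer.output_type.compatible_with(consumer.input_type):
        raise TypeError(f"{producer.name} -> {consumer.name}: type mismatch")
    return (producer, consumer)

audio = NeuralType(("batch", "time"), "AudioSignal")
spectrogram = NeuralType(("batch", "freq", "time"), "Spectrogram")
logits = NeuralType(("batch", "time", "vocab"), "LogProbs")

preprocessor = TypedModule("preprocessor", audio, spectrogram)
encoder = TypedModule("encoder", spectrogram, logits)

connect(preprocessor, encoder)  # OK: the types line up
# connect(encoder, preprocessor) would raise a TypeError
```

Catching such mismatches at wiring time, rather than at runtime deep inside a forward pass, is what makes typed modules convenient for reusing prior work.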
NeMo Megatron is an end-to-end platform that delivers high training efficiency across thousands of GPUs and makes it practical for enterprises to deploy large-scale NLP. It provides capabilities to curate training data, train large-scale models with up to trillions of parameters, and deploy them for inference.
It performs data curation tasks such as formatting, filtering, deduplication, and blending that can otherwise take months. It includes state-of-the-art parallelization techniques such as tensor parallelism, pipeline parallelism, sequence parallelism, and selective activation recomputation to scale models efficiently.
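To give a feel for one of these techniques, the following toy sketch (an assumption for illustration, not NeMo Megatron's implementation) shows the core idea of tensor parallelism: a linear layer's weight matrix is split column-wise across "devices", each device computes a partial output, and the partial outputs are gathered back together. Plain Python lists stand in for GPU tensors:

```python
# Toy tensor-parallelism sketch: shard a weight matrix by columns,
# compute partial matmuls independently, then concatenate the results.

def matmul(x, w):
    """x: length-n vector, w: n-by-m matrix -> length-m vector."""
    rows, cols = len(w), len(w[0])
    return [sum(x[i] * w[i][j] for i in range(rows)) for j in range(cols)]

def split_columns(w, parts):
    """Split a weight matrix column-wise into `parts` equal shards."""
    cols = len(w[0])
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# Each "device" multiplies the input by its own shard of the weights ...
shards = split_columns(w, parts=2)
partials = [matmul(x, shard) for shard in shards]

# ... and an all-gather step concatenates the partial outputs.
y_parallel = [v for partial in partials for v in partial]
assert y_parallel == matmul(x, w)  # same result as the unsharded layer
```

In a real system each shard lives on a different GPU and the gather is a collective communication op; the arithmetic, however, decomposes exactly as above.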
These models can be exported to the NVIDIA Triton™ Inference Server to run large-scale NLP models on multiple GPUs and multiple nodes.
NeMo Megatron is optimized to run on NVIDIA DGX systems on-premises as well as in the cloud.
NeMo is built on top of PyTorch and PyTorch Lightning, providing an easy path for researchers to develop and integrate with modules they are already comfortable with. PyTorch and PyTorch Lightning are open-source Python libraries that provide building blocks for composing models.
To give researchers the flexibility to customize models and modules easily, NeMo integrates with the Hydra framework. Hydra is a popular open-source framework that simplifies configuring complex applications such as conversational AI models.
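As a sketch of what this looks like in practice, a Hydra-style YAML configuration for a training run might resemble the following; the field names below illustrate the pattern of nested, overridable groups and are not an exact NeMo schema:

```yaml
# Illustrative Hydra-style config; keys are examples, not an exact NeMo schema.
name: asr_experiment_example
trainer:
  devices: 2          # number of GPUs
  max_epochs: 100
  precision: 16
model:
  train_ds:
    manifest_filepath: /data/train_manifest.json
    batch_size: 32
  optim:
    name: adamw
    lr: 0.001
```

Any nested field can then be overridden from the command line, for example `python train.py trainer.max_epochs=10 model.optim.lr=1e-4`, which is what makes Hydra convenient for experiment sweeps.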
NeMo is available as open source, so researchers can contribute to it and build on it.
To leverage the NVIDIA AI platform and deploy NeMo speech models in production with NVIDIA Riva, developers export NeMo models to a Riva-compatible format and then run the Riva build and deploy commands to create an optimized skill that runs in real time.
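Taking an ASR model as an example, the export-then-deploy flow might look like the sketch below. The file paths and model names are illustrative, and the exact `riva-build` arguments depend on the model type, so consult the Riva documentation for the specifics of your model:

```shell
# Export a trained NeMo checkpoint (.nemo) to the Riva format (.riva).
# Paths here are illustrative.
nemo2riva --out /models/asr_model.riva /models/asr_model.nemo

# Build an optimized Riva model intermediate representation (.rmir)
# from the exported model, then deploy it to the Riva model repository.
riva-build speech_recognition /servicemaker-dev/asr.rmir /servicemaker-dev/asr_model.riva
riva-deploy /servicemaker-dev/asr.rmir /data/models
```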
The documentation includes detailed instructions for exporting and deploying NeMo models to Riva.