What is NeMo Framework?


NVIDIA NeMo framework, part of NVIDIA AI Enterprise, is an end-to-end toolkit for building, customizing, and deploying state-of-the-art AI models at scale, on any cloud or on premises. It includes data curation tools, pretrained models, frameworks for training, customization, and inference, and a guardrailing toolkit.

NeMo collections include Large Language Models (LLM), Multimodal Models (MM), Computer Vision (CV), Automatic Speech Recognition (ASR), Text-To-Speech (TTS), and Neural Machine Translation (NMT). Each collection consists of modules that users can customize, extend, and compose with their own data to create new AI models. NeMo supports language and multimodal models based on Transformer architectures (GPT, T5, BERT, MoE, RETRO, etc.) as well as diffusion architectures (e.g., Stable Diffusion), with rich community model support including Llama 2, Falcon, CLIP, Stable Diffusion, Imagen, LLaVA, Canary, and others.


Model Training and Alignment


Transformer-based LLM and multimodal models can leverage NVIDIA Megatron Core to scale training of models with billions of parameters across thousands of GPUs. Megatron Core includes state-of-the-art parallelization techniques such as tensor, pipeline, and sequence parallelism, as well as selective activation recomputation, for optimal performance.
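To give a feel for one of these techniques, the toy sketch below illustrates tensor parallelism: a layer's weight matrix is column-partitioned so each device computes only its slice of the output, and the slices are gathered afterwards. This is a simplified illustration in pure Python, not Megatron Core's actual implementation.

```python
# Toy sketch of tensor (column) parallelism. Each "device" holds one
# column shard of the weight matrix and computes its slice of x @ W;
# concatenating the slices reproduces the full matmul.
# Illustration only -- not Megatron Core's implementation.

def matmul(x, w):
    """Multiply row vector x (list) by matrix w (list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Partition matrix w into `parts` equal column shards."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

x = [1.0, 2.0, 3.0]                      # activation vector
W = [[1, 2, 3, 4],                       # 3x4 weight matrix
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

shards = split_columns(W, 2)             # two "GPUs", two columns each
partials = [matmul(x, s) for s in shards]
gathered = partials[0] + partials[1]     # all-gather along the column axis

assert gathered == matmul(x, W)          # matches the unsharded matmul
```

In a real training setup each shard lives on a different GPU and the final concatenation is an all-gather collective; the point of the sketch is that no device ever needs to hold the full weight matrix.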

The NeMo framework offers customization techniques for refining pretrained LLMs, including p-tuning, low-rank adaptation (LoRA), and supervised fine-tuning (SFT). NeMo LLMs can be aligned with state-of-the-art methods such as SteerLM, direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF) through NVIDIA NeMo Aligner.
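As a rough sketch of the idea behind LoRA (not NeMo's implementation): rather than updating a full weight matrix W, LoRA trains a low-rank update B·A scaled by alpha/r, so only the small factors B and A carry trainable parameters while W stays frozen. The matrices below are hypothetical toy values.

```python
# Minimal sketch of the LoRA update rule: W' = W + (alpha / r) * (B @ A).
# Toy matrices for illustration; in practice W is a large frozen weight
# inside an attention or MLP layer, and B, A are the small trained factors.

def matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_update(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A); the base weights W stay frozen."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Frozen 2x2 base weights with rank-1 adapters (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]

W_adapted = lora_update(W, A, B, alpha=1.0, r=1)
# delta = B @ A = [[0.5, 0.5], [1.0, 1.0]]
```

Because only B and A are trained, the number of trainable parameters scales with the rank r rather than with the full dimensions of W, which is what makes LoRA cheap enough to run on modest hardware.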

NeMo framework also supports training and customization of speech AI models (ASR, Speech Classification, Speaker Recognition, Speaker Diarization, and TTS) in a reproducible manner. It also comes with a large set of speech AI tools, including NeMo Forced Aligner, Speech Data Processor, and Speech Data Explorer.

NeMo Framework Launcher is a tool for launching end-to-end training jobs on Slurm-based clusters in public or private clouds.


Deployment and NVIDIA AI Enterprise


NeMo LLM and multimodal models can be deployed and optimized with NVIDIA NIM. ASR and TTS models customized with NeMo are optimized for inference and deployment with NVIDIA Riva.

NeMo framework is open source and distributed through GitHub, so researchers can contribute to it and build on top of it. It is also available as a Docker container on the NGC catalog and through an NVIDIA AI Enterprise subscription for production deployments.

NVIDIA AI Enterprise is a secure, enterprise-grade, end-to-end software platform that supports automated distributed data processing, large-scale training, and deployment on premises or in the cloud.