Overview of BioNeMo

BioNeMo is a software ecosystem produced by NVIDIA for the development and deployment of life sciences-oriented artificial intelligence models. The main components of BioNeMo are:

BioNeMo Recipes

BioNeMo Recipes are self-contained, reproducible training recipes for biomolecular and language models. Each recipe bundles a HuggingFace-compatible model definition, training scripts, configuration, and sample data into a single directory that can be run independently. Recipes are composed of the following components:

Models

HuggingFace-compatible model definitions with TransformerEngine layers:

AMPLIFY -- protein representation learning
ESM-2 -- protein representation learning
Geneformer -- single-cell gene expression
Llama 3 -- general-purpose language model
Mixtral -- mixture-of-experts language model
Qwen -- general-purpose language model

Training Recipes

Complete training environments with scripts, configs, and sample data:

esm2_native_te -- ESM-2 pretraining with native FSDP + TransformerEngine
esm2_accelerate_te -- ESM-2 pretraining with HF Accelerate + TransformerEngine
esm2_peft_te -- ESM-2 parameter-efficient fine-tuning
geneformer_native_te_mfsdp_fp8 -- Geneformer pretraining with FP8
llama3_native_te -- Llama 3 pretraining with native FSDP
fp8_analysis -- FP8 precision analysis tools
vit -- Vision Transformer reference recipe
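FP8 training, as in the FP8 recipes listed above, generally depends on scaling each tensor into the narrow range of an 8-bit float format before casting. The sketch below is a pure-Python simulation, not TransformerEngine code. It assumes the E4M3 format (3 mantissa bits, largest finite value 448) and a simplified per-tensor scale of 448 / amax with no margin or amax history. The helpers quantize_e4m3 and fp8_roundtrip are hypothetical names used only for illustration.

```python
from __future__ import annotations

import math

# Assumed FP8 format: E4M3, whose largest finite magnitude is 448.
E4M3_MAX = 448.0
MANTISSA_BITS = 3      # explicit mantissa bits in E4M3
MIN_NORMAL_EXP = -6    # smallest normal exponent; subnormals reuse its spacing


def quantize_e4m3(x: float) -> float:
    """Round a float to the nearest E4M3 value (round-half-to-even, saturating)."""
    if x == 0.0 or math.isnan(x):
        return x
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)  # saturate rather than produce inf
    mag = abs(x)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values here
    q = round(mag / step) * step         # Python's round() is round-half-to-even
    return math.copysign(min(q, E4M3_MAX), x)


def fp8_roundtrip(values: list[float]) -> tuple[float, list[float]]:
    """Hypothetical helper: scale so amax maps to E4M3_MAX, quantize, dequantize."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    restored = [quantize_e4m3(v * scale) / scale for v in values]
    return scale, restored


if __name__ == "__main__":
    activations = [0.013, -0.27, 1.9, -3.5, 0.0004]
    scale, restored = fp8_roundtrip(activations)
    print(f"scale = {scale}")
    for original, back in zip(activations, restored):
        print(f"{original:+.6f} -> {back:+.6f}")
```

Scaling by amax uses the full FP8 range, so after scaling every normal value carries a relative rounding error of at most 2^-4; without scaling, small activations would fall into the coarse subnormal range or underflow to zero.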

Megatron Recipes

Megatron training recipes are intended for models that benefit from large-scale 5D parallelism, and for users who want examples of training with the Megatron framework.

evo2_megatron -- Evo2 DNA model with Megatron-Bridge-based training for 5D parallelism support
eden_megatron -- Eden DNA model with Megatron-Bridge-based training for 5D parallelism support
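In Megatron-style training, "5D parallelism" usually refers to combining tensor (TP), pipeline (PP), context (CP), expert (EP), and data (DP) parallelism. A quick sanity check for a proposed layout is the arithmetic below: the data-parallel size is what remains of the GPU count after dividing by TP × PP × CP. This is a hypothetical helper, not a Megatron-Bridge API, and its treatment of expert parallelism (only requiring EP to divide DP) is a simplification; the framework enforces its own, more detailed constraints.

```python
def data_parallel_size(
    world_size: int,
    *,
    tensor: int = 1,    # TP: shards individual weight matrices across GPUs
    pipeline: int = 1,  # PP: splits the layer stack into sequential stages
    context: int = 1,   # CP: splits long sequences across GPUs
    expert: int = 1,    # EP: spreads mixture-of-experts experts across GPUs
) -> int:
    """Hypothetical check: return the DP size implied by the other parallel sizes."""
    model_parallel = tensor * pipeline * context
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    dp = world_size // model_parallel
    # Simplifying assumption: experts are distributed over the data-parallel
    # ranks, so EP must divide DP. Real frameworks apply stricter rules.
    if dp % expert != 0:
        raise ValueError(f"expert={expert} does not divide data_parallel={dp}")
    return dp


if __name__ == "__main__":
    # 64 GPUs with TP=4, PP=2, CP=2 leave 64 / 16 = 4 data-parallel replicas.
    print(data_parallel_size(64, tensor=4, pipeline=2, context=2, expert=2))
```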

BioNeMo Sub-packages

Lightweight, pip-installable Python packages that provide reusable building blocks for training and data processing:

bionemo-core -- shared interfaces, data-loading helpers, and checkpoint management
bionemo-moco -- modular components for building diffusion and flow-matching generative models
bionemo-noodles -- fast FASTA/FASTQ parsing via a Python wrapper around the Rust noodles library
bionemo-scdl -- dataset classes optimized for single-cell data
bionemo-size-aware-batching -- memory-aware mini-batch construction for variable-length inputs
bionemo-webdatamodule -- a PyTorch Lightning DataModule for streaming WebDataset files
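The idea behind memory-aware mini-batch construction can be sketched as a greedy packer: samples are added to the current batch until an estimated memory cost would exceed a budget, at which point a new batch starts. This is not the bionemo-size-aware-batching API. The function size_aware_batches, the cost function, and the budget are hypothetical, and the real package may estimate memory differently.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def size_aware_batches(
    lengths: Iterable[int],
    max_cost: float,
    cost_fn: Callable[[int], float],
) -> Iterator[list[int]]:
    """Hypothetical sketch: greedily pack sample indices so each batch's cost <= max_cost.

    cost_fn estimates the memory one sample of a given length needs; it stands
    in for whatever memory model a real implementation would use.
    """
    batch: list[int] = []
    batch_cost = 0.0
    for idx, length in enumerate(lengths):
        cost = cost_fn(length)
        if cost > max_cost:
            raise ValueError(f"sample {idx} (cost {cost}) exceeds max_cost {max_cost}")
        if batch and batch_cost + cost > max_cost:
            yield batch  # adding this sample would exceed the budget
            batch, batch_cost = [], 0.0
        batch.append(idx)
        batch_cost += cost
    if batch:
        yield batch  # flush the final partial batch


if __name__ == "__main__":
    seq_lengths = [100, 400, 50, 300, 250, 120]
    # Linear cost in sequence length is a placeholder assumption.
    for b in size_aware_batches(seq_lengths, max_cost=500, cost_fn=float):
        print(b, sum(seq_lengths[i] for i in b))
```

A linear cost is only a placeholder; for attention-heavy models, a term quadratic in sequence length may be a better estimate, and the packer itself does not change.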

BioNeMo NIMs

BioNeMo NIMs are easy-to-use, enterprise-ready inference microservices with built-in API endpoints. NIMs are engineered for scalable, self- or cloud-hosted deployment of optimized, production-grade biomolecular foundation models.
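Because NIMs expose HTTP API endpoints, a client typically just sends JSON over HTTP. The sketch below uses only the Python standard library to build such a request without sending it. The URL, the "sequences" field, and Bearer-token authentication are hypothetical placeholders, not the schema of any particular NIM; the documentation of the NIM you deploy defines the actual endpoint and payload.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical endpoint for a self-hosted NIM; not taken from any real NIM.
NIM_URL = "http://localhost:8000/hypothetical/predict"


def build_nim_request(sequences: list[str], api_key: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a NIM-style endpoint."""
    payload = {"sequences": sequences}  # hypothetical request schema
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if api_key:
        # Bearer-token auth is an assumption; self-hosted deployments may not need it.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_nim_request(["MKTAYIAKQR"])
    print(req.get_method(), req.full_url, req.data)
```

To actually call a running service, the request can be passed to urllib.request.urlopen (with a timeout) and the response body parsed with json.load.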

Use the recipes and sub-packages when you need to train, fine-tune, or customize models. Use NIMs when you need production-ready inference against existing models.


BioNeMo User Success Stories

Enhancing Biologics Discovery and Development With Generative AI - Amgen uses BioNeMo and DGX Cloud to train large language models (LLMs) on proprietary protein sequence data to predict protein properties and design biologics with enhanced capabilities. With BioNeMo, Amgen achieved faster training and up to 100x faster post-training analysis, accelerating the drug discovery process.

Cognizant to apply generative AI to enhance drug discovery for pharmaceutical clients with NVIDIA BioNeMo - Cognizant leverages BioNeMo to enhance drug discovery for pharmaceutical clients using generative AI technology. This collaboration enables researchers to rapidly analyze vast datasets, predict interactions between drug compounds, and create new development pathways, aiming to improve productivity, reduce costs, and accelerate the development of life-saving treatments.

Cadence and NVIDIA Unveil Groundbreaking Generative AI and Accelerated Compute-Driven Innovations - Cadence's Orion molecular design platform will integrate with the BioNeMo generative AI tool to accelerate therapeutic design and shorten the time to trusted results in drug discovery. The combined platform will enable pharmaceutical companies to quickly generate and assess design hypotheses across various therapeutic modalities using on-demand GPU access.

Find more user stories on NVIDIA's Customer Stories and Technical Blog sites.