Overview of BioNeMo
BioNeMo is a software ecosystem produced by NVIDIA for the development and deployment of life sciences-oriented artificial intelligence models. The main components of BioNeMo are:
BioNeMo Recipes
BioNeMo Recipes are self-contained, reproducible training recipes for biomolecular and language models. Each recipe bundles a HuggingFace-compatible model definition, training scripts, configuration, and sample data into a single directory that can be run independently. The recipes are organized into the following groups:
Models
HuggingFace-compatible model definitions with TransformerEngine layers:
- AMPLIFY -- protein representation learning
- ESM-2 -- protein representation learning
- Geneformer -- single-cell gene expression
- Llama 3 -- general-purpose language model
- Mixtral -- mixture-of-experts language model
- Qwen -- general-purpose language model
Training Recipes
Complete training environments with scripts, configs, and sample data:
- esm2_native_te -- ESM-2 pretraining with native FSDP + TransformerEngine
- esm2_accelerate_te -- ESM-2 pretraining with HF Accelerate + TransformerEngine
- esm2_peft_te -- ESM-2 parameter-efficient fine-tuning
- geneformer_native_te_mfsdp_fp8 -- Geneformer pretraining with FP8
- llama3_native_te -- Llama 3 pretraining with native FSDP
- fp8_analysis -- FP8 precision analysis tools
- vit -- Vision Transformer reference recipe
Megatron Recipes
Megatron training recipes are intended for models that benefit from larger-scale 5D parallelism, and for users who would like examples of training with the Megatron framework.
- evo2_megatron -- Evo2 DNA model with Megatron-Bridge-based training for 5D parallelism support
- eden_megatron -- Eden DNA model with Megatron-Bridge-based training for 5D parallelism support
BioNeMo Sub-packages
Lightweight, pip-installable Python packages that provide reusable building blocks for training and data processing:
- bionemo-core -- shared interfaces, data-loading helpers, and checkpoint management
- bionemo-moco -- modular components for building diffusion and flow-matching generative models
- bionemo-noodles -- fast FASTA/FASTQ parsing via a Python wrapper around the Rust noodles library
- bionemo-scdl -- dataset classes optimized for single-cell data
- bionemo-size-aware-batching -- memory-aware mini-batch construction for variable-length inputs
- bionemo-webdatamodule -- a PyTorch Lightning DataModule for streaming WebDataset files
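To make "memory-aware mini-batch construction for variable-length inputs" concrete, the following is a minimal sketch of the general idea in plain Python. It does not use or reproduce the bionemo-size-aware-batching API: the function name `size_aware_batches`, its parameters, and the use of sequence length as a stand-in for memory cost are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def size_aware_batches(
    items: Iterable[T],
    cost: Callable[[T], int],
    budget: int,
) -> Iterator[List[T]]:
    """Greedily group items so each batch's summed cost stays within budget.

    Hypothetical helper for illustration only; `cost` is any proxy for the
    memory an item needs (here, the caller decides what that proxy is).
    """
    batch: List[T] = []
    used = 0
    for item in items:
        item_cost = cost(item)
        if item_cost > budget:
            # A single item that exceeds the budget can never be batched.
            raise ValueError(f"item cost {item_cost} exceeds budget {budget}")
        if batch and used + item_cost > budget:
            # Adding this item would overflow, so emit the current batch first.
            yield batch
            batch, used = [], 0
        batch.append(item)
        used += item_cost
    if batch:
        # Emit whatever remains after the last item.
        yield batch


# Toy protein sequences of varying length; length is the assumed cost proxy.
sequences = ["MKT", "MKTAYIAKQR", "MK", "MKTAYIAKQRQISFVK", "MKTA"]
batches = list(size_aware_batches(sequences, cost=len, budget=16))

for i, b in enumerate(batches):
    print(i, [len(s) for s in b])
```

Unlike a fixed batch size, this kind of grouping lets many short sequences share one batch while a long sequence gets a batch of its own, which is the motivation for size-aware batching with variable-length biological data. A real implementation would typically estimate cost from measured or modeled GPU memory rather than raw length.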
BioNeMo NIMs
BioNeMo NIMs are easy-to-use, enterprise-ready inference microservices with built-in API endpoints. NIMs are engineered for scalable, self- or cloud-hosted deployment of optimized, production-grade biomolecular foundation models.
Use the recipes and sub-packages when you need to train, fine-tune, or customize models. Use NIMs when you need production-ready inference against existing models.
BioNeMo User Success Stories
Enhancing Biologics Discovery and Development With Generative AI - Amgen leverages BioNeMo and DGX Cloud to train large language models (LLMs) on proprietary protein sequence data, predicting protein properties and designing biologics with enhanced capabilities. By using BioNeMo, Amgen achieved faster training and up to 100X faster post-training analysis, accelerating the drug discovery process.
Cognizant to apply generative AI to enhance drug discovery for pharmaceutical clients with NVIDIA BioNeMo - Cognizant leverages BioNeMo to enhance drug discovery for pharmaceutical clients using generative AI technology. This collaboration enables researchers to rapidly analyze vast datasets, predict interactions between drug compounds, and create new development pathways, aiming to improve productivity, reduce costs, and accelerate the development of life-saving treatments.
Cadence and NVIDIA Unveil Groundbreaking Generative AI and Accelerated Compute-Driven Innovations - Cadence's Orion molecular design platform will integrate with BioNeMo's generative AI tools to accelerate therapeutic design and shorten time to trusted results in drug discovery. The combined platform will enable pharmaceutical companies to quickly generate and assess design hypotheses across various therapeutic modalities using on-demand GPU access.
Find more user stories on NVIDIA's Customer Stories and Technical Blog sites.