ESM-2 Inference¶
This tutorial serves as a demo for ESM-2 inference using a CSV file with a sequences column. To pre-train the ESM-2 model, please refer to the ESM-2 Pretraining tutorial.
The %%capture --no-display --no-stderr cell_output magic is used to suppress this output. Comment out or delete this line in the cells below to restore full output.
Setup and Assumptions¶
In this tutorial, we will demonstrate how to download ESM2 checkpoint, create a CSV file with protein sequences, and infer a ESM-2 model.
All commands should be executed inside the BioNeMo docker container, which has all ESM-2 dependencies pre-installed. For more information on how to build or pull the BioNeMo2 container, refer to the Initialization Guide.
Import Required Libraries¶
%%capture --no-display --no-stderr cell_output
import os
import torch
import shutil
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
Work Directory¶
Set the work directory to store data and results:
cleanup : bool = True
work_dir = "/workspace/bionemo2/esm2_inference_tutorial"
if cleanup and os.path.exists(work_dir):
shutil.rmtree(work_dir)
if not os.path.exists(work_dir):
os.makedirs(work_dir)
print(f"Directory '{work_dir}' created.")
else:
print(f"Directory '{work_dir}' already exists.")
Directory '/workspace/bionemo2/esm2_inference_tutorial' created.
Download Model Checkpoints¶
The following code will download the pre-trained model, esm2/650m:2.0, from the NGC registry:
from bionemo.core.data.load import load
checkpoint_path = load("esm2/650m:2.0", source="ngc")
print(checkpoint_path)
/home/bionemo/.cache/bionemo/0798767e843e3d54315aef91934d28ae7d8e93c2849d5fcfbdf5fac242013997-esm2_650M_nemo2.tar.gz.untar
Data¶
We use the InMemoryCSVDataset class to load the protein sequence data from a .csv file. This data file should have at least a sequences column and can optionally have a labels column used for fine-tuning applications. Here is an example of how to create your own inference input data using a list of sequences in Python:
import pandas as pd
artificial_sequence_data = [
"TLILGWSDKLGSLLNQLAIANESLGGGTIAVMAERDKEDMELDIGKMEFDFKGTSVI",
"LYSGDHSTQGARFLRDLAENTGRAEYELLSLF",
"GRFNVWLGGNESKIRQVLKAVKEIGVSPTLFAVYEKN",
"DELTALGGLLHDIGKPVQRAGLYSGDHSTQGARFLRDLAENTGRAEYELLSLF",
"KLGSLLNQLAIANESLGGGTIAVMAERDKEDMELDIGKMEFDFKGTSVI",
"LFGAIGNAISAIHGQSAVEELVDAFVGGARISSAFPYSGDTYYLPKP",
"LGGLLHDIGKPVQRAGLYSGDHSTQGARFLRDLAENTGRAEYELLSLF",
"LYSGDHSTQGARFLRDLAENTGRAEYELLSLF",
"ISAIHGQSAVEELVDAFVGGARISSAFPYSGDTYYLPKP",
"SGSKASSDSQDANQCCTSCEDNAPATSYCVECSEPLCETCVEAHQRVKYTKDHTVRSTGPAKT",
]
# Create a DataFrame
df = pd.DataFrame(artificial_sequence_data, columns=["sequences"])
# Save the DataFrame to a CSV file
data_path = os.path.join(work_dir, "sequences.csv")
df.to_csv(data_path, index=False)
Run Inference¶
Similar to PyTorch Lightning, ESM-2 inference takes advantage of some key classes:

- MegatronStrategy - to launch and set up parallelism for NeMo and Megatron-LM.
- Trainer - to configure training configurations and logging.
- ESMFineTuneDataModule - to load sequence data for both fine-tuning and inference.
- ESM2Config - to configure the ESM-2 model as a BionemoLightningModule.

Please refer to the ESM-2 Pretraining and ESM-2 Fine-Tuning tutorials for a detailed description of these classes.
To run inference on the data created in the previous step, we can use the infer_esm2 executable, which calls bionemo-framework/sub-packages/bionemo-esm2/src/bionemo/esm2/scripts/infer_esm2.py. We can get a full description of the inference arguments by providing --help in the following command:
! infer_esm2 --help
2024-11-25 21:18:43 - faiss.loader - INFO - Loading faiss with AVX512 support.
2024-11-25 21:18:43 - faiss.loader - INFO - Successfully loaded faiss with AVX512 support.
[NeMo W 2024-11-25 21:18:43 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

usage: infer_esm2 [-h] --checkpoint-path CHECKPOINT_PATH --data-path DATA_PATH --results-path RESULTS_PATH
                  [--precision {fp16,bf16,fp32,bf16-mixed,fp32-mixed,16-mixed,fp16-mixed,16,32}]
                  [--num-gpus NUM_GPUS] [--num-nodes NUM_NODES] [--micro-batch-size MICRO_BATCH_SIZE]
                  [--pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE]
                  [--tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE] [--include-hiddens]
                  [--include-input-ids] [--include-embeddings] [--include-logits]
                  [--config-class CONFIG_CLASS]

Infer ESM2.

options:
  -h, --help            show this help message and exit
  --checkpoint-path CHECKPOINT_PATH
                        Path to the ESM2 pretrained checkpoint
  --data-path DATA_PATH
                        Path to the CSV file containing sequences and label columns
  --results-path RESULTS_PATH
                        Path to the results file.
  --precision {fp16,bf16,fp32,bf16-mixed,fp32-mixed,16-mixed,fp16-mixed,16,32}
                        Precision type to use for training.
  --num-gpus NUM_GPUS   Number of GPUs to use for training. Default is 1.
  --num-nodes NUM_NODES
                        Number of nodes to use for training. Default is 1.
  --micro-batch-size MICRO_BATCH_SIZE
                        Micro-batch size. Global batch size is inferred from this.
  --pipeline-model-parallel-size PIPELINE_MODEL_PARALLEL_SIZE
                        Pipeline model parallel size. Default is 1.
  --tensor-model-parallel-size TENSOR_MODEL_PARALLEL_SIZE
                        Tensor model parallel size. Default is 1.
  --include-hiddens     Include hiddens in output of inference
  --include-input-ids   Include input_ids in output of inference
  --include-embeddings  Include embeddings in output of inference
  --include-logits      Include per-token logits in output.
  --config-class CONFIG_CLASS
                        Model configs link model classes with losses, and handle model initialization (including from a prior checkpoint). This is how you can fine-tune a model. First train with one config class that points to one model class and loss, then implement and provide an alternative config class that points to a variant of that model and alternative loss. In the future this script should also provide similar support for picking different data modules for fine-tuning with different data types. Choices: dict_keys(['ESM2Config', 'ESM2FineTuneSeqConfig', 'ESM2FineTuneTokenConfig'])
The hidden states (which are usually the output of each layer in a neural network) can be obtained by using the --include-hiddens argument when calling the inference function of ESM-2 in the BioNeMo Framework.
The hidden states can be converted into fixed-size vector embeddings. This is done by removing the hidden state vectors corresponding to padding tokens and then averaging across the rest. This process is often used when the goal is to create a single vector representation from the hidden states of a model, which can be used for various sequence-level downstream tasks such as classification (e.g. subcellular localization) or regression (e.g. melting temperature prediction). To obtain the embedding results we can use the --include-embeddings argument; the sketch below illustrates the pooling step.
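As an illustration of the conversion described above, here is a minimal sketch of mean-pooling per-token hidden states into per-sequence embeddings. The example_hidden_states and example_pad_mask tensors are hypothetical stand-ins; the actual inference script performs this step internally when --include-embeddings is passed.
# Hypothetical example: 2 sequences, max length 8, hidden size 4
example_hidden_states = torch.randn(2, 8, 4)  # [batch, sequence, hidden]
example_pad_mask = torch.tensor([[1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 1, 1, 1, 0, 0]], dtype=torch.bool)  # True for real tokens
# Zero out the padding positions, then average over the remaining tokens
masked = example_hidden_states * example_pad_mask.unsqueeze(-1)
example_embeddings = masked.sum(dim=1) / example_pad_mask.sum(dim=1, keepdim=True)
print(example_embeddings.shape)  # torch.Size([2, 4])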
By passing the hidden state of an amino acid sequence through the BERT language model head, we can obtain output logits at each position and transform them into probabilities. This can be done by using the --include-logits argument. Logits here are the raw, unnormalized scores that represent the likelihood of each class and are not probabilities themselves; they can be any real number, including negative values.
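The logits-to-probabilities conversion itself is a plain softmax over the vocabulary dimension. Below is a minimal sketch with a hypothetical logits tensor; the same operation applies to the per-token logits returned by the inference script.
# Hypothetical logits for 5 sequence positions over a 33-token vocabulary
example_logits = torch.randn(5, 33)                    # [sequence positions, vocabulary]
example_probs = torch.softmax(example_logits, dim=-1)  # normalize over the vocabulary dimension
print(example_probs.sum(dim=-1))                       # each position now sums to 1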
Now let's call the infer_esm2 executable with the relevant arguments to compute and optionally return embeddings, hiddens, and logits.
%%capture --no-display --no-stderr cell_output
results_path = os.path.join(work_dir, "inference_results.pt")
! infer_esm2 --checkpoint-path {checkpoint_path} \
--data-path {data_path} \
--results-path {results_path} \
--precision "fp32" \
--include-hiddens \
--include-embeddings \
--include-logits \
--include-input-ids
Inference Results¶
The bash command in the previous step creates the inference_results.pt file under the work directory of this notebook (defined above) to store the results. The .pt file contains a dictionary of {'result_key': torch.Tensor} that can be loaded with PyTorch:
import torch
results = torch.load(results_path)
for key, val in results.items():
if val is not None:
print(f'{key}\t{val.shape}')
token_logits	torch.Size([1024, 10, 128])
hidden_states	torch.Size([10, 1024, 1280])
input_ids	torch.Size([10, 1024])
embeddings	torch.Size([10, 1280])
In this example, the results are a Python dict with the following keys: ['token_logits', 'hidden_states', 'input_ids', 'embeddings']. The logits tensor (token_logits) has dimensions of [sequence, batch, hidden] to improve training performance. In the following, we transpose its first two dimensions to obtain a batch-first shape, matching the rest of the output tensors.
logits = results['token_logits'].transpose(0, 1) # s, b, h -> b, s, h
print(logits.shape)
torch.Size([10, 1024, 128])
The last dimension of token_logits is 128, with the first 33 positions corresponding to the amino acid vocabulary, followed by 95 padding positions. We use tokenizer.vocab_size to filter out the padding and keep only the 33 vocabulary positions.
from bionemo.esm2.data.tokenizer import get_tokenizer
tokenizer = get_tokenizer()
tokens = tokenizer.all_tokens
print(f"There are {tokenizer.vocab_size} unique tokens: {tokens}.")
aa_logits = logits[..., :tokenizer.vocab_size] # filter out the 95 paddings and only keep 33 vocab positions
print(f"Logits shape after removing the paddings in hidden dimension: {aa_logits.shape}")
There are 33 unique tokens: ['<cls>', '<pad>', '<eos>', '<unk>', 'L', 'A', 'G', 'V', 'S', 'E', 'R', 'T', 'I', 'D', 'P', 'K', 'Q', 'N', 'F', 'Y', 'M', 'H', 'W', 'C', 'X', 'B', 'U', 'Z', 'O', '.', '-', '<null_1>', '<mask>'].
Logits shape after removing the paddings in hidden dimension: torch.Size([10, 1024, 33])
Let's set aside the tokens corresponding to the 20 known amino acids.
aa_tokens = ['L', 'A', 'G', 'V', 'S', 'E', 'R', 'T', 'I', 'D', 'P', 'K', 'Q', 'N', 'F', 'Y', 'M', 'H', 'W', 'C']
aa_indices = [i for i, token in enumerate(tokens) if token in aa_tokens]
extra_indices = [i for i, token in enumerate(tokens) if token not in aa_tokens]
The sequence dimension in this example (1024) represents the maximum sequence length, which includes padding, EOS, and BOS tokens. To filter out everything but the relevant amino acid information, we can use the input sequence IDs in the results to create a mask that can be used to extract the relevant information in aa_logits:
input_ids = results['input_ids'] # b, s
# mask where non-amino acid tokens are True
mask = torch.isin(input_ids, torch.tensor(extra_indices))
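As one possible way to apply this mask (a sketch, not part of the original script), we can select only the positions that hold one of the 20 standard amino acids and, if desired, turn their logits into per-residue probabilities:
# ~mask is True where the token is one of the 20 standard amino acids
aa_only_logits = aa_logits[~mask]  # [num_amino_acid_tokens, vocab]; note this flattens the batch dimension
print(aa_only_logits.shape)
# Per-residue probabilities over the filtered vocabulary
aa_probs = torch.softmax(aa_only_logits, dim=-1)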
For a more in-depth example of inference and converting logits to probabilities, please refer to the ESM-2 Mutant Design Tutorial.