BioNeMo - Geneformer inference for single-cell downstream tasks¶
This tutorial showcases how to run the BioNeMo container, pre-train a Geneformer model, and use it for inference on downstream single-cell tasks. By the end of this tutorial, a user will know how to:
- Launch the BioNeMo container
- Download data from the CZI CELLxGENE Census for pre-training and inference
- Convert AnnData files into the sparse SCDL memmap format used by BioNeMo
- Kick off pre-training with a custom single-cell dataset
- Restore the pre-trained model and perform inference on the same CZI dataset
Prerequisites:¶
- BioNeMo Framework container is running (refer to the Getting Started section)
Running the BioNeMo container¶
This example was built by launching the container on a local machine with 2x NVIDIA RTX A6000 GPUs. Refer to the specific instructions for remote and multi-node launches.
Once the container is launched, navigate to http://0.0.0.0:8888, http://localhost:8888, or the IP address of the workstation/node. A JupyterLab instance should appear.
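For reference, a typical single-node launch looks like the following sketch; the image tag, port, and mounted host path are placeholders that should match your own setup (see the Getting Started section for the authoritative command):
docker run --rm -it --gpus all \
    -p 8888:8888 \
    -v $HOME/bionemo_data:/workspace/bionemo2/data \
    nvcr.io/nvidia/clara/bionemo-framework:<tag>
If JupyterLab is not started automatically inside the container, it can be launched manually, for example with jupyter lab --ip 0.0.0.0 --port 8888 --allow-root.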
Copy this code and input files into JupyterLab¶
In the launched JupyterLab, run the code in a Jupyter notebook as provided in the cells below.
Getting example single cell data and setting it up for inference¶
First, we acquire single-cell data for pre-training and inference. To do this, we install the cellxgene-census API and download a small dataset, following the example on the CZI API examples page to fetch a single h5ad file. Generally, our workflow expects a collection of h5ad files for pre-training; in this case, we restrict ourselves to 100k cells from a single dataset to keep download and training times short.
!pip install cellxgene-census
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: cellxgene-census in /usr/local/lib/python3.12/dist-packages (1.16.2)
... (remaining "Requirement already satisfied" lines for transitive dependencies omitted) ...
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python -m pip install --upgrade pip
# Below are paths required for setting up pre-training and inference.
tutorial_data_dir = "/workspace/bionemo2/data/singlecell_tutorial/download_anndata"
train_tutorial_data_dir = "/workspace/bionemo2/data/singlecell_tutorial/download_anndata/train"
val_tutorial_data_dir = "/workspace/bionemo2/data/singlecell_tutorial/download_anndata/val"
test_tutorial_data_dir = "/workspace/bionemo2/data/singlecell_tutorial/download_anndata/test"
train_tutorial_processed_dir = "/workspace/bionemo2/data/singlecell_tutorial/processed_data/train"
val_tutorial_processed_dir = "/workspace/bionemo2/data/singlecell_tutorial/processed_data/val"
test_tutorial_processed_dir = "/workspace/bionemo2/data/singlecell_tutorial/processed_data/test"
tutorial_output_dir = "/workspace/bionemo2/data/singlecell_tutorial/inference_output"
tutorial_output_inference_pickle = f"{tutorial_output_dir}/human_covid19_bcells_from_scratch.pkl"
demo_data_train_download_path = f"{train_tutorial_data_dir}/human_covid19_bcells.h5ad"
demo_data_val_download_path = f"{val_tutorial_data_dir}/human_covid19_bcells.h5ad"
demo_data_test_download_path = f"{test_tutorial_data_dir}/human_covid19_bcells.h5ad"
!mkdir -p {train_tutorial_data_dir}
!mkdir -p {val_tutorial_data_dir}
!mkdir -p {test_tutorial_data_dir}
!mkdir -p {train_tutorial_processed_dir}
!mkdir -p {val_tutorial_processed_dir}
!mkdir -p {test_tutorial_processed_dir}
!mkdir -p {tutorial_output_dir}
import cellxgene_census
frac_train = 0.8
frac_val = 0.1
frac_test = 0.1
with cellxgene_census.open_soma(census_version="2023-12-15") as census:
filter1 = (
"cell_type == 'B cell' and tissue_general == 'lung' and disease == 'COVID-19' and is_primary_data == True"
)
adata = cellxgene_census.get_anndata(
census=census,
organism="Homo sapiens",
obs_value_filter=filter1,
)
n_train = int(adata.shape[0] * frac_train)
n_val = int(adata.shape[0] * frac_val)
n_test = adata.shape[0] - n_train - n_val
# Create splits by taking contiguous ranges. This is bad practice, since cell ordering
# may carry batch structure, but it keeps the demo simple; a randomized alternative is
# sketched after this cell.
adata_train = adata[0:n_train].copy()
adata_val = adata[n_train : (n_train + n_val)].copy()
adata_test = adata[(n_train + n_val) :].copy()
adata_train.write(demo_data_train_download_path)
adata_val.write(demo_data_val_download_path)
adata_test.write(demo_data_test_download_path)
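As noted in the comment above, contiguous splits can be confounded by how cells are ordered in the download. Below is a minimal sketch of a shuffled split, reusing the `adata`, `n_train`, and `n_val` values from the cell above (the seed is arbitrary and chosen here for illustration):
import numpy as np

# Shuffle cell indices so each split draws from the whole dataset,
# rather than from contiguous (possibly batch-ordered) ranges.
rng = np.random.default_rng(seed=42)
perm = rng.permutation(adata.shape[0])
adata_train = adata[perm[:n_train]].copy()
adata_val = adata[perm[n_train : n_train + n_val]].copy()
adata_test = adata[perm[n_train + n_val :]].copy()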
!rm -rf {train_tutorial_processed_dir}
!rm -rf {val_tutorial_processed_dir}
!rm -rf {test_tutorial_processed_dir}
# Create training data processed directory
!convert_h5ad_to_scdl \
--data-path {train_tutorial_data_dir} \
--save-path {train_tutorial_processed_dir}
# Create validation data processed directory
!convert_h5ad_to_scdl \
--data-path {val_tutorial_data_dir} \
--save-path {val_tutorial_processed_dir}
# Create test data processed directory
!convert_h5ad_to_scdl \
--data-path {test_tutorial_data_dir} \
--save-path {test_tutorial_processed_dir}
!ls -laht {train_tutorial_processed_dir}
total 8.0K
drwxr-xr-x 5 jomitchell domain-users 4.0K Mar 11 18:57 ..
drwxr-xr-x 2 jomitchell domain-users 4.0K Mar 11 18:57 .
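As an optional sanity check, the converted output can be read back with the SCDL dataset class from the bionemo-scdl package. The import path and constructor below follow the bionemo-scdl package layout but may differ between releases, so treat this as a sketch rather than the canonical API:
from bionemo.scdl.io.single_cell_memmap_dataset import SingleCellMemMapDataset

# Open the memmap-backed dataset written by convert_h5ad_to_scdl and
# report how many cells it contains.
train_dataset = SingleCellMemMapDataset(train_tutorial_processed_dir)
print(len(train_dataset))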
Pretraining¶
Now that we have converted the h5ad files to SCDL memmap files, we can kick off training. Check the full recipe/config file, pretrain-recipe-short.yaml, for a complete list of arguments and config parameters.
# See where the processed data is stored
print(train_tutorial_processed_dir)
/workspace/bionemo2/data/singlecell_tutorial/processed_data/train
# Create the recipe file
!bionemo-geneformer-recipe \
    --recipe geneformer_10m_shortpretrain_recipe \
    --dest pretrain-recipe-short.yaml \
    --result-dir /workspace/bionemo2/results \
    --data-path /workspace/bionemo2/data/singlecell_tutorial/processed_data/
... (ffmpeg and torch.cuda.amp FutureWarning messages omitted) ...
[NeMo I 2025-03-11 20:30:26 nemo_logging:393] Saved configuration to args.dest='pretrain-recipe-short.yaml'
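As mentioned above, the generated recipe can be inspected directly in the notebook; for example, to skim its first lines:
# Peek at the generated recipe/config file
!head -n 30 pretrain-recipe-short.yaml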
# Run pretraining using the short recipe
!python /workspace/bionemo2/sub-packages/bionemo-geneformer/src/bionemo/geneformer/run/main.py \
--config /workspace/bionemo2/docs/docs/user-guide/examples/bionemo-geneformer/pretrain-recipe-short.yaml
... (deprecation warnings and tokenizer/resource-download messages omitted) ...
[NeMo I 2025-03-11 20:30:45 nemo_logging:393] *************** Preprocessing Finished ************
[INFO | pytorch_lightning.utilities.rank_zero]: GPU available: True (cuda), used: True
[NeMo I 2025-03-11 20:30:45 nemo_logging:393] Experiments will be logged at /workspace/bionemo2/results/geneformer-10m/dev
[NeMo I 2025-03-11 20:30:47 nemo_logging:393] Padded vocab_size: 25472, original vocab_size: 25429, dummy tokens: 43.
[NeMo I 2025-03-11 20:30:47 nemo_logging:393] Copying Trainer's 'max_steps' (500) to LR scheduler's 'max_steps'.
[NeMo I 2025-03-11 20:30:47 nemo_logging:393] > number of parameters on (tensor, pipeline) model parallel rank (0 ,0): 10300032
Trainable params: 10.3 M | Non-trainable params: 0 | Total params: 10.3 M | Total estimated model params size (MB): 41
... (parallel-state setup, DDP/optimizer configuration, and dataloader worker warnings omitted) ...
[NeMo W 2025-03-11 20:30:50 rerun_state_machine:239] RerunStateMachine initialized in mode RerunMode.DISABLED Training epoch 0, iteration 0/499 | lr: 0 | global_batch_size: 8 | global_step: 0 | reduced_train_loss: 10.2 Training epoch 0, iteration 1/499 | lr: 0.0002 | global_batch_size: 8 | global_step: 1 | reduced_train_loss: 10.22 | consumed_samples: 16 Training epoch 0, iteration 2/499 | lr: 0.0004 | global_batch_size: 8 | global_step: 2 | reduced_train_loss: 10.16 | consumed_samples: 24 Training epoch 0, iteration 3/499 | lr: 0.0006 | global_batch_size: 8 | global_step: 3 | reduced_train_loss: 10.15 | consumed_samples: 32 Training epoch 0, iteration 4/499 | lr: 0.0008 | global_batch_size: 8 | global_step: 4 | reduced_train_loss: 10.1 | consumed_samples: 40 Training epoch 0, iteration 5/499 | lr: 0.001 | global_batch_size: 8 | global_step: 5 | reduced_train_loss: 10.03 | consumed_samples: 48 Training epoch 0, iteration 6/499 | lr: 0.001 | global_batch_size: 8 | global_step: 6 | reduced_train_loss: 9.936 | consumed_samples: 56 Training epoch 0, iteration 7/499 | lr: 0.001 | global_batch_size: 8 | global_step: 7 | reduced_train_loss: 9.892 | consumed_samples: 64 Training epoch 0, iteration 8/499 | lr: 0.0009999 | global_batch_size: 8 | global_step: 8 | reduced_train_loss: 9.755 | consumed_samples: 72 Training epoch 0, iteration 9/499 | lr: 0.0009998 | global_batch_size: 8 | global_step: 9 | reduced_train_loss: 9.874 | consumed_samples: 80 Training epoch 0, iteration 10/499 | lr: 0.0009997 | global_batch_size: 8 | global_step: 10 | reduced_train_loss: 9.638 | consumed_samples: 88 Training epoch 0, iteration 11/499 | lr: 0.0009996 | global_batch_size: 8 | global_step: 11 | reduced_train_loss: 9.563 | consumed_samples: 96 Training epoch 0, iteration 12/499 | lr: 0.0009995 | global_batch_size: 8 | global_step: 12 | reduced_train_loss: 9.439 | consumed_samples: 104 Training epoch 0, iteration 13/499 | lr: 0.0009993 | global_batch_size: 8 | global_step: 13 | reduced_train_loss: 9.354 | consumed_samples: 112 Training epoch 0, iteration 14/499 | lr: 0.0009991 | global_batch_size: 8 | global_step: 14 | reduced_train_loss: 9.363 | consumed_samples: 120 Training epoch 0, iteration 15/499 | lr: 0.0009989 | global_batch_size: 8 | global_step: 15 | reduced_train_loss: 9.414 | consumed_samples: 128 Training epoch 0, iteration 16/499 | lr: 0.0009987 | global_batch_size: 8 | global_step: 16 | reduced_train_loss: 9.712 | consumed_samples: 136 Training epoch 0, iteration 17/499 | lr: 0.0009984 | global_batch_size: 8 | global_step: 17 | reduced_train_loss: 9.278 | consumed_samples: 144 Training epoch 0, iteration 18/499 | lr: 0.0009981 | global_batch_size: 8 | global_step: 18 | reduced_train_loss: 9.324 | consumed_samples: 152 Training epoch 0, iteration 19/499 | lr: 0.0009978 | global_batch_size: 8 | global_step: 19 | reduced_train_loss: 9.349 | consumed_samples: 160 Training epoch 0, iteration 20/499 | lr: 0.0009975 | global_batch_size: 8 | global_step: 20 | reduced_train_loss: 9.126 | consumed_samples: 168 Training epoch 0, iteration 21/499 | lr: 0.0009972 | global_batch_size: 8 | global_step: 21 | reduced_train_loss: 9.195 | consumed_samples: 176 Training epoch 0, iteration 22/499 | lr: 0.0009968 | global_batch_size: 8 | global_step: 22 | reduced_train_loss: 9.149 | consumed_samples: 184 Training epoch 0, iteration 23/499 | lr: 0.0009964 | global_batch_size: 8 | global_step: 23 | reduced_train_loss: 9.092 | consumed_samples: 192 Training epoch 0, iteration 24/499 | lr: 0.000996 | global_batch_size: 8 | 
global_step: 24 | reduced_train_loss: 9.053 | consumed_samples: 200 Training epoch 0, iteration 25/499 | lr: 0.0009956 | global_batch_size: 8 | global_step: 25 | reduced_train_loss: 9.113 | consumed_samples: 208 Training epoch 0, iteration 26/499 | lr: 0.0009951 | global_batch_size: 8 | global_step: 26 | reduced_train_loss: 9.053 | consumed_samples: 216 Training epoch 0, iteration 27/499 | lr: 0.0009947 | global_batch_size: 8 | global_step: 27 | reduced_train_loss: 9.014 | consumed_samples: 224 Training epoch 0, iteration 28/499 | lr: 0.0009942 | global_batch_size: 8 | global_step: 28 | reduced_train_loss: 9.143 | consumed_samples: 232 Training epoch 0, iteration 29/499 | lr: 0.0009936 | global_batch_size: 8 | global_step: 29 | reduced_train_loss: 9.279 | consumed_samples: 240 Training epoch 0, iteration 30/499 | lr: 0.0009931 | global_batch_size: 8 | global_step: 30 | reduced_train_loss: 9.261 | consumed_samples: 248 Training epoch 0, iteration 31/499 | lr: 0.0009925 | global_batch_size: 8 | global_step: 31 | reduced_train_loss: 9.094 | consumed_samples: 256 Training epoch 0, iteration 32/499 | lr: 0.000992 | global_batch_size: 8 | global_step: 32 | reduced_train_loss: 9.157 | consumed_samples: 264 Training epoch 0, iteration 33/499 | lr: 0.0009914 | global_batch_size: 8 | global_step: 33 | reduced_train_loss: 9.072 | consumed_samples: 272 Training epoch 0, iteration 34/499 | lr: 0.0009907 | global_batch_size: 8 | global_step: 34 | reduced_train_loss: 9.077 | consumed_samples: 280 Training epoch 0, iteration 35/499 | lr: 0.0009901 | global_batch_size: 8 | global_step: 35 | reduced_train_loss: 9.081 | consumed_samples: 288 Training epoch 0, iteration 36/499 | lr: 0.0009894 | global_batch_size: 8 | global_step: 36 | reduced_train_loss: 9.219 | consumed_samples: 296 Training epoch 0, iteration 37/499 | lr: 0.0009887 | global_batch_size: 8 | global_step: 37 | reduced_train_loss: 9.05 | consumed_samples: 304 Training epoch 0, iteration 38/499 | lr: 0.000988 | global_batch_size: 8 | global_step: 38 | reduced_train_loss: 9.129 | consumed_samples: 312 Training epoch 0, iteration 39/499 | lr: 0.0009873 | global_batch_size: 8 | global_step: 39 | reduced_train_loss: 8.964 | consumed_samples: 320 Training epoch 0, iteration 40/499 | lr: 0.0009865 | global_batch_size: 8 | global_step: 40 | reduced_train_loss: 9.11 | consumed_samples: 328 Training epoch 0, iteration 41/499 | lr: 0.0009857 | global_batch_size: 8 | global_step: 41 | reduced_train_loss: 9.192 | consumed_samples: 336 Training epoch 0, iteration 42/499 | lr: 0.0009849 | global_batch_size: 8 | global_step: 42 | reduced_train_loss: 9.07 | consumed_samples: 344 Training epoch 0, iteration 43/499 | lr: 0.0009841 | global_batch_size: 8 | global_step: 43 | reduced_train_loss: 9.102 | consumed_samples: 352 Training epoch 0, iteration 44/499 | lr: 0.0009833 | global_batch_size: 8 | global_step: 44 | reduced_train_loss: 8.867 | consumed_samples: 360 Training epoch 0, iteration 45/499 | lr: 0.0009824 | global_batch_size: 8 | global_step: 45 | reduced_train_loss: 8.885 | consumed_samples: 368 Training epoch 0, iteration 46/499 | lr: 0.0009815 | global_batch_size: 8 | global_step: 46 | reduced_train_loss: 9.076 | consumed_samples: 376 Training epoch 0, iteration 47/499 | lr: 0.0009806 | global_batch_size: 8 | global_step: 47 | reduced_train_loss: 8.848 | consumed_samples: 384 Training epoch 0, iteration 48/499 | lr: 0.0009797 | global_batch_size: 8 | global_step: 48 | reduced_train_loss: 9.141 | consumed_samples: 392 Training epoch 0, iteration 49/499 
| lr: 0.0009787 | global_batch_size: 8 | global_step: 49 | reduced_train_loss: 9.034 | consumed_samples: 400 Training epoch 0, iteration 50/499 | lr: 0.0009778 | global_batch_size: 8 | global_step: 50 | reduced_train_loss: 8.954 | consumed_samples: 408 Training epoch 0, iteration 51/499 | lr: 0.0009768 | global_batch_size: 8 | global_step: 51 | reduced_train_loss: 8.871 | consumed_samples: 416 Training epoch 0, iteration 52/499 | lr: 0.0009758 | global_batch_size: 8 | global_step: 52 | reduced_train_loss: 9.038 | consumed_samples: 424 Training epoch 0, iteration 53/499 | lr: 0.0009747 | global_batch_size: 8 | global_step: 53 | reduced_train_loss: 8.948 | consumed_samples: 432 Training epoch 0, iteration 54/499 | lr: 0.0009737 | global_batch_size: 8 | global_step: 54 | reduced_train_loss: 8.997 | consumed_samples: 440 Training epoch 0, iteration 55/499 | lr: 0.0009726 | global_batch_size: 8 | global_step: 55 | reduced_train_loss: 9.07 | consumed_samples: 448 Training epoch 0, iteration 56/499 | lr: 0.0009715 | global_batch_size: 8 | global_step: 56 | reduced_train_loss: 9.043 | consumed_samples: 456 Training epoch 0, iteration 57/499 | lr: 0.0009704 | global_batch_size: 8 | global_step: 57 | reduced_train_loss: 8.786 | consumed_samples: 464 Training epoch 0, iteration 58/499 | lr: 0.0009693 | global_batch_size: 8 | global_step: 58 | reduced_train_loss: 9.051 | consumed_samples: 472 Training epoch 0, iteration 59/499 | lr: 0.0009681 | global_batch_size: 8 | global_step: 59 | reduced_train_loss: 8.978 | consumed_samples: 480 Training epoch 0, iteration 60/499 | lr: 0.0009669 | global_batch_size: 8 | global_step: 60 | reduced_train_loss: 9.03 | consumed_samples: 488 Training epoch 0, iteration 61/499 | lr: 0.0009657 | global_batch_size: 8 | global_step: 61 | reduced_train_loss: 9.033 | consumed_samples: 496 Training epoch 0, iteration 62/499 | lr: 0.0009645 | global_batch_size: 8 | global_step: 62 | reduced_train_loss: 8.892 | consumed_samples: 504 Training epoch 0, iteration 63/499 | lr: 0.0009633 | global_batch_size: 8 | global_step: 63 | reduced_train_loss: 8.979 | consumed_samples: 512 Training epoch 0, iteration 64/499 | lr: 0.000962 | global_batch_size: 8 | global_step: 64 | reduced_train_loss: 8.886 | consumed_samples: 520 Training epoch 0, iteration 65/499 | lr: 0.0009607 | global_batch_size: 8 | global_step: 65 | reduced_train_loss: 8.889 | consumed_samples: 528 Training epoch 0, iteration 66/499 | lr: 0.0009594 | global_batch_size: 8 | global_step: 66 | reduced_train_loss: 8.986 | consumed_samples: 536 Training epoch 0, iteration 67/499 | lr: 0.0009581 | global_batch_size: 8 | global_step: 67 | reduced_train_loss: 8.949 | consumed_samples: 544 Training epoch 0, iteration 68/499 | lr: 0.0009568 | global_batch_size: 8 | global_step: 68 | reduced_train_loss: 8.966 | consumed_samples: 552 Training epoch 0, iteration 69/499 | lr: 0.0009554 | global_batch_size: 8 | global_step: 69 | reduced_train_loss: 8.76 | consumed_samples: 560 Training epoch 0, iteration 70/499 | lr: 0.000954 | global_batch_size: 8 | global_step: 70 | reduced_train_loss: 8.899 | consumed_samples: 568 Training epoch 0, iteration 71/499 | lr: 0.0009526 | global_batch_size: 8 | global_step: 71 | reduced_train_loss: 8.761 | consumed_samples: 576 Training epoch 0, iteration 72/499 | lr: 0.0009512 | global_batch_size: 8 | global_step: 72 | reduced_train_loss: 8.817 | consumed_samples: 584 Training epoch 0, iteration 73/499 | lr: 0.0009497 | global_batch_size: 8 | global_step: 73 | reduced_train_loss: 8.963 | 
consumed_samples: 592 Training epoch 0, iteration 74/499 | lr: 0.0009483 | global_batch_size: 8 | global_step: 74 | reduced_train_loss: 8.893 | consumed_samples: 600 Training epoch 0, iteration 75/499 | lr: 0.0009468 | global_batch_size: 8 | global_step: 75 | reduced_train_loss: 8.835 | consumed_samples: 608 Training epoch 0, iteration 76/499 | lr: 0.0009453 | global_batch_size: 8 | global_step: 76 | reduced_train_loss: 9.061 | consumed_samples: 616 Training epoch 0, iteration 77/499 | lr: 0.0009438 | global_batch_size: 8 | global_step: 77 | reduced_train_loss: 8.866 | consumed_samples: 624 Training epoch 0, iteration 78/499 | lr: 0.0009422 | global_batch_size: 8 | global_step: 78 | reduced_train_loss: 8.981 | consumed_samples: 632 Training epoch 0, iteration 79/499 | lr: 0.0009407 | global_batch_size: 8 | global_step: 79 | reduced_train_loss: 8.91 | consumed_samples: 640 Training epoch 0, iteration 80/499 | lr: 0.0009391 | global_batch_size: 8 | global_step: 80 | reduced_train_loss: 8.661 | consumed_samples: 648 Training epoch 0, iteration 81/499 | lr: 0.0009375 | global_batch_size: 8 | global_step: 81 | reduced_train_loss: 8.677 | consumed_samples: 656 Training epoch 0, iteration 82/499 | lr: 0.0009359 | global_batch_size: 8 | global_step: 82 | reduced_train_loss: 8.889 | consumed_samples: 664 Training epoch 0, iteration 83/499 | lr: 0.0009342 | global_batch_size: 8 | global_step: 83 | reduced_train_loss: 8.864 | consumed_samples: 672 Training epoch 0, iteration 84/499 | lr: 0.0009326 | global_batch_size: 8 | global_step: 84 | reduced_train_loss: 8.712 | consumed_samples: 680 Training epoch 0, iteration 85/499 | lr: 0.0009309 | global_batch_size: 8 | global_step: 85 | reduced_train_loss: 8.97 | consumed_samples: 688 Training epoch 0, iteration 86/499 | lr: 0.0009292 | global_batch_size: 8 | global_step: 86 | reduced_train_loss: 8.945 | consumed_samples: 696 Training epoch 0, iteration 87/499 | lr: 0.0009275 | global_batch_size: 8 | global_step: 87 | reduced_train_loss: 8.979 | consumed_samples: 704 Training epoch 0, iteration 88/499 | lr: 0.0009258 | global_batch_size: 8 | global_step: 88 | reduced_train_loss: 8.766 | consumed_samples: 712 Training epoch 0, iteration 89/499 | lr: 0.000924 | global_batch_size: 8 | global_step: 89 | reduced_train_loss: 8.86 | consumed_samples: 720 Training epoch 0, iteration 90/499 | lr: 0.0009222 | global_batch_size: 8 | global_step: 90 | reduced_train_loss: 8.928 | consumed_samples: 728 Training epoch 0, iteration 91/499 | lr: 0.0009204 | global_batch_size: 8 | global_step: 91 | reduced_train_loss: 8.563 | consumed_samples: 736 Training epoch 0, iteration 92/499 | lr: 0.0009186 | global_batch_size: 8 | global_step: 92 | reduced_train_loss: 8.892 | consumed_samples: 744 Training epoch 0, iteration 93/499 | lr: 0.0009168 | global_batch_size: 8 | global_step: 93 | reduced_train_loss: 8.851 | consumed_samples: 752 Training epoch 0, iteration 94/499 | lr: 0.000915 | global_batch_size: 8 | global_step: 94 | reduced_train_loss: 8.94 | consumed_samples: 760 Training epoch 0, iteration 95/499 | lr: 0.0009131 | global_batch_size: 8 | global_step: 95 | reduced_train_loss: 8.961 | consumed_samples: 768 Training epoch 0, iteration 96/499 | lr: 0.0009112 | global_batch_size: 8 | global_step: 96 | reduced_train_loss: 8.927 | consumed_samples: 776 Training epoch 0, iteration 97/499 | lr: 0.0009093 | global_batch_size: 8 | global_step: 97 | reduced_train_loss: 8.928 | consumed_samples: 784 Training epoch 0, iteration 98/499 | lr: 0.0009074 | global_batch_size: 8 | 
global_step: 98 | reduced_train_loss: 8.887 | consumed_samples: 792 Training epoch 0, iteration 99/499 | lr: 0.0009055 | global_batch_size: 8 | global_step: 99 | reduced_train_loss: 8.816 | consumed_samples: 800 [INFO | pytorch_lightning.utilities.rank_zero]: Epoch 0, global step 99: 'reduced_train_loss' reached 8.81633 (best 8.81633), saving model to '/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0.ckpt' as top 2 [WARNING | py.warnings ]: /workspace/bionemo2/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py:339: UserWarning: TransformerLayer._get_layer_offset is deprecated.Please use get_transformer_layer_offset instead. warnings.warn( [NeMo I 2025-03-11 20:30:58 nemo_logging:393] Using FullyParallelSaveStrategyWrapper(torch_dist, 1) dist-ckpt save strategy. [NeMo W 2025-03-11 20:31:07 nemo_logging:405] /usr/local/lib/python3.12/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning) [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/selective_scan_interface.py:163: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/selective_scan_interface.py:239: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/layer_norm.py:985: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1044: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/distributed/tensor_parallel.py:25: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/distributed/tensor_parallel.py:61: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:757: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:835: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. 
@custom_bwd [NeMo I 2025-03-11 20:31:10 nemo_logging:393] Global Checkpoint Save : Rank: 0 : Iteration: 99 : Start time: 1741725058.995s : Save duration: 11.516s [NeMo I 2025-03-11 20:31:13 nemo_logging:393] Scheduled async checkpoint save for /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0.ckpt [NeMo I 2025-03-11 20:31:13 nemo_logging:393] Global Checkpoint Save : Rank: 0 : Iteration: 99 : Start time: 1741725073.552s : Save duration: 0.050s [NeMo I 2025-03-11 20:31:16 nemo_logging:393] Scheduled async checkpoint save for /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0-last.ckpt [NeMo I 2025-03-11 20:31:16 nemo_logging:393] Successfully saved checkpoint from iteration 99 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0.ckpt [NeMo I 2025-03-11 20:31:16 nemo_logging:393] Async checkpoint save for step 100 (/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0.ckpt) finalized successfully. [NeMo I 2025-03-11 20:31:16 nemo_logging:393] Successfully saved checkpoint from iteration 99 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0-last.ckpt [NeMo I 2025-03-11 20:31:16 nemo_logging:393] Async checkpoint save for step 100 (/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=0.00-step=99-consumed_samples=800.0-last.ckpt) finalized successfully. [NeMo I 2025-03-11 20:31:16 nemo_logging:393] Async finalization time took 0.048 s Validation: iteration 1/2 Validation: iteration 2/2 Validation: iteration 3/2 Validation: iteration 4/2 Validation: iteration 5/2 Validation: iteration 6/2 Validation: iteration 7/2 Validation: iteration 8/2 Training epoch 0, iteration 100/499 | lr: 0.0009035 | global_batch_size: 8 | global_step: 100 | reduced_train_loss: 8.874 | consumed_samples: 808 | val_loss: 8.996 Training epoch 0, iteration 101/499 | lr: 0.0009015 | global_batch_size: 8 | global_step: 101 | reduced_train_loss: 8.698 | consumed_samples: 816 | val_loss: 8.996 Training epoch 0, iteration 102/499 | lr: 0.0008995 | global_batch_size: 8 | global_step: 102 | reduced_train_loss: 8.859 | consumed_samples: 824 | val_loss: 8.996 Training epoch 0, iteration 103/499 | lr: 0.0008975 | global_batch_size: 8 | global_step: 103 | reduced_train_loss: 8.825 | consumed_samples: 832 | val_loss: 8.996 Training epoch 0, iteration 104/499 | lr: 0.0008955 | global_batch_size: 8 | global_step: 104 | reduced_train_loss: 8.981 | consumed_samples: 840 | val_loss: 8.996 Training epoch 0, iteration 105/499 | lr: 0.0008935 | global_batch_size: 8 | global_step: 105 | reduced_train_loss: 8.89 | consumed_samples: 848 | val_loss: 8.996 Training epoch 0, iteration 106/499 | lr: 0.0008914 | global_batch_size: 8 | global_step: 106 | reduced_train_loss: 8.848 | consumed_samples: 856 | val_loss: 8.996 Training epoch 0, iteration 107/499 | lr: 0.0008893 | global_batch_size: 8 | global_step: 107 | reduced_train_loss: 8.858 | consumed_samples: 864 | val_loss: 8.996 Training epoch 0, iteration 108/499 | lr: 0.0008872 | global_batch_size: 8 | global_step: 108 | reduced_train_loss: 8.841 | consumed_samples: 872 | val_loss: 8.996 Training epoch 0, iteration 109/499 | lr: 0.0008851 | global_batch_size: 8 | global_step: 109 | reduced_train_loss: 8.904 | consumed_samples: 880 | val_loss: 8.996 Training epoch 0, 
... (iterations 102-198 elided; reduced_train_loss fluctuates between roughly 8.4 and 9.0 while the learning rate decays, val_loss: 8.996) ...
Training epoch 0, iteration 199/499 | lr: 0.000639 | global_batch_size: 8 | global_step: 199 | reduced_train_loss: 8.745 | consumed_samples: 1600 | val_loss: 8.996
[INFO | pytorch_lightning.utilities.rank_zero]: Epoch 0, global step 199: 'reduced_train_loss' reached 8.74529 (best 8.74529), saving model to '/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=9.00-step=199-consumed_samples=1600.0.ckpt' as top 2
[NeMo I 2025-03-11 20:31:31 nemo_logging:393] Successfully saved checkpoint from iteration 199 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=9.00-step=199-consumed_samples=1600.0.ckpt
[NeMo I 2025-03-11 20:31:31 nemo_logging:393] Async finalization time took 0.073 s
Validation: iteration 1/2
Validation: iteration 2/2
... (validation continues through iteration 8/2) ...
Training epoch 0, iteration 200/499 | lr: 0.0006358 | global_batch_size: 8 | global_step: 200 | reduced_train_loss: 8.536 | consumed_samples: 1608 | val_loss: 8.654
... (iterations 201-298 elided; val_loss: 8.654) ...
Training epoch 0, iteration 299/499 | lr: 0.0003148 | global_batch_size: 8 | global_step: 299 | reduced_train_loss: 8.311 | consumed_samples: 2400 | val_loss: 8.654
[INFO | pytorch_lightning.utilities.rank_zero]: Epoch 0, global step 299: 'reduced_train_loss' reached 8.31067 (best 8.31067), saving model to '/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.65-step=299-consumed_samples=2400.0.ckpt' as top 2
[NeMo I 2025-03-11 20:31:47 nemo_logging:393] Successfully saved checkpoint from iteration 299 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.65-step=299-consumed_samples=2400.0.ckpt
Validation: iteration 1/2
Validation: iteration 2/2
... (validation continues through iteration 8/2) ...
Training epoch 0, iteration 300/499 | lr: 0.0003118 | global_batch_size: 8 | global_step: 300 | reduced_train_loss: 8.597 | consumed_samples: 2408 | val_loss: 8.409
... (iterations 301-398 elided; val_loss: 8.409) ...
Training epoch 0, iteration 399/499 | lr: 7.251e-05 | global_batch_size: 8 | global_step: 399 | reduced_train_loss: 8.437 | consumed_samples: 3200 | val_loss: 8.409
[INFO | pytorch_lightning.utilities.rank_zero]: Epoch 0, global step 399: 'reduced_train_loss' reached 8.43677 (best 8.31067), saving model to '/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.41-step=399-consumed_samples=3200.0.ckpt' as top 2
[NeMo I 2025-03-11 20:32:02 nemo_logging:393] Successfully saved checkpoint from iteration 399 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.41-step=399-consumed_samples=3200.0.ckpt
[NeMo I 2025-03-11 20:32:02 nemo_logging:393] Async finalization time took 0.096 s
Validation: iteration 1/2
Validation: iteration 2/2
... (validation continues through iteration 8/2) ...
Training epoch 0, iteration 400/499 | lr: 7.091e-05 | global_batch_size: 8 | global_step: 400 | reduced_train_loss: 8.163 | consumed_samples: 3208 | val_loss: 8.28
... (iterations 401-462 elided; val_loss: 8.28) ...
Training epoch 0, iteration 463/499 | lr:
1.159e-05 | global_batch_size: 8 | global_step: 463 | reduced_train_loss: 8.142 | consumed_samples: 3712 | val_loss: 8.28 Training epoch 0, iteration 464/499 | lr: 1.134e-05 | global_batch_size: 8 | global_step: 464 | reduced_train_loss: 8.387 | consumed_samples: 3720 | val_loss: 8.28 Training epoch 0, iteration 465/499 | lr: 1.111e-05 | global_batch_size: 8 | global_step: 465 | reduced_train_loss: 8.264 | consumed_samples: 3728 | val_loss: 8.28 Training epoch 0, iteration 466/499 | lr: 1.09e-05 | global_batch_size: 8 | global_step: 466 | reduced_train_loss: 8.385 | consumed_samples: 3736 | val_loss: 8.28 Training epoch 0, iteration 467/499 | lr: 1.071e-05 | global_batch_size: 8 | global_step: 467 | reduced_train_loss: 8.145 | consumed_samples: 3744 | val_loss: 8.28 Training epoch 0, iteration 468/499 | lr: 1.054e-05 | global_batch_size: 8 | global_step: 468 | reduced_train_loss: 8.335 | consumed_samples: 3752 | val_loss: 8.28 Training epoch 0, iteration 469/499 | lr: 1.04e-05 | global_batch_size: 8 | global_step: 469 | reduced_train_loss: 8.429 | consumed_samples: 3760 | val_loss: 8.28 Training epoch 0, iteration 470/499 | lr: 1.028e-05 | global_batch_size: 8 | global_step: 470 | reduced_train_loss: 8.274 | consumed_samples: 3768 | val_loss: 8.28 Training epoch 0, iteration 471/499 | lr: 1.018e-05 | global_batch_size: 8 | global_step: 471 | reduced_train_loss: 8.185 | consumed_samples: 3776 | val_loss: 8.28 Training epoch 0, iteration 472/499 | lr: 1.01e-05 | global_batch_size: 8 | global_step: 472 | reduced_train_loss: 8.278 | consumed_samples: 3784 | val_loss: 8.28 Training epoch 0, iteration 473/499 | lr: 1.004e-05 | global_batch_size: 8 | global_step: 473 | reduced_train_loss: 8.431 | consumed_samples: 3792 | val_loss: 8.28 Training epoch 0, iteration 474/499 | lr: 1.001e-05 | global_batch_size: 8 | global_step: 474 | reduced_train_loss: 8.386 | consumed_samples: 3800 | val_loss: 8.28 Training epoch 0, iteration 475/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 475 | reduced_train_loss: 8.126 | consumed_samples: 3808 | val_loss: 8.28 Training epoch 0, iteration 476/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 476 | reduced_train_loss: 8.338 | consumed_samples: 3816 | val_loss: 8.28 Training epoch 0, iteration 477/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 477 | reduced_train_loss: 8.296 | consumed_samples: 3824 | val_loss: 8.28 Training epoch 0, iteration 478/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 478 | reduced_train_loss: 8.111 | consumed_samples: 3832 | val_loss: 8.28 Training epoch 0, iteration 479/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 479 | reduced_train_loss: 8.277 | consumed_samples: 3840 | val_loss: 8.28 Training epoch 0, iteration 480/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 480 | reduced_train_loss: 8.187 | consumed_samples: 3848 | val_loss: 8.28 Training epoch 0, iteration 481/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 481 | reduced_train_loss: 8.268 | consumed_samples: 3856 | val_loss: 8.28 Training epoch 0, iteration 482/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 482 | reduced_train_loss: 8.208 | consumed_samples: 3864 | val_loss: 8.28 Training epoch 0, iteration 483/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 483 | reduced_train_loss: 8.206 | consumed_samples: 3872 | val_loss: 8.28 Training epoch 0, iteration 484/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 484 | reduced_train_loss: 8.303 | consumed_samples: 3880 | val_loss: 8.28 Training epoch 0, iteration 
485/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 485 | reduced_train_loss: 8.281 | consumed_samples: 3888 | val_loss: 8.28 Training epoch 0, iteration 486/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 486 | reduced_train_loss: 8.362 | consumed_samples: 3896 | val_loss: 8.28 Training epoch 0, iteration 487/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 487 | reduced_train_loss: 8.114 | consumed_samples: 3904 | val_loss: 8.28 Training epoch 0, iteration 488/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 488 | reduced_train_loss: 8.362 | consumed_samples: 3912 | val_loss: 8.28 Training epoch 0, iteration 489/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 489 | reduced_train_loss: 8.33 | consumed_samples: 3920 | val_loss: 8.28 Training epoch 0, iteration 490/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 490 | reduced_train_loss: 8.2 | consumed_samples: 3928 | val_loss: 8.28 Training epoch 0, iteration 491/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 491 | reduced_train_loss: 8.397 | consumed_samples: 3936 | val_loss: 8.28 Training epoch 0, iteration 492/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 492 | reduced_train_loss: 8.12 | consumed_samples: 3944 | val_loss: 8.28 Training epoch 0, iteration 493/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 493 | reduced_train_loss: 8.295 | consumed_samples: 3952 | val_loss: 8.28 Training epoch 0, iteration 494/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 494 | reduced_train_loss: 8.39 | consumed_samples: 3960 | val_loss: 8.28 Training epoch 0, iteration 495/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 495 | reduced_train_loss: 8.192 | consumed_samples: 3968 | val_loss: 8.28 Training epoch 0, iteration 496/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 496 | reduced_train_loss: 8.25 | consumed_samples: 3976 | val_loss: 8.28 Training epoch 0, iteration 497/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 497 | reduced_train_loss: 8.08 | consumed_samples: 3984 | val_loss: 8.28 Training epoch 0, iteration 498/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 498 | reduced_train_loss: 8.343 | consumed_samples: 3992 | val_loss: 8.28 Training epoch 0, iteration 499/499 | lr: 1e-05 | global_batch_size: 8 | global_step: 499 | reduced_train_loss: 8.03 | consumed_samples: 4000 | val_loss: 8.28 [INFO | pytorch_lightning.utilities.rank_zero]: Epoch 0, global step 499: 'reduced_train_loss' reached 8.02971 (best 8.02971), saving model to '/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt' as top 2 [WARNING | py.warnings ]: /workspace/bionemo2/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py:339: UserWarning: TransformerLayer._get_layer_offset is deprecated.Please use get_transformer_layer_offset instead. 
warnings.warn( [NeMo I 2025-03-11 20:32:11 nemo_logging:393] Global Checkpoint Save : Rank: 0 : Iteration: 499 : Start time: 1741725131.041s : Save duration: 0.014s [NeMo I 2025-03-11 20:32:14 nemo_logging:393] Scheduled async checkpoint save for /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt [NeMo I 2025-03-11 20:32:14 nemo_logging:393] Global Checkpoint Save : Rank: 0 : Iteration: 499 : Start time: 1741725134.016s : Save duration: 0.013s [NeMo I 2025-03-11 20:32:16 nemo_logging:393] Scheduled async checkpoint save for /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last.ckpt [NeMo I 2025-03-11 20:32:17 nemo_logging:393] Successfully saved checkpoint from iteration 499 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt [NeMo I 2025-03-11 20:32:17 nemo_logging:393] Async checkpoint save for step 500 (/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt) finalized successfully. [NeMo I 2025-03-11 20:32:17 nemo_logging:393] Successfully saved checkpoint from iteration 499 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last.ckpt [NeMo I 2025-03-11 20:32:17 nemo_logging:393] Async checkpoint save for step 500 (/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last.ckpt) finalized successfully. [NeMo I 2025-03-11 20:32:17 nemo_logging:393] Async finalization time took 0.090 s Validation: iteration 1/2 Validation: iteration 2/2 Validation: iteration 3/2 Validation: iteration 4/2 Validation: iteration 5/2 Validation: iteration 6/2 Validation: iteration 7/2 Validation: iteration 8/2 [INFO | pytorch_lightning.utilities.rank_zero]: `Trainer.fit` stopped: `max_steps=500` reached. wandb: wandb: You can sync this run to the cloud by running: wandb: wandb sync /workspace/bionemo2/results/geneformer-10m/wandb/offline-run-20250311_203046-1 wandb: Find logs at: ../../../../../results/geneformer-10m/wandb/offline-run-20250311_203046-1/logs
Running inference.¶
We can see from the above training job that the model was trained for 500 steps (the run stopped once `max_steps=500` was reached). At the end of training, the experiment manager logs where the resulting .ckpt checkpoint is written. This checkpoint can be used for fine-tuning, for inference, or to continue training from an existing set of model weights. See the example produced by our run below:
[NeMo I 2025-03-11 20:32:11 nemo_logging:393] Global Checkpoint Save : Rank: 0 : Iteration: 499 : Start time: 1741725131.041s : Save duration: 0.014s
[NeMo I 2025-03-11 20:32:14 nemo_logging:393] Scheduled async checkpoint save for /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt
[NeMo I 2025-03-11 20:32:14 nemo_logging:393] Global Checkpoint Save : Rank: 0 : Iteration: 499 : Start time: 1741725134.016s : Save duration: 0.013s
[NeMo I 2025-03-11 20:32:16 nemo_logging:393] Scheduled async checkpoint save for /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last.ckpt
[NeMo I 2025-03-11 20:32:17 nemo_logging:393] Successfully saved checkpoint from iteration 499 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt
[NeMo I 2025-03-11 20:32:17 nemo_logging:393] Async checkpoint save for step 500 (/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0.ckpt) finalized successfully.
[NeMo I 2025-03-11 20:32:17 nemo_logging:393] Successfully saved checkpoint from iteration 499 to /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last.ckpt
[NeMo I 2025-03-11 20:32:17 nemo_logging:393] Async checkpoint save for step 500 (/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last.ckpt) finalized successfully.
[NeMo I 2025-03-11 20:32:17 nemo_logging:393] Async finalization time took 0.090 s
We will take the last checkpoint logged,
/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last
and use it for inference.
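Optionally, we can first list the checkpoints directory to confirm what was written there; note that these NeMo 2 checkpoints are distributed checkpoints saved as directories rather than single files. A minimal sanity check:
# List the checkpoints written during pre-training; the "-last" entry is the
# checkpoint we will point the inference script at below.
!ls -alh /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/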
# Path to the last checkpoint written during pre-training (see the log above).
pretrained_checkpoint_path = "/workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last"
!python /workspace/bionemo2/sub-packages/bionemo-geneformer/src/bionemo/geneformer/scripts/infer_geneformer.py \
--data-dir {test_tutorial_processed_dir} \
--checkpoint-path {pretrained_checkpoint_path} \
--results-path {tutorial_output_dir}
[NeMo W 2025-03-11 20:41:44 nemo_logging:405] /usr/local/lib/python3.12/dist-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning) [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/selective_scan_interface.py:163: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/selective_scan_interface.py:239: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/layer_norm.py:985: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1044: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/distributed/tensor_parallel.py:25: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/distributed/tensor_parallel.py:61: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:757: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead. @custom_fwd [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:835: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead. @custom_bwd [NeMo W 2025-03-11 20:41:48 nemo_logging:405] Tokenizer vocab file: /workspace/bionemo2/.cache/bionemo/d8e3ea569bc43768c24aa651aff77722df202078415528497c22394046b08cc3-singlecell-scdltestdata-20241203.tar.gz.untar/cellxgene_2023-12-15_small_processed_scdl/train/geneformer.vocab already exists. Overwriting... [NeMo I 2025-03-11 20:41:48 nemo_logging:393] No checksum provided, filename exists. Assuming it is complete. [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Resource already exists, skipping download: https://huggingface.co/ctheodoris/Geneformer/resolve/main/geneformer/gene_dictionaries_30m/gene_name_id_dict_gc30M.pkl?download=true [NeMo I 2025-03-11 20:41:48 nemo_logging:393] No checksum provided, filename exists. Assuming it is complete. [NeMo I 2025-03-11 20:41:48 nemo_logging:393] No checksum provided, filename exists. Assuming it is complete. [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Resource already exists, skipping download: https://huggingface.co/ctheodoris/Geneformer/resolve/main/geneformer/gene_dictionaries_30m/gene_median_dictionary_gc30M.pkl?download=true [NeMo I 2025-03-11 20:41:48 nemo_logging:393] No checksum provided, filename exists. 
Assuming it is complete. [NeMo I 2025-03-11 20:41:48 nemo_logging:393] *************** Preprocessing Finished ************ [INFO | pytorch_lightning.utilities.rank_zero]: GPU available: True (cuda), used: True [INFO | pytorch_lightning.utilities.rank_zero]: TPU available: False, using: 0 TPU cores [INFO | pytorch_lightning.utilities.rank_zero]: HPU available: False, using: 0 HPUs [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Fixing mis-match between ddp-config & mcore-optimizer config [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has data parallel group : [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has combined group of data parallel and context parallel : [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] All data parallel group ranks with context parallel combined: [[0]] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Ranks 0 has data parallel rank: 0 [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has context parallel group: [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] All context parallel group ranks: [[0]] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Ranks 0 has context parallel rank: 0 [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has model parallel group: [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] All model parallel group ranks: [[0]] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has tensor model parallel group: [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] All tensor model parallel group ranks: [[0]] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has tensor model parallel rank: 0 [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has pipeline model parallel group: [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has embedding group: [0] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] All pipeline model parallel group ranks: [[0]] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has pipeline model parallel rank 0 [NeMo I 2025-03-11 20:41:48 nemo_logging:393] All embedding group ranks: [[0]] [NeMo I 2025-03-11 20:41:48 nemo_logging:393] Rank 0 has embedding rank: 0 Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1 [INFO | pytorch_lightning.utilities.rank_zero]: ---------------------------------------------------------------------------------------------------- distributed_backend=nccl All distributed processes registered. Starting with 1 processes ---------------------------------------------------------------------------------------------------- [WARNING | /workspace/bionemo2/sub-packages/bionemo-llm/src/bionemo/llm/model/config.py]: Loading /workspace/bionemo2/results/geneformer-10m/dev/checkpoints/epoch=0-val_loss=8.28-step=499-consumed_samples=4000.0-last [WARNING | py.warnings ]: /workspace/bionemo2/3rdparty/Megatron-LM/megatron/core/models/bert/bert_layer_specs.py:79: UserWarning: Attribute bert_layer_specs.bert_layer_with_transformer_engine_spec is on a deprecation track and will be removed in future releases. Please migrate to bert_layer_specs.get_bert_layer_with_transformer_engine_spec(). warnings.warn( [NeMo I 2025-03-11 20:41:49 nemo_logging:393] Padded vocab_size: 25472, original vocab_size: 25429, dummy tokens: 43. [WARNING | py.warnings ]: /workspace/bionemo2/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py:339: UserWarning: TransformerLayer._get_layer_offset is deprecated.Please use get_transformer_layer_offset instead. 
warnings.warn( [WARNING | py.warnings ]: /workspace/bionemo2/3rdparty/Megatron-LM/megatron/core/dist_checkpointing/strategies/torch.py:847: FutureWarning: `load_state_dict` is deprecated and will be removed in future versions. Please use `load` instead. checkpoint.load_state_dict( [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/planner_helpers.py:316: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. device = getattr(value, "device", None) LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3] [NeMo W 2025-03-11 20:41:49 nemo_logging:405] Could not copy Trainer's 'max_steps' to LR scheduler's 'max_steps'. If you are not using an LR scheduler, this warning can safely be ignored. [NeMo I 2025-03-11 20:41:49 nemo_logging:393] > number of parameters on (tensor, pipeline) model parallel rank (0 ,0): 10300032 [WARNING | py.warnings ]: /usr/local/lib/python3.12/dist-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'predict_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=63` in the `DataLoader` to improve performance.
!ls -altrh {tutorial_output_dir}/
tutorial_output_inference_pickle = f"{tutorial_output_dir}/predictions__rank_0.pt"
!ls -altrh {tutorial_output_inference_pickle}
total 128K drwxr-xr-x 5 jomitchell domain-users 4.0K Mar 11 18:57 .. -rw-r--r-- 1 jomitchell domain-users 118K Mar 11 20:41 predictions__rank_0.pt drwxr-xr-x 2 jomitchell domain-users 4.0K Mar 11 20:41 . -rw-r--r-- 1 jomitchell domain-users 118K Mar 11 20:41 /workspace/bionemo2/data/singlecell_tutorial/inference_output/predictions__rank_0.pt
Load inference result and cluster with UMAP.¶
Now we will inspect our results. First, we expect one prediction for each cell, so we can compare the shape of the AnnData object to the shape of the predictions produced by our model. After that, we simply pass our embeddings into UMAP and view the result. Keep in mind that this is a very lightly trained model on very few cells, so keep expectations low!
The inference results .pt file contains one row per cell. The embeddings entry holds the mean embedding over all gene tokens for each cell, with special tokens (CLS, MASK, etc.) removed; when per-token hidden states are requested, a hiddens entry with one embedding per token is included as well. As the next cell shows, our run produced token_logits, binary_logits, and embeddings.
# Load inference results with torch.load
import torch

inference_results = torch.load(tutorial_output_inference_pickle)
print(inference_results.keys())
print(inference_results["embeddings"].shape)
print("Inference results data type:", inference_results["embeddings"].dtype)
dict_keys(['token_logits', 'binary_logits', 'embeddings']) torch.Size([232, 256]) Inference results data type: torch.float32
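The embeddings can also be attached to the AnnData object so that standard single-cell tooling can consume them. This is a minimal sketch, assuming the adata_test object loaded earlier in this tutorial; X_geneformer is just an illustrative key name.
import numpy as np

# Attach the per-cell embeddings to the AnnData object so downstream tools
# (e.g. scanpy neighbors/clustering) can read them from .obsm.
cell_embeddings = inference_results["embeddings"].float().cpu().numpy()
adata_test.obsm["X_geneformer"] = cell_embeddings
print(adata_test.obsm["X_geneformer"].shape)  # (232, 256) for this run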
import umap

# Project the 256-dimensional cell embeddings down to 2D for visualization.
reducer = umap.UMAP()
embedding = reducer.fit_transform(inference_results["embeddings"].float())
/usr/local/lib/python3.12/dist-packages/sklearn/utils/deprecation.py:151: FutureWarning: 'force_all_finite' was renamed to 'ensure_all_finite' in 1.6 and will be removed in 1.8. warnings.warn(
print("embedding.shape: ", embedding.shape)
print("adata_test.obs.shape[0]: ", adata_test.obs.shape[0])
assert adata_test.obs.shape[0] == inference_results["embeddings"].shape[0]
embedding.shape: (232, 2) adata_test.obs.shape[0]: 232
from matplotlib import pyplot as plt

# Copy the cell metadata and attach the 2D UMAP coordinates.
results = adata_test.obs.copy()
results["x"] = embedding[:, 0]
results["y"] = embedding[:, 1]

# Plot the UMAP projection colored by each covariate of interest.
covariates = ["assay", "development_stage", "dataset_id", "sex"]
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True, figsize=(10, 10))
for ax, covar in zip(axes.flat, covariates):
    for cov, cov_df in results.groupby(covar):
        ax.scatter(
            cov_df.x,
            cov_df.y,
            s=3,
            alpha=0.75,
            label=cov,
        )
    # Only draw a legend when it stays readable.
    if len(results[covar].unique()) < 8:
        ax.legend()
    ax.set_title(f"Embeddings by {covar}")
/tmp/ipykernel_29198/3842729369.py:11: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning. for cov, cov_df in results.groupby(covar):
adata_test.obs.columns
Index(['soma_joinid', 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'raw_sum', 'nnz', 'raw_mean_nnz', 'raw_variance_nnz', 'n_measured_vars'], dtype='object')
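As a rough quantitative companion to the UMAP plots, we can score how well the raw (pre-UMAP) embeddings separate one of the covariates listed above. The sketch below uses scikit-learn's silhouette score with cell_type as the label (assuming more than one cell type is present in this subset); given how briefly this model was trained, expect a score near or below zero.
from sklearn.metrics import silhouette_score

# Silhouette score lies in [-1, 1]; higher means cells sharing a label sit
# closer together in embedding space than cells with different labels.
labels = adata_test.obs["cell_type"].astype(str).values
cell_embeddings = inference_results["embeddings"].float().cpu().numpy()
print(f"Silhouette by cell_type: {silhouette_score(cell_embeddings, labels):.3f}")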