Utils

Shared constants and helper functions for the BaseCamp Research dataloader.

This module is used by multiple recipes via bionemo-recipeutils. It must not import megatron-core, megatron-bridge, or NeMo.

extract_sample_id(sequence_id)

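Extract the sample ID from a sequence ID of the form BCR__EXT-SAMPLE1__CT1-1, which yields SAMPLE1. The ID is split on "__", the first hyphen-separated token of the middle field (here EXT) is dropped, and any remaining tokens are joined with ".".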

Source code in bionemo/recipeutils/data/basecamp/utils.py
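def extract_sample_id(sequence_id: str) -> str:
    """Extract sample ID from sequence ID format: BCR__EXT-SAMPLE1__CT1-1."""
    # Take the middle "__"-delimited field (e.g. "EXT-SAMPLE1"), then drop its
    # first "-"-delimited token (e.g. "EXT").
    parts = sequence_id.split("__")[1].split("-")[1:]
    # Join whatever remains with "." (a single token is returned unchanged).
    return ".".join(parts)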
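
As a quick illustration of the behavior, the sketch below repeats the function body so that it runs on its own, then calls it on the documented format and on one hypothetical ID. The second ID, with an extra hyphen in the middle field, is an assumption made for illustration and is not taken from the BaseCamp Research data.

```python
def extract_sample_id(sequence_id: str) -> str:
    """Copy of the function shown above, repeated so the example is self-contained."""
    parts = sequence_id.split("__")[1].split("-")[1:]
    return ".".join(parts)


# Documented format: the middle field is "EXT-SAMPLE1", so "EXT" is dropped.
documented = extract_sample_id("BCR__EXT-SAMPLE1__CT1-1")
print(documented)  # SAMPLE1

# Hypothetical ID (not from the source) with an extra hyphen in the middle
# field: the tokens after "EXT" are joined with ".".
multi_token = extract_sample_id("BCR__EXT-SAMPLE1-A__CT1-1")
print(multi_token)  # SAMPLE1.A
```

Note that the function indexes the second "__"-delimited field directly, so an input containing no "__" separator raises an IndexError instead of returning a fallback value.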