
Utils

sample_or_truncate(gene_ids, max_length, sample=True)

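Randomly subsample or truncate an array of gene IDs to at most max_length entries. Arrays that already fit are returned unchanged; the function does not pad.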

Parameters:

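| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gene_ids | ndarray | Array of gene IDs. | required |
| max_length | int | Maximum number of gene IDs to keep. | required |
| sample | bool | If True, keep max_length gene IDs drawn at random without replacement (returned in random order). If False, keep the first max_length gene IDs. | True |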

Returns:

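| Type | Description |
| --- | --- |
| ndarray | The input array unchanged if it has at most max_length entries; otherwise an array of max_length gene IDs, either randomly sampled or taken from the front. |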
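The three code paths can be illustrated with a small sketch. The function body is copied inline from the source listing on this page so that the snippet runs without BioNeMo installed; in a real project it would presumably be imported from `bionemo.geneformer.data.singlecell.utils` (an assumption based on the file path shown on this page). The input array is made up for illustration.

```python
import numpy as np


def sample_or_truncate(gene_ids, max_length, sample=True):
    # Copied from the source listing on this page.
    if len(gene_ids) <= max_length:
        return gene_ids
    if sample:
        indices = np.random.permutation(len(gene_ids))[:max_length]
        return gene_ids[indices]
    return gene_ids[:max_length]


gene_ids = np.arange(10)  # illustrative IDs 0..9

# Input already fits: the same array object comes back, with no padding.
short = sample_or_truncate(gene_ids, max_length=16)

# Truncation keeps the first max_length entries in their original order.
truncated = sample_or_truncate(gene_ids, max_length=4, sample=False)

# Sampling keeps max_length distinct entries in random order.
sampled = sample_or_truncate(gene_ids, max_length=4, sample=True)

print(short is gene_ids)
print(truncated)
print(len(sampled))
```

Because the short input is returned as-is, callers that need fixed-length arrays have to pad in a separate step.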

Source code in bionemo/geneformer/data/singlecell/utils.py
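import numpy as np


def sample_or_truncate(
    gene_ids: np.ndarray,
    max_length: int,
    sample: bool = True,
) -> np.ndarray:
    """Randomly subsample or truncate gene IDs to at most ``max_length`` entries.

    Arrays that already fit are returned unchanged; no padding is applied.

    Args:
        gene_ids (np.ndarray): Array of gene IDs.
        max_length (int): Maximum number of gene IDs to keep.
        sample (bool, optional): If True, keep ``max_length`` IDs drawn at random without
            replacement; if False, keep the first ``max_length`` IDs. Defaults to True.

    Returns:
        np.ndarray: The input array if it already fits, otherwise ``max_length`` gene IDs.
    """
    # Nothing to do when the input already fits.
    if len(gene_ids) <= max_length:
        return gene_ids

    if sample:
        # Shuffle all positions and keep the first max_length of them
        # (sampling without replacement; the original order is not preserved).
        indices = np.random.permutation(len(gene_ids))[:max_length]
        return gene_ids[indices]
    else:
        return gene_ids[:max_length]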
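The sampling branch draws from NumPy's global random state through `np.random.permutation`, so its output depends on whatever seeding happened earlier in the process. As a minimal sketch of one way to make the selection reproducible, the hypothetical helper below (`sample_with_rng` is not part of BioNeMo) takes an explicit `np.random.Generator` instead.

```python
import numpy as np


def sample_with_rng(gene_ids, max_length, rng):
    """Hypothetical variant of the sampling branch using an explicit Generator."""
    if len(gene_ids) <= max_length:
        return gene_ids
    # Same idea as the original: shuffle positions, keep the first max_length.
    indices = rng.permutation(len(gene_ids))[:max_length]
    return gene_ids[indices]


gene_ids = np.arange(1000)  # illustrative IDs

# Two generators created from the same seed produce the same selection.
first = sample_with_rng(gene_ids, 8, np.random.default_rng(42))
second = sample_with_rng(gene_ids, 8, np.random.default_rng(42))

print(np.array_equal(first, second))  # True
```

Passing the generator in keeps the randomness local to the call, which may be easier to control than the global state when several data-loading workers sample at once.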