Config models
ExposedFineTuneSeqLenBioBertConfig
Bases: ExposedModelConfig[FineTuneSeqLenBioBertConfig]
Config for models that fine-tune a BioBERT model from a pre-trained checkpoint.
Source code in bionemo/geneformer/run/config_models.py
model_class()
Binds the class to FineTuneSeqLenBioBertConfig.
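The binding pattern described above can be illustrated with a minimal, self-contained sketch. All names here (`ToyModelConfig`, `ToyExposedConfig`, `ToyExposedFineTuneConfig`, `build`) are hypothetical stand-ins, not the BioNeMo classes; the sketch only shows how a generic "exposed" config can return the concrete config class it is bound to.

```python
from dataclasses import dataclass
from typing import Generic, Type, TypeVar

T = TypeVar("T")


@dataclass
class ToyModelConfig:
    """Hypothetical stand-in for a model config such as FineTuneSeqLenBioBertConfig."""

    seq_length: int = 2048


class ToyExposedConfig(Generic[T]):
    """Hypothetical base class: subclasses declare which config class they expose."""

    def model_class(self) -> Type[T]:
        raise NotImplementedError

    def build(self, **overrides) -> T:
        # Instantiate whichever class the subclass bound via model_class().
        return self.model_class()(**overrides)


class ToyExposedFineTuneConfig(ToyExposedConfig[ToyModelConfig]):
    def model_class(self) -> Type[ToyModelConfig]:
        # The binding step: this exposed config always produces ToyModelConfig.
        return ToyModelConfig


cfg = ToyExposedFineTuneConfig().build(seq_length=512)
```

The point of the pattern is that generic code can work against the base class while each subclass decides, in one place, which concrete config it produces.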
Source code in bionemo/geneformer/run/config_models.py
ExposedGeneformerPretrainConfig
Bases: ExposedModelConfig[GeneformerConfig]
Exposes custom parameters for pretraining and binds the class to GeneformerConfig.
Attributes:
Name | Type | Description |
---|---|---|
initial_ckpt_path | str | Path to a directory containing checkpoint files for initializing the model. This is only |
initial_ckpt_skip_keys_with_these_prefixes | List[str] | Skip any layer that contains this key during restoration. Useful for fine-tuning: set this to the names of the task heads so that checkpoint restoration does not erroneously try to restore them. |
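To make the prefix-skipping behaviour concrete, here is a minimal sketch of filtering checkpoint keys by prefix. It assumes the checkpoint can be viewed as a flat mapping from parameter names to values; the function name `filter_checkpoint_keys` and the key names are hypothetical, and the actual restoration logic in BioNeMo may differ.

```python
from typing import Dict, List


def filter_checkpoint_keys(state: Dict[str, object], skip_prefixes: List[str]) -> Dict[str, object]:
    """Return only the entries whose key does not start with any skipped prefix."""
    prefixes = tuple(skip_prefixes)  # str.startswith accepts a tuple of candidates
    return {key: value for key, value in state.items() if not key.startswith(prefixes)}


# Hypothetical checkpoint: one encoder weight plus a task head that should not be restored.
checkpoint = {
    "encoder.layer0.weight": 1.0,
    "regression_head.weight": 2.0,
    "regression_head.bias": 3.0,
}
restorable = filter_checkpoint_keys(checkpoint, ["regression_head"])
```

With the task-head prefix skipped, only the encoder weight remains to be restored, so a freshly initialized head is left untouched.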
Source code in bionemo/geneformer/run/config_models.py
GeneformerDataArtifacts
dataclass
Data artifacts produced by the Geneformer preprocessing step.
Source code in bionemo/geneformer/run/config_models.py
GeneformerPretrainingDataConfig
Bases: DataConfig[SingleCellDataModule]
Configuration class for Geneformer pretraining data.
Expects the train/test/val data to be already split by directory and processed by sub-packages/bionemo-geneformer/src/bionemo/geneformer/data/singlecell/sc_memmap.py.
Attributes:
Name | Type | Description |
---|---|---|
data_dir | str | Directory where the data is stored. |
result_dir | str \| Path | Directory where the results will be stored. Defaults to "./results". |
micro_batch_size | int | Size of the micro-batch. Defaults to 8. |
seq_length | int | Sequence length for the data. Defaults to 2048. |
num_dataset_workers | int | Number of workers for data loading. Defaults to 0. |
Properties:
Name | Type | Description |
---|---|---|
train_data_path | str | Path to the training data. |
val_data_path | str | Path to the validation data. |
test_data_path | str | Path to the test data. |
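The attributes and properties above can be pictured with a small stand-in dataclass. This is only a sketch: `ToyPretrainingDataConfig` is a hypothetical name, the defaults are copied from the attribute descriptions, and the `train`/`val`/`test` subdirectory names are an assumption based on the statement that the data is split by directory.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class ToyPretrainingDataConfig:
    """Hypothetical stand-in mirroring the documented attributes and defaults."""

    data_dir: str
    result_dir: Union[str, Path] = "./results"
    micro_batch_size: int = 8
    seq_length: int = 2048
    num_dataset_workers: int = 0

    # Assumption: each split lives in a subdirectory of data_dir named after the split.
    @property
    def train_data_path(self) -> str:
        return f"{self.data_dir}/train"

    @property
    def val_data_path(self) -> str:
        return f"{self.data_dir}/val"

    @property
    def test_data_path(self) -> str:
        return f"{self.data_dir}/test"


toy_cfg = ToyPretrainingDataConfig(data_dir="/data/cellxgene")
```

Deriving the split paths from `data_dir` keeps the configuration to a single required field, with everything else falling back to the documented defaults.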
Methods:
Name | Description |
---|---|
geneformer_preprocess | Preprocesses the data using a legacy preprocessor from BioNeMo 1 and returns the necessary artifacts. |
construct_data_module | Takes global_batch_size (int) and constructs and returns a SingleCellDataModule using the preprocessed data artifacts. |
Source code in bionemo/geneformer/run/config_models.py
construct_data_module(global_batch_size)
Downloads the requisite data artifacts and instantiates the DataModule.
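Since this method takes a global batch size while the config stores a micro-batch size, a short sketch of how the two usually relate may help. It assumes the common Megatron-style convention (global batch = micro-batch × data-parallel size × gradient-accumulation steps); the helper name `infer_global_batch_size` and its parameters are hypothetical, and the page does not state how BioNeMo derives the value it passes in.

```python
def infer_global_batch_size(
    micro_batch_size: int,
    num_devices: int,
    accumulate_grad_batches: int = 1,
    tensor_model_parallel_size: int = 1,
    pipeline_model_parallel_size: int = 1,
) -> int:
    """Compute a global batch size under the Megatron-style convention (an assumption here)."""
    model_parallel_size = tensor_model_parallel_size * pipeline_model_parallel_size
    if num_devices % model_parallel_size != 0:
        raise ValueError("num_devices must be divisible by the model-parallel size")
    # Devices that hold a full model replica each consume their own micro-batches.
    data_parallel_size = num_devices // model_parallel_size
    return micro_batch_size * data_parallel_size * accumulate_grad_batches


# 8 devices with tensor parallelism 2 give 4 data-parallel replicas: 8 * 4 * 2 = 64.
example_gbs = infer_global_batch_size(
    micro_batch_size=8,
    num_devices=8,
    accumulate_grad_batches=2,
    tensor_model_parallel_size=2,
)
```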
Source code in bionemo/geneformer/run/config_models.py
geneformer_preprocess()
The Geneformer datamodule expects certain artifacts to be present in the data directory.
This method uses a legacy 'preprocessor' from BioNeMo 1 to acquire the associated artifacts.
Source code in bionemo/geneformer/run/config_models.py