Config models
ExposedFineTuneSeqLenBioBertConfig
Bases: ExposedModelConfig[FineTuneSeqLenBioBertConfig]
Config for models that fine-tune a BioBERT model from a pre-trained checkpoint.
Source code in bionemo/geneformer/run/config_models.py
model_class()
Binds the class to FineTuneSeqLenBioBertConfig.
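The binding pattern described here can be sketched in a self-contained way. The classes below (`ToyFineTuneConfig`, `ToyExposedModelConfig`, `ToyExposedFineTuneConfig`) and the `build` helper are hypothetical stand-ins, not the BioNeMo implementation. They only illustrate how a generic "exposed" config might return the concrete config class it is bound to through `model_class()`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, Optional, Type, TypeVar

ModelConfigT = TypeVar("ModelConfigT")


@dataclass
class ToyFineTuneConfig:
    """Hypothetical stand-in for the concrete config (e.g. FineTuneSeqLenBioBertConfig)."""

    seq_length: int = 2048
    initial_ckpt_path: Optional[str] = None


class ToyExposedModelConfig(ABC, Generic[ModelConfigT]):
    """Hypothetical stand-in for a generic exposed config base class."""

    @abstractmethod
    def model_class(self) -> Type[ModelConfigT]:
        """Return the concrete config class this exposed config is bound to."""

    def build(self, **kwargs) -> ModelConfigT:
        # Assumed helper: instantiate whichever class the subclass bound.
        return self.model_class()(**kwargs)


class ToyExposedFineTuneConfig(ToyExposedModelConfig[ToyFineTuneConfig]):
    def model_class(self) -> Type[ToyFineTuneConfig]:
        # The only job of this method is the binding itself.
        return ToyFineTuneConfig


exposed = ToyExposedFineTuneConfig()
internal = exposed.build(seq_length=512, initial_ckpt_path="/path/to/ckpt")
print(type(internal).__name__, internal.seq_length)
```

Keeping the binding in a single method means generic code can construct the right config type without knowing which subclass it was handed.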
ExposedGeneformerPretrainConfig
Bases: ExposedModelConfig[GeneformerConfig]
Exposes custom parameters for pretraining and binds the class to GeneformerConfig.
Attributes:
Name | Type | Description |
---|---|---|
initial_ckpt_path | str | Path to a directory containing checkpoint files for initializing the model. This is only |
initial_ckpt_skip_keys_with_these_prefixes | List[str] | Skip any layer that contains this key during restoration. Useful for fine-tuning: set the names of the task heads so that checkpoint restoration does not erroneously try to restore them. |
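As a rough illustration of what skipping keys during restoration means, the sketch below filters a plain dictionary standing in for a checkpoint state dict. The function name `filter_checkpoint_keys`, the key names, and the use of strict prefix matching (suggested by the attribute name) are assumptions made for this example. The actual restoration logic in BioNeMo may differ.

```python
from typing import Dict, Iterable


def filter_checkpoint_keys(
    state_dict: Dict[str, object], skip_prefixes: Iterable[str]
) -> Dict[str, object]:
    """Return a copy of state_dict without keys that start with any skip prefix."""
    prefixes = tuple(skip_prefixes)
    # str.startswith accepts a tuple and returns False for an empty tuple,
    # so an empty prefix list keeps every key.
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Hypothetical checkpoint contents: an encoder plus a task head.
checkpoint = {
    "encoder.layer0.weight": [0.1, 0.2],
    "encoder.layer0.bias": [0.0],
    "regression_head.weight": [0.5],
}

# Skip the task head so it keeps its fresh initialization during fine-tuning.
restored = filter_checkpoint_keys(checkpoint, ["regression_head"])
print(sorted(restored))
```

Only the encoder weights remain in `restored`, so a newly added task head would not be overwritten or cause a missing-key error when the pre-trained checkpoint lacks it.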
GeneformerDataArtifacts
dataclass
Data artifacts produced by the geneformer preprocess.
GeneformerPretrainingDataConfig
Bases: DataConfig[SingleCellDataModule]
Configuration class for Geneformer pretraining data.
Expects the train/test/val splits to already be separated by directory and processed by sub-packages/bionemo-scdl/src/bionemo/scdl/scripts/convert_h5ad_to_scdl.py.
Attributes:
Name | Type | Description |
---|---|---|
data_dir | str | Directory where the data is stored. |
result_dir | str \| Path | Directory where the results will be stored. Defaults to "./results". |
micro_batch_size | int | Size of the micro-batch. Defaults to 8. |
seq_length | int | Sequence length for the data. Defaults to 2048. |
num_dataset_workers | int | Number of workers for data loading. Defaults to 0. |
Properties:
Name | Type | Description |
---|---|---|
train_data_path | str | Path to the training data. |
val_data_path | str | Path to the validation data. |
test_data_path | str | Path to the test data. |
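The attributes and path properties of this data configuration can be pictured with a small dataclass. This is a minimal sketch, not the real `GeneformerPretrainingDataConfig`: the class name `ToyPretrainingDataConfig` is hypothetical, and the subdirectory names `train`, `val`, and `test` are an assumption based on the statement that the splits are separated by directory. The defaults are the ones listed in the attribute table.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class ToyPretrainingDataConfig:
    """Hypothetical stand-in mirroring the documented attributes and defaults."""

    data_dir: str
    result_dir: Union[str, Path] = "./results"
    micro_batch_size: int = 8
    seq_length: int = 2048
    num_dataset_workers: int = 0

    # ASSUMPTION: each split lives in a subdirectory named after the split.
    @property
    def train_data_path(self) -> str:
        return str(Path(self.data_dir) / "train")

    @property
    def val_data_path(self) -> str:
        return str(Path(self.data_dir) / "val")

    @property
    def test_data_path(self) -> str:
        return str(Path(self.data_dir) / "test")


cfg = ToyPretrainingDataConfig(data_dir="/data/cellxgene", micro_batch_size=16)
print(cfg.train_data_path, cfg.seq_length)
```

Deriving the split paths from `data_dir` keeps the configuration to a single required field while still exposing each split explicitly.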
Methods:
Name | Description |
---|---|
geneformer_preprocess | Preprocesses the data using a legacy preprocessor from BioNeMo 1 and returns the necessary artifacts. |
construct_data_module | Takes global_batch_size (int); constructs and returns a SingleCellDataModule using the preprocessed data artifacts. |
construct_data_module(global_batch_size)
Downloads the requisite data artifacts and instantiates the DataModule.
geneformer_preprocess()
The Geneformer datamodule expects certain artifacts to be present in the data directory.
This method uses a legacy preprocessor from BioNeMo 1 to acquire those artifacts.