Config models
DataConfig
Bases: BaseModel, Generic[DataModuleT], ABC
Base class for all data configurations.
This class defines the interface for all data configurations. Each subclass specifies the data module that will be used in the training loop.
Source code in bionemo/llm/run/config_models.py
construct_data_module(global_batch_size)
abstractmethod
Construct the data module from the configuration. Cannot be defined generically.
Source code in bionemo/llm/run/config_models.py
custom_model_validator(global_cfg)
Override this method with a custom implementation to validate or adjust fields of the global config (global_cfg).
The following expression will always be true:
global_cfg.data_config == self
Source code in bionemo/llm/run/config_models.py
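The hook described above can be sketched without BioNeMo or Pydantic installed. The classes below are hypothetical stand-ins built on stdlib dataclasses (SketchDataConfig, SketchMainConfig, and max_model_seq_length are illustrative names, not part of the library); they only show how a child config can check values on the global config it belongs to.

```python
from dataclasses import dataclass


@dataclass
class SketchDataConfig:
    """Hypothetical stand-in for a DataConfig subclass."""

    seq_length: int = 512

    def custom_model_validator(self, global_cfg: "SketchMainConfig") -> "SketchMainConfig":
        # The hook receives the whole config, so it can check fields it does not own.
        assert global_cfg.data_config is self
        if self.seq_length > global_cfg.max_model_seq_length:
            raise ValueError("data seq_length exceeds what the model supports")
        return global_cfg


@dataclass
class SketchMainConfig:
    """Hypothetical stand-in for MainConfig; only two fields are modelled."""

    data_config: SketchDataConfig
    max_model_seq_length: int = 1024  # illustrative field, not a documented one

    def run_data_config_model_validators(self) -> "SketchMainConfig":
        # Delegate to the child config instead of branching on its type here.
        return self.data_config.custom_model_validator(self)


cfg = SketchMainConfig(data_config=SketchDataConfig(seq_length=512))
validated = cfg.run_data_config_model_validators()
```

Keeping the check inside the child config means the top-level class needs no knowledge of any particular data module.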
ExperimentConfig
Bases: BaseModel
Configuration class for setting up and managing experiment parameters.
Attributes:
Name | Type | Description |
---|---|---|
save_every_n_steps | int | Number of steps between saving checkpoints. |
result_dir | str \| Path | Directory where results will be saved. |
experiment_name | str | Name of the experiment. |
restore_from_checkpoint_path | Optional[str] | Path to restore from a checkpoint. Note: This does not invoke the checkpoint callback as expected. |
save_last_checkpoint | bool | Flag to save the last checkpoint. Default is True. |
metric_to_monitor_for_checkpoints | str | Metric to monitor for saving top-k checkpoints. Default is "reduced_train_loss". |
save_top_k | int | Number of top checkpoints to save based on the monitored metric. Default is 2. |
create_tensorboard_logger | bool | Flag to create a TensorBoard logger. Default is False. |
Source code in bionemo/llm/run/config_models.py
ExposedModelConfig
Bases: BaseModel, Generic[ModelConfigT], ABC
BioNeMo model configuration class, wraps TransformerConfig and friends.
This class is used to define the interface for all model configurations. It is exposed to guard against ill-typed or poorly defined fields in the underlying configuration objects. ModelConfigT declares the associated type of the underlying config (most commonly a BioBertGenericConfig, but could also be a TransformerConfig or something similar).
Children should try to expose the minimal set of fields necessary for the user to configure the model while keeping the more esoteric configuration private to the underlying ModelConfigT.
Source code in bionemo/llm/run/config_models.py
custom_model_validator(global_cfg)
Override this method with a custom implementation to validate or adjust fields of the global config (global_cfg).
The following expression will always be true:
global_cfg.bionemo_model_config == self
Source code in bionemo/llm/run/config_models.py
exposed_to_internal_bionemo_model_config()
Converts the exposed dataclass to the underlying Transformer config.
The underlying ModelConfigT may be both incomplete and unserializable. We use this transformation as a way to hide fields that are either not serializable by Pydantic or that we do not want to expose.
Source code in bionemo/llm/run/config_models.py
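As a rough illustration of the exposed-to-internal conversion, the sketch below uses stdlib dataclasses as hypothetical stand-ins (InternalConfig, ExposedConfig, and init_method are invented for this example). It assumes the conversion simply forwards the exposed fields to the class returned by model_class(); the library's actual method may do more.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, Type


@dataclass
class InternalConfig:
    """Hypothetical stand-in for the underlying ModelConfigT."""

    num_layers: int
    hidden_size: int
    # A callable field like this is awkward to serialize, so it stays internal.
    init_method: Callable[[float], float] = field(default=lambda x: x)


@dataclass
class ExposedConfig:
    """Hypothetical stand-in exposing only the serializable fields."""

    num_layers: int = 6
    hidden_size: int = 384

    def model_class(self) -> Type[InternalConfig]:
        # The config class that this exposed config wraps.
        return InternalConfig

    def exposed_to_internal(self) -> InternalConfig:
        # Forward only the exposed fields; internal-only fields keep their defaults.
        exposed_fields: Dict[str, Any] = asdict(self)
        return self.model_class()(**exposed_fields)


internal = ExposedConfig(num_layers=2).exposed_to_internal()
```

The user-facing config stays small and serializable, while fields such as callables live only on the internal object.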
model_class()
Returns the underlying model class that this config wraps.
Source code in bionemo/llm/run/config_models.py
precision_validator(v)
classmethod
Validates the precision type and returns the corresponding torch dtype.
Source code in bionemo/llm/run/config_models.py
serialize_activation_func(v)
Serializes a given activation function to its corresponding string representation.
By default, all activation functions from torch.nn.functional are serialized to their name. User defined activation functions should also be defined here with a custom mapping in CUSTOM_ACTIVATION_FNS defined at the top of this file. This allows our Pydantic model to serialize and deserialize the activation function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
v | Callable[[Tensor, Any], Tensor] | The activation function to serialize. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The name of the activation function if it is a standard PyTorch function, or the corresponding serialization key if it is a custom activation function. |
Raises:
Type | Description |
---|---|
ValueError | If the activation function is not supported. |
Source code in bionemo/llm/run/config_models.py
serialize_dtypes(v)
Serializes the torch dtype to the corresponding precision type.
Source code in bionemo/llm/run/config_models.py
validate_activation_func(activation_func)
classmethod
Validates the activation function, assumes this function exists in torch.nn.functional.
For custom activation functions, use the CUSTOM_ACTIVATION_FNS dictionary in the module. This method validates the provided activation function string and returns the corresponding callable, using the validation context and the validator provided in the base class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
activation_func | str | The activation function to be validated. | required |
context | ValidationInfo | The context for validation. | required |
Returns:
Name | Type | Description |
---|---|---|
Callable | Callable | A callable function after validation. |
See Also
CUSTOM_ACTIVATION_FNS
Source code in bionemo/llm/run/config_models.py
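The name-to-callable round trip behind serialize_activation_func and validate_activation_func can be sketched as follows. To run without PyTorch, a small namespace F stands in for torch.nn.functional, and the contents and direction of the CUSTOM_ACTIVATION_FNS mapping are assumptions made for illustration; the real mapping is defined in the module and is not shown on this page.

```python
import math
from types import SimpleNamespace
from typing import Callable, Dict

# Stand-in for torch.nn.functional so the sketch runs without PyTorch.
F = SimpleNamespace(
    relu=lambda x: max(0.0, x),
    tanh=math.tanh,
)


def squared_relu(x: float) -> float:
    """Example of a user-defined activation with no counterpart in F."""
    return max(0.0, x) ** 2


# Hypothetical contents: custom callables mapped to their serialization keys.
CUSTOM_ACTIVATION_FNS: Dict[Callable, str] = {squared_relu: "squared_relu"}
REVERSE_CUSTOM_ACTIVATION_FNS = {name: fn for fn, name in CUSTOM_ACTIVATION_FNS.items()}


def serialize_activation_func(fn: Callable) -> str:
    # Custom functions serialize to their registered key.
    if fn in CUSTOM_ACTIVATION_FNS:
        return CUSTOM_ACTIVATION_FNS[fn]
    # Standard functions serialize to their attribute name in the namespace.
    for name, candidate in vars(F).items():
        if candidate is fn:
            return name
    raise ValueError(f"Unsupported activation function: {fn!r}")


def validate_activation_func(name: str) -> Callable:
    # Custom keys take priority, then fall back to the standard namespace.
    if name in REVERSE_CUSTOM_ACTIVATION_FNS:
        return REVERSE_CUSTOM_ACTIVATION_FNS[name]
    fn = getattr(F, name, None)
    if fn is None:
        raise ValueError(f"Unknown activation function: {name}")
    return fn


round_tripped = validate_activation_func(serialize_activation_func(squared_relu))
```

Storing only the string keeps the serialized config free of Python callables, and the validator restores the callable when the config is loaded.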
MainConfig
Bases: BaseModel, Generic[ExModelConfigT, DataConfigT]
Main configuration class for BioNeMo. All serialized configs that are a valid MainConfig should be Runnable.
This class defines the main configuration for BioNeMo. It contains the minimal pieces of configuration needed to execute a training job with the NeMo2 training API. It accepts two generic type parameters, which users must define in their own environment for execution.
Additionally, this class assumes that the configs for ExposedModelConfig and DataConfig may have custom validators implemented that operate on the entire MainConfig. This avoids the need for type-based conditionals inside this class while still allowing custom global validation logic to be implemented in the underlying classes. For example, some models may want to restrict their DataModule's seq_length to a certain value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_config | | Generic config type that contains instructions on instantiating the required DataModule. | required |
parallel_config | | The parallel configuration for the model. | required |
training_config | | The training configuration for the model. | required |
bionemo_model_config | | Generic ExposedModelConfig type. This class hides extra configuration parameters in the underlying model configuration as well as providing | required |
optim_config | | The optimizer/scheduler configuration for the model. | required |
experiment_config | | The experiment configuration for the model. | required |
wandb_config | | Optional, the wandb configuration for the model. | required |
Source code in bionemo/llm/run/config_models.py
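For orientation, a serialized MainConfig might be laid out roughly as below. The top-level keys and nested field names come from the parameter and attribute tables on this page, and documented defaults are used where the page states them. Values marked "illustrative" are made up, and the contents of data_config and bionemo_model_config depend on the concrete subclasses, so they are left empty here.

```python
import json

main_config = {
    "data_config": {},           # fields depend on the chosen DataConfig subclass
    "bionemo_model_config": {},  # fields depend on the chosen ExposedModelConfig subclass
    "parallel_config": {
        "tensor_model_parallel_size": 1,
        "pipeline_model_parallel_size": 1,
        "accumulate_grad_batches": 1,
        "ddp": "megatron",
        "remove_unused_parameters": True,
        "num_devices": 1,
        "num_nodes": 1,
    },
    "training_config": {
        "max_steps": 1000,           # illustrative
        "limit_val_batches": 1.0,    # illustrative
        "val_check_interval": 100,   # illustrative
        "precision": "bf16-mixed",
        "accelerator": "gpu",
        "gc_interval": 0,
        "include_perplexity": False,
    },
    "optim_config": {
        "lr": 1e-4,
        "optimizer": "adam",
        "interval": "step",
        "monitor": "val_loss",
        "warmup_steps": 0,
        "lr_scheduler": "warmup_anneal",
    },
    "experiment_config": {
        "save_every_n_steps": 100,    # illustrative
        "result_dir": "./results",    # illustrative
        "experiment_name": "demo",    # illustrative
        "restore_from_checkpoint_path": None,
        "save_last_checkpoint": True,
        "metric_to_monitor_for_checkpoints": "reduced_train_loss",
        "save_top_k": 2,
        "create_tensorboard_logger": False,
    },
    "wandb_config": None,
}

# Every value is a plain JSON type, so the whole config can be written to disk.
serialized = json.dumps(main_config, indent=2)
```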
run_bionemo_model_config_model_validators()
Runs the model validators on the bionemo_model_config.
Source code in bionemo/llm/run/config_models.py
run_data_config_model_validators()
Runs the model validators on the data_config.
Source code in bionemo/llm/run/config_models.py
validate_master_config()
Validates the master configuration object.
Source code in bionemo/llm/run/config_models.py
OptimizerSchedulerConfig
Bases: BaseModel
Configuration for the optimizer and learning rate scheduler.
Attributes:
Name | Type | Description |
---|---|---|
lr | float | Learning rate for the optimizer. Default is 1e-4. |
optimizer | str | Type of optimizer to use. Default is "adam". |
interval | str | Interval for updating the learning rate scheduler. Default is "step". |
monitor | str | Metric to monitor for learning rate adjustments. Default is "val_loss". |
warmup_steps | int | Number of warmup steps for use with the warmup annealing learning rate scheduler. Default is 0. |
lr_scheduler | Literal['warmup_anneal', 'cosine'] | Type of learning rate scheduler to use. Default is 'warmup_anneal'. NOTE this is likely to change. |
Source code in bionemo/llm/run/config_models.py
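To make the interaction between lr and warmup_steps concrete, here is one common shape for a warmup-then-anneal schedule: a linear ramp up to the base learning rate, followed by a linear decay to zero at max_steps. This is an assumption for illustration only; the scheduler that 'warmup_anneal' selects may use a different curve. The function name warmup_anneal_lr is hypothetical.

```python
def warmup_anneal_lr(step: int, base_lr: float, warmup_steps: int, max_steps: int) -> float:
    """Return the learning rate at a given step (illustrative schedule, not the library's)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr across the warmup phase.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at max_steps.
    decay_steps = max(1, max_steps - warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * min(1.0, remaining / decay_steps)


schedule = [warmup_anneal_lr(s, base_lr=1e-4, warmup_steps=10, max_steps=110) for s in range(111)]
```

With warmup_steps left at its default of 0, this sketch starts at the full learning rate and only decays.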
ParallelConfig
Bases: BaseModel
ParallelConfig is a configuration class for setting up parallelism in model training.
Attributes:
Name | Type | Description |
---|---|---|
tensor_model_parallel_size | int | The size of the tensor model parallelism. Default is 1. |
pipeline_model_parallel_size | int | The size of the pipeline model parallelism. Default is 1. |
accumulate_grad_batches | int | The number of batches to accumulate gradients over. Default is 1. |
ddp | Literal['megatron'] | The distributed data parallel method to use. Default is "megatron". |
remove_unused_parameters | bool | Whether to remove unused parameters. Default is True. |
num_devices | int | The number of devices to use. Default is 1. |
num_nodes | int | The number of nodes to use. Default is 1. |
Methods:
Name | Description |
---|---|
validate_devices | Validates the number of devices based on the tensor and pipeline model parallel sizes. |
Source code in bionemo/llm/run/config_models.py
validate_devices()
Validates the number of devices based on the tensor and pipeline model parallel sizes.
Source code in bionemo/llm/run/config_models.py
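A device check of this kind could look like the sketch below. It assumes the rule is that one model replica needs tensor_model_parallel_size * pipeline_model_parallel_size devices, so num_devices must be at least that product; the exact condition in the library is not shown on this page. ParallelSketch is a hypothetical stdlib stand-in for the Pydantic model.

```python
from dataclasses import dataclass


@dataclass
class ParallelSketch:
    """Hypothetical stand-in mirroring three ParallelConfig fields."""

    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1
    num_devices: int = 1

    def __post_init__(self) -> None:
        # Run the check at construction time, as a model validator would.
        self.validate_devices()

    def validate_devices(self) -> "ParallelSketch":
        # Assumed rule: a single model replica spans tensor * pipeline devices.
        required = self.tensor_model_parallel_size * self.pipeline_model_parallel_size
        if self.num_devices < required:
            raise ValueError(
                f"num_devices={self.num_devices} is smaller than the "
                f"{required} devices needed for one model replica"
            )
        return self


ok = ParallelSketch(tensor_model_parallel_size=2, pipeline_model_parallel_size=2, num_devices=4)
```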
TrainingConfig
Bases: BaseModel
TrainingConfig is a configuration class for training models.
Attributes:
Name | Type | Description |
---|---|---|
max_steps | int | The maximum number of training steps. |
limit_val_batches | int \| float | The number of validation batches to use. Can be a fraction or a count. |
val_check_interval | int | The interval (in steps) at which to check validation. |
precision | Literal['32', 'bf16-mixed', '16-mixed'] | The precision to use for training. Defaults to "bf16-mixed". |
accelerator | str | The type of accelerator to use for training. Defaults to "gpu". |
gc_interval | int | The interval of global steps at which to run synchronized garbage collection. Useful for synchronizing garbage collection when performing distributed training. Defaults to 0. |
include_perplexity | bool | Whether to include perplexity in the validation logs. Defaults to False. |
Source code in bionemo/llm/run/config_models.py
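The "fraction or count" wording for limit_val_batches follows the convention used by PyTorch Lightning trainers: a float is read as a fraction of the validation set and an int as an absolute number of batches. The helper below is a hypothetical sketch of that interpretation, not a function from BioNeMo.

```python
from typing import Union


def resolve_val_batches(limit_val_batches: Union[int, float], total_batches: int) -> int:
    """Translate limit_val_batches into a concrete number of validation batches."""
    if isinstance(limit_val_batches, float):
        # A float is treated as a fraction of the available batches.
        if not 0.0 <= limit_val_batches <= 1.0:
            raise ValueError("a float limit must lie in [0.0, 1.0]")
        return int(total_batches * limit_val_batches)
    # An int is an absolute count, capped at what is available.
    return min(limit_val_batches, total_batches)


quarter = resolve_val_batches(0.25, total_batches=200)
ten = resolve_val_batches(10, total_batches=200)
```

Note that 1 and 1.0 mean different things under this convention: one batch versus the full validation set.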