Datamodule
MegatronDataModule
Bases: LightningDataModule
A mixin that adds state_dict and load_state_dict methods for datamodule training resumption in NeMo.
Source code in bionemo/llm/data/datamodule.py
__init__(*args, **kwargs)
Set init_global_step to 0 for datamodule resumption.
Source code in bionemo/llm/data/datamodule.py
load_state_dict(state_dict)
Called when loading a checkpoint. Implement this to reload the datamodule state from the given datamodule state_dict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state_dict | Dict[str, Any] | The datamodule state returned by state_dict(). | required |
Source code in bionemo/llm/data/datamodule.py
state_dict()
Called when saving a checkpoint. Implement this to generate and save the datamodule state.
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing datamodule state. |
Source code in bionemo/llm/data/datamodule.py
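The save and load hooks above form a round trip: whatever state_dict() returns at checkpoint time is handed back to load_state_dict() on resumption. The following is a minimal sketch of that contract using a stand-in class, not the bionemo implementation. The class name ToyResumableDataModule, the step() helper, and the "consumed_samples" key are all assumptions made for illustration.

```python
from typing import Any, Dict


class ToyResumableDataModule:
    """Hypothetical stand-in for a datamodule; not the bionemo class."""

    def __init__(self, micro_batch_size: int = 16) -> None:
        self.micro_batch_size = micro_batch_size
        # Assumed progress counter; the real state keys may differ.
        self.consumed_samples = 0

    def step(self) -> None:
        # Pretend that one micro-batch of samples was consumed.
        self.consumed_samples += self.micro_batch_size

    def state_dict(self) -> Dict[str, Any]:
        # Called when saving: return everything needed to resume.
        return {"consumed_samples": self.consumed_samples}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Called when loading: restore what state_dict() produced.
        self.consumed_samples = state_dict["consumed_samples"]


# Simulate three steps of training, then a checkpoint save.
original = ToyResumableDataModule()
for _ in range(3):
    original.step()
checkpoint = {"datamodule": original.state_dict()}

# Simulate a fresh process that resumes from the checkpoint.
resumed = ToyResumableDataModule()
resumed.load_state_dict(checkpoint["datamodule"])
```

After the load, the resumed object reports the same state as the original, which is the property a resumable datamodule needs to preserve.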
update_init_global_step()
Always call this method whenever you create a new dataloader. If you forget, training resumption will not work.
Source code in bionemo/llm/data/datamodule.py
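To illustrate why the call matters, here is a minimal sketch with stand-in classes. It assumes that update_init_global_step() records the trainer's current global step, so that progress can later be measured relative to the moment the dataloader was created. ToyTrainer, ToyDataModule, and steps_since_dataloader_created() are hypothetical names, and the real method body may differ.

```python
class ToyTrainer:
    """Hypothetical trainer that only tracks a global step."""

    def __init__(self, global_step: int = 0) -> None:
        self.global_step = global_step


class ToyDataModule:
    """Hypothetical datamodule; not the bionemo class."""

    def __init__(self) -> None:
        self.trainer = None
        self.init_global_step = 0  # matches the documented initial value

    def update_init_global_step(self) -> None:
        # Assumption: remember the step at which the dataloader was built.
        if self.trainer is not None:
            self.init_global_step = self.trainer.global_step

    def train_dataloader(self):
        # Call the update every time a new dataloader is handed out.
        self.update_init_global_step()
        return iter(range(10))  # placeholder for a real DataLoader

    def steps_since_dataloader_created(self) -> int:
        return self.trainer.global_step - self.init_global_step


# A run resumed at global step 100 builds a fresh dataloader.
dm = ToyDataModule()
dm.trainer = ToyTrainer(global_step=100)
loader = dm.train_dataloader()

# Five more training steps happen after resumption.
dm.trainer.global_step += 5
elapsed = dm.steps_since_dataloader_created()
```

Without the call inside train_dataloader(), init_global_step would stay at 0 and the elapsed count would be 105 instead of 5, so any sample accounting based on it would be wrong after resumption.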
MockDataModule
Bases: MegatronDataModule
A simple data module that just wraps input datasets with dataloaders.
Source code in bionemo/llm/data/datamodule.py
__init__(train_dataset=None, valid_dataset=None, test_dataset=None, predict_dataset=None, pad_token_id=0, min_seq_length=None, max_seq_length=512, micro_batch_size=16, global_batch_size=16, num_workers=4)
Initialize the MockDataModule.
Source code in bionemo/llm/data/datamodule.py
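A minimal construction sketch follows, using only the keyword arguments and default values listed in the signature above. The import path is inferred from the source file location and is an assumption, as are the helper name build_mock_datamodule and the MOCK_DATAMODULE_KWARGS dictionary. The datasets are left to the caller because the expected item format is not documented here.

```python
from typing import Any, Dict, Optional

# Values copied from the documented defaults; adjust for your run.
MOCK_DATAMODULE_KWARGS: Dict[str, Any] = {
    "pad_token_id": 0,
    "min_seq_length": None,
    "max_seq_length": 512,
    "micro_batch_size": 16,
    "global_batch_size": 16,
    "num_workers": 4,
}


def build_mock_datamodule(
    train_dataset: Any,
    valid_dataset: Optional[Any] = None,
    **overrides: Any,
):
    """Hypothetical helper that wraps datasets in a MockDataModule."""
    # Imported lazily so the sketch can be read without bionemo installed.
    # The module path is assumed from bionemo/llm/data/datamodule.py.
    from bionemo.llm.data.datamodule import MockDataModule

    kwargs = {**MOCK_DATAMODULE_KWARGS, **overrides}
    return MockDataModule(
        train_dataset=train_dataset,
        valid_dataset=valid_dataset,
        **kwargs,
    )
```

Calling build_mock_datamodule(my_train_dataset, micro_batch_size=8) would then override a single default while keeping the rest, where my_train_dataset is whatever dataset object you already have.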