Testing callbacks
AbstractStopAndGoCallback
Bases: ABC, BaseInterruptedVsContinuousCallback
Abstract base class for stop-and-go callbacks that compare metadata before pausing and after resuming training.
This base class provides utility methods that streamline stop-and-go comparisons.
Provided methods
- __init__: initializes the callback with the given mode.
- get_metadata: abstract method that should be overridden to get metadata from the trainer and pl_module.
Default behaviors
- In stop mode (Mode.STOP), metadata is collected and compared in on_validation_epoch_end.
- In go mode (Mode.RESUME), metadata is collected and saved in on_train_epoch_start.
Override these behaviors if necessary.
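The subclassing pattern described above can be sketched without the library. The following is a minimal, self-contained stand-in, not the code in bionemo/testing/testing_callbacks.py: `SketchStopAndGoCallback`, `GlobalStepSketch`, and the fake trainer objects are hypothetical, the local `Mode` enum only assumes the two members named on this page, and the hooks merely record metadata in `self.data` (any comparison is assumed to happen in the surrounding test).

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from types import SimpleNamespace


class Mode(Enum):
    """Local stand-in for the Mode enum referenced above (assumed members)."""

    STOP = auto()
    RESUME = auto()


class SketchStopAndGoCallback(ABC):
    """Hypothetical stand-in for AbstractStopAndGoCallback."""

    def __init__(self, mode=Mode.STOP):
        if mode not in (Mode.STOP, Mode.RESUME):
            raise ValueError("mode must be Mode.STOP or Mode.RESUME")
        self.mode = mode
        self.data = None

    @abstractmethod
    def get_metadata(self, trainer, pl_module):
        """Subclasses return whatever should be checked across the pause."""

    def on_validation_epoch_end(self, trainer, pl_module):
        # Default hook that is active in stop mode.
        if self.mode is Mode.STOP:
            self.data = self.get_metadata(trainer, pl_module)

    def on_train_epoch_start(self, trainer, pl_module):
        # Default hook that is active in go (resume) mode.
        if self.mode is Mode.RESUME:
            self.data = self.get_metadata(trainer, pl_module)


class GlobalStepSketch(SketchStopAndGoCallback):
    """Example subclass: only get_metadata needs to be overridden."""

    def get_metadata(self, trainer, pl_module):
        return trainer.global_step


# Fake trainer objects stand in for a real trainer before and after a restart.
before = GlobalStepSketch(mode=Mode.STOP)
before.on_validation_epoch_end(SimpleNamespace(global_step=4), pl_module=None)

after = GlobalStepSketch(mode=Mode.RESUME)
after.on_train_epoch_start(SimpleNamespace(global_step=4), pl_module=None)

states_match = before.data == after.data
```

In this sketch each hook is a no-op in the other mode, so the same callback class can be attached to both the interrupted and the resumed run.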
Source code in bionemo/testing/testing_callbacks.py
__init__(mode=Mode.STOP)
Initialize StopAndGoCallback.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | str | Mode to run in. Must be either Mode.STOP or Mode.RESUME. Defaults to Mode.STOP. | Mode.STOP |
Notes
Users must override get_metadata to get metadata from the trainer and pl_module.
Source code in bionemo/testing/testing_callbacks.py
get_metadata(trainer, pl_module)
abstractmethod
Get metadata from trainer and pl_module.
Source code in bionemo/testing/testing_callbacks.py
BaseInterruptedVsContinuousCallback
Bases: Callback, CallbackMethods, IOMixin
Base class for serializable stop-and-go callback to compare continuous to interrupted training.
To use this class, extend it in a callback and collect data into the self.data attribute. This data is then compared between continuous and interrupted training.
See nemo.lightning.megatron_parallel.CallbackMethods for the available callback methods.
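As an illustration of the collect-then-compare pattern, the sketch below uses plain Python stand-ins rather than the real base class. `DataCollector`, `StepRecorder`, `run_training`, and the `on_step` hook are hypothetical; a real subclass would instead implement one of the `CallbackMethods` hooks, such as the `on_megatron_step_start` methods listed on this page.

```python
class DataCollector:
    """Hypothetical stand-in for BaseInterruptedVsContinuousCallback."""

    def __init__(self):
        self.data = []


class StepRecorder(DataCollector):
    """Records one entry per training step."""

    def on_step(self, step_index, learning_rate):
        # Append one record per step so two runs can be compared later.
        self.data.append((step_index, learning_rate))


def run_training(callback, start, stop):
    """Toy training loop with a deterministic learning-rate schedule."""
    for step_index in range(start, stop):
        callback.on_step(step_index, 0.1 * 0.5**step_index)


# Continuous run: steps 0 through 5 without interruption.
continuous = StepRecorder()
run_training(continuous, 0, 6)

# Interrupted run: stop after step 2, then resume at step 3.
interrupted = StepRecorder()
run_training(interrupted, 0, 3)
run_training(interrupted, 3, 6)

# The comparison a stop-and-go test would make between the two runs.
runs_match = interrupted.data == continuous.data
```

Keeping the collected values in a single `data` attribute means one comparison routine can serve every subclass, whatever quantity it records.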
Source code in bionemo/testing/testing_callbacks.py
__deepcopy__(memo)
Don't actually attempt to copy this data when this callback is being serialized.
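One way to keep collected data out of a deep copy is sketched below. This reflects an assumption about the intent of the docstring, not the library's implementation: the hypothetical `Collector` class returns a fresh, empty instance from `__deepcopy__` instead of duplicating `self.data`.

```python
import copy


class Collector:
    """Hypothetical callback-like object holding potentially large data."""

    def __init__(self):
        self.data = []

    def __deepcopy__(self, memo):
        # Build a new instance without copying the collected data.
        clone = type(self)()
        memo[id(self)] = clone
        return clone


original = Collector()
original.data.extend(range(1000))

# deepcopy dispatches to Collector.__deepcopy__, so the data is not copied.
clone = copy.deepcopy(original)
```

The original object keeps its data, while the copy starts empty, which avoids duplicating large collections when a framework copies its callbacks.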
Source code in bionemo/testing/testing_callbacks.py
__init__()
Initializes the callback.
Source code in bionemo/testing/testing_callbacks.py
ConsumedSamplesCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check consumed samples before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get consumed samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
GlobalStepStateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback for global_step before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get global_step as metadata.
Source code in bionemo/testing/testing_callbacks.py
LearningRateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback for learning rate before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get learning rate as metadata.
Source code in bionemo/testing/testing_callbacks.py
OptimizerStateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check optimizer states before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get optimizer states as metadata.
Source code in bionemo/testing/testing_callbacks.py
StopAfterValidEpochEndCallback
Bases: Callback
A callback that raises a StopAndGoException after the validation epoch.
Use this callback for pytest-based stop-and-go tests.
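A test can treat the raised exception as the expected way to end the first phase of a run. The sketch below shows that control flow with stand-ins only: `StopAndGoException` is redefined locally, and `StopAfterValidation`, `fake_fit`, and the simplified hook signature are hypothetical.

```python
class StopAndGoException(Exception):
    """Local stand-in for the exception named above."""


class StopAfterValidation:
    """Hypothetical callback that interrupts training after validation."""

    def on_validation_epoch_end(self, trainer=None, pl_module=None):
        raise StopAndGoException("validation epoch finished; stopping run")


def fake_fit(callbacks, max_epochs=3):
    """Toy loop that calls the validation hook once per epoch."""
    completed = []
    try:
        for epoch in range(max_epochs):
            completed.append(epoch)  # pretend to train and validate
            for callback in callbacks:
                callback.on_validation_epoch_end()
    except StopAndGoException:
        # The interruption is expected: the "stop" phase ends here.
        return completed, True
    return completed, False


epochs_run, was_interrupted = fake_fit([StopAfterValidation()])
```

Under these assumptions the run ends after the first validation epoch, and the test can go on to resume from a checkpoint for the "go" phase.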
Source code in bionemo/testing/testing_callbacks.py
TrainInputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training input samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get training input samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
TrainLossCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training loss samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_end(step, microbatch_outputs, reduced=None)
Get training loss as metadata.
Source code in bionemo/testing/testing_callbacks.py
TrainOutputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training output samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get training output samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
TrainValInitConsumedSamplesStopAndGoCallback
Bases: AbstractStopAndGoCallback
Stop-and-go callback to check consumed samples before pausing and after resuming training.
This is currently the only callback that doesn't fit with the new pattern of directly comparing continuous and interrupted training, since the dataloaders don't track their consumed_samples before and after checkpoint resumption.
Source code in bionemo/testing/testing_callbacks.py
get_metadata(trainer, pl_module)
Get consumed samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
ValidInputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation input samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get validation input samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
ValidLossCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation loss samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_end(step, microbatch_outputs, reduced=None)
Get validation loss as metadata.
Source code in bionemo/testing/testing_callbacks.py
ValidOutputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation output samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get validation output samples as metadata.
Source code in bionemo/testing/testing_callbacks.py