Testing callbacks
AbstractStopAndGoCallback
Bases: ABC, BaseInterruptedVsContinuousCallback
Abstract base class for stop-and-go callback to compare metadata before pausing and after resuming training.
This base class provides utility methods to help streamline stop and go comparison.
Provided methods
- __init__: initializes the callback with the given mode.
- get_metadata: abstract method that should be overridden to get metadata from the trainer and pl_module.
Default behaviors
- In stop mode, metadata is collected and compared in on_validation_epoch_end.
- In go mode, metadata is collected and saved in on_train_epoch_start.
Override these behaviors if necessary.
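The default behaviors above can be sketched with a small, dependency-free stand-in. Everything in this block is hypothetical: `Mode` mirrors only the two members named on this page, `SketchStopAndGoCallback` and `GlobalStepSketch` are illustrative names, and the hook bodies are an assumption about when `get_metadata` is called in each mode, not the code in bionemo/testing/testing_callbacks.py.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from types import SimpleNamespace


class Mode(Enum):
    """Hypothetical stand-in for the Mode enum referenced on this page."""

    STOP = auto()
    RESUME = auto()


class SketchStopAndGoCallback(ABC):
    """Illustrative stand-in, not the bionemo class."""

    def __init__(self, mode=Mode.STOP):
        if mode not in (Mode.STOP, Mode.RESUME):
            raise ValueError(f"mode must be Mode.STOP or Mode.RESUME, got {mode!r}")
        self.mode = mode
        self.data = None

    @abstractmethod
    def get_metadata(self, trainer, pl_module):
        """Return the value to record for this run."""

    def on_validation_epoch_end(self, trainer, pl_module):
        # Stop mode: record metadata at the end of validation.
        if self.mode is Mode.STOP:
            self.data = self.get_metadata(trainer, pl_module)

    def on_train_epoch_start(self, trainer, pl_module):
        # Go (resume) mode: record metadata when training starts again.
        if self.mode is Mode.RESUME:
            self.data = self.get_metadata(trainer, pl_module)


class GlobalStepSketch(SketchStopAndGoCallback):
    """Concrete subclass: only get_metadata has to be overridden."""

    def get_metadata(self, trainer, pl_module):
        return trainer.global_step


# A SimpleNamespace stands in for a real trainer object.
trainer = SimpleNamespace(global_step=10)

before = GlobalStepSketch(Mode.STOP)
before.on_validation_epoch_end(trainer, pl_module=None)

after = GlobalStepSketch(Mode.RESUME)
after.on_train_epoch_start(trainer, pl_module=None)

# The value recorded before the pause should match the value after resuming.
assert before.data == after.data
```

In this sketch each hook is a no-op in the other mode, so one subclass can be attached to both the interrupted and the resumed run.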
Source code in bionemo/testing/testing_callbacks.py
__init__(mode=Mode.STOP)
Initialize StopAndGoCallback.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | str | Mode to run in. Must be either Mode.STOP or Mode.RESUME. Defaults to Mode.STOP. | STOP |
Notes
User must override get_metadata to get metadata from the trainer and pl_module.
Source code in bionemo/testing/testing_callbacks.py
get_metadata(trainer, pl_module)
abstractmethod
Get metadata from trainer and pl_module.
Source code in bionemo/testing/testing_callbacks.py
BaseInterruptedVsContinuousCallback
Bases: Callback, CallbackMethods, IOMixin
Base class for serializable stop-and-go callback to compare continuous to interrupted training.
This class is used by extending a callback and collecting data into the self.data attribute. This data is then compared between continuous and interrupted training.
See nemo.lightning.megatron_parallel.CallbackMethods for the available callback methods.
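The collect-then-compare idea can be illustrated without NeMo or Lightning. The sketch below is a hypothetical stand-in: `DataCollectingSketch`, `on_step`, and `run` are invented for illustration, and how the real test utilities perform the comparison is not described on this page.

```python
import math


class DataCollectingSketch:
    """Hypothetical stand-in for a callback that appends to self.data."""

    def __init__(self):
        self.data = []

    def on_step(self, value):
        # A real callback would extract the value from the training step.
        self.data.append(value)


def run(callback, values):
    """Feed a sequence of per-step values through the callback."""
    for value in values:
        callback.on_step(value)
    return callback


losses = [2.0, 1.5, 1.25, 1.0]

# Continuous run: one callback sees every step.
continuous = run(DataCollectingSketch(), losses)

# Interrupted run: stop after two steps, then resume with a fresh callback.
first_half = run(DataCollectingSketch(), losses[:2])
second_half = run(DataCollectingSketch(), losses[2:])
interrupted = first_half.data + second_half.data

# Both runs should have recorded the same values in the same order.
assert len(continuous.data) == len(interrupted)
assert all(math.isclose(a, b) for a, b in zip(continuous.data, interrupted))
```

A tolerance-based comparison such as `math.isclose` is one reasonable choice for floating-point values like losses; exact equality may be more appropriate for integers such as step counts.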
Source code in bionemo/testing/testing_callbacks.py
__deepcopy__(memo)
Don't actually attempt to copy this data when this callback is being serialized.
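One way to keep `copy.deepcopy` from duplicating collected data is for `__deepcopy__` to return the object itself. The sketch below shows that technique on a hypothetical `NoCopySketch` class; it is an assumption about one possible approach, not a description of what the bionemo method does.

```python
import copy


class NoCopySketch:
    """Hypothetical stand-in that opts out of deep copying."""

    def __init__(self):
        self.data = []

    def __deepcopy__(self, memo):
        # Assumption: hand back the same object so self.data is never duplicated.
        return self


callback = NoCopySketch()
callback.data.append(list(range(1000)))

clone = copy.deepcopy(callback)
nested = copy.deepcopy({"callbacks": [callback]})

# Both the direct copy and the copy nested in a container are the original object.
assert clone is callback
assert nested["callbacks"][0] is callback
```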
Source code in bionemo/testing/testing_callbacks.py
__init__()
Initializes the callback.
Source code in bionemo/testing/testing_callbacks.py
ConsumedSamplesCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check consumed samples before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get consumed samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
GlobalStepStateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback for global_step before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get global_step as metadata.
Source code in bionemo/testing/testing_callbacks.py
LearningRateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback for learning rate before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get learning rate as metadata.
Source code in bionemo/testing/testing_callbacks.py
OptimizerStateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check optimizer states before pausing and after resuming training.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Get optimizer states as metadata.
Source code in bionemo/testing/testing_callbacks.py
SignalAfterGivenStepCallback
Bases: Callback, CallbackMethods
A callback that emits a given signal to the current process at the defined step.
Use this callback for pytest-based stop-and-go tests.
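Sending a signal to the current process at a given step can be sketched with the standard library alone. `SignalAtStepSketch`, its `on_step_start` hook, and the injectable `kill` parameter are hypothetical and exist only so the example can run without delivering a real signal; the default `signal.SIGUSR2` is available on POSIX systems only.

```python
import os
import signal


class SignalAtStepSketch:
    """Hypothetical stand-in, not the bionemo class."""

    def __init__(self, stop_step, signal_=signal.SIGUSR2, kill=os.kill):
        self.stop_step = stop_step
        self.signal = signal_
        # Injectable so a test can record the call instead of sending a signal.
        self._kill = kill

    def on_step_start(self, global_step):
        # Send the signal to this process once the step threshold is reached.
        if global_step >= self.stop_step:
            self._kill(os.getpid(), self.signal)


# Record the calls instead of delivering a real signal.
sent = []
callback = SignalAtStepSketch(stop_step=3, kill=lambda pid, sig: sent.append((pid, sig)))

for global_step in range(5):
    callback.on_step_start(global_step)

# Nothing is sent before step 3; steps 3 and 4 each trigger one call.
assert len(sent) == 2
```

With the default `kill=os.kill`, the receiving side would typically install a handler via `signal.signal` before training starts, so that the signal triggers a clean shutdown rather than terminating the process.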
Source code in bionemo/testing/testing_callbacks.py
__init__(stop_step, signal_=signal.SIGUSR2)
Initializes the callback with the given stop_step.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_start(step)
Emit the configured signal to the current process, stopping training, if the global step is greater than or equal to the stop_step.
Source code in bionemo/testing/testing_callbacks.py
StopAfterValidEpochEndCallback
Bases: Callback, CallbackMethods
A callback that stops training after the validation epoch.
Use this callback for pytest-based stop-and-go tests.
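A stop-after-validation hook might look like the sketch below. It assumes the callback requests a stop by setting the trainer's `should_stop` flag and skips the pre-training sanity check via `sanity_checking`; both attributes exist on Lightning trainers, but whether this class uses them is an assumption, and `StopAfterValidationSketch` with its `SimpleNamespace` trainer is a hypothetical stand-in.

```python
from types import SimpleNamespace


class StopAfterValidationSketch:
    """Hypothetical stand-in, not the bionemo class."""

    def on_validation_epoch_end(self, trainer, pl_module):
        # Assumption: ignore the sanity-check validation pass before training.
        if trainer.sanity_checking:
            return
        # Ask the training loop to exit at the next opportunity.
        trainer.should_stop = True


callback = StopAfterValidationSketch()

# During the sanity check the flag is left untouched.
sanity_trainer = SimpleNamespace(sanity_checking=True, should_stop=False)
callback.on_validation_epoch_end(sanity_trainer, pl_module=None)

# After a real validation epoch the flag is set.
real_trainer = SimpleNamespace(sanity_checking=False, should_stop=False)
callback.on_validation_epoch_end(real_trainer, pl_module=None)

assert sanity_trainer.should_stop is False
assert real_trainer.should_stop is True
```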
Source code in bionemo/testing/testing_callbacks.py
TrainInputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training input samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get training input samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
TrainLossCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training loss samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_end(step, microbatch_outputs, reduced=None)
Get training loss as metadata.
Source code in bionemo/testing/testing_callbacks.py
TrainOutputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training output samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get training output samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
TrainValInitConsumedSamplesStopAndGoCallback
Bases: AbstractStopAndGoCallback
Stop-and-go callback to check consumed samples before pausing and after resuming training.
This is currently the only callback that doesn't fit with the new pattern of directly comparing continuous and interrupted training, since the dataloaders don't track their consumed_samples before and after checkpoint resumption.
Source code in bionemo/testing/testing_callbacks.py
get_metadata(trainer, pl_module)
Get consumed samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
ValidInputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation input samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get validation input samples as metadata.
Source code in bionemo/testing/testing_callbacks.py
ValidLossCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation loss samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_step_end(step, microbatch_outputs, reduced=None)
Get validation loss as metadata.
Source code in bionemo/testing/testing_callbacks.py
ValidOutputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation output samples for comparison.
Source code in bionemo/testing/testing_callbacks.py
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get validation output samples as metadata.
Source code in bionemo/testing/testing_callbacks.py