Callbacks
TEVCallback
Bases: Callback
Callback that logs Token Embedding Variance (TEV) statistics before each optimizer step.
This callback handles different parallelism strategies:
- Pipeline Parallelism: only computes on the first pipeline stage
- Tensor Parallelism: gathers embedding shards across TP ranks
- Context Parallelism: gathers across CP ranks
- Data Parallelism: only logs on rank 0 of each model parallel group
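The rank gating described above can be sketched as a pure function of a process's parallel coordinates. This is a minimal illustration, not the BioNeMo implementation: `RankCoords`, `tev_role` and `count_roles` are hypothetical names, and the sketch assumes that the first pipeline stage is pipeline rank 0 and that the logging rank is data-parallel rank 0, as the method description below states. The real callback queries the distributed runtime for this information instead of taking coordinates as arguments.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RankCoords:
    """Hypothetical coordinates of one process in a PP x TP x CP x DP layout."""

    pp_rank: int  # pipeline-parallel stage index
    tp_rank: int  # tensor-parallel rank (holds one embedding shard)
    cp_rank: int  # context-parallel rank
    dp_rank: int  # data-parallel replica index


def tev_role(coords: RankCoords) -> str:
    """Classify what a rank does for TEV logging (illustrative sketch).

    "skip":            not the first pipeline stage, so no embedding table here
    "compute":         first stage; joins the TP/CP gather and computes TEV
    "compute_and_log": same as "compute", and also logs because it is DP rank 0
    """
    if coords.pp_rank != 0:
        return "skip"
    if coords.dp_rank == 0:
        return "compute_and_log"
    return "compute"


def count_roles(pp: int, tp: int, cp: int, dp: int) -> Counter:
    """Tally the role of every rank in a pp x tp x cp x dp layout."""
    return Counter(
        tev_role(RankCoords(p, t, c, d))
        for p, t, c, d in product(range(pp), range(tp), range(cp), range(dp))
    )


if __name__ == "__main__":
    # Example layout: PP=2, TP=2, CP=1, DP=2, i.e. 8 ranks in total.
    print(count_roles(2, 2, 1, 2))
```

In this example layout, the four ranks on the second pipeline stage skip the computation entirely, and only the two TP ranks of data-parallel replica 0 log.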
Source code in bionemo/evo2/utils/logging/callbacks.py
on_before_optimizer_step(trainer, pl_module, optimizer)
Called before each optimizer step during training.
This method calculates and logs Token Embedding Variance (TEV) statistics:
1. Gets the embedding parameter only on pipeline rank 0 (where embeddings live)
2. Gathers embedding shards across tensor and context parallel ranks
3. Calculates the token embedding variance (TEV)
4. Logs the mean and standard deviation of the TEV values only on data parallel rank 0
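Steps 2 to 4 can be sketched in plain Python on a toy embedding table. This is an illustrative sketch under stated assumptions, not the BioNeMo code. The helper names `gather_vocab_shards`, `token_embedding_variance` and `tev_summary` are hypothetical. The sketch assumes each tensor-parallel rank holds a contiguous block of vocabulary rows, as in Megatron's vocabulary-parallel embedding, so gathering reduces to concatenating shards in rank order. It also assumes that a token's TEV is the population variance of its embedding vector across the hidden dimension, which may differ from the callback's exact formula. Collective communication and the context-parallel gather are left out.

```python
import statistics
from typing import Dict, List, Sequence

Matrix = List[List[float]]  # rows = vocabulary tokens, columns = hidden dimension


def gather_vocab_shards(shards: Sequence[Matrix]) -> Matrix:
    """Concatenate per-TP-rank row shards into the full embedding table.

    Assumes shard i holds a contiguous slice of vocabulary rows for TP rank i.
    """
    full: Matrix = []
    for shard in shards:
        full.extend(shard)
    return full


def token_embedding_variance(embedding: Matrix) -> List[float]:
    """Assumed TEV definition: variance of each token's row across the hidden dim."""
    return [statistics.pvariance(row) for row in embedding]


def tev_summary(shards: Sequence[Matrix]) -> Dict[str, float]:
    """Gather shards, compute per-token TEV, and reduce to mean and sample std."""
    full = gather_vocab_shards(shards)
    tev = token_embedding_variance(full)
    return {"tev_mean": statistics.fmean(tev), "tev_sd": statistics.stdev(tev)}


if __name__ == "__main__":
    # Toy 3-token table (hidden size 2) split across two TP ranks.
    shards = [[[1.0, 3.0]], [[0.0, 0.0], [2.0, 6.0]]]
    print(tev_summary(shards))
```

For this toy table the per-token variances are 1.0, 0.0 and 4.0, so the summary reports their mean and sample standard deviation. In the actual callback, only data parallel rank 0 would forward these two numbers to the logger.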
Parameters:
| Name | Description | Default |
|---|---|---|
| trainer | The Lightning trainer instance | required |
| pl_module | The current Lightning module being trained | required |
| optimizer | The optimizer being used | required |
Note
The callback assumes embeddings live on pipeline rank 0, which is the standard configuration in Megatron-LM.
Source code in bionemo/evo2/utils/logging/callbacks.py