Loss
BERTMLMLossWithReduction
Bases: MegatronLossReduction
Source code in bionemo/llm/model/loss.py
forward(batch, forward_out)
Forward implementation.
https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/megatron_parallel.py#L1733
Note that the method signature differs slightly from NeMo's, as the NeMo signature is incorrect.
Source code in bionemo/llm/model/loss.py
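For orientation, here is a simplified sketch of what such a masked-LM forward step typically computes. The batch keys (`labels`, `loss_mask`) and forward-output key (`token_logits`) are assumptions based on common BERT/Megatron conventions rather than a verbatim copy of the BioNeMo implementation, and plain `cross_entropy` stands in for Megatron's vocab-parallel version:

```python
import torch
from torch import Tensor


def forward_sketch(batch: dict[str, Tensor], forward_out: dict[str, Tensor]):
    """Hypothetical per-microbatch MLM loss computation.

    Assumes forward_out["token_logits"] is [sequence_length, batch_size, vocab]
    and batch["labels"] / batch["loss_mask"] are [batch_size, sequence_length].
    """
    logits = forward_out["token_logits"].float()
    labels = batch["labels"]
    loss_mask = batch["loss_mask"].float()

    # Per-token cross entropy; rearrange logits to [batch, vocab, seq] as
    # required by torch.nn.functional.cross_entropy.
    unreduced = torch.nn.functional.cross_entropy(
        logits.transpose(0, 1).transpose(1, 2),
        labels,
        reduction="none",
    )  # -> [batch, seq]

    # Average only over the tokens selected by the loss mask.
    loss = (unreduced * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)
    return loss, {"avg": loss}
```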
reduce(losses_reduced_per_micro_batch)
Loss reduction implementation.
Taken from https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py#L534-L552.
Source code in bionemo/llm/model/loss.py
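The linked NeMo code reduces the per-microbatch results to a single scalar. A minimal sketch of that idea, assuming each entry carries an `avg` key as in `SameSizeLossDict` below:

```python
import torch
from torch import Tensor


def reduce_sketch(losses_reduced_per_micro_batch: list[dict[str, Tensor]]) -> Tensor:
    """Hypothetical reduction: average the 'avg' entries across microbatches."""
    if not losses_reduced_per_micro_batch:
        # No microbatches (e.g. an empty validation shard): return a zero loss.
        return torch.tensor(0.0)
    # Reshape each scalar to [1] so they can be concatenated, then average.
    avgs = [mb["avg"].reshape(1) for mb in losses_reduced_per_micro_batch]
    return torch.cat(avgs).mean()
```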
DataParallelGroupLossAndIO
Bases: TypedDict
Average losses across the data parallel group + the original batch and inference output.
Source code in bionemo/llm/model/loss.py
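A rough sketch of the dictionary's likely shape (field names are assumptions inferred from the description, not verified against the source):

```python
from typing import TypedDict

from torch import Tensor


class DataParallelGroupLossAndIO(TypedDict):
    """Loss averaged across the data parallel group, plus the I/O that produced it."""

    avg: Tensor                     # loss averaged across the DP group (assumed key)
    batch: dict[str, Tensor]        # the original input batch (assumed key)
    forward_out: dict[str, Tensor]  # the raw inference output (assumed key)
```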
PerTokenLossDict
Bases: TypedDict
Tensor dictionary for loss.
This is the return type for a loss that is computed per token in the batch, supporting microbatches of varying sizes.
Source code in bionemo/llm/model/loss.py
SameSizeLossDict
Bases: TypedDict
Tensor dictionary for loss.
This is the return type for a loss that is computed for the entire batch, where all microbatches are the same size.
Source code in bionemo/llm/model/loss.py
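Side by side, plausible sketches of the two return types above (the key names are assumptions consistent with the descriptions, not confirmed from the source):

```python
from typing import TypedDict

from torch import Tensor


class PerTokenLossDict(TypedDict):
    """Per-token loss: supports microbatches of varying sizes by carrying the
    summed loss together with the token count it was summed over (assumed key)."""

    loss_sum_and_microbatch_size: Tensor


class SameSizeLossDict(TypedDict):
    """Whole-batch loss: a single pre-averaged value, valid when all
    microbatches are the same size (assumed key)."""

    avg: Tensor
```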
unreduced_token_loss_fn(logits, labels, cross_entropy_loss_fusion=False)
Computes the unreduced token loss given the logits and labels without regard to the loss mask.
WARNING: This function does not apply a loss mask. It also performs in-place operations on its inputs.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `logits` | `Tensor` | The predicted logits of shape `[sequence_length, batch_size, num_classes]`. | *required* |
| `labels` | `Tensor` | The true labels of shape `[batch_size, sequence_length]`. | *required* |
| `cross_entropy_loss_fusion` | `bool` | If True, use the fused kernel version of vocab parallel cross entropy. This should generally be preferred for speed, as it packs more operations into a single kernel on the GPU. However, some users have observed reduced training stability when using this method. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `Tensor` | The unreduced token loss of shape `[batch_size, sequence_length]`. |
Source code in bionemo/llm/model/loss.py
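A short usage sketch based on the documented signature. The shapes follow the parameter table above; note that calling the real function may require Megatron's model-parallel state to be initialized, and that inputs may be modified in place (hence the `.clone()`):

```python
import torch

from bionemo.llm.model.loss import unreduced_token_loss_fn

seq_len, batch_size, vocab = 8, 2, 32
logits = torch.randn(seq_len, batch_size, vocab)          # [seq, batch, vocab]
labels = torch.randint(0, vocab, (batch_size, seq_len))   # [batch, seq]

# Returns per-token loss of shape [batch, seq]; no mask is applied,
# and inputs may be mutated in place (see WARNING above).
per_token_loss = unreduced_token_loss_fn(logits.clone(), labels)

# The caller applies the loss mask, e.g. to ignore padding positions.
loss_mask = torch.ones(batch_size, seq_len)
masked_mean = (per_token_loss * loss_mask).sum() / loss_mask.sum()
```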