Loss
BERTMLMLossWithReduction
Bases: _Nemo2CompatibleLossReduceMixin, MegatronLossReduction
Source code in bionemo/llm/model/loss.py
__init__(validation_step=False, val_drop_last=True, send_train_output=False, send_val_output=True)
Initializes the BERTMLMLossWithReduction class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
validation_step | bool | Whether this object is being applied to the validation step. Defaults to False. | False |
val_drop_last | bool | Whether the last batch is configured to be dropped during validation. Defaults to True. | True |
send_train_output | bool | Whether to return the model output in training. Defaults to False. | False |
send_val_output | bool | Whether to return the model output in validation. Defaults to True. | True |
include_forward_output_for_metrics | bool | Some downstream metrics, such as perplexity, require the forward output. It can be expensive to return, however, so disable this if performance is a top consideration. | required |
Source code in bionemo/llm/model/loss.py
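For reference, constructing the reduction for a validation loop might look like the following sketch. The constructor arguments match the signature documented above; the surrounding usage context (e.g., where the object is wired into a training or validation step) is an assumption.

```python
from bionemo.llm.model.loss import BERTMLMLossWithReduction

# Sketch: a loss reduction for validation that keeps the last (possibly
# smaller) batch and returns model outputs so metrics can be computed.
val_loss_reduction = BERTMLMLossWithReduction(
    validation_step=True,
    val_drop_last=False,
    send_val_output=True,
)
```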
forward(batch, forward_out)
Computes the loss of labels in the batch versus token_logits in the forward output. In the future, this will be extended to handle other loss types, such as sequence loss, if present in the forward_out and batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Dict[str, Tensor] | The batch of data. Each tensor should be of shape [batch_size, *, *] and match the corresponding dimension for that particular key in the batch output. For example, the "labels" and "token_logits" keys should each have a tensor of shape [batch_size, sequence_length]. | required |
forward_out | Dict[str, Tensor] | The forward output from the model. Each tensor should be of shape [batch_size, *, *]. | required |
Taken from: https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py#L951-L976 .
Source code in bionemo/llm/model/loss.py
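To make the shape contract concrete, here is a hedged sketch of the inputs that forward expects, continuing the construction example above. The "labels" and "token_logits" keys come from the description; the dimension sizes are placeholders, and the [sequence_length, batch_size, num_classes] layout for logits is an assumption consistent with unreduced_token_loss_fn below.

```python
import torch

# Illustrative dimensions only.
batch_size, sequence_length, num_classes = 4, 128, 32_000

batch = {
    "labels": torch.randint(0, num_classes, (batch_size, sequence_length)),
}
forward_out = {
    # Assumed Megatron-style layout: [sequence_length, batch_size, num_classes].
    "token_logits": torch.randn(sequence_length, batch_size, num_classes),
}

# In a real run (with Megatron parallel state initialized), the loss would be
# computed roughly as:
# loss = val_loss_reduction.forward(batch, forward_out)
```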
DataParallelGroupLossAndIO
Bases: TypedDict
Average losses across the data parallel group + the original batch and inference output.
Source code in bionemo/llm/model/loss.py
PerTokenLossDict
Bases: TypedDict
Tensor dictionary for loss.
This is the return type for a loss that is computed per token in the batch, supporting microbatches of varying sizes.
Source code in bionemo/llm/model/loss.py
SameSizeLossDict
Bases: TypedDict
Tensor dictionary for loss.
This is the return type for a loss that is computed for the entire batch, where all microbatches are the same size.
Source code in bionemo/llm/model/loss.py
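As a rough illustration of the difference between the two return types above, the sketch below declares comparable TypedDicts. The key names are assumptions for illustration only, not the library's actual field names.

```python
from typing import TypedDict

import torch


class SameSizeLossDictSketch(TypedDict):
    """All microbatches are the same size, so a simple average suffices (assumed key name)."""

    avg: torch.Tensor


class PerTokenLossDictSketch(TypedDict):
    """Microbatches vary in size, so carry the loss sum together with the token
    count so the caller can form a correctly weighted average (assumed key name)."""

    loss_sum_and_microbatch_size: torch.Tensor
```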
unreduced_token_loss_fn(logits, labels, cross_entropy_loss_fusion=False)
Computes the unreduced token loss given the logits and labels without regard to the loss mask.
WARNING: This function does not apply a loss mask. It also performs in-place operations on the inputs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logits | Tensor | The predicted logits of shape [sequence_length, batch_size, num_classes]. | required |
labels | Tensor | The true labels of shape [batch_size, sequence_length]. | required |
cross_entropy_loss_fusion | bool | If True, use the fused kernel version of vocab parallel cross entropy. This should generally be preferred for speed, as it packs more operations into a single kernel on the GPU. However, some users have observed reduced training stability when using this method. | False |
Returns:
Name | Type | Description |
---|---|---|
Tensor | Tensor | The unreduced token loss of shape [batch_size, sequence_length]. |
Source code in bionemo/llm/model/loss.py
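For intuition, the shape behavior described above matches a plain per-token cross entropy with no reduction and no loss mask. The sketch below is an illustrative stand-in under that assumption, not the vocab-parallel (optionally fused) implementation the function actually uses.

```python
import torch
import torch.nn.functional as F


def unreduced_token_loss_sketch(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Illustrative per-token loss.

    logits: [sequence_length, batch_size, num_classes]
    labels: [batch_size, sequence_length]
    returns: [batch_size, sequence_length]
    """
    # Move logits to [batch_size, sequence_length, num_classes] to match the labels layout.
    logits_bsd = logits.transpose(0, 1).contiguous()
    # reduction="none" keeps one loss value per token; no loss mask is applied.
    per_token = F.cross_entropy(
        logits_bsd.view(-1, logits_bsd.size(-1)),
        labels.view(-1),
        reduction="none",
    )
    return per_token.view(labels.shape)
```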