validation
Model validation and loss calculation utilities for single-GPU and multi-GPU setups.
Also provides helper functions for loss metrics, KL divergence, JS divergence, and similarity losses for knowledge distillation.
Classes
LowMemorySparseTensor
Functions
calculate_losses: Run a forward pass of the model on each batch and calculate the LM loss.
kl_div: Kullback-Leibler divergence for a single sample.
- class LowMemorySparseTensor
Bases: object
- __init__(x)
- Parameters:
x (Tensor)
- to(*args)
- Return type:
Self
- to_dense()
- Return type:
Tensor
- calculate_batch_outputs(hidden_states, target_hidden_states, logits, target_logits, targets, return_hidden_states, calculate_full_score_ablations, calc_on_cpu)
- Parameters:
hidden_states (Tensor | None)
target_hidden_states (Tensor | None)
logits (Tensor)
target_logits (Tensor | None)
targets (Tensor)
return_hidden_states (bool)
calculate_full_score_ablations (bool)
calc_on_cpu (bool)
- Return type:
dict
- calculate_losses(model, dataloader, target_probs=None, return_probs=False, checkpoint_manager=None)
Run a forward pass of the model on each batch and calculate the LM loss.
Works on lit-llama models (single GPU) and HuggingFace models (which can span multiple GPUs). Data-parallel execution is not supported.
Note
Arguments and outputs related to probs and hidden states are not currently supported.
- Returns:
Tuple of (outputs, None). outputs is a dict:
{
    "lm_loss": [float, ...],
    "token_accuracy_top_1": [float, ...],
    "token_accuracy_top_5": [float, ...],
    "token_accuracy_top_10": [float, ...],
}
- Parameters:
model (Module)
dataloader (DataLoader)
target_probs (None)
return_probs (bool)
- Return type:
tuple[dict[str, dict], None] | tuple[None, None]
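The per-batch metrics named in the returned dict can be illustrated with a small framework-free sketch. The helper names `lm_loss` and `token_accuracy_top_k` are hypothetical, nested lists stand in for tensors, and averaging over tokens is an assumption; the library's actual computation may differ.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def lm_loss(logits, targets):
    """Mean cross-entropy over tokens (assumed reduction).

    logits: [tokens][vocab] nested list; targets: [tokens] list of token ids.
    """
    nll = [-log_softmax(row)[t] for row, t in zip(logits, targets)]
    return sum(nll) / len(nll)


def token_accuracy_top_k(logits, targets, k):
    """Fraction of tokens whose target id is among the k highest logits."""
    hits = 0
    for row, t in zip(logits, targets):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += t in top_k
    return hits / len(targets)


# Toy batch: 2 tokens, vocabulary of 3 (so top-2 is used instead of top-5/10).
logits = [[2.0, 1.0, 0.1], [0.2, 0.1, 3.0]]
targets = [0, 0]
batch_metrics = {
    "lm_loss": lm_loss(logits, targets),
    "token_accuracy_top_1": token_accuracy_top_k(logits, targets, 1),
    "token_accuracy_top_2": token_accuracy_top_k(logits, targets, 2),
}
```

In this toy batch the second token's target is only the second-highest logit, so it counts toward top-2 accuracy but not top-1.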
- cosine_embedding_loss(hidden_states, target_hidden_states)
- Parameters:
hidden_states (Tensor)
target_hidden_states (Tensor)
- Return type:
list[float]
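The page does not describe how this loss is computed. One plausible reading, sketched below with plain nested lists, is a per-sample mean of `1 - cosine_similarity` over tokens, which would match the `list[float]` return type. The name `cosine_embedding_loss_sketch`, the `[batch][tokens][hidden]` layout, and the reduction are all assumptions, not the library's confirmed behavior.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def cosine_embedding_loss_sketch(hidden_states, target_hidden_states):
    """Return one loss per sample: mean over tokens of (1 - cosine similarity).

    Inputs are assumed to be [batch][tokens][hidden] nested lists.
    """
    losses = []
    for sample, target_sample in zip(hidden_states, target_hidden_states):
        per_token = [
            1.0 - cosine_similarity(h, t) for h, t in zip(sample, target_sample)
        ]
        losses.append(sum(per_token) / len(per_token))
    return losses


# One sample, two tokens: the first pair is parallel, the second orthogonal.
hidden = [[[1.0, 0.0], [0.0, 2.0]]]
target = [[[2.0, 0.0], [3.0, 0.0]]]
losses = cosine_embedding_loss_sketch(hidden, target)
```

Because cosine similarity ignores vector length, this kind of loss compares only the direction of the hidden states, not their magnitude.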
- kl_div(logits, target_probs, clip_epsilon=ClipEpsilon.NO_CLIP, epsilon_factor=1.0)
Kullback-Leibler divergence for a single sample. Both logits and target_probs have shape [tokens, vocab].
- Parameters:
logits (Tensor)
target_probs (Tensor)
clip_epsilon (ClipEpsilon)
epsilon_factor (float)
- Return type:
float
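As a rough illustration of the quantity involved, the sketch below computes KL(target || model) from `[tokens, vocab]` inputs using nested lists. The function name `kl_div_sketch`, the direction of the divergence, and the mean over tokens are assumptions; `clip_epsilon` and `epsilon_factor` are not modeled because their semantics are not described here.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def kl_div_sketch(logits, target_probs):
    """KL(target || model), averaged over tokens (assumed reduction).

    logits: [tokens][vocab] raw model scores.
    target_probs: [tokens][vocab] probabilities, each row summing to 1.
    """
    per_token = []
    for row, probs in zip(logits, target_probs):
        log_q = log_softmax(row)
        # Entries with p == 0 contribute nothing (0 * log 0 is taken as 0).
        kl = sum(
            p * (math.log(p) - lq) for p, lq in zip(probs, log_q) if p > 0.0
        )
        per_token.append(kl)
    return sum(per_token) / len(per_token)


# A uniform model against a matching target, then against a one-hot target.
same = kl_div_sketch([[0.0, 0.0]], [[0.5, 0.5]])
one_hot = kl_div_sketch([[0.0, 0.0]], [[1.0, 0.0]])
```

The divergence is zero when the model distribution equals the target and grows as the two distributions move apart.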
- mse_loss(hidden_states, target_hidden_states)
- Parameters:
hidden_states (Tensor)
target_hidden_states (Tensor)
- Return type:
list[float]
- normalized_mse_loss(hidden_states, target_hidden_states)
- Parameters:
hidden_states (Tensor)
target_hidden_states (Tensor)
- Return type:
list[float]
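Neither `mse_loss` nor `normalized_mse_loss` is described beyond its signature. The sketch below shows one common interpretation using nested lists: a per-sample mean squared error, and the same error divided by the mean square of the target. The `_sketch` names, the `[batch][tokens][hidden]` layout, and the choice of normalizer are assumptions rather than the library's documented behavior.

```python
def _flatten(sample):
    """Flatten a [tokens][hidden] nested list into one list of floats."""
    return [value for row in sample for value in row]


def mse_loss_sketch(hidden_states, target_hidden_states):
    """Return one mean squared error per sample."""
    losses = []
    for sample, target_sample in zip(hidden_states, target_hidden_states):
        squared = [
            (h - t) ** 2
            for h, t in zip(_flatten(sample), _flatten(target_sample))
        ]
        losses.append(sum(squared) / len(squared))
    return losses


def normalized_mse_loss_sketch(hidden_states, target_hidden_states, eps=1e-8):
    """Return per-sample MSE divided by the target's mean square (assumed)."""
    mses = mse_loss_sketch(hidden_states, target_hidden_states)
    losses = []
    for mse, target_sample in zip(mses, target_hidden_states):
        flat_target = _flatten(target_sample)
        target_power = sum(t * t for t in flat_target) / len(flat_target)
        # eps guards against division by zero for an all-zero target.
        losses.append(mse / max(target_power, eps))
    return losses


# One sample, two tokens, hidden size 2; only the last element differs.
hidden = [[[1.0, 2.0], [3.0, 4.0]]]
target = [[[1.0, 2.0], [3.0, 2.0]]]
mse = mse_loss_sketch(hidden, target)
nmse = normalized_mse_loss_sketch(hidden, target)
```

Dividing by the target's mean square makes the loss comparable across layers whose hidden states have very different scales.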