validation

Model validation and loss calculation utilities for single-GPU and multi-GPU setups.

Also provides helper functions for loss metrics, KL divergence, JS divergence, and similarity losses for knowledge distillation.

Classes

LowMemorySparseTensor

Functions

calculate_losses

Run a forward pass on each batch and calculate the LM loss.

calculate_batch_outputs

cosine_embedding_loss

normalized_mse_loss

mse_loss

kl_div

Kullback-Leibler divergence for a single sample.

class LowMemorySparseTensor

Bases: object

__init__(x)
Parameters:

x (Tensor)

to(*args)
Return type:

Self

to_dense()
Return type:

Tensor
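The reference above does not show the internals of LowMemorySparseTensor, but the name and the `to_dense()` method suggest a wrapper that stores only nonzero entries and materializes the full tensor on demand. A minimal pure-Python sketch of that idea (class and attribute names are hypothetical, not the actual implementation, which operates on torch Tensors):

```python
class SparseRowSketch:
    """Hypothetical sketch: keep only nonzero entries of a mostly-zero row."""

    def __init__(self, dense):
        # Record (index, value) pairs for nonzero entries only,
        # so memory scales with the number of nonzeros.
        self.size = len(dense)
        self.entries = [(i, v) for i, v in enumerate(dense) if v != 0]

    def to_dense(self):
        # Reconstruct the full dense row when it is actually needed.
        dense = [0] * self.size
        for i, v in self.entries:
            dense[i] = v
        return dense
```

The memory saving only pays off when the tensor is genuinely sparse; `to_dense()` temporarily reallocates the full storage.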

calculate_batch_outputs(hidden_states, target_hidden_states, logits, target_logits, targets, return_hidden_states, calculate_full_score_ablations, calc_on_cpu)
Parameters:
  • hidden_states (Tensor | None)

  • target_hidden_states (Tensor | None)

  • logits (Tensor)

  • target_logits (Tensor | None)

  • targets (Tensor)

  • return_hidden_states (bool)

  • calculate_full_score_ablations (bool)

  • calc_on_cpu (bool)

Return type:

dict
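The entry above documents only the signature, but the pattern of gathering per-batch metrics into one dict can be illustrated with a pure-Python sketch. The function name and the single `"lm_loss"` key are hypothetical; the real function also handles hidden states, target logits, and the ablation/CPU flags:

```python
import math

def batch_outputs_sketch(logits, targets):
    """Hypothetical sketch: collect per-batch metrics into a dict,
    mirroring the dict-returning shape of calculate_batch_outputs."""
    losses = []
    for row, t in zip(logits, targets):
        # Cross-entropy of the target token under softmax(row),
        # computed with the numerically stable log-sum-exp trick.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[t])
    return {"lm_loss": sum(losses) / len(losses)}
```

Returning a dict keyed by metric name makes it easy for a caller to accumulate results across batches into lists, as `calculate_losses` does below.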

calculate_losses(model, dataloader, target_probs=None, return_probs=False, checkpoint_manager=None)

Run a forward pass on each batch and calculate the LM loss.

Works on lit-llama models (single GPU) and HuggingFace models (which may be multi-GPU). Does not support data parallelism.

Note

Anything related to probs and hidden states is currently unsupported.

Returns:

Tuple of (outputs, None). outputs is a dict:

{
    "lm_loss": [float, ...],
    "token_accuracy_top_1": [float, ...],
    "token_accuracy_top_5": [float, ...],
    "token_accuracy_top_10": [float, ...],
}

Parameters:
  • model (Module)

  • dataloader (DataLoader)

  • target_probs (None)

  • return_probs (bool)

Return type:

tuple[dict[str, dict], None] | tuple[None, None]
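The `token_accuracy_top_k` entries in the output dict measure how often the target token ranks among the model's k highest-scoring vocabulary entries. A pure-Python sketch of that metric (function name hypothetical; the real code operates on torch Tensors per batch):

```python
def topk_accuracy_sketch(logits, targets, k):
    """Hypothetical sketch: fraction of positions where the target token
    is among the k highest-scoring vocabulary entries."""
    hits = 0
    for row, t in zip(logits, targets):
        # Indices of the k largest scores in this row.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += t in topk
    return hits / len(targets)
```

Computed with k = 1, 5, and 10 per batch, this yields the three accuracy lists shown in the returned dict.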

cosine_embedding_loss(hidden_states, target_hidden_states)
Parameters:
  • hidden_states (Tensor)

  • target_hidden_states (Tensor)

Return type:

list[float]
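A common form of cosine embedding loss for distillation is `1 - cos(h, h_target)` per position; assuming that is what this helper computes (the exact reduction is not documented here), a pure-Python sketch:

```python
import math

def cosine_loss_sketch(hidden, target_hidden):
    """Hypothetical sketch: per-position loss 1 - cos(h_i, t_i),
    returning a list of floats as documented."""
    out = []
    for a, b in zip(hidden, target_hidden):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        # 0 when the vectors point the same way, up to 2 when opposed.
        out.append(1.0 - dot / (na * nb))
    return out
```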

kl_div(logits, target_probs, clip_epsilon=ClipEpsilon.NO_CLIP, epsilon_factor=1.0)

Kullback-Leibler divergence for a single sample.

  • logits: [tokens, vocab]

  • target_probs: [tokens, vocab]

Parameters:
  • logits (Tensor)

  • target_probs (Tensor)

  • clip_epsilon (ClipEpsilon)

  • epsilon_factor (float)

Return type:

float
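The core computation is KL(p ‖ q) = Σ_v p_v (log p_v − log q_v) per token, where p is `target_probs` and q = softmax(`logits`). A self-contained pure-Python sketch of that formula, averaged over tokens (function name hypothetical; the `clip_epsilon`/`epsilon_factor` handling is omitted):

```python
import math

def kl_div_sketch(logits, target_probs):
    """Hypothetical sketch: mean per-token KL(target || softmax(logits))."""
    total = 0.0
    for row, p in zip(logits, target_probs):
        # log q via the numerically stable log-softmax.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        log_q = [x - log_z for x in row]
        # Skip zero-probability targets: their contribution is 0 by convention.
        total += sum(pv * (math.log(pv) - lq)
                     for pv, lq in zip(p, log_q) if pv > 0)
    return total / len(logits)
```

When the target distribution matches the model's softmax exactly, the divergence is 0.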

mse_loss(hidden_states, target_hidden_states)
Parameters:
  • hidden_states (Tensor)

  • target_hidden_states (Tensor)

Return type:

list[float]
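Assuming a per-position mean squared error over the hidden dimension (consistent with the `list[float]` return type), a minimal pure-Python sketch (function name hypothetical):

```python
def mse_loss_sketch(hidden, target_hidden):
    """Hypothetical sketch: mean squared error per position."""
    return [sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
            for a, b in zip(hidden, target_hidden)]
```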

normalized_mse_loss(hidden_states, target_hidden_states)
Parameters:
  • hidden_states (Tensor)

  • target_hidden_states (Tensor)

Return type:

list[float]
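The exact normalization is not documented here; one plausible reading, sketched below as an assumption, is to rescale each hidden vector to unit L2 norm before the MSE, so the loss compares directions rather than magnitudes (function name hypothetical):

```python
import math

def normalized_mse_sketch(hidden, target_hidden):
    """Hypothetical sketch: MSE between unit-normalized vectors per position.
    The normalization scheme is an assumption, not the documented behavior."""
    out = []
    for a, b in zip(hidden, target_hidden):
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        out.append(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)) / len(a))
    return out
```

Under this scheme, vectors that differ only in scale incur zero loss, which plain `mse_loss` would penalize.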