validation

Model validation and loss calculation utilities for single-GPU and multi-GPU setups.

Also provides helper functions for loss metrics, KL divergence, JS divergence, and similarity losses for knowledge distillation.

Classes

LowMemorySparseTensor

Functions

calculate_losses

Run a forward pass on each batch and calculate the LM loss.

calculate_batch_outputs

cosine_embedding_loss

normalized_mse_loss

mse_loss

kl_div

Kullback-Leibler divergence for a single sample.

class LowMemorySparseTensor

Bases: object

__init__(x)
Parameters:

x (Tensor)

to(*args)
Return type:

Self

to_dense()
Return type:

Tensor
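The reference above does not show the internals of LowMemorySparseTensor, but the name and the `to_dense()` method suggest a wrapper that stores only nonzero entries and materializes the full tensor on demand. A minimal pure-Python sketch of that idea (class and attribute names are hypothetical, not the actual implementation, which operates on torch Tensors):

```python
class SparseRowSketch:
    """Hypothetical sketch: keep only nonzero entries of a mostly-zero row."""

    def __init__(self, dense):
        # Record (index, value) pairs for nonzero entries only,
        # so memory scales with the number of nonzeros.
        self.size = len(dense)
        self.entries = [(i, v) for i, v in enumerate(dense) if v != 0]

    def to_dense(self):
        # Reconstruct the full dense row when it is actually needed.
        dense = [0] * self.size
        for i, v in self.entries:
            dense[i] = v
        return dense
```

The memory saving only pays off when the tensor is genuinely sparse; `to_dense()` temporarily reallocates the full storage.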

calculate_batch_outputs(hidden_states, target_hidden_states, logits, target_logits, targets, return_hidden_states, calculate_full_score_ablations, calc_on_cpu)
Parameters:
  • hidden_states (Tensor | None)

  • target_hidden_states (Tensor | None)

  • logits (Tensor)

  • target_logits (Tensor | None)

  • targets (Tensor)

  • return_hidden_states (bool)

  • calculate_full_score_ablations (bool)

  • calc_on_cpu (bool)

Return type:

dict
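The entry above documents only the signature, but the pattern of gathering per-batch metrics into one dict can be illustrated with a pure-Python sketch. The function name and the single `"lm_loss"` key are hypothetical; the real function also handles hidden states, target logits, and the ablation/CPU flags:

```python
import math

def batch_outputs_sketch(logits, targets):
    """Hypothetical sketch: collect per-batch metrics into a dict,
    mirroring the dict-returning shape of calculate_batch_outputs."""
    losses = []
    for row, t in zip(logits, targets):
        # Cross-entropy of the target token under softmax(row),
        # computed with the numerically stable log-sum-exp trick.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[t])
    return {"lm_loss": sum(losses) / len(losses)}
```

Returning a dict keyed by metric name makes it easy for a caller to accumulate results across batches into lists, as `calculate_losses` does below.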

calculate_losses(model, dataloader, target_probs=None, return_probs=False, checkpoint_manager=None)

Run a forward pass on each batch and calculate the LM loss.

Works on lit-llama models (single GPU) and HuggingFace models (which may be multi-GPU). Does not support data parallelism.

Note

Anything related to probs and hidden states is currently unsupported.

Returns:

Tuple of (outputs, None). outputs is a dict:

{
    "lm_loss": [float, ...],
    "token_accuracy_top_1": [float, ...],
    "token_accuracy_top_5": [float, ...],
    "token_accuracy_top_10": [float, ...],
}

Parameters:
  • model (Module)

  • dataloader (DataLoader)

  • target_probs (None)

  • return_probs (bool)

Return type:

tuple[dict[str, dict], None] | tuple[None, None]
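The `token_accuracy_top_k` entries in the output dict measure how often the target token ranks among the model's k highest-scoring vocabulary entries. A pure-Python sketch of that metric (function name hypothetical; the real code operates on torch Tensors per batch):

```python
def topk_accuracy_sketch(logits, targets, k):
    """Hypothetical sketch: fraction of positions where the target token
    is among the k highest-scoring vocabulary entries."""
    hits = 0
    for row, t in zip(logits, targets):
        # Indices of the k largest scores in this row.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += t in topk
    return hits / len(targets)
```

Computed with k = 1, 5, and 10 per batch, this yields the three accuracy lists shown in the returned dict.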

cosine_embedding_loss(hidden_states, target_hidden_states)
Parameters:
  • hidden_states (Tensor)

  • target_hidden_states (Tensor)

Return type:

list[float]
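A common form of cosine embedding loss for distillation is `1 - cos(h, h_target)` per position; assuming that is what this helper computes (the exact reduction is not documented here), a pure-Python sketch:

```python
import math

def cosine_loss_sketch(hidden, target_hidden):
    """Hypothetical sketch: per-position loss 1 - cos(h_i, t_i),
    returning a list of floats as documented."""
    out = []
    for a, b in zip(hidden, target_hidden):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        # 0 when the vectors point the same way, up to 2 when opposed.
        out.append(1.0 - dot / (na * nb))
    return out
```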

kl_div(logits, target_probs, clip_epsilon=ClipEpsilon.NO_CLIP, epsilon_factor=1.0)

Kullback-Leibler divergence for a single sample.

  • logits: [tokens, vocab]

  • target_probs: [tokens, vocab]

Parameters:
  • logits (Tensor)

  • target_probs (Tensor)

  • clip_epsilon (ClipEpsilon)

  • epsilon_factor (float)

Return type:

float
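The core computation is KL(p ‖ q) = Σ_v p_v (log p_v − log q_v) per token, where p is `target_probs` and q = softmax(`logits`). A self-contained pure-Python sketch of that formula, averaged over tokens (function name hypothetical; the `clip_epsilon`/`epsilon_factor` handling is omitted):

```python
import math

def kl_div_sketch(logits, target_probs):
    """Hypothetical sketch: mean per-token KL(target || softmax(logits))."""
    total = 0.0
    for row, p in zip(logits, target_probs):
        # log q via the numerically stable log-softmax.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        log_q = [x - log_z for x in row]
        # Skip zero-probability targets: their contribution is 0 by convention.
        total += sum(pv * (math.log(pv) - lq)
                     for pv, lq in zip(p, log_q) if pv > 0)
    return total / len(logits)
```

When the target distribution matches the model's softmax exactly, the divergence is 0.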

mse_loss(hidden_states, target_hidden_states)
Parameters:
  • hidden_states (Tensor)

  • target_hidden_states (Tensor)

Return type:

list[float]
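Assuming a per-position mean squared error over the hidden dimension (consistent with the `list[float]` return type), a minimal pure-Python sketch (function name hypothetical):

```python
def mse_loss_sketch(hidden, target_hidden):
    """Hypothetical sketch: mean squared error per position."""
    return [sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
            for a, b in zip(hidden, target_hidden)]
```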

normalized_mse_loss(hidden_states, target_hidden_states)
Parameters:
  • hidden_states (Tensor)

  • target_hidden_states (Tensor)

Return type:

list[float]
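The exact normalization is not documented here; one plausible reading, sketched below as an assumption, is to rescale each hidden vector to unit L2 norm before the MSE, so the loss compares directions rather than magnitudes (function name hypothetical):

```python
import math

def normalized_mse_sketch(hidden, target_hidden):
    """Hypothetical sketch: MSE between unit-normalized vectors per position.
    The normalization scheme is an assumption, not the documented behavior."""
    out = []
    for a, b in zip(hidden, target_hidden):
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        out.append(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)) / len(a))
    return out
```

Under this scheme, vectors that differ only in scale incur zero loss, which plain `mse_loss` would penalize.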