validate_runtime_pipeline

Model evaluation utilities for models split across multiple GPUs in pipeline-parallel mode.

Coordinates forward passes and loss computation through model shards distributed across GPUs using sewing_kit’s StitchedModule framework. Relies on validation.py for core loss computation.

Used by validate_model.py during activation scoring for sharded models.

Classes

LMHead

Special class to allow FSDP wrapping without affecting other Linear layers in the model.

HiddenStatesAndLMHead

Functions

calculate_losses_pipeline

Do model forward on each batch and calculate LM loss.

perform_pipeline_stitches

Create pipeline stitches for distributed model evaluation.

class HiddenStatesAndLMHead

Bases: list

__init__(hidden_states, lm_head_weights)
Parameters:
  • hidden_states (list[Tensor])

  • lm_head_weights (Tensor)
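As an illustration of the documented shape (a list subclass constructed from hidden_states and lm_head_weights), a minimal sketch might look like the following. It assumes the list items are the per-batch hidden states and that the weights are kept as a plain attribute. The name HiddenStatesAndLMHeadSketch and the nested-list stand-ins for tensors are hypothetical, and the real class may differ.

```python
class HiddenStatesAndLMHeadSketch(list):
    """Hypothetical stand-in: a list of hidden states that also carries LM head weights."""

    def __init__(self, hidden_states, lm_head_weights):
        # Assumption: the list contents are the hidden states themselves.
        super().__init__(hidden_states)
        # Assumption: the weights are stored as a plain attribute.
        self.lm_head_weights = lm_head_weights


# Nested lists stand in for torch.Tensor objects to keep the sketch dependency-free.
bundle = HiddenStatesAndLMHeadSketch(
    hidden_states=[[0.1, 0.2], [0.3, 0.4]],
    lm_head_weights=[[1.0, 0.0], [0.0, 1.0]],
)
print(len(bundle))  # number of stored hidden-state entries
```

Because the object is still a list, code that iterates over hidden states per batch keeps working, while the LM head weights travel with it.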

class LMHead

Bases: Linear

Special class to allow FSDP wrapping without affecting other Linear layers in the model.

Small nn helpers for puzzletron pipeline code. Model configs come from HuggingFace AutoConfig (AnyModel). LMHead is a distinct nn.Linear subclass so that pipeline and FSDP code can target it explicitly, without matching the model's other Linear layers.
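The idea of a behavior-free subclass that exists only to be selected by type can be sketched without PyTorch. In the example below, Linear is a hypothetical stand-in for torch.nn.Linear, and select_by_type is an illustrative helper, not part of this module or of FSDP.

```python
class Linear:
    """Stand-in for torch.nn.Linear (hypothetical, dependency-free)."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


class LMHead(Linear):
    """Adds no behavior; exists only so it can be told apart by type."""


def select_by_type(named_modules, target_cls):
    """Return the names of modules that are instances of target_cls."""
    return [
        name
        for name, module in named_modules.items()
        if isinstance(module, target_cls)
    ]


# Made-up module names for illustration.
modules = {
    "layers.0.mlp.up_proj": Linear(8, 32),
    "layers.0.mlp.down_proj": Linear(32, 8),
    "lm_head": LMHead(8, 100),
}

# Selecting by LMHead picks only the head; selecting by Linear would pick all three.
to_wrap = select_by_type(modules, LMHead)
```

A type-based wrapping policy keyed on LMHead would therefore leave the ordinary Linear layers untouched.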

calculate_losses_pipeline(stitched_model, dataloader, target_hidden_states_per_batch=None, return_hidden_states=False, calculate_full_score_ablations=False, calc_on_cpu=False, just_model_forward=False, checkpoint_manager=None, autocast_dtype=torch.bfloat16, descriptor=None, use_autocast=True)

Do model forward on each batch and calculate LM loss.

Optionally also computes the kl_div loss and other metrics against the given target_hidden_states_per_batch, and optionally returns the hidden states for each batch. Data-parallel execution is not supported. Set just_model_forward to skip loss calculation and only run the model forward pass (useful for activation hooks).
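For readers unfamiliar with the kl_div metric, the following is a minimal sketch of a KL divergence between a target and a student distribution at a single token position. It is a worked formula only. It assumes both sets of logits have already been obtained (for example by applying LM head weights to hidden states), and the direction KL(target || student) is an assumption, since the source does not specify how this module computes the metric.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_div(target_logits, student_logits):
    """KL(target || student) for one token position (assumed direction)."""
    log_p = log_softmax(target_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Identical logits give zero divergence; differing logits give a positive value.
same = kl_div([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = kl_div([0.0, 0.0], [0.0, math.log(3.0)])
```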

Returns:

Tuple of (losses, target_hidden_states_per_batch).

losses is a dict, e.g.:

{
    "lm_loss": {"avg": float, "per_sample": [float, ...]},
    ...  # more metrics if target_hidden_states_per_batch is provided
}

target_hidden_states_per_batch is returned when return_hidden_states is True.
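A caller might read the losses dict along these lines. This sketch assumes every metric entry carries both "avg" and "per_sample" keys, as shown for lm_loss above. The helper summarize_losses and the numbers are made up for illustration.

```python
def summarize_losses(losses):
    """Return {metric_name: (avg, n_samples)} from a losses dict of the documented shape."""
    summary = {}
    for name, stats in losses.items():
        # Assumption: each metric has the same two keys as "lm_loss".
        summary[name] = (stats["avg"], len(stats["per_sample"]))
    return summary


# Made-up values, not real results.
example_losses = {"lm_loss": {"avg": 2.5, "per_sample": [2.0, 3.0]}}
summary = summarize_losses(example_losses)
```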

Parameters:
  • stitched_model (StitchedModule)

  • dataloader (DataLoader | None)

  • target_hidden_states_per_batch (HiddenStatesAndLMHead | None)

  • return_hidden_states (bool)

  • calculate_full_score_ablations (bool)

  • calc_on_cpu (bool)

  • just_model_forward (bool)

  • autocast_dtype (torch.dtype)

  • descriptor (Type[ModelDescriptor])

  • use_autocast (bool)

Return type:

tuple[dict[str, dict], HiddenStatesAndLMHead | None] | tuple[None, None]
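The return type admits a (None, None) result. The source does not state when that happens, so a caller may want to guard against it before indexing into the losses. A minimal sketch, using a hypothetical helper lm_loss_or_none:

```python
def lm_loss_or_none(result):
    """Extract the average LM loss, or None if the call returned (None, None)."""
    losses, _hidden_states = result
    if losses is None:
        return None
    return losses["lm_loss"]["avg"]


# Made-up results for illustration.
empty = lm_loss_or_none((None, None))
value = lm_loss_or_none(({"lm_loss": {"avg": 1.5, "per_sample": [1.5]}}, None))
```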

perform_pipeline_stitches(model, descriptor)

Create pipeline stitches for distributed model evaluation.

Parameters:
  • model – The model to stitch (any HuggingFace model with AnyModel descriptor).

  • descriptor (Type[ModelDescriptor]) – ModelDescriptor for layer naming.

Return type:

StitchedModule
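Putting the two functions together, an evaluation flow might be sketched as below. To keep the example runnable without GPUs or the real library, the two functions are passed in as stitch_fn and losses_fn (standing in for perform_pipeline_stitches and calculate_losses_pipeline), and the stubs fake_stitch and fake_losses are hypothetical. Only keyword arguments that appear in the documented signatures are used.

```python
def evaluate_sharded(model, descriptor, dataloader, *, stitch_fn, losses_fn):
    """Stitch the model, then run the loss pipeline over the dataloader."""
    stitched_model = stitch_fn(model, descriptor)
    losses, hidden_states = losses_fn(
        stitched_model,
        dataloader,
        return_hidden_states=True,
        descriptor=descriptor,
    )
    return losses, hidden_states


# Stub callables (hypothetical) so the sketch runs without the real library.
def fake_stitch(model, descriptor):
    return ("stitched", model, descriptor)


def fake_losses(stitched_model, dataloader, return_hidden_states=False, descriptor=None):
    losses = {"lm_loss": {"avg": 0.0, "per_sample": [0.0 for _ in dataloader]}}
    return losses, ([] if return_hidden_states else None)


losses, hidden = evaluate_sharded(
    "model", "descriptor", [1, 2, 3],
    stitch_fn=fake_stitch, losses_fn=fake_losses,
)
```

In real use, the stubs would be replaced by the module's own functions, and model, descriptor, and dataloader by actual objects.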