validate_runtime_pipeline
Model evaluation utilities for models split across multiple GPUs in pipeline-parallel mode.
Coordinates forward passes and loss computation through model shards distributed across GPUs using sewing_kit’s StitchedModule framework. Relies on validation.py for core loss computation.
Used by validate_model.py during activation scoring for sharded models.
Classes

- HiddenStatesAndLMHead
- LMHead: Special class to allow FSDP wrapping without affecting other Linear layers in the model.

Functions

- calculate_losses_pipeline: Do model forward on each batch and calculate LM loss.
- perform_pipeline_stitches: Create pipeline stitches for distributed model evaluation.
- class HiddenStatesAndLMHead
  Bases: list
  - __init__(hidden_states, lm_head_weights)
- Parameters:
hidden_states (list[Tensor])
lm_head_weights (Tensor)
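The shape implied by the `Bases: list` entry and the constructor signature can be sketched as follows. This is an illustrative reconstruction, not the actual implementation, and plain lists stand in for tensors:

```python
class HiddenStatesAndLMHead(list):
    """Sketch: a list of per-batch hidden states that also carries
    the LM head weights needed to project those states to logits."""

    def __init__(self, hidden_states, lm_head_weights):
        super().__init__(hidden_states)          # list contents: per-batch hidden states
        self.lm_head_weights = lm_head_weights   # extra attribute: LM head weight tensor

# usage: iterate like a list, but keep the weights alongside
hs = HiddenStatesAndLMHead([[0.1, 0.2], [0.3, 0.4]], lm_head_weights=[[1.0], [0.0]])
```

The list subclass lets existing per-batch iteration code work unchanged while the LM head weights travel with the hidden states.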
- class LMHead
  Bases: Linear
  Special class to allow FSDP wrapping without affecting other Linear layers in the model.
  Small nn helpers for puzzletron pipeline code. Model configs come from HuggingFace AutoConfig(AnyModel). LMHead is a distinct nn.Linear subclass so pipeline / FSDP code can target it explicitly.
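Because LMHead is a distinct subclass, wrapping or hook code can select it with an isinstance check while leaving every other nn.Linear untouched. A minimal sketch of that pattern (the toy model and layer sizes here are illustrative, not from the source):

```python
import torch.nn as nn

class LMHead(nn.Linear):
    """Distinct nn.Linear subclass so FSDP / pipeline code can target it explicitly."""

# toy model with one ordinary Linear and one LMHead
model = nn.Sequential(nn.Linear(16, 16), LMHead(16, 32))

# an FSDP-style wrap policy can then match only the LMHead,
# leaving the other Linear layers unwrapped
targets = [m for m in model.modules() if isinstance(m, LMHead)]
```

This is the same idea used by FSDP auto-wrap policies that match on module class: the subclass adds no behavior, only a type that policies can key on.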
- calculate_losses_pipeline(stitched_model, dataloader, target_hidden_states_per_batch=None, return_hidden_states=False, calculate_full_score_ablations=False, calc_on_cpu=False, just_model_forward=False, checkpoint_manager=None, autocast_dtype=torch.bfloat16, descriptor=None, use_autocast=True)
Do model forward on each batch and calculate LM loss.
Optionally also calculates KL-divergence loss and other metrics against the given target_hidden_states_per_batch, and optionally returns the hidden states per batch. Does not support data-parallel execution. just_model_forward: skip loss calculation and only forward the model (useful for activation hooks).
- Returns:
  Tuple of (losses, target_hidden_states_per_batch). losses is a dict, e.g.:
  {
    "lm_loss": {"avg": float, "per_sample": [float, ...]},
    ...  # more metrics if target_hidden_states_per_batch is provided
  }
  target_hidden_states_per_batch is returned when return_hidden_states is True.
- Parameters:
stitched_model (StitchedModule)
dataloader (DataLoader | None)
target_hidden_states_per_batch (HiddenStatesAndLMHead | None)
return_hidden_states (bool)
calculate_full_score_ablations (bool)
calc_on_cpu (bool)
just_model_forward (bool)
autocast_dtype (torch.dtype)
descriptor (Type[ModelDescriptor])
use_autocast (bool)
- Return type:
tuple[dict[str, dict], HiddenStatesAndLMHead | None] | tuple[None, None]
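The losses dict in the return value can be consumed as sketched below; the metric values are made-up placeholders shaped like the documented structure:

```python
# hypothetical result shaped like calculate_losses_pipeline's documented return value
losses = {
    "lm_loss": {"avg": 2.31, "per_sample": [2.10, 2.40, 2.43]},
}

avg_lm_loss = losses["lm_loss"]["avg"]

# the average should agree with the per-sample values
per_sample = losses["lm_loss"]["per_sample"]
recomputed = sum(per_sample) / len(per_sample)
```

Per-sample losses are useful when scoring individual examples (e.g. for activation scoring), while "avg" gives the batch-level summary.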
- perform_pipeline_stitches(model, descriptor)
Create pipeline stitches for distributed model evaluation.
- Parameters:
model – The model to stitch (any HuggingFace model with AnyModel descriptor).
descriptor (Type[ModelDescriptor]) – ModelDescriptor for layer naming.
- Return type: