bypass_distillation

Bypass distillation (blockwise local distillation) for the PUZZLE framework.

This module implements Stage 1 of the PUZZLE pipeline: training alternative transformer block configurations using per-block knowledge distillation from a teacher model.
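As a rough illustration of per-block distillation, the toy sketch below fits a single student block to a single teacher block. It feeds the student the same input activations the teacher block receives and minimizes the squared error between their outputs. Everything here is an assumption made for illustration: the scalar "blocks", the names teacher_block and distill_block, and the plain MSE loss with hand-written gradient descent. None of it is taken from this module's implementation.

```python
# Toy illustration only: scalar affine maps stand in for transformer blocks.
# The names and the plain MSE objective are hypothetical, not this module's API.

def teacher_block(x):
    """Frozen teacher block: a fixed affine map."""
    return 2.0 * x + 1.0


def distill_block(inputs, steps=500, lr=0.05):
    """Fit a student block y = w * x + b to the teacher block's outputs.

    The student sees the same inputs as the teacher block, so it can be
    trained on its own, without running the rest of the student model.
    """
    w, b = 0.0, 0.0
    targets = [teacher_block(x) for x in inputs]
    n = len(inputs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            err = (w * x + b) - t          # student output minus teacher output
            grad_w += 2.0 * err * x / n    # d(MSE)/dw
            grad_b += 2.0 * err / n        # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Activations entering the block (made-up values).
block_inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
student_w, student_b = distill_block(block_inputs)
```

Because each block is matched to its teacher counterpart in isolation, several alternative block configurations can in principle be trained independently of one another.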

Functions

launch_bypass_distillation

Top-level entry point for the bypass distillation stage.

launch_bypass_distillation(hydra_cfg)

Top-level entry point for the bypass distillation stage.

Runs sewing-kit pipeline-parallel per-block knowledge distillation.

Supports multiple bypass configurations via the bypass.configs list. Each entry overrides bypass.model.model_config_overrides and, optionally, bypass.model_factory.keys_to_learn; a full bypass training run is then executed for that entry.

If bypass.configs is absent or empty, runs a single bypass training with the settings already in bypass.
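The override behaviour described above can be sketched with plain dictionaries. This is a minimal sketch under stated assumptions, not the function's actual code. It assumes each bypass.configs entry uses the keys model_config_overrides and keys_to_learn, and that an override replaces the existing value rather than merging into it. The helper name expand_bypass_runs and all example values are hypothetical.

```python
import copy


def expand_bypass_runs(bypass):
    """Return one settings dict per bypass training run.

    Assumption: each entry in bypass["configs"] is a mapping with a
    "model_config_overrides" key and an optional "keys_to_learn" key.
    """
    configs = bypass.get("configs") or []
    if not configs:
        # Absent or empty list: a single run with the settings as given.
        return [copy.deepcopy(bypass)]

    runs = []
    for entry in configs:
        run = copy.deepcopy(bypass)
        # Assumed semantics: the entry replaces the existing overrides.
        run.setdefault("model", {})["model_config_overrides"] = copy.deepcopy(
            entry["model_config_overrides"]
        )
        if "keys_to_learn" in entry:
            run.setdefault("model_factory", {})["keys_to_learn"] = copy.deepcopy(
                entry["keys_to_learn"]
            )
        runs.append(run)
    return runs


# Hypothetical settings; the values are placeholders, not real options.
bypass_cfg = {
    "model": {"model_config_overrides": {"variant": "base"}},
    "model_factory": {"keys_to_learn": "default_keys"},
    "configs": [
        {"model_config_overrides": {"variant": "a"}},
        {"model_config_overrides": {"variant": "b"}, "keys_to_learn": "other_keys"},
    ],
}
runs = expand_bypass_runs(bypass_cfg)
```

In this sketch the first run keeps the base keys_to_learn because its entry does not override it, while the second run uses the entry's value.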

Parameters:

hydra_cfg (DictConfig) – The full Hydra configuration, including a bypass section.

Return type:

None