bypass_distillation
Bypass distillation (blockwise local distillation) for the PUZZLE framework.
This module implements Stage 1 of the PUZZLE pipeline: training alternative transformer block configurations using per-block knowledge distillation from a teacher model.
Functions
- launch_bypass_distillation(hydra_cfg)
Top-level entry point for the bypass distillation stage.
Runs sewing-kit pipeline-parallel per-block knowledge distillation.
Supports multiple bypass configurations via the bypass.configs list. Each entry overrides bypass.model.model_config_overrides and optionally bypass.model_factory.keys_to_learn, then runs a full bypass training.
If bypass.configs is absent or empty, runs a single bypass training with the settings already in bypass.
- Parameters:
hydra_cfg (DictConfig) – The full Hydra configuration with a ‘bypass’ section.
- Return type:
None
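The multi-configuration behavior described above can be pictured with a minimal sketch. It is not the module's implementation: it uses plain dicts instead of a DictConfig, the helper name expand_bypass_configs is hypothetical, and it assumes each bypass.configs entry carries its overrides under the keys model_config_overrides and (optionally) keys_to_learn. The override values such as ffn_scale and ffn_only are placeholders, not options confirmed by this page.

```python
import copy


def expand_bypass_configs(bypass):
    """Return one effective `bypass` dict per training run (illustrative only)."""
    entries = bypass.get("configs") or []

    if not entries:
        # Absent or empty list: a single run with the settings already in `bypass`.
        single = copy.deepcopy(bypass)
        single.pop("configs", None)
        return [single]

    runs = []
    for entry in entries:
        run = copy.deepcopy(bypass)
        run.pop("configs", None)
        # Assumed entry key: replaces bypass.model.model_config_overrides.
        run.setdefault("model", {})["model_config_overrides"] = copy.deepcopy(
            entry["model_config_overrides"]
        )
        # Assumed optional entry key: replaces bypass.model_factory.keys_to_learn.
        if "keys_to_learn" in entry:
            run.setdefault("model_factory", {})["keys_to_learn"] = entry["keys_to_learn"]
        runs.append(run)
    return runs


# Placeholder settings; none of these values come from the PUZZLE documentation.
bypass = {
    "model": {"model_config_overrides": {"ffn_scale": 1.0}},
    "model_factory": {"keys_to_learn": "entire_block"},
    "configs": [
        {"model_config_overrides": {"ffn_scale": 0.5}},
        {"model_config_overrides": {"ffn_scale": 0.25}, "keys_to_learn": "ffn_only"},
    ],
}

runs = expand_bypass_configs(bypass)
for i, run in enumerate(runs):
    print(i, run["model"]["model_config_overrides"], run["model_factory"]["keys_to_learn"])
```

In this sketch each entry yields an independent copy of the settings, so one run's overrides never leak into the next, and an entry without keys_to_learn inherits the value already set in bypass.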