config
Configurations for distillation modes.
- ModeloptConfig KDLossConfig
Bases: ModeloptBaseConfig
Configuration for the Knowledge-Distillation mode.
This mode is used to distill knowledge from a teacher model to a student model.
- Default config (JSON):
{ "teacher_model": null, "criterion": null, "loss_balancer": null, "expose_minimal_state_dict": true }
- field criterion: _Loss | Dict[Tuple[str, str], _Loss] | None
If an instance of a Loss class, a distillation loss will only be computed between the outputs of the student and teacher; if a dictionary in the format {(student_layer_name, teacher_layer_name): loss_module}, a distillation loss will be computed for each specified student-teacher pair of layers using the corresponding loss_module.
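For the dictionary form, a small sketch; the layer names are placeholders and should come from the student's and teacher's named_modules(), and LogitsDistillationLoss is assumed to be available from modelopt.torch.distill.

```python
import torch.nn as nn

import modelopt.torch.distill as mtd

# One loss module per (student_layer_name, teacher_layer_name) pair;
# the layer names below are placeholders taken from named_modules().
criterion = {
    ("backbone.layer4", "encoder.block4"): nn.MSELoss(),   # intermediate features
    ("head", "classifier"): mtd.LogitsDistillationLoss(),  # final logits
}
# Use as: kd_config["criterion"] = criterion
```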
- field expose_minimal_state_dict: bool
Hide the teacher model’s state_dict in the returned wrapped model. This reduces the checkpoint size by not storing the teacher again unnecessarily. Note: set to False if using FSDP.
- field loss_balancer: Any | None
A balancer to reduce the distillation and non-distillation losses into a single value using some weighting scheme.
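As a sketch, the library's StaticLossBalancer can serve as such a weighting scheme; the kd_loss_weight keyword shown is an assumption, so check the signature in your modelopt version.

```python
import modelopt.torch.distill as mtd

# Fixed-weight reduction of the student's task loss and the distillation loss.
# The keyword argument is an assumption; verify StaticLossBalancer's signature.
balancer = mtd.StaticLossBalancer(kd_loss_weight=0.5)
# Use as: kd_config["loss_balancer"] = balancer
#
# During training, the wrapped model applies the balancer when the two losses
# are reduced, e.g.:
#   total_loss = distillation_model.compute_kd_loss(student_loss=task_loss)
```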
- field teacher_model: Type[Module] | Tuple | Callable | None
The class, callable, or tuple used to initialize the teacher model via init_model_from_model_like. This cannot already be an instance of nn.Module.
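A sketch of the accepted forms; MyTeacherNet and the checkpoint path are placeholders, and the (class, args, kwargs) tuple layout is an assumption to be checked against init_model_from_model_like.

```python
import torch
import torch.nn as nn


class MyTeacherNet(nn.Module):  # placeholder teacher architecture
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Linear(hidden, 4)

    def forward(self, x):
        return self.net(x)


# 1) A class: constructed by the library with no arguments.
teacher_spec = MyTeacherNet

# 2) A factory callable returning a ready (e.g. pretrained) teacher.
def build_pretrained_teacher() -> nn.Module:
    model = MyTeacherNet()
    model.load_state_dict(torch.load("teacher_checkpoint.pt"))  # placeholder path
    return model

teacher_spec = build_pretrained_teacher

# 3) A tuple form, constructed lazily; the (class, args, kwargs) layout
#    shown here is an assumption.
teacher_spec = (MyTeacherNet, (), {"hidden": 32})

# Not allowed: an already-constructed nn.Module instance.
# Use as: kd_config["teacher_model"] = teacher_spec
```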
- model_dump(*args, **kwargs)
Dump the config to a dictionary but avoid serializing the teacher model to a dict.
This avoids issues when the teacher is a tuple of a callable and its args: if any of those args are dataclasses, they would be dumped as dicts and could not be restored with their original class.
- Return type: Dict[str, Any]
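A brief sketch of the behavior; the import path and the tuple layout are assumptions based on this page.

```python
import torch.nn as nn

import modelopt.torch.distill as mtd
from modelopt.torch.distill.config import KDLossConfig  # import path is an assumption

cfg = KDLossConfig(
    teacher_model=(nn.Linear, (16, 4), {}),  # tuple layout is an assumption
    criterion=mtd.LogitsDistillationLoss(),
)

# The teacher entry is not recursively serialized, so any dataclass args in the
# tuple keep their original type instead of becoming plain, unrestorable dicts.
state = cfg.model_dump()
assert isinstance(state, dict)
```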