config
Configurations for distillation modes.
- ModeloptConfig KDLossConfig
Bases: ModeloptBaseConfig
Configuration for the Knowledge-Distillation mode.
This mode is used to distill knowledge from a teacher model to a student model.
- Default config (JSON):
{ "teacher_model": null, "criterion": null, "loss_balancer": null, "expose_minimal_state_dict": true }
- field criterion: _Loss | Dict[Tuple[str, str], _Loss] | None
If an instance of a Loss class, a distillation loss will only be computed between the outputs of the student and teacher; if a dictionary in the format {(student_layer_name, teacher_layer_name): loss_module}, a distillation loss will be computed for each specified student-teacher pair of layers using the corresponding loss_module.
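For the dictionary form, a small sketch; the layer names are placeholders and should come from the student's and teacher's named_modules(), and LogitsDistillationLoss is assumed to be available from modelopt.torch.distill.

```python
import torch.nn as nn

import modelopt.torch.distill as mtd

# One loss module per (student_layer_name, teacher_layer_name) pair;
# the layer names below are placeholders taken from named_modules().
criterion = {
    ("backbone.layer4", "encoder.block4"): nn.MSELoss(),   # intermediate features
    ("head", "classifier"): mtd.LogitsDistillationLoss(),  # final logits
}
# Use as: kd_config["criterion"] = criterion
```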
- field expose_minimal_state_dict: bool
Hide the teacher model’s state_dict in the returned wrapped model. This reduces the checkpoint size by not storing the teacher again unnecessarily. Note: set to False if using FSDP.
- field loss_balancer: Any | None
A balancer to reduce the distillation and non-distillation losses into a single value using some weighting scheme.
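As a sketch, the library's StaticLossBalancer can serve as such a weighting scheme; the kd_loss_weight keyword shown is an assumption, so check the signature in your modelopt version.

```python
import modelopt.torch.distill as mtd

# Fixed-weight reduction of the student's task loss and the distillation loss.
# The keyword argument is an assumption; verify StaticLossBalancer's signature.
balancer = mtd.StaticLossBalancer(kd_loss_weight=0.5)
# Use as: kd_config["loss_balancer"] = balancer
#
# During training, the wrapped model applies the balancer when the two losses
# are reduced, e.g.:
#   total_loss = distillation_model.compute_kd_loss(student_loss=task_loss)
```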
- field teacher_model: Type[Module] | Tuple | Callable | None
The class, callable, or tuple used to initialize the teacher model via init_model_from_model_like. This cannot already be an instance of nn.Module.
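A sketch of the accepted forms; MyTeacherNet and the checkpoint path are placeholders, and the (class, args, kwargs) tuple layout is an assumption to be checked against init_model_from_model_like.

```python
import torch
import torch.nn as nn


class MyTeacherNet(nn.Module):  # placeholder teacher architecture
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Linear(hidden, 4)

    def forward(self, x):
        return self.net(x)


# 1) A class: constructed by the library with no arguments.
teacher_spec = MyTeacherNet

# 2) A factory callable returning a ready (e.g. pretrained) teacher.
def build_pretrained_teacher() -> nn.Module:
    model = MyTeacherNet()
    model.load_state_dict(torch.load("teacher_checkpoint.pt"))  # placeholder path
    return model

teacher_spec = build_pretrained_teacher

# 3) A tuple form, constructed lazily; the (class, args, kwargs) layout
#    shown here is an assumption.
teacher_spec = (MyTeacherNet, (), {"hidden": 32})

# Not allowed: an already-constructed nn.Module instance.
# Use as: kd_config["teacher_model"] = teacher_spec
```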
- model_dump(*args, **kwargs)
Dump the config to a dictionary but avoid serializing the teacher model to a dict.
This avoids issues when the teacher is a tuple of a callable and its args: if any of those args are dataclasses, they would be dumped as dicts and could not be restored with their original class.
- Return type: Dict[str, Any]
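A brief sketch of the behavior; the import path and the tuple layout are assumptions based on this page.

```python
import torch.nn as nn

import modelopt.torch.distill as mtd
from modelopt.torch.distill.config import KDLossConfig  # import path is an assumption

cfg = KDLossConfig(
    teacher_model=(nn.Linear, (16, 4), {}),  # tuple layout is an assumption
    criterion=mtd.LogitsDistillationLoss(),
)

# The teacher entry is not recursively serialized, so any dataclass args in the
# tuple keep their original type instead of becoming plain, unrestorable dicts.
state = cfg.model_dump()
assert isinstance(state, dict)
```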