modelopt.torch.puzzletron.tools.bypassed_training.init_child_from_parent
- init_child_from_parent(descriptor, pruning_mixin, parent_checkpoint_dir, model_config_overrides_dict, output_checkpoint_dir, gqa_init_mode, mlp_init_mode, mlp_init_config_yaml, linear_init_mode, hidden_size_init_mode=None, channel_importance_path=None, max_workers=None, max_layer_workers=None)
Initialize child models from parent models in the style of bypass training, without having to run the entire bypass pipeline.
Uses the AnyModel approach with deci_x_patcher to support heterogeneous layer configurations.
I/O optimization parameters:
- max_workers: number of threads for parallel file I/O (default: auto-calculated as min(CPU count, number of files))
- max_layer_workers: number of threads for parallel layer processing (default: auto-calculated as min(CPU count, number of layers))
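The default worker-count logic described above can be sketched as follows. This helper is illustrative only and not part of the modelopt API:

```python
import os

def auto_worker_count(num_items: int) -> int:
    """Illustrative default: min(CPU count, number of items), at least 1.

    Mirrors the documented behavior of max_workers / max_layer_workers
    when they are left as None.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cpus, num_items))
```

With this default, a checkpoint with fewer files than CPU cores gets one thread per file, while a large checkpoint is capped at the core count.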
- Parameters:
descriptor (ModelDescriptor)
pruning_mixin
parent_checkpoint_dir (str)
model_config_overrides_dict (dict | str)
output_checkpoint_dir (str)
gqa_init_mode (GQAInitMode)
mlp_init_mode (MlpInitMode)
mlp_init_config_yaml (str | None)
linear_init_mode (LinearInitMode)
hidden_size_init_mode (HiddenSizeInitMode | None)
channel_importance_path (str | None)
max_workers (int | None)
max_layer_workers (int | None)
- Return type:
None
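A hedged sketch of a typical invocation based on the documented signature. All concrete values (paths, dict contents, enum members) are hypothetical placeholders, and a stub stands in for the real function so the snippet runs without modelopt installed:

```python
def init_child_from_parent(descriptor, pruning_mixin, parent_checkpoint_dir,
                           model_config_overrides_dict, output_checkpoint_dir,
                           gqa_init_mode, mlp_init_mode, mlp_init_config_yaml,
                           linear_init_mode, hidden_size_init_mode=None,
                           channel_importance_path=None, max_workers=None,
                           max_layer_workers=None):
    """Stub mirroring the documented signature; returns None like the real API."""
    return None

result = init_child_from_parent(
    descriptor=object(),                              # ModelDescriptor (placeholder)
    pruning_mixin=object(),                           # placeholder
    parent_checkpoint_dir="parent_ckpt/",             # hypothetical path
    model_config_overrides_dict={"num_layers": 24},   # dict | str; contents hypothetical
    output_checkpoint_dir="child_ckpt/",              # hypothetical path
    gqa_init_mode="<GQAInitMode member>",             # placeholder for enum value
    mlp_init_mode="<MlpInitMode member>",             # placeholder for enum value
    mlp_init_config_yaml=None,                        # optional YAML path
    linear_init_mode="<LinearInitMode member>",       # placeholder for enum value
    # optional: hidden_size_init_mode, channel_importance_path,
    # max_workers, max_layer_workers (default: auto-calculated)
)
```

The documented return type is None; the function writes the initialized child checkpoint to output_checkpoint_dir rather than returning a model.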