modelopt.torch.puzzletron.tools.bypassed_training.init_child_from_parent

init_child_from_parent(descriptor, pruning_mixin, parent_checkpoint_dir, model_config_overrides_dict, output_checkpoint_dir, gqa_init_mode, mlp_init_mode, mlp_init_config_yaml, linear_init_mode, hidden_size_init_mode=None, channel_importance_path=None, max_workers=None, max_layer_workers=None)

Initialize child models from parent models in the style of bypass training, without having to run the entire bypass pipeline.

Uses the AnyModel approach with deci_x_patcher to handle heterogeneous layer configurations.

I/O optimization parameters:

  • max_workers: number of threads for parallel file I/O (default: auto-calculated as min(CPU count, number of files))

  • max_layer_workers: number of threads for parallel layer processing (default: auto-calculated as min(CPU count, number of layers))
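The documented auto-calculation of the defaults can be sketched as follows; the helper name below is hypothetical and not part of the modelopt API, it only illustrates the min(CPU count, item count) rule:

```python
import os


def default_worker_count(num_items: int) -> int:
    """Hypothetical helper mirroring the documented default:
    min(CPU count, number of items), with a floor of 1 worker."""
    return max(1, min(os.cpu_count() or 1, num_items))


# With 3 checkpoint files, at most 3 I/O threads are used even on a
# machine with many cores; passing max_workers explicitly overrides this.
print(default_worker_count(3))
```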

Parameters:
  • descriptor (ModelDescriptor)

  • pruning_mixin

  • parent_checkpoint_dir (str)

  • model_config_overrides_dict (dict | str)

  • output_checkpoint_dir (str)

  • gqa_init_mode (GQAInitMode)

  • mlp_init_mode (MlpInitMode)

  • mlp_init_config_yaml (str | None)

  • linear_init_mode (LinearInitMode)

  • hidden_size_init_mode (HiddenSizeInitMode | None)

  • channel_importance_path (str | None)

  • max_workers (int | None)

  • max_layer_workers (int | None)

Return type:

None