modelopt.torch.puzzletron.tools.bypassed_training.init_child_from_parent

init_child_from_parent(descriptor, pruning_mixin, parent_checkpoint_dir, model_config_overrides_dict, output_checkpoint_dir, gqa_init_mode, mlp_init_mode, mlp_init_config_yaml, linear_init_mode, hidden_size_init_mode=None, channel_importance_path=None, max_workers=None, max_layer_workers=None)

Initialize child models from parent models in the style of bypass training, without having to run the entire bypass pipeline.

Uses the AnyModel approach with deci_x_patcher to handle heterogeneous layer configurations.

I/O optimization parameters:

  • max_workers: number of threads for parallel file I/O (default: auto-calculated as min(CPU count, number of files))

  • max_layer_workers: number of threads for parallel layer processing (default: auto-calculated as min(CPU count, number of layers))
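The documented auto-calculation of the defaults can be sketched as follows; the helper name below is hypothetical and not part of the modelopt API, it only illustrates the min(CPU count, item count) rule:

```python
import os


def default_worker_count(num_items: int) -> int:
    """Hypothetical helper mirroring the documented default:
    min(CPU count, number of items), with a floor of 1 worker."""
    return max(1, min(os.cpu_count() or 1, num_items))


# With 3 checkpoint files, at most 3 I/O threads are used even on a
# machine with many cores; passing max_workers explicitly overrides this.
print(default_worker_count(3))
```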

Parameters:
  • descriptor (ModelDescriptor)

  • pruning_mixin

  • parent_checkpoint_dir (str)

  • model_config_overrides_dict (dict | str)

  • output_checkpoint_dir (str)

  • gqa_init_mode (GQAInitMode)

  • mlp_init_mode (MlpInitMode)

  • mlp_init_config_yaml (str | None)

  • linear_init_mode (LinearInitMode)

  • hidden_size_init_mode (HiddenSizeInitMode | None)

  • channel_importance_path (str | None)

  • max_workers (int | None)

  • max_layer_workers (int | None)

Return type:

None