checkpoint_utils_hf
Utilities for loading and saving Hugging Face-format checkpoints (AutoConfig + optional block_configs).
Functions
init_model_from_config | Build a model from config on meta/uninitialized weights (used e.g. for subblock param counts).
load_model_config | Load model configuration from a checkpoint directory.
- force_cache_dynamic_modules(config, checkpoint_dir, trust_remote_code=False)
- Parameters:
config (PretrainedConfig)
checkpoint_dir (Path | str)
trust_remote_code (bool)
- init_model_from_config(config, *, trust_remote_code=False, **kwargs)
Build a model from config on meta/uninitialized weights (used e.g. for subblock param counts).
trust_remote_code defaults to False (only AutoModelForCausalLM.from_config uses it). Pass True when loading configs that rely on custom modeling code from the checkpoint.
- Parameters:
config (PretrainedConfig)
trust_remote_code (bool)
- Return type:
PreTrainedModel
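To illustrate why building on meta weights is useful for parameter counts, here is a minimal sketch using plain PyTorch (2.0 or later) rather than this module. It assumes nothing about how init_model_from_config is implemented; the layer sizes and the name block are made up for the example. Parameters created under the meta device carry shape and dtype but no storage, so counting them allocates no weight memory.

```python
import torch
from torch import nn

# Illustrative block only; it is not a model produced by init_model_from_config.
with torch.device("meta"):
    block = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    )

# numel() works on meta tensors because only the shape is needed.
n_params = sum(p.numel() for p in block.parameters())
print(n_params)
```

A module built this way cannot run a forward pass on real data until its parameters are materialized on a real device.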
- load_model_config(checkpoint_dir, model_config_overrides=None, ignore_unexpected_config_keys=False, trust_remote_code=False)
Load model configuration from a checkpoint directory.
- Parameters:
checkpoint_dir (Path | str) – Path to the checkpoint directory (e.g. containing config.json).
model_config_overrides (Mapping | None) – Optional mapping of config overrides.
ignore_unexpected_config_keys (bool) – If True, ignore unexpected config keys.
trust_remote_code (bool) – If True, allows execution of custom code from the model repository. This is a security risk if the model source is untrusted. Only set to True if you trust the source of the model. Defaults to False for security.
- Returns:
Loaded model configuration (PretrainedConfig).
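The interaction between model_config_overrides and ignore_unexpected_config_keys can be sketched with the standard library alone. This is an assumption about the intended semantics, not the module's code: the helper load_config_dict is hypothetical, it returns a plain dict instead of a PretrainedConfig, and it treats an override key as "unexpected" when the key is absent from config.json.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def load_config_dict(
    checkpoint_dir: "Path | str",
    overrides: Optional[Mapping[str, Any]] = None,
    ignore_unexpected_keys: bool = False,
) -> dict:
    """Hypothetical stand-in: read config.json and apply overrides."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    for key, value in (overrides or {}).items():
        if key not in config:
            if ignore_unexpected_keys:
                continue  # assumed behavior: silently skip unknown keys
            raise KeyError(f"unexpected config key: {key!r}")
        config[key] = value
    return config


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny config.json so the sketch runs without a real checkpoint.
    (Path(tmp) / "config.json").write_text(
        json.dumps({"hidden_size": 1024, "num_hidden_layers": 4})
    )
    cfg = load_config_dict(
        tmp,
        {"num_hidden_layers": 2, "not_a_field": 1},
        ignore_unexpected_keys=True,
    )
    print(cfg)
```

With ignore_unexpected_keys=False, the same call raises KeyError in this sketch; whether the real function raises, warns, or behaves differently is not stated above.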
- save_checkpoint(model, checkpoint_dir, descriptor)
- Parameters:
model (PreTrainedModel)
checkpoint_dir (Path | str)
descriptor (ModelDescriptor)
- Return type:
None
- save_model_config(model_config, checkpoint_dir)
- Parameters:
model_config (PretrainedConfig)
checkpoint_dir (Path | str)
- Return type:
None
- save_subblocks(state_dict, checkpoint_dir, weight_map=None, multi_threaded=True, max_workers=None)
- Parameters:
state_dict (dict[str, Tensor])
checkpoint_dir (Path | str)
weight_map (dict[str, str] | None)
multi_threaded (bool)
max_workers (int | None)
- Return type:
None
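The save_subblocks parameters suggest a state dict split across several files, optionally written in parallel. The sketch below shows that general pattern under stated assumptions: weight_map is taken to map each parameter name to a file name (as the weight_map in a Hugging Face sharded-checkpoint index does), save_shards is a hypothetical helper, and values are plain lists written as JSON only so the example needs no third-party packages. It is not the format or the logic the library uses.

```python
import json
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def save_shards(state_dict, checkpoint_dir, weight_map,
                multi_threaded=True, max_workers=None):
    """Hypothetical helper: group entries by target file, then write each file."""
    out_dir = Path(checkpoint_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group parameters by the file that weight_map assigns them to.
    shards = defaultdict(dict)
    for name, value in state_dict.items():
        shards[weight_map[name]][name] = value

    def write(item):
        filename, tensors = item
        (out_dir / filename).write_text(json.dumps(tensors))
        return filename

    if multi_threaded:
        # Each shard goes to a different file, so the writes are independent.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            written = list(pool.map(write, shards.items()))
    else:
        written = [write(item) for item in shards.items()]
    return sorted(written)


state = {
    "block_0.attn.weight": [0.1, 0.2],
    "block_0.mlp.weight": [0.3],
    "block_1.attn.weight": [0.4],
}
wmap = {
    "block_0.attn.weight": "block_0.json",
    "block_0.mlp.weight": "block_0.json",
    "block_1.attn.weight": "block_1.json",
}

with tempfile.TemporaryDirectory() as tmp:
    files = save_shards(state, tmp, wmap, max_workers=2)
    block_0 = json.loads((Path(tmp) / "block_0.json").read_text())
    print(files)
```

Threads suit this kind of work because file writes are I/O-bound; passing max_workers=None lets ThreadPoolExecutor pick its default pool size.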