checkpoint_utils
Utilities for loading and initializing PyTorch model checkpoints (AnyModel / HF layouts).
Functions
| Function | Summary |
|---|---|
| `copy_tokenizer` | Prefer loading the tokenizer from the huggingface hub (when a tokenizer_name.txt file is available) to avoid collisions between transformers versions. |
| `is_valid_decilm_checkpoint` | True if the checkpoint config loads and defines block_configs (AnyModel / puzzletron layout). |
| `skip_init` | Heavily inspired by torch.nn.utils.skip_init but does not require the module to accept a "device" kwarg. |
- copy_tokenizer(source_dir_or_tokenizer_name, target_dir, on_failure='raise')
Prefer loading the tokenizer from the huggingface hub (when a tokenizer_name.txt file is available) to avoid collisions between transformers versions.
- Parameters:
source_dir_or_tokenizer_name (Path | str)
target_dir (Path | str)
on_failure (Literal['raise', 'warn'])
- Return type:
None
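The real helper's internals are not shown here, but its signature and summary suggest the following shape: record the hub tokenizer name when `tokenizer_name.txt` exists, otherwise fall back to copying serialized tokenizer files. A minimal sketch under those assumptions (the fallback file patterns and the helper name are illustrative, not the actual implementation):

```python
import shutil
import warnings
from pathlib import Path


def copy_tokenizer_sketch(source_dir, target_dir, on_failure="raise"):
    """Illustrative sketch: prefer propagating tokenizer_name.txt (so the
    tokenizer is re-loaded from the hub by name, dodging version skew
    between transformers serialization formats) over copying files."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    name_file = source_dir / "tokenizer_name.txt"
    try:
        if name_file.is_file():
            shutil.copy2(name_file, target_dir / "tokenizer_name.txt")
        else:
            # Fall back to copying serialized tokenizer files (assumed patterns).
            copied = False
            for pattern in ("tokenizer*.json", "*.model", "special_tokens_map.json"):
                for f in source_dir.glob(pattern):
                    shutil.copy2(f, target_dir / f.name)
                    copied = True
            if not copied:
                raise FileNotFoundError(f"no tokenizer files in {source_dir}")
    except Exception as err:
        if on_failure == "raise":
            raise
        warnings.warn(f"copy_tokenizer failed: {err}")
```

The `on_failure='warn'` mode mirrors the documented parameter: failures are reported but do not abort, which is useful when the tokenizer is optional for the target checkpoint.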
- init_empty_module(module_cls, dtype, *init_args, **init_kwargs)
- Parameters:
module_cls (type[NNModule])
dtype (dtype)
- Return type:
NNModule
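The signature implies the module is constructed without allocating or initializing real weights, then cast to the requested `dtype`. One way to get that effect with stock PyTorch, shown as an assumption about the implementation rather than the actual code:

```python
import torch
import torch.nn as nn


def init_empty_module_sketch(module_cls, dtype, *init_args, **init_kwargs):
    # Building under the meta device allocates no storage and skips the
    # (wasted) random-init kernels in module_cls.__init__.
    with torch.device("meta"):
        module = module_cls(*init_args, **init_kwargs)
    # Materialize uninitialized storage on CPU, then cast to the target dtype.
    return module.to_empty(device="cpu").to(dtype)


m = init_empty_module_sketch(nn.Linear, torch.float16, 4, 8)
```

The resulting parameters hold garbage values by design; the caller is expected to fill them, e.g. from a checkpoint.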
- init_module_with_state_dict(state_dict, module_cls, *init_args, **init_kwargs)
- Parameters:
state_dict (dict[str, Tensor])
module_cls (type[NNModule])
- Return type:
NNModule
- is_valid_decilm_checkpoint(checkpoint_dir, trust_remote_code=False)
True if the checkpoint config loads and defines block_configs (AnyModel / puzzletron layout).
- Parameters:
checkpoint_dir (Path | str) – Path to checkpoint directory
trust_remote_code (bool) – If True, allows execution of custom code from the model repository. This is a security risk if the model source is untrusted. Only set to True if you trust the source of the model. Defaults to False for security.
- Returns:
True if the config has block_configs, False otherwise
- Return type:
bool
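A rough stand-in for this check using only the raw `config.json`; the real helper presumably loads the config through transformers (which is where `trust_remote_code` matters), so treat this as illustrative:

```python
import json
from pathlib import Path


def has_block_configs(checkpoint_dir):
    """Approximation of the validity check: does config.json parse and
    define block_configs? (The actual helper presumably goes through
    transformers config loading, honoring trust_remote_code.)"""
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.is_file():
        return False
    try:
        config = json.loads(config_path.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(config, dict) and "block_configs" in config
```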
- load_state_dict(checkpoint_dir)
- Parameters:
checkpoint_dir (Path | str)
- Return type:
dict[str, Tensor]
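A simplified sketch of what a directory-level state-dict loader can look like, merging sharded `*.bin` files; the real helper may also handle safetensors shards and index files, so the glob pattern here is an assumption:

```python
import torch
from pathlib import Path


def load_state_dict_sketch(checkpoint_dir):
    """Illustrative only: merge every *.bin shard in the directory into
    a single flat state dict keyed by parameter name."""
    state_dict = {}
    for shard in sorted(Path(checkpoint_dir).glob("*.bin")):
        # weights_only=True refuses to unpickle arbitrary Python objects.
        state_dict.update(torch.load(shard, map_location="cpu", weights_only=True))
    return state_dict
```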
- skip_init(module_cls, *args, **kwargs)
Heavily inspired by torch.nn.utils.skip_init but does not require the module to accept a "device" kwarg.
- Return type:
Module
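The "no device kwarg needed" variant described above can be approximated with a meta-device context: the default device applies during construction, so the module's `__init__` never sees a `device` argument (which is the limitation of `torch.nn.utils.skip_init`). A sketch of the technique, not the actual implementation:

```python
import torch
import torch.nn as nn


def skip_init_sketch(module_cls, *args, **kwargs):
    # Under the meta device, parameter construction allocates no storage
    # and the random-init kernels are effectively free, so __init__ needs
    # no cooperation (no "device" kwarg) from module_cls.
    with torch.device("meta"):
        module = module_cls(*args, **kwargs)
    # to_empty() produces real but uninitialized CPU tensors; callers are
    # expected to overwrite them, e.g. via load_state_dict.
    return module.to_empty(device="cpu")


layer = skip_init_sketch(nn.Linear, 16, 32)
```

By contrast, `torch.nn.utils.skip_init` constructs the module on the meta device by passing `device="meta"` into `__init__`, which only works for modules that forward that kwarg to their parameter constructors.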