checkpoint_utils

Utilities for loading and initializing PyTorch model checkpoints (AnyModel / HF layouts).

Functions

copy_tokenizer

Prefers loading the tokenizer from the Hugging Face Hub (when a tokenizer_name.txt file is available) to avoid collisions between transformers versions.

init_empty_module

init_module_with_state_dict

is_valid_decilm_checkpoint

True if the checkpoint config loads and defines block_configs (AnyModel / puzzletron layout).

load_state_dict

skip_init

Heavily inspired by torch.nn.utils.skip_init but does not require the module to accept a "device" kwarg.

copy_tokenizer(source_dir_or_tokenizer_name, target_dir, on_failure='raise')

Prefers loading the tokenizer from the Hugging Face Hub (when a tokenizer_name.txt file is available) to avoid collisions between transformers versions.

Parameters:
  • source_dir_or_tokenizer_name (Path | str)

  • target_dir (Path | str)

  • on_failure (Literal['raise', 'warn'])

Return type:

None
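The lookup order described above and the two on_failure modes can be illustrated with a minimal, standard-library-only sketch. The helper names resolve_tokenizer_source and run_with_failure_policy are hypothetical and are not part of checkpoint_utils. The assumption that tokenizer_name.txt holds a single hub identifier is also ours; the actual implementation may differ.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Callable, Literal, Union


def resolve_tokenizer_source(source: Union[Path, str]) -> str:
    """Hypothetical helper: prefer the hub name stored in tokenizer_name.txt."""
    name_file = Path(source) / "tokenizer_name.txt"
    if name_file.is_file():
        # Assumption: the file contains one hub identifier, possibly with a newline.
        name = name_file.read_text(encoding="utf-8").strip()
        if name:
            return name
    # No usable name file: fall back to the directory or name that was passed in.
    return str(source)


def run_with_failure_policy(
    action: Callable[[], None],
    on_failure: Literal["raise", "warn"] = "raise",
) -> None:
    """Hypothetical helper: re-raise on 'raise', emit a warning on 'warn'."""
    try:
        action()
    except Exception as exc:
        if on_failure == "raise":
            raise
        warnings.warn(f"Tokenizer copy failed: {exc}")


# Usage: a directory containing tokenizer_name.txt resolves to the hub name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer_name.txt").write_text(
        "some-org/some-tokenizer\n", encoding="utf-8"
    )
    resolved = resolve_tokenizer_source(tmp)
```

With on_failure='warn', a failed copy would not abort a longer conversion pipeline, which is presumably why the option exists.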

init_empty_module(module_cls, dtype, *init_args, **init_kwargs)
Parameters:
  • module_cls (type[NNModule])

  • dtype (dtype)

Return type:

NNModule

init_module_with_state_dict(state_dict, module_cls, *init_args, **init_kwargs)
Parameters:
  • state_dict (dict[str, Tensor])

  • module_cls (type[NNModule])

Return type:

NNModule

is_valid_decilm_checkpoint(checkpoint_dir, trust_remote_code=False)

True if the checkpoint config loads and defines block_configs (AnyModel / puzzletron layout).

Parameters:
  • checkpoint_dir (Path | str) – Path to checkpoint directory

  • trust_remote_code (bool) – If True, allows execution of custom code from the model repository. This is a security risk when the model source is untrusted, so enable it only for sources you trust. Defaults to False.

Returns:

True if the config has block_configs, False otherwise

Return type:

bool
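As a rough illustration of the check described above, the sketch below reads config.json directly and tests for a block_configs entry. The name has_block_configs is hypothetical. Reading plain JSON is a simplifying assumption: the real function presumably goes through a config loader that honours trust_remote_code, which this sketch does not model.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def has_block_configs(checkpoint_dir: Union[Path, str]) -> bool:
    """Hypothetical stand-in: True if config.json loads and defines block_configs."""
    config_path = Path(checkpoint_dir) / "config.json"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config: treat as not valid.
        return False
    return isinstance(config, dict) and config.get("block_configs") is not None


# Usage: one directory with a suitable config, one path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(
        json.dumps({"block_configs": [{}]}), encoding="utf-8"
    )
    valid = has_block_configs(tmp)
    missing = has_block_configs(Path(tmp) / "does_not_exist")
```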

load_state_dict(checkpoint_dir)
Parameters:
  • checkpoint_dir (Path | str)

Return type:

dict[str, Tensor]

skip_init(module_cls, *args, **kwargs)

Heavily inspired by torch.nn.utils.skip_init but does not require the module to accept a “device” kwarg.

Return type:

Module
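One way to skip initialization without requiring a "device" kwarg is to construct the module on PyTorch's meta device and then allocate uninitialized storage. The sketch below assumes PyTorch 2.0 or later (for torch.device as a context manager). It is not necessarily how this skip_init is implemented, and skip_init_sketch and Tiny are hypothetical names.

```python
import torch
from torch import nn


def skip_init_sketch(module_cls, *args, **kwargs):
    """Hypothetical sketch: build a module without running real initialization."""
    # Inside this context, new tensors are created on the meta device, so the
    # constructor's initializers operate on shape-only tensors.
    with torch.device("meta"):
        module = module_cls(*args, **kwargs)
    # Allocate real but uninitialized storage on the CPU.
    return module.to_empty(device="cpu")


class Tiny(nn.Module):
    # Note that __init__ takes no "device" argument.
    def __init__(self, n: int) -> None:
        super().__init__()
        self.lin = nn.Linear(n, n)


module = skip_init_sketch(Tiny, 4)
```

The resulting parameters hold arbitrary values, so they should be overwritten (for example by loading a state dict) before the module is used.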