base
Classes
Converter – Base class for converting HuggingFace models to Puzzletron/AnyModel format.
- class Converter
Bases: ABC
Base class for converting HuggingFace models to Puzzletron/AnyModel format.
- classmethod convert(descriptor, input_dir, output_dir)
Convert a HuggingFace model to AnyModel format.
- Parameters:
descriptor (ModelDescriptor) – Model descriptor for the model type.
input_dir (Path) – Path to the input HuggingFace checkpoint.
output_dir (Path) – Path to the output AnyModel checkpoint.
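A minimal usage sketch: the LlamaConverter subclass name, the import path, and the ModelDescriptor construction below are illustrative assumptions, not part of this reference; substitute the concrete converter and descriptor for your model family.

```python
from pathlib import Path

# Hypothetical import path; the real module layout may differ.
from puzzletron.converters import LlamaConverter, ModelDescriptor

# Model-type specific descriptor (fields omitted in this sketch).
descriptor = ModelDescriptor(...)

LlamaConverter.convert(
    descriptor,
    input_dir=Path("checkpoints/llama-3.1-8b-hf"),         # HuggingFace checkpoint
    output_dir=Path("checkpoints/llama-3.1-8b-anymodel"),  # AnyModel output
)
```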
- classmethod convert_configs_in_dirs(input_dir, output_dir, trust_remote_code=False)
Convert the config and add block_configs.
- Parameters:
input_dir (Path)
output_dir (Path)
trust_remote_code (bool)
- classmethod convert_model_weights(input_dir, output_dir, descriptor, num_hidden_layers)
Convert model weights to subblock format.
- Parameters:
input_dir (Path)
output_dir (Path)
descriptor (ModelDescriptor)
num_hidden_layers (int)
- static convert_weight_name(name)
Convert weight names during checkpoint conversion.
This method can be overridden by subclasses to apply model-specific weight name transformations when converting checkpoints from HuggingFace format to Puzzletron format.
Default implementation returns the name unchanged (identity function).
- Parameters:
name (str) – Original weight name from HuggingFace checkpoint
- Returns:
Converted weight name for Puzzletron format
- Return type:
str
Example
For Qwen2.5-VL, this converts:
- visual.* → model.visual.*
- model.* → model.language_model.*
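A sketch of that Qwen2.5-VL mapping written as a subclass override; the subclass name is hypothetical, and the exact prefix handling in the real converter may differ.

```python
class Qwen2_5_VLConverter(Converter):
    """Hypothetical subclass showing a model-specific weight-name mapping."""

    @staticmethod
    def convert_weight_name(name: str) -> str:
        # Vision-tower weights: visual.* -> model.visual.*
        if name.startswith("visual."):
            return f"model.{name}"
        # Language-model weights: model.* -> model.language_model.*
        if name.startswith("model."):
            return "model.language_model." + name[len("model."):]
        # Anything else (e.g. lm_head.weight) keeps its original name.
        return name
```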
- static copy_checkpoint_files(input_dir, output_dir)
Copy checkpoint files except model weights (which will be converted).
- Parameters:
input_dir (Path)
output_dir (Path)
- abstract static create_block_configs_from_main_config(config)
Create per-layer BlockConfig list from a HuggingFace model config.
This method extracts layer-specific parameters (e.g., intermediate_size, num_key_value_heads) from the main model config and creates a BlockConfig for each layer. These BlockConfigs enable layer-specific pruning and modifications during the compression pipeline.
- Parameters:
config (PretrainedConfig) – HuggingFace PretrainedConfig (e.g., LlamaConfig, Qwen2Config)
- Returns:
List of BlockConfig, one per hidden layer. Each BlockConfig contains:
AttentionConfig: attention settings (no_op, num_key_value_heads)
FFNConfig: FFN settings (no_op, intermediate_size)
- Return type:
list[BlockConfig]
Example
- For a model with uniform layers (e.g., Llama):
return [BlockConfig(…)] * config.num_hidden_layers
- For a model with heterogeneous layers (e.g., NemotronH with Mamba/Attention):
return [BlockConfig(…) for layer_idx in range(num_layers)]
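A sketch of the uniform-layer case for a Llama-style config; the LlamaConverter name and the BlockConfig, AttentionConfig, and FFNConfig constructor arguments are assumptions inferred from the fields listed above, not a verified signature.

```python
class LlamaConverter(Converter):
    """Hypothetical subclass for a model whose layers are all identical."""

    @staticmethod
    def create_block_configs_from_main_config(config):
        # All layers share the same settings, so one BlockConfig is replicated
        # num_hidden_layers times. Keyword names below are assumed from the
        # attributes described in the Returns section.
        block = BlockConfig(
            attention_config=AttentionConfig(
                no_op=False,
                num_key_value_heads=config.num_key_value_heads,
            ),
            ffn_config=FFNConfig(
                no_op=False,
                intermediate_size=config.intermediate_size,
            ),
        )
        return [block] * config.num_hidden_layers
```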