model_config_utils
Common utils for the ModelConfig.
Functions
- from_quantized_weight: Converts the quantized weight to the target torch_dtype format.
- merge_fc1_gate: Merges the fc1 and gate fields in model_config into a single LinearConfig.
- merge_qkv: Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.
- model_config_from_dict: Loads a dict into a ModelConfig instance.
- model_config_to_dict: Converts the instance to a python dict.
- naive_quantization: Generates a constant scaling factor (1) with target quantization.
- pack_linear_weights: Packs the quantized linear weights in the model_config to the quantized format.
- pad_weights: Pads the weights to be divisible by tp_size and returns them.
- process_layer_quant_config: Processes per-layer quantization information for TRTLLM export to quant_cfg.json.
- restore_model_config: Recursively restores the model_config from json and loads np.ndarray or torch.Tensor weights from weights.
- split_config_and_weights: Splits the weights (any torch.Tensor in the nested config) out into a flat weights dict.
- to_quantized_weight: Converts the weight to the quantized (packed) format.
- from_quantized_weight(weight, weights_scaling_factor, quantization, torch_dtype)
Converts the quantized weight to the target torch_dtype format.
- Parameters:
weight (Tensor) –
weights_scaling_factor (Tensor) –
quantization (str) –
torch_dtype (torch.dtype) –
See the round-trip sketch after to_quantized_weight below.
- merge_fc1_gate(model_config)
Merges the fc1 and gate fields in model_config into a single LinearConfig.
- merge_qkv(model_config)
Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.
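A minimal usage sketch for the two merge helpers. The import path and the assumption that both functions mutate model_config in place are inferred, not confirmed by this page:

```python
from modelopt.torch.export.model_config_utils import merge_fc1_gate, merge_qkv

model_config = ...  # a ModelConfig built earlier in the export flow (assumed)

merge_qkv(model_config)       # fuse q/k/v projections into one LinearConfig
merge_fc1_gate(model_config)  # fuse fc1 and gate projections into one LinearConfig
```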
- model_config_from_dict(d)
Loads a dict into a ModelConfig instance.
- Parameters:
d (dict) –
- Return type:
ModelConfig
- model_config_to_dict(model_config)
Converts the instance to a python dict.
- Parameters:
model_config (ModelConfig) –
- Return type:
dict
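A hedged round-trip sketch for the two dict helpers. It assumes model_config already exists and that tensor weights have been split out beforehand (see split_config_and_weights) so the dict is JSON-serializable:

```python
import json

from modelopt.torch.export.model_config_utils import (
    model_config_from_dict,
    model_config_to_dict,
)

model_config = ...  # a ModelConfig built earlier in the export flow (assumed)

config_dict = model_config_to_dict(model_config)  # nested python dict
# With weights split out, the dict round-trips through JSON cleanly.
restored = model_config_from_dict(json.loads(json.dumps(config_dict)))
```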
- naive_quantization(config)
Generates a constant scaling factor (1) with target quantization.
This is for debugging and performance measurement only.
- Parameters:
config (ModelConfig) –
- pack_linear_weights(model_config)
Packs the quantized linear weights in the model_config to the quantized format.
- Parameters:
model_config (ModelConfig) –
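A debugging sketch combining the two helpers above, assuming both mutate model_config in place:

```python
from modelopt.torch.export.model_config_utils import (
    naive_quantization,
    pack_linear_weights,
)

model_config = ...  # a ModelConfig built earlier in the export flow (assumed)

# Debugging / performance measurement only: constant (1) scaling factors,
# no calibration data involved.
naive_quantization(model_config)
# Pack the linear weights into the quantized storage format in place.
pack_linear_weights(model_config)
```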
- pad_weights(weights, tp_size)
Pads the weights to be divisible by tp_size and returns the padded tensor.
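A small sketch of pad_weights. That the first dimension is padded up to the next multiple of tp_size is an assumption, not something the signature states:

```python
import torch

from modelopt.torch.export.model_config_utils import pad_weights

w = torch.randn(100, 64)
padded = pad_weights(w, 8)
# Assumed behavior: dim 0 grows from 100 to 104, the next multiple of 8.
print(padded.shape)
```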
- process_layer_quant_config(layer_config_dict)
Processes per-layer quantization information for TRTLLM export to quant_cfg.json.
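A hedged sketch of the input this function consumes. The key and value shapes below (per-layer "...quantization" entries plus optional "...awq_block_size" entries) are an assumption based on the split_config_and_weights description, not a documented schema:

```python
from modelopt.torch.export.model_config_utils import process_layer_quant_config

# Hypothetical layer_config_dict, shaped the way split_config_and_weights
# might populate it (assumed, not a documented schema).
layer_config_dict = {
    "transformer.layers.0.attention.qkv.quantization": "fp8",
    "transformer.layers.0.mlp.fc.quantization": "int4_awq",
    "transformer.layers.0.mlp.fc.awq_block_size": 128,
}

quant_cfg = process_layer_quant_config(layer_config_dict)
# quant_cfg is what gets serialized to quant_cfg.json for TRTLLM export.
```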
- restore_model_config(model_config, weights)
Recursively restores the model_config from json and loads np.ndarray or torch.Tensor weights from weights.
- Parameters:
weights (Dict[str, ndarray | Tensor]) –
- split_config_and_weights(config, weights, prefix='transformer', layer_config_dict={})
Util function that splits the weights (any torch.Tensor in the nested config) out into a flat weights dict.
A weight id starting with transformer or lm_head is also generated to link the original key to the weights dict. The weights in the weights dict are contiguous.
layer_config_dict: a dictionary containing per-layer quantization format information and, when relevant, awq_block_size information. It is used to export quantization.json for the auto_quant checkpoint.
- Parameters:
weights (Dict[str, Tensor]) –
prefix (str) –
layer_config_dict (dict) –
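A round-trip sketch tying split_config_and_weights to restore_model_config. It assumes model_config already exists and that model_config_to_dict produces the nested dict the splitter walks:

```python
import torch

from modelopt.torch.export.model_config_utils import (
    model_config_to_dict,
    restore_model_config,
    split_config_and_weights,
)

model_config = ...  # a ModelConfig built earlier in the export flow (assumed)

config = model_config_to_dict(model_config)  # nested dict, tensors included
weights: dict[str, torch.Tensor] = {}
layer_config_dict: dict = {}
split_config_and_weights(config, weights, "transformer", layer_config_dict)
# `config` now holds weight ids (e.g. "transformer....") instead of tensors
# and can be dumped to json; `weights` maps each id to a contiguous tensor.

# Later, after deserializing the config, re-attach the tensors in place:
restore_model_config(model_config, weights)
```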
- to_quantized_weight(weight, weights_scaling_factor, quantization, weights_scaling_factor2=None, block_size=None)
Converts the weight to the quantized (packed) format.
- Parameters:
weight (Tensor) –
weights_scaling_factor (Tensor) –
quantization (str) –
weights_scaling_factor2 (Tensor | None) –
block_size (int | None) –
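A round-trip sketch pairing to_quantized_weight with from_quantized_weight. The quantization string "fp8", the per-tensor scale shape, and the use of 448.0 (the float8_e4m3 finite max) for the scale are assumptions to verify against your version:

```python
import torch

from modelopt.torch.export.model_config_utils import (
    from_quantized_weight,
    to_quantized_weight,
)

weight = torch.randn(256, 128, dtype=torch.float16)
# Assumed per-tensor scale for "fp8"; 448.0 is the float8_e4m3 finite max.
scale = weight.abs().amax().float() / 448.0

packed = to_quantized_weight(weight, scale, "fp8")
restored = from_quantized_weight(packed, scale, "fp8", torch.float16)
print((weight - restored).abs().max())  # small quantization error expected
```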