model_config_utils

Common utils for the ModelConfig.

Functions

from_quantized_weight

Converts the quantized weight to the target torch_dtype format.

merge_fc1_gate

Merges the fc1 and gate fields in model_config into a single LinearConfig.

merge_qkv

Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.

model_config_from_dict

Loads a dict into a ModelConfig instance.

model_config_to_dict

Converts the instance to a Python dict.

naive_quantization

Generates a constant scaling factor (1) with target quantization.

pack_linear_weights

Packs the quantized linear weights in the model_config to the quantized format.

pad_weights

Returns the weights padded so they can be split evenly across tp_size.

process_layer_quant_config

Processes per-layer quantization information for TRTLLM export to quant_cfg.json.

restore_model_config

Recursively restores the model_config from JSON and loads the np.ndarray or torch.Tensor weights from the weights dict.

split_config_and_weights

Utility function that splits any torch.Tensor in the nested config out into a flat weights dict.

to_quantized_weight

Converts the weight to the quantized (packed) format.

from_quantized_weight(weight, weights_scaling_factor, quantization, torch_dtype)

Converts the quantized weight to the target torch_dtype format.

Parameters:
  • weight (Tensor) –

  • weights_scaling_factor (Tensor) –

  • quantization (str) –

  • torch_dtype –
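
A rough illustration of what dequantization does in the INT8 case (the per-channel layout and dtypes here are assumptions for the sketch, not the library's exact behavior):

```python
import torch

# Illustrative INT8 dequantization: multiply the integer weight by its
# per-output-channel scaling factor, then cast to the target torch_dtype.
int8_weight = torch.randint(-128, 128, (4, 8), dtype=torch.int8)
weights_scaling_factor = torch.rand(4, 1) + 0.5

dequantized = int8_weight.to(torch.float32) * weights_scaling_factor
result = dequantized.to(torch.float16)  # torch_dtype target
print(result.shape, result.dtype)  # torch.Size([4, 8]) torch.float16
```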

merge_fc1_gate(model_config)

Merges the fc1 and gate fields in model_config into a single LinearConfig.
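
A minimal stand-alone sketch of the fused gated-MLP layout this merge produces (the concatenation order and the SwiGLU-style activation are assumptions of the sketch, not confirmed behavior):

```python
import torch
import torch.nn.functional as F

hidden, inter = 16, 32
gate_w = torch.randn(inter, hidden)
fc1_w = torch.randn(inter, hidden)

# One fused weight so a single GEMM yields both halves of the gated MLP.
fused_w = torch.cat([gate_w, fc1_w], dim=0)

x = torch.randn(2, hidden)
gate_out, fc1_out = (x @ fused_w.T).chunk(2, dim=-1)
out = F.silu(gate_out) * fc1_out  # e.g. SwiGLU-style gating
```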

merge_qkv(model_config)

Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.
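
The fused-QKV idea, shown on plain tensors (hidden size is made up; the real function operates on QKVConfig objects inside the model_config):

```python
import torch

hidden = 16
q_w, k_w, v_w = (torch.randn(hidden, hidden) for _ in range(3))

# Concatenate along the output dimension: one GEMM now produces Q, K, V.
qkv_w = torch.cat([q_w, k_w, v_w], dim=0)

x = torch.randn(2, hidden)
q, k, v = (x @ qkv_w.T).split(hidden, dim=-1)
assert torch.allclose(q, x @ q_w.T, atol=1e-5)
```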

model_config_from_dict(d)

Loads a dict into a ModelConfig instance.

Parameters:

d (dict) –

Return type:

ModelConfig

model_config_to_dict(model_config)

Converts the instance to a Python dict.

Parameters:

model_config (ModelConfig) –

Return type:

dict
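
A toy dataclass (not the real ModelConfig) showing the round trip these two helpers provide between config objects and plain dicts:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class TinyConfig:  # stand-in for ModelConfig
    vocab_size: int = 32000
    dtype: str = "float16"
    quantization: Optional[str] = None

cfg = TinyConfig(quantization="fp8")
d = asdict(cfg)             # analogous to model_config_to_dict
restored = TinyConfig(**d)  # analogous to model_config_from_dict
assert restored == cfg
```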

naive_quantization(config)

Generates a constant scaling factor (1) with target quantization.

This is for debugging and performance measurement only.

Parameters:

config (ModelConfig) –
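
What a constant scale of 1 means in practice, sketched on a bare tensor (the real function operates on a ModelConfig):

```python
import torch

weight = torch.randn(8, 8)
weights_scaling_factor = torch.ones(8, 1)  # constant scale of 1

# With scale == 1, the "quantized" values are just the rounded weights,
# which is only useful for debugging and performance measurement.
int8_weight = torch.clamp(
    torch.round(weight / weights_scaling_factor), -128, 127
).to(torch.int8)
```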

pack_linear_weights(model_config)

Packs the quantized linear weights in the model_config to the quantized format.

Parameters:

model_config (ModelConfig) –
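
One concrete packing scheme, two signed INT4 values per byte, to show what "packed" means here (the actual layout depends on the chosen quantization format):

```python
import torch

# Values that fit in 4 bits, held unpacked in a wider dtype.
int4_vals = torch.randint(-8, 8, (4, 8), dtype=torch.int16)

lo = int4_vals[..., 0::2] & 0x0F         # even elements -> low nibble
hi = (int4_vals[..., 1::2] & 0x0F) << 4  # odd elements -> high nibble
packed = (lo | hi).to(torch.uint8)       # half the storage

print(int4_vals.shape, "->", packed.shape)  # (4, 8) -> (4, 4)
```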

pad_weights(weights, tp_size)

Returns the weights padded so they can be split evenly across tp_size.
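
Assumed semantics, sketched as standalone code: zero-pad the output dimension so the tensor splits evenly across tp_size tensor-parallel ranks (pad_to_tp is a hypothetical stand-in, not the library function):

```python
import torch
import torch.nn.functional as F

def pad_to_tp(weight: torch.Tensor, tp_size: int) -> torch.Tensor:
    remainder = weight.shape[0] % tp_size
    if remainder == 0:
        return weight
    # Pad rows at the end with zeros up to the next multiple of tp_size.
    return F.pad(weight, (0, 0, 0, tp_size - remainder))

w = torch.randn(10, 4)
print(pad_to_tp(w, 4).shape)  # torch.Size([12, 4])
```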

process_layer_quant_config(layer_config_dict)

Processes per-layer quantization information for TRTLLM export to quant_cfg.json.
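
The kind of input this consumes, with hypothetical keys (the exact key schema is not documented here; the layer names and format strings below are invented for illustration):

```python
import json

# Hypothetical layerwise mapping: quantization format per layer, plus
# awq_block_size where the format needs it.
layer_config_dict = {
    "transformer.layers.0.attention.qkv.quantization": "fp8",
    "transformer.layers.0.mlp.fc.quantization": "int4_awq",
    "transformer.layers.0.mlp.fc.awq_block_size": 128,
}
print(json.dumps(layer_config_dict, indent=2))
```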

restore_model_config(model_config, weights)

Recursively restores the model_config from JSON and loads the np.ndarray or torch.Tensor weights from the weights dict.

Parameters:

weights (Dict[str, ndarray | Tensor]) –

split_config_and_weights(config, weights, prefix='transformer', layer_config_dict={})

Utility function that splits any torch.Tensor in the nested config out into a flat weights dict.

A weight id starting with transformer or lm_head is also generated to link the original key to the weights dict. The weights in the weights dict are contiguous.

layer_config_dict: A dictionary containing the layerwise quantization format and, when relevant, the awq_block_size. It is used to export quantization.json for an auto_quant checkpoint.

Parameters:
  • weights (Dict[str, tensor]) –

  • prefix (str) –

  • layer_config_dict (dict) –
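
A simplified stand-in for the split/restore round trip, using plain dicts instead of the real nested ModelConfig:

```python
import torch

config = {"lm_head": {"weight": torch.randn(4, 4)}}

# Split: move the tensor into a flat weights dict and leave a weight id
# behind in the config, which can then be serialized as JSON.
weights = {"lm_head.weight": config["lm_head"]["weight"]}
config["lm_head"]["weight"] = "lm_head.weight"

# Restore: resolve the weight id back to the stored tensor
# (what restore_model_config does recursively for the whole config).
config["lm_head"]["weight"] = weights[config["lm_head"]["weight"]]
```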

to_quantized_weight(weight, weights_scaling_factor, quantization, weights_scaling_factor2=None, block_size=None)

Converts the weight to the quantized (packed) format.

Parameters:
  • weight (Tensor) –

  • weights_scaling_factor (Tensor) –

  • quantization (str) –

  • weights_scaling_factor2 (Tensor | None) –

  • block_size (int | None) –
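
The INT8 direction sketched on bare tensors (per-channel absolute-max calibration is an assumption of the sketch; weights_scaling_factor2 and block_size only matter for block-quantized formats and are omitted here):

```python
import torch

weight = torch.randn(4, 8, dtype=torch.float16)
# Per-output-channel scale from the absolute max (assumed calibration).
weights_scaling_factor = weight.abs().amax(dim=1, keepdim=True).float() / 127.0

int8_weight = torch.clamp(
    torch.round(weight.float() / weights_scaling_factor), -128, 127
).to(torch.int8)
```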