model_config_utils

Common utils for the ModelConfig.

Functions

from_quantized_weight

Converts the quantized weight to the target torch_dtype format.

merge_fc1_gate

Merges the fc1 and gate fields in model_config into a single LinearConfig.

merge_qkv

Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.

model_config_from_dict

Loads a dict into a ModelConfig instance.

model_config_to_dict

Converts the instance to a Python dict.

naive_quantization

Generates a constant scaling factor (1) with target quantization.

pack_linear_weights

Packs the quantized linear weights in the model_config to the quantized format.

pad_weights

Returns the weights padded so they can be split evenly across tp_size.

process_layer_quant_config

Processes per-layer quantization information for TRTLLM export to quant_cfg.json.

restore_model_config

Recursively restores the model_config from JSON and loads the np.ndarray or torch.Tensor weights from the weights dict.

split_config_and_weights

Utility function that splits any torch.Tensor in the nested config out into a flat weights dict.

to_quantized_weight

Converts the weight to the quantized (packed) format.

from_quantized_weight(weight, weights_scaling_factor, quantization, torch_dtype)

Converts the quantized weight to the target torch_dtype format.

Parameters:
  • weight (Tensor) –

  • weights_scaling_factor (Tensor) –

  • quantization (str) –

  • torch_dtype –
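
A rough illustration of what dequantization does in the INT8 case (the per-channel layout and dtypes here are assumptions for the sketch, not the library's exact behavior):

```python
import torch

# Illustrative INT8 dequantization: multiply the integer weight by its
# per-output-channel scaling factor, then cast to the target torch_dtype.
int8_weight = torch.randint(-128, 128, (4, 8), dtype=torch.int8)
weights_scaling_factor = torch.rand(4, 1) + 0.5

dequantized = int8_weight.to(torch.float32) * weights_scaling_factor
result = dequantized.to(torch.float16)  # torch_dtype target
print(result.shape, result.dtype)  # torch.Size([4, 8]) torch.float16
```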

merge_fc1_gate(model_config)

Merges the fc1 and gate fields in model_config into a single LinearConfig.
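
A minimal stand-alone sketch of the fused gated-MLP layout this merge produces (the concatenation order and the SwiGLU-style activation are assumptions of the sketch, not confirmed behavior):

```python
import torch
import torch.nn.functional as F

hidden, inter = 16, 32
gate_w = torch.randn(inter, hidden)
fc1_w = torch.randn(inter, hidden)

# One fused weight so a single GEMM yields both halves of the gated MLP.
fused_w = torch.cat([gate_w, fc1_w], dim=0)

x = torch.randn(2, hidden)
gate_out, fc1_out = (x @ fused_w.T).chunk(2, dim=-1)
out = F.silu(gate_out) * fc1_out  # e.g. SwiGLU-style gating
```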

merge_qkv(model_config)

Merges the qkv fields in model_config from QKVConfig to a single LinearConfig.
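
The fused-QKV idea, shown on plain tensors (hidden size is made up; the real function operates on QKVConfig objects inside the model_config):

```python
import torch

hidden = 16
q_w, k_w, v_w = (torch.randn(hidden, hidden) for _ in range(3))

# Concatenate along the output dimension: one GEMM now produces Q, K, V.
qkv_w = torch.cat([q_w, k_w, v_w], dim=0)

x = torch.randn(2, hidden)
q, k, v = (x @ qkv_w.T).split(hidden, dim=-1)
assert torch.allclose(q, x @ q_w.T, atol=1e-5)
```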

model_config_from_dict(d)

Loads a dict into a ModelConfig instance.

Parameters:

d (dict) –

Return type:

ModelConfig

model_config_to_dict(model_config)

Converts the instance to a Python dict.

Parameters:

model_config (ModelConfig) –

Return type:

dict
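
A toy dataclass (not the real ModelConfig) showing the round trip these two helpers provide between config objects and plain dicts:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class TinyConfig:  # stand-in for ModelConfig
    vocab_size: int = 32000
    dtype: str = "float16"
    quantization: Optional[str] = None

cfg = TinyConfig(quantization="fp8")
d = asdict(cfg)             # analogous to model_config_to_dict
restored = TinyConfig(**d)  # analogous to model_config_from_dict
assert restored == cfg
```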

naive_quantization(config)

Generates a constant scaling factor (1) with target quantization.

This is for debugging and performance measurement only.

Parameters:

config (ModelConfig) –
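
What a constant scale of 1 means in practice, sketched on a bare tensor (the real function operates on a ModelConfig):

```python
import torch

weight = torch.randn(8, 8)
weights_scaling_factor = torch.ones(8, 1)  # constant scale of 1

# With scale == 1, the "quantized" values are just the rounded weights,
# which is only useful for debugging and performance measurement.
int8_weight = torch.clamp(
    torch.round(weight / weights_scaling_factor), -128, 127
).to(torch.int8)
```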

pack_linear_weights(model_config)

Packs the quantized linear weights in the model_config to the quantized format.

Parameters:

model_config (ModelConfig) –
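
One concrete packing scheme, two signed INT4 values per byte, to show what "packed" means here (the actual layout depends on the chosen quantization format):

```python
import torch

# Values that fit in 4 bits, held unpacked in a wider dtype.
int4_vals = torch.randint(-8, 8, (4, 8), dtype=torch.int16)

lo = int4_vals[..., 0::2] & 0x0F         # even elements -> low nibble
hi = (int4_vals[..., 1::2] & 0x0F) << 4  # odd elements -> high nibble
packed = (lo | hi).to(torch.uint8)       # half the storage

print(int4_vals.shape, "->", packed.shape)  # (4, 8) -> (4, 4)
```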

pad_weights(weights, tp_size)

Returns the weights padded so they can be split evenly across tp_size.
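
Assumed semantics, sketched as standalone code: zero-pad the output dimension so the tensor splits evenly across tp_size tensor-parallel ranks (pad_to_tp is a hypothetical stand-in, not the library function):

```python
import torch
import torch.nn.functional as F

def pad_to_tp(weight: torch.Tensor, tp_size: int) -> torch.Tensor:
    remainder = weight.shape[0] % tp_size
    if remainder == 0:
        return weight
    # Pad rows at the end with zeros up to the next multiple of tp_size.
    return F.pad(weight, (0, 0, 0, tp_size - remainder))

w = torch.randn(10, 4)
print(pad_to_tp(w, 4).shape)  # torch.Size([12, 4])
```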

process_layer_quant_config(layer_config_dict)

Processes per-layer quantization information for TRTLLM export to quant_cfg.json.
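
The kind of input this consumes, with hypothetical keys (the exact key schema is not documented here; the layer names and format strings below are invented for illustration):

```python
import json

# Hypothetical layerwise mapping: quantization format per layer, plus
# awq_block_size where the format needs it.
layer_config_dict = {
    "transformer.layers.0.attention.qkv.quantization": "fp8",
    "transformer.layers.0.mlp.fc.quantization": "int4_awq",
    "transformer.layers.0.mlp.fc.awq_block_size": 128,
}
print(json.dumps(layer_config_dict, indent=2))
```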

restore_model_config(model_config, weights)

Recursively restores the model_config from JSON and loads the np.ndarray or torch.Tensor weights from the weights dict.

Parameters:

weights (Dict[str, ndarray | Tensor]) –

split_config_and_weights(config, weights, prefix='transformer', layer_config_dict={})

Utility function that splits any torch.Tensor in the nested config out into a flat weights dict.

A weight id starting with transformer or lm_head is also generated to link the original key to the weights dict. The weights in the weights dict are contiguous.

layer_config_dict: A dictionary containing the layerwise quantization format and, when relevant, the awq_block_size. It is used to export quantization.json for an auto_quant checkpoint.

Parameters:
  • weights (Dict[str, tensor]) –

  • prefix (str) –

  • layer_config_dict (dict) –
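
A simplified stand-in for the split/restore round trip, using plain dicts instead of the real nested ModelConfig:

```python
import torch

config = {"lm_head": {"weight": torch.randn(4, 4)}}

# Split: move the tensor into a flat weights dict and leave a weight id
# behind in the config, which can then be serialized as JSON.
weights = {"lm_head.weight": config["lm_head"]["weight"]}
config["lm_head"]["weight"] = "lm_head.weight"

# Restore: resolve the weight id back to the stored tensor
# (what restore_model_config does recursively for the whole config).
config["lm_head"]["weight"] = weights[config["lm_head"]["weight"]]
```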

to_quantized_weight(weight, weights_scaling_factor, quantization, weights_scaling_factor2=None, block_size=None)

Converts the weight to the quantized (packed) format.

Parameters:
  • weight (Tensor) –

  • weights_scaling_factor (Tensor) –

  • quantization (str) –

  • weights_scaling_factor2 (Tensor | None) –

  • block_size (int | None) –
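
The INT8 direction sketched on bare tensors (per-channel absolute-max calibration is an assumption of the sketch; weights_scaling_factor2 and block_size only matter for block-quantized formats and are omitted here):

```python
import torch

weight = torch.randn(4, 8, dtype=torch.float16)
# Per-output-channel scale from the absolute max (assumed calibration).
weights_scaling_factor = weight.abs().amax(dim=1, keepdim=True).float() / 127.0

int8_weight = torch.clamp(
    torch.round(weight.float() / weights_scaling_factor), -128, 127
).to(torch.int8)
```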