convert_hf_config

Convert a modelopt quantization export config to align with the llm-compressor config format.

Functions

convert_hf_quant_config_format

Converts a modelopt quantization config dictionary to align with the llm-compressor config format.

convert_hf_quant_config_format(input_config)

Converts a modelopt quantization config dictionary to align with the llm-compressor config format.

Parameters:

input_config (dict[str, Any]) – The original quantization config dictionary.

Return type:

dict[str, Any]

Note

The “targets” field specifies which PyTorch module types to quantize. Compressed-tensors works with any PyTorch module type and matches dynamically against module.__class__.__name__. Typically this includes “Linear” modules, but it can also include “Embedding” and other module types; see the sketch after the links below.

See: https://github.com/neuralmagic/compressed-tensors/blob/fa6a48f1da6b47106912bcd25eba7171ba7cfec7/src/sparsetensors/quantization/quant_scheme.py#L29

Example usage: https://github.com/neuralmagic/compressed-tensors/blob/9938a6ec6e10498d39a3071dfd1c40e3939ee80b/tests/test_quantization/lifecycle/test_apply.py#L118
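For illustration, a minimal sketch of a compressed-tensors style scheme whose “targets” list drives the dynamic matching described above; the values are hypothetical and are not produced by this function:

# Hypothetical compressed-tensors style scheme; each entry in "targets" is
# compared against module.__class__.__name__ when the config is applied.
example_scheme = {
    "targets": ["Linear", "Embedding"],  # quantize nn.Linear and nn.Embedding modules
    "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
}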

Example

{
    "producer": {"name": "modelopt", "version": "0.19.0"},
    "quantization": {
        "quant_algo": "FP8",
        "kv_cache_quant_algo": "FP8",
        "exclude_modules": ["lm_head"],
    },
}
Returns:

A new dictionary in the target format.

Example output (for the FP8 input above):

{
    "config_groups": {
        "group_0": {
            "input_activations": {"dynamic": False, "num_bits": 8, "type": "float"},
            "weights": {"dynamic": False, "num_bits": 8, "type": "float"},
        }
    },
    "ignore": ["lm_head"],
    "quant_algo": "FP8",
    "kv_cache_scheme": "FP8",
    "producer": {"name": "modelopt", "version": "0.29.0"},
}
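A minimal usage sketch built from the input and output examples above; the import path is assumed from this page's module name and may differ in your installed version:

# Assumed import path for this module's function.
from modelopt.torch.export.convert_hf_config import convert_hf_quant_config_format

modelopt_config = {
    "producer": {"name": "modelopt", "version": "0.19.0"},
    "quantization": {
        "quant_algo": "FP8",
        "kv_cache_quant_algo": "FP8",
        "exclude_modules": ["lm_head"],
    },
}

converted = convert_hf_quant_config_format(modelopt_config)
print(converted["ignore"])      # expected: ["lm_head"], per the example above
print(converted["quant_algo"])  # expected: "FP8"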
