config
This document lists the quantization formats supported by Model Optimizer and example quantization configs.
Quantization Formats
The following table lists the quantization formats supported by Model Optimizer and the corresponding quantization config. See Quantization Configs for the specific quantization config definitions.
Please see choosing the right quantization formats to learn more about the formats and their use-cases.
Note
The recommended configs given below are for LLM models. For CNN models, only INT8 quantization
is supported. Please use quantization config INT8_DEFAULT_CFG for CNN models.
| Quantization Format | Model Optimizer config |
|---|---|
| INT8 | |
| FP8 | |
| INT4 Weights only AWQ (W4A16) | |
| INT4-FP8 AWQ (W4A8) | |
Quantization Configs
A quantization config is a dictionary with two top-level keys:
- "quant_cfg": an ordered list of QuantizerCfgEntry dicts that specify which quantizers to configure and how.
- "algorithm": the calibration algorithm passed to calibrate.
Please see QuantizeConfig for the full config schema.
quant_cfg — Entry Format
Each entry in the quant_cfg list is a QuantizerCfgEntry with the following fields:
- quantizer_name (required): a wildcard string matched against quantizer module names. Quantizer modules are instances of TensorQuantizer and have names ending with weight_quantizer, input_quantizer, etc.
- parent_class (optional): restricts matching to quantizers whose immediate parent module is of this PyTorch class (e.g. "nn.Linear"). If omitted, all matching quantizers are targeted regardless of their parent class.
- cfg (optional): a dict of quantizer attributes as defined by QuantizerAttributeConfig, or a list of such dicts. When a list is given, the matched TensorQuantizer is replaced with a SequentialQuantizer that applies each format in sequence. This is used, for example, in W4A8 quantization, where weights are quantized first in INT4 and then in FP8.
- enable (optional): toggles matched quantizers on (True) or off (False), independently of cfg. When cfg is present and enable is absent, the quantizer is implicitly enabled. When enable is the only field (no cfg), it only flips the on/off state; all other attributes remain unchanged.
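As an illustration of the list form of cfg, the entry below sketches a W4A8-style weight quantizer: INT4 first, then FP8. The attribute values (a block size of 128, num_bits of (4, 3) for FP8) are assumptions chosen for illustration and are not taken from this page; check them against the shipped W4A8 config before relying on them.

```python
# Hypothetical entry: the list under "cfg" requests two formats applied in sequence.
w4a8_weight_entry = {
    "quantizer_name": "*weight_quantizer",
    "cfg": [
        # Stage 1: INT4 with per-block scaling along the last axis (block size assumed).
        {"num_bits": 4, "block_sizes": {-1: 128}},
        # Stage 2: FP8 written as (exponent bits, mantissa bits), with a per-tensor scale.
        {"num_bits": (4, 3), "axis": None},
    ],
}

# "enable" is absent and "cfg" is present, so matched quantizers are implicitly enabled.
stages = [stage["num_bits"] for stage in w4a8_weight_entry["cfg"]]
print(stages)
```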
quant_cfg — Ordering and Precedence
Entries are applied in list order; later entries override earlier ones for any quantizer they match. The recommended pattern is:
1. Start with a deny-all entry {"quantizer_name": "*", "enable": False} (provided as _base_disable_all) to disable every quantizer by default.
2. Follow with format-specific entries that selectively enable and configure the desired quantizers.
3. Append _default_disabled_quantizer_cfg to enforce standard exclusions (e.g. BatchNorm layers, LM head, MoE routers).
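The "later entries override earlier ones" rule can be sketched in plain Python. This is not the library's implementation: the resolve helper, the quantizer names, and the assumed starting state are all hypothetical, and wildcard matching is approximated with the standard fnmatch module.

```python
from fnmatch import fnmatch


def resolve(quant_cfg, quantizers):
    """Apply entries in list order; a later matching entry overrides an earlier one.

    quantizers maps a quantizer name to the class name of its parent module.
    """
    # Assumed starting state: enabled, with no attributes configured yet.
    state = {name: {"enable": True, "cfg": None} for name in quantizers}
    for entry in quant_cfg:
        for name, parent in quantizers.items():
            if not fnmatch(name, entry["quantizer_name"]):
                continue
            if entry.get("parent_class") not in (None, parent):
                continue
            if "cfg" in entry:
                # A cfg replaces the attributes and implicitly enables the quantizer.
                state[name] = {"enable": entry.get("enable", True), "cfg": entry["cfg"]}
            else:
                # An enable-only entry flips the switch and leaves attributes alone.
                state[name]["enable"] = entry["enable"]
    return state


quantizers = {
    "fc.weight_quantizer": "nn.Linear",
    "fc.input_quantizer": "nn.Linear",
    "fc.output_quantizer": "nn.Linear",
    "act.input_quantizer": "nn.LeakyReLU",
}
quant_cfg = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
    {"quantizer_name": "*input_quantizer", "parent_class": "nn.LeakyReLU", "enable": False},
]
resolved = resolve(quant_cfg, quantizers)
enabled = sorted(name for name, s in resolved.items() if s["enable"])
print(enabled)
```

Moving the deny-all entry to the end of the list would disable every quantizer, which is why it belongs first.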
To get the string representation of a module class for use in parent_class, do:
import torch.nn as nn
from modelopt.torch.quantization import QuantModuleRegistry

# Get the class name for nn.Conv2d
class_name = QuantModuleRegistry.get_key(nn.Conv2d)
Here is an example of a quantization config:
MY_QUANT_CFG = {
    "quant_cfg": [
        # Deny all quantizers by default
        {"quantizer_name": "*", "enable": False},
        # Enable and configure weight and input quantizers
        {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
        # Disable input quantizers specifically for LeakyReLU layers
        {"quantizer_name": "*input_quantizer", "parent_class": "nn.LeakyReLU", "enable": False},
    ]
}
Example Quantization Configurations
These example configs can be accessed as attributes of modelopt.torch.quantization and can be given as
input to mtq.quantize(). For example:
import modelopt.torch.quantization as mtq
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
You can also create your own config by following these examples.
For instance, if you want to quantize a model with the INT4 AWQ algorithm but need to skip quantizing
the layer named lm_head, you can create a custom config and quantize your model as follows:
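import copy

# Create a custom config from a deep copy so the shared preset is not mutated
CUSTOM_INT4_AWQ_CFG = copy.deepcopy(mtq.INT4_AWQ_CFG)
# Appended last, so it overrides earlier entries that match lm_head
CUSTOM_INT4_AWQ_CFG["quant_cfg"].append({"quantizer_name": "*lm_head*", "enable": False})
# Quantize the model
model = mtq.quantize(model, CUSTOM_INT4_AWQ_CFG, forward_loop)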
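The reason for copy.deepcopy in this pattern can be shown without the library. In the sketch below, PRESET_CFG is a hypothetical stand-in for a shared preset; its contents are illustrative only.

```python
import copy

# Hypothetical stand-in for a shared preset config.
PRESET_CFG = {
    "quant_cfg": [{"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 4}}],
    "algorithm": "awq_lite",
}

# Deep copy: the nested list is duplicated, so the preset keeps its single entry.
custom = copy.deepcopy(PRESET_CFG)
custom["quant_cfg"].append({"quantizer_name": "*lm_head*", "enable": False})
preset_len_after_deepcopy = len(PRESET_CFG["quant_cfg"])

# Shallow copy: the nested list is shared, so the append leaks into the preset.
shallow = dict(PRESET_CFG)
shallow["quant_cfg"].append({"quantizer_name": "*router*", "enable": False})
preset_len_after_shallow = len(PRESET_CFG["quant_cfg"])

print(preset_len_after_deepcopy, preset_len_after_shallow)
```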
Classes
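| Class | Description |
|---|---|
| AWQClipCalibConfig | The config for awq_clip (AWQ clip) algorithm. |
| AWQFullCalibConfig | The config for awq or awq_full algorithm (AWQ full). |
| AWQLiteCalibConfig | The config for awq_lite (AWQ lite) algorithm. |
| CompressConfig | Default configuration for compress mode. |
| GPTQCalibConfig | The config for GPTQ quantization. |
| LayerwiseConfig | Nested config for layer-by-layer calibration behavior. |
| LocalHessianCalibConfig | Configuration for local Hessian-weighted MSE calibration. |
| MaxCalibConfig | The config for max calibration algorithm. |
| MseCalibConfig | Configuration for per-tensor MSE calibration. |
| QuantizeAlgorithmConfig | Calibration algorithm config base. |
| QuantizeConfig | Default configuration for quantize mode. |
| QuantizerAttributeConfig | Quantizer attribute type. |
| QuantizerCfgEntry | A single entry in a quant_cfg list. |
| RotateConfig | Configuration for rotating quantizer input via Hadamard transform (RHT/QuaRot/SpinQuant). |
| SVDQuantConfig | The config for SVDQuant. |
| SmoothQuantCalibConfig | The config for smoothquant algorithm (SmoothQuant). |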
Functions
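| Function | Description |
|---|---|
| find_quant_cfg_entry_by_path | Find the last entry in a quant_cfg list whose quantizer_name key equals the query. |
| need_calibration | Check if calibration is needed for the given config. |
| normalize_quant_cfg_list | Normalize a raw quant_cfg into a list of QuantizerCfgEntry instances. |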
- class AWQClipCalibConfig
Bases: QuantizeAlgorithmConfig
The config for the awq_clip (AWQ clip) algorithm.
AWQ clip searches for a clipped amax for per-group quantization. This search requires much more compute than AWQ lite. To avoid OOM, the linear layer weights are processed in batches of size max_co_batch_size along the out_features dimension. AWQ clip calibration also takes longer than AWQ lite.
- debug: bool | None
- max_co_batch_size: int | None
- max_tokens_per_batch: int | None
- method: Literal['awq_clip']
- min_clip_ratio: float | None
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- shrink_step: float | None
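The following is a rough sketch of a clip-ratio grid search processed in chunks of output channels. It is not the library's algorithm: it minimizes weight reconstruction error on plain lists rather than layer output error, and the way min_clip_ratio, shrink_step, and max_co_batch_size are used here (as well as their default values) is an assumption inferred from the field names.

```python
def fake_quant(values, amax, q_max=7):
    """Symmetric quantize/dequantize, clipping to [-amax, amax] (INT4-like by default)."""
    scale = amax / q_max
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]


def search_clip_ratio(group, min_clip_ratio=0.5, shrink_step=0.05):
    """Try amax = ratio * max|w| for ratio stepping from 1.0 down to min_clip_ratio."""
    full_amax = max(abs(v) for v in group)
    if full_amax == 0.0:
        return 1.0  # nothing to clip in an all-zero group
    best_ratio, best_err = 1.0, float("inf")
    steps = int(round((1.0 - min_clip_ratio) / shrink_step))
    for i in range(steps + 1):
        ratio = 1.0 - i * shrink_step
        deq = fake_quant(group, ratio * full_amax)
        err = sum((a - b) ** 2 for a, b in zip(group, deq))
        if err < best_err:
            best_ratio, best_err = ratio, err
    return best_ratio


def clip_in_batches(weight_rows, max_co_batch_size=2):
    """Search each output channel, visiting channels one chunk at a time."""
    ratios = []
    for start in range(0, len(weight_rows), max_co_batch_size):
        chunk = weight_rows[start : start + max_co_batch_size]
        ratios.extend(search_clip_ratio(row) for row in chunk)
    return ratios


weight = [[0.1, -0.2, 0.15, 4.0], [1.0, -1.0, 0.5, -0.5], [0.0, 0.0, 0.0, 0.0]]
ratios = clip_in_batches(weight, max_co_batch_size=2)
print(ratios)
```

On real tensors the chunking bounds peak memory, since only one chunk's candidate errors need to be materialized at a time; with Python lists it only illustrates the iteration order.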
- class AWQFullCalibConfig
Bases: AWQLiteCalibConfig, AWQClipCalibConfig
The config for the awq or awq_full algorithm (AWQ full).
AWQ full performs awq_lite followed by awq_clip.
- debug: bool | None
- method: Literal['awq_full']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class AWQLiteCalibConfig
Bases: QuantizeAlgorithmConfig
The config for the awq_lite (AWQ lite) algorithm.
AWQ lite applies a channel-wise scaling factor that minimizes the output difference after quantization. See the AWQ paper for more details.
- alpha_step: float | None
- debug: bool | None
- method: Literal['awq_lite']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class CompressConfig
Bases: ModeloptBaseConfig
Default configuration for compress mode.
- compress: dict[str, bool]
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- quant_gemm: bool
- class GPTQCalibConfig
Bases: QuantizeAlgorithmConfig
The config for GPTQ quantization.
GPTQ minimizes the layer-wise quantization error by using second-order (Hessian) information to perform blockwise weight updates that compensate for rounding loss. Layers are quantized sequentially so that each layer’s Hessian is computed from activations that already reflect the quantization of preceding layers.
The default values are taken from the official GPTQ implementation: https://github.com/IST-DASLab/FP-Quant/blob/d2e3092f968262c4de5fb050e1aef568a280dadd/src/quantization/gptq.py#L35
- block_size: int | None
- fused: bool
- method: Literal['gptq']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- perc_damp: float | None
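In GPTQ implementations, a damping percentage is commonly used to stabilize the Hessian inverse by adding a fraction of the mean diagonal to the diagonal. The sketch below shows that step on a small list-of-lists matrix; that perc_damp plays exactly this role here is an assumption based on the field name, and damp_hessian is a hypothetical helper.

```python
def damp_hessian(hessian, perc_damp=0.01):
    """Return H + perc_damp * mean(diag(H)) * I for a square matrix H."""
    n = len(hessian)
    damp = perc_damp * sum(hessian[i][i] for i in range(n)) / n
    return [
        [hessian[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


H = [[4.0, 1.0], [1.0, 2.0]]
damped = damp_hessian(H, perc_damp=0.1)
print(damped)
```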
- class LayerwiseConfig
Bases: ModeloptBaseConfig
Nested config for layer-by-layer calibration behavior.
- checkpoint_dir: str | None
- enable: bool
- get_qdq_activations_from_prev_layer: bool
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- save_every: int
- class LocalHessianCalibConfig
Bases: _SharedStatesConfig, QuantizeAlgorithmConfig
Configuration for local Hessian-weighted MSE calibration.
This algorithm uses activation information to optimize per-block scales for weight quantization. It minimizes the output reconstruction error by weighting the loss with the local Hessian matrix computed from input activations.
The local Hessian loss for each block is dw @ H @ dw.T, where dw = weight - quantized_weight is the weight reconstruction error per block and H = X @ X.T is the local Hessian computed from the input activations X.
- block_size: int | None
- debug: bool | None
- distributed_sync: bool | None
- fp8_scale_sweep: bool | None
- method: Literal['local_hessian']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- start_multiplier: float | None
- step_size: float | None
- stop_multiplier: float | None
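The local Hessian loss can be worked through on a tiny block. The sketch below uses plain lists, with one weight block of length n and activations X of shape (n, tokens); the function names are hypothetical. It also checks the identity dw @ H @ dw.T = ||dw @ X||^2, which is why this loss measures output reconstruction error.

```python
def local_hessian_loss(weight, quantized_weight, x):
    """Compute dw @ H @ dw.T with H = X @ X.T for a single weight block."""
    dw = [w - q for w, q in zip(weight, quantized_weight)]
    n = len(dw)
    # H[i][j] is the dot product of activation rows i and j.
    h = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    return sum(dw[i] * h[i][j] * dw[j] for i in range(n) for j in range(n))


def output_error(weight, quantized_weight, x):
    """Compute ||dw @ X||^2 directly, one token at a time."""
    dw = [w - q for w, q in zip(weight, quantized_weight)]
    tokens = len(x[0])
    return sum(sum(dw[i] * x[i][t] for i in range(len(dw))) ** 2 for t in range(tokens))


weight = [1.0, -2.0]
quantized_weight = [0.5, -1.5]
x = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]

loss = local_hessian_loss(weight, quantized_weight, x)
direct = output_error(weight, quantized_weight, x)
print(loss, direct)
```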
- class MaxCalibConfig
Bases: _SharedStatesConfig, QuantizeAlgorithmConfig
The config for the max calibration algorithm.
Max calibration estimates the maximum values of activations or weights and uses these maxima to set the quantization scaling factor. See Integer Quantization for the concepts.
- distributed_sync: bool | None
- method: Literal['max']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- sync_expert_weight_amax: bool
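A minimal sketch of max calibration on plain lists follows. It assumes symmetric signed INT8 with q_max = 127 and a per-tensor scale of amax / q_max; the helper names are hypothetical and the library's calibrator handles many more cases (axes, blocks, distributed sync).

```python
def max_calibrate(batches, num_bits=8):
    """Track the running max of |x| over calibration batches and derive a scale."""
    amax = 0.0
    for batch in batches:
        amax = max(amax, max(abs(v) for v in batch))
    q_max = 2 ** (num_bits - 1) - 1  # 127 for INT8 (assumed symmetric range)
    return amax, amax / q_max


def quantize(values, scale, num_bits=8):
    """Round to the integer grid and clamp to the symmetric range."""
    q_max = 2 ** (num_bits - 1) - 1
    return [max(-q_max, min(q_max, round(v / scale))) for v in values]


amax, scale = max_calibrate([[0.5, -1.27], [0.3, 1.0]])
codes = quantize([1.27, -0.5, 2.0], scale)
print(amax, scale, codes)
```

Values beyond the calibrated amax, such as 2.0 above, saturate at the edge of the integer range.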
- class MseCalibConfig
Bases: _SharedStatesConfig, QuantizeAlgorithmConfig
Configuration for per-tensor MSE calibration.
Finds a scale s (via amax a, with s = a / q_max) that minimizes the reconstruction error of a tensor after uniform Q→DQ:
s* = argmin_s E[(W - DQ(Q(W; s)))^2], W ∈ weights
When fp8_scale_sweep is enabled for a supported FP8-scale format, step_size is ignored.
- distributed_sync: bool | None
- fp8_scale_sweep: bool | None
- method: Literal['mse']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- start_multiplier: float | None
- step_size: float | None
- stop_multiplier: float | None
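The search for s* can be sketched as a sweep over candidate amax values. The sketch assumes the candidates are multiplier * max|W| for multipliers running from start_multiplier to stop_multiplier in increments of step_size; that interpretation and the default values shown are assumptions based on the field names, and mse_calibrate is a hypothetical helper.

```python
def fake_quant(values, amax, q_max):
    """Uniform symmetric Q -> DQ with scale s = amax / q_max."""
    scale = amax / q_max
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]


def mse_calibrate(values, start_multiplier=0.25, stop_multiplier=4.0, step_size=0.1, q_max=127):
    """Return the (amax, scale) pair with the lowest mean squared reconstruction error."""
    base_amax = max(abs(v) for v in values)
    if base_amax == 0.0:
        raise ValueError("cannot calibrate an all-zero tensor")
    best_amax, best_err = base_amax, float("inf")
    steps = int((stop_multiplier - start_multiplier) / step_size + 1e-9)
    for i in range(steps + 1):
        amax = (start_multiplier + i * step_size) * base_amax
        deq = fake_quant(values, amax, q_max)
        err = sum((a - b) ** 2 for a, b in zip(values, deq)) / len(values)
        if err < best_err:
            best_amax, best_err = amax, err
    return best_amax, best_amax / q_max


# Mostly small values plus one outlier, quantized to an INT4-like grid (q_max = 7).
values = [0.01 * i for i in range(-20, 21)] + [5.0]
amax, scale = mse_calibrate(values, q_max=7)
print(amax, scale)
```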
- class QuantizeAlgorithmConfig
Bases: ModeloptBaseConfig
Calibration algorithm config base.
- layerwise: LayerwiseConfig
- method: Literal[None]
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- moe_calib_experts_ratio: float | None
- validate_layerwise_checkpoint_dir()
Raise if layerwise.checkpoint_dir is set but layerwise.enable is False.
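The consistency check described for validate_layerwise_checkpoint_dir can be sketched with a small dataclass. LayerwiseSketch is a hypothetical stand-in for LayerwiseConfig that carries only the two relevant fields, and the use of ValueError is an assumption, since the exception type is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerwiseSketch:
    """Hypothetical stand-in for LayerwiseConfig."""

    enable: bool = False
    checkpoint_dir: Optional[str] = None


def validate_layerwise_checkpoint_dir(layerwise):
    """Reject a checkpoint directory when layerwise calibration is disabled."""
    if layerwise.checkpoint_dir is not None and not layerwise.enable:
        raise ValueError("layerwise.checkpoint_dir is set but layerwise.enable is False")
    return layerwise


ok = validate_layerwise_checkpoint_dir(LayerwiseSketch(enable=True, checkpoint_dir="/tmp/ckpt"))
print(ok)
```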
- class QuantizeConfig
Bases: ModeloptBaseConfig
Default configuration for quantize mode.
- algorithm: str | dict | QuantizeAlgorithmConfig | None | list[str | dict | QuantizeAlgorithmConfig | None]
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod normalize_quant_cfg(v)
Normalize raw quant_cfg input into a list[QuantizerCfgEntry].
Delegates to normalize_quant_cfg_list(), which accepts every supported input shape (new-format list, legacy single-key-dict list, legacy flat dict, and lists containing already-validated QuantizerCfgEntry instances) and rejects anything else with a clear ValueError before pydantic’s field-type check would see it.
- Parameters:
v (Sequence[QuantizerCfgEntry] | Sequence[Mapping[str, Any]] | Mapping[str, Any])
- Return type:
list[QuantizerCfgEntry]
- quant_cfg: list[QuantizerCfgEntry]
- class QuantizerAttributeConfig
Bases: ModeloptBaseConfig
Quantizer attribute type.
- axis: int | tuple[int, ...] | None
- backend: str | None
- backend_extra_args: dict | None
- bias: dict[int | str, Literal['static', 'dynamic'] | Literal['mean', 'max_min'] | tuple[int, ...] | bool | int | None] | None
- block_sizes: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None
- calibrator: str | Callable | tuple
- enable: bool
- fake_quant: bool
- learn_amax: bool
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- narrow_range: bool
- num_bits: int | tuple[int, int] | str
- pass_through_bwd: bool
- rotate: bool | RotateConfig
- trt_high_precision_dtype: str
- type: str
- unsigned: bool
- use_constant_amax: bool
- classmethod validate_bias(v)
Validate bias.
- classmethod validate_block_sizes(v, info)
Validate block sizes.
- Parameters:
info (ValidationInfo)
- classmethod validate_calibrator(v, info)
Validate calibrator.
- Parameters:
info (ValidationInfo)
- classmethod validate_config(values)
Validate quantizer config.
- classmethod validate_learn_amax(v)
Validate learn_amax.
- validate_num_bits()
Validate num_bits.
- class QuantizerCfgEntry
Bases: ModeloptBaseConfig
A single entry in a quant_cfg list.
- cfg: QuantizerAttributeConfig | list[QuantizerAttributeConfig] | None
- enable: bool
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- parent_class: str | None
- quantizer_name: str
- class RotateConfig
Bases: ModeloptBaseConfig
Configuration for rotating quantizer input via Hadamard transform (RHT/QuaRot/SpinQuant).
See normalized_hadamard_transform for transform details.
- block_size: int | None
- enable: bool
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- rotate_fp32: bool
- classmethod validate_block_size(v)
Validate block_size is a positive int (mode=before to catch bool before int coercion).
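The remark about catching bool before int coercion reflects a Python detail: bool is a subclass of int, so True would pass a plain integer check. A minimal standalone sketch of such a check follows; it is a plain function rather than the pydantic validator itself, and the error message is hypothetical.

```python
def validate_block_size(v):
    """Accept None or a positive int; reject bools even though they are ints."""
    if v is None:
        return v
    # The bool test must come first because isinstance(True, int) is True.
    if isinstance(v, bool) or not isinstance(v, int) or v <= 0:
        raise ValueError(f"block_size must be a positive int, got {v!r}")
    return v


checked = [validate_block_size(16), validate_block_size(None)]
print(checked)
```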
- class SVDQuantConfig
Bases: QuantizeAlgorithmConfig
The config for SVDQuant.
Refer to the SVDQuant paper for more details.
- lowrank: int | None
- method: Literal['svdquant']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class SmoothQuantCalibConfig
Bases: QuantizeAlgorithmConfig
The config for the smoothquant algorithm (SmoothQuant).
SmoothQuant applies a smoothing factor that balances the scale of outliers in weights and activations. See the SmoothQuant paper for more details.
- alpha: float | None
- method: Literal['smoothquant']
- model_config = {'extra': 'forbid', 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
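The smoothing factor from the SmoothQuant paper is s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) for each input channel j; activations are divided by s_j and weights multiplied by s_j, which leaves the layer output unchanged. The sketch below works this through on plain lists with hypothetical helper names; it assumes this config's alpha is the same exponent.

```python
def smoothing_factors(act_absmax, weight_absmax, alpha=0.5):
    """Per-input-channel factor s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    return [a**alpha / w ** (1.0 - alpha) for a, w in zip(act_absmax, weight_absmax)]


act_absmax = [16.0, 1.0]    # channel 0 carries an activation outlier
weight_absmax = [1.0, 4.0]
s = smoothing_factors(act_absmax, weight_absmax, alpha=0.5)

# After smoothing, the activation and weight ranges are balanced per channel.
smoothed_act = [a / f for a, f in zip(act_absmax, s)]
smoothed_weight = [w * f for w, f in zip(weight_absmax, s)]

# The output of a dot product is unchanged by the rescaling.
x, w = [8.0, 1.0], [0.5, 2.0]
y_before = sum(a * b for a, b in zip(x, w))
y_after = sum((a / f) * (b * f) for a, b, f in zip(x, w, s))
print(s, smoothed_act, smoothed_weight, y_before, y_after)
```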
- find_quant_cfg_entry_by_path(quant_cfg_list, quantizer_name)
Find the last entry in a quant_cfg list whose quantizer_name key equals the query.
This performs an exact string comparison against the quantizer_name field of each entry; it does not apply fnmatch pattern matching. For example, passing "*input_quantizer" will only match entries whose quantizer_name is literally "*input_quantizer", not entries with a different wildcard that would match the same module names at apply time.
Returns the last match because entries are applied in list order and later entries override earlier ones, so the last match represents the effective configuration.
- Parameters:
quant_cfg_list (list[QuantizerCfgEntry]) – A list of QuantizerCfgEntry dicts.
quantizer_name (str) – The exact quantizer_name string to search for.
- Returns:
The last entry whose quantizer_name equals quantizer_name.
- Raises:
KeyError – If no entry with the given quantizer_name is found.
- Return type:
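The exact-match, last-wins lookup described for find_quant_cfg_entry_by_path can be sketched over plain dicts. find_last_entry is a hypothetical helper that mirrors the documented behavior; it is not the library function.

```python
def find_last_entry(quant_cfg_list, quantizer_name):
    """Return the last entry whose quantizer_name equals the query string exactly."""
    for entry in reversed(quant_cfg_list):
        if entry["quantizer_name"] == quantizer_name:
            return entry
    raise KeyError(quantizer_name)


quant_cfg = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8}},
    {"quantizer_name": "*input_quantizer", "enable": False},
]

last = find_last_entry(quant_cfg, "*input_quantizer")
print(last)

# "*" would match every module name at apply time, but the lookup compares
# strings literally, so "*.input_quantizer" finds nothing.
try:
    find_last_entry(quant_cfg, "*.input_quantizer")
    missing = False
except KeyError:
    missing = True
print(missing)
```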
- need_calibration(config)
Check if calibration is needed for the given config.
- Parameters:
config (QuantizeConfig | Mapping[str, Any])
- Return type:
bool
- normalize_quant_cfg_list(v)
Normalize a raw quant_cfg into a list of QuantizerCfgEntry instances.
Supports the following input forms:
- A list of entries in any of the per-entry forms below.
- A legacy flat dict ({"*": ..., "*weight_quantizer": ...}): each key/value pair is converted to a single-key dict entry and then normalized.
Per-entry forms (when input is a list):
- New format: {"quantizer_name": ..., "enable": ..., "cfg": ...}, passed through.
- Legacy single-key format: {"<quantizer_name>": <cfg_or_dict>}, converted to new format.
- Legacy nn.*-scoped format: {"nn.<Class>": {"<quantizer_name>": <cfg>}}, converted to a new-format entry with parent_class set.
Each normalized dict is then constructed into a QuantizerCfgEntry, whose own validator enforces that every entry specifies cfg, enable, or both, and that any cfg for an enabled quantizer is a non-empty dict or non-empty list of non-empty dicts.
- Parameters:
v (Sequence[QuantizerCfgEntry] | Sequence[Mapping[str, Any]] | Mapping[str, Any]) – A list of raw quant_cfg entries in any supported format, or a legacy flat dict.
- Returns:
A list of validated QuantizerCfgEntry instances.
- Raises:
ValueError – If any entry’s shape is not recognized, or if it fails QuantizerCfgEntry validation (missing instruction or invalid cfg).
- Return type:
list[QuantizerCfgEntry]
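The input forms listed for normalize_quant_cfg_list can be illustrated with a simplified normalizer over plain dicts. This is a sketch under assumptions, not the library's logic: in particular, how a legacy value is split into enable and cfg (an "enable" key is lifted out, and the remaining attributes become cfg) is a guess, and the helper names are hypothetical.

```python
def _from_legacy(name, value):
    """Convert a legacy (name, value) pair into a new-format dict (assumed mapping)."""
    out = {"quantizer_name": name}
    if isinstance(value, dict):
        attrs = dict(value)
        if "enable" in attrs:
            out["enable"] = attrs.pop("enable")
        if attrs:
            out["cfg"] = attrs
    elif isinstance(value, list):
        out["cfg"] = list(value)
    if "cfg" not in out and "enable" not in out:
        raise ValueError(f"entry for {name!r} specifies neither cfg nor enable")
    return out


def normalize_entry(entry):
    if "quantizer_name" in entry:
        return dict(entry)  # new format: passed through
    if len(entry) != 1:
        raise ValueError(f"unrecognized entry shape: {entry!r}")
    ((key, value),) = entry.items()
    if key.startswith("nn."):
        # Legacy nn.*-scoped format: {"nn.<Class>": {"<quantizer_name>": <cfg>}}
        if not isinstance(value, dict) or len(value) != 1:
            raise ValueError(f"unrecognized entry shape: {entry!r}")
        ((name, cfg),) = value.items()
        out = _from_legacy(name, cfg)
        out["parent_class"] = key
        return out
    return _from_legacy(key, value)  # legacy single-key format


def normalize(v):
    if isinstance(v, dict):
        # Legacy flat dict: one single-key entry per key/value pair, order preserved.
        v = [{key: value} for key, value in v.items()]
    return [normalize_entry(entry) for entry in v]


flat = normalize({"*": {"enable": False}, "*weight_quantizer": {"num_bits": 8, "axis": 0}})
scoped = normalize([{"nn.BatchNorm1d": {"*": {"enable": False}}}])
print(flat)
print(scoped)
```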