config

This document lists the quantization formats supported by Model Optimizer and example quantization configs.

Quantization Formats

The following table lists the quantization formats supported by Model Optimizer and the corresponding quantization config. See Quantization Configs for the specific quantization config definitions.

Please see choosing the right quantization formats to learn more about the formats and their use-cases.

Note

The recommended configs given below are for LLMs. For CNN models, only INT8 quantization is supported; use the quantization config INT8_DEFAULT_CFG for them.

Quantization Format	Model Optimizer config
INT8	`INT8_SMOOTHQUANT_CFG`
FP8	`FP8_DEFAULT_CFG`
INT4 Weights only AWQ (W4A16)	`INT4_AWQ_CFG`
INT4-FP8 AWQ (W4A8)	`W4A8_AWQ_BETA_CFG`

Quantization Configs

A quantization config is a dictionary with two top-level keys:

"quant_cfg": an ordered list of QuantizerCfgEntry dicts that specify which quantizers to configure and how.
"algorithm": the calibration algorithm passed to calibrate.

Please see QuantizeConfig for the full config schema.
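As a rough illustration of the two top-level keys, the fragment below shows "algorithm" written once as a plain method name and once as a dict. The method names come from the calibration config classes documented on this page; the attribute values (such as alpha_step = 0.1) are illustrative assumptions, not recommended settings.

```python
# Illustrative fragments only; values are assumptions, not built-in configs.
QUANT_CFG_ENTRIES = [
    {"quantizer_name": "*", "enable": False},
    {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
]

# "algorithm" given as a method name.
CFG_WITH_STRING_ALGORITHM = {
    "quant_cfg": QUANT_CFG_ENTRIES,
    "algorithm": "max",
}

# "algorithm" given as a dict: "method" selects the algorithm and the
# remaining keys are fields of the matching calibration config class.
CFG_WITH_DICT_ALGORITHM = {
    "quant_cfg": QUANT_CFG_ENTRIES,
    "algorithm": {"method": "awq_lite", "alpha_step": 0.1},
}
```

Because the config classes set 'extra': 'forbid', a key that is not a declared field of the selected class should be rejected at validation time rather than silently ignored.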

`quant_cfg` — Entry Format

Each entry in the quant_cfg list is a QuantizerCfgEntry with the following fields:

quantizer_name (required): a wildcard string matched against quantizer module names. Quantizer modules are instances of TensorQuantizer and have names ending with weight_quantizer, input_quantizer, etc.
parent_class (optional): restricts matching to quantizers whose immediate parent module is of this PyTorch class (e.g. "nn.Linear"). If omitted, all matching quantizers are targeted regardless of their parent class.
cfg (optional): a dict of quantizer attributes as defined by QuantizerAttributeConfig, or a list of such dicts. When a list is given, the matched TensorQuantizer is replaced with a SequentialQuantizer that applies each format in sequence. This is used for example in W4A8 quantization where weights are quantized first in INT4 and then in FP8.
enable (optional): toggles matched quantizers on (True) or off (False), independently of cfg. When cfg is present and enable is absent, the quantizer is implicitly enabled. When enable is the only field (no cfg), it only flips the on/off state — all other attributes remain unchanged.
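The three entry shapes described above might look like the following. This is a sketch: the wildcard patterns and attribute values (block size 128, FP8 written as num_bits (4, 3)) are assumptions chosen for illustration and are not copied from any built-in config.

```python
# Illustrative entries only; attribute values are assumptions.
entries = [
    # enable-only entry: flips the on/off state, other attributes are untouched
    {"quantizer_name": "*", "enable": False},
    # cfg without enable: the matched quantizers are implicitly enabled
    {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": (4, 3), "axis": None}},
    # cfg as a list, restricted by parent_class: each matched TensorQuantizer
    # becomes a SequentialQuantizer applying INT4 first and then FP8
    {
        "quantizer_name": "*weight_quantizer",
        "parent_class": "nn.Linear",
        "cfg": [
            {"num_bits": 4, "block_sizes": {-1: 128}},
            {"num_bits": (4, 3), "axis": None},
        ],
    },
]
```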

`quant_cfg` — Ordering and Precedence

Entries are applied in list order; later entries override earlier ones for any quantizer they match. The recommended pattern is:

Start with a deny-all entry {"quantizer_name": "*", "enable": False} (provided as _base_disable_all) to disable every quantizer by default.
Follow with format-specific entries that selectively enable and configure the desired quantizers.
Append _default_disabled_quantizer_cfg to enforce standard exclusions (e.g. BatchNorm layers, LM head, MoE routers).
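The "later entries override earlier ones" rule can be sketched in plain Python. The helper resolve_sketch below is hypothetical and is not part of Model Optimizer. It assumes case-sensitive fnmatch-style wildcard matching, ignores parent_class, and assumes that a later cfg replaces an earlier one wholesale; the library's real behavior may differ in those details.

```python
from fnmatch import fnmatchcase


def resolve_sketch(entries, quantizer_name):
    """Return (enabled, cfg) for one quantizer name after applying entries in order."""
    enabled, cfg = None, None  # None means "no entry has touched this yet"
    for entry in entries:
        if not fnmatchcase(quantizer_name, entry["quantizer_name"]):
            continue
        if "cfg" in entry:
            cfg = entry["cfg"]
            # cfg present and enable absent: implicitly enabled
            enabled = entry.get("enable", True)
        else:
            # enable-only entry: flip the state, keep the attributes
            enabled = entry["enable"]
    return enabled, cfg


entries = [
    {"quantizer_name": "*", "enable": False},  # deny all
    {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_name": "*lm_head*", "enable": False},  # exclusion comes last
]

print(resolve_sketch(entries, "model.layers.0.mlp.weight_quantizer"))
print(resolve_sketch(entries, "lm_head.weight_quantizer"))
print(resolve_sketch(entries, "model.layers.0.mlp.input_quantizer"))
```

In this sketch the lm_head weight quantizer ends up disabled but still carries the 8-bit attributes, which is why the exclusion entries belong at the end of the list.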

To get the string representation of a module class for use in parent_class, do:

import torch.nn as nn

from modelopt.torch.quantization import QuantModuleRegistry

# Get the class name for nn.Conv2d
class_name = QuantModuleRegistry.get_key(nn.Conv2d)

Here is an example of a quantization config:

MY_QUANT_CFG = {
    "quant_cfg": [
        # Deny all quantizers by default
        {"quantizer_name": "*", "enable": False},

        # Enable and configure weight and input quantizers
        {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},

        # Disable input quantizers specifically for LeakyReLU layers
        {"quantizer_name": "*input_quantizer", "parent_class": "nn.LeakyReLU", "enable": False},
    ]
}

Example Quantization Configurations

These example configs can be accessed as attributes of modelopt.torch.quantization and can be given as input to mtq.quantize(). For example:

import modelopt.torch.quantization as mtq
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

You can also create your own config by following these examples. For instance, to quantize a model with the INT4 AWQ algorithm while skipping the layer named lm_head, create a custom config and quantize your model as follows:

import copy

import modelopt.torch.quantization as mtq

# Create a custom config from a deep copy so the built-in config is not modified
CUSTOM_INT4_AWQ_CFG = copy.deepcopy(mtq.INT4_AWQ_CFG)
CUSTOM_INT4_AWQ_CFG["quant_cfg"].append({"quantizer_name": "*lm_head*", "enable": False})

# Quantize the model
model = mtq.quantize(model, CUSTOM_INT4_AWQ_CFG, forward_loop)
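The deepcopy in the example matters because "quant_cfg" is a nested, mutable list. The sketch below uses a stand-in dict (BUILTIN_CFG is hypothetical and does not reproduce the contents of mtq.INT4_AWQ_CFG) to show that a deep copy keeps the original intact, while a shallow copy would share the list.

```python
import copy

# Stand-in for a built-in config; NOT the real contents of mtq.INT4_AWQ_CFG.
BUILTIN_CFG = {
    "quant_cfg": [
        {"quantizer_name": "*", "enable": False},
        {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 4}},
    ],
    "algorithm": "awq_lite",
}

custom = copy.deepcopy(BUILTIN_CFG)
custom["quant_cfg"].append({"quantizer_name": "*lm_head*", "enable": False})

# The deep copy owns its own list, so the original still has two entries.
original_len = len(BUILTIN_CFG["quant_cfg"])
custom_len = len(custom["quant_cfg"])

# A shallow copy shares the nested list; appending to it would also
# change the original config.
shallow = dict(BUILTIN_CFG)
shares_list = shallow["quant_cfg"] is BUILTIN_CFG["quant_cfg"]
```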

Classes

`AWQClipCalibConfig`	The config for `awq_clip` (AWQ clip) algorithm.
`AWQFullCalibConfig`	The config for `awq` or `awq_full` algorithm (AWQ full).
`AWQLiteCalibConfig`	The config for `awq_lite` (AWQ lite) algorithm.
`CompressConfig`	Default configuration for `compress` mode.
`GPTQCalibConfig`	The config for GPTQ quantization.
`LayerwiseConfig`	Nested config for layer-by-layer calibration behavior.
`LocalHessianCalibConfig`	Configuration for local Hessian-weighted MSE calibration.
`MaxCalibConfig`	The config for max calibration algorithm.
`MseCalibConfig`	Configuration for per-tensor MSE calibration.
`QuantizeAlgorithmConfig`	Calibration algorithm config base.
`QuantizeConfig`	Default configuration for `quantize` mode.
`QuantizerAttributeConfig`	Quantizer attribute type.
`QuantizerCfgEntry`	A single entry in a `quant_cfg` list.
`RotateConfig`	Configuration for rotating quantizer input via Hadamard transform (RHT/QuaRot/SpinQuant).
`SVDQuantConfig`	The config for SVDQuant.
`SmoothQuantCalibConfig`	The config for `smoothquant` algorithm (SmoothQuant).

Functions

`find_quant_cfg_entry_by_path`	Find the last entry in a `quant_cfg` list whose `quantizer_name` key equals the query.
`need_calibration`	Check if calibration is needed for the given config.
`normalize_quant_cfg_list`	Normalize a raw quant_cfg into a list of `QuantizerCfgEntry` instances.

class AWQClipCalibConfig

Bases: QuantizeAlgorithmConfig

The config for awq_clip (AWQ clip) algorithm.

AWQ clip searches for a clipped amax for per-group quantization. This search requires much more compute than AWQ lite. To avoid out-of-memory (OOM) errors, the linear layer weights are processed in batches of size max_co_batch_size along the out_features dimension. AWQ clip calibration also takes longer than AWQ lite.

debug: bool | None

max_co_batch_size: int | None

max_tokens_per_batch: int | None

method: Literal['awq_clip']

min_clip_ratio: float | None

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

shrink_step: float | None

class AWQFullCalibConfig

Bases: AWQLiteCalibConfig, AWQClipCalibConfig

The config for awq or awq_full algorithm (AWQ full).

AWQ full performs awq_lite followed by awq_clip.

debug: bool | None

method: Literal['awq_full']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

class AWQLiteCalibConfig

Bases: QuantizeAlgorithmConfig

The config for awq_lite (AWQ lite) algorithm.

AWQ lite applies a channel-wise scaling factor which minimizes the output difference after quantization. See AWQ paper for more details.

alpha_step: float | None

debug: bool | None

method: Literal['awq_lite']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

class CompressConfig

Bases: ModeloptBaseConfig

Default configuration for compress mode.

compress: dict[str, bool]

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

quant_gemm: bool

class GPTQCalibConfig

Bases: QuantizeAlgorithmConfig

The config for GPTQ quantization.

GPTQ minimizes the layer-wise quantization error by using second-order (Hessian) information to perform blockwise weight updates that compensate for rounding loss. Layers are quantized sequentially so that each layer’s Hessian is computed from activations that already reflect the quantization of preceding layers.

The default values are taken from the official GPTQ implementation: https://github.com/IST-DASLab/FP-Quant/blob/d2e3092f968262c4de5fb050e1aef568a280dadd/src/quantization/gptq.py#L35

block_size: int | None

fused: bool

method: Literal['gptq']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

perc_damp: float | None

class LayerwiseConfig

Bases: ModeloptBaseConfig

Nested config for layer-by-layer calibration behavior.

checkpoint_dir: str | None

enable: bool

get_qdq_activations_from_prev_layer: bool

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

save_every: int

class LocalHessianCalibConfig

Bases: _SharedStatesConfig, QuantizeAlgorithmConfig

Configuration for local Hessian-weighted MSE calibration.

This algorithm uses activation information to optimize per-block scales for weight quantization. It minimizes the output reconstruction error by weighting the loss with the local Hessian matrix computed from input activations.

The local Hessian loss for each block is (dw @ H @ dw.T), where:

dw = weight - quantized_weight (weight reconstruction error per block)
H = X @ X.T is the local Hessian computed from input activations X
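A tiny numeric sketch of this loss for a single weight row, in plain Python. The weights, the "quantized" weights, and the activations are made-up values, and the helpers matmul and transpose are local to the sketch. It also checks the equivalent view that dw @ H @ dw.T equals the squared norm of the output error dw @ X.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


# One weight block as a 1 x 3 row vector and a made-up quantized version.
weight = [[0.9, -0.4, 0.25]]
quantized_weight = [[1.0, -0.5, 0.25]]
dw = [[w - q for w, q in zip(weight[0], quantized_weight[0])]]

# Input activations X: 3 input channels x 2 tokens.
X = [
    [1.0, 2.0],
    [0.5, -1.0],
    [3.0, 0.0],
]

H = matmul(X, transpose(X))  # 3 x 3 local Hessian, H = X @ X.T
loss = matmul(matmul(dw, H), transpose(dw))[0][0]  # dw @ H @ dw.T

# Same quantity seen as the squared norm of the output error dw @ X.
output_error = matmul(dw, X)[0]
loss_via_output = sum(e * e for e in output_error)
```

The second form is why weighting by H minimizes output reconstruction error rather than plain weight error: weight errors on channels with large activations cost more.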

block_size: int | None

debug: bool | None

distributed_sync: bool | None

fp8_scale_sweep: bool | None

method: Literal['local_hessian']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

start_multiplier: float | None

step_size: float | None

stop_multiplier: float | None

class MaxCalibConfig

Bases: _SharedStatesConfig, QuantizeAlgorithmConfig

The config for max calibration algorithm.

Max calibration estimates the maximum absolute values of activations or weights and uses these values to set the quantization scaling factor. See Integer Quantization for the concepts.
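A minimal sketch of the idea for symmetric signed integer quantization, assuming the scale is amax / q_max with q_max = 2^(num_bits - 1) - 1. The function names are hypothetical and the library's calibrators may differ (for example, per-channel amax or other number formats).

```python
def max_calibrate_sketch(values, num_bits=8):
    """Return (amax, scale) from the largest absolute value seen."""
    amax = max(abs(v) for v in values)
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    return amax, amax / q_max


def quantize_dequantize(values, scale, num_bits=8):
    """Round to the integer grid, clamp to [-q_max, q_max], then rescale."""
    q_max = 2 ** (num_bits - 1) - 1
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]


values = [0.5, -1.27, 0.3]
amax, scale = max_calibrate_sketch(values)
dequantized = quantize_dequantize(values, scale)
```

Because the scale is derived from the largest magnitude, no value is clipped; the cost is that a single outlier stretches the grid for every other value.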

distributed_sync: bool | None

method: Literal['max']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

sync_expert_weight_amax: bool

class MseCalibConfig

Bases: _SharedStatesConfig, QuantizeAlgorithmConfig

Configuration for per-tensor MSE calibration.

Finds a scale s (via amax a, with s = a / q_max) that minimizes the reconstruction error of a tensor after uniform Q→DQ:

s* = argmin_s E[(W - DQ(Q(W; s)))^2], W ∈ weights

When fp8_scale_sweep is enabled for a supported FP8-scale format, step_size is ignored.
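The search can be sketched as a sweep over candidate amax values. This is an illustration under stated assumptions, not the library's implementation: it assumes symmetric integer quantization, that the candidates are the max-based amax multiplied by factors from start_multiplier to stop_multiplier in increments of step_size, and the default values shown are made up for the example.

```python
def quant_mse(values, amax, num_bits=8):
    """Mean squared error of values after quantize-dequantize with scale amax / q_max."""
    q_max = 2 ** (num_bits - 1) - 1
    scale = amax / q_max
    dq = [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]
    return sum((v - d) ** 2 for v, d in zip(values, dq)) / len(values)


def mse_calibrate_sketch(values, start_multiplier=0.25, stop_multiplier=4.0,
                         step_size=0.25, num_bits=8):
    """Return the (amax, error) pair with the lowest reconstruction error."""
    base_amax = max(abs(v) for v in values)
    n_steps = int(round((stop_multiplier - start_multiplier) / step_size)) + 1
    best_amax, best_err = None, float("inf")
    for i in range(n_steps):
        amax = base_amax * (start_multiplier + i * step_size)
        err = quant_mse(values, amax, num_bits)
        if err < best_err:
            best_amax, best_err = amax, err
    return best_amax, best_err


# Mostly small values plus one outlier, quantized to 4 bits.
values = [0.1, -0.2, 0.15, 0.05, 3.0]
base_amax = max(abs(v) for v in values)
best_amax, best_err = mse_calibrate_sketch(values, num_bits=4)
max_calib_err = quant_mse(values, base_amax, num_bits=4)
```

Since the multiplier 1.0 is among the candidates in this sketch, the selected amax can never do worse than plain max calibration on the calibration data.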

distributed_sync: bool | None

fp8_scale_sweep: bool | None

method: Literal['mse']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

start_multiplier: float | None

step_size: float | None

stop_multiplier: float | None

class QuantizeAlgorithmConfig

Bases: ModeloptBaseConfig

Calibration algorithm config base.

layerwise: LayerwiseConfig

method: Literal[None]

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

moe_calib_experts_ratio: float | None

validate_layerwise_checkpoint_dir(): Raise if layerwise.checkpoint_dir is set but layerwise.enable is False.

class QuantizeConfig

Bases: ModeloptBaseConfig

Default configuration for quantize mode.

algorithm: str | dict | QuantizeAlgorithmConfig | None | list[str | dict | QuantizeAlgorithmConfig | None]

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

classmethod normalize_quant_cfg(v)

Normalize raw quant_cfg input into a list[QuantizerCfgEntry].

Delegates to normalize_quant_cfg_list(), which accepts every supported input shape (new-format list, legacy single-key-dict list, legacy flat dict, and lists containing already-validated QuantizerCfgEntry instances) and rejects anything else with a clear ValueError before pydantic’s field-type check would see it.

Parameters: v (Sequence[QuantizerCfgEntry] | Sequence[Mapping[str, Any]] | Mapping[str, Any])
Return type: list[QuantizerCfgEntry]

quant_cfg: list[QuantizerCfgEntry]

class QuantizerAttributeConfig

Bases: ModeloptBaseConfig

Quantizer attribute type.

axis: int | tuple[int, ...] | None

backend: str | None

backend_extra_args: dict | None

bias: dict[int | str, Literal['static', 'dynamic'] | Literal['mean', 'max_min'] | tuple[int, ...] | bool | int | None] | None

block_sizes: dict[int | str, int | tuple[int, int] | str | dict[int, int] | None] | None

calibrator: str | Callable | tuple

enable: bool

fake_quant: bool

learn_amax: bool

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

narrow_range: bool

num_bits: int | tuple[int, int] | str

pass_through_bwd: bool

rotate: bool | RotateConfig

trt_high_precision_dtype: str

type: str

unsigned: bool

use_constant_amax: bool

classmethod validate_bias(v): Validate bias.

classmethod validate_block_sizes(v, info)

Validate block sizes.

Parameters: info (ValidationInfo)

classmethod validate_calibrator(v, info)

Validate calibrator.

Parameters: info (ValidationInfo)

classmethod validate_config(values): Validate quantizer config.

classmethod validate_learn_amax(v): Validate learn_amax.

validate_num_bits(): Validate num_bits.

class QuantizerCfgEntry

Bases: ModeloptBaseConfig

A single entry in a quant_cfg list.

cfg: QuantizerAttributeConfig | list[QuantizerAttributeConfig] | None

enable: bool

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

parent_class: str | None

quantizer_name: str

class RotateConfig

Bases: ModeloptBaseConfig

Configuration for rotating quantizer input via Hadamard transform (RHT/QuaRot/SpinQuant).

See normalized_hadamard_transform for transform details.
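For intuition, here is a plain-Python sketch of a normalized Hadamard transform using the Sylvester construction, scaled by 1/sqrt(n) so that it is orthogonal. The function names are hypothetical and this is not the library's normalized_hadamard_transform, which may differ (for example in how block_size is applied).

```python
import math


def hadamard_matrix(n):
    """Sylvester construction of an n x n Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def hadamard_transform_sketch(x):
    """Apply the Hadamard matrix scaled by 1/sqrt(n) to the vector x."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(h * v for h, v in zip(row, x)) for row in hadamard_matrix(n)]


x = [8.0, 0.0, 0.0, 0.0]                  # a single large outlier
rotated = hadamard_transform_sketch(x)     # energy spread evenly across entries
restored = hadamard_transform_sketch(rotated)
```

The rotation preserves the vector norm while spreading the outlier across all entries, which reduces the dynamic range a quantizer has to cover. With this symmetric construction, applying the transform twice restores the input.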

block_size: int | None

enable: bool

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

rotate_fp32: bool

classmethod validate_block_size(v): Validate block_size is a positive int (mode=before to catch bool before int coercion).

class SVDQuantConfig

Bases: QuantizeAlgorithmConfig

The config for SVDQuant.

Refer to the SVDQuant paper for more details.

lowrank: int | None

method: Literal['svdquant']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

class SmoothQuantCalibConfig

Bases: QuantizeAlgorithmConfig

The config for smoothquant algorithm (SmoothQuant).

SmoothQuant applies a smoothing factor which balances the scale of outliers in weights and activations. See SmoothQuant paper for more details.
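A small sketch of the smoothing step, following the per-channel formula from the SmoothQuant paper, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha). The data and the helper name are made up, and Model Optimizer's implementation may differ in details. Activations are divided by s and the matching weight rows are multiplied by s, so the layer output is unchanged.

```python
def smoothing_factors(act_absmax, weight_absmax, alpha=0.5):
    """Per-input-channel factors s_j = a_j**alpha / w_j**(1 - alpha)."""
    return [(a ** alpha) / (w ** (1.0 - alpha)) for a, w in zip(act_absmax, weight_absmax)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


X = [[16.0, 1.0], [-8.0, 0.5]]  # 2 tokens x 2 input channels; channel 0 has outliers
W = [[1.0, -0.5], [4.0, 2.0]]   # 2 input channels x 2 output channels

act_absmax = [max(abs(row[j]) for row in X) for j in range(2)]  # per input channel
weight_absmax = [max(abs(v) for v in row) for row in W]         # per input channel
s = smoothing_factors(act_absmax, weight_absmax, alpha=0.5)

X_smooth = [[x / sj for x, sj in zip(row, s)] for row in X]
W_smooth = [[w * sj for w in row] for row, sj in zip(W, s)]

reference = matmul(X, W)
smoothed = matmul(X_smooth, W_smooth)
```

With alpha = 0.5 the per-channel maxima of the smoothed activations and smoothed weights become equal, so the quantization difficulty is shared between the two.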

alpha: float | None

method: Literal['smoothquant']

model_config = {'extra': 'forbid', 'validate_assignment': True}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict (pydantic.config.ConfigDict).

find_quant_cfg_entry_by_path(quant_cfg_list, quantizer_name)

Find the last entry in a quant_cfg list whose quantizer_name key equals the query.

This performs an exact string comparison against the quantizer_name field of each entry — it does not apply fnmatch pattern matching. For example, passing "*input_quantizer" will only match entries whose quantizer_name is literally "*input_quantizer", not entries with a different wildcard that would match the same module names at apply time.

Returns the last match because entries are applied in list order and later entries override earlier ones, so the last match represents the effective configuration.

Parameters:

quant_cfg_list (list[QuantizerCfgEntry]) – A list of QuantizerCfgEntry dicts.
quantizer_name (str) – The exact quantizer_name string to search for.

Returns:

The last entry whose quantizer_name equals quantizer_name.

Raises:

KeyError – If no entry with the given quantizer_name is found.

Return type:

QuantizerCfgEntry
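The lookup semantics described above (exact string comparison, last match wins, KeyError when nothing matches) can be mirrored in a few lines. The function find_entry_sketch is a hypothetical re-implementation over plain dicts for illustration, not the library function itself.

```python
def find_entry_sketch(quant_cfg_list, quantizer_name):
    """Return the last entry whose "quantizer_name" equals the query exactly."""
    for entry in reversed(quant_cfg_list):
        if entry["quantizer_name"] == quantizer_name:
            return entry
    raise KeyError(quantizer_name)


entries = [
    {"quantizer_name": "*input_quantizer", "cfg": {"num_bits": 8}},
    {"quantizer_name": "*.input_quantizer", "cfg": {"num_bits": 4}},
    {"quantizer_name": "*input_quantizer", "enable": False},
]

# The last literal match is returned; "*.input_quantizer" is a different
# string and is skipped even though it would match similar module names.
last = find_entry_sketch(entries, "*input_quantizer")

try:
    find_entry_sketch(entries, "*weight_quantizer")
    missing_raises = False
except KeyError:
    missing_raises = True
```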

need_calibration(config)

Check if calibration is needed for the given config.

Parameters: config (QuantizeConfig | Mapping[str, Any])
Return type: bool

normalize_quant_cfg_list(v)

Normalize a raw quant_cfg into a list of QuantizerCfgEntry instances.

Supports the following input forms:

A list of entries in any of the per-entry forms below.
A legacy flat dict ({"*": ..., "*weight_quantizer": ...}) — each key/value pair is converted to a single-key dict entry and then normalized.

Per-entry forms (when input is a list):

New format: {"quantizer_name": ..., "enable": ..., "cfg": ...} — passed through.
Legacy single-key format: {"<quantizer_name>": <cfg_or_dict>} — converted to new format.
Legacy nn.*-scoped format: {"nn.<Class>": {"<quantizer_name>": <cfg>}} — converted to a new-format entry with parent_class set.

Each normalized dict is then constructed into a QuantizerCfgEntry, whose own validator enforces that every entry specifies cfg, enable, or both, and that any cfg for an enabled quantizer is a non-empty dict or non-empty list of non-empty dicts.

Parameters: v (Sequence[QuantizerCfgEntry] | Sequence[Mapping[str, Any]] | Mapping[str, Any]) – A list of raw quant_cfg entries in any supported format, or a legacy flat dict.
Returns: A list of validated QuantizerCfgEntry instances.
Raises: ValueError – If any entry’s shape is not recognized, or if it fails QuantizerCfgEntry validation (missing instruction or invalid cfg).
Return type: list[QuantizerCfgEntry]
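The shape conversions listed above can be sketched over plain dicts. This is a simplified, hypothetical illustration (normalize_sketch is not the library function): it only rewrites the shapes, places legacy values under "cfg" as-is, assumes one quantizer name per nn.*-scoped entry, and performs none of the QuantizerCfgEntry validation.

```python
def normalize_entry_sketch(raw):
    """Convert one raw entry (new or legacy shape) to a new-format dict."""
    if not isinstance(raw, dict):
        raise ValueError(f"unrecognized entry: {raw!r}")
    if "quantizer_name" in raw:
        return dict(raw)  # new format: passed through
    if len(raw) != 1:
        raise ValueError(f"unrecognized entry: {raw!r}")
    key, value = next(iter(raw.items()))
    if key.startswith("nn."):
        # legacy nn.*-scoped format: {"nn.<Class>": {"<quantizer_name>": <cfg>}}
        if not isinstance(value, dict) or len(value) != 1:
            raise ValueError(f"unrecognized entry: {raw!r}")
        name, cfg = next(iter(value.items()))
        return {"quantizer_name": name, "parent_class": key, "cfg": cfg}
    # legacy single-key format: {"<quantizer_name>": <cfg>}
    return {"quantizer_name": key, "cfg": value}


def normalize_sketch(raw):
    """Accept a list of entries or a legacy flat dict; return new-format dicts."""
    if isinstance(raw, dict):
        raw = [{k: v} for k, v in raw.items()]  # flat dict -> single-key entries
    return [normalize_entry_sketch(entry) for entry in raw]


from_flat_dict = normalize_sketch({"*weight_quantizer": {"num_bits": 8}})
from_scoped = normalize_sketch([{"nn.Linear": {"*input_quantizer": {"num_bits": 8}}}])
```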
