conversion

Quantization conversion/restore utilities.

Functions

register

Register a quantized class for the given un-quantized original class.

replace_quant_module

Recursively replace modules in the model with their registered quantized counterparts.

set_quantizer_attributes_full

Set quantizer attributes by wildcard or filter function, fully overwriting existing attributes.

set_quantizer_attributes_partial

Update a subset of quantizer attributes by wildcard or filter function, merging with existing attributes.

set_quantizer_by_cfg

Apply a quantization config list to the quantizers in quant_model.

set_quantizer_by_cfg_context

Context manager that temporarily applies a quantization config and restores the original state on exit.

unregister

Unregister the quantized class for the given un-quantized original class.

register(original_cls, quantized_cls)

Register a quantized class for the given un-quantized original class.

Parameters:
  • original_cls (Module) – The original un-quantized class.

  • quantized_cls (Module) – The quantized class. This class should have a _setup method that initializes the quantizers invoked during the forward pass. The forward method of the quantized class should call these quantizers at the correct locations.

Here is an example of defining a quantized class and registering it:

import torch.nn as nn
import torch.nn.functional as F

import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.nn import TensorQuantizer


class QuantLayerNorm(nn.LayerNorm):
    def __init__(self, normalized_shape):
        super().__init__(normalized_shape)
        self._setup()

    def _setup(self):
        # Method to setup the quantizers
        self.input_quantizer = TensorQuantizer()
        self.weight_quantizer = TensorQuantizer()

    def forward(self, input):
        input = self.input_quantizer(input)
        weight = self.weight_quantizer(self.weight)
        return F.layer_norm(input, self.normalized_shape, weight, self.bias, self.eps)


# Register the custom quantized module
mtq.register(original_cls=nn.LayerNorm, quantized_cls=QuantLayerNorm)
replace_quant_module(model, version=None, registry=<modelopt.torch.opt.dynamic._DMRegistryCls object>)

Recursively replace modules in the model with their registered quantized counterparts.

Parameters:

model (Module) – The model whose modules are replaced in place with their quantized counterparts.

set_quantizer_attributes_full(quant_model, wildcard_or_filter_func, attributes, parent_class=None)

Set quantizer attributes by wildcard or filter function, fully overwriting existing attributes.

Unlike set_quantizer_attributes_partial(), this function requires a complete QuantizerAttributeConfig and replaces the matched quantizer’s attributes entirely rather than merging with existing ones.

Parameters:
  • quant_model (Module) – A PyTorch model.

  • wildcard_or_filter_func (str | Callable) – A wildcard string or a filter function. The wildcard string is matched against the quantizer module names. The quantizer modules are instances of TensorQuantizer. The filter function takes a quantizer module name as input and returns True if the quantizer should be adjusted and False otherwise.

  • attributes (QuantizerAttributeConfig | list[QuantizerAttributeConfig]) – A QuantizerAttributeConfig (or a list of them) that fully replaces the matched quantizer’s current attributes. All fields of the config are applied — unspecified fields revert to their defaults. If attributes is a list, the matched TensorQuantizer modules will be replaced with SequentialQuantizer modules having one quantizer per attribute instance in the list. See set_from_attribute_config() for details on supported attributes and their types.

  • parent_class (type[Module] | None) – (Optional) Restrict matching to quantizers whose immediate parent module is an instance of this class. If None, all quantizers matching wildcard_or_filter_func are adjusted.
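The wildcard-or-filter dispatch described above can be sketched with the stdlib fnmatch module. This is a semantic illustration only; the quantizer names below are hypothetical examples, not names produced by any particular model.

```python
from fnmatch import fnmatch


def matches(quantizer_name: str, wildcard_or_filter) -> bool:
    """Return True if a quantizer name is selected by a wildcard or filter."""
    if isinstance(wildcard_or_filter, str):
        # Wildcard string, matched against the quantizer module name.
        return fnmatch(quantizer_name, wildcard_or_filter)
    if callable(wildcard_or_filter):
        # Filter function: takes the name, returns True if it should be adjusted.
        return wildcard_or_filter(quantizer_name)
    raise TypeError("expected a wildcard string or a callable filter")


# Hypothetical quantizer module names, for illustration only.
names = [
    "layers.0.attn.q_proj.input_quantizer",
    "layers.0.attn.q_proj.weight_quantizer",
    "layers.0.mlp.fc1.weight_quantizer",
]

# Wildcard form: select all weight quantizers.
selected = [n for n in names if matches(n, "*weight_quantizer")]

# Filter-function form: select only quantizers inside attention blocks.
attn_only = [n for n in names if matches(n, lambda n: ".attn." in n)]
```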

set_quantizer_attributes_partial(quant_model, wildcard_or_filter_func, partial_attributes, parent_class=None)

Update a subset of quantizer attributes by wildcard or filter function, merging with existing attributes.

Unlike set_quantizer_attributes_full(), this function accepts an arbitrary subset of quantizer attributes as a plain dict and merges them into the matched quantizer’s current attributes, leaving unspecified attributes unchanged.

Parameters:
  • quant_model (Module) – A PyTorch model.

  • wildcard_or_filter_func (str | Callable) – A wildcard string or a filter function. The wildcard string is matched against the quantizer module names. The quantizer modules are instances of TensorQuantizer. The filter function takes a quantizer module name as input and returns True if the quantizer should be adjusted and False otherwise.

  • partial_attributes (dict[str, Any] | list[dict[str, Any]]) – A dict (or a list of dict) containing only the attributes to update. Keys must be valid fields of QuantizerAttributeConfig. Only the specified keys are written; all other attributes on the quantizer remain unchanged. When a dict is passed and the matched module is a SequentialQuantizer, the dict is broadcast to every sub-quantizer. When a list is passed, the matched module must already be a SequentialQuantizer — unlike set_quantizer_attributes_full(), this function will not replace a TensorQuantizer with a SequentialQuantizer. See set_from_attribute_config() for details on supported attributes and their types.

  • parent_class (type[Module] | None) – (Optional) Restrict matching to quantizers whose immediate parent module is an instance of this class. If None, all quantizers matching wildcard_or_filter_func are adjusted.
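The difference between full overwrite and partial merge can be illustrated with plain dicts. This is a semantic sketch only; real quantizer attributes live on TensorQuantizer modules, and the field names below are hypothetical.

```python
# Hypothetical default attribute values every quantizer starts from.
DEFAULTS = {"num_bits": 8, "axis": None, "enable": True}


def set_full(current: dict, attributes: dict) -> dict:
    """Full overwrite: unspecified fields revert to their defaults."""
    return {**DEFAULTS, **attributes}


def set_partial(current: dict, partial_attributes: dict) -> dict:
    """Partial update: unspecified fields keep their current values."""
    return {**current, **partial_attributes}


state = {"num_bits": 4, "axis": 0, "enable": True}

# Full replacement: axis falls back to its default (None).
full = set_full(state, {"num_bits": 8})

# Partial merge: axis keeps its current value (0).
partial = set_partial(state, {"num_bits": 8})
```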

set_quantizer_by_cfg(quant_model, quant_cfg)

Apply a quantization config list to the quantizers in quant_model.

quant_cfg is an ordered list of QuantizerCfgEntry dicts. Each entry has the following fields:

  • quantizer_path (required): wildcard matched against quantizer module names via fnmatch().

  • cfg (optional): a dict of QuantizerAttributeConfig fields, or a list of such dicts for sequential quantization.

  • enable (optional): True or False to toggle matched quantizers on or off. When omitted but cfg is present, defaults to True. Every entry must specify at least one of cfg or enable — an entry with only quantizer_path is invalid.

  • parent_class (optional): restricts matching to quantizers whose immediate parent module is an instance of this PyTorch class.

Ordering and atomicity: entries are applied in list order; later entries override earlier ones for any quantizer they match. Each entry with a cfg is a complete replacement — unspecified attributes revert to their defaults rather than inheriting from a prior entry. The typical pattern is to deny all first ({"quantizer_path": "*", "enable": False}), then selectively enable and configure target quantizers in subsequent entries.
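The deny-all-then-enable pattern and the in-order, later-entries-override application can be simulated with stdlib fnmatch. This sketch covers only ordering and the enable semantics, not the full-replacement-from-defaults behavior; the quantizer names and cfg fields are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical quantizer names, all enabled initially.
quantizers = {
    name: {"enable": True}
    for name in [
        "fc1.input_quantizer",
        "fc1.weight_quantizer",
        "lm_head.weight_quantizer",
    ]
}

# Deny all first, then selectively enable and configure.
quant_cfg = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "fc1.*", "cfg": {"num_bits": 8}},  # implicitly enabled
]

for entry in quant_cfg:  # entries are applied in list order
    for name, attrs in quantizers.items():
        if fnmatch(name, entry["quantizer_path"]):
            if "cfg" in entry:
                attrs.update(entry["cfg"])
            # enable defaults to True when cfg is present.
            attrs["enable"] = entry.get("enable", True)

enabled = sorted(n for n, a in quantizers.items() if a["enable"])
```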

``enable`` and ``cfg`` are independent:

  • An entry with cfg (and optionally enable) fully replaces the matched quantizer’s attributes. If enable is omitted, the quantizer is implicitly enabled.

  • {"enable": False} without cfg only toggles the matched quantizers off, leaving all other attributes unchanged.

  • {"enable": True} without cfg only toggles the matched quantizers on, using whatever attributes they currently have (or their defaults if never configured).

See Quantization Configuration (quant_cfg) for the full format reference and common patterns.

Parameters:
  • quant_model (Module) – A quantized PyTorch model.

  • quant_cfg (list[QuantizerCfgEntry]) – An ordered list of QuantizerCfgEntry dicts, applied in order as described above.
set_quantizer_by_cfg_context(quant_model, quant_cfg)

Context manager that temporarily applies a quantization config and restores the original state on exit.

Calls set_quantizer_by_cfg() on entry and reverts every TensorQuantizer in quant_model to its original attributes on exit.

Caution

Changing stateful attributes such as calibrator inside this context may produce unexpected behavior because those objects are not deep-copied during save/restore.

Parameters:
  • quant_model (Module) – A quantized PyTorch model whose quantizers will be temporarily reconfigured.

  • quant_cfg (list[QuantizerCfgEntry]) – A quantization config (or list of QuantizerCfgEntry dicts) passed directly to set_quantizer_by_cfg(). Sequential cfg lists are not allowed.

Yields:

None — the context body runs with the new quantizer attributes active.
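A minimal sketch of the save-on-entry/restore-on-exit pattern this context manager follows, using a plain dict in place of quantizer state. This is not the modelopt implementation; the shallow copy also mirrors the caution above, since mutable values would be shared rather than deep-copied.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attributes(state: dict, overrides: dict):
    """Apply overrides on entry; restore the original values on exit."""
    saved = dict(state)  # shallow copy: mutable values are shared (see Caution)
    state.update(overrides)
    try:
        yield
    finally:
        # Restore the saved attributes even if the body raised.
        state.clear()
        state.update(saved)


quantizer_state = {"num_bits": 8, "enable": True}

with temporary_attributes(quantizer_state, {"num_bits": 4}):
    inside = dict(quantizer_state)  # overrides are active inside the context

after = dict(quantizer_state)  # original state is restored on exit
```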

unregister(original_cls)

Unregister the quantized class for the given un-quantized original class.

Parameters:

original_cls (Module) – The original un-quantized class.