conversion
Quantization conversion/restore utilities.
Functions
- register: Register a quantized class for the given un-quantized original class.
- replace_quant_module: Recursively replace modules in the model with their quantized counterparts.
- set_quantizer_attributes_full: Set quantizer attributes by wildcard or filter function, fully overwriting existing attributes.
- set_quantizer_attributes_partial: Update a subset of quantizer attributes by wildcard or filter function, merging with existing attributes.
- set_quantizer_by_cfg: Apply a quantization config list to the quantizers in quant_model.
- set_quantizer_by_cfg_context: Context manager that temporarily applies a quantization config and restores the original state on exit.
- unregister: Unregister the quantized class for the given un-quantized original class.
- register(original_cls, quantized_cls)
Register a quantized class for the given un-quantized original class.
- Parameters:
original_cls (Module) – The original un-quantized class.
quantized_cls (Module) – The quantized class. This class should have a `_setup` method that initializes the quantizers called in `forward`. The `forward` method of the quantized class should invoke these quantizers at the correct locations.
Here is an example of defining a quantized class and registering it:
```python
import torch.nn as nn
import torch.nn.functional as F

import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.nn import TensorQuantizer


class QuantLayerNorm(nn.LayerNorm):
    def __init__(self, normalized_shape):
        super().__init__(normalized_shape)
        self._setup()

    def _setup(self):
        # Method to setup the quantizers
        self.input_quantizer = TensorQuantizer()
        self.weight_quantizer = TensorQuantizer()

    def forward(self, input):
        input = self.input_quantizer(input)
        weight = self.weight_quantizer(self.weight)
        return F.layer_norm(input, self.normalized_shape, weight, self.bias, self.eps)


# Register the custom quantized module
mtq.register(original_cls=nn.LayerNorm, quantized_cls=QuantLayerNorm)
```
- replace_quant_module(model, version=None, registry=<modelopt.torch.opt.dynamic._DMRegistryCls object>)
Recursively replace modules in the model with their registered quantized counterparts.
- Parameters:
model (Module)
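The replacement walk can be pictured with a small stdlib-only sketch. `Module`, `Linear`, `QuantLinear`, and `REGISTRY` below are toy stand-ins, not modelopt types; the real function works on PyTorch modules and the dynamic-module registry:

```python
# Toy stand-ins for illustration only; modelopt's real registry and
# module types are different.
class Module:
    def __init__(self, **children):
        self.children = dict(children)

class Linear(Module):
    pass

class QuantLinear(Module):
    pass

# Maps an original class to its registered quantized class.
REGISTRY = {Linear: QuantLinear}

def replace_quant_module(module):
    """Recursively swap registered children for their quantized classes."""
    for name, child in list(module.children.items()):
        if type(child) in REGISTRY:
            module.children[name] = REGISTRY[type(child)](**child.children)
        replace_quant_module(module.children[name])

model = Module(fc1=Linear(), block=Module(fc2=Linear()))
replace_quant_module(model)
print(type(model.children["fc1"]).__name__)                    # QuantLinear
print(type(model.children["block"].children["fc2"]).__name__)  # QuantLinear
```

The recursion matters: replacement reaches nested submodules, not just the top-level children.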
- set_quantizer_attributes_full(quant_model, wildcard_or_filter_func, attributes, parent_class=None)
Set quantizer attributes by wildcard or filter function, fully overwriting existing attributes.
Unlike `set_quantizer_attributes_partial()`, this function requires a complete `QuantizerAttributeConfig` and replaces the matched quantizer's attributes entirely rather than merging with existing ones.
- Parameters:
quant_model (Module) – A PyTorch model.
wildcard_or_filter_func (str | Callable) – A wildcard string or a filter function. The wildcard string is matched against the quantizer module names. The quantizer modules are instances of `TensorQuantizer`. The filter function takes a quantizer module name as input and returns `True` if the quantizer should be adjusted and `False` otherwise.
attributes (QuantizerAttributeConfig | list[QuantizerAttributeConfig]) – A `QuantizerAttributeConfig` (or a list of them) that fully replaces the matched quantizer's current attributes. All fields of the config are applied; unspecified fields revert to their defaults. If `attributes` is a list, the matched `TensorQuantizer` modules will be replaced with `SequentialQuantizer` modules having one quantizer per attribute instance in the list. See `set_from_attribute_config()` for details on supported attributes and their types.
parent_class (type[Module] | None) – (Optional) Restrict matching to quantizers whose immediate parent module is an instance of this class. If `None`, all quantizers matching `wildcard_or_filter_func` are adjusted.
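The `wildcard_or_filter_func` dispatch can be illustrated with a stdlib-only sketch; the quantizer names and the `matches` helper below are hypothetical, but the str-vs-callable behavior follows the documented contract:

```python
from fnmatch import fnmatch
from typing import Callable, Union

# Hypothetical quantizer module names, as produced by model.named_modules()
names = [
    "layers.0.attn.q_proj.input_quantizer",
    "layers.0.attn.q_proj.weight_quantizer",
    "layers.0.mlp.fc1.weight_quantizer",
]

def matches(name: str, wildcard_or_filter_func: Union[str, Callable[[str], bool]]) -> bool:
    # Mirrors the documented dispatch: a str is a wildcard pattern,
    # a callable is a predicate on the module name.
    if isinstance(wildcard_or_filter_func, str):
        return fnmatch(name, wildcard_or_filter_func)
    return wildcard_or_filter_func(name)

print([n for n in names if matches(n, "*weight_quantizer")])
print([n for n in names if matches(n, lambda n: "attn" in n)])
```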
- set_quantizer_attributes_partial(quant_model, wildcard_or_filter_func, partial_attributes, parent_class=None)
Update a subset of quantizer attributes by wildcard or filter function, merging with existing attributes.
Unlike `set_quantizer_attributes_full()`, this function accepts an arbitrary subset of quantizer attributes as a plain `dict` and merges them into the matched quantizer's current attributes, leaving unspecified attributes unchanged.
- Parameters:
quant_model (Module) – A PyTorch model.
wildcard_or_filter_func (str | Callable) – A wildcard string or a filter function. The wildcard string is matched against the quantizer module names. The quantizer modules are instances of `TensorQuantizer`. The filter function takes a quantizer module name as input and returns `True` if the quantizer should be adjusted and `False` otherwise.
partial_attributes (dict[str, Any] | list[dict[str, Any]]) – A `dict` (or a list of `dict`) containing only the attributes to update. Keys must be valid fields of `QuantizerAttributeConfig`. Only the specified keys are written; all other attributes on the quantizer remain unchanged. When a `dict` is passed and the matched module is a `SequentialQuantizer`, the dict is broadcast to every sub-quantizer. When a `list` is passed, the matched module must already be a `SequentialQuantizer`; unlike `set_quantizer_attributes_full()`, this function will not replace a `TensorQuantizer` with a `SequentialQuantizer`. See `set_from_attribute_config()` for details on supported attributes and their types.
parent_class (type[Module] | None) – (Optional) Restrict matching to quantizers whose immediate parent module is an instance of this class. If `None`, all quantizers matching `wildcard_or_filter_func` are adjusted.
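The difference between the full overwrite and the partial merge reduces to two dict operations. The attribute names, defaults, and current values below are illustrative, not the real `QuantizerAttributeConfig` fields:

```python
# Illustrative defaults and current quantizer state; real
# QuantizerAttributeConfig fields differ.
defaults = {"num_bits": 8, "axis": None, "enable": True}
current = {"num_bits": 8, "axis": 1, "enable": False}

# set_quantizer_attributes_full: unspecified fields revert to defaults.
full_cfg = {"num_bits": 4, "axis": 0}
after_full = {**defaults, **full_cfg}

# set_quantizer_attributes_partial: only the listed keys change;
# everything else keeps its current value.
partial_cfg = {"num_bits": 4}
after_partial = {**current, **partial_cfg}

print(after_full)     # enable reverted to its default True
print(after_partial)  # axis and enable untouched
```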
- set_quantizer_by_cfg(quant_model, quant_cfg)
Apply a quantization config list to the quantizers in `quant_model`.
`quant_cfg` is an ordered list of `QuantizerCfgEntry` dicts. Each entry has the following fields:
- `quantizer_path` (required): wildcard matched against quantizer module names via `fnmatch()`.
- `cfg` (optional): a dict of `QuantizerAttributeConfig` fields, or a list of such dicts for sequential quantization.
- `enable` (optional): `True` or `False` to toggle matched quantizers on or off. When omitted but `cfg` is present, defaults to `True`. Every entry must specify at least one of `cfg` or `enable`; an entry with only `quantizer_path` is invalid.
- `parent_class` (optional): restricts matching to quantizers whose immediate parent module is of this PyTorch class name.
Ordering and atomicity: entries are applied in list order; later entries override earlier ones for any quantizer they match. Each entry with a `cfg` is a complete replacement: unspecified attributes revert to their defaults rather than inheriting from a prior entry. The typical pattern is to deny all first (`{"quantizer_path": "*", "enable": False}`), then selectively enable and configure target quantizers in subsequent entries.
`enable` and `cfg` are independent:
- An entry with `cfg` (and optionally `enable`) fully replaces the matched quantizer's attributes. If `enable` is omitted, the quantizer is implicitly enabled.
- `{"enable": False}` without `cfg` only toggles the matched quantizers off, leaving all other attributes unchanged.
- `{"enable": True}` without `cfg` only toggles the matched quantizers on, using whatever attributes they currently have (or their defaults if never configured).
See Quantization Configuration (quant_cfg) for the full format reference and common patterns.
- Parameters:
quant_model (Module)
quant_cfg (list[QuantizerCfgEntry])
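The ordering rule ("complete replacement, later entries win") can be modeled in a few lines. The `resolve` helper, the attribute names, and the defaults below are hypothetical, not modelopt API; only the entry semantics follow the documentation above:

```python
from fnmatch import fnmatch

# Typical deny-all-then-enable pattern; cfg field values are illustrative.
quant_cfg = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "axis": 0}},
]

DEFAULTS = {"num_bits": 8, "axis": None, "enable": True}

def resolve(name, cfg_list):
    """Hypothetical helper modeling the documented ordering semantics."""
    state = dict(DEFAULTS)
    for entry in cfg_list:
        if not fnmatch(name, entry["quantizer_path"]):
            continue
        if "cfg" in entry:
            # Complete replacement: unspecified fields revert to defaults.
            state = {**DEFAULTS, **entry["cfg"]}
            state["enable"] = entry.get("enable", True)
        else:
            # enable-only entry: toggle without touching other attributes.
            state["enable"] = entry["enable"]
    return state

print(resolve("layers.0.fc1.weight_quantizer", quant_cfg))
print(resolve("layers.0.fc1.input_quantizer", quant_cfg))
```

A weight quantizer first gets disabled by the deny-all entry, then the second entry re-enables and reconfigures it; an input quantizer matches only the first entry and stays off.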
- set_quantizer_by_cfg_context(quant_model, quant_cfg)
Context manager that temporarily applies a quantization config and restores the original state on exit.
Calls `set_quantizer_by_cfg()` on entry and reverts every `TensorQuantizer` in `quant_model` to its original attributes on exit.
Caution: Changing stateful attributes such as `calibrator` inside this context may produce unexpected behavior because those objects are not deep-copied during save/restore.
- Parameters:
quant_model (Module) – A quantized PyTorch model whose quantizers will be temporarily reconfigured.
quant_cfg (list[QuantizerCfgEntry]) – A quantization config (a list of `QuantizerCfgEntry` dicts) passed directly to `set_quantizer_by_cfg()`. Sequential `cfg` lists are not allowed.
- Yields:
None — the context body runs with the new quantizer attributes active.
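The save/restore semantics, including the caution about shallow snapshots, can be sketched with a toy context manager. `ToyQuantizer` and `temporary_config` are illustrations, not modelopt code:

```python
from contextlib import contextmanager

class ToyQuantizer:
    """Stand-in for TensorQuantizer; attributes are illustrative."""
    def __init__(self):
        self.num_bits = 8
        self.enable = True

@contextmanager
def temporary_config(quantizer, **overrides):
    saved = dict(vars(quantizer))          # shallow snapshot: contained
    for key, value in overrides.items():   # objects (e.g. a calibrator)
        setattr(quantizer, key, value)     # are shared, not deep-copied
    try:
        yield
    finally:
        vars(quantizer).clear()
        vars(quantizer).update(saved)

q = ToyQuantizer()
with temporary_config(q, num_bits=4, enable=False):
    print(q.num_bits, q.enable)  # temporary attributes active in the body
print(q.num_bits, q.enable)      # original attributes restored on exit
```

Because the snapshot is shallow, mutating an object held by an attribute inside the context leaks out of it, which is exactly the caution stated above.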
- unregister(original_cls)
Unregister the quantized class for the given un-quantized original class.
- Parameters:
original_cls (Module) – The original un-quantized class.