quant_module

Base class for quantization modules.

Classes

QuantInputBase

Base class for modules where the input is quantized.

QuantLinearConvBase

Base class for quantized linear modules.

QuantModule

A base class for quantized modules.

class QuantInputBase

Bases: QuantModule

Base class for modules where the input is quantized.

default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False, pass_through_bwd=False, backend=None, backend_extra_args=None)
default_quant_desc_output = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False, pass_through_bwd=False, backend=None, backend_extra_args=None)
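
These class-level defaults can be replaced before converting a model, for example to change the bit width for all instances. The sketch below is illustrative only: it assumes QuantizerAttributeConfig accepts the keyword arguments shown in the reprs above, and the import paths are inferred from this module's name rather than confirmed API:

    # Illustrative sketch; import paths are assumptions inferred from the
    # package layout and may differ across versions.
    from modelopt.torch.quantization.config import QuantizerAttributeConfig
    from modelopt.torch.quantization.nn.modules.quant_module import QuantInputBase

    # Swap the 8-bit default for a 4-bit one. Keyword names mirror the
    # fields shown in the default reprs above.
    QuantInputBase.default_quant_desc_input = QuantizerAttributeConfig(
        enable=True,
        num_bits=4,
        fake_quant=True,
        calibrator="max",
    )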
forward(input, *args, **kwargs)

Quantize the input before calling the original forward method.

input_quantizer: TensorQuantizer
output_quantizer: TensorQuantizer
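
The forward contract can be pictured with a small, self-contained toy in plain PyTorch (not the library's implementation): the input passes through input_quantizer before the wrapped forward runs, and output_quantizer is applied afterwards when enabled:

    import torch
    import torch.nn as nn

    class FakeQuantStub(nn.Module):
        """Toy stand-in for TensorQuantizer: symmetric fake quantization."""

        def __init__(self, amax: float = 1.0, num_bits: int = 8):
            super().__init__()
            self.bound = 2 ** (num_bits - 1) - 1
            self.scale = amax / self.bound

        def forward(self, x):
            # Quantize-dequantize: snap to the integer grid but stay in float.
            q = torch.clamp(torch.round(x / self.scale), -self.bound, self.bound)
            return q * self.scale

    class ToyQuantInput(nn.Module):
        """Mimics QuantInputBase.forward: quantize input, then run the original forward."""

        def __init__(self, inner: nn.Module):
            super().__init__()
            self.inner = inner
            self.input_quantizer = FakeQuantStub()
            self.output_quantizer = FakeQuantStub()

        def forward(self, x):
            x = self.input_quantizer(x)  # quantize the input first
            return self.output_quantizer(self.inner(x))

    y = ToyQuantInput(nn.ReLU())(torch.randn(4))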
class QuantLinearConvBase

Bases: QuantInputBase

Base class for quantized linear modules.

Quantized linear modules are modules where both the input and the weight are quantized.

default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False, pass_through_bwd=False, backend=None, backend_extra_args=None)
forward(input, *args, **kwargs)

Quantize the input and the weight before calling the original forward method.

quantize_weight()

Context in which self.weight is quantized.

weight_quantizer: TensorQuantizer | SequentialQuantizer
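
Both behaviors above, input-plus-weight quantization in forward and the quantize_weight() context, can be sketched with a self-contained toy (again, not the library code):

    import contextlib
    import torch
    import torch.nn as nn

    class ToyQuantLinear(nn.Linear):
        """Toy analogue of QuantLinearConvBase: input and weight fake-quantized."""

        def __init__(self, in_features: int, out_features: int):
            super().__init__(in_features, out_features)
            # Toy 8-bit grid; assumes values roughly in [-1, 1].
            self._fake_quant = lambda t: torch.round(t * 127) / 127

        @contextlib.contextmanager
        def quantize_weight(self):
            # Context in which self.weight is (fake-)quantized; the original
            # parameter is restored on exit.
            original = self.weight
            try:
                self.weight = nn.Parameter(self._fake_quant(original.detach()))
                yield
            finally:
                self.weight = original

        def forward(self, x):
            # Quantize the input and the weight before the original forward.
            with self.quantize_weight():
                return super().forward(self._fake_quant(x))

    out = ToyQuantLinear(8, 4)(torch.randn(2, 8))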
class QuantModule

Bases: DynamicModule

A base class for quantized modules.

The class also provides a parallel_state attribute that can be used to access the parallel state of the module.

classmethod convert(module, **setup_kwargs)

Convert the module to a dynamic module.

Parameters:
  • module (Module)

  • setup_kwargs (Any)

Return type:

QuantModule
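
A typical call, assuming a concrete subclass such as QuantLinear is available (the subclass name and import path here are assumptions for illustration, not confirmed by this page):

    import torch.nn as nn
    # ASSUMPTION: QuantLinear and its import path are illustrative.
    from modelopt.torch.quantization.nn import QuantLinear

    linear = nn.Linear(128, 64)
    # convert() wraps the plain module as a dynamic quantized module and
    # returns it typed as a QuantModule.
    qlinear = QuantLinear.convert(linear)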

fold_weight()

Fold the weight for faster eval.
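
Folding conceptually bakes the quantized values into the stored weight once, so evaluation no longer re-quantizes the weight on every forward. A toy illustration of the idea (not the library implementation):

    import torch

    def fold_weight_toy(module):
        """Bake the fake-quantized weight into the parameter for faster eval."""
        with torch.no_grad():
            module.weight.copy_(module.weight_quantizer(module.weight))
        # After folding, weight quantization becomes a no-op at eval time.
        module.weight_quantizer = torch.nn.Identity()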

modelopt_post_restore(prefix='')

Post-restore to correctly configure the TensorQuantizer states.

TensorQuantizer states are restored to their shape before saving; this method then completes their configuration.
  1. For non-sharded modules, this simply involves moving the TensorQuantizer states to the right device. This applies to regular PyTorch models and HuggingFace models.

  2. For sharded modules, the restored TensorQuantizer states could be incorrect, because parallelism such as TP might have changed between saving and restoring. The state shapes must then be re-calculated, so such modules should override this method and implement their own logic.

Parameters:

prefix (str)
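
For the sharded case in point 2, an override would recompute quantizer state shapes from the current parallel layout before the device move. A hypothetical skeleton (every name except modelopt_post_restore is illustrative):

    class ShardedQuantLinear:  # hypothetical; would derive from a QuantModule subclass
        def modelopt_post_restore(self, prefix: str = "") -> None:
            # 1. Recompute per-quantizer state shapes from the CURRENT
            #    tensor-parallel layout; it may differ from save time.
            # 2. Then move the states to the right device, as the base
            #    class does for non-sharded modules.
            ...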

property parallel_state: ParallelState | None

Return the parallel state of the quant module.