quant_module

Base class for quantization modules.

Classes

QuantInputBase
    Base class for modules where the input is quantized.

QuantLinearConvBase
    Base class for quantized linear modules.

QuantModule
    A base class for quantized modules.

class QuantInputBase

Bases: QuantModule

Base class for modules where the input is quantized.

default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False)
default_quant_desc_output = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False)
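
The two class attributes above hold the default quantizer configurations applied to new instances. As a hedged sketch of how a different default could be installed before conversion (the import paths and the override-by-assignment pattern are assumptions inferred from the attribute listing above, not confirmed by this page):

    # Hedged sketch: override the class-level default input quantizer config.
    # Import paths and assignment-based override are assumptions.
    from modelopt.torch.quantization.config import QuantizerAttributeConfig
    from modelopt.torch.quantization.nn.modules.quant_module import QuantInputBase

    # Use 4-bit static fake quantization for inputs of subsequent conversions.
    QuantInputBase.default_quant_desc_input = QuantizerAttributeConfig(
        enable=True, num_bits=4, calibrator="max"
    )
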
forward(input, *args, **kwargs)

Quantize the input before calling the original forward method.

input_quantizer: TensorQuantizer
output_quantizer: TensorQuantizer
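
As a rough illustration of the pattern this class implements, here is a minimal, self-contained sketch. It is not the library source: TensorQuantizer is replaced by a toy fake-quantizer, and all class names are illustrative.

    import torch
    import torch.nn as nn

    class FakeQuantizer(nn.Module):
        """Toy stand-in for TensorQuantizer: symmetric 8-bit fake quantization."""

        def __init__(self, num_bits: int = 8, amax: float = 1.0):
            super().__init__()
            self.num_bits = num_bits
            self.amax = amax

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            qmax = 2 ** (self.num_bits - 1) - 1  # 127 for 8 bits
            scale = self.amax / qmax
            # Round to the integer grid, clamp to range, and dequantize.
            return torch.clamp((x / scale).round(), -qmax, qmax) * scale

    class QuantInputSketch(nn.Module):
        """Wrap a module so its input is fake-quantized before forward."""

        def __init__(self, inner: nn.Module):
            super().__init__()
            self.inner = inner
            self.input_quantizer = FakeQuantizer()

        def forward(self, input, *args, **kwargs):
            # Quantize the input before calling the original forward method.
            return self.inner(self.input_quantizer(input), *args, **kwargs)

For example, QuantInputSketch(nn.Linear(4, 4)) fake-quantizes the activations flowing into the linear layer while leaving the layer itself untouched.
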
class QuantLinearConvBase

Bases: QuantInputBase

Base class for quantized linear modules.

Quantized linear modules are modules where both the input and the weight are quantized.

default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False)
forward(input, *args, **kwargs)

Quantize the input and the weight before calling the original forward method.

static initialize_real_qtensor_with_dummy_weight(module)

Initialize the real quantized tensors.

quantize_weight()

Context manager in which self.weight is quantized.

weight_quantizer: TensorQuantizer | SequentialQuantizer
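
A hedged sketch of the input-plus-weight pattern and the quantize_weight() context described above, reusing the toy FakeQuantizer from the previous sketch. The real class derives this behavior from the wrapped module; the code below is illustrative only.

    import contextlib
    import torch.nn as nn

    class QuantLinearSketch(nn.Module):
        """Wrap nn.Linear so both input and weight are fake-quantized."""

        def __init__(self, linear: nn.Linear):
            super().__init__()
            self.linear = linear
            self.input_quantizer = FakeQuantizer()
            self.weight_quantizer = FakeQuantizer()

        @contextlib.contextmanager
        def quantize_weight(self):
            # Temporarily swap the weight for its fake-quantized version.
            original = self.linear.weight.data
            self.linear.weight.data = self.weight_quantizer(original)
            try:
                yield
            finally:
                self.linear.weight.data = original

        def forward(self, input):
            # Quantize the input and the weight before the original forward.
            with self.quantize_weight():
                return self.linear(self.input_quantizer(input))
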
class QuantModule

Bases: DynamicModule

A base class for quantized modules.

fold_weight()

Fold the weight for faster eval.
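
Conceptually, folding bakes the fake-quantized values into the stored weight once, so eval-time forwards can skip the per-call weight quantization. A hypothetical sketch against the QuantLinearSketch class above (not the library implementation):

    import torch

    @torch.no_grad()
    def fold_weight_sketch(module: "QuantLinearSketch") -> None:
        # Bake the fake-quantized values into the stored weight once...
        module.linear.weight.data = module.weight_quantizer(module.linear.weight.data)
        # ...then make the weight quantizer a no-op for subsequent forwards.
        module.weight_quantizer = torch.nn.Identity()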

modelopt_post_restore(prefix='')

Post-restore to correctly configure the TensorQuantizer states.

TensorQuantizer states are restored to their shape before saving; they now need further configuration:

  1. For non-sharded modules, this simply involves moving the TensorQuantizer states to the right device and dtype. This applies to regular PyTorch models and HuggingFace models.

  2. For sharded modules, the restored TensorQuantizer states could be incorrect, because parallelism such as TP might have changed between saving and restoring. The state shapes must be re-calculated, so such modules should override this method and implement their own logic.

Parameters:

prefix (str) –
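
For intuition, a minimal sketch of step 1 above for the non-sharded case: move each quantizer buffer (e.g. restored calibration statistics) onto the device and dtype of the module's own parameters. The quantizer-detection heuristic and buffer handling are assumptions; sharded modules would instead re-derive state shapes before any such move.

    import torch.nn as nn

    def post_restore_sketch(module: nn.Module) -> None:
        # Assumes the module owns at least one parameter plus quantizer
        # submodules whose buffers were restored on the wrong device/dtype.
        target = next(module.parameters())
        for sub in module.modules():
            if not sub.__class__.__name__.endswith("Quantizer"):
                continue  # heuristic: only touch quantizer submodules
            for name, buf in sub.named_buffers(recurse=False):
                # Re-assigning via setattr keeps the buffer registered.
                setattr(sub, name, buf.to(device=target.device, dtype=target.dtype))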

property mopt_ckpt_versn

Checkpoint version of modelopt.