quant_module

Base classes for quantization modules.

Classes

QuantInputBase

Base class for modules where the input is quantized.

QuantLinearConvBase

Base class for quantized linear modules.

class QuantInputBase

Bases: DynamicModule

Base class for modules where the input is quantized.

default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
default_quant_desc_output = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')

forward(input, *args, **kwargs)

Quantize the input before calling the original forward method.
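The idea of quantizing the input before the original forward can be illustrated with a minimal pure-Python sketch. It does not use the library: `fake_quantize` and `ToyQuantInput` are hypothetical names, and the integer bounds and max-based scaling are assumptions about what `num_bits=8`, `unsigned=False`, `narrow_range=False`, `fake_quant=True`, and `calibrator='max'` commonly mean, not a statement of how `TensorQuantizer` is implemented.

```python
def fake_quantize(values, num_bits=8, unsigned=False, narrow_range=False):
    """Snap values to an integer grid, then map them back to floats.

    Illustrative only; parameter names mirror the config fields above.
    """
    # Assumed 'max' calibration: one scale from the largest magnitude.
    amax = max(abs(v) for v in values)
    if amax == 0:
        return list(values)

    # Assumed integer range: [-128, 127] for signed 8-bit, non-narrow range.
    max_bound = 2 ** (num_bits - 1 + int(unsigned)) - 1
    if unsigned:
        min_bound = 0
    elif narrow_range:
        min_bound = -max_bound
    else:
        min_bound = -max_bound - 1

    scale = max_bound / amax
    out = []
    for v in values:
        q = round(v * scale)                   # quantize
        q = max(min_bound, min(max_bound, q))  # clamp to the integer range
        out.append(q / scale)                  # dequantize ("fake" quantization)
    return out


class ToyQuantInput:
    """Hypothetical stand-in for a module whose input is quantized."""

    def __init__(self, inner):
        self.inner = inner  # plays the role of the original forward

    def forward(self, values):
        # Quantize the input, then call the original forward.
        return self.inner(fake_quantize(values))


layer = ToyQuantInput(sum)
result = layer.forward([0.5, -1.0, 0.25])
```

The output stays in floating point but only takes values representable on the 8-bit grid, which is what `fake_quant=True` suggests.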

input_quantizer: TensorQuantizer | SequentialQuantizer
output_quantizer: TensorQuantizer | SequentialQuantizer

class QuantLinearConvBase

Bases: QuantInputBase

Base class for quantized linear modules.

Quantized linear modules are modules where both the input and the weight are quantized.

default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')

forward(input, *args, **kwargs)

Quantize the input and the weight before calling the original forward method.

static initialize_quantizer_with_dummy_states(module)

Initialize the quantizer states with dummy values with the correct type and device.

static initialize_real_qtensor_with_dummy_weight(module)

Initialize the real quantized tensors.

quantize_weight()

Context in which self.weight is quantized.
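A context in which `self.weight` is quantized can be pictured as a context manager that swaps in a quantized weight and restores the original on exit. The sketch below is a pure-Python illustration under that assumption; `ToyQuantLinear` and `toy_quantize` are hypothetical, and the library's actual mechanism may differ.

```python
from contextlib import contextmanager


def toy_quantize(values, step=0.5):
    """Hypothetical quantizer: snap each value to a multiple of `step`."""
    return [round(v / step) * step for v in values]


class ToyQuantLinear:
    """Hypothetical linear layer with a quantized input and weight."""

    def __init__(self, weight):
        self.weight = list(weight)

    @contextmanager
    def quantize_weight(self):
        # Temporarily replace self.weight with its quantized version.
        original = self.weight
        self.weight = toy_quantize(original)
        try:
            yield
        finally:
            # Always restore the full-precision weight.
            self.weight = original

    def forward(self, values):
        x = toy_quantize(values)      # quantize the input
        with self.quantize_weight():  # quantize the weight for this call only
            return sum(w * v for w, v in zip(self.weight, x))


layer = ToyQuantLinear([0.6, -1.2])
output = layer.forward([1.1, 2.0])
```

Restoring the weight in a `finally` block keeps the stored parameter in full precision even if the wrapped computation raises.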

static sanitize_dummy_weight(module)

Replace nan values with ones in dummy tensors.
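The replacement itself is simple to illustrate. The following pure-Python sketch works on a plain list rather than a tensor; `sanitize` is a hypothetical helper, not the library function.

```python
import math


def sanitize(values):
    """Return a copy of `values` with every NaN replaced by 1.0."""
    return [1.0 if math.isnan(v) else v for v in values]


dummy = [float("nan"), 0.5, float("nan")]
clean = sanitize(dummy)
```

On a PyTorch tensor, `torch.nan_to_num(tensor, nan=1.0)` performs the same replacement; whether `sanitize_dummy_weight` uses it is not stated here.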

weight_quantizer: TensorQuantizer | SequentialQuantizer