quant_module
Base class for quantization modules.
Classes

- QuantInputBase — Base class for modules where the input is quantized.
- QuantLinearConvBase — Base class for quantized linear modules.
- class QuantInputBase
Bases:
DynamicModule
Base class for modules where the input is quantized.
- default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- default_quant_desc_output = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- forward(input, *args, **kwargs)
Quantize the input before calling the original forward method.
- input_quantizer: TensorQuantizer | SequentialQuantizer
- output_quantizer: TensorQuantizer | SequentialQuantizer
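The default quantizer configs above describe static, signed, per-tensor (axis=None) INT8 fake quantization with a "max" calibrator. A minimal sketch of what such a fake quantizer computes, as an illustrative stand-in only (not the library's `TensorQuantizer` implementation):

```python
def fake_quant_max(values, num_bits=8, narrow_range=False):
    """Fake-quantize floats: scale to the int range, round, clamp, rescale.

    Mirrors the documented defaults: signed, per-tensor, "max" calibrator
    (amax = max(|x|)). Illustrative stand-in, not a modelopt API.
    """
    amax = max(abs(v) for v in values)
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    qmin = -qmax if narrow_range else -qmax - 1
    scale = amax / qmax if amax else 1.0
    quantized = []
    for v in values:
        q = min(qmax, max(qmin, round(v / scale)))
        quantized.append(q * scale)           # fake quant: output stays float
    return quantized
```

Because `fake_quant=True`, the output stays in floating point; only the set of representable values is restricted to the quantization grid.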
- class QuantLinearConvBase
Bases:
QuantInputBase
Base class for quantized linear modules.
Quantized linear modules are modules where both the input and the weight are quantized.
- default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, trt_high_precision_dtype='Float', calibrator='max')
- forward(input, *args, **kwargs)
Quantize the input and the weight before calling the original forward method.
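Conceptually, this forward applies the input quantizer and the weight quantizer before the underlying linear op. A pure-Python sketch under that assumption (all names here are illustrative, not modelopt APIs):

```python
def fake_quant(values, num_bits=8):
    """Per-tensor fake quantization with a max calibrator (illustrative)."""
    amax = max(abs(v) for v in values)
    qmax = 2 ** (num_bits - 1) - 1
    scale = amax / qmax if amax else 1.0
    return [min(qmax, max(-qmax - 1, round(v / scale))) * scale for v in values]

def quant_linear_forward(x, weight, bias=0.0):
    """Quantize both the input and the weight, then run a plain 1-D linear op."""
    xq = fake_quant(x)        # plays the role of input_quantizer
    wq = fake_quant(weight)   # plays the role of weight_quantizer
    return sum(a * b for a, b in zip(xq, wq)) + bias
```

The bias is left unquantized, matching the class description: only the input and the weight are quantized.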
- static initialize_quantizer_with_dummy_states(module)
Initialize the quantizer states with dummy values of the correct type and device.
- static initialize_real_qtensor_with_dummy_weight(module)
Initialize the real quantized tensors.
- quantize_weight()
Context in which self.weight is quantized.
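A context manager of this shape typically swaps in the quantized weight on entry and restores the original on exit. A minimal sketch, assuming a quantizer that is a plain callable (the class and attribute names below are illustrative stand-ins):

```python
from contextlib import contextmanager

class FakeQuantModule:
    """Illustrative stand-in: holds a weight and a weight_quantizer callable."""

    def __init__(self, weight, quantizer):
        self.weight = weight
        self.weight_quantizer = quantizer

    @contextmanager
    def quantize_weight(self):
        """Temporarily replace self.weight with its quantized version."""
        original = self.weight
        self.weight = self.weight_quantizer(self.weight)
        try:
            yield
        finally:
            self.weight = original  # restore the unquantized weight on exit
```

Restoring in a `finally` block guarantees the original weight comes back even if the body of the `with` statement raises.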
- static sanitize_dummy_weight(module)
Replace nan values with ones in dummy tensors.
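The NaN replacement described above can be sketched in one line of pure Python (a hypothetical helper, not the static method's actual implementation, which operates on tensors):

```python
import math

def sanitize_dummy_weight(weight):
    """Replace NaN entries (from uninitialized dummy tensors) with 1.0."""
    return [1.0 if math.isnan(v) else v for v in weight]
```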
- weight_quantizer: TensorQuantizer | SequentialQuantizer