quant_module
Base class for quantization modules.
Classes

- QuantInputBase: Base class for modules where the input is quantized.
- QuantLinearConvBase: Base class for quantized linear modules.
- QuantModule: A base class for quantized modules.
- class QuantInputBase
Bases:
QuantModule
Base class for modules where the input is quantized.
- default_quant_desc_input = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False)
- default_quant_desc_output = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False)
- forward(input, *args, **kwargs)
Quantize the input before calling the original forward method.
- input_quantizer: TensorQuantizer
- output_quantizer: TensorQuantizer
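The control flow of QuantInputBase.forward is: quantize the input, then delegate to the original forward (and, if an output quantizer is enabled, quantize the result). Below is a minimal, self-contained sketch of that pattern; FakeQuantizer and SketchQuantInputLinear are illustrative names, not part of the library:

```python
import torch
import torch.nn as nn


class FakeQuantizer(nn.Module):
    """Illustrative stand-in for TensorQuantizer: simulates int8 fake quantization."""

    def __init__(self, num_bits: int = 8):
        super().__init__()
        self.num_bits = num_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        amax = x.abs().max().clamp(min=1e-8)           # per-tensor max "calibration"
        scale = (2 ** (self.num_bits - 1) - 1) / amax  # e.g. 127 / amax for 8 bits
        return torch.round(x * scale) / scale          # quantize, then dequantize


class SketchQuantInputLinear(nn.Linear):
    """Sketch of the QuantInputBase pattern applied to nn.Linear."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__(in_features, out_features)
        self.input_quantizer = FakeQuantizer()

    def forward(self, input: torch.Tensor, *args, **kwargs):
        input = self.input_quantizer(input)  # quantize the input first
        return super().forward(input)        # then call the original forward
```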
- class QuantLinearConvBase
Bases:
QuantInputBase
Base class for quantized linear modules.
Quantized linear modules are modules where both the input and the weight are quantized.
- default_quant_desc_weight = QuantizerAttributeConfig(enable=True, num_bits=8, axis=None, fake_quant=True, unsigned=False, narrow_range=False, learn_amax=False, type='static', block_sizes=None, bias=None, trt_high_precision_dtype='Float', calibrator='max', rotate=False)
- forward(input, *args, **kwargs)
Quantize the input and the weight before calling the original forward method.
- static initialize_real_qtensor_with_dummy_weight(module)
Initialize the real quantized tensors.
- quantize_weight()
Context in which self.weight is quantized.
- weight_quantizer: TensorQuantizer | SequentialQuantizer
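Extending the sketch above to the QuantLinearConvBase pattern, the weight can be quantized for the duration of a forward call via a context manager in the spirit of quantize_weight. Again, SketchQuantLinear is an illustrative name and this is a sketch of the pattern, not the library's implementation:

```python
from contextlib import contextmanager


class SketchQuantLinear(SketchQuantInputLinear):
    """Sketch of the QuantLinearConvBase pattern: input and weight are quantized."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__(in_features, out_features)
        self.weight_quantizer = FakeQuantizer()

    @contextmanager
    def quantize_weight(self):
        # Temporarily swap in the fake-quantized weight; restore the
        # original parameter when the context exits.
        original = self.weight
        try:
            self.weight = nn.Parameter(self.weight_quantizer(original.data))
            yield
        finally:
            self.weight = original

    def forward(self, input, *args, **kwargs):
        # The weight is quantized only while the original forward runs;
        # the parent class handles input quantization.
        with self.quantize_weight():
            return super().forward(input, *args, **kwargs)


layer = SketchQuantLinear(16, 8)
y = layer(torch.randn(4, 16))  # both input and weight are fake-quantized
```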
- class QuantModule
Bases:
DynamicModule
A base class for quantized modules.
- fold_weight()
Fold the weight for faster eval.
- modelopt_post_restore(prefix='')
Post-restore to correctly configure the TensorQuantizer states.
- TensorQuantizer states are restored to the shape they had before saving; they then need further configuration.
- For non-sharded modules, this simply means moving the TensorQuantizer states to the right device and dtype. This applies to regular PyTorch models and HuggingFace models.
- For sharded modules, the restored TensorQuantizer states could be incorrect, because parallelism such as TP might have changed between saving and restoring. The state shapes therefore need to be re-calculated, so such modules should override this method and implement their own logic (see the sketch at the end of this section for the non-sharded step).
- Parameters:
prefix (str)
- property mopt_ckpt_versn
Checkpoint version of ModelOpt.
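For the non-sharded case described under modelopt_post_restore, the post-restore step amounts to moving the restored quantizer state onto the module's device and dtype. A hypothetical sketch, assuming the quantizer state lives in buffers whose names contain "quantizer" (the real state layout and names differ):

```python
import torch.nn as nn


def sketch_post_restore(module: nn.Module) -> None:
    """Hypothetical non-sharded post-restore: align quantizer buffers with
    the device and dtype of the module's weight."""
    target = module.weight  # reference parameter providing device and dtype
    for name, buf in module.named_buffers():
        if "quantizer" in name:  # illustrative filter, not the real state names
            buf.data = buf.data.to(device=target.device, dtype=target.dtype)
```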