tensor_quantizer
TensorQuantizer Module.
Classes
TensorQuantizer – Tensor quantizer module.
SequentialQuantizer – A sequential container for TensorQuantizer modules.
- class SequentialQuantizer
  Bases: Sequential
  A sequential container for TensorQuantizer modules.
  This module is used to quantize a tensor in multiple formats sequentially. It takes TensorQuantizer modules as input and containerizes them similarly to torch.nn.Sequential.
  - Parameters:
    quantizers (TensorQuantizer) – TensorQuantizer modules to be added to the container.
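The chaining behavior of the container can be illustrated with a plain-Python sketch. The `make_fake_quantizer` helper and the bit widths below are illustrative stand-ins, not the ModelOpt API; a real SequentialQuantizer chains TensorQuantizer modules in the same spirit.

```python
# Illustrative sketch: each "quantizer" is a function that symmetrically
# fake-quantizes a scalar; SequentialQuantizer chains TensorQuantizer
# modules analogously. (Hypothetical helper, not the ModelOpt API.)

def make_fake_quantizer(num_bits, amax):
    """Return a symmetric fake-quantize function for the given bit width."""
    maxbound = 2 ** (num_bits - 1) - 1          # e.g. 127 for INT8
    scale = amax / maxbound

    def fake_quant(x):
        q = max(-maxbound - 1, min(maxbound, round(x / scale)))
        return q * scale                        # back at the input's scale

    return fake_quant

# Chain an INT8 and an INT4 quantizer, as a sequential container would.
quantizers = [make_fake_quantizer(8, 1.0), make_fake_quantizer(4, 1.0)]

def sequential_quantize(x):
    for quant in quantizers:
        x = quant(x)
    return x

print(sequential_quantize(0.3))
```

Each stage receives the output of the previous one, so the final value lies on the coarsest (here INT4) grid.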
- __init__(*quantizers)
Initialize SequentialQuantizer module.
- Parameters:
quantizers (TensorQuantizer) –
- disable()
Disable the quantizer modules.
- get_modelopt_state()
Get meta state to be saved in checkpoint.
- Return type:
Dict[str, Any]
- static replace_sequential_quantizer_with_single_quantizer(model, indx=0)
  Replace instances of SequentialQuantizer in the model with single quantizers.
  The quantizer indexed by indx from the sequential quantizer is used to replace it. This method is useful for individually calibrating the quantizers in a sequential quantizer.
  - Parameters:
    indx (int) – Index of the quantizer in the sequential quantizer used as the replacement.
- reset_amax()
Reset amax of the quantizers.
- set_from_attribute_config(attributes)
Set the attributes of contained quantizers from a list of attribute_dicts.
- Parameters:
attributes (List[Dict[str, Any] | QuantizerAttributeConfig] | Dict[str, Any] | QuantizerAttributeConfig) –
- static tensor_quantizer_iterator(quantizers)
Iterate over the quantizers in the container (but yield the input itself if it is a TensorQuantizer).
- class TensorQuantizer
  Bases: Module
  Tensor quantizer module.
  This module manages quantization and calibration of the input tensor. It can perform fake (simulated) quantization or real quantization for various precisions and formats, such as FP8 per-tensor, INT8 per-channel, and INT4 per-block.
  If quantization is enabled, it calls the appropriate quantization functional and returns the quantized tensor. For fake quantization, the quantized tensor has the same data type as the input tensor. In calibration mode, the module collects statistics using its calibrator.
  The quantization parameters are described in QuantizerAttributeConfig. They can be set at initialization using quant_attribute_cfg or later by calling set_from_attribute_config().
  - Parameters:
    quant_attribute_cfg – An instance of QuantizerAttributeConfig or None. If None, default values are used.
    if_quant – A boolean. If True, quantization is enabled in the forward path.
    if_clip – A boolean. If True, clipping (with _learn_amax) is enabled in the forward path.
    if_calib – A boolean. If True, calibration is enabled in the forward path.
    amax – None or an array-like object (list, tuple, numpy array, or scalar) that can be used to construct the amax tensor.
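What fake quantization does per tensor can be sketched in plain Python: scale by the calibrated amax, round and clamp onto the integer grid, then dequantize back to floating point. This is a conceptual sketch of symmetric INT8 per-tensor fake quantization, not the ModelOpt kernels.

```python
# Illustrative per-tensor INT8 fake quantization (a sketch, not ModelOpt):
# quantize to the integer grid defined by amax, then dequantize, so the
# output keeps the input's floating-point type but carries quantization error.

def fake_quantize_int8(values, amax):
    maxbound = 127                      # symmetric INT8 bound
    scale = amax / maxbound
    out = []
    for x in values:
        q = max(-128, min(127, round(x / scale)))   # quantize + clamp
        out.append(q * scale)                        # dequantize (fake quant)
    return out

inputs = [0.5, -1.2, 2.0, 3.5]
amax = 2.0                               # calibrated absolute maximum
print(fake_quantize_int8(inputs, amax))
```

Values beyond amax (here 3.5) are clamped to the representable range, which is why a good amax estimate from calibration matters.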
- __init__(quant_attribute_cfg=None, if_quant=True, if_clip=False, if_calib=False, amax=None)
Initialize quantizer and set up required variables.
- property amax
Return amax for quantization.
- property axis
Return axis for quantization.
- property block_sizes
Return block_sizes for quantization.
- clean_up_after_set_from_modelopt_state(prefix='')
Clean up temporary variables created during set_from_modelopt_state.
- dequantize(qtensor)
De-quantize a real quantized tensor to a given dtype.
- Parameters:
qtensor (BaseQuantizedTensor) –
- disable()
Bypass the module.
No calibration, clipping, or quantization is performed while the module is disabled.
- disable_calib()
Disable calibration.
- disable_clip()
Disable clip stage.
- disable_quant()
Disable quantization.
- enable()
Enable the module.
- enable_calib()
Enable calibration.
- enable_clip()
Enable clip stage.
- enable_quant()
Enable quantization.
- export_amax()
Export correctly formatted/shaped amax.
- Return type:
Tensor | None
- extra_repr()
Set the extra information about this module.
- property fake_quant
Return True if fake quantization is used.
- forward(inputs)
Apply tensor_quant function to inputs.
- Parameters:
inputs – A Tensor of type float32/float16/bfloat16.
  - Returns:
    outputs – A Tensor of type output_dtype.
- get_modelopt_state(properties_only=False)
Get meta state to be saved in checkpoint.
If properties_only is True, only the quantizer properties such as num_bits, axis etc are included. For restoring the quantizer fully, use properties_only=False.
- Parameters:
properties_only (bool) –
- Return type:
Dict[str, Any]
- init_learn_amax()
Initialize learned amax from fixed amax.
- property is_enabled
Return True if the module is not disabled.
- property is_mx_format
Check whether the quantizer uses an MX format.
- load_calib_amax(*args, **kwargs)
  Load amax from the calibrator.
  Updates the amax buffer with the value computed by the calibrator, creating it if necessary. *args and **kwargs are passed directly to compute_amax, except for "strict" in kwargs. Refer to compute_amax for more details.
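The calibrate-then-load flow can be sketched with a minimal max calibrator: track the running absolute maximum over observed batches, then use the computed value as amax. The `MaxCalibrator` class below is an illustrative stand-in, not the ModelOpt calibrator.

```python
# Illustrative max calibration (a sketch, not the ModelOpt calibrator):
# collect() observes batches, compute_amax() returns the running abs-max,
# mirroring what load_calib_amax() copies into the quantizer's amax buffer.

class MaxCalibrator:
    def __init__(self):
        self.amax = None

    def collect(self, batch):
        # Track the largest absolute value seen so far.
        batch_amax = max(abs(x) for x in batch)
        if self.amax is None or batch_amax > self.amax:
            self.amax = batch_amax

    def compute_amax(self):
        return self.amax

calib = MaxCalibrator()
for batch in ([0.1, -0.7, 0.3], [1.5, -0.2], [-0.9, 0.4]):
    calib.collect(batch)
print(calib.compute_amax())  # → 1.5
```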
- property maxbound
Return maxbound for quantization.
- property narrow_range
Return True if symmetric integer range for signed quantization is used.
- property num_bits
Return num_bits for quantization.
- property pre_quant_scale
Return pre_quant_scale used for smoothquant.
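The role of the pre_quant_scale can be illustrated with a small sketch: in SmoothQuant-style smoothing, the input is multiplied by a per-channel scale before quantization to shift quantization difficulty from activations to weights. The function and scale values below are hypothetical, not the ModelOpt implementation.

```python
# Illustrative per-channel pre-quantization scaling (a sketch, not ModelOpt):
# the activation is scaled element-wise before the quantization step.

def apply_pre_quant_scale(activations, pre_quant_scale):
    # Element-wise scaling applied ahead of quantization.
    return [a * s for a, s in zip(activations, pre_quant_scale)]

activations = [8.0, 0.5, 2.0]         # one activation per channel
pre_quant_scale = [0.25, 2.0, 1.0]    # hypothetical smoothing factors

scaled = apply_pre_quant_scale(activations, pre_quant_scale)
print(scaled)  # → [2.0, 1.0, 2.0]
```

After scaling, channel magnitudes are more uniform, so a single amax covers them with less clipping or resolution loss.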
- reset_amax()
Reset amax to None.
- set_from_attribute_config(attribute_cfg)
  Set quantizer attributes from an attribute dict.
  The attributes are defined in QuantizerAttributeConfig.
  - Parameters:
    attribute_cfg (QuantizerAttributeConfig | Dict) –
- set_from_modelopt_state(modelopt_state, prefix='')
Set meta state from checkpoint.
- property step_size
Return step size for integer quantization.
- sync_amax_across_distributed_group(parallel_group)
Synchronize the amax across all ranks in the given group.
- Parameters:
parallel_group (DistributedProcessGroup) –
- property trt_high_precision_dtype
Return True if FP16 AMAX is used when exporting the model.
- property unsigned
Return True if unsigned quantization is used.