tensor_quantizer

TensorQuantizer Module.

Classes

TensorQuantizer

Tensor quantizer module.

SequentialQuantizer

A sequential container for TensorQuantizer modules.

class SequentialQuantizer

Bases: Sequential

A sequential container for TensorQuantizer modules.

This module is used to quantize a tensor in multiple formats sequentially. It takes TensorQuantizer modules as input and containerizes them similarly to torch.nn.Sequential.

Parameters:

quantizers (TensorQuantizer) – TensorQuantizer modules to be added to the container.
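Example

A minimal sketch of building and applying a sequential quantizer. The import paths and the QuantDescriptor arguments are assumptions based on the signatures documented on this page, not a definitive recipe:

    import torch
    from modelopt.torch.quantization.nn import SequentialQuantizer, TensorQuantizer
    from modelopt.torch.quantization.tensor_quant import QuantDescriptor

    # Two quantizers applied one after the other to the same tensor,
    # e.g. a 4-bit format followed by an 8-bit format.
    seq_quantizer = SequentialQuantizer(
        TensorQuantizer(QuantDescriptor(num_bits=4)),
        TensorQuantizer(QuantDescriptor(num_bits=8)),
    )

    weight = torch.randn(64, 64)
    weight_q = seq_quantizer(weight)  # each contained quantizer runs in sequence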

__init__(*quantizers)

Initialize SequentialQuantizer module.

Parameters:

quantizers (TensorQuantizer) –

disable()

Disable the quantizer modules.

get_modelopt_state()

Get meta state to be saved in checkpoint.

Return type:

Dict[str, Any]

static replace_sequential_quantizer_with_single_quantizer(model, indx=0)

Replace instances of SequentialQuantizer in the model with single quantizers.

The quantizer indexed by indx from the sequential quantizer is used to replace it. This method is useful for individually calibrating the quantizers in a sequential quantizer.

Parameters:

indx (int) –
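A sketch of the intended use, assuming model is a quantized model whose submodules contain SequentialQuantizer instances (the model variable is hypothetical):

    # Keep only the quantizer at index 0 wherever a SequentialQuantizer occurs,
    # e.g. to calibrate that single format on its own.
    SequentialQuantizer.replace_sequential_quantizer_with_single_quantizer(model, indx=0)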

set_from_attribute_dict(attributes)

Set the attributes of contained quantizers from a list of attribute_dicts.

Parameters:

attributes (List[Dict[str, Any]]) –
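For instance, one attribute dict can be passed per contained quantizer, in order. The attribute names below come from TensorQuantizer's mutable properties; the values are illustrative only:

    # Continuing the SequentialQuantizer sketch above: first quantizer gets
    # num_bits=4, second gets num_bits=8 and signed quantization.
    seq_quantizer.set_from_attribute_dict(
        [{"num_bits": 4}, {"num_bits": 8, "unsigned": False}]
    )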

static tensor_quantizer_iterator(quantizers)

Iterator over the quantizers in the container (yields the input itself if it is a single TensorQuantizer).

class TensorQuantizer

Bases: Module

Tensor quantizer module.

This module uses the tensor_quant or fake_tensor_quant function to quantize a tensor, and wraps variables such as the moving statistics needed when training a quantized network.

Experimental features:
  • clip stage learns range before enabling quantization.

  • calib stage runs calibration

Parameters:
  • quant_desc – An instance of QuantDescriptor.

  • disabled – A boolean. If True, bypass the whole module and return the input unchanged. Default False.

  • if_quant – A boolean. If True, run main quantization body. Default True.

  • if_clip – A boolean. If True, clip before quantization and learn amax. Default False.

  • if_calib – A boolean. If True, run calibration. Not implemented yet; calibration settings will probably go to QuantDescriptor.

Readonly Properties:
  • axis:

  • fake_quant:

  • scale:

  • step_size:

Mutable Properties:
  • num_bits:

  • unsigned:

  • amax:

__init__(quant_desc=<modelopt.torch.quantization.tensor_quant.ScaledQuantDescriptor object>, disabled=False, if_quant=True, if_clip=False, if_calib=False)

Initialize quantizer and set up required variables.
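A minimal sketch of constructing and applying a quantizer. The import paths and descriptor arguments are assumptions based on the signature above:

    import torch
    from modelopt.torch.quantization.nn import TensorQuantizer
    from modelopt.torch.quantization.tensor_quant import QuantDescriptor

    # 8-bit fake quantization with the default (per-tensor) scaling.
    quantizer = TensorQuantizer(QuantDescriptor(num_bits=8))

    x = torch.randn(16, 32)
    x_q = quantizer(x)  # fake-quantized tensor with the same shape as x

The smaller snippets further down this page reuse this quantizer variable.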

property amax

Return amax for quantization.
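Since amax is listed as a mutable property, a fixed range can be assigned instead of relying on calibration. A sketch continuing the construction example above; the value is illustrative:

    # Use a fixed per-tensor range of [-4, 4].
    quantizer.amax = torch.tensor(4.0)
    quantizer.reset_amax()  # clear it again and fall back to calibration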

property axis

Return axis for quantization.

property block_sizes

Return block_sizes for quantization.

clean_up_after_set_from_modelopt_state(prefix='')

Clean up temporary variables created during set_from_modelopt_state.

dequantize(qtensor, dtype)

De-quantize a real quantized tensor to a given dtype.

Parameters:

qtensor – The real quantized tensor to de-quantize.

dtype – The dtype of the de-quantized output.

disable()

Bypass the module.

No calibration, clipping, or quantization will be performed if the module is disabled.

disable_calib()

Disable calibration.

disable_clip()

Disable clip stage.

disable_quant()

Disable quantization.

enable()

Enable the module.

enable_calib()

Enable calibration.

enable_clip()

Enable clip stage.

enable_quant()

Enable quantization.

export_amax()

Export correctly formatted/shaped amax.

Return type:

Tensor | None

extra_repr()

Set the extra information about this module.

property fake_quant

Return True if fake quantization is used.

forward(inputs)

Apply tensor_quant function to inputs.

Parameters:

inputs – A Tensor of type float32.

Returns:

A Tensor of type output_dtype

Return type:

outputs
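A short usage sketch combining forward with enable/disable, continuing the construction example under __init__ (the input tensor is illustrative):

    x = torch.randn(8, 32)
    y = quantizer(x)  # fake-quantized output with the same shape as x

    quantizer.disable()
    assert torch.equal(quantizer(x), x)  # a disabled quantizer returns its input
    quantizer.enable()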

get_modelopt_state()

Get meta state to be saved in checkpoint.

Return type:

Dict[str, Any]

init_learn_amax()

Initialize learned amax from fixed amax.

property is_enabled

Return True if the module is not disabled.

load_calib_amax(*args, **kwargs)

Load amax from calibrator.

Updates the amax buffer with the value computed by the calibrator, creating it if necessary. *args and **kwargs are passed directly to compute_amax, except for "strict" in kwargs. Refer to compute_amax for more details.
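A typical calibration loop using only the methods documented on this page; the calibration data loader is hypothetical, and this is a sketch rather than a prescribed workflow:

    # Collect statistics with quantization turned off, then load amax.
    quantizer.disable_quant()
    quantizer.enable_calib()

    for batch in calib_dataloader:  # hypothetical data loader
        quantizer(batch)            # forward pass feeds the calibrator

    quantizer.load_calib_amax()
    quantizer.disable_calib()
    quantizer.enable_quant()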

property maxbound

Return maxbound for quantization.

property narrow_range

Return True if symmetric integer range for signed quantization is used.

property num_bits

Return num_bits for quantization.

property pre_quant_scale

Return pre_quant_scale used for smoothquant.

reset_amax()

Reset amax to None.

property scale

Return scale used for quantization.

set_from_attribute_dict(attribute_dict)

Set quantizer attributes from attribute_dict.

Parameters:

attribute_dict (Dict[str, Any]) –
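For example, using attribute names taken from the mutable properties above (values illustrative):

    quantizer.set_from_attribute_dict({"num_bits": 8, "unsigned": False})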

set_from_modelopt_state(modelopt_state, prefix='')

Set meta state from checkpoint.
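A sketch of saving and restoring the meta state with the methods documented here; regular tensor buffers (e.g. amax) are assumed to be handled separately via the usual state_dict mechanism:

    meta_state = quantizer.get_modelopt_state()  # e.g. stored in a checkpoint

    restored = TensorQuantizer()
    restored.set_from_modelopt_state(meta_state)
    restored.clean_up_after_set_from_modelopt_state()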

property step_size

Return step size for integer quantization.

sync_amax_across_distributed_group(parallel_group)

Synchronize the amax across all ranks in the given group.

Parameters:

parallel_group (DistributedProcessGroup) –

property unsigned

Return True if unsigned quantization is used.