base_qtensor

Base Class for Real Quantized Tensor.

Classes

BaseQuantizedTensor

Base class for quantized tensors, providing methods for quantization and dequantization.

QTensorWrapper

A wrapper class for quantized tensors to make them compatible with torch.nn.Parameter.

Functions

pack_real_quantize_weight

Pack real quantized tensors into a compressed format and set up the proper load_state_dict function.

class BaseQuantizedTensor

Bases: object

Base class for quantized tensors, providing methods for quantization and dequantization.

This class should be subclassed to implement specific types of quantized tensors. It handles the storage of quantized data along with the necessary configurations and original attributes.
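
A minimal, self-contained sketch of what a concrete subclass might look like. Int8BlockQTensor is a hypothetical name used only for illustration; it mirrors the documented interface rather than importing the real base class, whereas an actual subclass would derive from BaseQuantizedTensor and reuse its __init__(original_shape, original_dtype, quantized_data):

import torch

class Int8BlockQTensor:
    """Illustrative stand-in mirroring the documented BaseQuantizedTensor interface."""

    def __init__(self, original_shape, original_dtype, quantized_data):
        # A meta tensor keeps the original shape/dtype without allocating storage.
        self._original_meta_tensor = torch.empty(
            original_shape, dtype=original_dtype, device="meta"
        )
        self._quantized_data = quantized_data  # int8 storage in this sketch

    @classmethod
    def quantize(cls, input, block_size):
        # Pack a fake-quantized tensor into int8 data plus per-block scales.
        # Assumes input.numel() is divisible by block_size.
        blocks = input.reshape(-1, block_size).float()
        scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        qdata = torch.round(blocks / scales).clamp(-128, 127).to(torch.int8)
        return cls(input.shape, input.dtype, qdata), scales

    def dequantize(self, dtype=None, scales=None, **kwarg):
        # Convert the packed int8 data back to a standard torch.Tensor.
        dtype = dtype or self._original_meta_tensor.dtype
        out = self._quantized_data.to(dtype) * scales.to(dtype)
        return out.reshape(self._original_meta_tensor.shape)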

original_meta_tensor

A meta tensor that preserves the attributes (shape and dtype) of the original tensor.

Type:

torch.Tensor

quantized_data

Storage for the quantized tensor data. The dtype of quantized_data is specific to each QuantizedTensor implementation.

Type:

torch.Tensor

__init__(original_shape, original_dtype, quantized_data)

Initialize data attributes.

Parameters:
  • original_shape (Size) –

  • original_dtype (dtype) –

  • quantized_data (Tensor) –

dequantize(dtype=None, **kwarg)

Converts the quantized tensor back to a standard torch.Tensor.

Returns:

The dequantized tensor.

Return type:

torch.Tensor

classmethod quantize(input, block_size)

Pack a fake-quantized torch.Tensor into a real quantized tensor.

Parameters:
  • input (Tensor) – The fake quantized tensor to pack.

  • block_size (int) –

Returns:

The real quantized tensor and its scales.
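
Continuing the hypothetical Int8BlockQTensor sketch above, a quantize/dequantize round trip might look like this (passing the scales back into dequantize() is an assumption of that sketch; concrete subclasses may store their scales differently):

import torch

weight = torch.randn(256, 128, dtype=torch.float16)

# quantize() returns the real quantized tensor together with its scales.
qtensor, scales = Int8BlockQTensor.quantize(weight, block_size=128)

# dequantize() reconstructs a standard torch.Tensor.
restored = qtensor.dequantize(dtype=torch.float16, scales=scales)

print(restored.shape, restored.dtype)          # torch.Size([256, 128]) torch.float16
print((restored - weight).abs().max().item())  # small block-wise quantization error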

class QTensorWrapper

Bases: Parameter

A wrapper class for quantized tensors to make them compatible with torch.nn.Parameter.

Parameters:

qtensor (BaseQuantizedTensor) – The quantized tensor to be wrapped.

static __new__(cls, qtensor)

Create a new QTensorWrapper instance.

Parameters:

qtensor (BaseQuantizedTensor) –
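
A usage sketch, assuming qtensor is an instance of a concrete BaseQuantizedTensor subclass (for example, the first element returned by its classmethod quantize()) and that QTensorWrapper has been imported from this module:

import torch.nn as nn

# `qtensor` is assumed to be a concrete BaseQuantizedTensor instance.
wrapped = QTensorWrapper(qtensor)

# QTensorWrapper is a torch.nn.Parameter subclass, so it can be registered
# on a module like any other parameter.
linear = nn.Linear(128, 256, bias=False)
linear.weight = wrapped

assert isinstance(linear.weight, nn.Parameter)
inner = linear.weight.get_qtensor()  # recover the wrapped quantized tensor
ndims = linear.weight.dim()          # dimensions taken from the meta tensor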

dim()

Return the number of dimensions of the meta_tensor.

get_qtensor()

Get the wrapped quantized tensor from the QTensorWrapper.

to(*args, **kwargs)

Override the to method to move real quantized tensors to the specified device.
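
A brief illustration, continuing the sketch above and assuming a CUDA device is available:

# Calling to() on the wrapper moves the underlying real quantized tensor
# to the requested device, not just the Parameter view.
wrapped = wrapped.to("cuda")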

pack_real_quantize_weight(module, force_quantize=False)

Pack real quantized tensors into a compressed format and set up the proper load_state_dict function.

Parameters:
  • module (nn.Module) –

  • force_quantize (bool) –
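
A usage sketch; the full import path is an assumption inferred from this page's module name (adjust it to your installation), and the model is assumed to already carry real-quantizable weights from a prior quantization workflow:

import torch.nn as nn

# Import path is an assumption; adjust to where base_qtensor lives in your install.
from modelopt.torch.quantization.qtensor.base_qtensor import pack_real_quantize_weight

# `model` stands in for a module that a quantization workflow has already processed.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Replace eligible weights with compressed QTensorWrapper parameters and
# set up the proper load_state_dict handling on the module.
pack_real_quantize_weight(model, force_quantize=False)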