base_qtensor

Base class for real quantized tensors.

Classes

`BaseQuantizedTensor`	Base class for quantized tensors, providing methods for quantization and dequantization.
`QTensorWrapper`	A wrapper class for quantized tensors to make them compatible with torch.nn.Parameter.

Functions

`pack_real_quantize_weight`	Pack real quantized tensors to a compressed format and set the proper load_state_dict function.

class BaseQuantizedTensor

Bases: object

Base class for quantized tensors, providing methods for quantization and dequantization.

This class should be subclassed to implement specific types of quantized tensors. It handles the storage of quantized data along with the necessary configurations and original attributes.

original_meta_tensor

Meta tensor that preserves the attributes of the original tensor.

Type: torch.Tensor

quantized_data

Storage for the quantized tensor data. The dtype of quantized_data is chosen by each BaseQuantizedTensor subclass.

Type: torch.Tensor

__init__(original_shape, original_dtype, quantized_data)

Initialize data attributes.

Parameters:

original_shape (Size)
original_dtype (dtype)
quantized_data (Tensor)

dequantize(dtype=None, **kwarg)

Converts the quantized tensor back to a standard torch.Tensor.

Returns: The dequantized tensor.
Return type: torch.Tensor
Parameters: dtype (dtype | None)

classmethod quantize(input, block_size)

Pack a fake quantized torch.Tensor into a real quantized tensor.

Parameters:

input (Tensor) – The fake quantized tensor.
block_size (int)

Returns:

The real quantized tensor and its scales.
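The subclassing pattern described for BaseQuantizedTensor can be sketched without PyTorch as follows. Everything here is illustrative: `BaseQuantizedSketch` and `Int8BlockSketch` are hypothetical stand-ins, plain Python lists replace tensors, the symmetric int8 scheme is one possible choice, and passing `scales` and `block_size` to `dequantize` through keyword arguments is an assumption rather than the documented behavior.

```python
class BaseQuantizedSketch:
    """Hypothetical stand-in mirroring the documented method signatures."""

    def __init__(self, original_shape, original_dtype, quantized_data):
        self.original_shape = original_shape
        self.original_dtype = original_dtype
        self.quantized_data = quantized_data

    @classmethod
    def quantize(cls, input, block_size):
        raise NotImplementedError

    def dequantize(self, dtype=None, **kwarg):
        raise NotImplementedError


class Int8BlockSketch(BaseQuantizedSketch):
    """Hypothetical subclass: symmetric int8 codes with one scale per block."""

    @classmethod
    def quantize(cls, input, block_size):
        if len(input) % block_size != 0:
            raise ValueError("input length must be a multiple of block_size")
        scales, codes = [], []
        for start in range(0, len(input), block_size):
            block = input[start:start + block_size]
            amax = max(abs(v) for v in block)
            # Map the largest magnitude in the block to 127; avoid dividing by zero.
            scale = amax / 127 if amax > 0 else 1.0
            scales.append(scale)
            codes.extend(max(-127, min(127, round(v / scale))) for v in block)
        # Return the quantized object together with its scales.
        return cls((len(input),), "float32", codes), scales

    def dequantize(self, dtype=None, **kwarg):
        # Assumption: scales and block size arrive as keyword arguments.
        scales = kwarg["scales"]
        block_size = kwarg["block_size"]
        return [q * scales[i // block_size]
                for i, q in enumerate(self.quantized_data)]


weights = [0.5, -1.0, 0.25, 0.0, 2.0, -4.0, 1.0, 3.0]
qtensor, scales = Int8BlockSketch.quantize(weights, block_size=4)
restored = qtensor.dequantize(scales=scales, block_size=4)
```

Each restored value differs from the original by at most half of its block's scale, which is the rounding error of this particular scheme.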

class QTensorWrapper

Bases: Parameter

A wrapper class for quantized tensors to make them compatible with torch.nn.Parameter.

Parameters: qtensor (BaseQuantizedTensor) – The quantized tensor to be wrapped.

static __new__(cls, qtensor, metadata=None)

Create a new QTensorWrapper instance.

Parameters:

qtensor (BaseQuantizedTensor | Tensor)
metadata (dict | None)

dim(): Return the number of dimensions of the meta_tensor.

get_qtensor(): Get the quantized tensor class from QTensorWrapper.

get_state(): Get the state of the QTensorWrapper.

to(*args, **kwargs): Override the to method to move real quantized tensors to the specified device.

pack_real_quantize_weight(module, force_quantize=False)

Pack real quantized tensors to a compressed format and set the proper load_state_dict function.

Parameters: force_quantize (bool)
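The compressed layout itself is not described here. As a general illustration of what packing means for sub-byte formats, the sketch below stores two signed 4-bit codes in each byte. The helper names `pack_int4` and `unpack_int4` and the low-nibble-first ordering are assumptions made for this example, not the library's actual format.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (-8 <= lo <= 7 and -8 <= hi <= 7):
            raise ValueError("values must fit in signed 4 bits")
        # Masking with 0x0F keeps the two's-complement low nibble.
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit integers."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


codes = [3, -2, 7, -8]
packed = pack_int4(codes)
roundtrip = unpack_int4(packed)
```

Four codes occupy two bytes after packing, half the storage of one byte per code, and unpacking recovers the original values.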