base_qtensor
Base Class for Real Quantized Tensor.
Classes
BaseQuantizedTensor | Base class for quantized tensors, providing methods for quantization and dequantization.
QTensorWrapper | A wrapper class for quantized tensors to make them compatible with torch.nn.Parameter.
Functions
pack_real_quantize_weight | Pack real quantized tensors to a compressed format and set the appropriate load_state_dict function.
- class BaseQuantizedTensor
Bases:
object
Base class for quantized tensors, providing methods for quantization and dequantization.
This class should be subclassed to implement specific types of quantized tensors. It handles the storage of quantized data along with the necessary configurations and original attributes.
- original_meta_tensor
Meta tensor that preserves the attributes (such as shape and dtype) of the original tensor.
- Type:
torch.Tensor
- quantized_data
Storage for the quantized tensor data. The dtype of quantized_data is defined by each BaseQuantizedTensor subclass.
- Type:
torch.Tensor
- __init__(original_shape, original_dtype, quantized_data)
Initialize data attributes.
- Parameters:
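original_shape (torch.Size) – Shape of the original, unquantized tensor.
original_dtype (torch.dtype) – Dtype of the original tensor.
quantized_data (torch.Tensor) – The quantized data to store.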
- dequantize(dtype=None, **kwarg)
Converts the quantized tensor back to a standard torch.Tensor.
- Returns:
The dequantized tensor.
- Return type:
torch.Tensor
- classmethod quantize(input, block_size)
Pack a fake-quantized torch.Tensor into a real quantized tensor.
- Parameters:
input (torch.Tensor) – The fake-quantized tensor to pack.
block_size (int) – Size of each quantization block.
- Returns:
A tuple of the real quantized tensor and its scales.
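Because BaseQuantizedTensor is meant to be subclassed, a rough sketch of what a subclass might look like may help. The example does not import the library: ToyBaseQuantizedTensor is a stand-in that only mirrors the attributes and method signatures listed above, and Int8BlockQuantizedTensor is a hypothetical subclass using symmetric per-block absmax INT8 quantization. Passing the scales back to dequantize() through `**kwarg` as `scale` is an assumption made for illustration, not the library's convention.

```python
import torch


class ToyBaseQuantizedTensor:
    """Stand-in mirroring the documented attributes; not the library class."""

    def __init__(self, original_shape, original_dtype, quantized_data):
        # A "meta" tensor records shape and dtype without allocating any data.
        self.original_meta_tensor = torch.empty(original_shape, dtype=original_dtype, device="meta")
        self.quantized_data = quantized_data

    @classmethod
    def quantize(cls, input, block_size):
        raise NotImplementedError

    def dequantize(self, dtype=None, **kwarg):
        raise NotImplementedError


class Int8BlockQuantizedTensor(ToyBaseQuantizedTensor):
    """Hypothetical subclass: symmetric absmax INT8 quantization per block of `block_size` values."""

    @classmethod
    def quantize(cls, input, block_size):
        # Assumes input.numel() is divisible by block_size.
        blocks = input.reshape(-1, block_size).float()
        # One scale per block; the clamp avoids dividing by zero for all-zero blocks.
        scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127.0
        data = torch.clamp(torch.round(blocks / scales), -127, 127).to(torch.int8)
        return cls(input.shape, input.dtype, data), scales

    def dequantize(self, dtype=None, **kwarg):
        # Assumption: the caller passes the scales returned by quantize() as kwarg["scale"].
        scale = kwarg["scale"]
        if dtype is None:
            dtype = self.original_meta_tensor.dtype
        out = self.quantized_data.float() * scale
        return out.reshape(self.original_meta_tensor.shape).to(dtype)


x = torch.randn(4, 8)
qt, scales = Int8BlockQuantizedTensor.quantize(x, block_size=8)
# Each element is off by at most about half of its block's scale.
x_hat = qt.dequantize(scale=scales)
```

Keeping the original shape and dtype in a meta tensor lets dequantize() restore the layout even though the stored INT8 data is reshaped into blocks.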
- class QTensorWrapper
Bases:
Parameter
A wrapper class for quantized tensors to make them compatible with torch.nn.Parameter.
- Parameters:
qtensor (BaseQuantizedTensor) – The quantized tensor to be wrapped.
- static __new__(cls, qtensor)
Create a new QTensorWrapper instance.
- Parameters:
qtensor (BaseQuantizedTensor) – The quantized tensor to wrap.
- dim()
Return the number of dimensions of the wrapped tensor's original_meta_tensor, i.e. of the original unquantized tensor.
- get_qtensor()
Return the quantized tensor (a BaseQuantizedTensor instance) wrapped by this QTensorWrapper.
- to(*args, **kwargs)
Override the to method to move real quantized tensors to the specified device.
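Wrapping a quantized tensor in a torch.nn.Parameter subclass lets it be registered on a module like an ordinary weight. The sketch below shows how __new__, dim() and get_qtensor() could fit together, assuming the wrapper uses the packed data as its own storage; ToyQTensor and ToyQTensorWrapper are hypothetical names, and the to() override is omitted.

```python
import torch


class ToyQTensor:
    """Hypothetical quantized tensor: packed data plus a meta tensor for the original attributes."""

    def __init__(self, original_shape, original_dtype, quantized_data):
        self.original_meta_tensor = torch.empty(original_shape, dtype=original_dtype, device="meta")
        self.quantized_data = quantized_data


class ToyQTensorWrapper(torch.nn.Parameter):
    """Sketch of a Parameter subclass that carries a quantized tensor."""

    def __new__(cls, qtensor):
        # Use the packed data as the Parameter's storage; packed weights are frozen.
        instance = super().__new__(cls, qtensor.quantized_data, requires_grad=False)
        instance._qtensor = qtensor
        return instance

    def dim(self):
        # Report the original tensor's rank instead of the packed storage's rank.
        return self._qtensor.original_meta_tensor.dim()

    def get_qtensor(self):
        return self._qtensor


w = torch.randn(4, 8)
# Toy "packing": INT8 values flattened to 1-D, so the storage shape differs from w.shape.
packed = torch.clamp(torch.round(w * 16), -127, 127).to(torch.int8).flatten()
wrapper = ToyQTensorWrapper(ToyQTensor(w.shape, w.dtype, packed))

linear = torch.nn.Linear(8, 4, bias=False)
linear.weight = wrapper  # accepted because the wrapper is an nn.Parameter
```

Since the storage is 1-D here while the original weight is 2-D, overriding dim() is what keeps the reported rank consistent with the original tensor.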
- pack_real_quantize_weight(module, force_quantize=False)
Pack real quantized tensors to a compressed format and set the appropriate load_state_dict function.
- Parameters:
module (torch.nn.Module) – The module whose quantized weights are packed.
force_quantize (bool) –
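The kind of module traversal a packing pass performs can be pictured with the hypothetical function below. It is only an illustration under stated assumptions: pack_linear_weights walks named_modules(), quantizes every nn.Linear weight to INT8 with a per-output-channel absmax scale, and stores the result as a frozen nn.Parameter plus a weight_scale buffer. The documented function works with the real quantized tensor classes described above and also adjusts load_state_dict; neither that behavior nor force_quantize is modeled here.

```python
import torch
import torch.nn as nn


def pack_linear_weights(module: nn.Module) -> nn.Module:
    """Hypothetical packing pass: replace each nn.Linear weight with INT8 storage in place."""
    for _, m in module.named_modules():
        if isinstance(m, nn.Linear):
            w = m.weight.detach()
            # Per-output-channel absmax scale (an assumption; the library's scheme may differ).
            scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127.0
            q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
            # Swap in a frozen INT8 parameter and keep the scale as a buffer so it is saved too.
            m.weight = nn.Parameter(q, requires_grad=False)
            m.register_buffer("weight_scale", scale)
    return module


model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
pack_linear_weights(model)
print(sorted(model.state_dict()))
```

After this pass the packed Linear layers cannot be called directly, because their forward pass would multiply a float input by an INT8 weight; a real implementation needs a dequantization step (or a quantized kernel) on the forward path.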