int8_tensor
Implements INT8 quantization for efficient tensor storage and computation.
Classes
- class INT8QTensor
Bases:
BaseQuantizedTensor
Implements the INT8 quantization on tensors for more efficient storage or computation.
- quantized_data
The quantized data stored as an INT8 tensor.
- Type:
torch.Tensor
- dequantize(dtype=None, **kwarg)
Dequantize the INT8 packed tensor to a target dtype.
- Parameters:
dtype (torch.dtype | None) – The target dtype for the dequantized tensor.
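The dequantization step is not spelled out above, but for INT8 schemes it is typically an elementwise rescale of the stored integer codes by the quantization scale. A minimal pure-Python sketch of that idea (illustrative only; the function name and per-tensor scale are assumptions, not the actual INT8QTensor internals):

```python
def dequantize_int8(codes, scale):
    # Elementwise rescale: each INT8 code times the per-tensor scale
    # recovers an approximation of the original value.
    return [float(c) * scale for c in codes]

# INT8 codes lie in [-128, 127]; one scale covers the whole tensor here.
codes = [-128, 0, 64, 127]
scale = 0.5
print(dequantize_int8(codes, scale))  # → [-64.0, 0.0, 32.0, 63.5]
```

With a per-axis or per-block scheme (see the `axis` and `block_sizes` arguments of `quantize` below), each slice or block would carry its own scale instead of a single scalar.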
- classmethod quantize(input, scales=None, axis=None, block_sizes=None)
Convert a tensor to a quantized format based on INT8 quantization.
- Parameters:
input (torch.Tensor) – The input tensor to be quantized.
scales (torch.Tensor | None) – The scales for quantization.
axis (tuple | int | None) – The dimension(s) to reduce for quantization: None, an int, or a tuple of ints.
block_sizes (dict | None) – A dictionary specifying the block size for each dimension.
- Returns:
INT8QTensor, scales
- Return type:
tuple
Note: Provide only one of axis or block_sizes for INT8 quantization.
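When `scales` is not supplied, a per-tensor INT8 scheme typically derives the scale from the absolute maximum of the input so that values map onto the signed 8-bit range. A hedged pure-Python sketch of that general scheme (function names are illustrative; this is not the actual INT8QTensor implementation):

```python
def quantize_int8(values):
    # Per-tensor symmetric quantization: one scale for the whole tensor,
    # chosen so the largest magnitude maps to +/-127.
    scale = max(abs(v) for v in values) / 127.0
    # Round to the nearest integer code and clamp to the INT8 range.
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    return [c * scale for c in codes]

vals = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale = quantize_int8(vals)
approx = dequantize_int8(codes, scale)
# Round-trip error per element is bounded by about scale / 2.
```

This mirrors the documented return shape, a `(quantized tensor, scales)` tuple, and shows why only one of `axis` or `block_sizes` makes sense at a time: each selects a different granularity for computing the scales.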