int8_tensor
Implements INT8 quantization for efficient tensor storage and computation.
Classes
- class INT8QTensor
  - Bases: BaseQuantizedTensor
  - Implements INT8 quantization on tensors for more efficient storage or computation.
  - quantized_data
    - The quantized data stored as an INT8 tensor.
    - Type: torch.Tensor
  - dequantize(dtype=None, **kwarg)
    - Dequantizes the INT8 packed tensor to a target dtype.
    - Parameters:
      - dtype (dtype) – The target dtype of the dequantized output.
 
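Conceptually, dequantization is an elementwise multiply of the stored INT8 values by the scale. A minimal pure-Python sketch of that math (illustrative only; `INT8QTensor.dequantize` operates on torch tensors):

```python
def int8_dequantize(q_values, scale):
    """Sketch of INT8 dequantization: map scaled integers back to reals.

    q_values: list of ints in [-128, 127]; scale: float scale factor.
    (Hypothetical helper for illustration, not the library implementation.)
    """
    return [x * scale for x in q_values]


# With scale 0.5, the INT8 range [-128, 127] covers reals in [-64.0, 63.5].
restored = int8_dequantize([-128, 0, 127], 0.5)
```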
  - classmethod quantize(input, scales=None, axis=None, block_sizes=None)
    - Converts a tensor to a quantized format based on INT8 quantization.
    - Parameters:
      - input (torch.Tensor) – The input tensor to be quantized.
      - scales (torch.Tensor | None) – The scales for quantization.
      - axis (tuple | int | None) – The dimension(s) to reduce for quantization.
      - block_sizes (dict | None) – A dictionary specifying the block size for each dimension.
    - Returns: INT8QTensor, scales
    - Return type: tuple
    - Note: Provide either axis or block_sizes, but not both, for INT8 quantization.
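For intuition, symmetric per-tensor INT8 quantization can be sketched in pure Python. This is an illustrative sketch of the underlying math, not the library implementation: the real classmethod operates on torch.Tensors and also supports per-axis and block-wise scales, and the max-magnitude-to-127 scale convention below is an assumption of this sketch.

```python
def int8_quantize(values, scale=None):
    """Sketch of symmetric per-tensor INT8 quantization (hypothetical helper).

    If no scale is given, derive one from the max magnitude so that the
    largest-magnitude value maps to 127 (assumed convention for this sketch).
    Returns (quantized_values, scale), mirroring the (tensor, scales) tuple
    described above.
    """
    if scale is None:
        scale = max(abs(v) for v in values) / 127.0
    # Round to the nearest integer and clamp to the INT8 range [-128, 127].
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale


# Round trip: quantize, then dequantize back to approximate reals.
q, s = int8_quantize([0.0, 0.25, -1.0, 1.0])
restored = [x * s for x in q]
```

The round trip is lossy: values are recovered only up to half a quantization step (`scale / 2`), which is the storage/accuracy trade-off INT8 quantization makes.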