fp8_tensor
Implements FP8 quantization for efficient tensor storage and computation.
Classes
FP8QTensor – Implements the FP8 quantization on tensors for more efficient storage or computation.
- class FP8QTensor
Bases:
BaseQuantizedTensor
Implements the FP8 quantization on tensors for more efficient storage or computation.
- quantized_data
The quantized data, stored as a packed FP8 tensor.
- Type:
torch.Tensor
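For context, recent PyTorch releases expose a native FP8 E4M3 dtype. A minimal plain-PyTorch sketch of what one-byte FP8 storage looks like, independent of this class's internals:

```python
import torch

# torch.float8_e4m3fn (PyTorch's native FP8 E4M3 dtype, available in
# recent releases) stores each element in a single byte.
x = torch.randn(4, 4)
x_fp8 = x.to(torch.float8_e4m3fn)

print(x_fp8.dtype)           # torch.float8_e4m3fn
print(x_fp8.element_size())  # 1 byte per element vs. 4 for float32
```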
- dequantize(dtype=None, **kwarg)
Dequantize the FP8 packed tensor to a target dtype.
- Parameters:
dtype (torch.dtype) – The target dtype for the dequantized tensor. Defaults to None.
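Scale-based dequantization conventionally casts the packed FP8 data up to the target dtype and multiplies by the stored scale. A hedged sketch of that arithmetic (illustrative only, not this class's internals):

```python
import torch

# Illustrative dequantization: cast the packed FP8 data to the target
# dtype, then rescale by the quantization scale.
q = torch.randn(8).to(torch.float8_e4m3fn)  # stand-in for quantized_data
scale = torch.tensor(0.5)                   # stand-in for the stored scale
x_hat = q.to(torch.float16) * scale.to(torch.float16)
```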
- classmethod quantize(input, scales=None, axis=None, block_sizes=None)
Converts a tensor to a quantized format based on FP8 quantization. Only E4M3 is supported.
- Parameters:
input (torch.Tensor) – The input tensor to be quantized.
scales (torch.Tensor) – The scales for quantization.
axis (tuple | int | None) – The dimension(s) to reduce over for quantization: an int, a tuple of ints, or None.
block_sizes (dict) – A dictionary specifying the block size for each dimension.
Note: Only one of axis or block_sizes may be provided for FP8 quantization, not both.
- Returns:
A tuple of (FP8QTensor, scales).
- Return type:
tuple
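Taken together, a hedged end-to-end sketch of calling quantize and dequantize, based only on the signatures documented above; the import path, the axis semantics, and the block_sizes key layout are assumptions and may differ in your install:

```python
import torch
# Assumed import path; adjust to wherever FP8QTensor lives in your package.
from modelopt.torch.quantization.qtensor import FP8QTensor

x = torch.randn(128, 64, dtype=torch.float16)

# Per-axis quantization: reduce over the last dimension so each row gets
# its own scale (axis semantics assumed from the parameter description).
qtensor, scales = FP8QTensor.quantize(x, axis=-1)

# Alternatively, blockwise quantization: one scale per block of 32
# elements along the last dimension (block_sizes layout is an assumption).
# qtensor, scales = FP8QTensor.quantize(x, block_sizes={-1: 32})

# Recover a dense tensor in the requested dtype.
x_hat = qtensor.dequantize(dtype=torch.float16)
```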