mxfp4_tensor
Implements MXFP4 quantization for efficient tensor storage and computation.
Classes
- MXFP4QTensor – Implements the MXFP4 quantization on tensors for more efficient storage or computation.
- class MXFP4QTensor
Bases:
BaseQuantizedTensor
Implements the MXFP4 quantization on tensors for more efficient storage or computation.
- quantized_data
The quantized data stored as a packed fp8 tensor.
- Type:
torch.Tensor
- E2M1_bounds = tensor([0.2500, 0.7500, 1.2500, 1.7500, 2.5000, 3.5000, 5.0000])
- E2M1_max = 6.0
- E2M1_values = [0, 0.5, 1, 1.5, 2, 3, 4, 6]
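The `E2M1_bounds` entries are the midpoints between adjacent `E2M1_values`, so rounding a magnitude to the nearest representable FP4 value reduces to counting how many bounds lie below it. A minimal pure-Python sketch of that lookup (the class itself operates on tensors; this scalar helper is illustrative only):

```python
from bisect import bisect_left

# Constants copied from the class attributes above.
E2M1_values = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_bounds = [0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0]

def round_to_e2m1(x: float) -> float:
    """Round x to the nearest representable E2M1 value, keeping its sign."""
    sign = -1.0 if x < 0 else 1.0
    # Each bound is the midpoint between two adjacent E2M1 values, so the
    # number of bounds below |x| is the index of the nearest value.
    # Magnitudes above 5.0 saturate to E2M1_max = 6.0.
    return sign * E2M1_values[bisect_left(E2M1_bounds, abs(x))]

print(round_to_e2m1(0.3))    # 0.5
print(round_to_e2m1(-1.4))   # -1.5
print(round_to_e2m1(10.0))   # saturates to 6.0
```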
- dequantize(dtype=None, **kwarg)
Dequantizes the MXFP4 packed tensor to a target dtype.
- Parameters:
dtype (torch.dtype) – The target dtype to dequantize to.
- classmethod quantize(input, block_size)
Converts a tensor to a quantized format based on MXFP4 quantization. Only E4M3 is supported.
- Parameters:
input (torch.Tensor) – The input tensor to be quantized.
block_size (int | None) – The block size for quantization.
- Return type:
tuple
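To make the quantize/dequantize round trip concrete, here is a simplified pure-Python sketch of per-block MXFP4 quantization: a shared power-of-two scale per block plus one 4-bit E2M1 code per element, with two codes packed per byte. The scale-selection rule (largest power of two mapping the block's absolute maximum into the E2M1 range) is an assumption for illustration; the actual tensorized implementation in `MXFP4QTensor.quantize` may differ.

```python
import math
from bisect import bisect_left

# E2M1 constants copied from the class attributes documented above.
E2M1_values = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_bounds = [0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0]
E2M1_max = 6.0

def quantize_block(block):
    """Quantize one block with a shared power-of-two scale and 4-bit E2M1 codes."""
    amax = max(abs(v) for v in block)
    # Assumed scale rule: the power of two that maps amax into [0, E2M1_max].
    scale = 1.0 if amax == 0 else 2.0 ** math.floor(math.log2(amax / E2M1_max))
    codes = []
    for v in block:
        sign_bit = 8 if v < 0 else 0
        idx = bisect_left(E2M1_bounds, abs(v) / scale)  # nearest E2M1 index
        codes.append(sign_bit | idx)  # 4-bit code: 1 sign bit + 3-bit value index
    return scale, codes

def pack(codes):
    """Pack two 4-bit codes per byte, mirroring the packed storage of the class."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))

def dequantize_block(scale, codes):
    """Invert the mapping: sign * E2M1 value * shared scale."""
    return [(-1.0 if c & 8 else 1.0) * E2M1_values[c & 7] * scale for c in codes]

scale, codes = quantize_block([1.0, -2.0, 3.0, 6.0])
print(scale)                            # 1.0 for this block
print(dequantize_block(scale, codes))   # exactly representable values round-trip
```

Values that are already representable in E2M1 at the chosen scale survive the round trip exactly; all others land on the nearest representable point, which is the source of MXFP4's quantization error.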