mxfp4_tensor

Implements MXFP4 quantization for efficient tensor storage and computation.

Classes

MXFP4QTensor

Implements the MXFP4 quantization on tensors for more efficient storage or computation.

class MXFP4QTensor

Bases: BaseQuantizedTensor

quantized_data

The quantized data stored as a packed fp8 tensor.

Type:

torch.Tensor
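Since MXFP4 elements are 4-bit, two element codes fit into each storage byte. A minimal sketch of such pairwise packing and unpacking with `uint8` tensors (an assumed layout for illustration; the library's actual bit order and storage dtype may differ):

```python
import torch

# Illustrative packing of 4-bit codes, two per uint8 byte.
# The nibble order here is an assumption, not the library's own layout.
codes = torch.tensor([1, 7, 4, 2], dtype=torch.uint8)  # each value < 16

packed = codes[0::2] | (codes[1::2] << 4)  # even codes in the low nibble

# Unpacking reverses the interleave.
low = packed & 0x0F
high = packed >> 4
unpacked = torch.stack([low, high], dim=1).flatten()
```

Packing halves the storage relative to keeping one element per byte.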

E2M1_bounds = tensor([0.2500, 0.7500, 1.2500, 1.7500, 2.5000, 3.5000, 5.0000])
E2M1_max = 6.0
E2M1_values = [0, 0.5, 1, 1.5, 2, 3, 4, 6]
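The bounds above are the midpoints between consecutive `E2M1_values`, so nearest-value rounding can be sketched as a `torch.bucketize` over the bounds (an illustration of how these constants relate, not the library's implementation):

```python
import torch

E2M1_values = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_bounds = torch.tensor([0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0])

def nearest_e2m1_code(x: torch.Tensor) -> torch.Tensor:
    """Index of the nearest E2M1 magnitude for each |x|.

    bucketize counts how many bounds each element exceeds; because the
    bounds are midpoints, that count is exactly the nearest-value index.
    """
    return torch.bucketize(x.abs(), E2M1_bounds)

codes = nearest_e2m1_code(torch.tensor([0.1, 0.6, 2.4, 10.0]))
values = E2M1_values[codes]
```

Note that magnitudes above `E2M1_max` saturate to the largest representable value, 6.0.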
dequantize(dtype=None, **kwarg)

Dequantize the MXFP4 packed tensor to a target dtype.

Parameters:

dtype (torch.dtype | None) – The target dtype for the dequantized tensor. Defaults to None.

classmethod quantize(input, block_size)

Converts a tensor to a quantized format based on MXFP4 quantization. Only E4M3 is supported.

Parameters:
  • input (torch.Tensor) – The input tensor to be quantized.

  • block_size (int | None) – The block size for quantization.

Return type:

tuple
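Putting the pieces together, a hedged end-to-end sketch of the quantize-then-dequantize round trip, assuming each block shares one power-of-two scale (the MX shared exponent) and stores E2M1 element values; this illustrates the MXFP4 scheme, not the class's actual code:

```python
import torch

E2M1_values = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_bounds = torch.tensor([0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0])
E2M1_max = 6.0

def mxfp4_roundtrip(x: torch.Tensor, block_size: int = 32) -> torch.Tensor:
    """Quantize-then-dequantize sketch: one power-of-two scale per
    block, elements rounded to the nearest E2M1 value."""
    blocks = x.reshape(-1, block_size)
    amax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # Smallest power of two that maps the block maximum into [0, E2M1_max].
    scale = torch.exp2(torch.ceil(torch.log2(amax / E2M1_max)))
    scaled = blocks / scale
    codes = torch.bucketize(scaled.abs(), E2M1_bounds)  # 3-bit magnitude index
    # Dequantize: look up the magnitude, restore sign and scale.
    deq = E2M1_values[codes] * scaled.sign() * scale
    return deq.reshape(x.shape)

x = torch.linspace(-3.0, 3.0, 64).reshape(2, 32)
x_hat = mxfp4_roundtrip(x)
```

Here the block maximum is 3.0, so the shared scale is 0.5 and the worst-case per-element error is half the largest E2M1 gap (between 4 and 6) times the scale, i.e. 0.5.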