mxfp4_tensor

Implements MXFP4 quantization for efficient tensor storage and computation.

Classes

MXFP4QTensor

Implements the MXFP4 quantization on tensors for more efficient storage or computation.

class MXFP4QTensor

Bases: BaseQuantizedTensor

quantized_data

The quantized data stored as a packed fp8 tensor.

Type:

torch.Tensor
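Since MXFP4 elements are 4-bit, two element codes fit into each storage byte. A minimal sketch of such pairwise packing and unpacking with `uint8` tensors (an assumed layout for illustration; the library's actual bit order and storage dtype may differ):

```python
import torch

# Illustrative packing of 4-bit codes, two per uint8 byte.
# The nibble order here is an assumption, not the library's own layout.
codes = torch.tensor([1, 7, 4, 2], dtype=torch.uint8)  # each value < 16

packed = codes[0::2] | (codes[1::2] << 4)  # even codes in the low nibble

# Unpacking reverses the interleave.
low = packed & 0x0F
high = packed >> 4
unpacked = torch.stack([low, high], dim=1).flatten()
```

Packing halves the storage relative to keeping one element per byte.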

E2M1_bounds = tensor([0.2500, 0.7500, 1.2500, 1.7500, 2.5000, 3.5000, 5.0000])
E2M1_max = 6.0
E2M1_values = [0, 0.5, 1, 1.5, 2, 3, 4, 6]
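The bounds above are the midpoints between consecutive `E2M1_values`, so nearest-value rounding can be sketched as a `torch.bucketize` over the bounds (an illustration of how these constants relate, not the library's implementation):

```python
import torch

E2M1_values = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_bounds = torch.tensor([0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0])

def nearest_e2m1_code(x: torch.Tensor) -> torch.Tensor:
    """Index of the nearest E2M1 magnitude for each |x|.

    bucketize counts how many bounds each element exceeds; because the
    bounds are midpoints, that count is exactly the nearest-value index.
    """
    return torch.bucketize(x.abs(), E2M1_bounds)

codes = nearest_e2m1_code(torch.tensor([0.1, 0.6, 2.4, 10.0]))
values = E2M1_values[codes]
```

Note that magnitudes above `E2M1_max` saturate to the largest representable value, 6.0.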
dequantize(dtype=None, **kwarg)

Dequantize the MXFP4 packed tensor to a target dtype.

Parameters:

dtype (torch.dtype | None) – The target dtype for the dequantized tensor. Defaults to None.

classmethod quantize(input, block_size)

Converts a tensor to a quantized format based on MXFP4 quantization. Only E4M3 is supported.

Parameters:
  • input (torch.Tensor) – The input tensor to be quantized.

  • block_size (int | None) – The block size for quantization.

Return type:

tuple
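Putting the pieces together, a hedged end-to-end sketch of the quantize-then-dequantize round trip, assuming each block shares one power-of-two scale (the MX shared exponent) and stores E2M1 element values; this illustrates the MXFP4 scheme, not the class's actual code:

```python
import torch

E2M1_values = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_bounds = torch.tensor([0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0])
E2M1_max = 6.0

def mxfp4_roundtrip(x: torch.Tensor, block_size: int = 32) -> torch.Tensor:
    """Quantize-then-dequantize sketch: one power-of-two scale per
    block, elements rounded to the nearest E2M1 value."""
    blocks = x.reshape(-1, block_size)
    amax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # Smallest power of two that maps the block maximum into [0, E2M1_max].
    scale = torch.exp2(torch.ceil(torch.log2(amax / E2M1_max)))
    scaled = blocks / scale
    codes = torch.bucketize(scaled.abs(), E2M1_bounds)  # 3-bit magnitude index
    # Dequantize: look up the magnitude, restore sign and scale.
    deq = E2M1_values[codes] * scaled.sign() * scale
    return deq.reshape(x.shape)

x = torch.linspace(-3.0, 3.0, 64).reshape(2, 32)
x_hat = mxfp4_roundtrip(x)
```

Here the block maximum is 3.0, so the shared scale is 0.5 and the worst-case per-element error is half the largest E2M1 gap (between 4 and 6) times the scale, i.e. 0.5.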