nvfp4_tensor

Implements NVFP4 quantization for efficient tensor storage and computation.

Classes

NVFP4QTensor

Implements NVFP4 quantization on tensors for more efficient storage or computation.

class NVFP4QTensor

Bases: BaseQuantizedTensor

Implements NVFP4 quantization on tensors for more efficient storage or computation.

quantized_data

The quantized data stored as a packed uint8 tensor.

Type:

torch.Tensor
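A 4-bit code occupies half a byte, so a packed uint8 buffer holds two quantized elements per byte. The sketch below illustrates that idea with plain Python integers. The nibble order (even-indexed element in the low nibble) and the helper names `pack_nibbles` and `unpack_nibbles` are assumptions made for illustration, not a statement of this class's storage layout.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (0..15) into bytes, two codes per byte."""
    if len(codes) % 2 != 0:
        raise ValueError("need an even number of 4-bit codes")
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        if not (0 <= lo <= 15 and 0 <= hi <= 15):
            raise ValueError("codes must fit in 4 bits")
        # Assumed layout: even-indexed element in the low nibble.
        packed.append((hi << 4) | lo)
    return bytes(packed)


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: split each byte into its two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble first
        codes.append(byte >> 4)    # then high nibble
    return codes
```

Under this layout, a tensor with N elements along its last dimension needs N / 2 bytes along that dimension.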

dequantize(dtype=None, **kwarg)

Dequantizes an NVFP4 packed tensor to a target dtype.

Parameters:

dtype (dtype)
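Conceptually, dequantization reverses the packing and scaling: each byte is split into two 4-bit codes, each code is mapped to its E2M1 value, and the result is multiplied by the block scale and the tensor scale. The following is a minimal pure-Python sketch of that idea. The function name `dequantize_sketch`, its explicit scale arguments, and the low-nibble-first layout are all assumptions for illustration; the method documented above takes its scales through other means.

```python
# Magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def dequantize_sketch(packed, block_scales, tensor_scale, block_size):
    """Expand packed 4-bit codes to floats and undo the two-level scaling."""
    values = []
    for byte in packed:
        # Assumed layout: low nibble holds the earlier element.
        for code in (byte & 0x0F, byte >> 4):
            magnitude = E2M1_MAGNITUDES[code & 0b0111]
            values.append(-magnitude if code & 0b1000 else magnitude)
    # Element i belongs to block i // block_size.
    return [
        v * block_scales[i // block_size] * tensor_scale
        for i, v in enumerate(values)
    ]
```

A real implementation would perform the same steps with vectorized tensor operations and cast the result to the requested dtype.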

e2m1_values_on_device = {}

classmethod get_activation_scaling_factor(quantizer)

Returns the activation scaling factor for export.

classmethod get_e2m1_values(device)

Returns the e2m1 values on the device.
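For reference, E2M1 is a 4-bit floating-point format with 1 sign bit, 2 exponent bits, and 1 mantissa bit, so it can represent only 16 codes. The sketch below decodes each code from its bit fields, assuming an exponent bias of 1 and subnormals at exponent 0. The helper `decode_e2m1` is hypothetical and is not part of this class.

```python
def decode_e2m1(code):
    """Decode a 4-bit E2M1 code into a Python float."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading one.
        magnitude = 0.5 * mantissa
    else:
        # Normal: implicit leading one, exponent bias of 1.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


# Codes 0..7 are non-negative, codes 8..15 are their negatives.
E2M1_VALUES = [decode_e2m1(code) for code in range(16)]
```

The non-negative half of the table is 0, 0.5, 1, 1.5, 2, 3, 4, 6, which is why 6 appears as the maximum magnitude when scaling factors are derived.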

classmethod get_weights_scaling_factor(input, block_size, weights_scaling_factor_2=None, keep_high_precision=False)

Returns the quantized per-block weight scaling factor.

Parameters:
  • input (Tensor)

  • block_size (int)

  • weights_scaling_factor_2 (Tensor | None)

  • keep_high_precision (bool)
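One common way to derive a per-block scale is to map each block's largest magnitude onto the largest E2M1 magnitude (6.0), expressed relative to the per-tensor scale. The sketch below follows that convention in plain Python and keeps the result as an ordinary float. The formula and the name `per_block_scales` are assumptions for illustration; the documented method may differ in details such as rounding the scale to a lower-precision dtype.

```python
E2M1_MAX = 6.0  # largest magnitude representable in E2M1


def per_block_scales(values, block_size, tensor_scale):
    """Return one scale per contiguous block of `block_size` values."""
    if len(values) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    scales = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        block_amax = max(abs(v) for v in block)
        # Assumed convention: block_amax maps to E2M1_MAX after
        # dividing by (block_scale * tensor_scale).
        scales.append(block_amax / (E2M1_MAX * tensor_scale))
    return scales
```

For example, with `tensor_scale=1.0` and `block_size=2`, the values `[6.0, -3.0, 12.0, 1.0]` yield the scales `[1.0, 2.0]`.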

classmethod get_weights_scaling_factor_2(input)

Returns the per-tensor weight scaling factor.

Parameters:

input (Tensor)
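A per-tensor scale is typically chosen so that the per-block scales themselves fit the range of the format they are stored in. The sketch below assumes the block scales are stored in FP8 E4M3 (maximum finite value 448) and the elements in E2M1 (maximum 6), giving `amax / (6 * 448)`. Both that formula and the name `per_tensor_scale` are assumptions for illustration rather than a description of this method's internals.

```python
E2M1_MAX = 6.0    # largest magnitude representable in E2M1
E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3


def per_tensor_scale(values):
    """Global scale derived from the tensor-wide absolute maximum."""
    amax = max(abs(v) for v in values)
    # Assumed convention: the largest element maps to E2M1_MAX with the
    # largest block scale mapping to E4M3_MAX.
    return amax / (E2M1_MAX * E4M3_MAX)
```

With this choice, a tensor whose largest magnitude is 2688 (that is, 6 × 448) gets a per-tensor scale of exactly 1.0.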

classmethod get_weights_scaling_factor_2_from_quantizer(weight_quantizer)

Returns the per-tensor weight scaling factor derived from the weight_quantizer amax.

classmethod quantize(input, block_size, weights_scaling_factor=None, weights_scaling_factor_2=None, keep_high_precision=False, try_tensorrt=False)

Converts a tensor to a quantized format based on NVFP4 quantization.

Parameters:
  • input (torch.Tensor) – The input tensor to be quantized.

  • block_size (int) – The size of each block for quantization.

  • weights_scaling_factor (torch.Tensor) – The per-block scaling factor for the weights.

  • weights_scaling_factor_2 (torch.Tensor) – The per-tensor scaling factor for the weights.

  • keep_high_precision (bool) – Whether to keep output scales at high precision.

  • try_tensorrt (bool)

Returns: A tuple containing the quantized data, the quantized per-block scaling factor, and the per-tensor scaling factor.
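To make the three returned pieces concrete, here is a minimal end-to-end sketch in plain Python: derive a per-tensor scale, derive a scale per block, round each scaled element to the nearest E2M1 value, and pack two 4-bit codes per byte. The function name `quantize_sketch`, the scale formulas, the tie-breaking rule, and the nibble order are assumptions for illustration; this is not the implementation behind `quantize`, and it keeps the block scales as ordinary floats.

```python
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def nearest_e2m1_code(x):
    """4-bit code of the E2M1 value closest to x (ties go to the smaller one)."""
    sign_bit = 0b1000 if x < 0 else 0
    magnitude = min(abs(x), 6.0)  # clamp to the representable range
    index = min(range(8), key=lambda i: abs(E2M1_MAGNITUDES[i] - magnitude))
    return sign_bit | index


def quantize_sketch(values, block_size):
    """Return (packed bytes, per-block scales, per-tensor scale)."""
    if len(values) % block_size != 0 or len(values) % 2 != 0:
        raise ValueError("length must be even and a multiple of block_size")
    amax = max(abs(v) for v in values)
    if amax == 0:
        raise ValueError("all-zero input is not handled in this sketch")
    tensor_scale = amax / (6.0 * 448.0)  # assumed convention
    codes, block_scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        block_amax = max(abs(v) for v in block)
        block_scale = block_amax / (6.0 * tensor_scale)
        block_scales.append(block_scale)
        step = block_scale * tensor_scale  # effective scale for this block
        codes.extend(nearest_e2m1_code(v / step) if step else 0 for v in block)
    # Assumed layout: even-indexed element in the low nibble.
    packed = bytes((hi << 4) | lo for lo, hi in zip(codes[0::2], codes[1::2]))
    return packed, block_scales, tensor_scale
```

Because each block is scaled by its own maximum, the largest element in every non-zero block lands on ±6, the largest E2M1 magnitude.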