fp4_kernel

NVFP4 Fake Quantization Triton Implementation.

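This module provides GPU implementations of NVFP4 fake quantization written as Triton kernels. Fake quantization rounds tensor values to FP4 and immediately dequantizes them. The result stays in floating point but carries the same rounding error that a real FP4 representation would introduce.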
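At the level of a single value, fake quantization rounds a scaled value to the nearest FP4 (E2M1) value and then scales it back. The following is a minimal pure-Python sketch that assumes round-to-nearest with ties to even over the eight non-negative E2M1 magnitudes. The helpers round_to_e2m1 and fake_quant_scalar are hypothetical illustrations and are not part of this module.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
# The list index's lowest bit equals the mantissa bit.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_e2m1(v: float) -> float:
    """Hypothetical helper: round v to the nearest E2M1 value, saturating at +/-6."""
    mag = min(abs(v), E2M1_GRID[-1])
    # Pick the nearest grid point. On a tie, prefer the even index (mantissa bit 0),
    # which gives round-half-to-even.
    idx = min(range(len(E2M1_GRID)), key=lambda i: (abs(mag - E2M1_GRID[i]), i % 2))
    return math.copysign(E2M1_GRID[idx], v)


def fake_quant_scalar(v: float, scale: float) -> float:
    """Hypothetical helper: quantize v / scale to E2M1, then dequantize by multiplying by scale."""
    return round_to_e2m1(v / scale) * scale


print(fake_quant_scalar(1.3, 0.5))  # 1.3 / 0.5 = 2.6 rounds to 3.0, so this prints 1.5
```

Ties such as 2.5 resolve to the neighbor with an even mantissa (2.0). Magnitudes above 6 saturate to 6.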

Functions

fp4_fake_quant_block

Applies FP4 fake quantization on the input tensor.

fp4_fake_quant_block(x, global_amax, block_size=16, tile_size=128)

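Splits each row of x into blocks of block_size contiguous elements. Each block is divided by a per-block scale factor derived from the block's maximum magnitude and global_amax. The scaled values are rounded to FP4 and then multiplied back by the scale.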

Parameters:
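  • x (torch.Tensor) – Input tensor of shape (M, N)

  • global_amax (float) – Global absolute maximum (amax) of the input, used to derive the per-tensor scaling factor

  • block_size (int) – Number of contiguous elements that share one scale factor (default 16, the NVFP4 block size)

  • tile_size (int) – Size of the tile processed by each Triton program instance (default 128)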

Returns:

Fake-quantized tensor with the same shape as x (values rounded to FP4 and then dequantized)

Return type:

torch.Tensor
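The sketch below is a pure-Python reference for block-wise fake quantization, intended to make the roles of global_amax and block_size concrete. It assumes the common NVFP4 recipe: FP4 E2M1 element values, one FP8 E4M3 scale per block, and a per-tensor scale of global_amax / (6 * 448). The function fp4_fake_quant_block_ref and its helpers are hypothetical. They do not reproduce the Triton kernel's exact code, tiling, or edge-case handling, and they operate on plain lists of rows rather than on torch tensors.

```python
import math

FP4_MAX = 6.0         # largest E2M1 magnitude
FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 magnitude
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_e2m1(v: float) -> float:
    """Nearest E2M1 value with ties to even, saturating at +/-6."""
    mag = min(abs(v), FP4_MAX)
    idx = min(range(len(E2M1_GRID)), key=lambda i: (abs(mag - E2M1_GRID[i]), i % 2))
    return math.copysign(E2M1_GRID[idx], v)


def round_to_e4m3(v: float) -> float:
    """Round a non-negative value to the nearest FP8 E4M3 value, saturating at 448."""
    if v <= 0.0:
        return 0.0
    _, e = math.frexp(v)     # v = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)     # binade exponent; -6 is E4M3's smallest normal exponent
    step = 2.0 ** (exp - 3)  # 3 mantissa bits give 8 steps per binade
    return min(round(v / step) * step, FP8_E4M3_MAX)


def fp4_fake_quant_block_ref(rows, global_amax, block_size=16):
    """Hypothetical reference: NVFP4-style fake quantization of a list of rows."""
    if global_amax <= 0:
        raise ValueError("global_amax must be positive")
    # Per-tensor scale (assumed recipe): maps the tensor amax to FP4_MAX * FP8_E4M3_MAX.
    global_scale = global_amax / (FP4_MAX * FP8_E4M3_MAX)
    out = []
    for row in rows:
        if len(row) % block_size:
            raise ValueError("row length must be a multiple of block_size")
        q_row = []
        for start in range(0, len(row), block_size):
            block = row[start:start + block_size]
            block_amax = max(abs(v) for v in block)
            # Per-block scale, stored as an E4M3 value relative to global_scale.
            scale = round_to_e4m3(block_amax / (FP4_MAX * global_scale)) * global_scale
            if scale == 0.0:
                # All-zero (or underflowed) block: every value quantizes to 0.
                q_row.extend(0.0 for _ in block)
                continue
            # Quantize each element to E2M1, then dequantize with the same scale.
            q_row.extend(round_to_e2m1(v / scale) * scale for v in block)
        out.append(q_row)
    return out


# global_amax = 2688 = 6 * 448 makes the per-tensor scale exactly 1.0.
print(fp4_fake_quant_block_ref([[1.0, 0.5] + [0.0] * 14], global_amax=2688.0)[0][:2])
# prints [1.03125, 0.515625]
```

In the usage line, the ideal block scale 1.0 / 6 is not representable in E4M3 and is rounded to 0.171875. That scale-rounding step is why 1.0 comes back as 1.03125 instead of 1.0, even though 1.0 / 0.171875 rounds cleanly to the FP4 value 6.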