fp4_kernel

NVFP4 Fake Quantization Triton Implementation.

This module provides high-performance GPU implementations of NVFP4 fake quantization operations using Triton kernels.

Functions

Applies FP4 fake quantization on the input tensor.

fp4_fake_quant_block(x, global_amax, block_size=16, tile_size=128)

Applies FP4 fake quantization on the input tensor.

Parameters:

Returns:

Quantized tensor of the same shape as input

Return type:

torch.Tensor