fp4_kernel
NVFP4 Fake Quantization Triton Implementation.
This module provides high-performance GPU implementations of NVFP4 fake quantization operations using Triton kernels.
Functions
fp4_fake_quant_block: Applies FP4 fake quantization on the input tensor.
- fp4_fake_quant_block(x, global_amax, block_size=16, tile_size=128)
Applies FP4 fake quantization on the input tensor.
- Parameters:
x (torch.Tensor) – Input tensor of shape (M, N)
global_amax (float) – Global absolute maximum (amax) value from which the global scaling factor is derived
block_size (int) – Number of elements in each FP4 quantization block (default: 16)
tile_size (int) – Size of the processing tiles (default: 128)
- Returns:
Fake-quantized tensor with the same shape as the input
- Return type:
torch.Tensor
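To illustrate what block-wise FP4 fake quantization does to the values of a tensor, here is a minimal pure-Python sketch. It is not the Triton kernel and makes simplifying assumptions: each block's scale is its own amax divided by 6 (the largest E2M1 magnitude), the scale is kept in full precision rather than quantized to FP8, no global scale is applied, and ties round toward the smaller magnitude. The names `fake_quant_block_reference` and `E2M1_VALUES` are hypothetical and exist only for this example.

```python
import math

# Magnitudes representable in FP4 (E2M1: 1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fake_quant_block_reference(rows, block_size=16):
    """Quantize then dequantize each row in contiguous blocks of block_size.

    Simplified reference (assumption): per-block scale = block amax / 6,
    kept in full precision; the real NVFP4 scheme also quantizes the
    block scales and uses a global scale derived from the global amax.
    """
    out = []
    for row in rows:
        if len(row) % block_size != 0:
            raise ValueError("row length must be a multiple of block_size")
        new_row = []
        for start in range(0, len(row), block_size):
            block = row[start:start + block_size]
            amax = max(abs(v) for v in block)
            # Map the largest magnitude in the block onto 6.0; avoid dividing by zero.
            scale = amax / 6.0 if amax > 0 else 1.0
            for v in block:
                scaled = abs(v) / scale
                # Nearest representable magnitude; min() keeps the smaller one on ties.
                mag = min(E2M1_VALUES, key=lambda q: abs(scaled - q))
                # Dequantize and restore the sign.
                new_row.append(math.copysign(mag * scale, v))
        out.append(new_row)
    return out


x = [[6.0, 3.2, -1.4, 0.1]]
y = fake_quant_block_reference(x, block_size=4)
print(y)
```

The output keeps the input's shape and floating-point type, but every value has been snapped to a point on a per-block FP4 grid, which is what "fake" quantization means here. The kernel documented above takes a 2-D `torch.Tensor` and a `global_amax` instead of nested lists.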