nvfp4_quant
Composable Triton JIT functions for NVFP4 (E2M1) fake quantization.
- Single source of truth for FP4 decision-boundary rounding. Used by:
  - ../gemm/fp4_kernel.py (standalone blockwise fake quant)
  - ../gemm/fp4_kernel_hopper.py (Hopper block-pointer variant)
  - ../gemm/gptq_fused_kernel.py (fused GPTQ scalar path)
  - ../attention/p_qdq.py (softmax-P qdq in the flash-attention kernel)
FP4 (E2M1) representable magnitudes: {0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0}
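
As a rough illustration of decision-boundary rounding over the magnitudes listed above, here is a minimal pure-Python sketch (not the Triton code in this module). The boundaries are taken as the midpoints between adjacent representable magnitudes. The names `round_to_fp4`, `FP4_MAGNITUDES`, and `FP4_BOUNDARIES` are hypothetical. The tie rule (an input exactly on a boundary goes to the neighbor with the even E2M1 code) and saturation at 6.0 are assumptions; the actual kernels may handle ties, NaN, or overflow differently.

```python
import math
from bisect import bisect_left, bisect_right

# Representable E2M1 magnitudes, in increasing order (index == 3-bit magnitude code).
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Decision boundaries: midpoints between adjacent representable magnitudes.
FP4_BOUNDARIES = [
    (lo + hi) / 2.0 for lo, hi in zip(FP4_MAGNITUDES[:-1], FP4_MAGNITUDES[1:])
]  # [0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0]


def round_to_fp4(x: float) -> float:
    """Round x to the nearest FP4 (E2M1) value, keeping its sign.

    Assumptions (not confirmed by the source): ties go to the neighbor
    with the even magnitude code, and magnitudes above 6.0 saturate to 6.0.
    NaN inputs are not handled.
    """
    mag = abs(x)
    lo = bisect_left(FP4_BOUNDARIES, mag)   # boundaries strictly below mag
    hi = bisect_right(FP4_BOUNDARIES, mag)  # boundaries at or below mag
    if lo == hi:
        idx = lo  # not on a boundary: nearest value is unambiguous
    else:
        idx = lo if lo % 2 == 0 else hi  # exact tie: pick the even code
    return math.copysign(FP4_MAGNITUDES[idx], x)


if __name__ == "__main__":
    for v in (0.3, -2.6, 2.5, 0.75, 100.0):
        print(v, "->", round_to_fp4(v))
```

Comparing against a fixed list of boundaries avoids a nearest-neighbor search over the value table, which is why a single shared definition of the boundaries is enough to keep every caller's rounding consistent.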