Fp4 Quantize
- void trt_edgellm::kernel::fp4Quantize( )
Quantizes a BF16 or FP16 input to packed FP4 with swizzled FP8 E4M3 scale factors. On SM100 and newer, the kernel uses hardware E2M1 conversion; on earlier architectures it uses a software fallback. The input has shape [M, N], where M must be a multiple of 128 and N must be a multiple of 16. The input dataType must be kBF16 or kHALF.
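The shape and dtype constraints above can be checked on the host before launching the kernel. The following is a minimal sketch only: `DataType` and `isValidFp4QuantizeInput` are hypothetical names introduced for illustration and are not part of the documented API, and rejecting empty dimensions is an added assumption.

```cpp
#include <cstdint>

// Hypothetical dtype tag for illustration; only kBF16 and kHALF are
// named by the documentation, kFLOAT stands in for "anything else".
enum class DataType
{
    kBF16,
    kHALF,
    kFLOAT
};

// Returns true when an [M, N] input meets the documented preconditions:
// M is a multiple of 128, N is a multiple of 16, dtype is kBF16 or kHALF.
// Assumption: empty dimensions (M == 0 or N == 0) are rejected as well.
bool isValidFp4QuantizeInput(std::int64_t m, std::int64_t n, DataType dtype)
{
    bool const dtypeOk = (dtype == DataType::kBF16) || (dtype == DataType::kHALF);
    bool const shapeOk = (m > 0) && (n > 0) && (m % 128 == 0) && (n % 16 == 0);
    return dtypeOk && shapeOk;
}
```

A caller whose M is not a multiple of 128 would need to pad the input or choose another path; how that is handled is outside what this section documents.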
globalSF is the forward-direction activation global scale (e.g. max|x|/(448*6)); the reciprocal consumed by the FP4 mapping is computed inside the kernel via a single IEEE divide per thread.
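As an illustration of the example formula, the sketch below computes a global scale of max|x|/(448*6) on the host, where 448 is the largest finite FP8 E4M3 magnitude and 6 is the largest FP4 E2M1 magnitude. The helper names `computeGlobalSF` and `reciprocalGlobalSF` are hypothetical, the input is taken as `float` for simplicity, and the zero guard on the reciprocal is an assumption rather than documented kernel behavior.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Largest finite magnitudes of the two formats involved.
constexpr float kE4M3Max = 448.0f; // FP8 E4M3
constexpr float kE2M1Max = 6.0f;   // FP4 E2M1

// Hypothetical host-side helper: global scale = max|x| / (448 * 6).
float computeGlobalSF(std::vector<float> const& x)
{
    float maxAbs = 0.0f;
    for (float v : x)
    {
        maxAbs = std::max(maxAbs, std::fabs(v));
    }
    return maxAbs / (kE4M3Max * kE2M1Max);
}

// Mirrors the reciprocal that the kernel derives with a single divide.
// Assumption: an all-zero input (globalSF == 0) maps to a reciprocal of 0
// here; the documentation does not say how the kernel treats that case.
float reciprocalGlobalSF(float globalSF)
{
    return globalSF > 0.0f ? 1.0f / globalSF : 0.0f;
}
```

Because the kernel computes the reciprocal itself, a caller following this sketch would pass the forward-direction value from `computeGlobalSF`, not its inverse.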