Fp4 Quantize

void trt_edgellm::kernel::fp4Quantize(rt::Tensor const&, rt::Tensor const&, rt::Tensor&, rt::Tensor&, cudaStream_t)

Quantizes a BF16 or FP16 tensor to packed FP4 with swizzled FP8 E4M3 scale factors. On SM100 and newer, the kernel uses hardware E2M1 conversion; on earlier architectures it uses a software fallback. The input has shape [M, N], where M must be a multiple of 128 and N must be a multiple of 16. The input dataType must be kBF16 or kHALF.
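The shape constraints above can be checked on the host before launching the kernel. The following is a minimal sketch, not part of the library: `isValidFp4QuantizeShape` and `packedFp4Bytes` are hypothetical helper names, the requirement that M and N be positive is an assumption, and the byte count assumes that "packed" means two 4-bit values per byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a library function): checks the documented
// constraints on the input shape [M, N].
// Assumption: empty tensors (M == 0 or N == 0) are rejected.
inline bool isValidFp4QuantizeShape(std::int64_t m, std::int64_t n)
{
    return m > 0 && n > 0 && m % 128 == 0 && n % 16 == 0;
}

// Hypothetical helper: bytes needed to hold M * N FP4 values,
// assuming two 4-bit values are packed into each byte.
inline std::int64_t packedFp4Bytes(std::int64_t m, std::int64_t n)
{
    assert(isValidFp4QuantizeShape(m, n));
    return m * n / 2;
}
```

For example, a [256, 64] input passes the check and would need 8192 bytes of packed output under these assumptions, while a [100, 64] input is rejected because 100 is not a multiple of 128.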

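globalSF is the forward-direction global scale for the activations, e.g. max|x|/(448*6), where 448 is the largest finite FP8 E4M3 value and 6 is the largest FP4 E2M1 magnitude. Pass the scale itself, not its reciprocal: the reciprocal consumed by the FP4 mapping is computed inside the kernel with a single IEEE divide per thread.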
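As an illustration of the example formula, the sketch below computes a global scale of the form max|x|/(448*6) on the host and shows the reciprocal that the kernel derives internally. The names `computeGlobalScale` and `reciprocalScale` are hypothetical and not part of the library, and a real pipeline would more likely obtain max|x| from calibration or a device-side reduction than from a host loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical host-side helper (not a library function):
// forward-direction global scale, max|x| / (448 * 6).
inline float computeGlobalScale(std::vector<float> const& x)
{
    constexpr float kFp8E4m3Max = 448.0f; // largest finite FP8 E4M3 value
    constexpr float kFp4E2m1Max = 6.0f;   // largest FP4 E2M1 magnitude
    float absMax = 0.0f;
    for (float v : x)
    {
        absMax = std::max(absMax, std::fabs(v));
    }
    return absMax / (kFp8E4m3Max * kFp4E2m1Max);
}

// Illustration only: the kernel computes this reciprocal itself,
// so callers pass globalSF, not 1 / globalSF.
inline float reciprocalScale(float globalSF)
{
    assert(globalSF > 0.0f); // an all-zero input would give a zero scale
    return 1.0f / globalSF;
}
```

With this formula, an input whose largest magnitude is 2688 yields a global scale of exactly 1, since 448 * 6 = 2688.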