Fp4 Quantize#

void trt_edgellm::kernel::fp4Quantize(
rt::Tensor const&,
rt::Tensor const&,
rt::Tensor&,
rt::Tensor&,
cudaStream_t
)#

Quantizes a BF16/FP16 tensor to packed FP4 (E2M1) with swizzled FP8 E4M3 scale factors. On SM100+ the kernel uses hardware E2M1 conversion; on pre-SM100 architectures it falls back to a software path. The input has shape [M, N], where M must be a multiple of 128 and N a multiple of 16. The input dataType must be kBF16 or kHALF.
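For reference, the rounding step of the pre-SM100 software path can be sketched on the host as a nearest-value search over the eight positive E2M1 magnitudes. This is an illustrative sketch, not the kernel's actual implementation; the function name `roundToE2M1` is hypothetical.

```cpp
#include <cmath>

// Hypothetical host-side reference for E2M1 rounding (sketch only).
// The representable positive E2M1 magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6};
// magnitudes above 6 saturate to 6.
float roundToE2M1(float x)
{
    static const float kTable[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    float mag = std::fabs(x);
    float best = kTable[0];
    for (int i = 1; i < 8; ++i)
    {
        // Keep the candidate closest to |x|.
        if (std::fabs(kTable[i] - mag) < std::fabs(best - mag))
        {
            best = kTable[i];
        }
    }
    return std::copysign(best, x); // restore the sign bit
}
```

In the real kernel each element is first divided by its block scale factor before this rounding is applied, so the block maximum lands on the E2M1 maximum of 6.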

globalSF is the global activation scale in the forward direction (e.g. max|x| / (448 * 6)). The kernel computes the reciprocal consumed by the FP4 mapping internally, using a single IEEE divide per thread.
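The scale chain implied by that formula can be illustrated as follows. This is a sketch assuming the standard two-level FP4 scheme (per-16-element E4M3 block scales on top of the global scale); the helper names and the block-scale formula are assumptions, not taken from the kernel source.

```cpp
// Two-level FP4 scaling sketch (assumed scheme, hypothetical helpers).
// 448 is the largest finite FP8 E4M3 value; 6 is the largest E2M1 value.
constexpr float kE4M3Max = 448.0f;
constexpr float kE2M1Max = 6.0f;

// Forward-direction global scale, as stated in the docs: max|x| / (448 * 6).
float computeGlobalSF(float tensorAmax)
{
    return tensorAmax / (kE4M3Max * kE2M1Max);
}

// Assumed per-block E4M3 scale factor: maps the block's max magnitude onto
// the E2M1 maximum of 6 after dividing out the global scale.
float computeBlockSF(float blockAmax, float globalSF)
{
    return blockAmax / kE2M1Max / globalSF;
}

// Dequantization then reconstructs x ~= fp4Value * blockSF * globalSF.
```

Because 448 * 6 = 2688 exactly, a tensor whose max magnitude is 2688 yields a global scale of 1.0 and block scale factors that directly map each block's maximum to 6.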