fp8_quant
Composable Triton JIT functions for FP8 (E4M3) fake quantization.
Counterpart to nvfp4_quant.py, for per-tensor FP8 scaling. Used by the unified
flash-attention kernel (common/attention/triton_fa.py) to quantize-dequantize (QDQ) the softmax probabilities P.
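The numerics behind per-tensor FP8 fake quantization can be shown with a plain-Python reference. This is a minimal sketch, not the module's Triton code. It assumes the saturating "fn" variant of E4M3 (max finite magnitude 448, smallest normal 2**-6, 3 mantissa bits), round-half-to-even, and an amax-derived scale `amax / 448`. The names `round_to_e4m3` and `fp8_fake_quant` are hypothetical.

```python
import math

# Assumed E4M3 ("fn" variant) parameters; hypothetical reference, not the Triton kernel.
E4M3_MAX = 448.0          # largest finite E4M3 magnitude
E4M3_MIN_NORMAL_EXP = -6  # smallest normal number is 2**-6
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(v: float) -> float:
    """Round a float to the nearest E4M3 value (half-to-even, saturating at 448)."""
    if math.isnan(v):
        return v
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), E4M3_MAX)  # saturate instead of overflowing
    if a == 0.0:
        return 0.0 * sign
    _, e = math.frexp(a)  # a = m * 2**e with 0.5 <= m < 1
    # floor(log2(a)), clamped so values below 2**-6 use the subnormal spacing
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # gap between adjacent E4M3 values
    return sign * round(a / step) * step  # Python's round() is half-to-even


def fp8_fake_quant(xs, amax=None):
    """Per-tensor quantize-dequantize: scale into E4M3 range, round, scale back."""
    if amax is None:
        amax = max((abs(x) for x in xs), default=0.0)
    if amax == 0.0:
        return list(xs)  # nothing to scale; all-zero input stays unchanged
    scale = amax / E4M3_MAX  # one scale shared by the whole tensor
    return [round_to_e4m3(x / scale) * scale for x in xs]


if __name__ == "__main__":
    print(fp8_fake_quant([0.1, -2.5, 3.0]))
```

The output keeps the input's dtype and shape but carries only E4M3 precision. That is what "fake" quantization means here: downstream matmuls see FP8 rounding error without an actual FP8 storage format.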