attention

Modules

modelopt.torch.kernels.quantization.attention.p_qdq

Softmax-P quant-dequant helpers for the unified flash attention kernel.

Quantization-specific attention kernel pieces.

p_qdq.py holds the softmax-P (p_bmm_quantizer) quant-dequant @triton.jit helpers that the unified flash-attention kernel in common/attention/triton_fa.py invokes under its P_QDQ constexpr guard. Only NVFP4 needs a P-specific helper, which adds tiling and a block-amax policy on top of quantization/common/nvfp4_quant.py. The FP8 mode calls quantization/common/fp8_quant.fp8_scalar_qdq directly.
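
To illustrate what a block-amax quant-dequant does to a row of softmax probabilities, here is a minimal pure-Python sketch. It is not the Triton helper in p_qdq.py: the names fake_quant_block and p_qdq_row are hypothetical, the block length of 16 and the amax/6 scale are assumptions based on the general NVFP4 format, and the real format also stores block scales in FP8, which this sketch skips.

```python
# Illustrative stand-in only; names and policy here are assumptions,
# not the actual @triton.jit helper in p_qdq.py.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # FP4 (E2M1) magnitudes
BLOCK_SIZE = 16  # assumed block length along the key dimension


def fake_quant_block(block):
    """Quantize-dequantize one block with a scale derived from its amax."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Map the block amax onto the largest FP4 magnitude (6.0).
    # Simplification: the scale stays in full precision here.
    scale = amax / E2M1_VALUES[-1]
    out = []
    for x in block:
        # Snap the scaled magnitude to the nearest representable FP4 value
        # (ties go to the smaller value, a simplification).
        q = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        out.append(q * scale if x >= 0 else -q * scale)
    return out


def p_qdq_row(p_row, block_size=BLOCK_SIZE):
    """Apply the block quant-dequant to consecutive blocks of one P row."""
    out = []
    for start in range(0, len(p_row), block_size):
        out.extend(fake_quant_block(p_row[start:start + block_size]))
    return out


if __name__ == "__main__":
    row = [0.75, 0.375, 0.1] + [0.0] * 13
    print(p_qdq_row(row)[:3])
```

In this sketch each block is scaled independently, so a block holding only small probabilities keeps its relative resolution instead of being flushed to zero by a large value elsewhere in the row.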