attention

Modules

modelopt.torch.kernels.quantization.attention.p_qdq

Softmax-P quant-dequant helper for flash attention.

modelopt.torch.kernels.quantization.attention.v_qdq

Value-operand (V) quant-dequant helper for flash attention.

Quantization-specific attention kernel pieces.

p_qdq.py holds the @triton.jit quant-dequant helpers for the softmax-P operand (p_bmm_quantizer). The unified flash-attention kernel in common/attention/triton_fa.py invokes them under its P_QDQ constexpr guard. Only the NVFP4 mode needs a P-specific helper, which adds tiling and a block-amax policy on top of quantization/common/nvfp4_quant.py; the FP8 mode calls quantization/common/fp8_quant.fp8_scalar_qdq directly.
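
To illustrate what a block-amax quant-dequant does numerically, here is a minimal pure-Python sketch. It is not the Triton helper from p_qdq.py, and the names fake_nvfp4_qdq and BLOCK are hypothetical. It assumes the usual NVFP4 layout of 16-element blocks with FP4 (E2M1) elements, and it simplifies by keeping the per-block scale in full precision instead of encoding it in FP8 and by breaking rounding ties toward the smaller magnitude. The actual tiling and block-amax policy of the kernel are not shown here.

```python
# Illustrative emulation only; not a ModelOpt API.

# Non-negative magnitudes representable in FP4 E2M1.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # assumed NVFP4 block size


def fake_nvfp4_qdq(row, block=BLOCK):
    """Quantize each block of `row` to E2M1 with a per-block scale, then dequantize."""
    out = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        amax = max(abs(x) for x in chunk)
        if amax == 0.0:
            # An all-zero block has no meaningful scale; pass zeros through.
            out.extend(0.0 for _ in chunk)
            continue
        # Map the block's largest magnitude onto the largest E2M1 value (6.0).
        scale = amax / E2M1_VALUES[-1]
        for x in chunk:
            scaled = abs(x) / scale
            # Nearest representable magnitude; ties go to the smaller value
            # (a simplification, hardware rounding may differ).
            mag = min(E2M1_VALUES, key=lambda v: abs(scaled - v))
            out.append(mag * scale if x >= 0 else -mag * scale)
    return out


if __name__ == "__main__":
    probs = [0.75, 0.25, 0.3, 0.0]
    print(fake_nvfp4_qdq(probs))
```

Because the scale is recomputed per block, a block of small softmax probabilities keeps its relative resolution even when another block in the same row contains values close to 1.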