v_qdq
Value-operand (V) quant-dequant helper for flash attention.

This is the in-kernel counterpart of v_bmm_quantizer: _v_qdq_nvfp4() fake-quantizes
the V operand of the P @ V matmul (BMM2). It is the sibling of
attention/p_qdq._p_qdq_nvfp4, which handles the P operand. Both use blocks of 16
along the key dimension, the contraction axis of P @ V. A per-token cache write
sees one key at a time, so it cannot form blocks along that axis. The keys axis is
axis 0 of the loaded V tile, and because V is signed (unlike P, whose softmax
probabilities are non-negative), its per-block amax is taken over abs() values.

_v_qdq_nvfp4() is called under the V_QDQ constexpr guard from the baseline
flash-attention kernel (common/attention/triton_fa.py) and from the paged decode
kernel (common/attention/decode_attention.py). Per-tensor FP8 does not go through
this helper; it calls quantization/common/fp8_quant.fp8_scalar_qdq directly.
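The blocking scheme described above can be illustrated with a minimal pure-Python sketch. It is not the Triton code of _v_qdq_nvfp4(); the names v_qdq_nvfp4_reference and _round_to_e2m1 are hypothetical, and it makes simplifying assumptions: the tile is a list of rows with keys on axis 0, the per-block scale is kept in full precision (real NVFP4 stores it as FP8 E4M3 alongside a per-tensor FP32 scale), and ties round toward the smaller magnitude rather than to nearest-even.

```python
import math

# Magnitudes representable by an E2M1 (FP4) element, the NVFP4 element type.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
BLOCK = 16  # block size along the keys axis (axis 0)


def _round_to_e2m1(x):
    """Hypothetical helper: snap a scaled value to the nearest E2M1 magnitude,
    keeping its sign. Ties go to the smaller magnitude (a simplification)."""
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, x)


def v_qdq_nvfp4_reference(v_tile):
    """Hypothetical reference: fake-quantize a V tile (rows = keys, axis 0;
    columns = head dim) using blocks of 16 consecutive keys per column.
    Block scales stay in full precision here, unlike real NVFP4."""
    n_keys = len(v_tile)
    head_dim = len(v_tile[0]) if n_keys else 0
    out = [list(row) for row in v_tile]  # copy; do not mutate the input
    for start in range(0, n_keys, BLOCK):
        stop = min(start + BLOCK, n_keys)  # last block may be partial
        for d in range(head_dim):
            # V is signed, so the block amax is taken over absolute values.
            amax = max(abs(v_tile[k][d]) for k in range(start, stop))
            if amax == 0.0:
                continue  # an all-zero block stays zero
            scale = amax / E2M1_MAX  # maps the block amax onto FP4's max (6.0)
            for k in range(start, stop):
                out[k][d] = _round_to_e2m1(v_tile[k][d] / scale) * scale
    return out
```

Each block mixes 16 different keys within one head-dim channel, which is why the quantization must happen where a full tile of keys is loaded rather than at per-token cache-write time. For P the same loop would apply without abs(), since softmax outputs are non-negative.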