v_qdq

Value-operand (V) quant-dequant helper for flash attention.

The in-kernel counterpart of v_bmm_quantizer: _v_qdq_nvfp4() fake-quantizes the V operand of the P @ V matmul (BMM2). It is the sibling of attention/p_qdq._p_qdq_nvfp4, which handles the P operand. Both use blocks of 16 along the key dimension, the contraction axis of P @ V. A per-token cache write cannot produce this layout, because each block spans 16 keys.

The key axis is axis 0 of the loaded V tile, and V is signed, so unlike P its block amax is taken over abs(V).

_v_qdq_nvfp4() is called under the V_QDQ constexpr guard from the baseline flash-attention kernel (common/attention/triton_fa.py) and from the paged decode kernel (common/attention/decode_attention.py). Per-tensor FP8 does not go through this helper; it uses quantization/common/fp8_quant.fp8_scalar_qdq directly.
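
The blocking described above can be illustrated outside the kernel with a small NumPy sketch. This is not the Triton implementation: the function name v_qdq_sketch and the constants are hypothetical, the block scale is kept in float32 (the FP8 block-scale encoding and any per-tensor scale of NVFP4 are omitted), and ties are rounded toward the smaller magnitude rather than to nearest-even. It only shows the two points specific to V: blocks of 16 run along axis 0, and the block amax is taken over absolute values.

```python
import numpy as np

# Magnitudes representable in FP4 E2M1; the sign is handled separately.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK = 16  # block size along the key axis


def v_qdq_sketch(v):
    """Fake-quantize a [keys, head_dim] tile in blocks of 16 along axis 0.

    Hypothetical reference sketch, not the kernel helper itself.
    """
    keys, head_dim = v.shape
    assert keys % BLOCK == 0, "sketch assumes keys is a multiple of 16"

    # Group 16 consecutive keys per block; head_dim columns stay independent.
    blocks = v.astype(np.float32).reshape(keys // BLOCK, BLOCK, head_dim)

    # V is signed, so the block maximum is taken over absolute values.
    amax = np.abs(blocks).max(axis=1, keepdims=True)

    # Map the block amax onto the largest E2M1 magnitude (6.0).
    # All-zero blocks get scale 1.0 to avoid dividing by zero.
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0).astype(np.float32)

    # Quantize: snap each scaled magnitude to the nearest grid value.
    scaled = blocks / scale
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]

    # Dequantize and restore the original tile shape.
    return (q * scale).reshape(keys, head_dim).astype(np.float32)


rng = np.random.default_rng(0)
v_tile = rng.standard_normal((32, 8)).astype(np.float32)
v_fake = v_qdq_sketch(v_tile)
```

Because the scale is derived from the block amax, the largest-magnitude element of each block survives the round trip (up to float32 rounding), while smaller elements move to the nearest representable value for that block's scale.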