v_qdq
Value-operand NVFP4 helpers for flash attention.
Functions
- fake_quant_v_onwrite(v_cache, block_table, v_lo, v_hi, *, max_new_tokens, page_size=16, v_qdq_scale=1.0, decode=False)
NVFP4-finalize complete block-16 groups in [v_lo, v_hi) in place.
max_new_tokens is host metadata used to size the masked launch grid. Decode uses one fixed group per request. Eager prefill covers every group that the largest query chunk can complete without reading device metadata.
v_lo and v_hi must describe aligned, completed block-16 boundaries; their device values are not host-validated.
- Parameters:
v_cache (Tensor)
block_table (Tensor)
v_lo (Tensor)
v_hi (Tensor)
max_new_tokens (int)
page_size (int)
v_qdq_scale (float)
decode (bool)
- Return type:
None
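The contract above can be illustrated with a small host-side sketch. It is not the library's implementation: the real helper works on the paged v_cache through block_table on the device, and every name below (is_aligned_range, max_groups_completed, fake_quant_group) is hypothetical. The sketch assumes that "aligned" means multiples of 16, that a chunk of n new tokens can complete at most ceil(n / 16) groups, and that finalizing a group means a quantize-dequantize round trip onto the FP4 E2M1 grid with one scale per group. It keeps that scale in full precision, whereas NVFP4 stores it as FP8 E4M3, and it ignores v_qdq_scale.

```python
import math

BLOCK = 16  # NVFP4 shares one scale across each group of 16 elements
# Non-negative magnitudes representable in FP4 E2M1
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def is_aligned_range(v_lo: int, v_hi: int) -> bool:
    """Return True if [v_lo, v_hi) starts and ends on block-16 boundaries."""
    return 0 <= v_lo <= v_hi and v_lo % BLOCK == 0 and v_hi % BLOCK == 0


def max_groups_completed(max_new_tokens: int) -> int:
    """Upper bound on the block-16 groups a chunk of new tokens can complete.

    With r tokens already pending in a partial group (0 <= r < 16), a chunk
    of n tokens completes (r + n) // 16 groups. The worst case r = 15 gives
    ceil(n / 16), which needs only the host-side token count.
    """
    return (max_new_tokens + BLOCK - 1) // BLOCK


def fake_quant_group(values):
    """Quantize-dequantize one group of 16 floats onto a scaled E2M1 grid.

    Simplifications (assumptions, not the library's behavior): the group
    scale stays in full precision, and ties go to the smaller magnitude
    instead of round-to-nearest-even.
    """
    if len(values) != BLOCK:
        raise ValueError("expected exactly 16 values")
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * BLOCK
    # Map the largest magnitude in the group to the top of the grid (6.0).
    group_scale = amax / E2M1_GRID[-1]
    out = []
    for v in values:
        mag = abs(v) / group_scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest * group_scale, v))
    return out
```

Under these assumptions a decode step (one new token) yields max_groups_completed(1) == 1, which is consistent with the single fixed group per request described above. Because the device values of v_lo and v_hi are not host-validated, a check like is_aligned_range would belong in the caller's own tests, not in the launch path.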