Gemma4 Audio Attention

struct Gemma4AudioAttentionParams

Attention-window configuration for the Gemma 4 audio-encoder chunked local attention.

Holds only the parameters that cannot be derived from tensor shapes; the batch size B, sequence length S, head count H, head dimension D, and relative-position length P are read from the shapes of the input tensors. Defaults for the Gemma 4 audio tower: chunkSize = 12, leftHorizon = 12 (= attention_context_left - 1), contextSize = 24 (= chunk + left + right, which with these values implies a right context of 0), logitCap = 50. The kernel is specialized to this exact configuration (see gemma4AudioAttentionForward).
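As a rough illustration of how the defaults relate to one another, the following is a minimal sketch, not the library's actual declaration. The type name AudioAttentionParamsSketch, the default member initializers, and the helpers impliedRightContext, isConsistent, and checkParams are all hypothetical; only the field names and default values come from the description above.

```cpp
#include <cassert>

// Hypothetical mirror of the documented struct. The default member
// initializers are an assumption based on the defaults quoted above;
// the real struct may leave its fields uninitialized.
struct AudioAttentionParamsSketch {
    int chunkSize = 12;      // C: query block size
    int leftHorizon = 12;    // L: attention_context_left - 1
    int contextSize = 24;    // M: chunk + left + right
    float logitCap = 50.0f;  // tanh soft-cap on logits
};

// Right context implied by M = C + L + R (hypothetical helper).
constexpr int impliedRightContext(const AudioAttentionParamsSketch& p) {
    return p.contextSize - p.chunkSize - p.leftHorizon;
}

// Basic sanity conditions on the window sizes (hypothetical helper).
inline bool isConsistent(const AudioAttentionParamsSketch& p) {
    return p.chunkSize > 0 && p.leftHorizon >= 0 &&
           impliedRightContext(p) >= 0 && p.logitCap > 0.0f;
}

// Debug-only check that a caller might run before launching a kernel.
inline void checkParams(const AudioAttentionParamsSketch& p) {
    assert(isConsistent(p));
}

// With the documented defaults, 24 - 12 - 12 leaves no right context.
static_assert(impliedRightContext(AudioAttentionParamsSketch{}) == 0,
              "documented defaults imply a right context of 0");
```

Because the kernel is described as specialized to one exact configuration, a check of this kind would mainly serve to reject unsupported values early rather than to enable other window sizes.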

Public Members

int chunkSize

C (query block size)

int leftHorizon

L (effective left context, attention_context_left - 1)

int contextSize

M (gathered K/V context size, chunk + left + right)

float logitCap

tanh soft-cap applied to the attention logits
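For readers unfamiliar with logit soft-capping, here is a minimal sketch of the usual formulation, cap * tanh(x / cap). It assumes the kernel follows this common form; the helper name softCapLogit is hypothetical and is not part of the documented API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper showing the common tanh soft-cap form:
// the result stays within [-logitCap, +logitCap], and values that are
// small relative to the cap pass through almost unchanged.
inline float softCapLogit(float logit, float logitCap) {
    assert(logitCap > 0.0f);  // a non-positive cap is not meaningful here
    return logitCap * std::tanh(logit / logitCap);
}
```

With logitCap = 50, a logit of 1 is left nearly untouched, while a very large logit saturates at the cap instead of dominating the softmax.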