Rope Utils
- namespace trt_edgellm
Functions
- inline int64_t getRotaryDim(nlohmann::json const &configJson, int64_t headDim)
Return the RoPE cos/sin cache width for one attention head.
headDim is the full per-head Q/K dimension. rotaryDim is the width of the runtime rope_rotary_cos_sin cache consumed by the attention plugin. For the usual partial-RoPE representation, it is also the number of head channels that receive RoPE values; channels outside rotaryDim bypass rotation in the attention kernel.
rope_scaling is the normalized runtime/HuggingFace object that selects the RoPE variant and optional scaling parameters. It is not Gemma-specific: Gemma4 sliding/full RoPE configs, LongRoPE, MRoPE, dynamic scaling, and proportional RoPE all use this common field shape after export normalization.
Most RoPE variants use partial_rotary_factor to shrink rotaryDim to the rotated prefix of the head. Proportional RoPE is the exception: it keeps a full-head cache (rotaryDim == headDim) and consumes partial_rotary_factor in collectRopeConfig() as the fraction of full-head angle slots that receive non-identity cos/sin values. The inactive slots are materialized as identity values (cos=1, sin=0). The attention plugin does not need a Gemma-specific path for this case because it reads the cache width from the binding shape and already accepts rotaryDim <= headDim.
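The width rule and the identity fill described above can be illustrated with a small standalone sketch. It is not the library implementation: RopeSketchConfig, sketchRotaryDim, and sketchProportionalRow are hypothetical names, the config is a plain struct rather than the nlohmann::json object the real function reads, and both the rounding to an even width and the one-entry-per-angle-slot row layout are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the values the real function reads from configJson.
struct RopeSketchConfig
{
    double partialRotaryFactor; // fraction of the head in (0, 1]
    bool proportional;          // true when the variant is proportional RoPE
};

// Sketch of the cache-width rule: proportional RoPE keeps the full head,
// every other variant shrinks the cache to the rotated prefix.
inline int64_t sketchRotaryDim(RopeSketchConfig const& cfg, int64_t headDim)
{
    if (cfg.proportional)
    {
        return headDim;
    }
    // Assumption: the width is rounded down to whole channel pairs,
    // since RoPE rotates channels two at a time.
    auto const pairs = static_cast<int64_t>(std::floor(cfg.partialRotaryFactor * headDim / 2.0));
    return pairs * 2;
}

struct CosSin
{
    float cosValue;
    float sinValue;
};

// Sketch of one full-width proportional row. `angles` holds one rotation angle
// per angle slot (assumed layout). Only the leading fraction of slots receives
// real cos/sin values; the remaining slots get the identity rotation (cos=1, sin=0).
inline std::vector<CosSin> sketchProportionalRow(std::vector<float> const& angles, double partialRotaryFactor)
{
    auto const activeSlots = static_cast<std::size_t>(std::floor(partialRotaryFactor * angles.size()));
    std::vector<CosSin> row(angles.size(), CosSin{1.0F, 0.0F});
    for (std::size_t i = 0; i < activeSlots && i < angles.size(); ++i)
    {
        row[i] = CosSin{std::cos(angles[i]), std::sin(angles[i])};
    }
    return row;
}
```

Under these assumptions, a head of 128 channels with a factor of 0.25 yields a cache width of 32 for an ordinary partial-RoPE variant and 128 for proportional RoPE. In the proportional case the kernel can rotate every channel unconditionally, because the identity entries leave the inactive channels unchanged.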