Rope Cache

class RopeCache

Pool of immutable RoPE cos/sin cache tensors with automatic deduplication.

getOrCreate(config, ...) returns a reference to an existing GPU tensor if the same RoPE configuration has been seen before; otherwise it creates and caches a new one. This saves GPU memory when multiple engines share the same RoPE parameters (e.g. the base and draft engines in EAGLE).

Implementation note: entries are stored in a std::deque so that previously returned Tensor& references remain stable when new entries are added.
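The deduplication and reference-stability behavior described above can be illustrated with a small CPU-only sketch. It is not the actual implementation: `ToyKey`, `ToyTensor`, and `ToyPool` are hypothetical stand-ins, and the choice of which fields form the lookup key is an assumption made for the demo. The point it shows is that `std::deque::push_back` does not invalidate references to existing elements, so a reference handed out earlier stays usable after the pool grows.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical lookup key; the real RopeConfig has more fields.
struct ToyKey {
    float theta;
    int rotaryDim;
    int maxSeqLen;
    bool operator==(ToyKey const& o) const {
        return theta == o.theta && rotaryDim == o.rotaryDim && maxSeqLen == o.maxSeqLen;
    }
};

// Hypothetical stand-in for a GPU tensor (plain host memory here).
struct ToyTensor {
    std::vector<float> data;
};

class ToyPool {
private:
    struct Entry {
        ToyKey key;
        ToyTensor tensor;
    };
    // std::deque: push_back leaves references to existing elements valid.
    std::deque<Entry> mEntries;

public:
    // Return the entry matching `key`, creating it on first use.
    ToyTensor& getOrCreate(ToyKey const& key) {
        for (auto& e : mEntries) {
            if (e.key == key) {
                return e.tensor;  // configuration seen before: reuse
            }
        }
        std::size_t const n = static_cast<std::size_t>(key.rotaryDim) * key.maxSeqLen;
        mEntries.push_back(Entry{key, ToyTensor{std::vector<float>(n)}});
        return mEntries.back().tensor;
    }

    std::size_t size() const noexcept { return mEntries.size(); }
};

// Returns true if a reference obtained early still refers to the same
// object after many further insertions.
bool referenceStaysStable() {
    ToyPool pool;
    ToyTensor& first = pool.getOrCreate({10000.0f, 64, 128});
    ToyTensor* const addr = &first;
    for (int i = 1; i <= 100; ++i) {
        pool.getOrCreate({10000.0f + i, 64, 128});  // 100 distinct keys
    }
    return &pool.getOrCreate({10000.0f, 64, 128}) == addr;
}
```

With a `std::vector` as the backing store, the same loop could reallocate and leave `first` dangling, which is the problem the deque avoids.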

Public Functions

RopeCache() = default

Default constructor.

RopeCache(RopeCache const&) = delete

Copy construction is deleted to avoid accidental duplication of GPU resources.

RopeCache &operator=(RopeCache const&) = delete

Copy assignment is deleted for the same reason.

RopeCache(RopeCache&&) noexcept = default

Move construction is allowed.

RopeCache &operator=(RopeCache&&) noexcept = default

Move assignment is allowed.
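The copy and move declarations above can be checked at compile time with standard type traits. The sketch below uses a hypothetical `ToyCache` that mirrors only the five special-member declarations; it holds a `std::vector<int>` rather than GPU tensors, so it says nothing about how the real class manages its resources.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical class mirroring the special-member declarations of RopeCache.
class ToyCache {
public:
    ToyCache() = default;
    ToyCache(ToyCache const&) = delete;             // no copy construction
    ToyCache& operator=(ToyCache const&) = delete;  // no copy assignment
    ToyCache(ToyCache&&) noexcept = default;        // move construction allowed
    ToyCache& operator=(ToyCache&&) noexcept = default;  // move assignment allowed

    void add(int v) { mEntries.push_back(v); }
    std::size_t size() const noexcept { return mEntries.size(); }

private:
    std::vector<int> mEntries;
};

static_assert(!std::is_copy_constructible<ToyCache>::value, "copy ctor is deleted");
static_assert(!std::is_copy_assignable<ToyCache>::value, "copy assignment is deleted");
static_assert(std::is_nothrow_move_constructible<ToyCache>::value, "move ctor is noexcept");
static_assert(std::is_nothrow_move_assignable<ToyCache>::value, "move assignment is noexcept");

// Transfers ownership of the entries to a new object and returns its size.
std::size_t moveDemo() {
    ToyCache a;
    a.add(1);
    a.add(2);
    ToyCache b = std::move(a);  // OK: move; `ToyCache b = a;` would not compile
    assert(b.size() == 2);
    return b.size();
}
```

In practice this means a cache can be handed to a new owner with `std::move`, while code that would silently duplicate it is rejected by the compiler.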

rt::Tensor &getOrCreate(
    RopeConfig const &config,
    int32_t rotaryDim,
    int32_t maxSeqLen,
    cudaStream_t stream
)

Obtain (or create) a RoPE cos/sin cache tensor for the given configuration.

Parameters:
  • config – RoPE configuration (type, theta, scale, maxPositionEmbeddings, …).

  • rotaryDim – Number of dimensions that undergo rotation.

  • maxSeqLen – Maximum sequence length for the cache.

  • stream – CUDA stream used when a new tensor must be initialized.

Returns:

Reference to the cached GPU tensor. The reference remains valid across future calls, including calls that add new entries.
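To make the roles of `rotaryDim` and `maxSeqLen` concrete, the following CPU-only sketch builds a cos/sin table using the standard unscaled RoPE formulation, where frequency index i has inverse frequency theta^(-2i / rotaryDim) and the angle at position p is p times that value. `ToyRopeTable` and `buildRopeTable` are hypothetical names, and the `[maxSeqLen, rotaryDim / 2]` layout is an assumption; the actual tensor layout, data type, and handling of scaled RoPE variants are not specified here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side table; assumed layout is [maxSeqLen, rotaryDim / 2].
struct ToyRopeTable {
    int rotaryDim;
    int maxSeqLen;
    std::vector<float> cosVals;
    std::vector<float> sinVals;
};

// Build cos/sin values for every (position, frequency index) pair using the
// standard unscaled RoPE formula. Scaling variants are not modeled.
ToyRopeTable buildRopeTable(float theta, int rotaryDim, int maxSeqLen) {
    assert(rotaryDim > 0 && rotaryDim % 2 == 0);
    assert(maxSeqLen > 0);

    int const half = rotaryDim / 2;
    ToyRopeTable t{rotaryDim, maxSeqLen, {}, {}};
    t.cosVals.resize(static_cast<std::size_t>(maxSeqLen) * half);
    t.sinVals.resize(static_cast<std::size_t>(maxSeqLen) * half);

    for (int pos = 0; pos < maxSeqLen; ++pos) {
        for (int i = 0; i < half; ++i) {
            // Inverse frequency for pair i: theta^(-2i / rotaryDim).
            double const invFreq =
                std::pow(static_cast<double>(theta), -2.0 * i / rotaryDim);
            double const angle = pos * invFreq;
            std::size_t const idx = static_cast<std::size_t>(pos) * half + i;
            t.cosVals[idx] = static_cast<float>(std::cos(angle));
            t.sinVals[idx] = static_cast<float>(std::sin(angle));
        }
    }
    return t;
}
```

Because the table depends only on the configuration, `rotaryDim`, and `maxSeqLen`, two engines with identical values would compute identical contents, which is why sharing a single cached tensor is safe for immutable data.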

size_t size() const noexcept

Return the number of cached entries.