Rope Cache
class RopeCache
Pool of immutable RoPE cos/sin cache tensors with automatic deduplication.
getOrCreate(config, ...) returns a reference to an existing GPU tensor if the same RoPE configuration has been seen before; otherwise it creates and caches a new one. This saves GPU memory when multiple engines share the same RoPE parameters (e.g. the base and draft engines in EAGLE).
Implementation note: entries are stored in a std::deque so that previously returned Tensor& references remain stable when new entries are added.
Public Functions
RopeCache() = default
Default constructor.
RopeCache(RopeCache const&) = delete
Copy construction is deleted to prevent accidental duplication of GPU resources.
rt::Tensor &getOrCreate(RopeConfig const &config, int32_t rotaryDim, int32_t maxSeqLen, cudaStream_t stream)
Obtain (or create) a RoPE cos/sin cache tensor for the given configuration.
Parameters:
- config – RoPE configuration (type, theta, scale, maxPositionEmbeddings, …).
- rotaryDim – Number of dimensions that undergo rotation.
- maxSeqLen – Maximum sequence length for the cache.
- stream – CUDA stream used when a new tensor must be initialized.
Returns:
Reference to the cached GPU tensor; the reference remains valid across subsequent calls.
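The tensor returned by getOrCreate holds precomputed RoPE cos/sin values. As a hedged illustration of what such a table typically contains, the sketch below fills host-side tables for standard, unscaled RoPE, using inv_freq_i = theta^(-2i / rotaryDim) and angle = pos * inv_freq_i. The names RopeTablesSketch and buildRopeTables, the row-major [maxSeqLen, rotaryDim / 2] layout with separate cos and sin tables, and the omission of the scale field and other RoPE variants are all assumptions; the actual tensor layout and the initialization kernel launched on stream are not described here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side cos/sin tables for standard (unscaled) RoPE.
// Assumed layout: row-major [maxSeqLen, half], one row per position.
struct RopeTablesSketch {
    int32_t half;               // rotaryDim / 2 frequencies per position
    std::vector<float> cosTab;  // cos(pos * inv_freq_i)
    std::vector<float> sinTab;  // sin(pos * inv_freq_i)
};

// Hypothetical helper; not part of the documented API.
RopeTablesSketch buildRopeTables(float theta, int32_t rotaryDim, int32_t maxSeqLen) {
    // Dimensions are rotated in pairs, so rotaryDim must be even.
    assert(rotaryDim > 0 && rotaryDim % 2 == 0 && maxSeqLen > 0);
    RopeTablesSketch t;
    t.half = rotaryDim / 2;
    std::size_t n = static_cast<std::size_t>(maxSeqLen) * static_cast<std::size_t>(t.half);
    t.cosTab.resize(n);
    t.sinTab.resize(n);
    for (int32_t i = 0; i < t.half; ++i) {
        // inv_freq_i = theta^(-2i / rotaryDim), computed in double for accuracy.
        double invFreq = std::pow(static_cast<double>(theta), -2.0 * i / rotaryDim);
        for (int32_t pos = 0; pos < maxSeqLen; ++pos) {
            double angle = pos * invFreq;
            std::size_t idx = static_cast<std::size_t>(pos) * t.half + i;
            t.cosTab[idx] = static_cast<float>(std::cos(angle));
            t.sinTab[idx] = static_cast<float>(std::sin(angle));
        }
    }
    return t;
}
```

Because the values depend only on the configuration, rotaryDim, and maxSeqLen, a table like this can be computed once and shared read-only, which is what makes deduplication in RopeCache safe.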
size_t size() const noexcept
Return the number of cached entries.
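The deduplication and reference-stability behavior described for RopeCache can be illustrated with a minimal host-only sketch. Everything in it is hypothetical: RopeConfigSketch models only the fields named in the config parameter description, TensorSketch stands in for rt::Tensor, the lookup key (all config fields plus rotaryDim and maxSeqLen, compared exactly) is an assumption, and no GPU allocation or CUDA stream is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for RopeConfig (only the fields named in the docs).
struct RopeConfigSketch {
    int32_t type;                   // hypothetical integer encoding of the RoPE variant
    float theta;
    float scale;
    int32_t maxPositionEmbeddings;
};

// Hypothetical stand-in for rt::Tensor: a host buffer instead of GPU memory.
struct TensorSketch {
    std::vector<float> data;
};

class RopeCacheSketch {
public:
    RopeCacheSketch() = default;
    RopeCacheSketch(RopeCacheSketch const&) = delete;  // no accidental copies
    RopeCacheSketch& operator=(RopeCacheSketch const&) = delete;

    TensorSketch& getOrCreate(RopeConfigSketch const& config, int32_t rotaryDim, int32_t maxSeqLen) {
        assert(rotaryDim > 0 && rotaryDim % 2 == 0 && maxSeqLen > 0);
        // Linear scan: only a few distinct configurations are expected per process.
        for (Entry& e : mEntries) {
            if (matches(e, config, rotaryDim, maxSeqLen)) {
                return e.tensor;  // deduplicated: reuse the existing buffer
            }
        }
        // Appending to a std::deque does not invalidate references to existing elements.
        mEntries.push_back(Entry{config, rotaryDim, maxSeqLen, TensorSketch{}});
        Entry& e = mEntries.back();
        // Placeholder allocation; a real cache would write cos/sin values here.
        e.tensor.data.assign(static_cast<std::size_t>(maxSeqLen) * static_cast<std::size_t>(rotaryDim), 0.0f);
        return e.tensor;
    }

    std::size_t size() const noexcept { return mEntries.size(); }

private:
    struct Entry {
        RopeConfigSketch config;
        int32_t rotaryDim;
        int32_t maxSeqLen;
        TensorSketch tensor;
    };

    // Assumed key: every config field plus rotaryDim and maxSeqLen, compared exactly.
    static bool matches(Entry const& e, RopeConfigSketch const& c, int32_t rotaryDim, int32_t maxSeqLen) {
        return e.config.type == c.type && e.config.theta == c.theta && e.config.scale == c.scale
            && e.config.maxPositionEmbeddings == c.maxPositionEmbeddings
            && e.rotaryDim == rotaryDim && e.maxSeqLen == maxSeqLen;
    }

    std::deque<Entry> mEntries;
};
```

The linear scan is a reasonable choice when only a handful of RoPE configurations exist per process. The std::deque is what keeps earlier references valid: appending to a std::vector could reallocate its storage and leave previously returned references dangling.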