Batch Evict Kernels#

void trt_edgellm::kernel::compactKVCache(
rt::Tensor const &batchMapping,
rt::Tensor &kvCacheBuffer,
rt::Tensor &kvCacheLengths,
int32_t oldActiveBatch,
int32_t newActiveBatch,
cudaStream_t stream
)#

Compacts the KV cache by removing evicted batches.

This kernel moves the KV cache data of the surviving batches into dense, consecutive batch positions.

Note

This function updates kvCacheBuffer and kvCacheLengths in place with the compacted values.

Parameters:
  • batchMapping – [oldActiveBatch] GPU tensor (const input), mapping[i] = newBatchIdx or -1 (evict)

  • kvCacheBuffer – [numLayers, maxBatch, 2 (K/V), numHeads, maxSeq, headDim] (input/output)

  • kvCacheLengths – [maxBatch] (input/output), compacted in-place

  • oldActiveBatch – Number of batches before eviction

  • newActiveBatch – Number of batches after eviction

  • stream – CUDA stream

void trt_edgellm::kernel::compactTensorBatch(
rt::Tensor const &src,
rt::Tensor const &batchMapping,
rt::Tensor &dst,
int32_t oldActiveBatch,
int32_t newActiveBatch,
cudaStream_t stream
)#

Generic tensor compaction along the batch dimension.

This kernel compacts a tensor by removing evicted batches.

Note

Assumes the batch dimension is the first dimension (dim 0).

Note

For in-place operation, pass the same tensor as both src and dst.

Parameters:
  • src – Source tensor (const input)

• batchMapping – [oldActiveBatch] GPU tensor (const input), mapping[i] = newBatchIdx or -1 (evict)

  • dst – Destination tensor (output, can be same as src for in-place operation)

  • oldActiveBatch – Number of batches before eviction

  • newActiveBatch – Number of batches after eviction

  • stream – CUDA stream