Batch Evict Kernels#

void trt_edgellm::kernel::compactKVCache(
rt::Tensor const &batchMapping,
rt::Tensor &kvCacheBuffer,
rt::Tensor &kvCacheLengths,
int32_t oldActiveBatch,
int32_t newActiveBatch,
cudaStream_t stream
)#

Compacts the KV cache by removing evicted batches.

This kernel moves the KV cache data of the surviving batches into dense, consecutive batch positions.

Note

This function updates kvCacheBuffer and kvCacheLengths in place with the compacted values.

Parameters:
  • batchMapping – [oldActiveBatch] GPU tensor (const input), mapping[i] = newBatchIdx or -1 (evict)

  • kvCacheBuffer – [numLayers, maxBatch, 2 (K/V), numHeads, maxSeq, headDim] (input/output)

  • kvCacheLengths – [maxBatch] (input/output), compacted in-place

  • oldActiveBatch – Number of batches before eviction

  • newActiveBatch – Number of batches after eviction

  • stream – CUDA stream

void trt_edgellm::kernel::compactTensorBatch(
rt::Tensor const &src,
rt::Tensor const &batchMapping,
rt::Tensor &dst,
int32_t oldActiveBatch,
int32_t newActiveBatch,
cudaStream_t stream
)#

Generic tensor compaction along the batch dimension.

This kernel compacts a tensor by removing evicted batches.

Note

Assumes the batch dimension is the first dimension (dim 0).

Note

For in-place operation, pass the same tensor as both src and dst.

Parameters:
  • src – Source tensor (const input)

• batchMapping – [oldActiveBatch] GPU tensor (const input), mapping[i] = newBatchIdx or -1 (evict)

  • dst – Destination tensor (output, can be same as src for in-place operation)

  • oldActiveBatch – Number of batches before eviction

  • newActiveBatch – Number of batches after eviction

  • stream – CUDA stream