Batch Evict Kernels
void trt_edgellm::kernel::compactKVCache(
    rt::Tensor const &batchMapping,
    rt::Tensor &kvCacheBuffer,
    rt::Tensor &kvCacheLengths,
    int32_t oldActiveBatch,
    int32_t newActiveBatch,
    cudaStream_t stream
)
Compact KV Cache by removing evicted batches.
This kernel moves KV Cache data for active batches into dense, consecutive batch positions (see the mapping sketch after the parameter list).
Note
This function updates kvCacheBuffer and kvCacheLengths in-place with compacted values.
- Parameters:
batchMapping – [oldActiveBatch] GPU tensor (const input), mapping[i] = newBatchIdx or -1 (evict)
kvCacheBuffer – [numLayers, maxBatch, 2 (K/V), numHeads, maxSeq, headDim] (input/output)
kvCacheLengths – [maxBatch] (input/output), compacted in-place
oldActiveBatch – Number of batches before eviction
newActiveBatch – Number of batches after eviction
stream – CUDA stream
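For illustration, a minimal host-side sketch of how a batchMapping array might be built before it is uploaded to the GPU tensor. This is not part of the library API; buildBatchMapping and the eviction flags are hypothetical, and only the mapping convention (dense new index for survivors, -1 for evicted slots) comes from the parameter description above.

    #include <cstdint>
    #include <vector>

    // Hypothetical helper: derive the batchMapping described above from a
    // per-batch eviction flag. Surviving batches receive dense new indices in
    // order; evicted batches are marked with -1.
    std::vector<int32_t> buildBatchMapping(std::vector<bool> const &evicted)
    {
        std::vector<int32_t> mapping(evicted.size());
        int32_t next = 0;
        for (size_t i = 0; i < evicted.size(); ++i)
        {
            mapping[i] = evicted[i] ? -1 : next++;
        }
        return mapping;
    }

    // Example: oldActiveBatch = 4 with batch 2 evicted yields
    // mapping = {0, 1, -1, 2} and newActiveBatch = 3. The host array is then
    // copied into the [oldActiveBatch] GPU tensor and passed to compactKVCache
    // together with kvCacheBuffer, kvCacheLengths, and the CUDA stream.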
void trt_edgellm::kernel::compactTensorBatch(
    rt::Tensor const &src,
    rt::Tensor const &batchMapping,
    rt::Tensor &dst,
    int32_t oldActiveBatch,
    int32_t newActiveBatch,
    cudaStream_t stream
)
Generic tensor compaction along batch dimension.
This kernel compacts a tensor along its batch dimension by removing evicted batches (a simplified CUDA sketch of the pattern follows the parameter list).
Note
Assumes the batch dimension is the first dimension (dim 0).
Note
For in-place operation, pass the same tensor as both src and dst.
- Parameters:
src – Source tensor (const input)
batchMapping – [oldActiveBatch] GPU tensor (const input), mapping[i] = newBatchIdx or -1
dst – Destination tensor (output, can be same as src for in-place operation)
oldActiveBatch – Number of batches before eviction
newActiveBatch – Number of batches after eviction
stream – CUDA stream
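To make the gather pattern concrete, here is a simplified, standalone CUDA sketch of compaction along dim 0 for a contiguous float tensor. It is an illustration of the technique only, not the trt_edgellm kernel: the real implementation works on rt::Tensor metadata, supports arbitrary element types, and allows in-place operation, whereas this sketch assumes dst and src are distinct buffers.

    #include <cstdint>

    // One block row per old batch index (gridDim.y = oldActiveBatch); blocks
    // along x cooperatively copy the rowElems elements of each surviving row.
    __global__ void compactBatchDim0(float const *src, float *dst,
                                     int32_t const *batchMapping,
                                     int32_t oldActiveBatch, int64_t rowElems)
    {
        int32_t const oldIdx = static_cast<int32_t>(blockIdx.y);
        if (oldIdx >= oldActiveBatch)
        {
            return;
        }
        int32_t const newIdx = batchMapping[oldIdx];
        if (newIdx < 0) // -1: this batch slot was evicted, nothing to copy
        {
            return;
        }
        for (int64_t e = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
             e < rowElems;
             e += static_cast<int64_t>(gridDim.x) * blockDim.x)
        {
            dst[newIdx * rowElems + e] = src[oldIdx * rowElems + e];
        }
    }

Because the mapping assigns each surviving batch a unique new index, distinct blocks write disjoint rows of dst, so no synchronization is needed in this out-of-place form.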