KV Cache Utils Kernels

void trt_edgellm::kernel::incrementLengthTensor( rt::Tensor &lengthTensor, int32_t increment, cudaStream_t stream )

void trt_edgellm::kernel::incrementLengthTensor( rt::Tensor &lengthTensor, rt::Tensor const &newIncrementTensor, cudaStream_t stream )

void trt_edgellm::kernel::instantiateKVCacheFromTensor( rt::Tensor &dstKVCacheBuffer, rt::Tensor const &srcKVCacheTensor, int32_t batchIdx, cudaStream_t stream )

Instantiate the KVCache from a pre-computed KVCache tensor.

Helper function that instantiates the KVCache from a pre-computed KVCache tensor. It supports KVCache reuse across multiple inference requests, which speeds up the prefill step.

Parameters:

dstKVCacheBuffer – [inout] The KVCache buffer to be instantiated. Layout: [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim]
srcKVCacheTensor – [in] The pre-computed KVCache tensor. Layout: [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim]
batchIdx – [in] The batch index of the KVCache to be instantiated
stream – [in] The CUDA stream to be used
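
The two layouts above imply a specific index mapping, which the following CPU reference sketch spells out. It is only an illustration under stated assumptions: `KVShape`, `bufferOffset`, `tensorOffset`, and `instantiateKVCacheReference` are hypothetical names that are not part of the library, the element type is assumed to be `float`, and plain `std::vector` storage stands in for `rt::Tensor`. The real function runs on the GPU on the given CUDA stream and may be implemented differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape descriptor for this sketch; not a library type.
struct KVShape {
    int32_t numDecoderLayers;
    int32_t maxBatchSize;
    int32_t numKVHeads;
    int32_t maxSequenceLength;
    int32_t headDim;
};

// Flat offset of element 0 of one token vector in a buffer laid out as
// [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim].
inline std::size_t bufferOffset(KVShape const& s, int32_t layer, int32_t batch,
                                int32_t kv, int32_t head, int32_t tok) {
    std::size_t idx = static_cast<std::size_t>(layer);
    idx = idx * s.maxBatchSize + batch;
    idx = idx * 2 + kv;
    idx = idx * s.numKVHeads + head;
    idx = idx * s.maxSequenceLength + tok;
    return idx * s.headDim;
}

// Flat offset of element 0 of one token vector in a tensor laid out as
// [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim].
inline std::size_t tensorOffset(KVShape const& s, int32_t sequenceLength,
                                int32_t layer, int32_t kv, int32_t head, int32_t tok) {
    std::size_t idx = static_cast<std::size_t>(layer);
    idx = idx * 2 + kv;
    idx = idx * s.numKVHeads + head;
    idx = idx * sequenceLength + tok;
    return idx * s.headDim;
}

// CPU reference: copy the pre-computed tensor into batch slot `batchIdx`.
// Other batch slots and tokens at or beyond `sequenceLength` are left untouched.
void instantiateKVCacheReference(std::vector<float>& dstBuffer,
                                 std::vector<float> const& srcTensor,
                                 KVShape const& s, int32_t sequenceLength,
                                 int32_t batchIdx) {
    assert(batchIdx >= 0 && batchIdx < s.maxBatchSize);
    assert(sequenceLength >= 0 && sequenceLength <= s.maxSequenceLength);
    for (int32_t layer = 0; layer < s.numDecoderLayers; ++layer)
        for (int32_t kv = 0; kv < 2; ++kv)  // 0 = keys, 1 = values (assumed order)
            for (int32_t head = 0; head < s.numKVHeads; ++head)
                for (int32_t tok = 0; tok < sequenceLength; ++tok) {
                    auto src = static_cast<std::ptrdiff_t>(
                        tensorOffset(s, sequenceLength, layer, kv, head, tok));
                    auto dst = static_cast<std::ptrdiff_t>(
                        bufferOffset(s, layer, batchIdx, kv, head, tok));
                    std::copy_n(srcTensor.begin() + src, s.headDim,
                                dstBuffer.begin() + dst);
                }
}
```

The sketch only writes the slice selected by `batchIdx`, so the cache entries of other requests in the same buffer stay as they were.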

void trt_edgellm::kernel::saveKVCacheIntoTensor( rt::Tensor &dstKVCacheTensor, rt::Tensor const &srcKVCacheBuffer, int32_t batchIdx, cudaStream_t stream )

Save the KVCache into a tensor.

Helper function that saves the KVCache into a tensor. It supports KVCache reuse across multiple inference requests, which speeds up the prefill step. The sequenceLength dimension of dstKVCacheTensor must match the sequence length being saved from srcKVCacheBuffer.

Parameters:

dstKVCacheTensor – [out] The tensor that receives the saved KVCache. Layout: [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim]
srcKVCacheBuffer – [in] The KVCache buffer to be saved. Layout: [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim]
batchIdx – [in] The batch index of the KVCache to be saved
stream – [in] The CUDA stream to be used
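
The save direction can be sketched the same way on the CPU. For a fixed layer, K/V plane, and head, the first `sequenceLength` tokens are contiguous in both layouts, so one contiguous copy per (layer, K/V, head) triple is enough. As before, this is an illustrative sketch and not the library's implementation: `saveKVCacheReference` is a hypothetical name, `float` elements and `std::vector` storage are assumptions, and the explicit dimension arguments stand in for whatever shape information `rt::Tensor` carries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: extract batch slot `batchIdx` from a buffer laid out as
// [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim]
// into a tensor laid out as
// [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim].
void saveKVCacheReference(std::vector<float>& dstTensor,
                          std::vector<float> const& srcBuffer,
                          int32_t numDecoderLayers, int32_t maxBatchSize,
                          int32_t numKVHeads, int32_t maxSequenceLength,
                          int32_t sequenceLength, int32_t headDim,
                          int32_t batchIdx) {
    assert(batchIdx >= 0 && batchIdx < maxBatchSize);
    assert(sequenceLength >= 0 && sequenceLength <= maxSequenceLength);

    // Elements per (layer, kv, head) row in the source and destination.
    std::size_t const srcRowElems =
        static_cast<std::size_t>(maxSequenceLength) * headDim;
    std::size_t const dstRowElems =
        static_cast<std::size_t>(sequenceLength) * headDim;

    for (int32_t layer = 0; layer < numDecoderLayers; ++layer)
        for (int32_t kv = 0; kv < 2; ++kv)  // 0 = keys, 1 = values (assumed order)
            for (int32_t head = 0; head < numKVHeads; ++head) {
                std::size_t srcRow = static_cast<std::size_t>(layer);
                srcRow = srcRow * maxBatchSize + batchIdx;
                srcRow = srcRow * 2 + kv;
                srcRow = (srcRow * numKVHeads + head) * srcRowElems;

                std::size_t dstRow = static_cast<std::size_t>(layer);
                dstRow = dstRow * 2 + kv;
                dstRow = (dstRow * numKVHeads + head) * dstRowElems;

                // Only the first `sequenceLength` tokens of the row are saved.
                std::copy_n(srcBuffer.begin() + static_cast<std::ptrdiff_t>(srcRow),
                            dstRowElems,
                            dstTensor.begin() + static_cast<std::ptrdiff_t>(dstRow));
            }
}
```

Sizing the destination by `sequenceLength` instead of `maxSequenceLength` keeps the saved tensor as small as the cached prefix requires.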