KV Cache Utils Kernels
- void trt_edgellm::kernel::incrementLengthTensor(rt::Tensor &lengthTensor, int32_t increment, cudaStream_t stream)
- void trt_edgellm::kernel::incrementLengthTensor(rt::Tensor &lengthTensor, rt::Tensor const &newIncrementTensor, cudaStream_t stream)
- void trt_edgellm::kernel::instantiateKVCacheFromTensor(rt::Tensor &dstKVCacheBuffer, rt::Tensor const &srcKVCacheTensor, int32_t batchIdx, cudaStream_t stream)
Instantiate the KVCache from a pre-computed KVCache tensor.
Helper function that instantiates the KVCache from a pre-computed KVCache tensor. It supports KVCache reuse across multiple inference requests to speed up the prefill step.
- Parameters:
dstKVCacheBuffer – [inout] The KVCache buffer to be instantiated. Layout: [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim]
srcKVCacheTensor – [in] The pre-computed KVCache tensor. Layout: [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim]
batchIdx – [in] The batch index of the KVCache to be instantiated
stream – [in] The CUDA stream to be used
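The two layouts above differ only by the extra maxBatchSize axis and by the padded maxSequenceLength axis, so the copy reduces to an index mapping. The following is a minimal CPU reference sketch of that mapping, not the library's CUDA kernel. The names `KVCacheDims`, `bufferOffset`, and `instantiateKVCacheReference` are hypothetical, the element type `float` is an assumption, and the sketch assumes that the source's sequenceLength positions land at positions 0 to sequenceLength - 1 of the destination slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape descriptor for this sketch; not part of the library API.
struct KVCacheDims
{
    int32_t numDecoderLayers;
    int32_t maxBatchSize;
    int32_t numKVHeads;
    int32_t maxSequenceLength;
    int32_t headDim;
};

// Flat row-major offset into a buffer laid out as
// [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim].
inline std::size_t bufferOffset(KVCacheDims const& d, int32_t layer, int32_t batch, int32_t kv, int32_t head,
    int32_t pos, int32_t dim)
{
    std::size_t idx = static_cast<std::size_t>(layer);
    idx = idx * static_cast<std::size_t>(d.maxBatchSize) + static_cast<std::size_t>(batch);
    idx = idx * 2 + static_cast<std::size_t>(kv);
    idx = idx * static_cast<std::size_t>(d.numKVHeads) + static_cast<std::size_t>(head);
    idx = idx * static_cast<std::size_t>(d.maxSequenceLength) + static_cast<std::size_t>(pos);
    idx = idx * static_cast<std::size_t>(d.headDim) + static_cast<std::size_t>(dim);
    return idx;
}

// CPU reference: copy src, laid out as [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim],
// into slot batchIdx of dst. Positions >= sequenceLength in the slot are left untouched
// (an assumption of this sketch).
void instantiateKVCacheReference(std::vector<float>& dst, std::vector<float> const& src, KVCacheDims const& d,
    int32_t sequenceLength, int32_t batchIdx)
{
    assert(batchIdx >= 0 && batchIdx < d.maxBatchSize);
    assert(sequenceLength >= 0 && sequenceLength <= d.maxSequenceLength);

    std::size_t srcIdx = 0; // src is visited in its natural row-major order
    for (int32_t layer = 0; layer < d.numDecoderLayers; ++layer)
        for (int32_t kv = 0; kv < 2; ++kv)
            for (int32_t head = 0; head < d.numKVHeads; ++head)
                for (int32_t pos = 0; pos < sequenceLength; ++pos)
                    for (int32_t dim = 0; dim < d.headDim; ++dim)
                        dst[bufferOffset(d, layer, batchIdx, kv, head, pos, dim)] = src[srcIdx++];
}
```

A reference like this can be useful for checking the device result in a unit test: run the kernel, copy the buffer back to the host, and compare the selected batch slot element by element.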
- void trt_edgellm::kernel::saveKVCacheIntoTensor(rt::Tensor &dstKVCacheTensor, rt::Tensor const &srcKVCacheBuffer, int32_t batchIdx, cudaStream_t stream)
Save the KVCache into a tensor.
Helper function that saves the KVCache into a tensor. It supports KVCache reuse across multiple inference requests to speed up the prefill step. The sequenceLength of dstKVCacheTensor must be saved from the srcKVCacheBuffer.
- Parameters:
dstKVCacheTensor – [out] The KVCache tensor to be saved. Layout: [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim]
srcKVCacheBuffer – [in] The KVCache buffer to be saved. Layout: [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim]
batchIdx – [in] The batch index of the KVCache to be saved
stream – [in] The CUDA stream to be used
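Because the K/V and head axes are adjacent in both layouts, each (layer, K/V, head) slice is one contiguous run of sequenceLength × headDim elements in the tensor and the leading part of a maxSequenceLength × headDim block in the buffer. The sketch below uses that to extract one batch slot with block copies on the CPU. It is an illustration under stated assumptions, not the library's CUDA kernel: `saveKVCacheReference` is a hypothetical name, `float` is an assumed element type, and the sketch assumes the first sequenceLength positions of the slot are the ones saved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: extract slot batchIdx of srcBuffer, laid out as
// [numDecoderLayers, maxBatchSize, 2, numKVHeads, maxSequenceLength, headDim],
// into a new tensor laid out as [numDecoderLayers, 2, numKVHeads, sequenceLength, headDim].
std::vector<float> saveKVCacheReference(std::vector<float> const& srcBuffer, int32_t numDecoderLayers,
    int32_t maxBatchSize, int32_t numKVHeads, int32_t maxSequenceLength, int32_t headDim, int32_t sequenceLength,
    int32_t batchIdx)
{
    assert(batchIdx >= 0 && batchIdx < maxBatchSize);
    assert(sequenceLength >= 0 && sequenceLength <= maxSequenceLength);

    std::size_t const layers = static_cast<std::size_t>(numDecoderLayers);
    std::size_t const batches = static_cast<std::size_t>(maxBatchSize);
    std::size_t const kvHeads = 2 * static_cast<std::size_t>(numKVHeads); // fused K/V and head axes
    std::size_t const srcBlock = static_cast<std::size_t>(maxSequenceLength) * static_cast<std::size_t>(headDim);
    std::size_t const dstBlock = static_cast<std::size_t>(sequenceLength) * static_cast<std::size_t>(headDim);
    assert(srcBuffer.size() == layers * batches * kvHeads * srcBlock);

    std::vector<float> dst(layers * kvHeads * dstBlock);
    for (std::size_t layer = 0; layer < layers; ++layer)
    {
        for (std::size_t kvHead = 0; kvHead < kvHeads; ++kvHead)
        {
            // Index of the [maxSequenceLength, headDim] block inside the batched buffer.
            std::size_t const srcBlockIdx = (layer * batches + static_cast<std::size_t>(batchIdx)) * kvHeads + kvHead;
            // Index of the [sequenceLength, headDim] block inside the compact tensor.
            std::size_t const dstBlockIdx = layer * kvHeads + kvHead;
            // Copy only the leading sequenceLength rows of the padded source block.
            std::copy_n(srcBuffer.data() + srcBlockIdx * srcBlock, dstBlock, dst.data() + dstBlockIdx * dstBlock);
        }
    }
    return dst;
}
```

The same block structure suggests why a device implementation can issue one contiguous copy per (layer, K/V, head) slice rather than one per element, although how the actual kernel organizes its copies is not stated in this reference.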