Embedding Kernels
- void trt_edgellm::kernel::embeddingLookup(
    rt::Tensor const &inputIds,
    rt::Tensor const &embeddingTable,
    rt::Tensor &output,
    cudaStream_t stream)
Standard embedding lookup kernel: for each token ID, copies the corresponding row of embeddingTable into output.
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
embeddingTable – [in] Embedding table with shape [vocabSize, hiddenSize]
output – [out] Hidden states with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
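The lookup semantics can be pinned down with a small host-side reference. This is a minimal sketch, not the trt_edgellm implementation: the name embeddingLookupRef is hypothetical, tensors are modeled as flat row-major std::vector buffers instead of rt::Tensor, and throwing on out-of-range token IDs is an assumption (the documentation above does not specify that behavior).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference for embeddingLookup (not the CUDA kernel).
// Flat row-major buffers:
//   inputIds       [batchSize * seqLen]
//   embeddingTable [vocabSize * hiddenSize]
//   return value   [batchSize * seqLen * hiddenSize]
std::vector<float> embeddingLookupRef(std::vector<int32_t> const& inputIds,
                                      std::vector<float> const& embeddingTable,
                                      std::size_t hiddenSize) {
    if (hiddenSize == 0) throw std::invalid_argument("hiddenSize must be positive");
    std::size_t const vocabSize = embeddingTable.size() / hiddenSize;
    std::vector<float> output(inputIds.size() * hiddenSize);
    for (std::size_t t = 0; t < inputIds.size(); ++t) {
        int32_t const id = inputIds[t];
        // ASSUMPTION: invalid IDs throw; the real kernel's behavior is undocumented here.
        if (id < 0 || static_cast<std::size_t>(id) >= vocabSize) {
            throw std::out_of_range("token id outside [0, vocabSize)");
        }
        // Copy row `id` of the table into output position t (t = b * seqLen + s).
        for (std::size_t h = 0; h < hiddenSize; ++h) {
            output[t * hiddenSize + h] = embeddingTable[static_cast<std::size_t>(id) * hiddenSize + h];
        }
    }
    return output;
}
```

Because [batchSize, seqLen] flattens row-major to a single token index, the batch and sequence loops collapse into one.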
- void trt_edgellm::kernel::embeddingLookupWithImageInsertion(
    rt::Tensor const &inputIds,
    rt::Tensor const &embeddingTable,
    rt::Tensor const &imageEmbeds,
    rt::Tensor &output,
    cudaStream_t stream)
Embedding lookup that inserts image embeddings, following the PromptTuningEmbedding logic.
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
embeddingTable – [in] Embedding table with shape [vocabSize, hiddenSize]
imageEmbeds – [in] Image embeddings with shape [imageTokenLen, hiddenSize]
output – [out] Hidden states with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
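A minimal host-side sketch of the insertion logic follows. It assumes the PromptTuningEmbedding convention, which is also the mapping documented for assembleDeepstackEmbedding below: IDs below vocabSize read embeddingTable, and IDs >= vocabSize read row (tokenId - vocabSize) of imageEmbeds. The function name, flat-vector layout, and error handling are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference for embeddingLookupWithImageInsertion.
// ASSUMPTION: PromptTuningEmbedding convention, i.e. an ID >= vocabSize
// selects row (id - vocabSize) of imageEmbeds.
std::vector<float> embeddingLookupWithImageInsertionRef(
    std::vector<int32_t> const& inputIds,      // [batchSize * seqLen]
    std::vector<float> const& embeddingTable,  // [vocabSize * hiddenSize]
    std::vector<float> const& imageEmbeds,     // [imageTokenLen * hiddenSize]
    std::size_t hiddenSize) {
    if (hiddenSize == 0) throw std::invalid_argument("hiddenSize must be positive");
    std::size_t const vocabSize = embeddingTable.size() / hiddenSize;
    std::size_t const imageTokenLen = imageEmbeds.size() / hiddenSize;
    std::vector<float> output(inputIds.size() * hiddenSize);
    for (std::size_t t = 0; t < inputIds.size(); ++t) {
        if (inputIds[t] < 0) throw std::out_of_range("negative token id");
        std::size_t const id = static_cast<std::size_t>(inputIds[t]);
        float const* src = nullptr;
        if (id < vocabSize) {
            src = &embeddingTable[id * hiddenSize];             // text token
        } else if (id - vocabSize < imageTokenLen) {
            src = &imageEmbeds[(id - vocabSize) * hiddenSize];  // image token
        } else {
            throw std::out_of_range("image token index >= imageTokenLen");
        }
        for (std::size_t h = 0; h < hiddenSize; ++h) output[t * hiddenSize + h] = src[h];
    }
    return output;
}
```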
- void trt_edgellm::kernel::assembleDeepstackEmbedding(
    rt::Tensor const &inputIds,
    rt::Tensor const &deepstackFeatures,
    int32_t vocabSize,
    rt::Tensor &deepstackEmbeds,
    cudaStream_t stream)
Assemble deepstack embeddings by extracting image token embeddings from deepstack features.
For each input token ID, this function either copies an image-token embedding from the provided deepstack features or writes zeros. Image tokens are identified by token IDs >= vocabSize; regular text tokens (IDs < vocabSize) are assigned zero embeddings. This is typically used in multimodal models where deepstack visual features must be combined with text embeddings.
Token ID Mapping:
Token ID < vocabSize: Zero embedding (text tokens are handled separately)
Token ID >= vocabSize: Extract from deepstackFeatures at index (tokenId - vocabSize)
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
deepstackFeatures – [in] Deepstack image features with shape [numImageTokens, hiddenSize]
vocabSize – [in] Vocabulary size threshold for distinguishing text vs image tokens
deepstackEmbeds – [out] Output embeddings with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
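The token ID mapping translates directly into a host-side reference. This sketch uses a hypothetical name (assembleDeepstackEmbeddingRef) and flat row-major buffers in place of rt::Tensor; the bounds check on the image row is an added assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical CPU reference for assembleDeepstackEmbedding (not the CUDA kernel).
// Flat row-major buffers:
//   inputIds          [batchSize * seqLen]
//   deepstackFeatures [numImageTokens * hiddenSize]
//   return value      [batchSize * seqLen * hiddenSize]
std::vector<float> assembleDeepstackEmbeddingRef(std::vector<int32_t> const& inputIds,
                                                 std::vector<float> const& deepstackFeatures,
                                                 int32_t vocabSize, std::size_t hiddenSize) {
    if (hiddenSize == 0) throw std::invalid_argument("hiddenSize must be positive");
    std::size_t const numImageTokens = deepstackFeatures.size() / hiddenSize;
    // Start from zeros: text tokens (id < vocabSize) keep a zero embedding.
    std::vector<float> deepstackEmbeds(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t t = 0; t < inputIds.size(); ++t) {
        int32_t const id = inputIds[t];
        if (id < vocabSize) continue;
        // Image token: copy row (id - vocabSize) of deepstackFeatures.
        std::size_t const row = static_cast<std::size_t>(id - vocabSize);
        if (row >= numImageTokens) throw std::out_of_range("image token index >= numImageTokens");
        for (std::size_t h = 0; h < hiddenSize; ++h) {
            deepstackEmbeds[t * hiddenSize + h] = deepstackFeatures[row * hiddenSize + h];
        }
    }
    return deepstackEmbeds;
}
```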
- void trt_edgellm::kernel::embeddingLookupMultimodal(
    rt::Tensor const &inputIds,
    rt::Tensor const &embeddingTable,
    rt::OptionalInputTensor multimodalIndices,
    std::optional<int32_t> imageTokenId,
    rt::OptionalInputTensor imageEmbeds,
    std::optional<int32_t> audioTokenId,
    rt::OptionalInputTensor audioEmbeds,
    rt::Tensor &output,
    cudaStream_t stream)
Embedding lookup with optional image and audio embeddings for multimodal models.
This kernel handles up to three types of tokens:
Normal text tokens (0 <= tokenId < vocabSize): lookup from embeddingTable
Image tokens (tokenId == imageTokenId): lookup from imageEmbeds using multimodalIndices (optional)
Audio tokens (tokenId == audioTokenId): lookup from audioEmbeds using multimodalIndices (optional)
multimodalIndices provides a pre-computed index into imageEmbeds or audioEmbeds for each position. For text tokens, the multimodalIndices value is not used. To indicate that a modality is present, both its token ID and its embedding tensor must be provided.
Note
audioTokenId and imageTokenId are allowed to be smaller than vocabSize, as in the case of Qwen3.
Note
imageEmbeds and audioEmbeds should store their rows in the order referenced by multimodalIndices.
Note
When a modality is not needed, pass std::nullopt for both its token ID and its embeddings.
Note
multimodalIndices can be std::nullopt only when both imageEmbeds and audioEmbeds are std::nullopt.
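The documentation says only that multimodalIndices is pre-computed. One plausible way to build it on the host is a running counter per modality. This is an assumption, not the library's method: the helper buildMultimodalIndices is hypothetical, and it numbers image and audio tokens independently in row-major (batch, then sequence) order across the whole batch, which is consistent with the [totalImageTokens, hiddenSize] and [totalAudioTokens, hiddenSize] embedding layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical host-side helper; not part of trt_edgellm.
// ASSUMPTION: image and audio rows are numbered independently, in row-major
// (batch, then sequence) order across the whole batch.
std::vector<int32_t> buildMultimodalIndices(std::vector<int32_t> const& inputIds,  // [batchSize * seqLen]
                                            std::optional<int32_t> imageTokenId,
                                            std::optional<int32_t> audioTokenId) {
    std::vector<int32_t> indices(inputIds.size(), 0);  // text positions keep 0; the kernel ignores them
    int32_t nextImage = 0;  // next row of imageEmbeds
    int32_t nextAudio = 0;  // next row of audioEmbeds
    for (std::size_t t = 0; t < inputIds.size(); ++t) {
        if (imageTokenId && inputIds[t] == *imageTokenId) {
            indices[t] = nextImage++;
        } else if (audioTokenId && inputIds[t] == *audioTokenId) {
            indices[t] = nextAudio++;
        }
    }
    return indices;
}
```

Under this scheme, imageEmbeds and audioEmbeds are simply the per-modality encoder outputs concatenated in the same row-major order.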
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
embeddingTable – [in] Text embedding table with shape [vocabSize, hiddenSize]
multimodalIndices – [in] Pre-computed indices into imageEmbeds/audioEmbeds with shape [batchSize, seqLen], or std::nullopt if no image or audio inputs are provided
imageTokenId – [in] Special token ID for image (e.g., 151655 in Qwen3), or std::nullopt if no image
imageEmbeds – [in] Image embeddings with shape [totalImageTokens, hiddenSize], or std::nullopt if no image
audioTokenId – [in] Special token ID for audio (e.g., 151675 in Qwen3), or std::nullopt if no audio
audioEmbeds – [in] Audio embeddings with shape [totalAudioTokens, hiddenSize], or std::nullopt if no audio
output – [out] Hidden states with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
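The per-position dispatch can be sketched as a host-side reference. Beyond what is documented, this sketch assumes that the special-token checks run before the text-table lookup (so a Qwen3-style imageTokenId below vocabSize still selects imageEmbeds) and that invalid indices throw; the name embeddingLookupMultimodalRef and the flat-buffer layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

using Buffer = std::vector<float>;  // flat row-major [rows * hiddenSize]

// Copy row `row` of src into row `dstRow` of dst, with bounds checks.
static void copyRow(Buffer const& src, int64_t row, std::size_t hiddenSize, Buffer& dst, std::size_t dstRow) {
    if (row < 0 || static_cast<std::size_t>(row) >= src.size() / hiddenSize) {
        throw std::out_of_range("embedding row out of range");
    }
    std::size_t const r = static_cast<std::size_t>(row);
    for (std::size_t h = 0; h < hiddenSize; ++h) dst[dstRow * hiddenSize + h] = src[r * hiddenSize + h];
}

// Hypothetical CPU reference for embeddingLookupMultimodal (not the CUDA kernel).
// ASSUMPTION: special-token matches are checked before the text-table lookup.
Buffer embeddingLookupMultimodalRef(std::vector<int32_t> const& inputIds,  // [batchSize * seqLen]
                                    Buffer const& embeddingTable,          // [vocabSize * hiddenSize]
                                    std::optional<std::vector<int32_t>> const& multimodalIndices,
                                    std::optional<int32_t> imageTokenId, std::optional<Buffer> const& imageEmbeds,
                                    std::optional<int32_t> audioTokenId, std::optional<Buffer> const& audioEmbeds,
                                    std::size_t hiddenSize) {
    if (hiddenSize == 0) throw std::invalid_argument("hiddenSize must be positive");
    // Documented rule: indices may be omitted only when no image/audio embeddings are passed.
    if ((imageEmbeds || audioEmbeds) && !multimodalIndices) {
        throw std::invalid_argument("multimodalIndices required with image/audio embeddings");
    }
    if (multimodalIndices && multimodalIndices->size() != inputIds.size()) {
        throw std::invalid_argument("multimodalIndices must match inputIds in size");
    }
    // A modality is active only when both its token ID and its embeddings are given.
    bool const hasImage = imageTokenId.has_value() && imageEmbeds.has_value();
    bool const hasAudio = audioTokenId.has_value() && audioEmbeds.has_value();

    Buffer output(inputIds.size() * hiddenSize);
    for (std::size_t t = 0; t < inputIds.size(); ++t) {
        int32_t const id = inputIds[t];
        if (hasImage && id == *imageTokenId) {
            copyRow(*imageEmbeds, (*multimodalIndices)[t], hiddenSize, output, t);
        } else if (hasAudio && id == *audioTokenId) {
            copyRow(*audioEmbeds, (*multimodalIndices)[t], hiddenSize, output, t);
        } else {
            copyRow(embeddingTable, id, hiddenSize, output, t);  // text token
        }
    }
    return output;
}
```

Checking the special tokens first is what allows token IDs smaller than vocabSize to act as image or audio markers; with the opposite order, those positions would silently read text embeddings.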