Embedding Kernels
- void trt_edgellm::kernel::embeddingLookup(
- rt::Tensor const &inputIds,
- rt::Tensor const &embeddingTable,
- rt::Tensor &output,
- cudaStream_t stream)
Standard embedding lookup kernel.
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
embeddingTable – [in] Embedding table with shape [vocabSize, hiddenSize]
output – [out] Hidden states with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
- Throws:
std::runtime_error – if tensor shapes or data types are invalid
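The lookup semantics described above can be illustrated with a small host-side reference. This is a minimal sketch, not the CUDA kernel: `refEmbeddingLookup` is a hypothetical helper, flat `std::vector` buffers stand in for `rt::Tensor`, `vocabSize` and `hiddenSize` are passed explicitly, and rejecting out-of-range IDs is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side reference for the lookup semantics; not the CUDA kernel.
// inputIds is row-major [batchSize * seqLen]; embeddingTable is row-major
// [vocabSize * hiddenSize]. Returns [batchSize * seqLen * hiddenSize].
std::vector<float> refEmbeddingLookup(std::vector<int32_t> const& inputIds,
    std::vector<float> const& embeddingTable, int32_t vocabSize, int32_t hiddenSize)
{
    if (vocabSize <= 0 || hiddenSize <= 0
        || embeddingTable.size() != static_cast<std::size_t>(vocabSize) * hiddenSize)
    {
        throw std::runtime_error("embeddingTable shape mismatch");
    }
    std::vector<float> output(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        // Assumption: the sketch rejects IDs outside [0, vocabSize).
        if (id < 0 || id >= vocabSize)
        {
            throw std::runtime_error("token ID out of range");
        }
        std::size_t const src = static_cast<std::size_t>(id) * hiddenSize;
        assert(src + hiddenSize <= embeddingTable.size());
        // output[pos, :] = embeddingTable[id, :]
        for (int32_t h = 0; h < hiddenSize; ++h)
        {
            output[pos * hiddenSize + h] = embeddingTable[src + h];
        }
    }
    return output;
}
```

Because batchSize and seqLen only affect how positions are counted, the sketch treats inputIds as a flat list of batchSize * seqLen positions.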
- void trt_edgellm::kernel::embeddingLookupWithImageInsertion(
- rt::Tensor const &inputIds,
- rt::Tensor const &embeddingTable,
- rt::Tensor const &imageEmbeds,
- rt::Tensor &output,
- cudaStream_t stream)
Embedding lookup with image embedding insertion following PromptTuningEmbedding logic.
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
embeddingTable – [in] Embedding table with shape [vocabSize, hiddenSize]
imageEmbeds – [in] Image embeddings with shape [imageTokenLen, hiddenSize]
output – [out] Hidden states with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
- Throws:
std::runtime_error – if tensor shapes or data types are invalid
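The insertion rule can be sketched on the host as follows. This assumes the PromptTuningEmbedding convention in which token IDs at or above vocabSize address row tokenId - vocabSize of imageEmbeds; the description above does not spell this out, so treat it as an assumption. `refEmbeddingLookupWithImageInsertion` is a hypothetical helper that uses flat `std::vector` buffers in place of `rt::Tensor`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side reference; not the CUDA kernel.
// Assumption: IDs >= vocabSize select imageEmbeds[tokenId - vocabSize].
std::vector<float> refEmbeddingLookupWithImageInsertion(std::vector<int32_t> const& inputIds,
    std::vector<float> const& embeddingTable, std::vector<float> const& imageEmbeds,
    int32_t vocabSize, int32_t hiddenSize)
{
    if (vocabSize <= 0 || hiddenSize <= 0
        || embeddingTable.size() != static_cast<std::size_t>(vocabSize) * hiddenSize
        || imageEmbeds.size() % hiddenSize != 0)
    {
        throw std::runtime_error("invalid tensor shapes");
    }
    std::size_t const imageTokenLen = imageEmbeds.size() / hiddenSize;
    std::vector<float> output(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        if (id < 0)
        {
            throw std::runtime_error("negative token ID");
        }
        // Text tokens read from the embedding table by default.
        std::vector<float> const* src = &embeddingTable;
        std::size_t row = static_cast<std::size_t>(id);
        if (id >= vocabSize)
        {
            // Image tokens read from imageEmbeds, offset by vocabSize.
            src = &imageEmbeds;
            row = static_cast<std::size_t>(id - vocabSize);
            if (row >= imageTokenLen)
            {
                throw std::runtime_error("image token index out of range");
            }
        }
        assert((row + 1) * hiddenSize <= src->size());
        for (int32_t h = 0; h < hiddenSize; ++h)
        {
            output[pos * hiddenSize + h] = (*src)[row * hiddenSize + h];
        }
    }
    return output;
}
```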
- void trt_edgellm::kernel::assembleDeepstackEmbedding(
- rt::Tensor const &inputIds,
- rt::Tensor const &deepstackFeatures,
- int32_t vocabSize,
- rt::Tensor &deepstackEmbeds,
- cudaStream_t stream,
- int32_t imageTokenId = 0,
- rt::OptionalInputTensor multimodalIndices = std::nullopt)
Assemble deepstack embeddings by extracting image token embeddings from deepstack features.
This function processes input token IDs and selectively extracts embeddings for image tokens from the provided deepstack features. Image tokens are identified in two ways:
Legacy: token IDs >= vocabSize (Qwen2.5-VL where image tokens start at vocabSize)
Explicit: token ID == imageTokenId (Qwen3-Omni where image tokens are within vocab)
When multimodalIndices is provided, it is used to index into deepstackFeatures (required for Qwen3-Omni, where all image tokens share the same ID). Otherwise, the index falls back to tokenId - vocabSize.
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
deepstackFeatures – [in] Deepstack image features with shape [numImageTokens, hiddenSize]
vocabSize – [in] Vocabulary size (legacy threshold for image token detection)
imageTokenId – [in] Explicit image token ID (0 = not set, use legacy >= vocabSize detection)
multimodalIndices – [in] Pre-computed indices for image embeddings with shape [batchSize, seqLen], or std::nullopt to use legacy tokenId - vocabSize indexing
deepstackEmbeds – [out] Output embeddings with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
- Throws:
std::runtime_error – if tensor shapes or data types are invalid
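The two detection modes and the index fallback can be sketched on the host as follows (C++17). `refAssembleDeepstack` is a hypothetical helper, not the CUDA kernel. Two behaviors in it are assumptions that the description above does not confirm: positions that are not image tokens are left as zeros, and a non-zero imageTokenId replaces the legacy threshold test rather than being combined with it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical host-side reference; not the CUDA kernel.
// deepstackFeatures is row-major [numImageTokens * hiddenSize].
std::vector<float> refAssembleDeepstack(std::vector<int32_t> const& inputIds,
    std::vector<float> const& deepstackFeatures, int32_t vocabSize, int32_t hiddenSize,
    int32_t imageTokenId = 0,
    std::optional<std::vector<int32_t>> const& multimodalIndices = std::nullopt)
{
    if (hiddenSize <= 0 || deepstackFeatures.size() % hiddenSize != 0)
    {
        throw std::runtime_error("invalid deepstackFeatures shape");
    }
    if (multimodalIndices && multimodalIndices->size() != inputIds.size())
    {
        throw std::runtime_error("multimodalIndices must match inputIds in size");
    }
    std::size_t const numImageTokens = deepstackFeatures.size() / hiddenSize;
    // Assumption: non-image positions stay zero.
    std::vector<float> deepstackEmbeds(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        // Assumption: explicit matching is used when imageTokenId is set (non-zero),
        // otherwise the legacy tokenId >= vocabSize test applies.
        bool const isImage = (imageTokenId != 0) ? (id == imageTokenId) : (id >= vocabSize);
        if (!isImage)
        {
            continue;
        }
        // Pre-computed index if available, else the legacy tokenId - vocabSize offset.
        int64_t const row = multimodalIndices ? static_cast<int64_t>((*multimodalIndices)[pos])
                                              : static_cast<int64_t>(id) - vocabSize;
        if (row < 0 || static_cast<std::size_t>(row) >= numImageTokens)
        {
            throw std::runtime_error("image feature index out of range");
        }
        assert(pos * hiddenSize + hiddenSize <= deepstackEmbeds.size());
        for (int32_t h = 0; h < hiddenSize; ++h)
        {
            deepstackEmbeds[pos * hiddenSize + h]
                = deepstackFeatures[static_cast<std::size_t>(row) * hiddenSize + h];
        }
    }
    return deepstackEmbeds;
}
```

In this sketch, setting imageTokenId to an in-vocabulary ID without supplying multimodalIndices makes tokenId - vocabSize negative and raises an error, which mirrors why the indices are described as required for Qwen3-Omni.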
- void trt_edgellm::kernel::embeddingLookupMultimodal(
- rt::Tensor const &inputIds,
- rt::Tensor const &embeddingTable,
- rt::OptionalInputTensor multimodalIndices,
- std::optional<int32_t> imageTokenId,
- rt::OptionalInputTensor imageEmbeds,
- std::optional<int32_t> audioTokenId,
- rt::OptionalInputTensor audioEmbeds,
- rt::Tensor &output,
- cudaStream_t stream)
Embedding lookup with optional image and audio embeddings for multimodal models.
This kernel handles up to three types of tokens:
Normal text tokens (0 <= tokenId < vocabSize): lookup from embeddingTable
Image tokens (tokenId == imageTokenId): lookup from imageEmbeds using multimodalIndices (optional)
Audio tokens (tokenId == audioTokenId): lookup from audioEmbeds using multimodalIndices (optional)
multimodalIndices provides a pre-computed index into imageEmbeds or audioEmbeds for each position; for text tokens, its value is not used. A modality is treated as present only when both its token ID and its embedding tensor are provided.
Note
audioTokenId and imageTokenId are allowed to be smaller than vocabSize, as in the case of Qwen3.
Note
Embeddings should contain data in the order specified by multimodalIndices.
Note
When a modality is not needed, pass std::nullopt for both its tokenId and embeds.
Note
multimodalIndices can be std::nullopt only when both imageEmbeds and audioEmbeds are std::nullopt.
- Parameters:
inputIds – [in] Input token IDs with shape [batchSize, seqLen]
embeddingTable – [in] Text embedding table with shape [vocabSize, hiddenSize]
multimodalIndices – [in] Pre-computed indices for audio/image embeddings with shape [batchSize, seqLen]; can be std::nullopt if no image/audio inputs are provided
imageTokenId – [in] Special token ID for image (e.g., 151655 in Qwen3), or std::nullopt if no image
imageEmbeds – [in] Image embeddings with shape [totalImageTokens, hiddenSize], or std::nullopt if no image
audioTokenId – [in] Special token ID for audio (e.g., 151675 in Qwen3), or std::nullopt if no audio
audioEmbeds – [in] Audio embeddings with shape [totalAudioTokens, hiddenSize], or std::nullopt if no audio
output – [out] Hidden states with shape [batchSize, seqLen, hiddenSize]
stream – [in] CUDA stream for execution
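The token routing and the argument rules from the notes above can be sketched on the host as follows (C++17). `refEmbeddingLookupMultimodal` is a hypothetical helper, not the CUDA kernel; flat `std::vector` buffers stand in for `rt::Tensor`, and `vocabSize` and `hiddenSize` are passed explicitly. Rejecting IDs that match no category is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

using OptIds = std::optional<std::vector<int32_t>>;
using OptEmbeds = std::optional<std::vector<float>>;

// Hypothetical host-side reference of the routing rules; not the CUDA kernel.
std::vector<float> refEmbeddingLookupMultimodal(std::vector<int32_t> const& inputIds,
    std::vector<float> const& embeddingTable, int32_t vocabSize, int32_t hiddenSize,
    OptIds const& multimodalIndices, std::optional<int32_t> imageTokenId,
    OptEmbeds const& imageEmbeds, std::optional<int32_t> audioTokenId,
    OptEmbeds const& audioEmbeds)
{
    if (vocabSize <= 0 || hiddenSize <= 0
        || embeddingTable.size() != static_cast<std::size_t>(vocabSize) * hiddenSize)
    {
        throw std::runtime_error("embeddingTable shape mismatch");
    }
    // A modality is present only when both its token ID and embeddings are given.
    if (imageTokenId.has_value() != imageEmbeds.has_value()
        || audioTokenId.has_value() != audioEmbeds.has_value())
    {
        throw std::runtime_error("token ID and embeddings must be provided together");
    }
    if ((imageEmbeds || audioEmbeds) && !multimodalIndices)
    {
        throw std::runtime_error("multimodalIndices is required with image or audio embeddings");
    }
    if (multimodalIndices && multimodalIndices->size() != inputIds.size())
    {
        throw std::runtime_error("multimodalIndices must match inputIds in size");
    }
    std::vector<float> output(inputIds.size() * hiddenSize, 0.0f);
    // Copies row `row` of `src` into output position `pos`, with a bounds check.
    auto copyRow = [&](std::vector<float> const& src, int64_t row, std::size_t pos) {
        assert(pos < inputIds.size());
        if (row < 0 || (static_cast<std::size_t>(row) + 1) * hiddenSize > src.size())
        {
            throw std::runtime_error("row index out of range");
        }
        for (int32_t h = 0; h < hiddenSize; ++h)
        {
            output[pos * hiddenSize + h] = src[static_cast<std::size_t>(row) * hiddenSize + h];
        }
    };
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        // Special IDs are tested first because they may be smaller than vocabSize.
        if (imageTokenId && id == *imageTokenId)
        {
            copyRow(*imageEmbeds, (*multimodalIndices)[pos], pos);
        }
        else if (audioTokenId && id == *audioTokenId)
        {
            copyRow(*audioEmbeds, (*multimodalIndices)[pos], pos);
        }
        else if (id >= 0 && id < vocabSize)
        {
            copyRow(embeddingTable, id, pos); // text token; multimodalIndices is ignored
        }
        else
        {
            throw std::runtime_error("token ID out of range"); // assumption of this sketch
        }
    }
    return output;
}
```

Testing the special IDs before the text range is a deliberate choice here: with in-vocabulary image and audio IDs, checking the text range first would route those positions to embeddingTable.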