Embedding Kernels

void trt_edgellm::kernel::embeddingLookup(
rt::Tensor const &inputIds,
rt::Tensor const &embeddingTable,
rt::Tensor &output,
cudaStream_t stream
)

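Standard embedding lookup kernel. For each token ID in inputIds, the corresponding row of embeddingTable is written to the matching position in output.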

Parameters:
  • inputIds [in]: Input token IDs with shape [batchSize, seqLen]

  • embeddingTable [in]: Embedding table with shape [vocabSize, hiddenSize]

  • output [out]: Hidden states with shape [batchSize, seqLen, hiddenSize]

  • stream [in]: CUDA stream for execution
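The shapes above imply a plain row gather. As a rough host-side sketch of those semantics (not the library's CUDA implementation), the helper below copies one table row per token. The name embeddingLookupRef, the flattened std::vector layout, and the choice to leave out-of-range IDs as zeros are all assumptions made for illustration; the kernel's behavior for invalid IDs is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of trt_edgellm.
// inputIds: flattened [batchSize * seqLen]
// table:    row-major [vocabSize, hiddenSize]
// returns:  row-major [batchSize * seqLen, hiddenSize]
std::vector<float> embeddingLookupRef(std::vector<int32_t> const& inputIds,
    std::vector<float> const& table, int32_t vocabSize, int32_t hiddenSize)
{
    std::vector<float> output(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        // Assumption: IDs outside [0, vocabSize) are skipped and stay zero.
        if (id < 0 || id >= vocabSize)
        {
            continue;
        }
        for (int32_t h = 0; h < hiddenSize; ++h)
        {
            output[pos * hiddenSize + h] = table[static_cast<std::size_t>(id) * hiddenSize + h];
        }
    }
    return output;
}
```

A reference like this can be handy for checking kernel output on small inputs after copying the device tensor back to the host.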

void trt_edgellm::kernel::embeddingLookupWithImageInsertion(
rt::Tensor const &inputIds,
rt::Tensor const &embeddingTable,
rt::Tensor const &imageEmbeds,
rt::Tensor &output,
cudaStream_t stream
)

Embedding lookup with image embedding insertion following PromptTuningEmbedding logic.

Parameters:
  • inputIds [in]: Input token IDs with shape [batchSize, seqLen]

  • embeddingTable [in]: Embedding table with shape [vocabSize, hiddenSize]

  • imageEmbeds [in]: Image embeddings with shape [imageTokenLen, hiddenSize]

  • output [out]: Hidden states with shape [batchSize, seqLen, hiddenSize]

  • stream [in]: CUDA stream for execution
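The description does not spell out how image rows are selected. The host-side sketch below assumes the usual prompt-tuning convention: IDs below vocabSize read from embeddingTable, and IDs at or above vocabSize read row (tokenId - vocabSize) of imageEmbeds. That mapping, the helper name lookupWithImageInsertionRef, and the zero fill for out-of-range IDs are assumptions for illustration, not confirmed kernel behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of trt_edgellm.
// Assumed mapping: id < vocabSize -> table[id]; id >= vocabSize -> imageEmbeds[id - vocabSize].
std::vector<float> lookupWithImageInsertionRef(std::vector<int32_t> const& inputIds,
    std::vector<float> const& table, int32_t vocabSize,
    std::vector<float> const& imageEmbeds, int32_t imageTokenLen, int32_t hiddenSize)
{
    std::vector<float> output(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        float const* src = nullptr;
        if (id >= 0 && id < vocabSize)
        {
            src = table.data() + static_cast<std::size_t>(id) * hiddenSize;
        }
        else if (id >= vocabSize && id - vocabSize < imageTokenLen)
        {
            src = imageEmbeds.data() + static_cast<std::size_t>(id - vocabSize) * hiddenSize;
        }
        // Assumption: IDs that match neither range leave the output row at zero.
        if (src != nullptr)
        {
            std::copy(src, src + hiddenSize, output.data() + pos * hiddenSize);
        }
    }
    return output;
}
```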

void trt_edgellm::kernel::assembleDeepstackEmbedding(
rt::Tensor const &inputIds,
rt::Tensor const &deepstackFeatures,
int32_t vocabSize,
rt::Tensor &deepstackEmbeds,
cudaStream_t stream
)

Assemble deepstack embeddings by extracting image token embeddings from deepstack features.

This function scans the input token IDs and extracts embeddings for image tokens from the provided deepstack features. Image tokens are identified by token IDs >= vocabSize. Regular text tokens (IDs < vocabSize) are assigned zero embeddings. This is typically used in multimodal models where deepstack visual features need to be combined with text embeddings.

Token ID Mapping:

  • Token ID < vocabSize: Zero embedding (text tokens are handled separately)

  • Token ID >= vocabSize: Extract from deepstackFeatures at index (tokenId - vocabSize)

Parameters:
  • inputIds [in]: Input token IDs with shape [batchSize, seqLen]

  • deepstackFeatures [in]: Deepstack image features with shape [numImageTokens, hiddenSize]

  • vocabSize [in]: Vocabulary size threshold for distinguishing text tokens from image tokens

  • deepstackEmbeds [out]: Output embeddings with shape [batchSize, seqLen, hiddenSize]

  • stream [in]: CUDA stream for execution
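The token ID mapping above can be illustrated with a small host-side sketch. It follows the documented rule (zeros for IDs < vocabSize, row tokenId - vocabSize of deepstackFeatures otherwise). The helper name assembleDeepstackRef and the flattened std::vector layout are hypothetical, and leaving zeros for an image index past numImageTokens is an assumption, since that case is not described.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of trt_edgellm.
// features: row-major [numImageTokens, hiddenSize]
std::vector<float> assembleDeepstackRef(std::vector<int32_t> const& inputIds,
    std::vector<float> const& features, int32_t numImageTokens, int32_t vocabSize, int32_t hiddenSize)
{
    // Text positions keep this zero initialization.
    std::vector<float> embeds(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const imageIdx = inputIds[pos] - vocabSize;
        // Assumption: an image index past numImageTokens is ignored rather than read.
        if (imageIdx < 0 || imageIdx >= numImageTokens)
        {
            continue;
        }
        float const* src = features.data() + static_cast<std::size_t>(imageIdx) * hiddenSize;
        std::copy(src, src + hiddenSize, embeds.data() + pos * hiddenSize);
    }
    return embeds;
}
```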

void trt_edgellm::kernel::embeddingLookupMultimodal(
rt::Tensor const &inputIds,
rt::Tensor const &embeddingTable,
rt::OptionalInputTensor multimodalIndices,
std::optional<int32_t> imageTokenId,
rt::OptionalInputTensor imageEmbeds,
std::optional<int32_t> audioTokenId,
rt::OptionalInputTensor audioEmbeds,
rt::Tensor &output,
cudaStream_t stream
)

Embedding lookup with optional image and audio embeddings for multimodal models.

This kernel handles up to three types of tokens:

  • Normal text tokens (0 <= tokenId < vocabSize): lookup from embeddingTable

  • Image tokens (tokenId == imageTokenId): lookup from imageEmbeds using multimodalIndices (optional)

  • Audio tokens (tokenId == audioTokenId): lookup from audioEmbeds using multimodalIndices (optional)

multimodalIndices provides precomputed indices into imageEmbeds/audioEmbeds for each position. For text tokens, the multimodalIndices value is not used. To indicate that a modality is present, both its token ID and the corresponding embedding tensor must be provided.

Note

audioTokenId and imageTokenId are allowed to be smaller than vocabSize, as in the case of Qwen3.

Note

imageEmbeds and audioEmbeds should contain rows in the order referenced by multimodalIndices.

Note

When a modality is not needed, pass std::nullopt for both its tokenId and embeds.

Note

multimodalIndices can be std::nullopt only when both imageEmbeds and audioEmbeds are std::nullopt.
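The notes above amount to a small contract on the optional arguments. One way to sketch it as a caller-side pre-check is shown below. The function modalityArgsAreConsistent is hypothetical (not a library API), and plain bool flags stand in for whether each optional tensor was supplied.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical pre-check mirroring the notes; not part of trt_edgellm.
bool modalityArgsAreConsistent(std::optional<int32_t> imageTokenId, bool hasImageEmbeds,
    std::optional<int32_t> audioTokenId, bool hasAudioEmbeds, bool hasMultimodalIndices)
{
    // A modality needs both its token ID and its embeddings, or neither.
    if (imageTokenId.has_value() != hasImageEmbeds)
    {
        return false;
    }
    if (audioTokenId.has_value() != hasAudioEmbeds)
    {
        return false;
    }
    // Indices may be omitted only when no modality embeddings are supplied.
    if ((hasImageEmbeds || hasAudioEmbeds) && !hasMultimodalIndices)
    {
        return false;
    }
    return true;
}
```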

Parameters:
  • inputIds [in]: Input token IDs with shape [batchSize, seqLen]

  • embeddingTable [in]: Text embedding table with shape [vocabSize, hiddenSize]

  • multimodalIndices [in]: Precomputed indices into the image/audio embeddings, with shape [batchSize, seqLen]; can be std::nullopt if no image or audio inputs are provided

  • imageTokenId [in]: Special token ID for image (e.g., 151655 in Qwen3), or std::nullopt if no image

  • imageEmbeds [in]: Image embeddings with shape [totalImageTokens, hiddenSize], or std::nullopt if no image

  • audioTokenId [in]: Special token ID for audio (e.g., 151675 in Qwen3), or std::nullopt if no audio

  • audioEmbeds [in]: Audio embeddings with shape [totalAudioTokens, hiddenSize], or std::nullopt if no audio

  • output [out]: Hidden states with shape [batchSize, seqLen, hiddenSize]

  • stream [in]: CUDA stream for execution
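To make the three token types concrete, here is a minimal host-side sketch of the dispatch. It checks the modality token IDs before the text range, which is an assumption inferred from the note that those IDs may be smaller than vocabSize. The name lookupMultimodalRef, the std::vector inputs (empty vectors standing in for absent optional tensors), and the zero fill for unmatched IDs are hypothetical choices, not the kernel's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical CPU reference, not part of trt_edgellm.
// Empty vectors stand in for optional tensors passed as std::nullopt.
std::vector<float> lookupMultimodalRef(std::vector<int32_t> const& inputIds,
    std::vector<float> const& table, int32_t vocabSize,
    std::vector<int32_t> const& multimodalIndices,
    std::optional<int32_t> imageTokenId, std::vector<float> const& imageEmbeds,
    std::optional<int32_t> audioTokenId, std::vector<float> const& audioEmbeds,
    int32_t hiddenSize)
{
    std::vector<float> output(inputIds.size() * hiddenSize, 0.0f);
    for (std::size_t pos = 0; pos < inputIds.size(); ++pos)
    {
        int32_t const id = inputIds[pos];
        float const* src = nullptr;
        // Assumption: modality IDs take precedence because they may fall below vocabSize.
        if (imageTokenId.has_value() && id == *imageTokenId)
        {
            src = imageEmbeds.data() + static_cast<std::size_t>(multimodalIndices[pos]) * hiddenSize;
        }
        else if (audioTokenId.has_value() && id == *audioTokenId)
        {
            src = audioEmbeds.data() + static_cast<std::size_t>(multimodalIndices[pos]) * hiddenSize;
        }
        else if (id >= 0 && id < vocabSize)
        {
            src = table.data() + static_cast<std::size_t>(id) * hiddenSize;
        }
        // Assumption: IDs that match nothing leave the output row at zero.
        if (src != nullptr)
        {
            std::copy(src, src + hiddenSize, output.data() + pos * hiddenSize);
        }
    }
    return output;
}
```

The sketch trusts multimodalIndices to be in range for modality positions; a production caller would likely validate that before launching the kernel.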