Gemma4 Embedding Preprocessor

class Gemma4EmbeddingPreprocessor

Runtime preprocessor for Gemma4 E-model per-layer embeddings (PLE).

Loads the token-identity PLE table from ple_embedding.safetensors, gathers one [batch, seq_len, ple_hidden_size] tensor per decoder layer from token IDs, and binds those tensors as ple_token_embeds_{layer_idx} engine inputs.
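The gather step can be illustrated with a minimal host-side sketch. This is not the class's implementation: the `PleTable` struct, its row-major `[vocab, num_layers, ple_hidden_size]` layout, and the helpers `gatherLayer` and `pleInputName` are hypothetical names chosen for illustration, and the real preprocessor presumably performs the lookup on the GPU. Only the `ple_token_embeds_{layer_idx}` naming pattern and the per-layer output shape come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side PLE table. ASSUMED layout (not confirmed by the
// source): row-major [vocabSize, numLayers, pleHidden].
struct PleTable {
    int64_t vocabSize;
    int64_t numLayers;
    int64_t pleHidden;
    std::vector<float> data;
};

// Gather one layer's embeddings for a flattened [batch * seqLen] list of
// token IDs. The result is a flattened [batch * seqLen, pleHidden] buffer,
// which corresponds to a [batch, seq_len, ple_hidden_size] tensor.
std::vector<float> gatherLayer(PleTable const& table,
                               std::vector<int32_t> const& tokenIds,
                               int64_t layerIdx) {
    assert(layerIdx >= 0 && layerIdx < table.numLayers);
    std::vector<float> out;
    out.reserve(tokenIds.size() * static_cast<std::size_t>(table.pleHidden));
    for (int32_t id : tokenIds) {
        assert(id >= 0 && id < table.vocabSize);
        // Start of the row for (token id, layer) under the assumed layout.
        int64_t const offset =
            (static_cast<int64_t>(id) * table.numLayers + layerIdx) * table.pleHidden;
        out.insert(out.end(),
                   table.data.begin() + offset,
                   table.data.begin() + offset + table.pleHidden);
    }
    return out;
}

// Build the engine input name for a decoder layer, following the
// ple_token_embeds_{layer_idx} pattern from the class description.
std::string pleInputName(int64_t layerIdx) {
    return "ple_token_embeds_" + std::to_string(layerIdx);
}
```

Calling `gatherLayer` once per decoder layer and registering each result under `pleInputName(layerIdx)` mirrors the "one tensor per layer" behavior described above.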

Public Functions

Gemma4EmbeddingPreprocessor(std::filesystem::path const &engineDir, LLMEngineConfig const &config, int32_t maxBatchSize, int32_t maxSeqLen, TensorMap &tensorMap, cudaStream_t stream)

void embed(Tensor const &tokenIds, cudaStream_t stream): Gathers the PLE tensors for the shape of the current token-ID tensor.

void reshapeOutputs(int64_t batchSize, int64_t seqLen): Reshapes the already-bound output tensors to the shape used for a CUDA-graph capture.
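One plausible reading of reshaping "already-bound" tensors is that the storage is allocated once at maximum capacity and only the shape metadata changes, so addresses recorded during CUDA-graph capture stay valid. The sketch below illustrates that idea on the host. `BoundTensor` is a hypothetical stand-in, not the library's `Tensor` type, and the capacity check is an assumption about sensible behavior rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a bound output tensor. Storage is allocated once
// for [maxBatch, maxSeq, hidden]; reshape only edits the dims.
struct BoundTensor {
    std::vector<float> storage;
    int64_t dims[3];

    BoundTensor(int64_t maxBatch, int64_t maxSeq, int64_t hidden)
        : storage(static_cast<std::size_t>(maxBatch * maxSeq * hidden)),
          dims{maxBatch, maxSeq, hidden} {}

    // Change the logical [batch, seq] extent without reallocating, so the
    // data pointer seen by earlier bindings is unchanged.
    void reshape(int64_t batch, int64_t seq) {
        if (batch <= 0 || seq <= 0 ||
            static_cast<std::size_t>(batch * seq * dims[2]) > storage.size()) {
            throw std::invalid_argument("shape exceeds preallocated capacity");
        }
        dims[0] = batch;
        dims[1] = seq;
    }

    float const* data() const { return storage.data(); }
};
```

Under this assumption, a decode-step capture shape such as `[batchSize, 1]` reuses the same buffer that a longer prefill shape used, which is what lets a captured graph be replayed without rebinding.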