Gemma4 Embedding Preprocessor

class Gemma4EmbeddingPreprocessor

Runtime preprocessor for Gemma4 E-model per-layer embeddings (PLE).

Loads the token-identity PLE table from ple_embedding.safetensors, gathers one [batch, seq_len, ple_hidden_size] tensor per decoder layer from token IDs, and binds those tensors as ple_token_embeds_{layer_idx} engine inputs.
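The gather step described above can be sketched on the host as follows. This is an illustration only, not the class's implementation: the `PleTable` struct, the `gatherPle` function, and the row-major `[vocabSize, numLayers, pleHiddenSize]` table layout are assumptions made for the example, and the real preprocessor operates on device tensors. Only the `ple_token_embeds_{layer_idx}` naming and the per-layer `[batch, seq_len, ple_hidden_size]` output shape come from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for the PLE table.
// Assumed layout (not confirmed by the source): row-major
// [vocabSize, numLayers, pleHiddenSize].
struct PleTable
{
    int64_t vocabSize;
    int64_t numLayers;
    int64_t pleHiddenSize;
    std::vector<float> data;
};

// Gathers one flattened [batch * seqLen, pleHiddenSize] buffer per decoder
// layer, keyed by the engine input name "ple_token_embeds_{layer_idx}".
// tokenIds is the flattened, row-major [batch, seqLen] token-id tensor.
std::map<std::string, std::vector<float>> gatherPle(
    PleTable const& table, std::vector<int32_t> const& tokenIds)
{
    assert(table.data.size()
        == static_cast<std::size_t>(table.vocabSize * table.numLayers * table.pleHiddenSize));

    std::map<std::string, std::vector<float>> outputs;
    for (int64_t layer = 0; layer < table.numLayers; ++layer)
    {
        std::vector<float> out(tokenIds.size() * static_cast<std::size_t>(table.pleHiddenSize));
        for (std::size_t t = 0; t < tokenIds.size(); ++t)
        {
            int64_t const id = tokenIds[t];
            if (id < 0 || id >= table.vocabSize)
            {
                throw std::out_of_range("token id outside PLE table");
            }
            // Row for (token id, layer) in the assumed table layout.
            float const* src = table.data.data() + (id * table.numLayers + layer) * table.pleHiddenSize;
            std::copy(src, src + table.pleHiddenSize,
                out.begin() + static_cast<std::ptrdiff_t>(t) * table.pleHiddenSize);
        }
        outputs.emplace("ple_token_embeds_" + std::to_string(layer), std::move(out));
    }
    return outputs;
}
```

Each returned buffer corresponds to one decoder layer, so an engine with N decoder layers would receive N such inputs per call.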

Public Functions

Gemma4EmbeddingPreprocessor(
std::filesystem::path const &engineDir,
LLMEngineConfig const &config,
int32_t maxBatchSize,
int32_t maxSeqLen,
TensorMap &tensorMap,
cudaStream_t stream
)

void embed(Tensor const &tokenIds, cudaStream_t stream)

Gather PLE tensors for the current token-id tensor shape.

void reshapeOutputs(int64_t batchSize, int64_t seqLen)

Reshape already-bound output tensors for a CUDA-graph capture shape.
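A CUDA graph records the buffer addresses used during capture, so a reshape for a capture shape has to change only the logical dimensions and leave the underlying allocation in place. The sketch below illustrates that idea with a hypothetical `BoundTensor` type backed by host memory. It is not the library's `Tensor` class, and sizing the storage once for `maxBatchSize * maxSeqLen` is an assumption suggested by the constructor parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical tensor with fixed storage sized once for the maximum shape.
// The std::vector stands in for a device allocation.
struct BoundTensor
{
    std::vector<float> storage;
    int64_t batch;
    int64_t seqLen;
    int64_t hidden;

    BoundTensor(int64_t maxBatch, int64_t maxSeqLen, int64_t hiddenSize)
        : storage(static_cast<std::size_t>(maxBatch * maxSeqLen * hiddenSize))
        , batch(maxBatch)
        , seqLen(maxSeqLen)
        , hidden(hiddenSize)
    {
        assert(maxBatch > 0 && maxSeqLen > 0 && hiddenSize > 0);
    }

    // Updates only the logical shape. The data pointer is unchanged, so any
    // binding that refers to it stays valid.
    void reshape(int64_t newBatch, int64_t newSeqLen)
    {
        if (newBatch <= 0 || newSeqLen <= 0
            || static_cast<std::size_t>(newBatch * newSeqLen * hidden) > storage.size())
        {
            throw std::invalid_argument("shape exceeds preallocated capacity");
        }
        batch = newBatch;
        seqLen = newSeqLen;
    }

    int64_t numElements() const
    {
        return batch * seqLen * hidden;
    }
};
```

Rejecting shapes larger than the preallocated capacity, instead of growing the buffer, keeps the address stable across captures.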