Gemma4 Embedding Preprocessor
class Gemma4EmbeddingPreprocessor
Runtime preprocessor for Gemma4 E-model per-layer embeddings (PLE).
Loads the token-identity PLE table from ple_embedding.safetensors, gathers one [batch, seq_len, ple_hidden_size] tensor per decoder layer from token IDs, and binds those tensors as ple_token_embeds_{layer_idx} engine inputs.
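The gather step described above can be illustrated with a minimal host-side sketch. It is not the library's implementation: the PleTable struct, the gatherPle function, and the assumed row-major table layout [vocab_size, num_layers, ple_hidden_size] are hypothetical, and the real preprocessor operates on device tensors over a CUDA stream. Only the output shape and the ple_token_embeds_{layer_idx} naming come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for the PLE table.
// ASSUMPTION: row-major layout [vocabSize, numLayers, pleHiddenSize].
struct PleTable {
    int64_t vocabSize;
    int64_t numLayers;
    int64_t pleHiddenSize;
    std::vector<float> data;  // vocabSize * numLayers * pleHiddenSize values
};

// Gather one [batch, seqLen, pleHiddenSize] buffer per decoder layer,
// keyed by the engine input name ple_token_embeds_{layer_idx}.
std::map<std::string, std::vector<float>> gatherPle(
    PleTable const& table, std::vector<int32_t> const& tokenIds,
    int64_t batch, int64_t seqLen)
{
    assert(static_cast<int64_t>(tokenIds.size()) == batch * seqLen);
    std::map<std::string, std::vector<float>> outputs;
    for (int64_t layer = 0; layer < table.numLayers; ++layer) {
        std::vector<float> out(batch * seqLen * table.pleHiddenSize);
        for (int64_t pos = 0; pos < batch * seqLen; ++pos) {
            int64_t const tok = tokenIds[pos];
            assert(tok >= 0 && tok < table.vocabSize);
            // Offset of this token's slice for the current layer.
            int64_t const src =
                (tok * table.numLayers + layer) * table.pleHiddenSize;
            for (int64_t h = 0; h < table.pleHiddenSize; ++h) {
                out[pos * table.pleHiddenSize + h] = table.data[src + h];
            }
        }
        outputs["ple_token_embeds_" + std::to_string(layer)] = std::move(out);
    }
    return outputs;
}
```

Each token ID selects one row of the table, and each layer reads its own slice of that row, so the result depends only on token identity and not on position or context.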
Public Functions
Gemma4EmbeddingPreprocessor(
    std::filesystem::path const &engineDir,
    LLMEngineConfig const &config,
    int32_t maxBatchSize,
    int32_t maxSeqLen,
    TensorMap &tensorMap,
    cudaStream_t stream
)
void embed(Tensor const &tokenIds, cudaStream_t stream)
Gather PLE tensors for the current token-id tensor shape.
void reshapeOutputs(int64_t batchSize, int64_t seqLen)
Reshape already-bound output tensors for a CUDA-graph capture shape.
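One plausible way to reshape an already-bound tensor without rebinding it is to size the storage once for the maximum shape and change only the logical dimensions afterwards. The sketch below illustrates that idea under stated assumptions: BoundTensor, makeBoundTensor, and reshapeInPlace are hypothetical names, the buffer lives on the host rather than the device, and the documentation does not confirm that the real class works this way.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a bound output tensor.
// ASSUMPTION: storage is allocated once for the maximum shape, so the
// data pointer stays stable while the logical shape changes.
struct BoundTensor {
    std::vector<float> storage;
    int64_t batch = 0;
    int64_t seqLen = 0;
    int64_t hidden = 0;
};

// Allocate storage for the largest shape the tensor will ever take.
BoundTensor makeBoundTensor(int64_t maxBatch, int64_t maxSeqLen, int64_t hidden)
{
    BoundTensor t;
    t.storage.resize(static_cast<size_t>(maxBatch * maxSeqLen * hidden));
    t.hidden = hidden;
    return t;
}

// Update the logical shape only; never reallocates.
void reshapeInPlace(BoundTensor& t, int64_t batch, int64_t seqLen)
{
    if (batch < 0 || seqLen < 0 ||
        batch * seqLen * t.hidden > static_cast<int64_t>(t.storage.size())) {
        throw std::invalid_argument("shape exceeds preallocated capacity");
    }
    t.batch = batch;
    t.seqLen = seqLen;
}
```

Keeping the address fixed matters for CUDA graph capture, because a captured graph records the device pointers it was captured with and replays against those same addresses.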