Step Preparer

class StepPreparer

Prepares per-step sequence metadata (selectTokenIndices, contextLengths).

Extracted from the former LLMEngineRunner::executePrefillStep(), vanillaDecodingStepPrepareInputs(), and vanillaDecodingStepBindTensors() methods. The class holds no state beyond its configuration and a small host scratch buffer used to compute selectTokenIndices.
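The "configuration plus host scratch buffer" design can be sketched on the host as follows. This is a minimal sketch, not the actual implementation: ConfigSketch, its maxBatchSize field, Phase, and SelectIndexSketch are hypothetical stand-ins, and the copy into PipelineIO is omitted. It takes the documented selectTokenIndices rule (prefill = ctxLen-1, decode = 0) literally, as one index per sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for LLMEngineConfig; only the field this sketch needs.
struct ConfigSketch {
    int32_t maxBatchSize;
};

enum class Phase { Prefill, Decode };

// Sketch: nothing survives between calls except a host scratch buffer that is
// sized once in the constructor and reused on every step.
class SelectIndexSketch {
public:
    explicit SelectIndexSketch(ConfigSketch const& config)
        : mConfig(config), mHostScratch(static_cast<std::size_t>(config.maxBatchSize), 0) {}

    // Writes one index per active sequence into the scratch buffer:
    // prefill -> ctxLen - 1, decode -> 0.
    std::vector<int32_t> const& compute(Phase phase,
                                        std::vector<int32_t> const& hostContextLengths,
                                        int32_t batchSize) {
        assert(batchSize >= 0 && batchSize <= mConfig.maxBatchSize);
        assert(hostContextLengths.size() >= static_cast<std::size_t>(batchSize));
        // Stays within the initial capacity, so this never reallocates.
        mHostScratch.resize(static_cast<std::size_t>(batchSize));
        for (std::size_t i = 0; i < mHostScratch.size(); ++i) {
            mHostScratch[i] = (phase == Phase::Prefill) ? hostContextLengths[i] - 1 : 0;
        }
        // Assumption: a real implementation would now copy this buffer into
        // PipelineIO on the caller's stream.
        return mHostScratch;
    }

private:
    ConfigSketch mConfig;
    std::vector<int32_t> mHostScratch;
};
```

Sizing the buffer once to the maximum batch size means later resize() calls stay within capacity, so no per-step heap allocation occurs.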

Binding management is NOT this class’s concern:

kvcache_start_index is a static registry binding whose per-phase shape comes from InferenceDims::startIndexLen.
deepstack_embeds_* bindings are owned by DeepstackBinding (the runtime calls useRealFeatures / useZeroBroadcast directly).

Public Functions

explicit StepPreparer(LLMEngineConfig const &config)

Construct with the engine configuration.

void prepare(InferencePhase phase, int32_t batchSize, HybridCacheManager &kvCache, PipelineIO &io, cudaStream_t stream)

Prepare per-step sequence metadata (selectTokenIndices, contextLengths) for the given phase. Does not modify tensorMap.

Fills PipelineIO:

selectTokenIndices: prefill = ctxLen-1, decode = 0
contextLengths: prefill = H2D copy from hostContextLengths; decode = base + 1, where base is the KV-cache lengths (plugin) or zeros (native)
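The contextLengths rule can be modeled on the host with a small helper. This sketch reads the decode rule as "(base) + 1" and assumes the KV length excludes the token being decoded; Phase, AttentionMode, and contextLengthsFor are hypothetical names, and the real class writes a device buffer rather than returning a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Phase { Prefill, Decode };
enum class AttentionMode { Plugin, Native };  // hypothetical names for "(plugin)" / "(native)"

// Host-side model of the contextLengths rule.
// prefill: values of hostContextLengths (the real class does an H2D copy)
// decode:  base + 1, where base = KV-cache length (plugin) or 0 (native)
std::vector<int32_t> contextLengthsFor(Phase phase, AttentionMode mode,
                                       std::vector<int32_t> const& hostContextLengths,
                                       std::vector<int32_t> const& kvLengths,
                                       int32_t batchSize) {
    assert(batchSize >= 0);
    std::size_t const n = static_cast<std::size_t>(batchSize);
    if (phase == Phase::Prefill) {
        assert(hostContextLengths.size() >= n);
    } else if (mode == AttentionMode::Plugin) {
        assert(kvLengths.size() >= n);
    }

    std::vector<int32_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (phase == Phase::Prefill) {
            out[i] = hostContextLengths[i];
        } else {
            int32_t const base = (mode == AttentionMode::Plugin) ? kvLengths[i] : 0;
            out[i] = base + 1;
        }
    }
    return out;
}
```

Under this reading, native mode reports 1 per sequence on every decode step, while plugin mode reports the full context including the newly decoded token.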

Parameters:

phase – Inference phase (Prefill or Decode).
batchSize – Active batch size for this step.
kvCache – KV cache to query for lengths.
io – Pipeline I/O; selectTokenIndices and contextLengths are written.
stream – CUDA stream for async operations.
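To show how the two phases chain together, the hypothetical trace below applies the prefill rule once and then the decode rule (plugin variant) for a few steps. It assumes every decode step appends exactly one token per sequence to the KV cache and that the batch does not shrink; StepRecord and traceGeneration are illustrative names, not part of the StepPreparer API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-step values that prepare() would write into PipelineIO (host-side model).
struct StepRecord {
    bool isPrefill;
    std::vector<int32_t> selectTokenIndices;
    std::vector<int32_t> contextLengths;
};

// One Prefill step followed by `decodeSteps` Decode steps, plugin rule assumed.
std::vector<StepRecord> traceGeneration(std::vector<int32_t> const& promptLengths,
                                        int decodeSteps) {
    assert(decodeSteps >= 0);
    std::vector<StepRecord> trace;

    // Prefill: indices = ctxLen - 1, contextLengths = prompt lengths.
    StepRecord prefill{true, {}, promptLengths};
    for (int32_t len : promptLengths) prefill.selectTokenIndices.push_back(len - 1);
    trace.push_back(prefill);

    std::vector<int32_t> kvLengths = promptLengths;  // KV cache now holds each prompt
    for (int step = 0; step < decodeSteps; ++step) {
        // Decode: indices = 0, contextLengths = KV length + 1.
        StepRecord decode{false, std::vector<int32_t>(kvLengths.size(), 0), {}};
        for (int32_t kv : kvLengths) decode.contextLengths.push_back(kv + 1);
        trace.push_back(decode);
        for (int32_t& kv : kvLengths) ++kv;  // the decoded token is now cached
    }
    return trace;
}
```

For prompts of length 5 and 3, the trace gives contextLengths {5, 3} at prefill, then {6, 4} and {7, 5} on the first two decode steps, with selectTokenIndices {4, 2} followed by {0, 0}.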