Step Preparer

class StepPreparer

Prepares per-step sequence metadata (selectTokenIndices, contextLengths).

This logic was extracted from the former LLMEngineRunner::executePrefillStep() and vanillaDecodingStepPrepareInputs() / vanillaDecodingStepBindTensors() methods. The class holds no state beyond its configuration and a small host scratch buffer used to compute selectTokenIndices.

Binding management is NOT this class’s concern:

  • kvcache_start_index is a static registry binding whose per-phase shape comes from InferenceDims::startIndexLen.

  • deepstack_embeds_* bindings are owned by DeepstackBinding (the runtime calls useRealFeatures / useZeroBroadcast directly).

Public Functions

explicit StepPreparer(LLMEngineConfig const &config)

Construct with the engine configuration.

void prepare(InferencePhase phase, int32_t batchSize, HybridCacheManager &kvCache, PipelineIO &io, cudaStream_t stream)

Prepare per-step sequence metadata (selectTokenIndices, contextLengths) for the given phase. Does not modify tensorMap.

Fills PipelineIO:

  • selectTokenIndices: prefill = ctxLen-1, decode = 0

  • contextLengths: prefill = H2D copy from hostContextLengths, decode = KV lengths (plugin) or zeros (native) + 1
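The fill rules above can be sketched on the host as follows. This is a minimal illustration, not the library's implementation: Phase, StepMetadata, and computeStepMetadata are hypothetical names, plain std::vector stands in for the PipelineIO tensors, and the H2D copy and CUDA stream are omitted. It also assumes that the "+ 1" in decode applies to both the plugin (KV lengths) and native (zeros) cases, which is only one reading of the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for InferencePhase (Prefill or Decode).
enum class Phase { Prefill, Decode };

// Hypothetical host-side result; the real class writes into PipelineIO.
struct StepMetadata {
    std::vector<int32_t> selectTokenIndices;
    std::vector<int32_t> contextLengths;
};

// hostContextLengths: per-sequence prompt lengths (used in prefill).
// baseLengths: per-sequence KV lengths on the plugin path, or zeros on the
//              native path (used in decode).
StepMetadata computeStepMetadata(Phase phase,
                                 std::vector<int32_t> const& hostContextLengths,
                                 std::vector<int32_t> const& baseLengths)
{
    StepMetadata out;
    if (phase == Phase::Prefill) {
        out.contextLengths = hostContextLengths;  // the real code does an H2D copy
        out.selectTokenIndices.reserve(hostContextLengths.size());
        for (int32_t ctxLen : hostContextLengths) {
            assert(ctxLen >= 1);
            // Select the last prompt token of each sequence.
            out.selectTokenIndices.push_back(ctxLen - 1);
        }
    } else {
        // Decode processes one token per sequence, so its index is always 0.
        out.selectTokenIndices.assign(baseLengths.size(), 0);
        out.contextLengths.reserve(baseLengths.size());
        for (int32_t len : baseLengths) {
            // ASSUMPTION: "+ 1" (the token being decoded) applies to both paths.
            out.contextLengths.push_back(len + 1);
        }
    }
    return out;
}
```

For example, a prefill batch with prompt lengths {5, 3} yields selectTokenIndices {4, 2}; under the stated assumption, a decode step with KV lengths {5, 3} yields contextLengths {6, 4}.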

Parameters:
  • phase – Inference phase (Prefill or Decode).

  • batchSize – Active batch size for this step.

  • kvCache – KV cache to query for lengths.

  • io – Pipeline I/O; selectTokenIndices and contextLengths are written.

  • stream – CUDA stream for async operations.