Step Preparer
class StepPreparer
Prepares per-step sequence metadata (selectTokenIndices, contextLengths).
Extracted from the former LLMEngineRunner::executePrefillStep() and vanillaDecodingStepPrepareInputs() / vanillaDecodingStepBindTensors() methods. The class is stateless apart from its configuration and a small host scratch buffer used to compute selectTokenIndices.
Binding management is NOT this class's concern:
- kvcache_start_index is a static registry binding whose per-phase shape comes from InferenceDims::startIndexLen.
- deepstack_embeds_* bindings are owned by DeepstackBinding (the runtime calls useRealFeatures / useZeroBroadcast directly).
Public Functions
explicit StepPreparer(LLMEngineConfig const &config)
Construct with the engine configuration.
void prepare(InferencePhase phase, int32_t batchSize, HybridCacheManager &kvCache, PipelineIO &io, cudaStream_t stream)
Prepare per-step sequence metadata (selectTokenIndices, contextLengths) for the given phase. Does not modify tensorMap.
Fills PipelineIO:
- selectTokenIndices: prefill = ctxLen - 1, decode = 0
- contextLengths: prefill = H2D copy from hostContextLengths, decode = KV lengths (plugin) or zeros (native) + 1
Parameters:
phase – Inference phase (Prefill or Decode).
batchSize – Active batch size for this step.
kvCache – KV cache to query for lengths.
io – Pipeline I/O; selectTokenIndices and contextLengths are written.
stream – CUDA stream for async operations.
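The fill rules described for prepare() can be sketched on the host as follows. This is a minimal illustration and not the library's implementation. Phase, StepMetadata, computeStepMetadata, and the usePlugin flag are hypothetical names introduced here. The sketch also assumes that ctxLen is taken per sequence and that the "+ 1" in the decode rule applies to both the plugin and the native path. The real method writes device tensors in PipelineIO on a CUDA stream, whereas this sketch uses plain host vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not the library's types.
enum class Phase { Prefill, Decode };

struct StepMetadata
{
    std::vector<int32_t> selectTokenIndices;
    std::vector<int32_t> contextLengths;
};

// Host-only sketch of the documented fill rules.
// hostContextLengths: per-sequence prompt lengths (used in prefill).
// kvLengths: per-sequence cached lengths (used in decode when usePlugin is true).
// Assumption: "+ 1" is added on both the plugin and the native decode path.
StepMetadata computeStepMetadata(Phase phase, int32_t batchSize,
    std::vector<int32_t> const& hostContextLengths,
    std::vector<int32_t> const& kvLengths, bool usePlugin)
{
    assert(batchSize >= 0);
    auto const n = static_cast<std::size_t>(batchSize);

    StepMetadata out;
    out.selectTokenIndices.assign(n, 0); // decode: select index 0
    out.contextLengths.assign(n, 0);

    if (phase == Phase::Prefill)
    {
        assert(hostContextLengths.size() >= n);
        for (std::size_t i = 0; i < n; ++i)
        {
            // Prefill: select the last prompt token, pass lengths through.
            out.selectTokenIndices[i] = hostContextLengths[i] - 1;
            out.contextLengths[i] = hostContextLengths[i];
        }
    }
    else
    {
        assert(!usePlugin || kvLengths.size() >= n);
        for (std::size_t i = 0; i < n; ++i)
        {
            // Decode: KV length (plugin) or zero (native), plus the new token.
            int32_t const base = usePlugin ? kvLengths[i] : 0;
            out.contextLengths[i] = base + 1;
        }
    }
    return out;
}
```

Under these assumptions, a prefill step with prompt lengths {5, 3} yields selectTokenIndices {4, 2} and contextLengths {5, 3}. A plugin-path decode step with KV lengths {5, 3} yields selectTokenIndices {0, 0} and contextLengths {6, 4}.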