Deployment Config#

struct SpecDecodeDraftingConfig#

User-supplied drafting parameters for speculative decoding.

Caller-side input to createDeploymentConfig. The factory consumes this together with the parsed engine configs to produce the consolidated SpecDecodeConfig stored on DeploymentConfig::specConfig.

Public Members

int32_t draftingTopK = {0}#: Tokens to select from one predecessor during draft expansion.

int32_t draftingStep = {0}#: Number of drafting steps with draft model.

int32_t verifySize = {0}#: Number of proposal tokens for base model verification.

struct SpecDecodeConfig#

Consolidated speculative decoding deployment configuration.

Holds every draft/verify speculative decoding value the runtime needs in one place, sourced from three inputs:

- the base engine’s parsed config (baseOutputHiddenDim, maxVerifySize),
- the draft engine’s parsed config (draftHiddenSize, maxDraftProposalSize),
- the caller-supplied SpecDecodeDraftingConfig drafting parameters (draftingTopK, draftingStep, verifySize).

createDeploymentConfig populates this struct after both engine configs are parsed and validates the requested drafting shape against the engine capacities.

Public Members

int32_t baseOutputHiddenDim = {}#: Base engine output hidden dim as seen by the draft (= the third dim of the draft’s hidden_states_input binding). Read from the draft config’s base_model_hidden_size, which is base.hiddenSize * 3 for EAGLE-3 and base.hiddenSize for MTP. The value is taken as declared in the draft config; it is NOT computed by multiplying base.hiddenSize by 3.

int32_t draftHiddenSize = {}#: Draft engine hidden dim (= draft.hiddenSize). Shared across all spec-decode strategies; the actual value differs per strategy (EAGLE-3 draft has its own independent hidden size; MTP draft equals base hidden size).

int32_t maxVerifySize = {}#: Max seq_len the base engine accepts for proposal verification.

int32_t maxDraftProposalSize = {}#: Max seq_len the draft engine accepts for proposal generation.

int32_t draftingTopK = {}#: Tokens to select from one predecessor during draft expansion.

int32_t draftingStep = {}#: Number of drafting steps with draft model.

int32_t verifySize = {}#: Number of proposal tokens for base model verification.
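As a rough illustration of how the three inputs might be merged, the following sketch copies each field from its source into one struct. `BaseEngineCaps`, `DraftEngineCaps`, and `consolidate` are hypothetical names invented for this example, and the structs are local mirrors rather than the library's own types. The sketch follows the per-member descriptions above, so `baseOutputHiddenDim` is read from the draft side's `base_model_hidden_size`. No validation is performed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the fields read from each parsed engine config.
struct BaseEngineCaps {
    int32_t maxVerifySize;
};

struct DraftEngineCaps {
    int32_t baseModelHiddenSize;  // mirrors the draft config's base_model_hidden_size
    int32_t hiddenSize;
    int32_t maxDraftProposalSize;
};

// Local mirrors of the documented structs (same member names, illustration only).
struct SpecDecodeDraftingConfig {
    int32_t draftingTopK{0};
    int32_t draftingStep{0};
    int32_t verifySize{0};
};

struct SpecDecodeConfig {
    int32_t baseOutputHiddenDim{};
    int32_t draftHiddenSize{};
    int32_t maxVerifySize{};
    int32_t maxDraftProposalSize{};
    int32_t draftingTopK{};
    int32_t draftingStep{};
    int32_t verifySize{};
};

// Copy every value into one place; each field keeps a single source.
inline SpecDecodeConfig consolidate(BaseEngineCaps const& base,
                                    DraftEngineCaps const& draft,
                                    SpecDecodeDraftingConfig const& user)
{
    SpecDecodeConfig cfg;
    cfg.baseOutputHiddenDim = draft.baseModelHiddenSize;  // taken as declared, not derived
    cfg.draftHiddenSize = draft.hiddenSize;
    cfg.maxVerifySize = base.maxVerifySize;
    cfg.maxDraftProposalSize = draft.maxDraftProposalSize;
    cfg.draftingTopK = user.draftingTopK;
    cfg.draftingStep = user.draftingStep;
    cfg.verifySize = user.verifySize;
    return cfg;
}

// Usage sketch with made-up numbers (an EAGLE-3-like draft over a base of hidden size 4096).
inline SpecDecodeConfig exampleConsolidation()
{
    BaseEngineCaps const base{/*maxVerifySize=*/8};
    DraftEngineCaps const draft{/*baseModelHiddenSize=*/4096 * 3, /*hiddenSize=*/2048,
                                /*maxDraftProposalSize=*/64};
    SpecDecodeDraftingConfig const user{/*draftingTopK=*/4, /*draftingStep=*/6, /*verifySize=*/8};
    SpecDecodeConfig const cfg = consolidate(base, draft, user);
    assert(cfg.baseOutputHiddenDim == 12288);
    return cfg;
}
```

Keeping the merge as a plain field-by-field copy means a consumer of the consolidated struct never has to look back at the individual engine configs.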

struct DeploymentConfig#

Complete deployment configuration: the base engine’s config, optional draft engine config, and optional consolidated speculative decoding settings.

For non-speculative deployments, draft and specConfig are both absent. When specConfig is present, draft must also be present; the factory enforces this invariant.

Public Functions

int32_t maxRuntimeBatchSize() const#: Maximum runtime batch size across the bundle. Returns the base engine’s maxSupportedBatchSize when there is no draft; otherwise returns the minimum of the base and draft values. Logs a warning if the base and draft values disagree.

int32_t effectiveMaxDraftProposalSize() const#: Effective maximum proposal size across drafting and verification. Returns max(specConfig->maxDraftProposalSize, specConfig->verifySize). Valid only for speculative decoding; throws std::runtime_error if specConfig is not set.

SpecDecodeMode specDecodeMode() const noexcept#: Return the concrete speculative decoding mode declared by the engine bundle.

Public Members

LLMEngineConfig base#: Parsed base engine configuration.

std::optional<LLMEngineConfig> draft#: Parsed draft engine configuration.

std::optional<SpecDecodeConfig> specConfig#: Consolidated speculative decoding settings.
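The two accessors described above can be sketched as follows. This is a minimal mirror written for illustration, not the library's implementation: the structs carry only the fields the accessors read, the draft engine is assumed to expose the same maxSupportedBatchSize field as the base, and the warning that is logged when base and draft disagree is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Reduced local mirrors; only the fields the accessors need (illustration only).
struct LLMEngineConfig {
    int32_t maxSupportedBatchSize{};
};

struct SpecDecodeConfig {
    int32_t maxDraftProposalSize{};
    int32_t verifySize{};
};

struct DeploymentConfig {
    LLMEngineConfig base;
    std::optional<LLMEngineConfig> draft;
    std::optional<SpecDecodeConfig> specConfig;

    // Base limit when there is no draft, otherwise the smaller of the two limits.
    int32_t maxRuntimeBatchSize() const
    {
        if (!draft) {
            return base.maxSupportedBatchSize;
        }
        return std::min(base.maxSupportedBatchSize, draft->maxSupportedBatchSize);
    }

    // Larger of the draft capacity and the requested verify size; requires specConfig.
    int32_t effectiveMaxDraftProposalSize() const
    {
        if (!specConfig) {
            throw std::runtime_error("effectiveMaxDraftProposalSize: specConfig is not set");
        }
        return std::max(specConfig->maxDraftProposalSize, specConfig->verifySize);
    }
};

// Usage sketch with made-up limits: base allows 8 sequences, draft only 4.
inline int32_t exampleBatchLimit()
{
    DeploymentConfig deployment;
    deployment.base.maxSupportedBatchSize = 8;
    deployment.draft = LLMEngineConfig{4};
    int32_t const limit = deployment.maxRuntimeBatchSize();
    assert(limit == 4);
    return limit;
}
```

Callers that may run without speculative decoding would check `specConfig.has_value()` before calling `effectiveMaxDraftProposalSize()`, since the accessor throws otherwise.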

DeploymentConfig trt_edgellm::rt::createDeploymentConfig( std::filesystem::path const &baseConfigPath, std::optional<std::filesystem::path> const &draftConfigPath, std::optional<SpecDecodeDraftingConfig> const &draftingConfig )#

Create a DeploymentConfig from engine config paths and optional user-side drafting.

Parses baseConfigPath via parseEngineConfig.
If draftConfigPath is set, parses it via parseDraftEngineConfig.
If draftingConfig is set, draftConfigPath must also be set (else throws).
If draftingConfig is set, builds specConfig by combining the engines’ capacities with the user-supplied drafting parameters, and validates them against the engines’ capacities:
- specConfig->verifySize <= specConfig->maxVerifySize
- specConfig->draftingStep * specConfig->draftingTopK <= specConfig->maxDraftProposalSize

On violation, throws with a message that names the offending fields.

Throws: std::runtime_error on any validation or parse failure.
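The two capacity checks can be sketched as a standalone function. `validateDraftingShape` is a hypothetical name chosen for this example, the struct is a reduced local mirror of SpecDecodeConfig, and the message wording is an assumption; only the two inequalities and the std::runtime_error type come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Reduced local mirror holding only the fields the checks read (illustration only).
struct SpecDecodeConfig {
    int32_t maxVerifySize{};
    int32_t maxDraftProposalSize{};
    int32_t draftingTopK{};
    int32_t draftingStep{};
    int32_t verifySize{};
};

// Throws std::runtime_error naming the fields involved when a limit is exceeded.
inline void validateDraftingShape(SpecDecodeConfig const& cfg)
{
    if (cfg.verifySize > cfg.maxVerifySize) {
        throw std::runtime_error("verifySize (" + std::to_string(cfg.verifySize)
            + ") exceeds maxVerifySize (" + std::to_string(cfg.maxVerifySize) + ")");
    }
    // Widen to 64 bits so the product cannot overflow before the comparison.
    int64_t const requested = static_cast<int64_t>(cfg.draftingStep) * cfg.draftingTopK;
    if (requested > cfg.maxDraftProposalSize) {
        throw std::runtime_error("draftingStep * draftingTopK (" + std::to_string(requested)
            + ") exceeds maxDraftProposalSize (" + std::to_string(cfg.maxDraftProposalSize) + ")");
    }
}

// Usage sketch: 6 steps * top-4 = 24 proposals against a made-up draft capacity of 16.
inline std::string exampleViolationMessage()
{
    SpecDecodeConfig const cfg{/*maxVerifySize=*/8, /*maxDraftProposalSize=*/16,
                               /*draftingTopK=*/4, /*draftingStep=*/6, /*verifySize=*/8};
    try {
        validateDraftingShape(cfg);
    } catch (std::runtime_error const& e) {
        return e.what();
    }
    assert(false && "24 exceeds 16, so validation is expected to throw");
    return {};
}
```

Naming both the requested value and the limit in the message lets a caller see which drafting parameter to reduce without inspecting the engine configs.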