Deployment Config#
-
struct SpecDecodeDraftingConfig#
User-supplied drafting parameters for speculative decoding.
Caller-side input to createDeploymentConfig. The factory consumes this together with the parsed engine configs to produce the consolidated SpecDecodeConfig stored on DeploymentConfig::specConfig.
-
struct SpecDecodeConfig#
Consolidated speculative decoding deployment configuration.
Holds every draft/verify speculative decoding value the runtime needs in one place, sourced from three inputs:
- the base engine’s parsed config (baseOutputHiddenDim, maxVerifySize),
- the draft engine’s parsed config (draftHiddenSize, maxDraftProposalSize),
- the caller-supplied SpecDecodeDraftingConfig drafting parameters (draftingTopK, draftingStep, verifySize).
createDeploymentConfig populates this struct after both engine configs are parsed and validates the requested drafting shape against the engine capacities.
Public Members
-
int32_t baseOutputHiddenDim = {}#
Base engine output hidden dim as seen by the draft (= the third dim of the draft’s hidden_states_input binding). Sourced from the draft config’s base_model_hidden_size, which is base.hiddenSize * 3 for EAGLE-3 and base.hiddenSize for MTP. The value is read from the config; it is NOT derived as base.hiddenSize * 3.
-
int32_t draftHiddenSize = {}#
Draft engine hidden dim (= draft.hiddenSize). Shared across all spec-decode strategies; the actual value differs per strategy (EAGLE-3 draft has its own independent hidden size; MTP draft equals base hidden size).
-
int32_t maxVerifySize = {}#
Max seq_len the base engine accepts for proposal verification.
-
int32_t maxDraftProposalSize = {}#
Max seq_len the draft engine accepts for proposal generation.
-
int32_t draftingTopK = {}#
Number of tokens to select from each predecessor token during draft expansion.
-
int32_t draftingStep = {}#
Number of drafting steps to run with the draft model.
-
int32_t verifySize = {}#
Number of proposal tokens for base model verification.
-
struct DeploymentConfig#
Complete deployment configuration: the base engine’s config, optional draft engine config, and optional consolidated speculative decoding settings.
For non-speculative deployments, draft and specConfig are both absent. When specConfig is present, draft must also be present; the factory enforces this invariant.
Public Functions
-
int32_t maxRuntimeBatchSize() const#
Maximum runtime batch size across the bundle. Returns the base engine’s maxSupportedBatchSize when there is no draft; otherwise returns the min of base and draft. Logs a warning if base and draft disagree.
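The rule above can be sketched as follows. This is a minimal illustration, not the library's implementation: EngineConfigSketch and DeploymentConfigBatchSketch are hypothetical stand-ins that carry only the fields the rule needs, and std::cerr stands in for whatever logger the runtime actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iostream>
#include <optional>

// Hypothetical stand-in for LLMEngineConfig; only the field used by this rule.
struct EngineConfigSketch
{
    int32_t maxSupportedBatchSize{};
};

// Hypothetical stand-in for DeploymentConfig.
struct DeploymentConfigBatchSketch
{
    EngineConfigSketch base;
    std::optional<EngineConfigSketch> draft;

    int32_t maxRuntimeBatchSize() const
    {
        // No draft engine: the base engine alone bounds the batch size.
        if (!draft.has_value())
        {
            return base.maxSupportedBatchSize;
        }
        // Base and draft disagree: warn (placeholder logger), then take the smaller.
        if (base.maxSupportedBatchSize != draft->maxSupportedBatchSize)
        {
            std::cerr << "warning: base maxSupportedBatchSize (" << base.maxSupportedBatchSize
                      << ") differs from draft maxSupportedBatchSize (" << draft->maxSupportedBatchSize << ")\n";
        }
        return std::min(base.maxSupportedBatchSize, draft->maxSupportedBatchSize);
    }
};
```

Taking the minimum keeps every batch within what both engines were built to accept, at the cost of leaving capacity unused on the larger one.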
-
int32_t effectiveMaxDraftProposalSize() const#
Effective maximum proposal size across drafting and verification. Returns max(specConfig->maxDraftProposalSize, specConfig->verifySize). Speculative decode only: throws std::runtime_error if specConfig is not set.
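A minimal sketch of this accessor, assuming hypothetical stand-in types (SpecDecodeConfigSketch, DeploymentConfigProposalSketch) that keep only the two fields involved. The exception text is illustrative; the real message is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for SpecDecodeConfig; only the fields this accessor reads.
struct SpecDecodeConfigSketch
{
    int32_t maxDraftProposalSize{};
    int32_t verifySize{};
};

// Hypothetical stand-in for DeploymentConfig.
struct DeploymentConfigProposalSketch
{
    std::optional<SpecDecodeConfigSketch> specConfig;

    int32_t effectiveMaxDraftProposalSize() const
    {
        // Speculative decode only: refuse to answer for non-speculative deployments.
        if (!specConfig.has_value())
        {
            throw std::runtime_error("effectiveMaxDraftProposalSize: specConfig is not set");
        }
        // The larger of the draft engine's proposal capacity and the requested verify size.
        return std::max(specConfig->maxDraftProposalSize, specConfig->verifySize);
    }
};
```

A caller that may run without speculative decoding would check specConfig first, or catch std::runtime_error, before relying on this value.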
-
SpecDecodeMode specDecodeMode() const noexcept#
Return the concrete speculative decoding mode declared by the engine bundle.
Public Members
-
LLMEngineConfig base#
Parsed base engine configuration.
-
std::optional<LLMEngineConfig> draft#
Parsed draft engine configuration.
-
std::optional<SpecDecodeConfig> specConfig#
Consolidated speculative decoding settings.
-
DeploymentConfig trt_edgellm::rt::createDeploymentConfig(std::filesystem::path const &baseConfigPath, std::optional<std::filesystem::path> const &draftConfigPath, std::optional<SpecDecodeDraftingConfig> const &draftingConfig)#
Create a DeploymentConfig from engine config paths and optional user-side drafting parameters.
- Parses baseConfigPath via parseEngineConfig.
- If draftConfigPath is set, parses it via parseDraftEngineConfig.
- If draftingConfig is set, draftConfigPath must also be set (else throws).
- If draftingConfig is set, builds specConfig by combining the engines’ capacities with the user-supplied drafting parameters, and validates them against the engines’ capacities:
  - specConfig->verifySize <= specConfig->maxVerifySize
  - specConfig->draftingStep * specConfig->draftingTopK <= specConfig->maxDraftProposalSize
  Throws with a named-fields message on violation.
- Throws:
std::runtime_error – on any validation failure or parse failure.
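The drafting-related checks listed above can be sketched in isolation. This is an illustration under stated assumptions, not the factory itself: DraftingSketch, CapacitiesSketch, SpecSketch, and buildSpecConfigSketch are hypothetical names, config parsing is omitted, the presence of a draft engine is reduced to a bool, and the exception messages are invented examples of a "named-fields" message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the caller-supplied drafting parameters.
struct DraftingSketch
{
    int32_t draftingTopK{};
    int32_t draftingStep{};
    int32_t verifySize{};
};

// Hypothetical: engine capacities as they would be known after parsing both configs.
struct CapacitiesSketch
{
    int32_t maxVerifySize{};
    int32_t maxDraftProposalSize{};
};

// Hypothetical subset of the consolidated speculative decoding settings.
struct SpecSketch
{
    int32_t maxVerifySize{};
    int32_t maxDraftProposalSize{};
    int32_t draftingTopK{};
    int32_t draftingStep{};
    int32_t verifySize{};
};

std::optional<SpecSketch> buildSpecConfigSketch(
    bool hasDraftEngine, CapacitiesSketch const& caps, std::optional<DraftingSketch> const& drafting)
{
    // No drafting parameters: non-speculative deployment, nothing to build.
    if (!drafting.has_value())
    {
        return std::nullopt;
    }
    // Drafting parameters require a draft engine.
    if (!hasDraftEngine)
    {
        throw std::runtime_error("draftingConfig is set but draftConfigPath is not");
    }
    SpecSketch spec{caps.maxVerifySize, caps.maxDraftProposalSize, drafting->draftingTopK,
        drafting->draftingStep, drafting->verifySize};
    if (spec.verifySize > spec.maxVerifySize)
    {
        throw std::runtime_error("verifySize (" + std::to_string(spec.verifySize)
            + ") exceeds maxVerifySize (" + std::to_string(spec.maxVerifySize) + ")");
    }
    // Widen before multiplying so the product cannot overflow int32_t.
    int64_t const proposals = static_cast<int64_t>(spec.draftingStep) * spec.draftingTopK;
    if (proposals > spec.maxDraftProposalSize)
    {
        throw std::runtime_error("draftingStep * draftingTopK (" + std::to_string(proposals)
            + ") exceeds maxDraftProposalSize (" + std::to_string(spec.maxDraftProposalSize) + ")");
    }
    return spec;
}
```

For example, with maxDraftProposalSize = 12, a request of draftingTopK = 4 and draftingStep = 3 passes (4 * 3 = 12), while draftingStep = 4 is rejected (4 * 4 = 16).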