Inference Dims

struct InferenceDims

Typed symbolic dimension values resolved per inference step.

Fields have NO in-class defaults — every construction site must set every field explicitly. The canonical construction path is an LLMEngineConfig recipe method (prefillDims, decodeDims, specVerifyDims, proposalDims, acceptDims, resetDims). Direct construction (aggregate or designated initializers) is supported for unit tests.

The eight fields are the complete set of symbolic dims used by LLM and SpecDecode draft engines. Fixed-shape tensor dims do not appear here.
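To make the "no in-class defaults" rule concrete, here is a minimal sketch of a unit-test fixture built with C++20 designated initializers. The struct is redeclared from the documented field list so the snippet compiles on its own. The helper name makeTestDecodeDims and the kvLen value of 4096 are hypothetical. The field values follow the per-field notes for a decode step on a plugin-path engine without MRope.

```cpp
#include <cassert>
#include <cstdint>

// Standalone redeclaration of the eight documented fields (no in-class defaults).
struct InferenceDims
{
    int64_t batch;          // active batch size
    int64_t seqLen;         // work-unit length for this step
    int64_t kvLen;          // KV cache capacity
    int64_t selectLen;      // last_token_ids select count
    int64_t attnMaskSeqLen; // effective attention-mask sequence length
    int64_t ropeBatch;      // RoPE broadcast dim
    int64_t packedMaskLen;  // bit-packed mask length
    int64_t startIndexLen;  // kvcache_start_index shape length
};

// Hypothetical test fixture for a batch-4 decode step. Designated initializers
// must follow declaration order; every field is written out explicitly.
inline InferenceDims makeTestDecodeDims()
{
    return InferenceDims{
        .batch = 4,
        .seqLen = 1,          // decode processes one token per sequence
        .kvLen = 4096,        // illustrative capacity
        .selectLen = 1,       // not a SpecDecode verification step
        .attnMaskSeqLen = 1,  // standard causal attention
        .ropeBatch = 1,       // non-MRope model
        .packedMaskLen = 1,   // no SpecDecode mask
        .startIndexLen = 4,   // per-batch start offsets for decode
    };
}
```

Because the struct has no default member initializers, a field omitted from a designated-initializer list is value-initialized to 0, and `InferenceDims d;` leaves every field indeterminate. A 0 in a registry-referenced field is the kind of mistake firstInvalidMember reports at prepare time.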

Public Members

int64_t batch

Active batch size.

int64_t seqLen

Work-unit length for this step (prompt / proposal / accept length, or 1).

int64_t kvLen

KV cache capacity (usually LLMEngineConfig::maxKVCacheCapacity).

int64_t selectLen

Select count for last_token_ids (1 except for SpecDecode verification).

int64_t attnMaskSeqLen

Effective sequence length for the SpecDecode attention_mask / attention_pos_id tensors. This is decoupled from seqLen because the base engine’s attention plugin treats a “small” mask shape ([B, 1, 1]) as a signal to use standard causal attention, while a proposal-shaped mask triggers proposal attention and reads the buffer contents as a bit-packed mask.

Set to 1 for prefill / decode / reset (engine applies standard causal attention and ignores the dummy mask buffer); set to the effective proposal size for verify / proposal / accept (engine applies proposal attention using the prepared bit-packed mask).
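The rule above can be sketched as a small selector. This is a minimal sketch. The StepKind enum and the attnMaskSeqLenFor helper are hypothetical names that mirror the recipe methods. The proposalSize input is assumed to be supplied by the caller, and the real recipe methods may derive it differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical step kinds, one per recipe method named in this reference.
enum class StepKind { Prefill, Decode, Reset, SpecVerify, Proposal, Accept };

// Sketch of the attnMaskSeqLen rule; proposalSize is an assumed input.
inline int64_t attnMaskSeqLenFor(StepKind kind, int64_t proposalSize)
{
    switch (kind)
    {
    case StepKind::Prefill:
    case StepKind::Decode:
    case StepKind::Reset:
        // Small [B, 1, 1] mask: standard causal attention, buffer contents ignored.
        return 1;
    case StepKind::SpecVerify:
    case StepKind::Proposal:
    case StepKind::Accept:
        // Proposal-shaped mask: the plugin reads the bit-packed buffer.
        assert(proposalSize > 0);
        return proposalSize;
    }
    return 1; // not reached for valid enumerators
}
```

For a prefill step, seqLen is the full prompt length while attnMaskSeqLen stays 1. This is the decoupling described above.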

int64_t ropeBatch

RoPE broadcast dim (1 for non-MRope; batch for MRope).

int64_t packedMaskLen

Length of the bit-packed mask in 32-bit words: divUp(attnMaskSeqLen, 32) for SpecDecode masks; otherwise 1.
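As a worked example of the divUp arithmetic, an attnMaskSeqLen of 60 needs 2 words and 65 needs 3. The sketch below computes packedMaskLen and packs one mask row. The helper names are hypothetical. The bit order (position j stored in bit j % 32 of word j / 32, least-significant bit first) is an assumption, because this reference does not state the plugin's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer ceiling division, as in divUp(attnMaskSeqLen, 32).
inline int64_t divUp(int64_t a, int64_t b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

// Hypothetical helper: packedMaskLen for a step.
inline int64_t packedMaskLenFor(int64_t attnMaskSeqLen, bool isSpecDecodeMask)
{
    return isSpecDecodeMask ? divUp(attnMaskSeqLen, 32) : 1;
}

// Packs one boolean mask row into 32-bit words.
// ASSUMPTION: LSB-first layout, position j -> bit (j % 32) of word (j / 32).
inline std::vector<uint32_t> packMaskRow(std::vector<bool> const& row)
{
    auto const numWords = static_cast<std::size_t>(divUp(static_cast<int64_t>(row.size()), 32));
    std::vector<uint32_t> words(numWords, 0u);
    for (std::size_t j = 0; j < row.size(); ++j)
    {
        if (row[j])
        {
            words[j / 32] |= (1u << (j % 32));
        }
    }
    return words;
}
```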

int64_t startIndexLen

Shape length for kvcache_start_index. Zero is the engine’s sentinel for “initial prefill of an empty KV cache” (plugin-path engines only); batch means “use these per-batch start offsets” for chunked prefill, decode, verify, and accept. TRT-native-ops engines always use batch. Zero is a legitimate, engine-meaningful value for this dim.
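The three cases above fit in one decision. This is a minimal sketch. The function name startIndexLenFor and both boolean parameters are hypothetical stand-ins for whatever state the real recipe methods consult.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper sketching the startIndexLen rule.
inline int64_t startIndexLenFor(int64_t batch, bool isInitialPrefillOfEmptyCache, bool usesTrtNativeOps)
{
    assert(batch > 0);
    if (isInitialPrefillOfEmptyCache && !usesTrtNativeOps)
    {
        // Plugin-path sentinel: kvcache_start_index gets a zero-length shape.
        return 0;
    }
    // Per-batch start offsets: chunked prefill, decode, verify, accept,
    // and every step on TRT-native-ops engines.
    return batch;
}
```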

struct ShapeDim

A single dimension in a tensor shape — either fixed or symbolic.

When symbol is non-null the dimension is symbolic and its value is resolved by dereferencing inferenceDims.*symbol at bind time. Otherwise the fixed value is used as-is.

Public Functions

inline bool isSymbolic() const noexcept

Return true when this dimension is symbolic (symbol is non-null).

Public Members

int64_t InferenceDims::* symbol = {nullptr}

Non-null ⇒ symbolic; pointer-to-member of InferenceDims.

int64_t value = {0}

Fixed value when symbol is null.
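The bind-time resolution described for ShapeDim reduces to one pointer-to-member dereference per symbolic dimension. Below is a minimal sketch with both structs redeclared from this reference. The resolveShape helper is hypothetical and is not the engine's binding code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct InferenceDims
{
    int64_t batch, seqLen, kvLen, selectLen, attnMaskSeqLen, ropeBatch, packedMaskLen, startIndexLen;
};

struct ShapeDim
{
    int64_t InferenceDims::* symbol = {nullptr}; // non-null means symbolic
    int64_t value = {0};                         // used when symbol is null

    bool isSymbolic() const noexcept { return symbol != nullptr; }
};

// Hypothetical helper: resolve a declared shape against this step's dims.
inline std::vector<int64_t> resolveShape(std::vector<ShapeDim> const& shape, InferenceDims const& dims)
{
    std::vector<int64_t> out;
    out.reserve(shape.size());
    for (ShapeDim const& d : shape)
    {
        // Symbolic: read the member the pointer names. Fixed: use value as-is.
        out.push_back(d.isSymbolic() ? dims.*(d.symbol) : d.value);
    }
    return out;
}
```

A shape such as [batch, seqLen, 64] is declared once. Only the InferenceDims value changes from step to step.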

std::string_view trt_edgellm::rt::dimName(
int64_t InferenceDims::* member
)

Return the human-readable name for an InferenceDims member pointer.

Returns an empty string_view if the pointer does not name a member of InferenceDims. This should not happen in practice: the parameter type only admits int64_t members of InferenceDims, so the only such input is a null pointer-to-member.

Parameters:

member – Pointer-to-member — e.g. &InferenceDims::batch

Returns:

Name corresponding to the member (e.g. “batch”)
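One way to implement the lookup is a linear scan that compares member pointers, since pointers-to-member support equality comparison. The following is a sketch under assumptions. Only "batch" is documented. The other spellings are guesses that follow the seq_len / kv_len keys in the toString example, and the library's actual names may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

struct InferenceDims
{
    int64_t batch, seqLen, kvLen, selectLen, attnMaskSeqLen, ropeBatch, packedMaskLen, startIndexLen;
};

namespace
{
struct DimNameEntry
{
    int64_t InferenceDims::* member;
    std::string_view name;
};

// ASSUMPTION: names other than "batch" are guessed snake_case spellings.
constexpr DimNameEntry kDimNames[] = {
    {&InferenceDims::batch, "batch"},
    {&InferenceDims::seqLen, "seq_len"},
    {&InferenceDims::kvLen, "kv_len"},
    {&InferenceDims::selectLen, "select_len"},
    {&InferenceDims::attnMaskSeqLen, "attn_mask_seq_len"},
    {&InferenceDims::ropeBatch, "rope_batch"},
    {&InferenceDims::packedMaskLen, "packed_mask_len"},
    {&InferenceDims::startIndexLen, "start_index_len"},
};
} // namespace

// Sketch of dimName: static string literals give the views permanent backing storage.
inline std::string_view dimName(int64_t InferenceDims::* member)
{
    for (DimNameEntry const& e : kDimNames)
    {
        if (e.member == member)
        {
            return e.name;
        }
    }
    return {}; // e.g. a null pointer-to-member
}
```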

std::string trt_edgellm::rt::toString(InferenceDims const &dims)

Format an InferenceDims as a human-readable string.

Returns an owning std::string, not a string_view, because the formatted output has no persistent backing storage.

Parameters:

dims – The value to format

Returns:

A string like {batch=4, seq_len=128, kv_len=4096, ...}
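A sketch of the documented {key=value, ...} format built with std::ostringstream. The keys after kv_len are assumed spellings, not taken from the library. Returning std::string matters here because ostringstream::str() yields a temporary. A string_view into that temporary would dangle, which is the rationale given above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

struct InferenceDims
{
    int64_t batch, seqLen, kvLen, selectLen, attnMaskSeqLen, ropeBatch, packedMaskLen, startIndexLen;
};

// Sketch of toString. ASSUMPTION: keys after kv_len are guessed snake_case names.
inline std::string toString(InferenceDims const& d)
{
    std::ostringstream os;
    os << "{batch=" << d.batch << ", seq_len=" << d.seqLen << ", kv_len=" << d.kvLen
       << ", select_len=" << d.selectLen << ", attn_mask_seq_len=" << d.attnMaskSeqLen
       << ", rope_batch=" << d.ropeBatch << ", packed_mask_len=" << d.packedMaskLen
       << ", start_index_len=" << d.startIndexLen << "}";
    return os.str(); // owning copy; no dangling view
}
```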

int64_t InferenceDims::* trt_edgellm::rt::firstInvalidMember(
InferenceDims const &dims,
std::vector<int64_t InferenceDims::*> const &referenced
)

Return the first referenced member whose value is <= 0, or nullptr if all referenced members are positive.

Used by EngineExecutor::prepare to catch callers who bypassed a recipe method and left some fields unset. Pure function: no TRT or CUDA dependency, so it is directly unit-testable without an IExecutionContext.

The scope of this check is restricted to registry-referenced members so that engines which do not use every dim (e.g. a model without a packed attention mask) are not forced to set values they do not consume.

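Parameters:

dims – The dimension values to check

referenced – Member pointers referenced by the registry; only these members are checked

Returns: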

The first member-pointer with an invalid value, or nullptr on success
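The documented contract is a single scan over the referenced members. Below is a minimal sketch that matches the signature above, with the struct redeclared so it compiles alone. The DimMember alias is introduced here for readability and is not part of the documented API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct InferenceDims
{
    int64_t batch, seqLen, kvLen, selectLen, attnMaskSeqLen, ropeBatch, packedMaskLen, startIndexLen;
};

// Hypothetical alias for the documented pointer-to-member type.
using DimMember = int64_t InferenceDims::*;

// Sketch: check only registry-referenced members; return the first one <= 0.
inline DimMember firstInvalidMember(InferenceDims const& dims, std::vector<DimMember> const& referenced)
{
    for (DimMember m : referenced)
    {
        if (dims.*m <= 0)
        {
            return m;
        }
    }
    return nullptr; // every referenced member is positive
}
```

Unreferenced members are never read. An engine without a packed attention mask can therefore leave packedMaskLen at 0 without failing the check, which is the scoping described above.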