Inference Dims#
-
struct InferenceDims#
Typed symbolic dimension values resolved per inference step.
Fields have NO in-class defaults; every construction site must set every field explicitly. The canonical construction path is an LLMEngineConfig recipe method (prefillDims, decodeDims, specVerifyDims, proposalDims, acceptDims, resetDims). Direct construction (aggregate or designated initializers) is supported for unit tests.
The eight fields are the complete set of symbolic dims used by LLM and SpecDecode draft engines. Fixed-shape tensor dims do not appear here.
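As an illustration of the direct-construction path mentioned for unit tests, the following is a minimal sketch using C++20 designated initializers. The struct is a local mirror of the documented fields (assuming the declaration order matches the order listed on this page), and `makeDecodeDimsForTest` is a hypothetical helper, not part of the library. The values follow the per-field descriptions for a plain decode step without SpecDecode or MRope.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the documented struct, for illustration only.
// Assumption: fields are declared in the order shown on this page.
struct InferenceDims
{
    int64_t batch;
    int64_t seqLen;
    int64_t kvLen;
    int64_t selectLen;
    int64_t attnMaskSeqLen;
    int64_t ropeBatch;
    int64_t packedMaskLen;
    int64_t startIndexLen;
};

// Hypothetical test helper: dims for a decode step (no SpecDecode, no MRope).
// Every field is set explicitly because the struct has no in-class defaults.
inline InferenceDims makeDecodeDimsForTest(int64_t batch, int64_t kvCapacity)
{
    return InferenceDims{
        .batch = batch,
        .seqLen = 1,          // one token per sequence during decode
        .kvLen = kvCapacity,  // KV cache capacity
        .selectLen = 1,       // 1 except for SpecDecode verification
        .attnMaskSeqLen = 1,  // standard causal attention
        .ropeBatch = 1,       // 1 for non-MRope
        .packedMaskLen = 1,   // 1 when no SpecDecode mask is used
        .startIndexLen = batch,
    };
}
```

Designated initializers make an omitted field easy to spot in review, which matters here because an unset field would otherwise be silently zero-initialized.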
Public Members
-
int64_t batch#
Active batch size.
-
int64_t seqLen#
Work-unit length for this step (prompt / proposal / accept / 1)
-
int64_t kvLen#
KV cache capacity (usually LLMEngineConfig::maxKVCacheCapacity)
-
int64_t selectLen#
last_token_ids select count (1 except for SpecDecode verification)
-
int64_t attnMaskSeqLen#
Effective sequence length for the SpecDecode attention_mask / attention_pos_id tensors. This is decoupled from seqLen because the base engine’s attention plugin treats a “small” mask shape ([B, 1, 1]) as a signal to use standard causal attention, while a proposal-shaped mask triggers proposal attention and reads the buffer contents as a bit-packed mask.
Set to 1 for prefill / decode / reset (engine applies standard causal attention and ignores the dummy mask buffer); set to the effective proposal size for verify / proposal / accept (engine applies proposal attention using the prepared bit-packed mask).
-
int64_t ropeBatch#
RoPE broadcast dim (1 for non-MRope; batch for MRope)
-
int64_t packedMaskLen#
divUp(attnMaskSeqLen, 32) for SpecDecode masks; else 1
-
int64_t startIndexLen#
Shape length for kvcache_start_index. Zero is the engine’s sentinel for “initial prefill of an empty KV cache” (plugin-path engines only); batch means “use these per-batch start offsets” for chunked prefill, decode, verify, and accept. TRT-native-ops engines always use batch. Zero is a legitimate, engine-meaningful value for this dim.
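The relationship between attnMaskSeqLen and packedMaskLen described above can be sketched as two small helpers. This is an illustrative sketch only: the `StepKind` enum and both helper names are hypothetical, and `divUp` is assumed to be ordinary ceiling division for positive operands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical step classification, introduced only for this sketch.
enum class StepKind { Prefill, Decode, Reset, Verify, Proposal, Accept };

// Assumed meaning of divUp: ceiling division for positive operands.
constexpr int64_t divUp(int64_t x, int64_t y)
{
    return (x + y - 1) / y;
}

// Verify / proposal / accept use proposal attention with a bit-packed mask.
constexpr bool usesProposalAttention(StepKind kind)
{
    return kind == StepKind::Verify || kind == StepKind::Proposal || kind == StepKind::Accept;
}

// 1 selects standard causal attention; otherwise the effective proposal size.
constexpr int64_t attnMaskSeqLenFor(StepKind kind, int64_t proposalSize)
{
    return usesProposalAttention(kind) ? proposalSize : 1;
}

// The mask is packed 32 bits per element, so the length rounds up.
constexpr int64_t packedMaskLenFor(StepKind kind, int64_t proposalSize)
{
    return usesProposalAttention(kind) ? divUp(attnMaskSeqLenFor(kind, proposalSize), 32) : 1;
}
```

For example, a verify step with an effective proposal size of 40 would need two packed 32-bit words per mask row, while a decode step keeps both values at 1.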
-
struct ShapeDim#
A single dimension in a tensor shape — either fixed or symbolic.
When symbol is non-null the dimension is symbolic and its value is resolved by dereferencing inferenceDims.*symbol at bind time. Otherwise the fixed value is used as-is.
Public Functions
-
inline bool isSymbolic() const noexcept#
Return true when this dimension is symbolic.
Public Members
-
int64_t InferenceDims::* symbol = {nullptr}#
Non-null ⇒ symbolic; pointer-to-member of InferenceDims.
-
int64_t value = {0}#
Fixed value when symbol is null.
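To make the bind-time resolution concrete, here is a minimal sketch of how a ShapeDim might be turned into a concrete extent. Both structs are local mirrors of the documented declarations, and `resolve` is a hypothetical helper name; the library's actual binding code is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of the documented types, for illustration only.
struct InferenceDims
{
    int64_t batch;
    int64_t seqLen;
    int64_t kvLen;
    int64_t selectLen;
    int64_t attnMaskSeqLen;
    int64_t ropeBatch;
    int64_t packedMaskLen;
    int64_t startIndexLen;
};

struct ShapeDim
{
    int64_t InferenceDims::* symbol = {nullptr};
    int64_t value = {0};

    bool isSymbolic() const noexcept
    {
        return symbol != nullptr;
    }
};

// Hypothetical helper: a symbolic dim reads the member it points at,
// a fixed dim returns its stored value unchanged.
inline int64_t resolve(ShapeDim const& dim, InferenceDims const& dims) noexcept
{
    return dim.isSymbolic() ? dims.*(dim.symbol) : dim.value;
}
```

Using a pointer-to-member as the symbol means a shape can only name an actual InferenceDims field, so a misspelled dimension fails at compile time rather than at bind time.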
-
std::string_view trt_edgellm::rt::dimName(int64_t InferenceDims::* member)#
Return the human-readable name for an InferenceDims member pointer.
Returns an empty string_view if the pointer is not a member of InferenceDims (should not happen in practice, because the pointer type restricts the input at compile time).
- Parameters:
member – Pointer-to-member, e.g. &InferenceDims::batch
- Returns:
Name corresponding to the member (e.g. “batch”)
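One way such a lookup could work is a linear scan over a small table of member pointers, sketched below. This is not the library's implementation: `dimNameSketch` is a hypothetical name, the struct is a local mirror, and only "batch", "seq_len" and "kv_len" appear in this page's examples; the remaining snake_case names are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Local mirror of the documented struct, for illustration only.
struct InferenceDims
{
    int64_t batch;
    int64_t seqLen;
    int64_t kvLen;
    int64_t selectLen;
    int64_t attnMaskSeqLen;
    int64_t ropeBatch;
    int64_t packedMaskLen;
    int64_t startIndexLen;
};

// Hypothetical lookup: compare the argument against each known member pointer.
inline std::string_view dimNameSketch(int64_t InferenceDims::* member) noexcept
{
    struct Entry
    {
        int64_t InferenceDims::* member;
        std::string_view name;
    };
    // Names past the first three are assumed, not taken from the library.
    static constexpr Entry kNames[] = {
        {&InferenceDims::batch, "batch"},
        {&InferenceDims::seqLen, "seq_len"},
        {&InferenceDims::kvLen, "kv_len"},
        {&InferenceDims::selectLen, "select_len"},
        {&InferenceDims::attnMaskSeqLen, "attn_mask_seq_len"},
        {&InferenceDims::ropeBatch, "rope_batch"},
        {&InferenceDims::packedMaskLen, "packed_mask_len"},
        {&InferenceDims::startIndexLen, "start_index_len"},
    };
    for (Entry const& entry : kNames)
    {
        if (entry.member == member)
        {
            return entry.name;
        }
    }
    return {}; // e.g. a null member pointer
}
```

Returning views over string literals keeps the function allocation-free, which is why a `std::string_view` return type is safe here.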
-
std::string trt_edgellm::rt::toString(InferenceDims const &dims)#
Format an InferenceDims as a human-readable string.
Returns an owning std::string, not a string_view, because the formatted output has no persistent backing storage.
- Parameters:
dims – The value to format
- Returns:
A string like {batch=4, seq_len=128, kv_len=4096, ...}
-
int64_t InferenceDims::* trt_edgellm::rt::firstInvalidMember(InferenceDims const &dims, std::vector<int64_t InferenceDims::*> const &referenced)#
Return the first referenced member whose value is <= 0, or nullptr if all referenced members are positive.
Used by EngineExecutor::prepare to catch callers who bypassed a recipe method and left some fields unset. Pure function: no TRT or CUDA dependency, so it is directly unit-testable without an IExecutionContext.
The scope of this check is restricted to registry-referenced members so that engines which do not use every dim (e.g. a model without a packed attention mask) are not forced to set values they do not consume.
- Parameters:
dims – The values to validate
referenced – The set of members referenced by the registry, typically returned by TensorRegistry::referencedMembers()
- Returns:
The first member-pointer with an invalid value, or nullptr on success
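The documented contract can be sketched as a short loop over the referenced member pointers. This is an illustrative sketch under stated assumptions: the struct is a local mirror, `DimMember` and `firstInvalidMemberSketch` are hypothetical names, and skipping null entries is a choice made here rather than documented behavior. The sketch applies the `<= 0` rule uniformly to whatever members are passed in.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirror of the documented struct, for illustration only.
struct InferenceDims
{
    int64_t batch;
    int64_t seqLen;
    int64_t kvLen;
    int64_t selectLen;
    int64_t attnMaskSeqLen;
    int64_t ropeBatch;
    int64_t packedMaskLen;
    int64_t startIndexLen;
};

// Hypothetical alias to keep the signatures readable.
using DimMember = int64_t InferenceDims::*;

// Return the first referenced member whose value is <= 0, or nullptr if
// every referenced member is positive. Members not in `referenced` are
// never inspected, so unused dims may hold any value.
inline DimMember firstInvalidMemberSketch(
    InferenceDims const& dims, std::vector<DimMember> const& referenced)
{
    for (DimMember member : referenced)
    {
        // Assumption: null entries are ignored rather than reported.
        if (member != nullptr && dims.*member <= 0)
        {
            return member;
        }
    }
    return nullptr;
}
```

Because the function depends only on plain values, a unit test can pass a hand-built InferenceDims and a hand-picked member list, for instance leaving packedMaskLen at 0 and checking that it is reported only when it appears in the referenced set.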