C++ API Reference
This section documents the TensorRT Edge-LLM C++ API, organized by module.
- Action Module
- Builder Module
- Common Module
- Kernels Module
  - Nv Fp4 MoE Contiguous GEMM Runner
  - Nv Fp4 MoE Fc2 Finalize Runner
  - Nv Fp4 MoE Utils
  - Alpha Compute
  - Apply Rope Write KV
  - Batch Evict Kernels
  - Build Layout
  - Causal Conv1d
  - Common
  - Context FMHA Runner
  - Conversion
  - Cute Dsl Decode Gemv Runner
  - Cute Dsl FMHA Runner
  - Cute Dsl Gdn Runner
  - Cute Dsl GEMM Runner
  - Cute Dsl Nvfp4 Moe Runner
  - Cute Dsl Ssd Runner
  - Decoder XQA Runner
  - Dequant
  - Dequantize
  - EAGLE Accept Kernels
  - EAGLE Util Kernels
  - Embedding Kernels
  - FMHA Params V2
  - Fp4 Quantize
  - Gdn Kernel Utils
  - Image Util Kernels
  - Initialize Cos Sin Cache
  - Int4 Groupwise GEMM
  - Kernel
  - Kernel Selector
  - Kernels
  - KV Cache Utils Kernels
  - Marlin
  - Marlin Dtypes
  - Marlin Mma
  - Marlin Template
  - Marlin Template
  - Moe Activation Kernels
  - Moe Align Sum Kernels
  - Moe Gather
  - Moe Marlin
  - Moe Marlin Indices Kernels
  - Moe Sigmoid Group Topk Kernels
  - Moe Topk Softmax Kernels
  - Mtp State Scatter Kernels
  - Nvfp4 Moe Types
  - Nvfp4 Dequant
  - Nvfp4 Tensor
  - Selective State Update
  - Ssd Varlen Metadata
  - Talker Mlp Kernels
  - Util Kernels
  - Vectorized Types
- Multimodal Module
- Plugins Module
- Profiling Module
- Runtime Module
  - Audio Utils
  - Decoder Registry
  - Decoding Inference Context
  - Decoding Strategy
  - Deepstack Binding
  - Deployment Config
  - EAGLE Decoder
  - EAGLE Draft Engine Runner
  - Embedding Preprocessor
  - Engine Executor
  - External Weight Manager
  - Hybrid Cache Manager
  - Image Utils
  - Inference Dims
  - Inference Phase
  - KV Cache Manager
  - LLM Engine Config
  - LLM Engine Runner
  - LLM Inference Runtime
  - LLM Runtime Utils
  - Lora Manager
  - Mamba Cache Manager
  - Mtp Decoder
  - Pipeline Io
  - Qwen3 Omni Tts Runtime
  - Registry Builder
  - Rope Cache
  - Shared Resources
  - Spec Decode Utils
  - Step Preparer
  - Streaming
  - System Prompt KV Cache
  - Tensor Map
  - Tensor Registry
  - Vanilla Decoder
- Sampler Module
- Tokenizer Module