C++ API Reference

This section provides reference documentation for the TensorRT Edge-LLM C++ API, organized into the modules listed below; a brief, illustrative usage sketch follows the list.

  • Builder Module
    • Builder
  • Common Module
    • Binding Names
    • Check Macros
    • CUDA Utils
    • File Utils
    • Hash Utils
    • Logger
    • MMAP Reader
    • Safetensors Utils
    • String Utils
    • Tensor
    • TRT Utils
    • Version
  • Kernels Module
    • Apply Rope Write KV
    • Batch Evict Kernels
    • Context FMHA Runner
    • Decoder XQA Runner
    • Dequantize
    • EAGLE Accept Kernels
    • EAGLE Util Kernels
    • Embedding Kernels
    • FMHA Params V2
    • Image Util Kernels
    • Initialize Cos Sin Cache
    • Int4 Groupwise GEMM
    • KV Cache Utils Kernels
    • Util Kernels
    • Vectorized Types
  • Multimodal Module
    • Image Utils
    • Intern ViT Runner
    • Model Types
    • Multimodal Runner
    • Phi4mm ViT Runner
    • Qwen ViT Runner
  • Plugins Module
    • Attention Plugin
    • Int4 Groupwise GEMM Plugin
    • Plugin Utils
  • Profiling Module
    • Metrics
    • Timer
  • Runtime Module
    • EAGLE Draft Engine Runner
    • Image Utils
    • Linear KV Cache
    • LLM Engine Runner
    • LLM Inference Runtime
    • LLM Inference Spec Decode Runtime
    • LLM Runtime Utils
  • Sampler Module
    • Sampling
  • Tokenizer Module
    • Pre Tokenizer
    • Token Encoder
    • Tokenizer
    • Tokenizer Utils
    • Unicode Data
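
The list above is only an index; the linked module pages contain the actual class and function signatures. As a rough orientation, the sketch below illustrates the typical flow those modules cover (tokenize a prompt, run the inference runtime, decode the output). All type and method names in it are hypothetical placeholders, not the real TensorRT Edge-LLM API; consult the Tokenizer and Runtime module pages for the actual interfaces.

```cpp
// Hypothetical sketch only: Tokenizer and LlmInferenceRuntime are illustrative
// stand-ins for the Tokenizer and Runtime modules listed above, not the real
// TensorRT Edge-LLM classes.
#include <iostream>
#include <string>
#include <vector>

struct Tokenizer {
    std::vector<int> encode(const std::string& text) const {
        // Stub: a real tokenizer maps text to token ids.
        return std::vector<int>(text.size() / 4 + 1, 0);
    }
    std::string decode(const std::vector<int>& ids) const {
        // Stub: a real tokenizer maps token ids back to text.
        return "<" + std::to_string(ids.size()) + " tokens decoded>";
    }
};

struct LlmInferenceRuntime {
    std::vector<int> generate(const std::vector<int>& promptIds, int maxNewTokens) const {
        // Stub: a real runtime would execute the TensorRT engine here.
        std::vector<int> out = promptIds;
        for (int i = 0; i < maxNewTokens; ++i) {
            out.push_back(0);
        }
        return out;
    }
};

int main() {
    Tokenizer tokenizer;
    LlmInferenceRuntime runtime;

    // Tokenize the prompt, generate new tokens, and decode the result.
    const std::vector<int> promptIds = tokenizer.encode("Hello, world!");
    const std::vector<int> outputIds = runtime.generate(promptIds, /*maxNewTokens=*/8);

    std::cout << tokenizer.decode(outputIds) << std::endl;
    return 0;
}
```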
