TensorRT Edge-LLM

Runtime Module

API documentation for the runtime module.

EAGLE Draft Engine Runner
Image Utils
Linear KV Cache
LLM Engine Runner
LLM Inference Runtime
LLM Inference Spec Decode Runtime
LLM Runtime Utils

Copyright © 2025, Nvidia.

Last updated on January 05, 2026.

This page is generated by TensorRT-Edge-LLM commit f627c5b.