TensorRT Edge-LLM

Runtime Module

API documentation for the runtime module.

  • EAGLE Draft Engine Runner
  • Image Utils
  • Linear KV Cache
  • LLM Engine Runner
  • LLM Inference Runtime
  • LLM Inference Spec Decode Runtime
  • LLM Runtime Utils

Copyright © 2025, Nvidia.

Last updated on January 05, 2026.

This page is generated by TensorRT-Edge-LLM commit f627c5b.