C++ API Reference

This section provides reference documentation for the TensorRT Edge-LLM C++ API, organized into the modules listed below; a brief, illustrative usage sketch follows the list.

  • Builder Module
    • Builder
  • Common Module
    • Binding Names
    • Check Macros
    • CUDA Utils
    • File Utils
    • Hash Utils
    • Logger
    • MMAP Reader
    • Safetensors Utils
    • String Utils
    • Tensor
    • TRT Utils
    • Version
  • Kernels Module
    • Apply Rope Write KV
    • Batch Evict Kernels
    • Context FMHA Runner
    • Decoder XQA Runner
    • Dequantize
    • EAGLE Accept Kernels
    • EAGLE Util Kernels
    • Embedding Kernels
    • FMHA Params V2
    • Image Util Kernels
    • Initialize Cos Sin Cache
    • Int4 Groupwise GEMM
    • KV Cache Utils Kernels
    • Util Kernels
    • Vectorized Types
  • Multimodal Module
    • Image Utils
    • Intern ViT Runner
    • Model Types
    • Multimodal Runner
    • Phi4mm ViT Runner
    • Qwen ViT Runner
  • Plugins Module
    • Attention Plugin
    • Int4 Groupwise GEMM Plugin
    • Plugin Utils
  • Profiling Module
    • Metrics
    • Timer
  • Runtime Module
    • EAGLE Draft Engine Runner
    • Image Utils
    • Linear KV Cache
    • LLM Engine Runner
    • LLM Inference Runtime
    • LLM Inference Spec Decode Runtime
    • LLM Runtime Utils
  • Sampler Module
    • Sampling
  • Tokenizer Module
    • Pre Tokenizer
    • Token Encoder
    • Tokenizer
    • Tokenizer Utils
    • Unicode Data
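
The list above is only an index; the linked module pages contain the actual class and function signatures. As a rough orientation, the sketch below illustrates the typical flow those modules cover (tokenize a prompt, run the inference runtime, decode the output). All type and method names in it are hypothetical placeholders, not the real TensorRT Edge-LLM API; consult the Tokenizer and Runtime module pages for the actual interfaces.

```cpp
// Hypothetical sketch only: Tokenizer and LlmInferenceRuntime are illustrative
// stand-ins for the Tokenizer and Runtime modules listed above, not the real
// TensorRT Edge-LLM classes.
#include <iostream>
#include <string>
#include <vector>

struct Tokenizer {
    std::vector<int> encode(const std::string& text) const {
        // Stub: a real tokenizer maps text to token ids.
        return std::vector<int>(text.size() / 4 + 1, 0);
    }
    std::string decode(const std::vector<int>& ids) const {
        // Stub: a real tokenizer maps token ids back to text.
        return "<" + std::to_string(ids.size()) + " tokens decoded>";
    }
};

struct LlmInferenceRuntime {
    std::vector<int> generate(const std::vector<int>& promptIds, int maxNewTokens) const {
        // Stub: a real runtime would execute the TensorRT engine here.
        std::vector<int> out = promptIds;
        for (int i = 0; i < maxNewTokens; ++i) {
            out.push_back(0);
        }
        return out;
    }
};

int main() {
    Tokenizer tokenizer;
    LlmInferenceRuntime runtime;

    // Tokenize the prompt, generate new tokens, and decode the result.
    const std::vector<int> promptIds = tokenizer.encode("Hello, world!");
    const std::vector<int> outputIds = runtime.generate(promptIds, /*maxNewTokens=*/8);

    std::cout << tokenizer.decode(outputIds) << std::endl;
    return 0;
}
```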
