TensorRT Edge-LLM

Kernels Module

API documentation for the kernels module.

  • Apply Rope Write KV
  • Batch Evict Kernels
  • Causal Conv1d
  • Common
  • Context FMHA Runner
  • Conversion
  • Cute Dsl FMHA Runner
  • Decoder XQA Runner
  • Dequant
  • Dequantize
  • EAGLE Accept Kernels
  • EAGLE Util Kernels
  • Embedding Kernels
  • FMHA Params V2
  • Image Util Kernels
  • Initialize Cos Sin Cache
  • Int4 Groupwise GEMM
  • Kernel
  • Kernel Selector
  • KV Cache Utils Kernels
  • Marlin
  • Marlin Dtypes
  • Marlin Mma
  • Marlin Template
  • Moe Activation Kernels
  • Moe Align Sum Kernels
  • Moe Marlin
  • Moe Marlin Indices Kernels
  • Moe Topk Softmax Kernels
  • Selective State Update
  • Talker Mlp Kernels
  • Util Kernels
  • Vectorized Types

Copyright © 2025, Nvidia.

Last updated on March 14, 2026.

This page is generated by TensorRT-Edge-LLM commit d71c009.