Installation#

For the NVIDIA DRIVE platform, please refer to the documentation shipped with the DriveOS release.

TensorRT Edge-LLM has two separate components that need to be installed on different systems:

  1. Python Export Pipeline (runs on x86 host with GPU)

  2. C++ Runtime (builds and runs on Edge devices)


Part 1: Python Export Pipeline (x86 Host with GPU)#

The Python export pipeline converts and quantizes models. This must run on an x86 Linux system with an NVIDIA GPU.

System Requirements#

  • Platform: x86-64 Linux system

  • Recommended OS: Ubuntu 22.04, 24.04

  • GPU: NVIDIA GPU (for model quantization)

  • CUDA: 12.x or 13.x

  • Python: 3.10+
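
You can quickly confirm that the host meets these requirements before installing, for example:

# Check architecture and OS
uname -m            # Should print x86_64
lsb_release -d      # Should show Ubuntu 22.04 or 24.04

# Check Python version
python3 --version   # Should be 3.10 or newer

# Check GPU, driver, and the CUDA version reported by the driver
nvidia-smi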

Installing#

1. Clone Repository

git clone https://github.com/NVIDIA/TensorRT-Edge-LLM.git
cd TensorRT-Edge-LLM
git submodule update --init --recursive

2. Install Python Package

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install package with all dependencies
pip3 install .

This installs all required Python dependencies including:

  • PyTorch

  • Transformers

  • NVIDIA Model Optimizer

  • ONNX

  • And all other required dependencies

Note: For specific version requirements, please refer to requirements.txt and pyproject.toml in the repository root.

3. Verify Installation

# Test export tools
tensorrt-edgellm-export-llm --help
tensorrt-edgellm-quantize-llm --help

You’re done with export pipeline setup! You can now export and quantize models. The ONNX files will be transferred to the Edge device for runtime deployment.
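
As a sketch of what this step can look like, the commands below use illustrative model paths and flag names; they are assumptions rather than the tools' documented interface, so check each tool's --help output for the actual arguments:

# Illustrative only: flag names and paths are assumptions; see --help for the real options
tensorrt-edgellm-quantize-llm --model <model-dir> --output quantized/
tensorrt-edgellm-export-llm --model quantized/ --output onnx/

# Copy the exported ONNX files to the Edge device (host name and paths are examples)
scp -r onnx/ user@edge-device:/path/to/models/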


Part 2: C++ Runtime (Edge Device)#

The C++ runtime builds and executes models on the target Edge device. This must be built on or for the target platform.

System Requirements#

Target Platform:

  • NVIDIA Jetson Thor

  • JetPack 7.1

  • CUDA 13.x (included in JetPack)

  • TensorRT 10.x+ (included in JetPack)

Build Instructions#

1. Install System Dependencies (on Edge device)

sudo apt update
sudo apt install -y \
    cmake \
    build-essential \
    git

2. Verify CUDA and TensorRT Installation

After JetPack is installed, TensorRT should be located under /usr.

# Check CUDA version
nvcc --version  # Should show CUDA 13.x

# Check TensorRT version
dpkg -l | grep tensorrt  # Should show TensorRT 10.x+

3. Clone Repository (on Edge device)

git clone https://github.com/NVIDIA/TensorRT-Edge-LLM.git
cd TensorRT-Edge-LLM
git submodule update --init --recursive

4. Configure Build

mkdir build
cd build

# For Edge platforms, a cross-compilation toolchain file and an embedded target are required
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/usr \
    -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
    -DEMBEDDED_TARGET=jetson-thor

# If you are developing directly on x86 GPUs (SM 80, 86, 89, 120), no toolchain or embedded target is needed.

cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/path/to/TensorRT \
    -DCUDA_VERSION=13.0

CMake Options:

| Option | Description | Default |
| --- | --- | --- |
| TRT_PACKAGE_DIR | Path to TensorRT installation | Required |
| CMAKE_TOOLCHAIN_FILE | Required for Edge devices: use cmake/aarch64_linux_toolchain.cmake for Edge device builds. Not needed for GPU builds. | N/A |
| EMBEDDED_TARGET | Required for Edge devices: target platform (jetson-thor). Not needed for GPU builds. | N/A |
| CUDA_VERSION | CUDA version (such as 13.0). Important for matching the target platform. | 13.0 |
| BUILD_UNIT_TESTS | Build unit tests | OFF |

For supported GPU architectures and compute capabilities, see Supported Models - Platform Compatibility.
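
As one illustration of these options, a GPU development configure line with unit tests enabled might look like the following (the TensorRT path is a placeholder):

# Configure a GPU development build with unit tests enabled
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/path/to/TensorRT \
    -DBUILD_UNIT_TESTS=ON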

5. Build Project

make -j$(nproc)

Build time: ~1-2 minutes depending on hardware.

6. Verify Build

# Test C++ examples
./examples/llm/llm_build --help
./examples/llm/llm_inference --help

You’re done with C++ runtime setup! You can now build engines and run inference on the Edge device.
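
A hedged sketch of running those binaries follows; the flag names and file paths are illustrative assumptions, so use --help to see the actual arguments:

# Illustrative only: arguments are assumptions; see --help for the real interface
./examples/llm/llm_build --onnx /path/to/model.onnx --engine /path/to/model.engine
./examples/llm/llm_inference --engine /path/to/model.engine --prompt "Hello"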


Complete Workflow Summary#

On x86 Host (Export Pipeline):

  1. Install Python package

  2. Export and quantize models

  3. Transfer ONNX files to Edge device

On Edge Device (C++ Runtime):

  1. Build C++ runtime

  2. Build TensorRT engines from ONNX files

  3. Run inference

Refer to the Quick Start Guide for a complete end-to-end example.


Troubleshooting#

Common Installation Issues#

Issue: Python package import errors

Solution: Ensure virtual environment is activated and package is installed:

python3 -m venv venv
source venv/bin/activate
pip3 install .
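
You can also confirm the package is visible to the active interpreter; the name filter below is a guess, so check pyproject.toml for the actual distribution name:

# List installed packages and filter for the project (name filter is a guess)
pip3 list | grep -i edgellm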

Issue: nvcc: command not found

Solution: Ensure JetPack 7.1 is properly installed with CUDA support:

# Verify CUDA installation
nvcc --version
# Should show CUDA 13.x

Issue: TensorRT not found during CMake

Solution: Specify the TensorRT package directory. It must contain lib and include subdirectories; the build looks there for the nvinfer library and headers:

cmake .. \
    -DTRT_PACKAGE_DIR=/usr/local/TensorRT-10.x.x \
    -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
    -DEMBEDDED_TARGET=jetson-thor

Issue: Thread or resource exhaustion errors during the C++ build

Solution: Reduce the number of parallel jobs, or fall back to a sequential build:

make -j2  # Fewer parallel jobs than make -j$(nproc)
make      # Or build sequentially

Getting Help#

  • Documentation: Check the docs/source/developer_guide directory

  • Issues: Report bugs on GitHub Issues

  • Discussions: Ask questions on GitHub Discussions

  • Community: Join the NVIDIA Developer Forums

Uninstalling#

Python Export Pipeline (x86 Host):

  • Deactivate and remove virtual environment: deactivate && rm -rf venv

  • Remove repository (optional): rm -rf TensorRT-Edge-LLM

C++ Runtime (Edge Device):

  • Remove build directory: rm -rf build

  • Remove repository (optional): rm -rf TensorRT-Edge-LLM


Next Steps#

Now that you have TensorRT Edge-LLM installed, continue to:

  1. Overview: Platform overview, supported features, and key components

  2. Quick Start Guide: Get up and running in 15 minutes

  3. Supported Models: Learn about supported models and how to prepare them

  4. Examples: Explore example applications and use cases


For questions or issues, visit our TensorRT Edge-LLM GitHub repository.