Installation#

For the NVIDIA DRIVE platform, please refer to the documentation shipped with the DriveOS release.

TensorRT Edge-LLM has two separate components that need to be installed on different systems:

  1. Python Export Pipeline (runs on x86 host with GPU)

  2. C++ Runtime (builds and runs on Edge devices)


Part 1: Python Export Pipeline (x86 Host with GPU)#

The Python export pipeline converts and quantizes models. This must run on an x86 Linux system with an NVIDIA GPU.

System Requirements#

  • Platform: x86-64 Linux system

  • Recommended OS: Ubuntu 22.04, 24.04

  • GPU: NVIDIA GPU (for model quantization)

  • CUDA: 12.x or 13.x

  • Python: 3.10+
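
You can quickly confirm that the host meets these requirements before installing, for example:

# Check architecture and OS
uname -m            # Should print x86_64
lsb_release -d      # Should show Ubuntu 22.04 or 24.04

# Check Python version
python3 --version   # Should be 3.10 or newer

# Check GPU, driver, and the CUDA version reported by the driver
nvidia-smi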

Installing#

1. Clone Repository

git clone https://github.com/NVIDIA/TensorRT-Edge-LLM.git
cd TensorRT-Edge-LLM
git submodule update --init --recursive

2. Install Python Package

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install package with all dependencies
pip3 install .

This installs all required Python dependencies including:

  • PyTorch

  • Transformers

  • NVIDIA Model Optimizer

  • ONNX

  • And all other required dependencies

Note: For specific version requirements, please refer to requirements.txt and pyproject.toml in the repository root.

3. Verify Installation

# Test export tools
tensorrt-edgellm-export-llm --help
tensorrt-edgellm-quantize-llm --help

You’re done with export pipeline setup! You can now export and quantize models. The ONNX files will be transferred to the Edge device for runtime deployment.
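
As a sketch of what this step can look like, the commands below use illustrative model paths and flag names; they are assumptions rather than the tools' documented interface, so check each tool's --help output for the actual arguments:

# Illustrative only: flag names and paths are assumptions; see --help for the real options
tensorrt-edgellm-quantize-llm --model <model-dir> --output quantized/
tensorrt-edgellm-export-llm --model quantized/ --output onnx/

# Copy the exported ONNX files to the Edge device (host name and paths are examples)
scp -r onnx/ user@edge-device:/path/to/models/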


Part 2: C++ Runtime (Edge Device)#

The C++ runtime builds and executes models on the target Edge device. This must be built on or for the target platform.

System Requirements#

Target Platform:

  • NVIDIA Jetson Thor

  • JetPack 7.1

  • CUDA 13.x (included in JetPack)

  • TensorRT 10.x+ (included in JetPack)

Build Instructions#

1. Install System Dependencies (on Edge device)

sudo apt update
sudo apt install -y \
    cmake \
    build-essential \
    git

2. Verify CUDA and TensorRT Installation

After JetPack is installed, TensorRT should be located under /usr.

# Check CUDA version
nvcc --version  # Should show CUDA 13.x

# Check TensorRT version
dpkg -l | grep tensorrt  # Should show TensorRT 10.x+

3. Clone Repository (on Edge device)

git clone https://github.com/NVIDIA/TensorRT-Edge-LLM.git
cd TensorRT-Edge-LLM
git submodule update --init --recursive

4. Configure Build

mkdir build
cd build

# For Edge platforms, a cross-compilation toolchain file and an embedded target are required
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/usr \
    -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
    -DEMBEDDED_TARGET=jetson-thor

# If you are developing directly on x86 GPUs (SM 80, 86, 89, 120), no toolchain or embedded target is needed.

cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/path/to/TensorRT \
    -DCUDA_VERSION=13.0

CMake Options:

| Option | Description | Default |
| --- | --- | --- |
| TRT_PACKAGE_DIR | Path to TensorRT installation | Required |
| CMAKE_TOOLCHAIN_FILE | Required for Edge devices: use cmake/aarch64_linux_toolchain.cmake for Edge device builds. Not needed for GPU builds. | N/A |
| EMBEDDED_TARGET | Required for Edge devices: target platform (jetson-thor). Not needed for GPU builds. | N/A |
| CUDA_VERSION | CUDA version (such as 13.0). Important for matching the target platform. | 13.0 |
| BUILD_UNIT_TESTS | Build unit tests | OFF |

For supported GPU architectures and compute capabilities, see Supported Models - Platform Compatibility.
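
As one illustration of these options, a GPU development configure line with unit tests enabled might look like the following (the TensorRT path is a placeholder):

# Configure a GPU development build with unit tests enabled
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/path/to/TensorRT \
    -DBUILD_UNIT_TESTS=ON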

5. Build Project

make -j$(nproc)

Build time: ~1-2 minutes depending on hardware.

6. Verify Build

# Test C++ examples
./examples/llm/llm_build --help
./examples/llm/llm_inference --help

You’re done with C++ runtime setup! You can now build engines and run inference on the Edge device.
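
A hedged sketch of running those binaries follows; the flag names and file paths are illustrative assumptions, so use --help to see the actual arguments:

# Illustrative only: arguments are assumptions; see --help for the real interface
./examples/llm/llm_build --onnx /path/to/model.onnx --engine /path/to/model.engine
./examples/llm/llm_inference --engine /path/to/model.engine --prompt "Hello"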


Complete Workflow Summary#

On x86 Host (Export Pipeline):

  1. Install Python package

  2. Export and quantize models

  3. Transfer ONNX files to Edge device

On Edge Device (C++ Runtime):

  1. Build C++ runtime

  2. Build TensorRT engines from ONNX files

  3. Run inference

Refer to the Quick Start Guide for a complete end-to-end example.


Troubleshooting#

Common Installation Issues#

Issue: Python package import errors

Solution: Ensure virtual environment is activated and package is installed:

python3 -m venv venv
source venv/bin/activate
pip3 install .
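
You can also confirm the package is visible to the active interpreter; the name filter below is a guess, so check pyproject.toml for the actual distribution name:

# List installed packages and filter for the project (name filter is a guess)
pip3 list | grep -i edgellm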

Issue: nvcc: command not found

Solution: Ensure JetPack 7.1 is properly installed with CUDA support:

# Verify CUDA installation
nvcc --version
# Should show CUDA 13.x

Issue: TensorRT not found during CMake

Solution: Specify the TensorRT package directory. It must contain lib and include subdirectories; the build looks there for the nvinfer library and headers:

cmake .. \
    -DTRT_PACKAGE_DIR=/usr/local/TensorRT-10.x.x \
    -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
    -DEMBEDDED_TARGET=jetson-thor

Issue: Thread or resource exhaustion errors during the C++ build

Solution: Reduce the number of parallel jobs, or fall back to a sequential build:

make -j2  # Fewer parallel jobs than make -j$(nproc)
make      # Or build sequentially

Getting Help#

  • Documentation: Check the docs/source/developer_guide directory

  • Issues: Report bugs on GitHub Issues

  • Discussions: Ask questions on GitHub Discussions

  • Community: Join the NVIDIA Developer Forums

Uninstalling#

Python Export Pipeline (x86 Host):

  • Deactivate and remove virtual environment: deactivate && rm -rf venv

  • Remove repository (optional): rm -rf TensorRT-Edge-LLM

C++ Runtime (Edge Device):

  • Remove build directory: rm -rf build

  • Remove repository (optional): rm -rf TensorRT-Edge-LLM


Next Steps#

Now that you have TensorRT Edge-LLM installed, continue to:

  1. Overview: Platform overview, supported features, and key components

  2. Quick Start Guide: Get up and running in 15 minutes

  3. Supported Models: Learn about supported models and how to prepare them

  4. Examples: Explore example applications and use cases


For questions or issues, visit our TensorRT Edge-LLM GitHub repository.