Installation#
For the NVIDIA DRIVE platform, please refer to the documentation shipped with the DriveOS release.
TensorRT Edge-LLM has two separate components that need to be installed on different systems:
Python Export Pipeline (runs on x86 host with GPU)
C++ Runtime (builds and runs on Edge devices)
Part 1: Python Export Pipeline (x86 Host with GPU)#
The Python export pipeline converts and quantizes models. This must run on an x86 Linux system with an NVIDIA GPU.
System Requirements#
Platform: x86-64 Linux system
Recommended OS: Ubuntu 22.04, 24.04
GPU: NVIDIA GPU (for model quantization)
CUDA: 12.x or 13.x
Python: 3.10+
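Before installing, a quick check (assuming the NVIDIA driver and CUDA toolkit are already present on the host) can confirm these requirements are met:
# GPU and driver visible to the system
nvidia-smi
# CUDA toolkit version (should report 12.x or 13.x)
nvcc --version
# Python version (should be 3.10 or newer)
python3 --version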
Installing#
1. Clone Repository
git clone https://github.com/NVIDIA/TensorRT-Edge-LLM.git
cd TensorRT-Edge-LLM
git submodule update --init --recursive
2. Install Python Package
# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate
# Install package with all dependencies
pip3 install .
This installs all required Python dependencies including:
PyTorch
Transformers
NVIDIA Model Optimizer
ONNX
And all other required dependencies
Note: For specific version requirements, please refer to
requirements.txt and pyproject.toml in the repository root.
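To see which dependency versions were resolved in your environment, you can query pip directly; torch, transformers, and onnx are the PyPI names of the dependencies listed above (other package names may vary by release):
# Show installed versions of key dependencies
pip3 show torch transformers onnx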
3. Verify Installation
# Test export tools
tensorrt-edgellm-export-llm --help
tensorrt-edgellm-quantize-llm --help
You’re done with export pipeline setup! You can now export and quantize models. The ONNX files will be transferred to the Edge device for runtime deployment.
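For example, a typical transfer to the Edge device might look like the following; the user, hostname, and paths are placeholders to adapt to your setup:
# Copy the exported ONNX model directory to the Edge device (illustrative paths)
scp -r ./exported_model user@jetson-thor.local:/home/user/models/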
Part 2: C++ Runtime (Edge Device)#
The C++ runtime builds and executes models on the target Edge device. This must be built on or for the target platform.
System Requirements#
Target Platform:
NVIDIA Jetson Thor
JetPack 7.1
CUDA 13.x (included in JetPack)
TensorRT 10.x+ (included in JetPack)
Build Instructions#
1. Install System Dependencies (on Edge device)
sudo apt update
sudo apt install -y \
cmake \
build-essential \
git
2. Verify CUDA and TensorRT Installation
After JetPack is installed, TensorRT should be located under /usr.
# Check CUDA version
nvcc --version # Should show CUDA 13.x
# Check TensorRT version
dpkg -l | grep tensorrt # Should show TensorRT 10.x+
3. Clone Repository (on Edge device)
git clone https://github.com/NVIDIA/TensorRT-Edge-LLM.git
cd TensorRT-Edge-LLM
git submodule update --init --recursive
4. Configure Build
mkdir build
cd build
# For Edge platforms, specify the toolchain file and the embedded target
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DTRT_PACKAGE_DIR=/usr \
-DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
-DEMBEDDED_TARGET=jetson-thor
# If you are just developing on GPUs (SM80, 86, 89, 120), no toolchain or embedded target is needed
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DTRT_PACKAGE_DIR=/path/to/TensorRT \
-DCUDA_VERSION=13.0
CMake Options:

| Option | Description | Default |
|---|---|---|
| TRT_PACKAGE_DIR | Path to TensorRT installation | Required |
| CMAKE_TOOLCHAIN_FILE | Required for Edge devices: use cmake/aarch64_linux_toolchain.cmake | N/A |
| EMBEDDED_TARGET | Required for Edge devices: target platform (jetson-thor) | N/A |
| CUDA_VERSION | CUDA version (such as 13.0). Important for matching the target platform. | 13.0 |
|  | Build unit tests | OFF |
For supported GPU architectures and compute capabilities, see Supported Models - Platform Compatibility
5. Build Project
make -j$(nproc)
Build time: ~1-2 minutes depending on hardware.
6. Verify Build
# Test C++ examples
./examples/llm/llm_build --help
./examples/llm/llm_inference --help
You’re done with C++ runtime setup! You can now build engines and run inference on the Edge device.
Complete Workflow Summary#
On x86 Host (Export Pipeline):
Install Python package
Export and quantize models
Transfer ONNX files to Edge device
On Edge Device (C++ Runtime):
Build C++ runtime
Build TensorRT engines from ONNX files
Run inference
Refer to the Quick Start Guide for a complete end-to-end example.
Troubleshooting#
Common Installation Issues#
Issue: Python package import errors
Solution: Ensure the virtual environment is activated and the package is installed:
python3 -m venv venv
source venv/bin/activate
pip3 install .
Issue: nvcc: command not found
Solution: Ensure JetPack 7.1 is properly installed with CUDA support:
# Verify CUDA installation
nvcc --version
# Should show CUDA 13.x
Issue: TensorRT not found during CMake
Solution: Specify the TensorRT package directory. This directory should contain lib and include subdirectories; the build looks there for the nvinfer library and headers:
cmake .. \
-DTRT_PACKAGE_DIR=/usr/local/TensorRT-10.x.x \
-DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
-DEMBEDDED_TARGET=jetson-thor
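If you are unsure where TensorRT is installed, searching for the nvinfer library and header (on JetPack installs the prefix is typically /usr) shows which directory to pass as TRT_PACKAGE_DIR:
# Locate the nvinfer library and header to find the TensorRT install prefix
find /usr -name "libnvinfer.so*" 2>/dev/null
find /usr -name "NvInfer.h" 2>/dev/null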
Issue: Build failures when running many parallel jobs during the C++ build
Solution: Reduce the number of parallel jobs, or build sequentially:
make -j2  # Fewer parallel jobs than make -j$(nproc); plain 'make' builds sequentially
Getting Help#
Documentation: Check the docs/source/developer_guide directory
Issues: Report bugs on GitHub Issues
Discussions: Ask questions on GitHub Discussions
Community: Join the NVIDIA Developer Forums
Uninstalling#
Python Export Pipeline (x86 Host):
Deactivate and remove the virtual environment:
deactivate && rm -rf venv
Remove the repository (optional):
rm -rf TensorRT-Edge-LLM
C++ Runtime (Edge Device):
Remove the build directory:
rm -rf build
Remove the repository (optional):
rm -rf TensorRT-Edge-LLM
Next Steps#
Now that you have TensorRT Edge-LLM installed, continue to:
Overview: Platform overview, supported features, and key components
Quick Start Guide: Get up and running in 15 minutes
Supported Models: Learn about supported models and how to prepare them
Examples: Explore example applications and use cases
For questions or issues, visit our TensorRT Edge-LLM GitHub repository.