Build from Source

Building from source is mostly intended for developers who wish to modify, customize, and contribute to TensorRT LLM. If you only need to run TensorRT LLM, use the Installation Guide instead.

Prerequisites

Use Docker to build and run TensorRT LLM. Instructions for setting up an environment that can run Docker containers on the NVIDIA platform can be found here.

TensorRT LLM uses Git LFS (git-lfs), which must be installed in advance:

apt-get update && apt-get -y install git git-lfs
git lfs install
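Before cloning, it can help to confirm that the required commands are on your PATH. The function below is a minimal sketch and is not part of TensorRT LLM; the name check_prereqs is hypothetical. It relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not part of TensorRT LLM): report missing prerequisite
# commands on stderr and return non-zero if any of them is absent.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs git git-lfs docker` prints one line for each command that still needs to be installed.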

Step 1: Clone the Repository

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull
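If `git lfs pull` is skipped or fails, LFS-tracked files remain small text stubs whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below detects such stubs. The function names are hypothetical, and it assumes the current pointer format and a Git LFS version that supports `git lfs ls-files --name-only`.

```shell
# Git LFS pointer files start with this exact line (current pointer spec).
LFS_POINTER_HEADER='version https://git-lfs.github.com/spec/v1'

# Hypothetical helper: succeed if the given file is still an LFS pointer stub.
is_lfs_pointer() {
  [ -f "$1" ] && head -n 1 "$1" | grep -qxF "$LFS_POINTER_HEADER"
}

# Hypothetical helper: list LFS-tracked files in the current repository
# that have not been replaced by their real content yet.
find_unpulled_lfs_files() {
  git lfs ls-files --name-only | while IFS= read -r f; do
    if is_lfs_pointer "$f"; then
      echo "$f"
    fi
  done
}
```

Running `find_unpulled_lfs_files` from the repository root should print nothing once `git lfs pull` has completed.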

Step 2: Pull the Development Container

Pull the pre-built TensorRT LLM devel container from NGC. The command below uses the 1.3.0rc13 tag; replace it with the version you want. Browse the available tags on NGC to find the latest release.

docker pull nvcr.io/nvidia/tensorrt-llm/devel:1.3.0rc13
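One way to pick a tag is to match the version of your checkout. The sketch below reads the version string from tensorrt_llm/version.py and builds the image reference. The helper name is hypothetical, and it assumes the file contains a line of the form `__version__ = "..."` (double quotes) and that NGC publishes a devel tag with exactly that version string, which may not hold for every commit.

```shell
# Hypothetical helper (not part of TensorRT LLM): derive a devel image
# reference from the checked-out source. Assumes a line such as
#   __version__ = "1.3.0rc13"
# in tensorrt_llm/version.py and a matching tag on NGC.
devel_image_for_checkout() {
  local version_file="${1:-tensorrt_llm/version.py}" version
  version="$(sed -n 's/^__version__ *= *"\([^"]*\)".*/\1/p' "$version_file" | head -n 1)"
  if [ -z "$version" ]; then
    echo "could not read __version__ from $version_file" >&2
    return 1
  fi
  echo "nvcr.io/nvidia/tensorrt-llm/devel:${version}"
}
```

From the repository root, `docker pull "$(devel_image_for_checkout)"` would then pull the tag matching the source tree, if such a tag exists.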

Step 3: Start the Container

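From the repository root, start a development container with the source tree mounted into it. Replace <path_to_tensorrt_llm_on_host> with the absolute path of your clone and <path_to_tensorrt_llm_in_container> with the path at which it should appear inside the container.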

docker run --rm -it \
        --ipc=host \
        --ulimit memlock=-1 --ulimit stack=67108864 \
        --gpus=all \
        --volume <path_to_tensorrt_llm_on_host>:<path_to_tensorrt_llm_in_container> \
        --workdir <path_to_tensorrt_llm_in_container> \
        nvcr.io/nvidia/tensorrt-llm/devel:1.3.0rc13
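The placeholders can also be filled in automatically. The function below is a minimal sketch that runs the same `docker run` command with the current directory as the host path. The function name, the DRY_RUN switch, and the container path /code/tensorrt_llm are all hypothetical choices for illustration.

```shell
# Hypothetical helper (not part of TensorRT LLM): run the Step 3 command with
# the placeholders filled in. Assumes it is called from the repository root;
# /code/tensorrt_llm is an arbitrary mount point chosen for illustration.
# Set DRY_RUN=1 to print the command instead of executing it.
run_devel_container() {
  local image="${1:-nvcr.io/nvidia/tensorrt-llm/devel:1.3.0rc13}"
  local host_dir container_dir="/code/tensorrt_llm"
  host_dir="$(pwd)"
  set -- docker run --rm -it \
    --ipc=host \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus=all \
    --volume "${host_dir}:${container_dir}" \
    --workdir "${container_dir}" \
    "$image"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Mounting the checkout means edits made on the host are visible inside the container immediately, and build outputs written inside it persist after the `--rm` container exits.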

Step 4: Build TensorRT LLM

Once inside the container, build TensorRT LLM from source using scripts/build_wheel.py. Run python3 ./scripts/build_wheel.py --help for the full list of options.

Typical development build

Build the C++ code, skip wheel packaging, and symlink the built libraries instead of copying them, so that a rebuild takes effect without reinstalling. Then install in editable mode for Python development.

python3 scripts/build_wheel.py --use_ccache -a "90-real" --skip_building_wheel --linking_install_binary
pip install -e .

Key flags used above:

Flag / command                Purpose
--use_ccache                  Use ccache for faster incremental rebuilds
-a "90-real"                  Build only for one GPU architecture (90 is Hopper), which reduces compile time significantly. See Hardware for values.
--skip_building_wheel         Skip .whl packaging, which is only needed for distribution, not development
--linking_install_binary      Symlink built libraries instead of copying them
pip install -e .              Editable install, so Python changes take effect without reinstalling
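To choose the -a value for the GPUs actually installed, you can ask the driver for their compute capabilities and convert "9.0" into "90-real". The helpers below are a sketch with hypothetical names. They assume an nvidia-smi recent enough to support the `compute_cap` query field, and that build_wheel.py accepts several architectures as a semicolon-separated list (as CMake architecture lists do); check `--help` to confirm.

```shell
# Hypothetical helper (not part of TensorRT LLM): read compute capabilities
# such as "9.0" (one per line) and print them in the "90-real" form,
# de-duplicated and joined with ";".
caps_to_archs() {
  tr -d '. \r' | sed '/^$/d' | sort -u | sed 's/$/-real/' | paste -sd ';' -
}

# Query the GPUs on this machine (requires the compute_cap query field).
local_cuda_archs() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_archs
}
```

On a Hopper-only machine this yields "90-real", so the typical build could be written as `python3 scripts/build_wheel.py --use_ccache -a "$(local_cuda_archs)" --skip_building_wheel --linking_install_binary`.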

Other common options

Flag                          Purpose
--clean                       Clean the build directory before building
--build_type RelWithDebInfo   Build with debug info (default: Release)
-j <N>                        Number of parallel compile jobs (default: number of available CPUs)
--fast_build                  Skip compiling some kernels to speed up compilation; for development only
--cpp_only                    Build only the C++ runtime library, without Python bindings
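Using every CPU for -j can exhaust memory on machines with many cores and little RAM. The sketch below shows one hypothetical heuristic that caps the job count by available memory. The function names and the default of 4 GB per job are assumptions to tune for your machine, not a project recommendation, and local_jobs reads /proc/meminfo, so it is Linux-only.

```shell
# Hypothetical heuristic (not part of TensorRT LLM): cap -j so that each
# compile job gets at least gb_per_job gigabytes of RAM (default 4, assumed).
pick_jobs() {
  local cpus="$1" mem_gb="$2" gb_per_job="${3:-4}" by_mem
  by_mem=$(( mem_gb / gb_per_job ))
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi
  if [ "$by_mem" -lt "$cpus" ]; then
    echo "$by_mem"
  else
    echo "$cpus"
  fi
}

# Linux-only: read the CPU count and total memory of the current machine.
local_jobs() {
  local cpus mem_kb
  cpus="$(getconf _NPROCESSORS_ONLN)"
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  pick_jobs "$cpus" $(( mem_kb / 1024 / 1024 )) "${1:-4}"
}
```

The result can be passed as `-j "$(local_jobs)"`; on a machine with plenty of memory it simply falls back to the CPU count.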

Python-only build (no C++ compilation)

If you only need to modify Python code, you can skip C++ compilation entirely by reusing precompiled binaries:

TRTLLM_USE_PRECOMPILED=1 pip install -e .

This downloads a precompiled wheel matching the version in tensorrt_llm/version.py and extracts its compiled libraries into your working directory. Override the version with TRTLLM_USE_PRECOMPILED=x.y.z or specify a custom URL/path with TRTLLM_PRECOMPILED_LOCATION.
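The three variants described above can be wrapped in one command. The sketch below uses only the two documented environment variables; the function name, its argument convention, and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical wrapper (not part of TensorRT LLM) around the documented
# variables. Argument: empty (use tensorrt_llm/version.py), a version such as
# 1.3.0rc13, or a wheel URL or local .whl path.
# Set DRY_RUN=1 to print the command instead of executing it.
precompiled_install() {
  local src="${1:-}"
  case "$src" in
    "")
      set -- env TRTLLM_USE_PRECOMPILED=1 pip install -e . ;;
    http://*|https://*|*.whl)
      set -- env "TRTLLM_PRECOMPILED_LOCATION=$src" pip install -e . ;;
    *)
      set -- env "TRTLLM_USE_PRECOMPILED=$src" pip install -e . ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `precompiled_install 1.3.0rc13` installs against that release's binaries, while `precompiled_install` with no argument uses the version in tensorrt_llm/version.py.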