Building from Source Code on Windows
Note
This section is for advanced users. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.
Prerequisites
Install prerequisites listed in our Installing on Windows document.
Install CMake (version 3.27.7 is recommended) and select the option to add it to the system path.
Download and install Visual Studio 2022.
Download and unzip TensorRT 10.7.0.23.
Building a TensorRT-LLM Docker Image
Docker Desktop
Install Docker Desktop on Windows.
Set the following configurations:
Right-click the Docker icon in the Windows system tray (bottom right of your taskbar) and select Switch to Windows containers….
In the Docker Desktop settings on the General tab, uncheck Use the WSL 2 based image.
On the Docker Engine tab, set your configuration file to:
{
"experimental": true
}
Note
After building, copy the files out of your container. docker cp is not supported on Windows for Hyper-V based images. Unless you are using WSL 2 based images, mount a folder (for example, trt-llm-build) to your container when you run it, so that files can be moved between the container and the host system.
Acquire an Image
The Docker container will be hosted for public download in a future release. At this time, it must be built manually. From the TensorRT-LLM\windows\ folder, run the build command:
docker build -f .\docker\Dockerfile -t tensorrt-llm-windows-build:latest .
Your image is now ready for use.
Run the Container
Run the container in interactive mode with your build folder mounted. Specify a memory limit with the -m flag. By default, the limit is 2 GB, which is not sufficient to build TensorRT-LLM.
docker run -it -m 12g -v .\trt-llm-build:C:\workspace\trt-llm-build tensorrt-llm-windows-build:latest
Build and Extract Files
Clone and set up the TensorRT-LLM repository within the container.
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
Build TensorRT-LLM. This command generates build\tensorrt_llm-*.whl.
python .\scripts\build_wheel.py -a "89-real" --trt_root C:\workspace\TensorRT-10.7.0.23\
Copy or move build\tensorrt_llm-*.whl into your mounted folder so it can be accessed on your host machine. If you intend to use the C++ runtime, you'll also need to gather various DLLs from the build into your mounted folder. For more information, refer to C++ Runtime Usage.
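The copy step above can be sketched in Python. This is a hypothetical helper, not part of the TensorRT-LLM tooling; the wheel filename pattern comes from this guide, and the paths in the usage note are the defaults used earlier (adjust them to your setup):

```python
import glob
import os
import shutil


def copy_newest_wheel(build_dir, dest_dir):
    """Copy the most recently built tensorrt_llm wheel from build_dir
    into dest_dir and return the destination path.

    Raises FileNotFoundError if no wheel matches the expected pattern.
    """
    wheels = glob.glob(os.path.join(build_dir, "tensorrt_llm-*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no tensorrt_llm wheel found in {build_dir}")
    # If several builds exist, pick the newest by modification time.
    newest = max(wheels, key=os.path.getmtime)
    os.makedirs(dest_dir, exist_ok=True)
    return shutil.copy2(newest, dest_dir)
```

Inside the container from this guide, a call might look like `copy_newest_wheel(r"C:\workspace\TensorRT-LLM\build", r"C:\workspace\trt-llm-build")`.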
Building TensorRT-LLM on Bare Metal
Prerequisites
Install all prerequisites (git, python, CUDA) listed in our Installing on Windows document.
Install Nsight NVTX. TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.6.3 installer. To install these assets, download the CUDA 11.8 Toolkit.
During installation, select Advanced installation.
Nsight NVTX is located in the CUDA drop-down.
Deselect all packages, and then select Nsight NVTX.
Install the dependencies one of two ways:
Run the setup_build_env.ps1 script, which installs CMake, Microsoft Visual Studio Build Tools, and TensorRT automatically with default settings. Run PowerShell as Administrator to use the script.
./setup_build_env.ps1 -TRTPath <TRT-containing-folder> [-skipCMake] [-skipVSBuildTools] [-skipTRT]
Close and reopen PowerShell after running the script so that Path changes take effect.
Supply a directory that already exists to contain TensorRT to -TRTPath. For example, -TRTPath ~/inference may be valid, but -TRTPath ~/inference/TensorRT will not be valid if TensorRT does not exist. -TRTPath isn't required if -skipTRT is supplied.
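The -TRTPath rule above can be expressed as a small check. This is a hypothetical helper for illustration only, not part of setup_build_env.ps1:

```python
import os


def trt_path_is_valid(trt_path, skip_trt=False):
    """Mirror the -TRTPath rule from the setup script: the directory you
    supply must already exist (the script places TensorRT inside it).
    When -skipTRT is supplied, -TRTPath is not required at all.
    """
    if skip_trt:
        return True
    return os.path.isdir(os.path.expanduser(trt_path))
```

So `~/inference` passes when that folder exists, while `~/inference/TensorRT` fails unless a TensorRT folder is already there, matching the behavior described above.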
Install the dependencies one at a time.
Install CMake (version 3.27.7 is recommended) and select the option to add it to the system path.
Download and install Visual Studio 2022. When prompted to select more Workloads, check Desktop development with C++.
Download and unzip TensorRT 10.7.0.23. Move the folder to a location you can reference later, such as %USERPROFILE%\inference\TensorRT.
Add the libraries for TensorRT to your system's Path environment variable. Your Path should include a line like this:
%USERPROFILE%\inference\TensorRT\lib
Close and re-open any existing PowerShell or Git Bash windows so they pick up the new Path.
Remove any existing tensorrt wheels first by executing:
pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
To install the TensorRT core libraries, run PowerShell and use pip to install the Python wheel.
pip install %USERPROFILE%\inference\TensorRT\python\tensorrt-*.whl
Verify that your TensorRT installation is working properly.
python -c "import tensorrt as trt; print(trt.__version__)"
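Besides importing tensorrt, you may also want to confirm that the TensorRT lib folder from the Path step above actually landed on your search path. A minimal sketch (hypothetical helper, not part of TensorRT):

```python
import os


def dir_on_path(directory, path=None):
    """Return True if `directory` appears as an entry of the PATH-style
    string `path` (defaults to the current environment's PATH).
    Entries are compared after normalization and case-folding, matching
    Windows' case-insensitive paths.
    """
    raw = path if path is not None else os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in raw.split(os.pathsep)
        if entry
    )
```

For example, `dir_on_path(os.path.expandvars(r"%USERPROFILE%\inference\TensorRT\lib"))` should return True in a freshly opened shell after the Path edit.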
Steps
Launch a 64-bit Developer PowerShell. From your usual PowerShell terminal, run one of the following two commands.
If you installed Visual Studio Build Tools (that is, used the setup_build_env.ps1 script):
& 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
If you installed Visual Studio Community (e.g. via manual GUI setup):
& 'C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
In PowerShell, from the TensorRT-LLM root folder, run:
python .\scripts\build_wheel.py -a "89-real" --trt_root <path_to_trt_root>
The -a flag specifies the device architecture. "89-real" supports GeForce 40-series cards.
The flag -D "ENABLE_MULTI_DEVICE=0", while not specified here, is implied on Windows. Multi-device inference is supported on Linux, but not on Windows.
This command generates build\tensorrt_llm-*.whl.
Linking with the TensorRT-LLM C++ Runtime
Note
This section is for advanced users. Skip this section if you do not intend to use the TensorRT-LLM C++ runtime directly. You must build from source to use the C++ runtime.
Building from source creates libraries that can be used if you wish to directly link against the C++ runtime for TensorRT-LLM. These libraries are also required if you wish to run C++ unit tests and some benchmarks.
Building from source produces the following library files.
tensorrt_llm libraries, located in cpp\build\tensorrt_llm:
tensorrt_llm.dll - shared library
tensorrt_llm.exp - export file
tensorrt_llm.lib - stub for linking to tensorrt_llm.dll
Dependency libraries (these get copied to tensorrt_llm\libs\):
nvinfer_plugin_tensorrt_llm libraries, located in cpp\build\tensorrt_llm\plugins\:
nvinfer_plugin_tensorrt_llm.dll
nvinfer_plugin_tensorrt_llm.exp
nvinfer_plugin_tensorrt_llm.lib
th_common libraries, located in cpp\build\tensorrt_llm\thop\:
th_common.dll
th_common.exp
th_common.lib
The locations of these DLLs, in addition to some torch DLLs and TensorRT DLLs, must be added to the Windows Path in order to use the TensorRT-LLM C++ runtime. Append the locations of these libraries to your Path. When complete, your Path should include lines similar to these:
%USERPROFILE%\inference\TensorRT\lib
%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib
Your Path additions may differ, particularly if you used the Docker method and copied all the relevant DLLs into a single folder.
Again, close and re-open any existing PowerShell or Git Bash windows so they pick up the new Path.
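When loading these DLLs from Python rather than from a natively linked application, an alternative to editing Path by hand is to register the directories at process start. A minimal sketch, assuming the illustrative directory list below matches your layout (os.add_dll_directory exists only on Windows, so the call is guarded):

```python
import os


def register_dll_dirs(dll_dirs):
    """Expand and filter the candidate DLL directories, then register
    the ones that exist for DLL resolution on Windows. On other
    platforms this is a no-op filter. Returns the directories found.
    """
    found = [
        d for d in (os.path.expandvars(p) for p in dll_dirs)
        if os.path.isdir(d)
    ]
    if os.name == "nt":  # os.add_dll_directory is Windows-only
        for d in found:
            os.add_dll_directory(d)
    return found


# Illustrative list mirroring the Path entries above; adjust to your setup.
DLL_DIRS = [
    r"%USERPROFILE%\inference\TensorRT\lib",
    r"%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm",
]
```

Call `register_dll_dirs(DLL_DIRS)` before importing any module that loads tensorrt_llm.dll. Note that this affects only the current Python process; the Path edits described above remain the documented approach for general use.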