Building from Source Code on Windows

Note

This section is for advanced users. Skip this section if you plan to use the pre-built TensorRT-LLM release wheel.

Prerequisites

  1. Install prerequisites listed in our Installing on Windows document.

  2. Install CMake (version 3.27.7 is recommended) and select the option to add it to the system Path.

  3. Download and install Visual Studio 2022.

  4. Download and unzip TensorRT 10.1.0.27.

Building a TensorRT-LLM Docker Image

Docker Desktop

  1. Install Docker Desktop on Windows.

  2. Set the following configurations:

    1. Right-click the Docker icon in the Windows system tray (bottom right of your taskbar) and select Switch to Windows containers….

    2. In the Docker Desktop settings on the General tab, uncheck Use the WSL 2 based engine.

    3. On the Docker Engine tab, set your configuration file to:

{
  "experimental": true
}

Note

After building, copy the files out of your container. docker cp is not supported on Windows for Hyper-V based images. Unless you are using WSL 2 based images, mount a folder, for example, trt-llm-build, to your container when you run it to move files between the container and host system.

Acquire an Image

The Docker image will be hosted for public download in a future release. At this time, it must be built manually. From the TensorRT-LLM\windows\ folder, run the build command:

docker build -f .\docker\Dockerfile -t tensorrt-llm-windows-build:latest .

Your image is now ready for use.

Run the Container

Run the container in interactive mode with your build folder mounted. Specify a memory limit with the -m flag. By default, the limit is 2 GB, which is not sufficient to build TensorRT-LLM.

docker run -it -m 12g -v .\trt-llm-build:C:\workspace\trt-llm-build tensorrt-llm-windows-build:latest

Build and Extract Files

  1. Clone and set up the TensorRT-LLM repository within the container.

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive

  2. Build TensorRT-LLM. This command generates build\tensorrt_llm-*.whl.

python .\scripts\build_wheel.py -a "89-real" --trt_root C:\workspace\TensorRT-10.1.0.27\

  3. Copy or move build\tensorrt_llm-*.whl into your mounted folder so it can be accessed on your host machine. If you intend to use the C++ runtime, you’ll also need to gather various DLLs from the build into your mounted folder. For more information, refer to C++ Runtime Usage.
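If you are scripting that copy step, a minimal Python sketch could look like the following. The helper name and directory layout are illustrative, not part of the TensorRT-LLM tooling:

```python
# Illustrative sketch: copy the built wheel and any DLLs (needed for the
# C++ runtime) from the build tree into the mounted folder.
import shutil
from pathlib import Path

def collect_artifacts(build_dir: str, dest_dir: str) -> list:
    """Copy tensorrt_llm wheels and all DLLs from build_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # "**/*.dll" also matches DLLs in the top-level build directory.
    for pattern in ("tensorrt_llm-*.whl", "**/*.dll"):
        for src in Path(build_dir).glob(pattern):
            shutil.copy2(src, dest / src.name)
            copied.append(src.name)
    return sorted(copied)
```

Run this from inside the container, pointing dest_dir at the mounted folder (for example, C:\workspace\trt-llm-build).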

Building TensorRT-LLM on Bare Metal

Prerequisites

  1. Install all prerequisites (git, python, CUDA) listed in our Installing on Windows document.

  2. Install Nsight NVTX. TensorRT-LLM on Windows currently depends on NVTX assets that do not come packaged with the CUDA 12.4.1 installer. To install these assets, download the CUDA 11.8 Toolkit.

    1. During installation, select Advanced installation.

    2. Nsight NVTX is located in the CUDA drop-down.

    3. Deselect all packages, and select Nsight NVTX.

  3. Install the dependencies one of two ways:

    1. Run the setup_build_env.ps1 script, which installs CMake, Microsoft Visual Studio Build Tools, and TensorRT automatically with default settings.

      1. Run PowerShell as Administrator to use the script.

      ./setup_build_env.ps1 -TRTPath <TRT-containing-folder> [-skipCMake] [-skipVSBuildTools] [-skipTRT]

      2. Close and reopen PowerShell after running the script so that Path changes take effect.

      3. Supply a directory that already exists to contain TensorRT to -TRTPath; for example, -TRTPath ~/inference may be valid, but -TRTPath ~/inference/TensorRT will not be valid if TensorRT does not exist. -TRTPath isn’t required if -skipTRT is supplied.
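The -TRTPath rule above amounts to a one-line existence check. The sketch below expresses it in Python; check_trt_path is a hypothetical helper for illustration, not part of setup_build_env.ps1:

```python
# Illustrative check for the -TRTPath rule: the supplied folder must already
# exist as a directory; TensorRT itself is placed inside it by the script,
# so the TensorRT subfolder does not need to exist yet.
from pathlib import Path

def check_trt_path(trt_path: str) -> bool:
    """Return True if trt_path is an existing directory that can hold TensorRT."""
    return Path(trt_path).expanduser().is_dir()
```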

    2. Install the dependencies one at a time.

      1. Install CMake (version 3.27.7 is recommended) and select the option to add it to the system Path.

      2. Download and install Visual Studio 2022. When prompted to select more Workloads, check Desktop development with C++.

      3. Download and unzip TensorRT 10.1.0.27. Move the folder to a location you can reference later, such as %USERPROFILE%\inference\TensorRT.

        1. Add the libraries for TensorRT to your system’s Path environment variable. Your Path should include a line like this:

        %USERPROFILE%\inference\TensorRT\lib
        
        2. Close and re-open any existing PowerShell or Git Bash windows so they pick up the new Path.

        3. Remove existing tensorrt wheels first by executing:

        pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
        pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
        
        4. Install the TensorRT core libraries: run PowerShell and use pip to install the Python wheel.

        pip install %USERPROFILE%\inference\TensorRT\python\tensorrt-*.whl
        
        5. Verify that your TensorRT installation is working properly.

        python -c "import tensorrt as trt; print(trt.__version__)"
        

Steps

  1. Launch a 64-bit Developer PowerShell. From your usual PowerShell terminal, run one of the following two commands.

    1. If you installed Visual Studio Build Tools (that is, used the setup_build_env.ps1 script):

    & 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
    
    2. If you installed Visual Studio Community (for example, via the manual GUI setup):

    & 'C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\Launch-VsDevShell.ps1' -Arch amd64
    
  2. In PowerShell, from the TensorRT-LLM root folder, run:

python .\scripts\build_wheel.py -a "89-real" --trt_root <path_to_trt_root>

The -a flag specifies the device architecture. "89-real" supports GeForce 40-series cards.
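For reference, the value passed to -a is a CUDA compute capability number. The mapping below is an illustrative summary of common values, not an exhaustive or official list:

```python
# Illustrative mapping from CUDA architecture (compute capability) numbers,
# as passed to the -a flag, to GPU generations. A "-real" suffix means the
# wheel targets only that physical architecture, without forward-compatible
# PTX for newer GPUs.
CUDA_ARCHS = {
    "80": "Ampere (e.g. A100)",
    "86": "Ampere (e.g. GeForce RTX 30-series)",
    "89": "Ada Lovelace (e.g. GeForce RTX 40-series)",
    "90": "Hopper (e.g. H100)",
}

def describe_arch(flag: str) -> str:
    """Map an -a value like '89-real' to its GPU generation."""
    base = flag.split("-")[0]
    return CUDA_ARCHS.get(base, "unknown architecture")
```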

The flag -D "ENABLE_MULTI_DEVICE=0", while not specified here, is implied on Windows. Multi-device inference is supported on Linux, but not on Windows.

This command generates build\tensorrt_llm-*.whl.

Linking with the TensorRT-LLM C++ Runtime

Note

This section is for advanced users. Skip this section if you do not intend to use the TensorRT-LLM C++ runtime directly. You must build from source to use the C++ runtime.

Building from source creates libraries that can be used if you wish to directly link against the C++ runtime for TensorRT-LLM. These libraries are also required if you wish to run C++ unit tests and some benchmarks.

Building from source produces the following library files.

  • tensorrt_llm libraries located in cpp\build\tensorrt_llm

    • tensorrt_llm.dll - Shared library

    • tensorrt_llm.exp - Export file

    • tensorrt_llm.lib - Stub for linking to tensorrt_llm.dll

  • Dependency libraries (these get copied to tensorrt_llm\libs\)

    • nvinfer_plugin_tensorrt_llm libraries located in cpp\build\tensorrt_llm\plugins\

      • nvinfer_plugin_tensorrt_llm.dll

      • nvinfer_plugin_tensorrt_llm.exp

      • nvinfer_plugin_tensorrt_llm.lib

    • th_common libraries located in cpp\build\tensorrt_llm\thop\

      • th_common.dll

      • th_common.exp

      • th_common.lib

The locations of these DLLs, along with the location of some torch DLLs, must be added to the Windows Path in order to use the TensorRT-LLM C++ runtime. When complete, your Path should include lines similar to these:

%USERPROFILE%\inference\TensorRT-LLM\cpp\build\tensorrt_llm
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\tensorrt_llm\libs
%USERPROFILE%\AppData\Local\Programs\Python\Python310\Lib\site-packages\torch\lib

Your Path additions may differ, particularly if you used the Docker method and copied all the relevant DLLs into a single folder.
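Before using the C++ runtime, it can be worth confirming that each required directory really is on Path. A minimal sketch follows; the directory names you pass in are examples, so substitute your own locations:

```python
# Illustrative check that the DLL directories are present on the Windows
# Path environment variable before the C++ runtime is used.
import os

def missing_from_path(required_dirs):
    """Return the required directories that are absent from PATH."""
    entries = {os.path.normcase(os.path.normpath(p))
               for p in os.environ.get("PATH", "").split(os.pathsep) if p}
    return [d for d in required_dirs
            if os.path.normcase(os.path.normpath(os.path.expandvars(d)))
            not in entries]
```

An empty return value means every required directory was found; anything returned should be appended to Path before importing or linking against the runtime.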