Installing on Windows
Note: The Windows release of TensorRT-LLM is currently in beta. We recommend using the `rel` branch for the most stable experience.
Prerequisites
Clone this repository using Git for Windows.
Install the dependencies one of two ways:
Run the provided PowerShell script, `setup_env.ps1`, which installs Python, CUDA 12.2, and Microsoft MPI automatically with default settings. Run PowerShell as Administrator to use the script.

./setup_env.ps1 [-skipCUDA] [-skipPython] [-skipMPI]
Install the dependencies one at a time.

Install Python 3.10. Select "Add python.exe to PATH" at the start of the installation. The installer may only add the `python` command, not the `python3` command. Navigate to the installation path `%USERPROFILE%\AppData\Local\Programs\Python\Python310` (`AppData` is a hidden folder) and copy `python.exe` to `python3.exe`.
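To confirm that the interpreter now on your PATH satisfies the guide's Python 3.10 requirement, you can run a quick check like the sketch below (the `meets_python_requirement` helper is illustrative, not part of TensorRT-LLM):

```python
import sys

# Illustrative helper: compares an interpreter version tuple against the
# Python 3.10 requirement stated in this guide.
def meets_python_requirement(version_info, required=(3, 10)):
    return tuple(version_info[:2]) >= required

# Check the interpreter this script is running under.
print("Python 3.10+:", meets_python_requirement(sys.version_info))
```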
Install CUDA 12.2 Toolkit. Use the Express Installation option. Installation may require a restart.
Download and install Microsoft MPI. You will be prompted to choose between an `exe`, which installs the MPI executable, and an `msi`, which installs the MPI SDK. Download and install both.
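After installing Microsoft MPI, the `mpiexec` launcher should resolve from your PATH. A small sketch to check this (the `tool_on_path` helper is illustrative, not part of TensorRT-LLM):

```python
import shutil

# Illustrative check: return True if an executable (e.g. mpiexec) can be
# resolved from the PATH, the same lookup the shell performs.
def tool_on_path(name):
    return shutil.which(name) is not None

# After installing Microsoft MPI, this should print True.
print("mpiexec found:", tool_on_path("mpiexec"))
```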
Download and unzip cuDNN.
Move the folder to a location you can reference later, such as `%USERPROFILE%\inference\cuDNN`.

Add the cuDNN libraries and binaries to your system's `Path` environment variable:

1. Click the Windows button and search for "environment variables".
2. Click Edit the system environment variables > Environment Variables.
3. In the new window under System variables, click Path > Edit. Add new lines for the `bin` and `lib` directories of cuDNN. Your `Path` should include lines like this:

   %USERPROFILE%\inference\cuDNN\bin
   %USERPROFILE%\inference\cuDNN\lib

4. Click OK on all the open dialog windows.
Close and re-open any existing PowerShell or Git Bash windows so they pick up the new `Path`.
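To sanity-check the result without clicking back through the dialogs, you can scan a PATH-style string for the required directories. A minimal sketch (the `missing_from_path` helper is ours, not from the TensorRT-LLM docs):

```python
import os

# Illustrative helper: given the required cuDNN directories, report which
# ones do not appear as entries in a PATH-style string.
def missing_from_path(required_dirs, path_value):
    entries = {
        os.path.normcase(e.strip())
        for e in path_value.split(os.pathsep)
        if e.strip()
    }
    return [d for d in required_dirs if os.path.normcase(d) not in entries]

# Example with a made-up PATH value; on Windows you would pass
# os.environ["Path"] instead.
demo_path = os.pathsep.join([r"C:\tools", r"D:\cudnn\bin"])
print(missing_from_path([r"D:\cudnn\bin", r"D:\cudnn\lib"], demo_path))
```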
Steps
Install TensorRT-LLM.
pip install tensorrt_llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121
Run the following command to verify that your TensorRT-LLM installation is working properly.
python -c "import tensorrt_llm; print(tensorrt_llm._utils.trt_version())"
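If the import fails, it can help to first confirm whether pip placed the packages where this interpreter can see them. A minimal sketch (the `module_available` helper is illustrative, not an official TensorRT-LLM tool):

```python
import importlib.util

# Illustrative pre-flight check: report whether a module can be located
# on this interpreter's search path before attempting the import.
def module_available(name):
    return importlib.util.find_spec(name) is not None

for mod in ("tensorrt_llm", "tensorrt"):
    print(mod, "available:", module_available(mod))
```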
Build the model.
Deploy the model.