Installing on Linux
Install TensorRT-LLM (tested on Ubuntu 24.04).
```bash
# (Optional) Install the CUDA 12.8 PyTorch build.
pip3 install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install the OpenMPI development headers, then install TensorRT-LLM.
sudo apt-get -y install libopenmpi-dev && pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
```
The PyTorch CUDA 12.8 package is required to support NVIDIA Blackwell GPUs. On earlier GPUs, this extra installation step is not required.
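If you are unsure whether your GPU is a Blackwell part, one quick check (a sketch, not part of the original instructions; the compute-capability threshold is an assumption based on Blackwell's published 10.x/12.x values) is to query it with nvidia-smi:

```bash
# Print the GPU name and compute capability; Blackwell GPUs report 10.0 or higher.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```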
If you are using the PyTorch NGC Container image, the prerequisite steps of installing the Blackwell-enabled PyTorch package and libopenmpi-dev are not required.

Sanity check the installation by running the following in Python (tested on Python 3.12):
```python
from tensorrt_llm import LLM, SamplingParams


def main():

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


# The entry point of the program needs to be protected for spawning processes.
if __name__ == '__main__':
    main()
```
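If you only want a lighter-weight check that the wheel installed correctly, before downloading any model weights, a one-line import is usually enough (a minimal sketch; it assumes only that the tensorrt_llm package exposes a version string):

```bash
# Verify that the tensorrt_llm package imports and print its version.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```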
Known limitations
There are some known limitations when you pip install the pre-built TensorRT-LLM wheel package.
MPI in the Slurm environment
If you encounter the following error while running TensorRT-LLM in a Slurm-managed cluster, you need to reconfigure your MPI installation to work with Slurm. The exact setup depends on your Slurm configuration, so check with your cluster administrator. This is not a TensorRT-LLM-specific problem; it is a general MPI + Slurm issue.
```
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM support. This usually happens
when OMPI was not configured --with-slurm and we weren't able
to discover a SLURM installation in the usual places.
```
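As a quick diagnostic (a sketch, not part of the official instructions), you can ask Open MPI which Slurm-related components it was built with; if nothing is listed, your MPI installation was likely built without Slurm support and needs to be reconfigured or replaced:

```bash
# List Open MPI MCA components related to Slurm; empty output suggests
# Open MPI was built without Slurm support.
ompi_info | grep -i slurm
```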
CUDA Toolkit
Running pip install tensorrt-llm does not install the CUDA Toolkit on your system, and the CUDA Toolkit is not required if you only want to deploy a TensorRT-LLM engine. TensorRT-LLM uses ModelOpt to quantize a model, and ModelOpt requires the CUDA Toolkit to JIT-compile certain kernels that are not shipped with PyTorch, so that quantization runs efficiently. Install the CUDA Toolkit if you see the following message when running ModelOpt quantization:

```
/usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/cpp_extension.py:65: UserWarning: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
Unable to load extension modelopt_cuda_ext and falling back to CPU version.
```
Instructions for installing the CUDA Toolkit can be found in the CUDA Toolkit Documentation.
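Once the toolkit is installed, the ModelOpt warning above points at the CUDA_HOME environment variable; a minimal sketch of setting it is shown below. The /usr/local/cuda path is the conventional default install location and is an assumption here; adjust it to wherever your toolkit actually lives.

```bash
# Point CUDA_HOME at the CUDA Toolkit install root so ModelOpt can JIT-compile
# its modelopt_cuda_ext extension instead of falling back to the CPU version.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
```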
Install inside the PyTorch NGC Container
The PyTorch NGC Container may lock Python package versions via the /etc/pip/constraint.txt file. When installing the pre-built TensorRT-LLM wheel inside the PyTorch NGC Container, clear this file first:

```bash
[ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt
```
The PyTorch NGC Container typically includes a pre-installed tensorrt Python package. If there is a version mismatch between this pre-installed package and the version required by the TensorRT-LLM wheel, uninstall the existing tensorrt package before installing TensorRT-LLM:

```bash
pip uninstall -y tensorrt
```
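Putting the container-specific notes together, a minimal sketch of the full sequence inside the PyTorch NGC Container could look like the following; the ordering (clear the constraint file, remove the pre-installed tensorrt package, then install the wheel) simply combines the steps above.

```bash
# 1. Clear the pip constraint file so pinned versions do not block the install.
[ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt

# 2. Remove the pre-installed tensorrt package to avoid a version mismatch.
pip uninstall -y tensorrt

# 3. Install the pre-built TensorRT-LLM wheel.
pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
```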