Installing on Linux via pip

  1. Install TensorRT-LLM (tested on Ubuntu 24.04).

    Install prerequisites

    Before the pre-built Python wheel can be installed via pip, a few prerequisites must be in place:

    # Optional step: Only required for Blackwell and Grace Hopper
    pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
    
    sudo apt-get -y install libopenmpi-dev
    

    The PyTorch CUDA 12.8 package is required to support NVIDIA Blackwell and Grace Hopper GPUs. On earlier GPUs, this extra installation step is not required.
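
    To confirm which build is active, a quick check such as the following can help (assuming python3 resolves to the environment where the packages were installed):

    # Print the installed PyTorch version and the CUDA version it was built against.
    python3 -c "import torch; print(torch.__version__, torch.version.cuda)"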

    Tip

    Instead of manually installing the prerequisites as described above, it is also possible to use the pre-built TensorRT-LLM Develop container image hosted on NGC (see here for information on container tags).
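
    As a rough sketch, running such a container typically looks like the command below; the image path and tag are placeholders, so substitute the actual values from the NGC catalog:

    # Hypothetical image path/tag -- check the NGC catalog for the real values.
    docker run --rm -it --gpus all --ipc=host nvcr.io/nvidia/tensorrt-llm/devel:latest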

    Install pre-built TensorRT-LLM wheel

    Once all prerequisites are in place, TensorRT-LLM can be installed as follows:

    pip3 install --upgrade pip setuptools && pip3 install tensorrt_llm
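
    Before running the fuller sanity check in the next step, the install can be quickly verified by importing the package and printing its version:

    # Confirm the package imports cleanly and report the installed version.
    python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"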
    
  2. Sanity check the installation by running the following in Python (tested on Python 3.12):

    from tensorrt_llm import LLM, SamplingParams


    def main():
        # The model argument accepts a HF model name, a path to a local HF model,
        # or a TensorRT Model Optimizer quantized checkpoint such as
        # nvidia/Llama-3.1-8B-Instruct-FP8 on HF.
        llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

        # Sample prompts.
        prompts = [
            "Hello, my name is",
            "The capital of France is",
            "The future of AI is",
        ]

        # Create the sampling parameters.
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

        for output in llm.generate(prompts, sampling_params):
            print(
                f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}"
            )

        # Example output:
        # Prompt: 'Hello, my name is', Generated text: '\n\nJane Smith. I am a student pursuing my degree in Computer Science at [university]. I enjoy learning new things, especially technology and programming'
        # Prompt: 'The capital of France is', Generated text: 'Paris.'
        # Prompt: 'The future of AI is', Generated text: 'an exciting time for us. We are constantly researching, developing, and improving our platform to create the most advanced and efficient model available. We are'


    if __name__ == '__main__':
        main()
    

Known limitations

There are some known limitations when you pip install the pre-built TensorRT-LLM wheel package.

  1. MPI in the Slurm environment

    If you encounter an error while running TensorRT-LLM in a Slurm-managed cluster, you need to reconfigure the MPI installation to work with Slurm. The setup method depends on your Slurm configuration, so please check with your cluster administrator. This is not a TensorRT-LLM-specific issue, but a general MPI + Slurm issue. The error typically looks like this:

    The application appears to have been direct launched using "srun",
    but OMPI was not built with SLURM support. This usually happens
    when OMPI was not configured --with-slurm and we weren't able
    to discover a SLURM installation in the usual places.
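
    As a starting point for diagnosis, the following sketch assumes OpenMPI; it checks whether Slurm support was compiled in, which the error above indicates is missing:

    # List OpenMPI's Slurm-related components; empty output suggests OpenMPI was
    # built without Slurm support (i.e. not configured --with-slurm), in which
    # case ask your admin for a Slurm-aware MPI build.
    ompi_info | grep -i slurm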
    
  2. CUDA Toolkit

    pip install tensorrt-llm does not install the CUDA Toolkit on your system, and the CUDA Toolkit is not required if you only want to deploy a TensorRT-LLM engine. TensorRT-LLM uses ModelOpt to quantize a model, and ModelOpt requires the CUDA Toolkit to JIT-compile certain kernels that are not included in PyTorch, so that quantization runs efficiently. Please install the CUDA Toolkit if you see the following message when running ModelOpt quantization:

    /usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/cpp_extension.py:65:
    UserWarning: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
    Unable to load extension modelopt_cuda_ext and falling back to CPU version.
    

    Instructions for installing the CUDA Toolkit can be found in the CUDA Toolkit Documentation.
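
    After installing the toolkit, setting CUDA_HOME as the warning suggests usually resolves it; a minimal sketch, assuming the default install location /usr/local/cuda:

    # Point ModelOpt's JIT compilation at the CUDA Toolkit install root.
    export CUDA_HOME=/usr/local/cuda
    export PATH="$CUDA_HOME/bin:$PATH"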