Support Matrix

TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections provide a list of supported GPU architectures as well as important features implemented in TensorRT-LLM.

Models

LLM Models

Multi-Modal Models [2]

Hardware

The following table shows the supported hardware for TensorRT-LLM.

GPUs that are not listed are still expected to work with TensorRT-LLM if they are based on the Volta, Turing, Ampere, Hopper, or Ada Lovelace architecture, although certain limitations may apply.
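As a rough illustration of the architecture rule above, the following sketch maps a CUDA compute capability to one of the named architectures. The SM numbers are the ones this page uses in its precision list; the helper name `architecture_for` is hypothetical and is not part of TensorRT-LLM. How the `(major, minor)` pair is obtained is left to the caller (for example, PyTorch's `torch.cuda.get_device_capability()` returns such a tuple, assuming PyTorch is installed).

```python
from typing import Optional

# SM numbers and architecture names as they appear in this page's precision list.
# Other compute capabilities exist; they are simply not named in this document.
ARCH_BY_SM = {
    70: "Volta",
    75: "Turing",
    80: "Ampere",
    86: "Ampere",
    89: "Ada Lovelace",
    90: "Hopper",
}


def architecture_for(major: int, minor: int) -> Optional[str]:
    """Return the architecture name for a compute capability.

    Returns None when the capability is not named in this document, which
    means "not covered here", not necessarily "unsupported".
    """
    return ARCH_BY_SM.get(major * 10 + minor)


if __name__ == "__main__":
    for capability in [(9, 0), (8, 9), (7, 5), (6, 1)]:
        name = architecture_for(*capability)
        label = name if name is not None else "not named in this document"
        print(f"SM{capability[0]}{capability[1]}: {label}")
```

A `None` result only signals that this page does not mention the capability; consult the hardware table before drawing conclusions about an unlisted GPU.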

Hardware Compatibility

Operating System

TensorRT-LLM requires Linux x86_64 or Windows.

GPU Model Architectures

Software

The following table shows the supported software for TensorRT-LLM.

Software Compatibility

Container: 24.07

TensorRT: 10.4

Precision

  • Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4

  • Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4

  • Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[4]

  • Turing (SM75) - FP32, FP16, INT8[5], INT4

  • Volta (SM70) - FP32, FP16, INT8[5], INT4[6]
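The precision list above can be transcribed into a small lookup table, which may be convenient for a pre-flight check before choosing a quantization mode. This is only a sketch: the names `PRECISIONS_BY_SM`, `FOOTNOTED`, and `precision_status` are hypothetical, and entries that carry a footnote marker in the list are flagged as "listed with caveat" without interpreting the footnote itself.

```python
# Precisions per SM version, transcribed from the list above.
_AMPERE = {"FP32", "FP16", "BF16", "INT8", "INT4"}
PRECISIONS_BY_SM = {
    90: {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},  # Hopper
    89: {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},  # Ada Lovelace
    80: _AMPERE,                                          # Ampere
    86: _AMPERE,                                          # Ampere
    75: {"FP32", "FP16", "INT8", "INT4"},                 # Turing
    70: {"FP32", "FP16", "INT8", "INT4"},                 # Volta
}

# (SM, precision) pairs that carry a footnote marker in the list above.
# The footnote text is not reproduced here, so these are only flagged.
FOOTNOTED = {(80, "INT4"), (86, "INT4"), (75, "INT8"), (70, "INT8"), (70, "INT4")}


def precision_status(sm: int, precision: str) -> str:
    """Classify a (SM, precision) pair according to the list above."""
    precision = precision.upper()
    if sm not in PRECISIONS_BY_SM:
        return "unknown architecture"
    if precision not in PRECISIONS_BY_SM[sm]:
        return "not listed"
    if (sm, precision) in FOOTNOTED:
        return "listed with caveat"
    return "listed"


if __name__ == "__main__":
    for sm, precision in [(90, "FP8"), (80, "FP8"), (75, "INT8"), (70, "BF16")]:
        print(f"SM{sm} {precision}: {precision_status(sm, precision)}")
```

A "listed" result reflects the architecture only; whether a given model implements that precision is a separate question.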

[^ReplitCode]: Replit Code is not supported with transformers 4.45 or later.

Note

Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to Numerical Precision and the examples folder for additional information.