Support Matrix

TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections list the supported models, GPU architectures, and software versions, along with the precisions available on each architecture.

Models

LLM Models

Multi-Modal Models [2]

Hardware

The following list shows the supported hardware for TensorRT-LLM.

If a GPU architecture is not listed, the TensorRT-LLM team does not develop or test the software on that architecture, and it is limited to community support. In addition, older architectures can have limitations for newer software releases.

Hardware Compatibility

  • Operating System: TensorRT-LLM requires Linux x86_64, Linux aarch64, or Windows.

  • GPU Model Architectures: NVIDIA Hopper (SM90), NVIDIA Ada Lovelace (SM89), NVIDIA Ampere (SM80, SM86)
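
For a quick check that the host platform matches this list, the sketch below compares the values reported by Python's standard-library platform module against the operating systems named above. It does not use any TensorRT-LLM API; the "AMD64" entry is an assumption about how platform.machine() reports x86_64 on Windows.

```python
import platform

# Platforms taken from the Hardware Compatibility list above.
# platform.machine() typically reports "AMD64" on Windows x86_64.
SUPPORTED_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Linux", "aarch64"),
    ("Windows", "AMD64"),
}

system, machine = platform.system(), platform.machine()
if (system, machine) in SUPPORTED_PLATFORMS:
    print(f"{system}/{machine} is listed in the TensorRT-LLM support matrix.")
else:
    print(f"{system}/{machine} is not listed; expect community support only.")
```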

Software

The following list shows the supported software for TensorRT-LLM.

Software Compatibility

  • Container: 24.10

  • TensorRT: 10.6

  • Precision:

      • Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4

      • Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4

      • Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4 [4]
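
As an illustration of how these rows map to a running system, the sketch below reads the installed TensorRT version and the GPU's compute capability and looks them up against the versions and precision lists above. It assumes a machine with a visible GPU and that PyTorch and the tensorrt Python package are installed (as in the NGC container); the lookup table is transcribed from this section, not queried from TensorRT-LLM.

```python
import tensorrt
import torch

# Expected TensorRT major.minor from the Software Compatibility list above.
EXPECTED_TENSORRT = "10.6"

# Precision support per compute capability, transcribed from the Precision list.
PRECISIONS = {
    (9, 0): {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},  # Hopper (SM90)
    (8, 9): {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},  # Ada Lovelace (SM89)
    (8, 0): {"FP32", "FP16", "BF16", "INT8", "INT4"},         # Ampere (SM80)
    (8, 6): {"FP32", "FP16", "BF16", "INT8", "INT4"},         # Ampere (SM86)
}

print(f"TensorRT {tensorrt.__version__} installed "
      f"(this matrix was validated against {EXPECTED_TENSORRT}).")

major, minor = torch.cuda.get_device_capability(0)  # e.g. (9, 0) on Hopper
supported = PRECISIONS.get((major, minor))
if supported is None:
    print(f"SM{major}{minor} is not covered by this support matrix.")
else:
    print(f"SM{major}{minor} precisions: {', '.join(sorted(supported))}")
```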

[^ReplitCode]: Replit Code is not supported with transformers 4.45 and later.

Note

Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to the Numerical Precision documentation and the examples folder for additional information.
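
For reference, the sketch below shows one way a quantized precision might be requested through the LLM API. It assumes QuantConfig and QuantAlgo are available from tensorrt_llm.llmapi, as in the quantization examples; the Hugging Face model ID is a placeholder chosen purely for illustration, and whether FP8 is actually usable depends on the model and the GPU architecture per the lists above. The examples folder remains the authoritative reference.

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantAlgo, QuantConfig

# Request FP8 quantization; only valid on architectures with FP8 support
# (Hopper SM90 and Ada Lovelace SM89 in the Precision list above) and only
# for models whose FP8 path is implemented.
quant_config = QuantConfig(quant_algo=QuantAlgo.FP8)

# Model ID is a placeholder used for illustration.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", quant_config=quant_config)

for output in llm.generate(["Quantization lets TensorRT-LLM"]):
    print(output.outputs[0].text)
```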