Support Matrix
TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections provide a list of supported GPU architectures as well as important features implemented in TensorRT-LLM.
Models
LLM Models
Replit Code[^ReplitCode]
Multi-Modal Models [2]
Hardware
The following table shows the supported hardware for TensorRT-LLM.
If a GPU is not listed, note that TensorRT-LLM is still expected to work on GPUs based on the Volta, Turing, Ampere, Hopper, and Ada Lovelace architectures, although certain limitations may apply.
| Hardware Compatibility | |
|---|---|
| Operating System | TensorRT-LLM requires Linux x86_64 or Windows. |
| GPU Model Architectures | NVIDIA Volta, Turing, Ampere, Hopper, and Ada Lovelace architectures |
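The architectures above correspond to CUDA compute capabilities 7.0 (Volta) through 9.0 (Hopper). The following is a minimal sketch, assuming PyTorch with CUDA is installed, for checking whether the local GPU falls in that range; the capability-to-architecture mapping is an illustration, not an official support check.

```python
# Sketch: map the local GPU's compute capability to the architectures listed
# in the hardware table above (assumes PyTorch with CUDA is installed).
import torch

SUPPORTED = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
}

if not torch.cuda.is_available():
    raise SystemExit("No CUDA-capable GPU detected.")

cc = torch.cuda.get_device_capability(0)   # e.g. (8, 9)
name = torch.cuda.get_device_name(0)
arch = SUPPORTED.get(cc)
if arch:
    print(f"{name} (SM{cc[0]}{cc[1]}) is a {arch} GPU and should be supported.")
else:
    print(f"{name} (SM{cc[0]}{cc[1]}) is not in the architecture list; "
          "TensorRT-LLM support is not guaranteed.")
```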
Software
The following table shows the supported software for TensorRT-LLM.
| Software Compatibility | |
|---|---|
| Container | |
| TensorRT | |
| Precision | |
[^ReplitCode]: Replit Code is not supported with transformers 4.45+.
Note
Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to the Numerical Precision documentation and the examples folder for additional information.
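As a hedged illustration, the sketch below requests FP8 quantization through the high-level LLM API. The QuantConfig/QuantAlgo names and the model id are assumptions based on a recent TensorRT-LLM release, and whether FP8 is actually available depends on both the model and the GPU (FP8 requires Ada Lovelace or Hopper); consult the Numerical Precision documentation and the examples folder before relying on it.

```python
# Hedged sketch: request FP8 quantization via TensorRT-LLM's LLM API.
# QuantConfig/QuantAlgo and the model id below are assumptions based on a
# recent release; not every model or GPU supports FP8 (see Numerical Precision).
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantAlgo, QuantConfig

quant_config = QuantConfig(quant_algo=QuantAlgo.FP8)

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example Hugging Face model id
    quant_config=quant_config,
)

outputs = llm.generate(
    ["What is the capital of France?"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```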