Support Matrix
TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections provide a list of supported GPU architectures as well as important features implemented in TensorRT-LLM.
Models
LLM Models
[Minitron](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/nemotron)
Multi-Modal Models [2]
Hardware
The following table shows the supported hardware for TensorRT-LLM.
If a GPU architecture is not listed, the TensorRT-LLM team does not develop or test the software on that architecture, and support is limited to community support. In addition, older architectures can have limitations for newer software releases.
| Hardware Compatibility | |
| --- | --- |
| Operating System | TensorRT-LLM requires Linux x86_64, Linux aarch64, or Windows. |
| GPU Model Architectures | |
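Before installing, it can be useful to verify that the current host matches the operating-system requirement stated in the table above. The following is a minimal sketch, not part of TensorRT-LLM itself; the `platform_is_supported` helper and the `SUPPORTED_PLATFORMS` set are illustrative, built only from the OS requirement given here (Linux x86_64, Linux aarch64, or Windows).

```python
import platform

# (OS, CPU architecture) pairs taken from the compatibility table above.
# Note: platform.machine() reports "AMD64" for x86_64 on Windows.
SUPPORTED_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Linux", "aarch64"),
    ("Windows", "AMD64"),
}

def platform_is_supported(system: str, machine: str) -> bool:
    """Return True if the OS/architecture pair appears in the support matrix."""
    return (system, machine) in SUPPORTED_PLATFORMS

# Check the current host before attempting an installation.
host = (platform.system(), platform.machine())
status = "supported" if platform_is_supported(*host) else "unsupported"
print(f"{host}: {status}")
```

This only checks the operating system and CPU architecture; GPU architecture support must still be confirmed against the table above.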
Software
The following table shows the supported software for TensorRT-LLM.
| Software Compatibility | |
| --- | --- |
| Container | |
| TensorRT | |
| Precision | |
[^ReplitCode]: Replit Code is not supported with transformers 4.45 and later.
Note
Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to the Numerical Precision documentation and the examples folder for additional information.
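Because precision support varies per model, application code that selects a quantization mode should gate the choice on a per-model capability check. The sketch below shows one way to encode that, assuming nothing about TensorRT-LLM's API; the model names and precision sets are illustrative placeholders, not the authoritative matrix, so consult the Numerical Precision documentation for real entries.

```python
# Illustrative per-model precision table. These entries are placeholders,
# NOT the actual TensorRT-LLM support matrix.
SUPPORTED_PRECISIONS = {
    "example-model-a": {"fp16", "bf16", "fp8", "int8"},
    "example-model-b": {"fp16", "bf16"},
}

def precision_is_supported(model: str, precision: str) -> bool:
    """Return True if `precision` is listed as supported for `model`."""
    return precision in SUPPORTED_PRECISIONS.get(model, set())

def select_precision(model: str, preferred: str, fallback: str = "fp16") -> str:
    """Pick the preferred precision if supported, otherwise fall back."""
    if precision_is_supported(model, preferred):
        return preferred
    return fallback
```

Gating the choice this way keeps an unsupported request (for example, FP8 on a model that only lists FP16/BF16) from failing at build time and instead degrades to a widely supported precision.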