Support Matrix

TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections provide a list of supported GPU architectures as well as important features implemented in TensorRT-LLM.

Models (PyTorch Backend)

Architecture	Model	HuggingFace Example	Modality
`BertForSequenceClassification`	BERT-based	`textattack/bert-base-uncased-yelp-polarity`	L
`DeciLMForCausalLM`	Nemotron	`nvidia/Llama-3_1-Nemotron-51B-Instruct`	L
`DeepseekV3ForCausalLM`	DeepSeek-V3	`deepseek-ai/DeepSeek-V3`	L
`Exaone4ForCausalLM`	EXAONE 4.0	`LGAI-EXAONE/EXAONE-4.0-32B`	L
`Gemma3ForCausalLM`	Gemma 3	`google/gemma-3-1b-it`	L
`Gemma3ForConditionalGeneration`	Gemma 3	`google/gemma-3-27b-it`	L + I
`HCXVisionForCausalLM`	HyperCLOVAX-SEED-Vision	`naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B`	L + I
`LlavaLlamaModel`	VILA	`Efficient-Large-Model/NVILA-8B`	L + I + V
`LlavaNextForConditionalGeneration`	LLaVA-NeXT	`llava-hf/llava-v1.6-mistral-7b-hf`	L + I
`LlamaForCausalLM`	Llama 3.1, Llama 3, Llama 2, LLaMA	`meta-llama/Meta-Llama-3.1-70B`	L
`Llama4ForConditionalGeneration`	Llama 4	`meta-llama/Llama-4-Scout-17B-16E-Instruct`	L + I
`MistralForCausalLM`	Bielik	`speakleash/Bielik-11B-v2.2-Instruct`	L
`MistralForCausalLM`	Mistral	`mistralai/Mistral-7B-v0.1`	L
`Mistral3ForConditionalGeneration`	Mistral3	`mistralai/Mistral-Small-3.1-24B-Instruct-2503`	L + I
`MixtralForCausalLM`	Mixtral	`mistralai/Mixtral-8x7B-v0.1`	L
`MllamaForConditionalGeneration`	Llama 3.2	`meta-llama/Llama-3.2-11B-Vision`	L
`NemotronForCausalLM`	Nemotron-3, Nemotron-4, Minitron	`nvidia/Minitron-8B-Base`	L
`NemotronNASForCausalLM`	NemotronNAS	`nvidia/Llama-3_3-Nemotron-Super-49B-v1`	L
`Phi4MMForCausalLM`	Phi-4-multimodal	`microsoft/Phi-4-multimodal-instruct`	L + I + A
`Qwen2ForCausalLM`	QwQ, Qwen2	`Qwen/Qwen2-7B-Instruct`	L
`Qwen2ForProcessRewardModel`	Qwen2-based	`Qwen/Qwen2.5-Math-PRM-7B`	L
`Qwen2ForRewardModel`	Qwen2-based	`Qwen/Qwen2.5-Math-RM-72B`	L
`Qwen2VLForConditionalGeneration`	Qwen2-VL	`Qwen/Qwen2-VL-7B-Instruct`	L + I + V
`Qwen2_5_VLForConditionalGeneration`	Qwen2.5-VL	`Qwen/Qwen2.5-VL-7B-Instruct`	L + I + V
`Qwen3ForCausalLM`	Qwen3	`Qwen/Qwen3-8B`	L
`Qwen3MoeForCausalLM`	Qwen3MoE	`Qwen/Qwen3-30B-A3B`	L

Note:

L: Language
I: Image
V: Video
A: Audio
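
The Modality column combines the codes above with `+`. As a minimal sketch (the names `MODALITY_CODES` and `expand_modality` are hypothetical helpers for illustration, not part of TensorRT-LLM), a cell such as `L + I + V` could be expanded like this:

```python
# Hypothetical helper for reading the Modality column; not a TensorRT-LLM API.
# The code-to-name mapping is taken from the legend above.
MODALITY_CODES = {"L": "Language", "I": "Image", "V": "Video", "A": "Audio"}


def expand_modality(cell):
    """Turn a Modality cell such as 'L + I + V' into a list of modality names."""
    return [MODALITY_CODES[code.strip()] for code in cell.split("+")]


if __name__ == "__main__":
    # Cells copied from the Qwen2-VL and Phi-4-multimodal rows of the table.
    print(expand_modality("L + I + V"))
    print(expand_modality("L + I + A"))
```

An unknown code raises `KeyError`, which is a deliberate choice in this sketch so that a typo in a table cell is not silently ignored.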

Models (TensorRT Backend)

LLM Models

Hardware

The following table shows the supported hardware for TensorRT-LLM.

If a GPU architecture is not listed, the TensorRT-LLM team does not develop or test the software on that architecture, and support is limited to community support. In addition, older architectures can have limitations with newer software releases.

	Hardware Compatibility
Operating System	TensorRT-LLM requires Linux x86_64 or Linux aarch64.
GPU Model Architectures	NVIDIA GB200 NVL72, NVIDIA Blackwell Architecture, NVIDIA Grace Hopper Superchip, NVIDIA Hopper Architecture, NVIDIA Ada Lovelace Architecture, NVIDIA Ampere Architecture

Software

The following table shows the supported software for TensorRT-LLM.

	Software Compatibility
Container	25.06
TensorRT	10.11
Precision	Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4; Ada Lovelace (SM89) - FP32, FP16, BF16, FP8, INT8, INT4; Ampere (SM80, SM86) - FP32, FP16, BF16, INT8, INT4[5]

Note

Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to the Numerical Precision documentation and the examples folder for additional information.
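
To make the Precision row easier to query, the following is a minimal sketch that transcribes it into a lookup table keyed by SM version. The names `PRECISIONS_BY_ARCH` and `precisions_for_sm` are hypothetical and not part of TensorRT-LLM. The table covers only the architectures named in the Precision row, and it says nothing about per-model support, which varies as noted above.

```python
# Hypothetical lookup table transcribed from the Precision row above.
# It is an illustration only, not a TensorRT-LLM API, and it does not
# capture per-model limitations for FP8, INT8, or INT4.
PRECISIONS_BY_ARCH = {
    "Hopper": {
        "sm": (90,),
        "dtypes": {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},
    },
    "Ada Lovelace": {
        "sm": (89,),
        "dtypes": {"FP32", "FP16", "BF16", "FP8", "INT8", "INT4"},
    },
    "Ampere": {
        "sm": (80, 86),
        "dtypes": {"FP32", "FP16", "BF16", "INT8", "INT4"},
    },
}


def precisions_for_sm(sm):
    """Return the precisions listed for an SM version, or an empty set if unlisted."""
    for entry in PRECISIONS_BY_ARCH.values():
        if sm in entry["sm"]:
            return set(entry["dtypes"])
    return set()


if __name__ == "__main__":
    for sm in (90, 89, 86, 80):
        print(sm, sorted(precisions_for_sm(sm)))
```

An empty result means only that the SM version does not appear in the Precision row, not that the hardware is unsupported.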