# Support Matrix

## Feature Support Matrix
| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| FP4 | | PyTorch | TensorRT, TensorRT-LLM |
| FP8 | | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| INT8 | | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| W4A16 (INT4 Weights Only) | | PyTorch, ONNX | TensorRT, TensorRT-LLM |
| W4A8 (INT4 Weights, FP8 Activations) | | PyTorch*, ONNX* | TensorRT-LLM |
| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| W4A16 (INT4 Weights Only) | | PyTorch*, ONNX | ORT-DirectML, TensorRT*, TensorRT-LLM* |
| W4A8 (INT4 Weights, FP8 Activations) | | PyTorch* | TensorRT-LLM* |
| FP8 | | PyTorch*, ONNX | TensorRT*, TensorRT-LLM*, ORT-CUDA |
| INT8 | | PyTorch*, ONNX | TensorRT*, TensorRT-LLM*, ORT-CUDA |
> **Note:** Features marked with an asterisk (*) are considered experimental.
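To illustrate how the first feature matrix above can be consulted programmatically, here is a minimal sketch. The dictionary below is transcribed from the first table (a trailing `*` marks experimental support); the helper name `deployment_support` is hypothetical, not part of any library API.

```python
# Query helper for the first feature support matrix above.
# A trailing "*" on an entry marks experimental support.

SUPPORT_MATRIX = {
    # quantization format: (supported model formats, deployment targets)
    "FP4": (["PyTorch"], ["TensorRT", "TensorRT-LLM"]),
    "FP8": (["PyTorch", "ONNX*"], ["TensorRT*", "TensorRT-LLM"]),
    "INT8": (["PyTorch", "ONNX*"], ["TensorRT*", "TensorRT-LLM"]),
    "W4A16": (["PyTorch", "ONNX"], ["TensorRT", "TensorRT-LLM"]),
    "W4A8": (["PyTorch*", "ONNX*"], ["TensorRT-LLM"]),
}

def deployment_support(quant_format: str, target: str):
    """Return (supported, experimental) for a format/deployment pair."""
    _, targets = SUPPORT_MATRIX[quant_format]
    for entry in targets:
        if entry.rstrip("*") == target:
            return True, entry.endswith("*")
    return False, False

supported, experimental = deployment_support("FP8", "TensorRT")
print(supported, experimental)  # -> True True (supported, but experimental)
```

The same pattern works for the second matrix; only the table data changes.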
## Model Support Matrix

Please check out the model support matrix below.
| Model | ONNX INT4 AWQ (W4A16) | ONNX INT8 Max (W8A8) | ONNX FP8 Max (W8A8) |
|---|---|---|---|
| Llama3.1-8B-Instruct | Yes | No | No |
| Phi3.5-mini-Instruct | Yes | No | No |
| Mistral-7B-Instruct-v0.3 | Yes | No | No |
| Llama3.2-3B-Instruct | Yes | No | No |
| Gemma-2b-it | Yes | No | No |
| Gemma-2-2b | Yes | No | No |
| Gemma-2-9b | Yes | No | No |
| Nemotron Mini 4B Instruct | Yes | No | No |
| Qwen2.5-7B-Instruct | Yes | No | No |
| DeepSeek-R1-Distill-Llama-8B | Yes | No | No |
| DeepSeek-R1-Distill-Qwen-1.5B | Yes | No | No |
| DeepSeek-R1-Distill-Qwen-7B | Yes | No | No |
| DeepSeek-R1-Distill-Qwen-14B | Yes | No | No |
| Mistral-NeMo-Minitron-2B-128k-Instruct | Yes | No | No |
| Mistral-NeMo-Minitron-4B-128k-Instruct | Yes | No | No |
| Mistral-NeMo-Minitron-8B-128k-Instruct | Yes | No | No |
| whisper-large | No | Yes | Yes |
| sam2-hiera-large | No | Yes | Yes |
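The model matrix can be read the same way: given a model, list the ONNX quantization recipes marked "Yes". The sketch below transcribes only a few representative rows from the table above, and `supported_formats` is a hypothetical helper, not a library function.

```python
# Look up which ONNX quantization recipes apply to a model,
# using rows transcribed from the model support matrix above.

MODEL_SUPPORT = {
    # model: (INT4 AWQ W4A16, INT8 Max W8A8, FP8 Max W8A8)
    "Llama3.1-8B-Instruct": (True, False, False),
    "Qwen2.5-7B-Instruct": (True, False, False),
    "whisper-large": (False, True, True),
    "sam2-hiera-large": (False, True, True),
}

FORMATS = ("ONNX INT4 AWQ (W4A16)", "ONNX INT8 Max (W8A8)", "ONNX FP8 Max (W8A8)")

def supported_formats(model: str) -> list:
    """Return the ONNX quantization formats marked 'Yes' for a model."""
    return [fmt for fmt, ok in zip(FORMATS, MODEL_SUPPORT[model]) if ok]

print(supported_formats("whisper-large"))
# -> ['ONNX INT8 Max (W8A8)', 'ONNX FP8 Max (W8A8)']
```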