# Support Matrix

## Feature Support Matrix
### Linux

| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| FP8 | | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| INT8 | | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| W4A16 (INT4 Weights Only) | | PyTorch, ONNX | TensorRT, TensorRT-LLM |
| W4A8 (INT4 Weights, FP8 Activations) | | PyTorch*, ONNX* | TensorRT-LLM |
### Windows

| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| W4A16 (INT4 Weights Only) | | PyTorch*, ONNX | ORT-DirectML, TensorRT*, TensorRT-LLM* |
| W4A8 (INT4 Weights, FP8 Activations) | | PyTorch* | TensorRT-LLM* |
| FP8 | | PyTorch*, ONNX* | TensorRT*, TensorRT-LLM* |
| INT8 | | PyTorch*, ONNX* | TensorRT*, TensorRT-LLM* |
> **Note:** Features marked with an asterisk (*) are considered experimental.
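As a concrete starting point for the PyTorch formats above, the sketch below shows how a model might be quantized with the ModelOpt PyTorch quantization API before export. It is a minimal sketch, not the canonical workflow: the toy model and random calibration batches are placeholders, and the config name `mtq.FP8_DEFAULT_CFG` should be checked against your installed nvidia-modelopt version.

```python
"""Minimal sketch: FP8 post-training quantization of a PyTorch model.

Assumes the nvidia-modelopt package is installed; the toy model and random
calibration batches are placeholders for your real model and data.
"""
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Stand-in model and calibration data (assumptions for illustration).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
calib_batches = [torch.randn(4, 64) for _ in range(8)]

def forward_loop(m):
    # Run calibration batches so activation ranges can be collected.
    with torch.no_grad():
        for batch in calib_batches:
            m(batch)

# Replace supported layers with quantized equivalents and calibrate them.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```

The same `mtq.quantize` call is used for the other formats by swapping in the corresponding quantization config.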
## Model Support Matrix

The models below are supported for ONNX INT4 AWQ quantization.
| Model | ONNX INT4 AWQ |
|---|---|
| Llama3.1-8B-Instruct | Yes |
| Phi3.5-mini-Instruct | Yes |
| Mistral-7B-Instruct-v0.3 | Yes |
| Llama3.2-3B-Instruct | Yes |
| Gemma-2b-it | Yes |
| Nemotron Mini 4B Instruct | Yes |
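For the ONNX INT4 AWQ column above, quantization typically goes through ModelOpt's ONNX entry point rather than the PyTorch API. The following is a hedged sketch: the `modelopt.onnx.quantization` module is part of nvidia-modelopt, but the file paths and the exact argument names shown (`quantize_mode`, `output_path`) are assumptions to verify with `python -m modelopt.onnx.quantization --help` for your installed version.

```python
# Hedged sketch: INT4 (AWQ) weight-only quantization of an exported ONNX model.
# Argument names are assumptions; confirm against your nvidia-modelopt version.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="llama3.1-8b-instruct.onnx",         # hypothetical input path
    quantize_mode="int4",                          # INT4 weight-only (AWQ) mode (assumed name)
    output_path="llama3.1-8b-instruct.int4.onnx",  # hypothetical output path
)
```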