Support Matrix
Feature Support Matrix
| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| FP4 | | PyTorch | TensorRT, TensorRT-LLM |
| FP8 | | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| INT8 | | PyTorch, ONNX* | TensorRT*, TensorRT-LLM |
| W4A16 (INT4 Weights Only) | | PyTorch, ONNX | TensorRT, TensorRT-LLM |
| W4A8 (INT4 Weights, FP8 Activations) | | PyTorch*, ONNX* | TensorRT-LLM |
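For the PyTorch rows above, post-training quantization is typically a single calibrate-and-quantize call. The sketch below is a minimal, hypothetical example assuming the TensorRT Model Optimizer `modelopt.torch.quantization` package and its `FP8_DEFAULT_CFG` preset; the toy model and calibration data are placeholders, and other presets can be swapped in to target the other formats in the table.

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq  # assumed: nvidia-modelopt package

# Stand-in model and calibration set for illustration only.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
calib_data = [torch.randn(4, 16) for _ in range(8)]

def forward_loop(m):
    # Feed representative batches so activation ranges can be calibrated.
    with torch.no_grad():
        for batch in calib_data:
            m(batch)

# One-call post-training quantization; pick the preset matching the
# desired format from the matrix (FP8 here).
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```

The quantized module can then be exported (e.g., to ONNX or a TensorRT-LLM checkpoint) for the deployment backends listed in the matrix.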
| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| W4A16 (INT4 Weights Only) | | PyTorch*, ONNX | ORT-DML, ORT-CUDA, ORT-TRT-RTX, TensorRT*, TensorRT-LLM* |
| W4A8 (INT4 Weights, FP8 Activations) | | PyTorch* | TensorRT-LLM* |
| FP8 | | PyTorch*, ONNX | TensorRT*, TensorRT-LLM*, ORT-CUDA |
| INT8 | | PyTorch*, ONNX | TensorRT*, TensorRT-LLM*, ORT-CUDA |
Note
Features marked with an asterisk (*) are considered experimental.
ORT-CUDA, ORT-DML, and ORT-TRT-RTX are ONNX Runtime Execution Providers (EPs) for CUDA, DirectML, and TensorRT-RTX, respectively. Support for different deployment backends can vary across models.
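Selecting one of these EPs at inference time is done through ONNX Runtime's ordered providers list; the runtime falls back to the next entry if an EP is unavailable in the installed build. A minimal sketch follows; the model path is a placeholder, and the TensorRT-RTX EP identifier is omitted since its name can vary by ORT build.

```python
import onnxruntime as ort

# Ordered by preference: ONNX Runtime falls back down the list if a
# provider is not available in the current onnxruntime build.
providers = [
    "TensorrtExecutionProvider",  # TensorRT EP
    "CUDAExecutionProvider",      # ORT-CUDA
    "DmlExecutionProvider",       # ORT-DML (Windows/DirectML builds)
    "CPUExecutionProvider",       # always-available fallback
]
session = ort.InferenceSession("model.quant.onnx", providers=providers)  # hypothetical path
print(session.get_providers())  # shows which providers are actually in use
```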