Feature Support Matrix
Quantization Techniques - Windows
Quantization Format | Details | Supported Model Formats | Deployment
---|---|---|---
W4A16 (INT4 Weights Only) | | PyTorch*, ONNX | 
W4A8 (INT4 Weights, FP8 Activations) | | PyTorch* | 
FP8 | | PyTorch*, ONNX* | 
INT8 | | PyTorch*, ONNX* | 
Note: Features marked with an asterisk (*) are considered experimental.
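W4A16 compresses only the weights to 4-bit integers, typically with one shared scale per small group of values, while activations stay in 16-bit. The table does not specify the exact scheme, so the following is a minimal NumPy sketch of symmetric per-group INT4 weight quantization; the function names and the group size of 128 are illustrative, not a library API:

```python
import numpy as np

def quantize_w4a16(weights, group_size=128):
    """Symmetric per-group INT4 quantization of a 2-D weight matrix.

    Each contiguous group of `group_size` values along the last axis
    shares one FP16 scale; quantized values live in [-8, 7].
    """
    rows, cols = weights.shape
    assert cols % group_size == 0
    w = weights.reshape(rows, cols // group_size, group_size)
    # One scale per group: map the group's max magnitude onto the INT4 range.
    scales = np.abs(w).max(axis=-1, keepdims=True) / 7.0
    scales = np.maximum(scales, 1e-8)  # avoid division by zero
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q.reshape(rows, cols), scales.astype(np.float16)

def dequantize(q, scales, group_size=128):
    """Reconstruct approximate FP32 weights from INT4 codes and scales."""
    rows, cols = q.shape
    qg = q.reshape(rows, cols // group_size, group_size).astype(np.float32)
    return (qg * scales.astype(np.float32)).reshape(rows, cols)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256)).astype(np.float32)
q, s = quantize_w4a16(w)
w_hat = dequantize(q, s)
err = float(np.abs(w - w_hat).max())  # worst-case per-element error
```

The reconstruction error per element is bounded by roughly half the group's quantization step, which is why smaller groups trade more scale storage for better accuracy.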
Supported Models - Windows
Model | ONNX INT4 AWQ
---|---
Llama3.1-8B-Instruct | Yes
Phi3.5-mini-Instruct | Yes
Mistral-7B-Instruct-v0.3 | Yes
Llama3.2-3B-Instruct | Yes
Gemma-2b-it | Yes
Nemotron Mini 4B Instruct | Yes
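AWQ (Activation-aware Weight Quantization), the method behind the "ONNX INT4 AWQ" column, is built on the observation that a small fraction of weight channels matter disproportionately because they meet large activations; scaling those channels up before quantization (and folding the inverse scale into the input) reduces their quantization error. The toy NumPy sketch below shows only the core scale search on one linear layer; it is a hypothetical illustration, not the actual ONNX export pipeline used for the models above:

```python
import numpy as np

def int4_roundtrip(w):
    """Symmetric per-tensor INT4 quantize -> dequantize."""
    scale = max(float(np.abs(w).max()) / 7.0, 1e-8)
    return np.clip(np.round(w / scale), -8, 7) * scale

def awq_scale_search(w, x, grid=20):
    """Grid-search a per-input-channel scale s so that (x / s) @ Q(w * s)
    best matches the full-precision output x @ w.

    w: (in_features, out_features) weights,
    x: (n_calib, in_features) calibration activations.
    """
    act_mag = np.abs(x).mean(axis=0) + 1e-8  # per-channel activation saliency
    ref = x @ w                               # full-precision reference output
    best = (np.inf, np.ones(w.shape[0]))
    for alpha in np.linspace(0.0, 1.0, grid):  # alpha=0 means "no scaling"
        s = act_mag ** alpha
        s = s / s.mean()                       # keep scale magnitudes sane
        out = (x / s) @ int4_roundtrip(w * s[:, None])
        err = float(np.mean((out - ref) ** 2))
        if err < best[0]:
            best = (err, s)
    return best

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal((16, 64)).astype(np.float32)
x[:, :4] *= 10.0  # make a few channels "salient" via large activations
best_err, best_s = awq_scale_search(w, x)
plain_err = float(np.mean((x @ int4_roundtrip(w) - x @ w) ** 2))
```

Because `alpha = 0` reproduces plain quantization, the searched error can never be worse than quantizing the weights unscaled, and it is typically much better when activation magnitudes are skewed.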