Feature Support Matrix

Quantization Techniques - Windows

| Quantization Format | Details | Supported Model Formats | Deployment |
|---|---|---|---|
| W4A16 (INT4 weights only) | Block-wise INT4 weights, FP16 activations; uses the AWQ algorithm; GPUs: Ampere and later | PyTorch*, ONNX | ORT-DirectML, TensorRT*, TensorRT-LLM* |
| W4A8 (INT4 weights, FP8 activations) | Block-wise INT4 weights, per-tensor FP8 activations; uses the AWQ algorithm; GPUs: Ada and later | PyTorch* | TensorRT-LLM* |
| FP8 | Per-tensor FP8 weights and activations; GPUs: Ada and later | PyTorch*, ONNX* | TensorRT*, TensorRT-LLM* |
| INT8 | Per-channel INT8 weights, per-tensor INT8 activations; uses the SmoothQuant algorithm; GPUs: Ada and later | PyTorch*, ONNX* | TensorRT*, TensorRT-LLM* |
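To make the INT8 row concrete, here is a minimal NumPy sketch of the core SmoothQuant idea: a per-channel scale migrates activation outliers into the weights so that per-tensor INT8 activation quantization loses less information, while the matrix product is mathematically unchanged. The function name, `alpha = 0.5`, and the shapes are illustrative assumptions, not any library's actual API.

```python
# Sketch of SmoothQuant-style scale migration (illustrative, not a library API).
# For Y = X @ W, pick per-channel scales s so that (X / s) @ (diag(s) @ W) == Y:
# channels with activation outliers get a large s, shrinking the activation
# range at the cost of a slightly harder-to-quantize weight matrix.
import numpy as np

def smoothquant_scales(x_absmax, w_absmax, alpha=0.5):
    """Per-channel migration scales s_j = x_max_j**alpha / w_max_j**(1 - alpha)."""
    return (x_absmax ** alpha) / (w_absmax ** (1 - alpha))

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8))   # activations: (tokens, channels)
w = rng.standard_normal((8, 3))   # weights: (channels, out_features)
x[:, 0] *= 50.0                   # simulate one outlier activation channel

s = smoothquant_scales(np.abs(x).max(axis=0), np.abs(w).max(axis=1))
x_s = x / s            # smoothed activations, easier to quantize per-tensor
w_s = w * s[:, None]   # compensated weights, absorb the migrated scale
```

After migration, `x_s @ w_s` equals `x @ w` exactly (up to floating-point rounding), but the largest activation magnitude is much smaller, so a single per-tensor INT8 scale covers the activations with less clipping.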

Note: Features marked with an asterisk (*) are considered experimental.
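The W4A16 row above (block-wise INT4 weights with FP16 activations) can be sketched as follows. This is a self-contained NumPy illustration of the block-wise scheme only; the block size of 128, the symmetric rounding, and the function names are assumptions for the example, not the toolkit's implementation or the AWQ algorithm itself.

```python
# Illustrative block-wise INT4 weight-only quantization (the W4A16 idea):
# weights are split into fixed-size blocks, each block stores one scale,
# and every weight is stored as a 4-bit integer.
import numpy as np

def quantize_blockwise_int4(w: np.ndarray, block_size: int = 128):
    """Symmetric block-wise INT4 quantization of a flat weight array."""
    assert w.size % block_size == 0
    blocks = w.reshape(-1, block_size)
    # One scale per block: map the block's max magnitude onto the INT4 range.
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # guard all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_blockwise_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate FP weights; at runtime this feeds FP16 activations."""
    return (q * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_blockwise_int4(w)
w_hat = dequantize_blockwise_int4(q, s)
# Per-element error is bounded by half a quantization step of that block.
```

Because each block carries its own scale, outliers in one block do not inflate the quantization step for the rest of the tensor, which is why block-wise schemes tolerate 4-bit weights far better than a single per-tensor scale would.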

Supported Models - Windows

| Model | ONNX INT4 AWQ |
|---|---|
| Llama3.1-8B-Instruct | Yes |
| Phi3.5-mini-Instruct | Yes |
| Mistral-7B-Instruct-v0.3 | Yes |
| Llama3.2-3B-Instruct | Yes |
| Gemma-2b-it | Yes |
| Nemotron Mini 4B Instruct | Yes |