.. _Support_Matrix:

==============
Support Matrix
==============

Feature Support Matrix
======================

.. tab:: Linux

    .. list-table::
        :widths: 20 40 20 20
        :header-rows: 1
        :stub-columns: 1

        * - Quantization Format
          - Details
          - Supported Model Formats
          - Deployment
        * - FP8
          - * Per-Tensor FP8 Weights & Activations
            * GPUs: Ada and Later
          - PyTorch, ONNX*
          - TensorRT*, TensorRT-LLM
        * - INT8
          - * Per-Channel INT8 Weights, Per-Tensor INT8 Activations
            * Uses SmoothQuant Algorithm
            * GPUs: Ampere and Later
          - PyTorch, ONNX*
          - TensorRT*, TensorRT-LLM
        * - W4A16 (INT4 Weights Only)
          - * Block-wise INT4 Weights, FP16 Activations
            * Uses AWQ Algorithm
            * GPUs: Ampere and Later
          - PyTorch, ONNX
          - TensorRT, TensorRT-LLM
        * - W4A8 (INT4 Weights, FP8 Activations)
          - * Block-wise INT4 Weights, Per-Tensor FP8 Activations
            * Uses AWQ Algorithm
            * GPUs: Ada and Later
          - PyTorch*, ONNX*
          - TensorRT-LLM

.. tab:: Windows

    .. list-table::
        :widths: 20 40 20 20
        :header-rows: 1
        :stub-columns: 1

        * - Quantization Format
          - Details
          - Supported Model Formats
          - Deployment
        * - W4A16 (INT4 Weights Only)
          - * Block-wise INT4 Weights, FP16 Activations
            * Uses AWQ Algorithm
            * GPUs: Ampere and Later
          - PyTorch*, ONNX
          - ORT-DirectML, TensorRT*, TensorRT-LLM*
        * - W4A8 (INT4 Weights, FP8 Activations)
          - * Block-wise INT4 Weights, Per-Tensor FP8 Activations
            * Uses AWQ Algorithm
            * GPUs: Ada and Later
          - PyTorch*
          - TensorRT-LLM*
        * - FP8
          - * Per-Tensor FP8 Weights & Activations
            * GPUs: Ada and Later
          - PyTorch*, ONNX*
          - TensorRT*, TensorRT-LLM*
        * - INT8
          - * Per-Channel INT8 Weights, Per-Tensor INT8 Activations
            * Uses SmoothQuant Algorithm
            * GPUs: Ada and Later
          - PyTorch*, ONNX*
          - TensorRT*, TensorRT-LLM*

.. note:: Features marked with an asterisk (*) are considered experimental.

Model Support Matrix
====================

.. tab:: Linux

    Please check out the model support matrix `here `_.

.. tab:: Windows

    .. list-table::
        :header-rows: 1

        * - Model
          - ONNX INT4 AWQ
        * - Llama3.1-8B-Instruct
          - Yes
        * - Phi3.5-mini-Instruct
          - Yes
        * - Mistral-7B-Instruct-v0.3
          - Yes
        * - Llama3.2-3B-Instruct
          - Yes
        * - Gemma-2b-it
          - Yes
        * - Nemotron Mini 4B Instruct
          - Yes
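
For a hands-on sense of how the formats in the feature matrix above are applied, the following is a minimal sketch using ModelOpt's PyTorch quantization API. The toy model and calibration data are illustrative assumptions, and the mapping of table rows to the predefined config names (``mtq.FP8_DEFAULT_CFG``, ``mtq.INT8_SMOOTHQUANT_CFG``, ``mtq.INT4_AWQ_CFG``, ``mtq.W4A8_AWQ_BETA_CFG``) is indicative rather than a definitive recipe:

.. code-block:: python

    import torch
    import torch.nn as nn
    import modelopt.torch.quantization as mtq

    # Toy stand-in for a real model; any nn.Module with Linear layers works.
    # (In practice this would be an LLM loaded from a checkpoint.)
    model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))

    # A handful of representative batches for calibration (assumed data).
    calib_data = [torch.randn(8, 256) for _ in range(4)]

    def forward_loop(m):
        # Calibration pass: run the batches so activation ranges
        # (per-tensor for the FP8/INT8 formats above) can be collected.
        for batch in calib_data:
            m(batch)

    # W4A16 row: block-wise INT4 weights via AWQ. Other rows map to other
    # predefined configs, e.g. mtq.FP8_DEFAULT_CFG for FP8 or
    # mtq.INT8_SMOOTHQUANT_CFG for INT8 SmoothQuant.
    model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)

The quantized model can then be exported toward the deployment paths listed in the table (for example, a TensorRT-LLM checkpoint or ONNX graph), subject to the per-format support noted above.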