Model Optimizer Changelog (Windows)
0.27 (2025-04-30)
New Features
New LLM models such as DeepSeek are now supported with ONNX INT4 AWQ quantization on Windows. Refer to the Windows Support Matrix for details about supported features and models.
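A minimal sketch of what an INT4 AWQ quantization call might look like; the module path, function name, parameter values, and file names below are assumptions rather than the verified ModelOpt interface:

    # Hypothetical sketch: INT4 AWQ quantization of an exported LLM ONNX graph.
    # Module path, arguments, and file names are assumptions, not the shipped API.
    import onnx
    from modelopt.onnx.quantization.int4 import quantize  # assumed entry point

    quantized = quantize(
        "deepseek_decoder.onnx",        # FP16 ONNX model exported beforehand
        calibration_method="awq_lite",  # assumed identifier for the AWQ variant
    )
    onnx.save(quantized, "deepseek_decoder.int4.onnx")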
TensorRT Model Optimizer for Windows now supports ONNX INT8 and FP8 (W8A8) quantization of SAM2 and Whisper models. See the example scripts to get started with quantizing these models.
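As a sketch of the W8A8 path, assuming a quantize entry point that takes quantize_mode and calibration_data parameters (unverified; the input tensor name and shape are placeholders):

    # Hypothetical sketch: INT8 (W8A8) quantization of a Whisper encoder.
    # Parameter names and the input tensor name/shape are assumptions.
    import numpy as np
    from modelopt.onnx.quantization import quantize  # assumed entry point

    calibration = {"mel": np.random.randn(4, 80, 3000).astype(np.float32)}

    quantize(
        "whisper_encoder.onnx",
        quantize_mode="int8",            # "fp8" would select FP8 W8A8 instead
        calibration_data=calibration,
        output_path="whisper_encoder.int8.onnx",
    )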
0.19 (2024-11-18)
New Features
This is the first official release of TensorRT Model Optimizer for Windows.
ONNX INT4 Quantization: modelopt.onnx.quantization.quantize_int4 now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See the Support Matrix for details about supported features and models.
LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the example.
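For the Olive path, workflows are typically driven by a JSON config; a minimal sketch assuming Olive's olive.workflows.run entry point and a hypothetical config file describing the quantization pass:

    # Hypothetical sketch: launching an Olive workflow that wraps INT4 quantization.
    # The config file name and its contents are assumptions; see the Olive example
    # referenced above for the actual workflow definition.
    from olive.workflows import run as olive_run

    olive_run("llm_int4_quantization.json")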
DirectML Deployment Guide: Added a DirectML (DML) deployment guide. Refer to the DirectML guide.
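Deployment on DirectML goes through ONNX Runtime's DML execution provider; a minimal sketch, assuming the onnxruntime-directml package is installed (the model path and input tensor name/shape are placeholders):

    # Run a quantized ONNX model on the DirectML execution provider.
    # Model path and input tensor name/shape below are placeholders.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "model.int4.onnx",
        providers=["DmlExecutionProvider"],
    )
    outputs = session.run(None, {"input_ids": np.ones((1, 8), dtype=np.int64)})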
MMLU Benchmark for Accuracy Evaluations: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).
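Conceptually, MMLU scoring picks the multiple-choice option whose answer token receives the highest next-token score; a minimal illustrative sketch of that loop (score_fn and the data layout are assumptions, not the benchmark's actual interface):

    # Illustrative MMLU-style scoring loop; not the actual benchmark code.
    import numpy as np

    def mmlu_accuracy(score_fn, questions):
        """score_fn(prompt) returns next-token logits over the vocabulary;
        questions is a list of (prompt, choice_token_ids, correct_index)."""
        correct = 0
        for prompt, choice_ids, answer in questions:
            logits = score_fn(prompt)
            pred = int(np.argmax([logits[t] for t in choice_ids]))
            correct += pred == answer
        return correct / len(questions)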
Quantized ONNX models collection: Published quantized ONNX models in the NVIDIA collections on Hugging Face.
* This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization and sparsity. These are currently unverified on Windows.