NVIDIA Model Optimizer Changelog (Windows)
0.41 (TBD)
Bug Fixes
Fix ONNX 1.19 compatibility issues with CuPy during ONNX INT4 AWQ quantization. ONNX 1.19 represents INT4 tensors with ml_dtypes.int4 instead of numpy.int8, which caused CuPy failures.
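The mismatch is straightforward to reproduce. A minimal sketch follows; the int8 widening shown is a generic workaround, not necessarily the exact fix shipped in this release:

    import numpy as np
    import ml_dtypes  # registers int4 (among others) as a NumPy extension dtype

    # ONNX 1.19 hands INT4 initializers back as ml_dtypes.int4 arrays,
    # where earlier releases used plain numpy.int8.
    weights = np.array([1, -2, 3], dtype=ml_dtypes.int4)

    # CuPy has no kernels for the int4 extension dtype, so transferring such
    # an array to the GPU fails; widening to int8 on the host avoids it.
    weights_i8 = weights.astype(np.int8)

    # import cupy as cp
    # cp.asarray(weights)     # fails on the ml_dtypes.int4 dtype
    # cp.asarray(weights_i8)  # succeeds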
New Features
Add support for ONNX mixed-precision weight-only quantization using INT4 and INT8 precisions. Refer to the quantization example for GenAI LLMs.
Add support for quantization of select diffusion models on Windows. Refer to the example script for details.
Add Perplexity and KL-Divergence accuracy benchmarks; see the sketch after this list for what each metric measures.
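For reference, a from-scratch sketch of the two metrics these benchmarks report. It illustrates the definitions only and is not the benchmark harness shipped with Model Optimizer:

    import numpy as np

    def perplexity(token_logprobs: np.ndarray) -> float:
        """exp of the mean negative log-likelihood over a token sequence."""
        return float(np.exp(-np.mean(token_logprobs)))

    def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
        """Mean KL(P || Q) between reference (e.g. FP16) and quantized
        next-token distributions, computed per position from raw logits."""
        p = np.exp(p_logits - p_logits.max(axis=-1, keepdims=True))
        p /= p.sum(axis=-1, keepdims=True)
        log_p = np.log(p)
        log_q = q_logits - q_logits.max(axis=-1, keepdims=True)
        log_q = log_q - np.log(np.exp(log_q).sum(axis=-1, keepdims=True))
        return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))

    # Sanity checks: identical logits give KL ~ 0; a uniform 4-way
    # next-token distribution gives perplexity 4.
    logits = np.random.default_rng(0).normal(size=(5, 4))
    assert abs(kl_divergence(logits, logits)) < 1e-9
    assert abs(perplexity(np.full(5, np.log(0.25))) - 4.0) < 1e-9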
0.33 (2025-07-21)
New Features
Model Optimizer for Windows now supports the NvTensorRtRtx execution provider.
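A minimal selection sketch with ONNX Runtime. The exact provider string below is an assumption based on this changelog's name for the EP; confirm it against onnxruntime.get_available_providers() on your build:

    import onnxruntime as ort

    print(ort.get_available_providers())  # look for the TensorRT-for-RTX entry

    session = ort.InferenceSession(
        "model.onnx",
        providers=[
            "NvTensorRtRtxExecutionProvider",  # assumed spelling -- verify above
            "CPUExecutionProvider",            # fallback
        ],
    )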
0.27 (2025-04-30)
New Features
New LLMs such as DeepSeek are now supported with ONNX INT4 AWQ quantization on Windows. Refer to the Windows Support Matrix for details about supported features and models.
Model Optimizer for Windows now supports ONNX INT8 and FP8 quantization (W8A8) of SAM2 and Whisper models. See the example scripts, and the sketch after this list, to get started with quantizing these models.
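As an illustration of the W8A8 flow, a minimal sketch; the entry point and keyword names (quantize, quantize_mode, calibration_data, output_path) are assumptions to verify against the example scripts:

    import numpy as np
    from modelopt.onnx.quantization import quantize

    # Calibration inputs drive activation quantization (W8A8 quantizes
    # activations as well as weights). File name and contents are placeholders.
    calib = np.load("whisper_calib.npy")

    quantize(
        onnx_path="whisper_encoder.onnx",
        quantize_mode="int8",  # or "fp8"
        calibration_data=calib,
        output_path="whisper_encoder.int8.onnx",
    )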
0.19 (2024-11-18)
New Features
This is the first official release of Model Optimizer for Windows.
ONNX INT4 Quantization: modelopt.onnx.quantization.quantize_int4 now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See the Support Matrix for details about supported features and models; a usage sketch appears at the end of this section.
LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the Olive example.
DirectML Deployment Guide: Added a DML deployment guide. Refer to the ONNX Runtime deployment guide for details.
MMLU Benchmark for Accuracy Evaluations: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).
Quantized ONNX Models Collection: Published quantized ONNX models in the NVIDIA collections on Hugging Face.
* This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization and sparsity. These are currently unverified on Windows.
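As a quick start for the quantize_int4 entry point named above, a minimal sketch; the keyword names and the AWQ method string are assumptions drawn from typical usage, so verify them against the API reference:

    import onnx
    from modelopt.onnx.quantization import quantize_int4

    # Weight-only INT4 (AWQ) quantization of an LLM ONNX graph; assumes
    # quantize_int4 returns an onnx.ModelProto.
    quantized = quantize_int4(
        "llm.onnx",
        calibration_method="awq_clip",  # assumed AWQ variant name
        calibration_data_reader=None,   # assumed to fall back to default
                                        # calibration; supply real data for
                                        # best accuracy
    )
    onnx.save(quantized, "llm.int4.onnx")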