Model Optimizer Changelog (Windows)

0.27 (2025-04-30)

New Features

  • Added support for new LLM models such as DeepSeek with ONNX INT4 AWQ quantization on Windows. Refer to the Windows Support Matrix for details about supported features and models.

  • TensorRT Model Optimizer for Windows now supports ONNX INT8 and FP8 (W8A8) quantization of SAM2 and Whisper models. See the example scripts to get started with quantizing these models.
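
A minimal sketch of W8A8 quantization of an ONNX model, assuming a `quantize` entry point with a `quantize_mode` argument. The keyword names, the calibration-data format, and the mode strings are assumptions; consult the example scripts for the supported workflow.

```python
# Hedged sketch: INT8 (W8A8) quantization of an ONNX model with
# TensorRT Model Optimizer for Windows. The exact API signature is an
# assumption; check the modelopt documentation and example scripts.
import numpy as np

def make_calibration_feed(input_name, shape, num_samples=16):
    # Dummy calibration batches; real runs should feed representative data
    # (e.g. audio features for Whisper, images for SAM2).
    return {input_name: np.random.rand(num_samples, *shape).astype(np.float32)}

def quantize_w8a8(onnx_path, output_path, mode="int8"):
    # Import deferred so the helper above stays usable without modelopt.
    from modelopt.onnx.quantization import quantize  # assumed entry point
    quantize(
        onnx_path=onnx_path,
        quantize_mode=mode,  # "int8" or "fp8" (assumption)
        calibration_data=make_calibration_feed("input", (3, 1024, 1024)),
        output_path=output_path,
    )
```

Replace the dummy feed with a representative calibration set; quantization accuracy depends heavily on calibration data.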

0.19 (2024-11-18)

New Features

  • This is the first official release of TensorRT Model Optimizer for Windows.

  • ONNX INT4 Quantization: modelopt.onnx.quantization.quantize_int4 now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See the Support Matrix for details about supported features and models.

  • LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the example.

  • DirectML Deployment Guide: Added a DirectML (DML) deployment guide. Refer to DirectML.

  • MMLU Benchmark for Accuracy Evaluations: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).

  • Quantized ONNX Models Collection: Published quantized ONNX models in the NVIDIA collections on Hugging Face.

* This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization, and sparsity. These are currently unverified on Windows.
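
The `quantize_int4` entry point noted above can be sketched roughly as follows. The keyword names, the AWQ method string, and the calibration-data shape are assumptions; check the API reference before use.

```python
# Hedged sketch: ONNX INT4 AWQ quantization of an LLM for DirectML/TensorRT
# deployment. Keyword names and the AWQ method string are assumptions.
import numpy as np

def make_llm_calib_data(num_samples=32, seq_len=512, vocab_size=32000):
    # Dummy token batches for calibration; real runs should use
    # representative prompts tokenized with the model's own tokenizer.
    return [
        {"input_ids": np.random.randint(0, vocab_size, (1, seq_len), dtype=np.int64)}
        for _ in range(num_samples)
    ]

def quantize_llm_int4(onnx_path, output_path):
    # Import deferred so the helper above works without modelopt installed.
    from modelopt.onnx.quantization import quantize_int4  # named in this changelog
    quantize_int4(
        onnx_path=onnx_path,
        calibration_method="awq_lite",            # AWQ variant (assumption)
        calibration_data=make_llm_calib_data(),   # representative tokens in practice
        output_path=output_path,                  # INT4 model for DML/TRT deployment
    )
```

The resulting INT4 model targets DirectML deployment; TensorRT deployment of INT4 ONNX models is experimental per the footnote above.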