Model Optimizer Changelog (Windows)

0.19 (2024-11-18)

New Features

  • This is the first official release of TensorRT Model Optimizer for Windows.

  • ONNX INT4 Quantization: modelopt.onnx.quantization.quantize_int4 now supports INT4 quantization of ONNX models for DirectML and TensorRT* deployment. See the Support Matrix for details on supported features and models.

  • LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the example.

  • DirectML Deployment Guide: Added a DirectML (DML) deployment guide. Refer to DirectML Deployment.

  • MMLU Benchmark for Accuracy Evaluation: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).

  • Quantized ONNX Models Collection: Published quantized ONNX models in the NVIDIA collections on Hugging Face.

* This release includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization, and sparsity; these are currently unverified on Windows.