Model Optimizer Changelog (Windows)
0.19 (2024-11-18)
New Features
This is the first official release of TensorRT Model Optimizer for Windows.
- ONNX INT4 Quantization: modelopt.onnx.quantization.quantize_int4 now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See the Feature Support Matrix for details about supported features and models; a usage sketch follows at the end of this section.
- LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the Olive example.
- DirectML Deployment Guide: Added a DirectML (DML) deployment guide. Refer to DirectML Deployment.
- MMLU Benchmark for Accuracy Evaluations: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).
- Quantized ONNX model collection: Published quantized ONNX models in the NVIDIA collections on Hugging Face.
* This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization, and sparsity. These are currently unverified on Windows.
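
Below is a minimal sketch of the new INT4 quantization entry point. Only the function name, modelopt.onnx.quantization.quantize_int4, comes from this release; the keyword argument names, the calibration method string, the calibration data format, and the file names are assumptions for illustration, so consult the Model Optimizer API reference for the exact signature.

```python
# Minimal sketch (assumptions noted): quantize an ONNX LLM to INT4.
import numpy as np
import onnx

from modelopt.onnx.quantization import quantize_int4

# Assumed calibration format: an iterable of input-name -> array dicts
# matching the ONNX model's graph inputs.
calib_inputs = [{"input_ids": np.ones((1, 128), dtype=np.int64)}]

quantized_model = quantize_int4(
    "model.onnx",                          # path to the FP16/FP32 source model
    calibration_method="awq_lite",         # assumed method identifier
    calibration_data_reader=calib_inputs,  # assumed parameter name
)

# Assumed to return an onnx.ModelProto; save it for DirectML/TensorRT deployment.
onnx.save(quantized_model, "model_int4.onnx")
```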