Install ModelOpt-Windows with Olive
ModelOpt-Windows can be installed and used through Olive to perform model optimization using quantization techniques. Follow the steps below to configure Olive for use with ModelOpt-Windows.
Setup Steps for Olive with ModelOpt-Windows
1. Installation
Install Olive and the Model Optimizer: Run the following command to install Olive with NVIDIA Model Optimizer - Windows:

pip install olive-ai[nvmo]

Install Prerequisites: Ensure all required dependencies are installed. For example, to install the CUDA Execution-Provider (EP) based onnxruntime and onnxruntime-genai packages, run the following commands:

$ pip install onnxruntime-genai-cuda
$ pip install onnxruntime-gpu

The above onnxruntime and onnxruntime-genai packages enable the Olive workflow with the CUDA Execution Provider (EP). To use other EPs, install the corresponding packages.
Refer to the ONNX Runtime’s EP documentation for details about different EPs, their requirements, and installation instructions.
Additionally, ensure that the dependencies for Model Optimizer - Windows are met, as described in Install ModelOpt-Windows as a Standalone Toolkit.
2. Configure Olive for Model Optimizer – Windows
New Olive Pass: Olive introduces a new pass, NVModelOptQuantization (or "nvmo"), specifically designed for model quantization using Model Optimizer - Windows.

Add to Configuration: To apply quantization to your target model, include this pass in the Olive configuration file. Refer to this guide for details about this pass.
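As an illustration, the pass could appear in an Olive configuration file roughly as sketched below. The pass name comes from this guide; the option names and values (for example "precision" and "algorithm") are assumptions for illustration only and should be checked against the pass documentation:

```
{
  "passes": {
    "quantize": {
      "type": "NVModelOptQuantization",
      "precision": "int4",
      "algorithm": "awq"
    }
  }
}
```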
3. Setup Other Passes in Olive Configuration
Add Other Passes: Add additional passes to the Olive configuration file as needed for the desired Olive workflow of your input model.
4. Install other dependencies
Install other requirements as needed by the Olive scripts and config.
5. Run the Optimization
Execute Optimization: To start the optimization process, run the following commands:
$ olive run --config <config json> --setup
$ olive run --config <config json>

Alternatively, you can execute the optimization using the following Python code:

from olive.workflows import run as olive_run

olive_run("config.json")
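Putting the pieces together, the workflow can also be driven end to end from Python: assemble the configuration, write it to disk, and hand it to olive_run. This is a minimal sketch; the input-model fields and values below are illustrative placeholders, not a definitive schema:

```python
import json

# Minimal sketch of an Olive config using the "nvmo" quantization pass.
# The input_model fields and the pass options shown here are placeholders;
# consult the Olive and ModelOpt-Windows documentation for the real schema.
config = {
    "input_model": {"type": "HfModel", "model_path": "model-dir"},
    "passes": {
        "quantize": {"type": "NVModelOptQuantization"},
    },
}

# Write the config so it can be passed to the Olive CLI or olive_run.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# Running the workflow (requires olive-ai[nvmo] and its dependencies):
# from olive.workflows import run as olive_run
# olive_run("config.json")
```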
Note:
Currently, Model Optimizer - Windows supports only ONNX Runtime GenAI based LLM models in the Olive workflow.
To get started with Olive, refer to the official Olive documentation.