Install ModelOpt-Windows with Olive
ModelOpt-Windows can be installed and used through Olive to quantize Large Language Models (LLMs) in ONNX format for deployment with DirectML. Follow the steps below to configure Olive for use with ModelOpt-Windows.
Setup Steps for Olive with ModelOpt-Windows
1. Installation
Install Olive and the Model Optimizer: Run the following command to install Olive with NVIDIA TensorRT Model Optimizer - Windows:
$ pip install olive-ai[nvmo]

Install Prerequisites: Ensure all required dependencies are installed. Use the following commands to install the necessary packages:
$ pip install "onnxruntime-genai-directml>=0.4.0"
$ pip install onnxruntime-directml==1.20.0

Additionally, ensure that the dependencies of TensorRT Model Optimizer - Windows are met, as described in Install ModelOpt-Windows as a Standalone Toolkit.
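To verify the environment before running Olive, a quick check such as the following can be run in Python. It uses only the packages installed above, and the provider name "DmlExecutionProvider" is how ONNX Runtime reports DirectML support:

# Optional sanity check: confirm the DirectML packages are visible to Python
from importlib.metadata import version

import onnxruntime as ort

# "DmlExecutionProvider" is the provider name ONNX Runtime uses for DirectML
print("DirectML available:", "DmlExecutionProvider" in ort.get_available_providers())
print("onnxruntime-directml:", version("onnxruntime-directml"))
print("onnxruntime-genai-directml:", version("onnxruntime-genai-directml"))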
2. Configure Olive for TensorRT Model Optimizer – Windows
New Olive Pass: Olive introduces a new pass, NVModelOptQuantization (or "nvmo"), designed specifically for model quantization with TensorRT Model Optimizer - Windows.

Add to Configuration: To apply quantization to your target model, include this pass in the Olive configuration file, as sketched below. [Refer to the phi3 Olive example.]
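For orientation, the fragment below sketches how the pass might appear in the "passes" section of the configuration file. The pass label "quantize" is arbitrary, and the available quantization options (algorithm, precision, calibration data, and so on) depend on your Olive release, so mirror the phi3 Olive example for the exact fields:

"passes": {
    "quantize": {
        "type": "NVModelOptQuantization"
    }
}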
3. Setup Other Passes in Olive Configuration
Add Other Passes: Include any additional passes in the Olive configuration file as needed for the desired Olive workflow for your input model, as illustrated below. [Refer to the phi3 Olive example.]
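As an illustration, a typical workflow chains the quantization pass with other passes, for example a model-building step that produces the ONNX Runtime GenAI model before it is quantized. The pass labels, the "ModelBuilder" pass type, and the "precision" setting below are assumptions for illustration only; follow the passes used in the phi3 Olive example for your model:

"passes": {
    "builder": {
        "type": "ModelBuilder",
        "precision": "fp16"
    },
    "quantize": {
        "type": "NVModelOptQuantization"
    }
}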
4. Run the Optimization
Execute Optimization: To start the optimization process, run the following commands:
$ olive run --config <config json> --setup
$ olive run --config <config json>

Alternatively, you can execute the optimization using the following Python code:
from olive.workflows import run as olive_run

olive_run("config.json")
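In recent Olive releases, olive_run also accepts the configuration as a Python dictionary (treat this as an assumption to verify against your installed version), which is convenient when settings need to be adjusted programmatically. A rough sketch, with config.json standing in for your actual configuration file:

import json

from olive.workflows import run as olive_run

# Load the workflow configuration prepared in the previous steps
with open("config.json") as f:
    config = json.load(f)

# Fields such as output paths or pass options can be adjusted here
# before launching the workflow
olive_run(config)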
Note:
Currently, TensorRT Model Optimizer - Windows supports only ONNX Runtime GenAI based models in the Olive workflow.