.. _Install-Page-Standalone-Windows:
================================================
Install ModelOpt-Windows as a Standalone Toolkit
================================================
The Model Optimizer - Windows (ModelOpt-Windows) can be installed as a standalone toolkit for quantizing ONNX models. Below are the setup steps:
**1. Setup Prerequisites**
Before using ModelOpt-Windows, the following components must be installed:
- NVIDIA GPU and Graphics Driver
- Python version >= 3.10 and < 3.13
- Visual Studio 2022 / MSVC / C/C++ Build Tools
- CUDA Toolkit, CuDNN for using CUDA path during calibration (e.g. for calibration of ONNX models using `onnxruntime-gpu` or CUDA EP)
Update ``PATH`` environment variable as needed for above prerequisites.
**2. Setup Virtual Environment (Optional but Recommended)**
It is recommended to use a virtual environment for managing Python dependencies. Tools such as *conda* or Python's built-in *venv* module can help create and activate a virtual environment. Example steps for using Python's *venv* module:
.. code-block:: shell
$ mkdir myEnv
$ python -m venv .\myEnv
$ .\myEnv\Scripts\activate
In the newly created virtual environment, none of the required packages (e.g., onnx, onnxruntime, onnxruntime-directml, onnxruntime-gpu, nvidia-modelopt etc.) will be pre-installed.
**3. Install ModelOpt-Windows Wheel**
To install the ONNX module of ModelOpt-Windows, run the following command:
.. code-block:: bash
pip install "nvidia-modelopt[onnx]"
If you install ModelOpt-Windows without the extra ``[onnx]`` option, only the minimal core dependencies and the PyTorch module (``torch``) will be installed. Support for ONNX model quantization requires installing with ``[onnx]``.
**4. ONNX Model Quantization: Setup ONNX Runtime Execution Provider for Calibration**
The Post-Training Quantization (PTQ) process for ONNX models usually involves running the base model with user-supplied inputs, a process called calibration. The user-supplied model inputs are referred to as calibration data. To perform calibration, the base model must be run using a suitable ONNX Execution Provider (EP), such as *DmlExecutionProvider* (DirectML EP) or *CUDAExecutionProvider* (CUDA EP). There are different ONNX Runtime packages for each EP:
- *onnxruntime-directml* provides the DirectML EP.
- *onnxruntime-trt-rtx* provides TensorRT-RTX EP.
- *onnxruntime-gpu* provides the CUDA EP.
- *onnxruntime* provides the CPU EP.
By default, ModelOpt-Windows installs *onnxruntime-gpu*. The default CUDA version needed for *onnxruntime-gpu* since v1.19.0 is 12.x. The *onnxruntime-gpu* package (i.e. CUDA EP) has CUDA and cuDNN dependencies:
- Install CUDA and cuDNN:
- For the ONNX Runtime GPU package, you need to install the appropriate version of CUDA and cuDNN. Refer to the `CUDA Execution Provider requirements `_ for compatible versions of CUDA and cuDNN.
If you need to use any other EP for calibration, you can uninstall the existing *onnxruntime-gpu* package and install the corresponding package. For example, to use the DirectML EP, you can uninstall the existing *onnxruntime-gpu* package and install the *onnxruntime-directml* package:
.. code-block:: bash
pip uninstall onnxruntime-gpu
pip install onnxruntime-directml
**5. Setup GPU Acceleration Tool for Quantization**
By default, ModelOpt-Windows utilizes the `cupy-cuda12x `_ tool for GPU acceleration during the INT4 ONNX quantization process. This is compatible with CUDA 12.x.
**6. Verify Installation**
Ensure the following steps are verified:
- **Task Manager**: Check that the GPU appears in the Task Manager, indicating that the graphics driver is installed and functioning.
- **Python Interpreter**: Open the command line and type python. The Python interpreter should start, displaying the Python version.
- **Onnxruntime Package**: Ensure that exactly one of the following is installed:
- *onnxruntime-directml* (DirectML EP)
- *onnxruntime-trt-rtx* (TensorRT-RTX EP)
- *onnxruntime-gpu* (CUDA EP)
- *onnxruntime* (CPU EP)
- **Onnx and Onnxruntime Import**: Ensure that following python command runs successfully.
.. code-block:: python
python -c "import onnx; import onnxruntime"
- **Environment Variables**: For workflows using CUDA dependencies (e.g., CUDA EP-based calibration), ensure environment variables like *CUDA_PATH*, *CUDA_V12_4*, or *CUDA_V11_8* etc. are set correctly. Reopen the command-prompt if any environment variable is updated or newly created.
- **ModelOpt-Windows Import Check**: Run the following command to ensure the installation is successful:
.. code-block:: python
python -c "import modelopt.onnx.quantization"
- If you encounter any difficulties during the installation process, please refer :ref:`FAQ_ModelOpt_Windows` FAQs for potential solutions and additional guidance.