.. _Install-Page-Standalone-Windows:
================================================
Install ModelOpt-Windows as a Standalone Toolkit
================================================
The TensorRT Model Optimizer - Windows (ModelOpt-Windows) can be installed as a standalone toolkit for quantizing Large Language Models (LLMs). Below are the setup steps:
**1. Setup Prerequisites**
Before using ModelOpt-Windows, the following components must be installed:
- NVIDIA GPU and Graphics Driver
- Python version >= 3.10 and < 3.13
- Visual Studio 2022 / MSVC / C/C++ Build Tools
Update ``PATH`` environment variable as needed for above prerequisites.
**2. Setup Virtual Environment (Optional but Recommended)**
It is recommended to use a virtual environment for managing Python dependencies. Tools such as *conda* or Python's built-in *venv* module can help create and activate a virtual environment. Example steps for using Python's *venv* module:
.. code-block:: shell
$ mkdir myEnv
$ python -m venv .\myEnv
$ .\myEnv\Scripts\activate
In the newly created virtual environment, none of the required packages (e.g., onnx, onnxruntime, onnxruntime-directml, onnxruntime-gpu, nvidia-modelopt) will be pre-installed.
**3. Install ModelOpt-Windows Wheel**
To install the ModelOpt-Windows wheel, run the following command:
.. code-block:: bash
pip install "nvidia-modelopt[onnx]" --extra-index-url https://pypi.nvidia.com
This command installs ModelOpt-Windows and its ONNX module, along with the *onnxruntime-directml* (v1.20.0) package. If ModelOpt-Windows is installed without the additional parameter, only the bare minimum dependencies will be installed, without the relevant module and dependencies.
**4. Setup ONNX Runtime (ORT) for Calibration**
The ONNX Post-Training Quantization (PTQ) process involves running the base model with user-supplied inputs, a process called calibration. The user-supplied model inputs are referred to as calibration data. To perform calibration, the base model must be run using a suitable ONNX Execution Provider (EP), such as *DmlExecutionProvider* (DirectML EP) or *CudaExecutionProvider* (CUDA EP). There are different ONNX Runtime packages for each EP:
- *onnxruntime-directml* provides the DirectML EP.
- *onnxruntime-gpu* provides the CUDA EP.
- *onnxruntime* provides the CPU EP.
By default, ModelOpt-Windows installs *onnxruntime-directml* and uses the DirectML EP (v1.20.0) for calibration. No additional dependencies are required.
If you prefer to use the CUDA EP for calibration, uninstall the existing *onnxruntime-directml* package and install the *onnxruntime-gpu* package, which requires CUDA and cuDNN dependencies:
- Uninstall *onnxruntime-directml*:
.. code-block:: bash
pip uninstall onnxruntime-directml
- Install CUDA and cuDNN:
- For the ONNX Runtime GPU package, you need to install the appropriate version of CUDA and cuDNN. Refer to the `CUDA Execution Provider requirements `_ for compatible versions of CUDA and cuDNN.
- Install ONNX Runtime GPU (CUDA 12.x):
.. code-block:: bash
pip install onnxruntime-gpu
- The default CUDA version for *onnxruntime-gpu* since v1.19.0 is 12.x.
**5. Setup GPU Acceleration Tool for Quantization**
ModelOpt-Windows utilizes the `cupy-cuda12x `_ tool for GPU acceleration during the INT4 ONNX quantization process if you have CUDA 12.x.
**6. Verify Installation**
Ensure the following steps are verified:
- **Task Manager**: Check that the GPU appears in the Task Manager, indicating that the graphics driver is installed and functioning.
- **Python Interpreter**: Open the command line and type python. The Python interpreter should start, displaying the Python version.
- **Onnxruntime Package**: Ensure that one of the following is installed:
- *onnxruntime-directml* (DirectML EP)
- *onnxruntime-gpu* (CUDA EP)
- *onnxruntime* (CPU EP)
- **Environment Variables**: For workflows using CUDA dependencies (e.g., CUDA EP-based calibration), ensure environment variables like *CUDA_PATH*, *CUDA_V12_4*, or *CUDA_V11_8* etc. are set correctly. Reopen the command-prompt if any environment variable is updated or newly created.
- **ModelOpt-Windows Import Check**: Run the following command to ensure the installation is successful:
.. code-block:: python
python -c "import modelopt.onnx.quantization"
- If you encounter any difficulties during the installation process, please refer :ref:`FAQ_ModelOpt_Windows` FAQs for potential solutions and additional guidance.