Install ModelOpt-Windows as a Standalone Toolkit

The TensorRT Model Optimizer - Windows (ModelOpt-Windows) can be installed as a standalone toolkit for quantizing Large Language Models (LLMs). Below are the setup steps:

1. Setup Prerequisites

Before using ModelOpt-Windows, the following components must be installed:

  • NVIDIA GPU and Graphics Driver

  • Python version >= 3.10 and < 3.13

  • Visual Studio 2022 / MSVC / C/C++ Build Tools

Update PATH environment variable as needed for above prerequisites.

2. Setup Virtual Environment (Optional but Recommended)

It is recommended to use a virtual environment for managing Python dependencies. Tools such as conda or Python’s built-in venv module can help create and activate a virtual environment. Example steps for using Python’s venv module:

$ mkdir myEnv
$ python -m venv .\myEnv
$ .\myEnv\Scripts\activate

In the newly created virtual environment, none of the required packages (e.g., onnx, onnxruntime, onnxruntime-directml, onnxruntime-gpu, nvidia-modelopt) will be pre-installed.

3. Install ModelOpt-Windows Wheel

To install the ModelOpt-Windows wheel, run the following command:

pip install "nvidia-modelopt[onnx]" --extra-index-url https://pypi.nvidia.com

This command installs ModelOpt-Windows and its ONNX module, along with the onnxruntime-directml (v1.20.0) package. If ModelOpt-Windows is installed without the additional parameter, only the bare minimum dependencies will be installed, without the relevant module and dependencies.

4. Setup ONNX Runtime (ORT) for Calibration

The ONNX Post-Training Quantization (PTQ) process involves running the base model with user-supplied inputs, a process called calibration. The user-supplied model inputs are referred to as calibration data. To perform calibration, the base model must be run using a suitable ONNX Execution Provider (EP), such as DmlExecutionProvider (DirectML EP) or CudaExecutionProvider (CUDA EP). There are different ONNX Runtime packages for each EP:

  • onnxruntime-directml provides the DirectML EP.

  • onnxruntime-gpu provides the CUDA EP.

  • onnxruntime provides the CPU EP.

By default, ModelOpt-Windows installs onnxruntime-directml and uses the DirectML EP (v1.20.0) for calibration. No additional dependencies are required. If you prefer to use the CUDA EP for calibration, uninstall the existing onnxruntime-directml package and install the onnxruntime-gpu package, which requires CUDA and cuDNN dependencies:

  • Uninstall onnxruntime-directml:

    pip uninstall onnxruntime-directml
    
  • Install CUDA and cuDNN:
    • For the ONNX Runtime GPU package, you need to install the appropriate version of CUDA and cuDNN. Refer to the CUDA Execution Provider requirements for compatible versions of CUDA and cuDNN.

  • Install ONNX Runtime GPU (CUDA 12.x):

    pip install onnxruntime-gpu
    
    • The default CUDA version for onnxruntime-gpu since v1.19.0 is 12.x.

5. Setup GPU Acceleration Tool for Quantization

ModelOpt-Windows utilizes the cupy tool for GPU acceleration during the ONNX quantization process. There are different cupy packages for CUDA 11.x and CUDA 12.x. Install the appropriate version of cupy based on your CUDA setup. For installation and troubleshooting, refer to the cupy documentation.

6. Verify Installation

Ensure the following steps are verified:
  • Task Manager: Check that the GPU appears in the Task Manager, indicating that the graphics driver is installed and functioning.

  • Python Interpreter: Open the command line and type python. The Python interpreter should start, displaying the Python version.

  • Onnxruntime Package: Ensure that one of the following is installed:
    • onnxruntime-directml (DirectML EP)

    • onnxruntime-gpu (CUDA EP)

    • onnxruntime (CPU EP)

  • Environment Variables: For workflows using CUDA dependencies (e.g., CUDA EP-based calibration), ensure environment variables like CUDA_PATH, CUDA_V12_4, or CUDA_V11_8 etc. are set correctly. Reopen the command-prompt if any environment variable is updated or newly created.

  • ModelOpt-Windows Import Check: Run the following command to ensure the installation is successful:

    python -c "import modelopt.onnx.quantization"
    
  • If you encounter any difficulties during the installation process, please refer ModelOpt-Windows FAQs for potential solutions and additional guidance.