Installation for Linux

System requirements

The latest Model Optimizer (nvidia-modelopt) release has the following system requirements:

OS                        Linux
Architecture              x86_64, aarch64 (SBSA)
Python                    >=3.8,<3.13
CUDA                      >=11.8 (Recommended 12.x)
PyTorch (Optional)        >=2.0
ONNX Runtime (Optional)   1.18
TensorRT-LLM (Optional)   0.15
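The Python, OS, and architecture requirements listed above can be checked from a script before installing anything. The following is a minimal sketch using only the standard library; the function name check_host and its parameters are hypothetical, and it covers neither the CUDA version nor the optional packages. It also cannot tell an SBSA aarch64 system apart from other aarch64 systems.

```python
import platform
import sys

# Bounds copied from the system requirements listed above.
MIN_PYTHON = (3, 8)
MAX_PYTHON_EXCLUSIVE = (3, 13)
SUPPORTED_ARCHS = {"x86_64", "aarch64"}


def check_host(version=None, system=None, machine=None):
    """Return a list of problems found; an empty list means none were found.

    The arguments default to the running interpreter and host, and exist
    only so the check can be exercised with other values.
    """
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()

    problems = []
    if not (MIN_PYTHON <= version < MAX_PYTHON_EXCLUSIVE):
        problems.append(f"Python {version[0]}.{version[1]} is outside >=3.8,<3.13")
    if system != "Linux":
        problems.append(f"OS is {system}, expected Linux")
    if machine not in SUPPORTED_ARCHS:
        problems.append(f"architecture is {machine}, expected x86_64 or aarch64")
    return problems


if __name__ == "__main__":
    issues = check_host()
    print("No issues found" if not issues else "\n".join(issues))
```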

Environment setup

Using ModelOpt’s docker image

The easiest way to get started with Model Optimizer and its additional dependencies (e.g. for TensorRT-LLM deployment) is to start from our docker image.

After installing the NVIDIA Container Toolkit, run the following commands to build the Model Optimizer docker container, which has all the dependencies needed to run the examples pre-installed.

# Clone the ModelOpt repository
git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer

# Build the docker (will be tagged `docker.io/library/modelopt_examples:latest`)
# You may customize `docker/Dockerfile` to include or exclude certain dependencies you may or may not need.
bash docker/build.sh

# Run the docker image
docker run --gpus all -it --shm-size 20g --rm docker.io/library/modelopt_examples:latest bash

# Check installation (inside the docker container)
python -c "import modelopt; print(modelopt.__version__)"

Using alternative NVIDIA docker images

For PyTorch, you can also use the NVIDIA NGC PyTorch container, and for the NVIDIA NeMo framework, you can use the NeMo container. Both containers come with Model Optimizer pre-installed. The NeMo container also comes with the HuggingFace and TensorRT-LLM dependencies. Make sure to update Model Optimizer to the latest version if the container does not already have it.

Setting up a virtual environment

We recommend setting up a virtual environment if you don’t have one already. Run the following command to set up and activate a conda virtual environment named modelopt with Python 3.12:

conda create -n modelopt python=3.12 pip
conda activate modelopt

(Optional) Install desired PyTorch version

By default, the latest PyTorch version (torch>=2.0) available on pip will be installed. If you want a specific PyTorch version built for a specific CUDA version, first follow the PyTorch installation instructions to install that version.
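To see which PyTorch build, if any, is already present in the environment before installing ModelOpt, a short check like the one below can help. This is only a sketch; the helper name describe_torch is hypothetical. It relies on torch.__version__, torch.version.cuda (which is None for CPU-only builds), and torch.cuda.is_available().

```python
def describe_torch():
    """Return (torch_version, cuda_build, cuda_available), or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is the CUDA version torch was built against,
    # not necessarily the CUDA toolkit installed on the host.
    return torch.__version__, torch.version.cuda, torch.cuda.is_available()


if __name__ == "__main__":
    info = describe_torch()
    if info is None:
        print("PyTorch is not installed in this environment")
    else:
        version, cuda_build, cuda_ok = info
        print(f"torch {version}, built for CUDA {cuda_build}, GPU usable: {cuda_ok}")
```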

(Optional) Install ONNX Runtime

If you wish to use ModelOpt’s ONNX features, please follow the steps from the ONNX Runtime documentation to install the recommended ONNX Runtime version correctly for your CUDA version. If ONNX Runtime is not pre-installed, ModelOpt will use the default ONNX Runtime which may not be compatible with your CUDA version.
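Whether ONNX Runtime is already installed can be checked with the standard library before installing ModelOpt. The sketch below assumes the usual PyPI distribution names onnxruntime and onnxruntime-gpu; the helper name installed_onnxruntime is hypothetical, and the check does not verify CUDA compatibility.

```python
from importlib import metadata

# PyPI distribution names for the CPU and GPU builds (an assumption of this sketch).
ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-gpu")


def installed_onnxruntime():
    """Map each installed ONNX Runtime distribution to its version string."""
    found = {}
    for name in ORT_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # this distribution is not installed
    return found


if __name__ == "__main__":
    print(installed_onnxruntime() or "ONNX Runtime is not pre-installed")
```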

(Optional) Install other NVIDIA dependencies

If you wish to use ModelOpt in conjunction with other NVIDIA libraries (e.g. TensorRT-LLM, NeMo, Triton, etc.), first check how easily these libraries can be installed in a local environment. If you face any issues, we recommend using a docker image for a seamless experience. For example, the TensorRT-LLM documentation requires installing in a docker image. You may still choose to use ModelOpt’s other features locally, for example quantizing a HuggingFace model, and then use a docker image for deployment.

Install Model Optimizer

ModelOpt, including its dependencies, can be installed via pip. Please review the license terms of ModelOpt and any dependencies before use.

If you build and use ModelOpt’s docker image, you can skip this step as the image already contains ModelOpt and all optional dependencies pre-installed. If you use other suggested docker images, ModelOpt is pre-installed with some of the below optional dependencies. Make sure to upgrade to the latest version of ModelOpt (with appropriate optional dependencies you need) using pip as shown below.

pip install "nvidia-modelopt[all]" -U --extra-index-url https://pypi.nvidia.com

If you want to install only partial dependencies, please replace [all] with the desired optional dependencies as described below.

Identify correct partial dependencies

Note that installing nvidia-modelopt without any optional dependencies installs only the barebone requirements, and none of the modules will work until the appropriate optional dependencies (or [all]) are installed. The optional dependencies that each module needs are listed below:

Module                    Optional dependencies
modelopt.deploy           [deploy]
modelopt.onnx             [onnx]
modelopt.torch            [torch]
modelopt.torch._deploy    [torch, deploy]

Additionally, we support installing dependencies for the following third-party packages:

Third-party package                            Optional dependencies
Huggingface (transformers, diffusers, etc.)    [hf]
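The mapping above can be turned into a concrete pip command by joining the needed extras with commas inside the brackets, which is standard pip syntax. The following sketch builds that command string from a list of modules; the helper name pip_command and its with_hf parameter are hypothetical, and the mapping is copied from the module list above.

```python
# Module -> optional dependencies, copied from the list above.
EXTRAS_BY_MODULE = {
    "modelopt.deploy": ["deploy"],
    "modelopt.onnx": ["onnx"],
    "modelopt.torch": ["torch"],
    "modelopt.torch._deploy": ["torch", "deploy"],
}


def pip_command(modules, with_hf=False):
    """Build the pip install command for the given ModelOpt modules."""
    extras = []
    for module in modules:
        for extra in EXTRAS_BY_MODULE[module]:
            if extra not in extras:  # keep first-seen order, drop duplicates
                extras.append(extra)
    if with_hf:
        extras.append("hf")
    spec = "nvidia-modelopt" + (f"[{','.join(extras)}]" if extras else "")
    return f'pip install "{spec}" -U --extra-index-url https://pypi.nvidia.com'


if __name__ == "__main__":
    print(pip_command(["modelopt.torch"], with_hf=True))
```

For example, requesting modelopt.torch together with the Huggingface extras yields a command that installs "nvidia-modelopt[torch,hf]".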

Check installation

Tip

When you use ModelOpt’s PyTorch quantization APIs for the first time, ModelOpt compiles the fast quantization kernels using your installed torch and CUDA, if available. This may take a few minutes, but subsequent quantization calls will be much faster. To trigger the compilation ahead of time, either to check that it succeeds or to pre-compile during docker builds, run the following command:

python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
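For scripted environments such as CI jobs or docker builds, the version check and the pre-compilation step can be combined into one function that fails gracefully when ModelOpt is missing. This is a sketch only; the function name modelopt_version and its precompile flag are hypothetical, while the precompile() call is the same one used in the command above.

```python
def modelopt_version(precompile=False):
    """Return the installed ModelOpt version, or None if it cannot be imported."""
    try:
        import modelopt
    except ImportError:
        return None
    if precompile:
        # Same call as the one-line command above; the first run may take
        # a few minutes while the quantization kernels are compiled.
        import modelopt.torch.quantization.extensions as ext

        ext.precompile()
    return modelopt.__version__


if __name__ == "__main__":
    version = modelopt_version()
    print(f"ModelOpt {version}" if version else "ModelOpt is not installed")
```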