Installation for Linux

System requirements

Latest Model Optimizer (nvidia-modelopt) currently has the following system requirements:

OS                         Linux

Architecture               x86_64, aarch64 (SBSA)

Python                     >=3.10,<3.13

CUDA                       >=12.0

PyTorch                    >=2.6

TensorRT-LLM (Optional)    1.1.0rc2.post2

ONNX Runtime (Optional)    1.22

TensorRT (Optional)        >=10.0
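
You can quickly verify the Python, PyTorch, and CUDA versions available in your environment with the following commands (a minimal check; it assumes PyTorch is already installed and an NVIDIA driver is present):

python --version
python -c "import torch; print(torch.__version__, torch.version.cuda)"
nvidia-smi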

Environment setup

To use Model Optimizer with full dependencies (e.g. TensorRT/TensorRT-LLM deployment), we recommend using the TensorRT-LLM docker image, e.g., nvcr.io/nvidia/tensorrt-llm/release:<version>.
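
For example, you can launch the container as follows (an illustrative command; replace <version> with the release tag you need and adjust the mounted path for your setup):

docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:<version>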

Make sure to upgrade Model Optimizer to the latest version using pip as described in the next section.

You will also need to set up the appropriate environment variables for the TensorRT binaries as follows:

export PIP_CONSTRAINT=""
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib"
export PATH="${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin"

You may need to install additional dependencies from the respective example's requirements.txt file.
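
For example, from within the directory of the example you want to run (the directory name here is illustrative):

cd examples/<example_name>
pip install -r requirements.txt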

Alternative NVIDIA docker images

For PyTorch, you can also use the NVIDIA NGC PyTorch container, and for the NVIDIA NeMo framework, you can use the NeMo container. Both of these containers come with Model Optimizer pre-installed. Make sure to update Model Optimizer to the latest version if it is not already up to date.

For ONNX / TensorRT use cases, you can also use the TensorRT container, which provides better performance than the PyTorch container for these workflows.
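
For instance, these containers can be pulled from NGC as follows (the tags are placeholders; check the NGC catalog for the current release tags):

docker pull nvcr.io/nvidia/pytorch:<yy.mm>-py3
docker pull nvcr.io/nvidia/nemo:<version>
docker pull nvcr.io/nvidia/tensorrt:<yy.mm>-py3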

Setting up a virtual environment

We recommend setting up a virtual environment if you don’t have one already. Run the following command to set up and activate a conda virtual environment named modelopt with Python 3.12:

conda create -n modelopt python=3.12 pip
conda activate modelopt

(Optional) Install desired PyTorch version

By default, the latest PyTorch version available on pip will be installed. If you want a specific PyTorch version for a specific CUDA version, please first follow the official PyTorch installation instructions to install it.
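
For example, to install a PyTorch build for CUDA 12.1 from the official PyTorch wheel index (an illustrative command; consult the PyTorch installation instructions for the index URL matching your CUDA version):

pip install torch --index-url https://download.pytorch.org/whl/cu121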

(Optional) Install other NVIDIA dependencies

If you wish to use ModelOpt in conjunction with other NVIDIA libraries (e.g. TensorRT, TensorRT-LLM, NeMo, Triton, etc.), please first check how easily these libraries can be installed in a local environment. If you face any issues, we recommend using a docker image for a seamless experience. You may still choose to use some of ModelOpt's features locally, for example, quantizing a HuggingFace model, and then use a docker image for deployment.

Install Model Optimizer

ModelOpt, including its dependencies, can be installed via pip. Please review the license terms of ModelOpt and any dependencies before use.

If you build and use ModelOpt's docker image, you can skip this step, as the image already contains ModelOpt and all optional dependencies pre-installed. If you use one of the other suggested docker images, ModelOpt is pre-installed with some of the optional dependencies listed below. Make sure to upgrade to the latest version of ModelOpt (with the optional dependencies you need) using pip as shown below.

pip install -U "nvidia-modelopt[all]"

If you want to install only partial dependencies, please replace [all] with the desired optional dependencies as described below.

Identify correct partial dependencies

Note that when nvidia-modelopt is installed without any optional dependencies, only the requirements for the modelopt.torch package are installed; other modules may not work without the appropriate optional dependencies (or the [all] optional dependencies). Below is a list of the optional dependencies that need to be installed to correctly use the corresponding modules:

Module                     Optional dependencies

modelopt.onnx              [onnx]

modelopt.torch._deploy     [onnx]

Additionally, we support installing dependencies for the following 3rd-party packages:

Third-party package                            Optional dependencies

Huggingface (transformers, diffusers, etc.)    [hf]
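
For example, to install ModelOpt with only the ONNX and Huggingface optional dependencies:

pip install -U "nvidia-modelopt[onnx,hf]"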

Accelerated Quantization with Triton Kernels

ModelOpt includes optimized quantization kernels implemented in the Triton language that accelerate quantization operations by approximately 40% compared to the default implementation. These kernels are particularly beneficial for AWQ and Quantization-Aware Training (QAT) workflows.

The Triton-based kernels currently support the NVFP4 quantization format, with support for additional formats coming in future releases. To use these accelerated kernels, you need:

  • CUDA device with compute capability >= 8.9 (e.g. RTX 40 series, RTX 6000, NVIDIA L40 or later)

  • Triton package installed: pip install triton

No additional configuration is required; the optimized kernels are used automatically when they are available for your hardware and quantization format.
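
To verify that your GPU meets the compute capability requirement, you can query it through PyTorch (a minimal check; it assumes PyTorch is installed and a CUDA device is visible):

python -c "import torch; print(torch.cuda.get_device_capability(0))"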

Check installation

Tip

When you use ModelOpt's PyTorch quantization APIs for the first time, it will compile the fast quantization kernels using your installed torch and CUDA, if available. This may take a few minutes, but subsequent quantization calls will be much faster. To trigger the compilation and check that it succeeds, or to pre-compile for docker builds, run the following command:

python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"