Installation for Linux
System requirements
The latest Model Optimizer (nvidia-modelopt) currently has the following system requirements:
| OS | Linux |
| Architecture | x86_64, aarch64 (SBSA) |
| Python | >=3.8,<3.13 |
| CUDA | >=11.8 (Recommended 12.x) |
| PyTorch (Optional) | >=2.0 |
| ONNX Runtime (Optional) | 1.18 |
| TensorRT-LLM (Optional) | 0.14 |
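To quickly compare your environment against this table, you can run the following (a convenience check, not part of the official instructions):
# Check your Python version and the CUDA version supported by your driver
python --version
nvidia-smi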
Environment setup
Using ModelOpt’s docker image
The easiest way to get started with Model Optimizer and additional dependencies (e.g. TensorRT-LLM deployment) is to start from our docker image.
After installing the NVIDIA Container Toolkit, please run the following commands to build the Model Optimizer docker container which has all the necessary dependencies pre-installed for running the examples.
# Clone the ModelOpt repository
git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer
# Build the docker image (will be tagged `docker.io/library/modelopt_examples:latest`)
# You may customize `docker/Dockerfile` to include or exclude the dependencies you need.
bash docker/build.sh
# Run the docker image
docker run --gpus all -it --shm-size 20g --rm docker.io/library/modelopt_examples:latest bash
# Check installation (inside the docker container)
python -c "import modelopt; print(modelopt.__version__)"
Using alternative NVIDIA docker images
For PyTorch, you can also use the NVIDIA NGC PyTorch container, and for the NVIDIA NeMo framework, you can use the NeMo container. Both of these containers come with Model Optimizer pre-installed. The NeMo container also comes with the HuggingFace and TensorRT-LLM dependencies. Make sure to update Model Optimizer to the latest version if it is not already.
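For example, a minimal sketch of this workflow (the NGC container tag below is only illustrative; pick the latest tag from the NGC catalog):
# Start from an NGC PyTorch container (example tag)
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.10-py3 bash
# Inside the container, upgrade Model Optimizer to the latest version
pip install -U nvidia-modelopt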
Setting up a virtual environment
We recommend setting up a virtual environment if you don’t have one already. Run the following command to set up and activate a conda virtual environment named modelopt with Python 3.12:
conda create -n modelopt python=3.12 pip
conda activate modelopt
(Optional) Install desired PyTorch version
By default, the latest PyTorch version (torch>=2.0) available on pip will be installed. If you want to install a specific PyTorch version for a specific CUDA version, please first follow the instructions to install your desired PyTorch version.
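For instance, a minimal sketch of pinning a CUDA-specific build (the exact version and index URL below are illustrative; use the combination the PyTorch instructions give for your setup):
# Example: PyTorch 2.4.0 built against CUDA 12.1 (adjust versions to your setup)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121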
(Optional) Install ONNX Runtime
If you wish to use ModelOpt’s ONNX features, please follow the steps in the ONNX Runtime documentation to install the ONNX Runtime version recommended for your CUDA version. If ONNX Runtime is not pre-installed, ModelOpt will use the default ONNX Runtime, which may not be compatible with your CUDA version.
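As an illustrative sketch (onnxruntime-gpu is the CUDA-enabled build on pip; confirm the correct package and version for your CUDA setup in the ONNX Runtime documentation):
# Example: CUDA-enabled ONNX Runtime pinned to the version from the requirements table
pip install onnxruntime-gpu==1.18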
(Optional) Install other NVIDIA dependencies
If you wish to use ModelOpt in conjunction with other NVIDIA libraries (e.g. TensorRT-LLM, NeMo, Triton, etc.), please first check how easily these libraries can be installed in a local environment. If you face any issues, we recommend using a docker image for a seamless experience; for example, the TensorRT-LLM documentation requires installing in a docker image. You may still choose to use ModelOpt’s other features locally, for example, quantizing a HuggingFace model locally and then using a docker image for deployment.
Install Model Optimizer
ModelOpt, including its dependencies, can be installed via pip. Please review the license terms of ModelOpt and any dependencies before use.
If you build and use ModelOpt’s docker image, you can skip this step as the image already contains ModelOpt and all optional dependencies pre-installed. If you use one of the other suggested docker images, ModelOpt comes pre-installed with some of the optional dependencies listed below. Make sure to upgrade to the latest version of ModelOpt (with the optional dependencies you need) using pip as shown below.
pip install "nvidia-modelopt[all]" -U --extra-index-url https://pypi.nvidia.com
If you want to install only partial dependencies, please replace [all] with the desired optional dependencies as described below.
Identify correct partial dependencies
Note that when installing nvidia-modelopt without any optional dependencies, only the barebones requirements are installed, and none of the modules will work without the appropriate optional dependencies or the [all] optional dependencies. Below is a list of the optional dependencies needed to correctly use the corresponding modules:
| Module | Optional dependencies |
|---|---|
| modelopt.onnx | [onnx] |
| modelopt.torch | [torch] |
| modelopt.torch._deploy | [torch, onnx] |
Additionally, we support installing dependencies for the following 3rd-party packages:
| Third-party package | Optional dependencies |
|---|---|
| Huggingface (transformers, diffusers, etc.) | [hf] |
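For example, a sketch of a partial installation using the extras from the tables above (combine whichever extras you need):
# Install only the PyTorch and ONNX optional dependencies
pip install "nvidia-modelopt[torch,onnx]" -U --extra-index-url https://pypi.nvidia.com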
Check installation
Tip
When you use ModelOpt’s PyTorch quantization APIs for the first time, it will compile the fast quantization kernels using your installed torch and CUDA, if available. This may take a few minutes, but subsequent quantization calls will be much faster. To trigger the compilation and check that it succeeds, or to pre-compile for docker builds, run the following command:
python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
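For a quick functional check beyond the version print earlier (a sketch that assumes the [torch] optional dependencies are installed):
# Smoke test: confirm the PyTorch quantization module imports cleanly
python -c "import modelopt.torch.quantization; print('modelopt.torch.quantization OK')"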