modelopt.onnx.quantization.extensions
Module to load C++ extensions.
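
As a rough illustration of what "loading a C++ extension" typically involves (this is a generic sketch, not the modelopt implementation), the snippet below locates a precompiled shared library shipped inside a package directory and loads it with ctypes. The helper names and the suffix list are assumptions made for illustration only.

    # Illustrative sketch only: a common pattern for finding and loading a
    # compiled C++ extension bundled with a Python package. The function
    # names and lookup logic here are hypothetical, not the modelopt API.
    import ctypes
    import glob
    import os


    def _find_extension_path(package_dir: str, stem: str) -> str:
        """Return the first shared library whose name starts with ``stem``."""
        for suffix in (".so", ".pyd", ".dll", ".dylib"):
            matches = glob.glob(os.path.join(package_dir, f"{stem}*{suffix}"))
            if matches:
                return matches[0]
        raise FileNotFoundError(
            f"No compiled extension named {stem!r} found in {package_dir}"
        )


    def load_cpp_extension(package_dir: str, stem: str) -> ctypes.CDLL:
        """Load the compiled extension, raising a clear error if it is missing."""
        return ctypes.CDLL(_find_extension_path(package_dir, stem))

In practice such a loader is usually called lazily, so that importing the pure-Python parts of the package still works on systems where the compiled extension was not built.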