Model Optimizer

Getting Started

  • Overview
  • Installation
  • Quick Start: PTQ - PyTorch
  • Quick Start: PTQ - ONNX
  • Quick Start: PTQ - PyTorch to ONNX
  • Quick Start: PTQ - Windows
  • Quick Start: QAT
  • Quick Start: Pruning
  • Quick Start: Distillation
  • Quick Start: Speculative Decoding
  • Quick Start: Sparsity

Guides

  • Support Matrix
  • Quantization
  • Saving & Restoring
  • Pruning
  • Distillation
  • Speculative Decoding
  • Sparsity
  • NAS
  • AutoCast (ONNX)
  • Autotune (ONNX)

Deployment

  • TensorRT-LLM
  • ONNX Runtime
  • Unified HuggingFace Checkpoint

Examples

  • All GitHub Examples

Reference

  • Changelog
  • modelopt API
    • deploy
    • onnx
    • torch
      • distill
      • export
      • kernels
      • nas
      • opt
      • peft
      • prune
      • puzzletron
        • activation_scoring
        • anymodel
        • block_config
        • build_library_and_stats
        • dataset
        • entrypoint
        • mip
        • plugins
        • pruning
        • puzzletron_nas_plugin
        • replacement_library
        • scoring
        • sewing_kit
        • subblock_stats
        • tools
        • utils
      • quantization
      • sparsity
      • speculative
      • trace
      • utils

Support

  • Contact us
  • FAQs
models

Modules

  • modelopt.torch.puzzletron.anymodel.models.gpt_oss: GPT-OSS model support for AnyModel.
  • modelopt.torch.puzzletron.anymodel.models.llama
  • modelopt.torch.puzzletron.anymodel.models.mistral_small
  • modelopt.torch.puzzletron.anymodel.models.nemotron_h
  • modelopt.torch.puzzletron.anymodel.models.nemotron_h_v2
  • modelopt.torch.puzzletron.anymodel.models.qwen2
  • modelopt.torch.puzzletron.anymodel.models.qwen3
  • modelopt.torch.puzzletron.anymodel.models.qwen3_vl


© Copyright 2023-2025, NVIDIA Corporation.
