Model Optimizer


pruning

Modules

modelopt.torch.puzzletron.pruning.expert_removal_pruning_mixin

modelopt.torch.puzzletron.pruning.ffn_intermediate_pruning_mixin

modelopt.torch.puzzletron.pruning.kv_heads_pruning_mixin

modelopt.torch.puzzletron.pruning.pruning_ckpts

Utilities for creating pruned model checkpoints.

modelopt.torch.puzzletron.pruning.pruning_mixin

modelopt.torch.puzzletron.pruning.pruning_utils

Structured pruning mixins and checkpoint utilities for Puzzletron.
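Judging by their names, the mixins above prune along specific model dimensions: FFN intermediate size, KV heads, and experts. The sketch below is a generic illustration only and is not the Puzzletron API. It shows what pruning the FFN intermediate dimension of a checkpoint involves: every kept intermediate channel keeps one row of the up-projection and the matching column of the down-projection, so the two weights stay shape-consistent. The function name `prune_ffn_intermediate`, the state-dict keys `up_proj.weight` and `down_proj.weight`, their shapes, and the per-channel importance scores are all hypothetical assumptions.

```python
from typing import Dict, List

Matrix = List[List[float]]


def prune_ffn_intermediate(
    state_dict: Dict[str, Matrix],
    importance: List[float],
    keep: int,
) -> Dict[str, Matrix]:
    """Keep the `keep` most important FFN intermediate channels.

    Hypothetical layout (an assumption, not Puzzletron's checkpoint format):
      - "up_proj.weight":   shape [intermediate, hidden], one row per channel
      - "down_proj.weight": shape [hidden, intermediate], one column per channel
    """
    up = state_dict["up_proj.weight"]
    down = state_dict["down_proj.weight"]
    n = len(up)
    if len(importance) != n:
        raise ValueError("need one importance score per intermediate channel")
    if not 0 < keep <= n:
        raise ValueError("keep must be in 1..intermediate size")

    # Rank channels by importance (highest first), then re-sort the survivors
    # so they keep their original relative order.
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:keep])

    # Shallow copy: the input dict and its matrices are left unmodified.
    pruned = dict(state_dict)
    # Dropping a channel removes its row from up_proj ...
    pruned["up_proj.weight"] = [list(up[i]) for i in kept]
    # ... and the matching column from down_proj.
    pruned["down_proj.weight"] = [[row[i] for i in kept] for row in down]
    return pruned


if __name__ == "__main__":
    sd = {
        "up_proj.weight": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # [3, 2]
        "down_proj.weight": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],  # [2, 3]
    }
    out = prune_ffn_intermediate(sd, importance=[0.9, 0.1, 0.5], keep=2)
    print(out["up_proj.weight"])    # [[1.0, 0.0], [1.0, 1.0]]
    print(out["down_proj.weight"])  # [[1.0, 3.0], [4.0, 6.0]]
```

For a gated FFN (for example SwiGLU), a gate projection would be sliced with the same kept indices as the up-projection. This page does not describe how Puzzletron actually scores channel importance.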

© Copyright 2023-2025, NVIDIA Corporation.