Model Optimizer

quant_embedding

Quantized Embedding.
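
As a rough illustration of what a quantized embedding does conceptually, the sketch below fake-quantizes the rows of an embedding table at lookup time. It is a standard-library-only toy, not ModelOpt's implementation: the names ToyQuantEmbedding and fake_quantize, the per-tensor symmetric scheme, and the 8-bit default are all assumptions made for this example.

```python
from typing import List


def fake_quantize(values: List[float], amax: float, num_bits: int = 8) -> List[float]:
    """Symmetric fake quantization: snap to an integer grid, then map back to floats."""
    if amax == 0:
        return list(values)  # nothing to scale; avoid division by zero
    max_int = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    scale = amax / max_int
    out = []
    for v in values:
        q = round(v / scale)                # nearest integer level
        q = max(-max_int, min(max_int, q))  # clamp to the representable range
        out.append(q * scale)               # dequantize back to a float
    return out


class ToyQuantEmbedding:
    """Hypothetical stand-in: an embedding table whose rows are fake-quantized on lookup."""

    def __init__(self, weight: List[List[float]], num_bits: int = 8) -> None:
        self.weight = weight
        self.num_bits = num_bits
        # Assumption: a single per-tensor amax taken from the weights themselves.
        self.amax = max(abs(v) for row in weight for v in row)

    def forward(self, indices: List[int]) -> List[List[float]]:
        # Look up each row, then return its quantize-dequantize approximation.
        return [fake_quantize(self.weight[i], self.amax, self.num_bits) for i in indices]


emb = ToyQuantEmbedding([[0.0, 1.27], [-1.27, 0.5]])
rows = emb.forward([1, 0])
```

Each returned value differs from the original weight by at most half a quantization step, which is the rounding error of this scheme.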

Classes

QuantEmbedding

alias of _QuantEmbedding

QuantEmbedding: alias of _QuantEmbedding


© Copyright 2023-2025, NVIDIA Corporation.
