Recipes

A recipe is a declarative YAML specification that fully describes how to optimize a model. Recipes decouple optimization settings from Python code, enabling reuse, sharing, version control, and reproducibility. Instead of editing Python scripts to change quantization parameters, you author (or select) a recipe file and pass it to the ModelOpt tooling.

Motivation

Without recipes, optimization settings are scattered across command-line arguments, Python constants, and ad-hoc code edits. This makes it difficult to:

  • Reproduce a published result – the exact configuration is buried in script arguments.

  • Share a configuration – there is no single artifact to hand off.

  • Version-control changes – diffs are mixed in with unrelated code changes.

  • Onboard new models – inference engineers must read source code to discover which settings to tweak.

Recipes solve these problems by capturing all the configuration needed to optimize a model in a single YAML file (or a small directory of files).

Design overview

The recipe system is part of the modelopt.recipe package and consists of three layers:

  1. Recipe files – YAML documents stored in the modelopt_recipes/ directory (shipped with the package) or on the user’s filesystem.

  2. Config loader – load_config() reads YAML files, resolves paths, and performs automatic ExMy floating-point notation conversion.

  3. Recipe loader – load_recipe() validates the YAML against Pydantic models and returns a typed recipe object ready for use.

Recipe file format

A recipe is a YAML file with two top-level sections: metadata and a type-specific configuration section (currently ptq_cfg for PTQ recipes).

Single-file format

The simplest form is a single .yml or .yaml file:

# modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml

metadata:
  recipe_type: ptq
  description: FP8 per-tensor weight and activation (W8A8), FP8 KV cache, max calibration.

ptq_cfg:
  algorithm: max
  quant_cfg:
    - quantizer_path: '*'
      enable: false
    - quantizer_path: '*input_quantizer'
      cfg:
        num_bits: e4m3
        axis:        # empty value parses as YAML null, i.e. per-tensor scaling
    - quantizer_path: '*weight_quantizer'
      cfg:
        num_bits: e4m3
        axis:
    - quantizer_path: '*[kv]_bmm_quantizer'
      enable: true
      cfg:
        num_bits: e4m3
    # ... standard exclusions omitted for brevity
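
The quantizer_path values above are wildcard patterns. As a rough illustration of which entries apply to a given quantizer, the sketch below assumes glob-style matching as implemented by Python's fnmatch module. The quantizer names are hypothetical, and the rules for resolving overlapping entries are left to the Quantization Configuration (quant_cfg) reference.

```python
from fnmatch import fnmatchcase

# Patterns copied from the FP8 recipe above. The matching rule (fnmatch-style
# globbing) is an assumption made for this illustration.
QUANT_CFG = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": (4, 3), "axis": None}},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": (4, 3), "axis": None}},
    {"quantizer_path": "*[kv]_bmm_quantizer", "enable": True, "cfg": {"num_bits": (4, 3)}},
]


def matching_patterns(quantizer_name, quant_cfg):
    """Return, in recipe order, the patterns that match a quantizer name."""
    return [
        entry["quantizer_path"]
        for entry in quant_cfg
        if fnmatchcase(quantizer_name, entry["quantizer_path"])
    ]


# Hypothetical quantizer names, for illustration only.
for name in (
    "layers.0.mlp.down_proj.weight_quantizer",
    "layers.0.self_attn.k_bmm_quantizer",
    "layers.0.self_attn.q_bmm_quantizer",
):
    print(name, "->", matching_patterns(name, QUANT_CFG))
```

Under this assumption, the character class [kv] selects only the key and value BMM quantizers, so a q_bmm_quantizer is matched by the catch-all '*' entry alone.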

Directory format

For larger recipes or when you want to keep metadata separate from the quantization configuration, use a directory with two files:

my_recipe/
  recipe.yml      # metadata section
  ptq_cfg.yml     # ptq_cfg section (quant_cfg + algorithm)

recipe.yml:

metadata:
  recipe_type: ptq
  description: My custom NVFP4 recipe.

ptq_cfg.yml:

algorithm: max
quant_cfg:
  - quantizer_path: '*'
    enable: false
  - quantizer_path: '*weight_quantizer'
    cfg:
      num_bits: e2m1
      block_sizes: {-1: 16, type: dynamic, scale_bits: e4m3}
  - quantizer_path: '*input_quantizer'
    cfg:
      num_bits: e4m3
      axis:
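
Conceptually, the two files of a directory recipe carry the same information as one single-file recipe. The sketch below shows that correspondence on already-parsed dictionaries. combine_directory_recipe is a hypothetical helper written for this illustration, not a ModelOpt function, and the way load_recipe() actually assembles a directory recipe may differ.

```python
def combine_directory_recipe(recipe_doc, ptq_cfg_doc):
    """Assemble the single-file layout from the two parsed directory files.

    recipe_doc  -- parsed contents of recipe.yml (holds the metadata section)
    ptq_cfg_doc -- parsed contents of ptq_cfg.yml (algorithm + quant_cfg)
    """
    if "metadata" not in recipe_doc:
        raise ValueError("recipe.yml must contain a 'metadata' section")
    return {"metadata": recipe_doc["metadata"], "ptq_cfg": ptq_cfg_doc}


# Parsed stand-ins for the two files shown above (ExMy values already converted).
recipe_doc = {"metadata": {"recipe_type": "ptq", "description": "My custom NVFP4 recipe."}}
ptq_cfg_doc = {
    "algorithm": "max",
    "quant_cfg": [
        {"quantizer_path": "*", "enable": False},
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": (4, 3), "axis": None}},
    ],
}

combined = combine_directory_recipe(recipe_doc, ptq_cfg_doc)
print(sorted(combined))
```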

Metadata section

Every recipe file must contain a metadata mapping with at least a recipe_type field:

Field        Required  Description
recipe_type  Yes       The optimization category. Currently only "ptq" is supported.
description  No        A human-readable summary of what the recipe does.

PTQ configuration section

For PTQ recipes (recipe_type: ptq), the ptq_cfg mapping contains:

Field      Required  Description
quant_cfg  Yes       An ordered list of QuantizerCfgEntry dicts. See Quantization Configuration (quant_cfg) for the full specification of entries, ordering semantics, and atomicity rules.
algorithm  No        The calibration algorithm: "max" (default), "mse", "smoothquant", "awq_lite", "awq_full", "awq_clip", "gptq", or null for formats that need no calibration (e.g. MX formats).
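
The field rules above can be expressed as a small check. This is a minimal sketch that covers only the string and null forms of algorithm listed here. check_ptq_cfg is a hypothetical helper, and the real validation is performed by the Pydantic models and may accept more than this.

```python
# Algorithm names taken from the field description above.
KNOWN_ALGORITHMS = {"max", "mse", "smoothquant", "awq_lite", "awq_full", "awq_clip", "gptq"}


def check_ptq_cfg(ptq_cfg):
    """Check the two documented fields and return the effective algorithm."""
    if "quant_cfg" not in ptq_cfg:
        raise ValueError("ptq_cfg requires a 'quant_cfg' list")
    if not isinstance(ptq_cfg["quant_cfg"], list):
        raise TypeError("'quant_cfg' must be an ordered list of entries")

    # A missing key falls back to "max"; an explicit null (None) is kept as-is.
    algorithm = ptq_cfg.get("algorithm", "max")
    if algorithm is not None and algorithm not in KNOWN_ALGORITHMS:
        raise ValueError(f"unknown calibration algorithm: {algorithm!r}")
    return algorithm


print(check_ptq_cfg({"quant_cfg": []}))
print(check_ptq_cfg({"quant_cfg": [], "algorithm": None}))
```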

ExMy floating-point notation

Recipe files support a convenient shorthand for floating-point bit formats in num_bits and scale_bits fields. Instead of writing a Python tuple, you write the format name directly:

num_bits: e4m3       # automatically converted to (4, 3)
scale_bits: e8m0     # automatically converted to (8, 0)

The notation is case-insensitive (E4M3, e4m3, E4m3 all work). The conversion is performed by load_config() when loading any YAML file, so it works in both recipe files and standalone config files.

Common formats:

Notation  Tuple   Description
e4m3      (4, 3)  FP8 E4M3 – standard FP8 weight/activation format
e5m2      (5, 2)  FP8 E5M2 – wider dynamic range, used for gradients
e2m1      (2, 1)  FP4 E2M1 – NVFP4 weight format
e8m0      (8, 0)  E8M0 – MX block scaling format
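
To make the ExMy conversion concrete, here is a minimal sketch of how such a shorthand could be turned into a tuple. It is not the code used by load_config(); the names parse_exmy and convert_bits_fields are hypothetical, and the sketch assumes the conversion applies only to num_bits and scale_bits keys, as the text describes.

```python
import re

# "e<exponent bits>m<mantissa bits>", matched case-insensitively.
_EXMY = re.compile(r"e(\d+)m(\d+)", re.IGNORECASE)


def parse_exmy(value):
    """Convert 'e4m3' to (4, 3); return any other value unchanged."""
    if isinstance(value, str):
        match = _EXMY.fullmatch(value.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return value


def convert_bits_fields(node):
    """Recursively convert num_bits / scale_bits values in a parsed YAML tree."""
    if isinstance(node, dict):
        return {
            key: parse_exmy(val) if key in ("num_bits", "scale_bits") else convert_bits_fields(val)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [convert_bits_fields(item) for item in node]
    return node


cfg = {"num_bits": "E2M1", "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": "e4m3"}}
print(convert_bits_fields(cfg))
```

Integer values such as num_bits: 8 pass through untouched, so INT8 and floating-point entries can sit side by side in the same recipe.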

Built-in recipes

ModelOpt ships a library of built-in recipes under the modelopt_recipes/ package. These are bundled with the Python distribution and can be referenced by their relative path (without the modelopt_recipes/ prefix).

General PTQ recipes

General recipes are model-agnostic and apply to any supported architecture:

Recipe path                            Description
general/ptq/fp8_default-fp8_kv         FP8 per-tensor W8A8, FP8 KV cache, max calibration
general/ptq/nvfp4_default-fp8_kv       NVFP4 W4A4 with FP8 KV cache, max calibration
general/ptq/nvfp4_mlp_only-fp8_kv      NVFP4 for MLP layers only, FP8 KV cache
general/ptq/nvfp4_experts_only-fp8_kv  NVFP4 for MoE expert layers only, FP8 KV cache
general/ptq/nvfp4_omlp_only-fp8_kv     NVFP4 for output projection + MLP layers, FP8 KV cache

Model-specific recipes

Model-specific recipes are tuned for a particular architecture and live under models/<model_name>/:

Recipe path                          Description
models/Step3.5-Flash/nvfp4-mlp-only  NVFP4 MLP-only for Step 3.5 Flash MoE model

Loading recipes

Python API

Use load_recipe() to load a recipe. The path is resolved against the built-in library first, then the filesystem:

from modelopt.recipe import load_recipe, ModelOptPTQRecipe

# Load a built-in recipe by relative path (suffix optional)
recipe = load_recipe("general/ptq/fp8_default-fp8_kv")
assert isinstance(recipe, ModelOptPTQRecipe)

# The ptq_cfg dict can be passed directly to mtq.quantize().
# `model` and `forward_loop` (the calibration loop) are assumed to be defined by the caller.
import modelopt.torch.quantization as mtq

model = mtq.quantize(model, recipe.ptq_cfg, forward_loop)

# Load a custom recipe from the filesystem
recipe = load_recipe("/path/to/my_custom_recipe.yml")
model = mtq.quantize(model, recipe.ptq_cfg, forward_loop)

Command-line usage

The hf_ptq.py example accepts a --recipe flag:

python examples/llm_ptq/hf_ptq.py \
    --model Qwen/Qwen3-8B \
    --recipe general/ptq/fp8_default-fp8_kv \
    --export_path build/fp8 \
    --calib_size 512 \
    --export_fmt hf

When --recipe is provided, the script loads the recipe and uses its ptq_cfg directly, bypassing the --qformat / --kv_cache_qformat flags.
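
The precedence rule can be pictured with a few lines of argparse. This is only a sketch of the described behavior, not the code in hf_ptq.py: the flag names come from the text above, while the None defaults, the config_source helper, and the "fp8" value are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Parser with only the three flags relevant to the precedence rule."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--recipe", default=None)
    parser.add_argument("--qformat", default=None)
    parser.add_argument("--kv_cache_qformat", default=None)
    return parser


def config_source(args):
    """Return which input decides the quantization config."""
    if args.recipe is not None:
        # A recipe wins: the format flags are ignored entirely.
        return ("recipe", args.recipe)
    return ("flags", args.qformat, args.kv_cache_qformat)


args = build_parser().parse_args(
    ["--recipe", "general/ptq/fp8_default-fp8_kv", "--qformat", "fp8"]
)
print(config_source(args))
```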

Loading standalone configs

load_config() loads arbitrary YAML config files with automatic ExMy conversion and built-in path resolution. This is useful for loading shared configuration fragments:

from modelopt.recipe import load_config

cfg = load_config("configs/some_shared_config")

Path resolution

Both load_recipe() and load_config() resolve paths using the same strategy:

  1. If the path is absolute, use it directly.

  2. If relative, check the built-in recipes library first (modelopt_recipes/), probing .yml and .yaml suffixes.

  3. Then check the filesystem, probing the same suffixes.

This means built-in recipes can be referenced without any prefix:

# These two calls are equivalent:
load_recipe("general/ptq/fp8_default-fp8_kv")
load_recipe("general/ptq/fp8_default-fp8_kv.yml")
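
The three resolution steps can be sketched with pathlib. This is an illustration of the strategy described above, not the library's implementation: resolve, builtin_root, and cwd are hypothetical names, and probing the path as given before trying the suffixes is an assumption.

```python
import tempfile
from pathlib import Path

# Try the path as given, then with each supported suffix appended.
SUFFIXES = ("", ".yml", ".yaml")


def _probe(base):
    """Return the first existing candidate for base, or None."""
    for suffix in SUFFIXES:
        # Append rather than use with_suffix(), which would mangle names
        # that already contain a dot (e.g. "Step3.5-Flash").
        candidate = Path(str(base) + suffix)
        if candidate.exists():
            return candidate
    return None


def resolve(path, builtin_root, cwd):
    """Absolute paths are used directly; relative ones check built-ins first."""
    path = Path(path)
    if path.is_absolute():
        return path
    for root in (builtin_root, cwd):
        found = _probe(Path(root) / path)
        if found is not None:
            return found
    raise FileNotFoundError(f"no recipe or config found for {path}")


# Demo on a throwaway directory tree that mimics the two search locations.
with tempfile.TemporaryDirectory() as tmp:
    builtin = Path(tmp) / "modelopt_recipes"
    work = Path(tmp) / "work"
    (builtin / "general" / "ptq").mkdir(parents=True)
    work.mkdir()
    (builtin / "general" / "ptq" / "fp8_default-fp8_kv.yml").write_text("metadata: {}\n")
    (work / "my_recipe.yaml").write_text("metadata: {}\n")

    print(resolve("general/ptq/fp8_default-fp8_kv", builtin, work).name)
    print(resolve("my_recipe", builtin, work).name)
```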

Writing a custom recipe

To create a custom recipe:

  1. Start from an existing recipe that is close to your target configuration.

  2. Copy it and modify the quant_cfg entries as needed (see Quantization Configuration (quant_cfg) for entry format details).

  3. Update the metadata.description to describe your changes.

  4. Save the file and pass its path to load_recipe() or --recipe.

Example – creating an INT8 per-channel recipe:

# my_int8_recipe.yml
metadata:
  recipe_type: ptq
  description: INT8 per-channel weight, per-tensor activation.

ptq_cfg:
  algorithm: max
  quant_cfg:
    - quantizer_path: '*'
      enable: false
    - quantizer_path: '*weight_quantizer'
      cfg:
        num_bits: 8
        axis: 0
    - quantizer_path: '*input_quantizer'
      cfg:
        num_bits: 8
        axis:
    - quantizer_path: '*lm_head*'
      enable: false
    - quantizer_path: '*output_layer*'
      enable: false
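
The copy-and-modify workflow can also be done on a parsed recipe in memory. The sketch below works on plain dictionaries; derive_recipe is a hypothetical helper written for this example, not a ModelOpt utility, and it simply appends new entries at the end of quant_cfg, where the recipe above places its lm_head and output_layer exclusions.

```python
import copy


def derive_recipe(base, description, extra_entries=()):
    """Return a modified copy of a parsed recipe; the base is left untouched."""
    recipe = copy.deepcopy(base)
    recipe["metadata"]["description"] = description
    recipe["ptq_cfg"]["quant_cfg"].extend(copy.deepcopy(list(extra_entries)))
    return recipe


# Parsed stand-in for an existing INT8 recipe.
base = {
    "metadata": {"recipe_type": "ptq", "description": "INT8 per-channel weight, per-tensor activation."},
    "ptq_cfg": {
        "algorithm": "max",
        "quant_cfg": [
            {"quantizer_path": "*", "enable": False},
            {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
            {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
        ],
    },
}

custom = derive_recipe(
    base,
    "INT8 per-channel weight, per-tensor activation, lm_head excluded.",
    extra_entries=[{"quantizer_path": "*lm_head*", "enable": False}],
)
print(len(base["ptq_cfg"]["quant_cfg"]), len(custom["ptq_cfg"]["quant_cfg"]))
```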

Recipe repository layout

The modelopt_recipes/ package is organized as follows:

modelopt_recipes/
+-- __init__.py
+-- general/                    # Model-agnostic recipes
|   +-- ptq/
|       +-- fp8_default-fp8_kv.yml
|       +-- nvfp4_default-fp8_kv.yml
|       +-- nvfp4_mlp_only-fp8_kv.yml
|       +-- nvfp4_experts_only-fp8_kv.yml
|       +-- nvfp4_omlp_only-fp8_kv.yml
+-- models/                     # Model-specific recipes
|   +-- Step3.5-Flash/
|       +-- nvfp4-mlp-only.yaml
+-- configs/                    # Shared configuration fragments

Recipe data model

Recipes are validated at load time using Pydantic models:

ModelOptRecipeBase

Base class for all recipe types. Contains recipe_type and description.

ModelOptPTQRecipe

PTQ-specific recipe. Adds the ptq_cfg field (a dict with quant_cfg and algorithm).

RecipeType

Enum of supported recipe types. Currently only PTQ.
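
The shape of this data model can be mirrored in a few lines. The sketch below uses standard-library dataclasses instead of Pydantic so that it runs without extra dependencies. The class names RecipeBase and PTQRecipe and the helper build_recipe are stand-ins for illustration; only the field names and the single "ptq" type come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class RecipeType(str, Enum):
    """Supported recipe types; the text lists only PTQ."""
    PTQ = "ptq"


@dataclass
class RecipeBase:
    """Stand-in for the base model: recipe_type plus optional description."""
    recipe_type: RecipeType
    description: Optional[str] = None


@dataclass
class PTQRecipe(RecipeBase):
    """Stand-in for the PTQ model: adds the ptq_cfg dict."""
    ptq_cfg: dict = field(default_factory=dict)


def build_recipe(doc):
    """Turn a parsed recipe document into a typed object, or raise ValueError."""
    metadata = doc.get("metadata")
    if not isinstance(metadata, dict) or "recipe_type" not in metadata:
        raise ValueError("recipe needs a metadata mapping with a recipe_type field")
    recipe_type = RecipeType(metadata["recipe_type"])  # ValueError for unknown types
    if "ptq_cfg" not in doc:
        raise ValueError("PTQ recipes need a ptq_cfg section")
    return PTQRecipe(recipe_type, metadata.get("description"), doc["ptq_cfg"])


recipe = build_recipe(
    {
        "metadata": {"recipe_type": "ptq", "description": "FP8 example."},
        "ptq_cfg": {"algorithm": "max", "quant_cfg": []},
    }
)
print(recipe.recipe_type, recipe.ptq_cfg["algorithm"])
```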

Future directions

The recipe system is designed to grow:

  • QAT recipes – recipe_type: qat with training hyperparameters, distillation settings, and dataset configuration.

  • Sparsity recipes – structured and unstructured pruning configurations.

  • Speculative decoding recipes – draft model and vocabulary calibration settings.

  • Composite recipes – chaining multiple optimization stages (e.g., quantize then prune) in a single recipe.

  • Dataset configuration – standardized dataset section for calibration data specification.

  • Recipe merging and override utilities – programmatic tools to compose and customize recipes.

  • Unified entry point – an nv-modelopt CLI that accepts --recipe as the primary configuration mechanism, replacing per-example scripts.