Recipes
A recipe is a declarative YAML specification that fully describes how to optimize a model. Recipes decouple optimization settings from Python code, enabling reuse, sharing, version control, and reproducibility. Instead of editing Python scripts to change quantization parameters, you author (or select) a recipe file and pass it to the ModelOpt tooling.
Motivation
Without recipes, optimization settings are scattered across command-line arguments, Python constants, and ad-hoc code edits. This makes it difficult to:
Reproduce a published result – the exact configuration is buried in script arguments.
Share a configuration – there is no single artifact to hand off.
Version-control changes – diffs are mixed in with unrelated code changes.
Onboard new models – inference engineers must read source code to discover which settings to tweak.
Recipes solve these problems by capturing all the configuration needed to optimize a model in a single YAML file (or a small directory of files).
Design overview
The recipe system is part of the modelopt.recipe package and consists of three layers:
Recipe files – YAML documents stored in the modelopt_recipes/ directory (shipped with the package) or on the user's filesystem.
Config loader – load_config() reads YAML files, resolves paths, and performs automatic ExMy floating-point notation conversion.
Recipe loader – load_recipe() validates the YAML against Pydantic models and returns a typed recipe object ready for use.
Recipe file format
A recipe is a YAML file with two top-level sections: metadata and a
type-specific configuration section (currently ptq_cfg for PTQ recipes).
Single-file format
The simplest form is a single .yml or .yaml file:
# modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
metadata:
recipe_type: ptq
description: FP8 per-tensor weight and activation (W8A8), FP8 KV cache, max calibration.
ptq_cfg:
algorithm: max
quant_cfg:
- quantizer_path: '*'
enable: false
- quantizer_path: '*input_quantizer'
cfg:
num_bits: e4m3
axis:
- quantizer_path: '*weight_quantizer'
cfg:
num_bits: e4m3
axis:
- quantizer_path: '*[kv]_bmm_quantizer'
enable: true
cfg:
num_bits: e4m3
# ... standard exclusions omitted for brevity
Directory format
For larger recipes or when you want to keep metadata separate from the quantization configuration, use a directory with two files:
my_recipe/
recipe.yml # metadata section
ptq_cfg.yml # ptq_cfg section (quant_cfg + algorithm)
recipe.yml:
metadata:
recipe_type: ptq
description: My custom NVFP4 recipe.
ptq_cfg.yml:
algorithm: max
quant_cfg:
- quantizer_path: '*'
enable: false
- quantizer_path: '*weight_quantizer'
cfg:
num_bits: e2m1
block_sizes: {-1: 16, type: dynamic, scale_bits: e4m3}
- quantizer_path: '*input_quantizer'
cfg:
num_bits: e4m3
axis:
Metadata section
Every recipe file must contain a metadata mapping with at least a recipe_type field:
| Field | Required | Description |
|---|---|---|
| recipe_type | Yes | The optimization category. Currently only ptq is supported. |
| description | No | A human-readable summary of what the recipe does. |
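For reference, a minimal valid metadata section needs only the required field; the description is optional but recommended:

```yaml
metadata:
  recipe_type: ptq
  description: Optional human-readable summary.
```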
PTQ configuration section
For PTQ recipes (recipe_type: ptq), the ptq_cfg mapping contains:
| Field | Required | Description |
|---|---|---|
| quant_cfg | Yes | An ordered list of quantizer configuration entries (see Quantization Configuration (quant_cfg)). |
| algorithm | No | The calibration algorithm, e.g. max. |
ExMy floating-point notation
Recipe files support a convenient shorthand for floating-point bit formats in
num_bits and scale_bits fields. Instead of writing a Python tuple, you
write the format name directly:
num_bits: e4m3 # automatically converted to (4, 3)
scale_bits: e8m0 # automatically converted to (8, 0)
The notation is case-insensitive (E4M3, e4m3, E4m3 all work). The
conversion is performed by load_config() when loading any
YAML file, so it works in both recipe files and standalone config files.
Common formats:
| Notation | Tuple | Description |
|---|---|---|
| e4m3 | (4, 3) | FP8 E4M3 – standard FP8 weight/activation format |
| e5m2 | (5, 2) | FP8 E5M2 – wider dynamic range, used for gradients |
| e2m1 | (2, 1) | FP4 E2M1 – NVFP4 weight format |
| e8m0 | (8, 0) | E8M0 – MX block scaling format |
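The conversion itself is straightforward string matching. The helper below is a minimal sketch of what the logic might look like; the actual implementation lives inside load_config(), and the function name here is illustrative:

```python
import re

# Matches strings like "e4m3", "E8M0", "E4m3" (case-insensitive).
_EXMY_RE = re.compile(r"^e(\d+)m(\d+)$", re.IGNORECASE)


def parse_exmy(value):
    """Convert an ExMy string such as 'e4m3' to an (exponent_bits, mantissa_bits)
    tuple. Any value that does not match the pattern is returned unchanged,
    so plain integer bit widths like 8 pass through untouched."""
    if isinstance(value, str):
        match = _EXMY_RE.match(value)
        if match:
            return (int(match.group(1)), int(match.group(2)))
    return value
```

Because non-matching values pass through unchanged, the same code path can handle both num_bits: 8 (integer quantization) and num_bits: e4m3 (floating-point formats).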
Built-in recipes
ModelOpt ships a library of built-in recipes under the modelopt_recipes/ package.
These are bundled with the Python distribution and can be referenced by their relative
path (without the modelopt_recipes/ prefix).
General PTQ recipes
General recipes are model-agnostic and apply to any supported architecture:
| Recipe path | Description |
|---|---|
| general/ptq/fp8_default-fp8_kv | FP8 per-tensor W8A8, FP8 KV cache, max calibration |
| general/ptq/nvfp4_default-fp8_kv | NVFP4 W4A4 with FP8 KV cache, max calibration |
| general/ptq/nvfp4_mlp_only-fp8_kv | NVFP4 for MLP layers only, FP8 KV cache |
| general/ptq/nvfp4_experts_only-fp8_kv | NVFP4 for MoE expert layers only, FP8 KV cache |
| general/ptq/nvfp4_omlp_only-fp8_kv | NVFP4 for output projection + MLP layers, FP8 KV cache |
Model-specific recipes
Model-specific recipes are tuned for a particular architecture and live under
models/<model_name>/:
| Recipe path | Description |
|---|---|
| models/Step3.5-Flash/nvfp4-mlp-only | NVFP4 MLP-only for Step 3.5 Flash MoE model |
Loading recipes
Python API
Use load_recipe() to load a recipe. The path is resolved
against the built-in library first, then the filesystem:
from modelopt.recipe import load_recipe, ModelOptPTQRecipe
# Load a built-in recipe by relative path (suffix optional)
recipe = load_recipe("general/ptq/fp8_default-fp8_kv")
assert isinstance(recipe, ModelOptPTQRecipe)
# The ptq_cfg dict can be passed directly to mtq.quantize()
import modelopt.torch.quantization as mtq
model = mtq.quantize(model, recipe.ptq_cfg, forward_loop)
# Load a custom recipe from the filesystem
recipe = load_recipe("/path/to/my_custom_recipe.yml")
model = mtq.quantize(model, recipe.ptq_cfg, forward_loop)
Command-line usage
The hf_ptq.py example accepts a --recipe flag:
python examples/llm_ptq/hf_ptq.py \
--model Qwen/Qwen3-8B \
--recipe general/ptq/fp8_default-fp8_kv \
--export_path build/fp8 \
--calib_size 512 \
--export_fmt hf
When --recipe is provided, the script loads the recipe and uses its ptq_cfg
directly, bypassing the --qformat / --kv_cache_qformat flags.
Loading standalone configs
load_config() loads arbitrary YAML config files with
automatic ExMy conversion and built-in path resolution. This is useful
for loading shared configuration fragments:
from modelopt.recipe import load_config
cfg = load_config("configs/some_shared_config")
Path resolution
Both load_recipe() and load_config()
resolve paths using the same strategy:
If the path is absolute, use it directly.
If relative, check the built-in recipes library first (modelopt_recipes/), probing .yml and .yaml suffixes.
Otherwise, check the filesystem, probing the same suffixes.
This means built-in recipes can be referenced without any prefix:
# These are all equivalent:
load_recipe("general/ptq/fp8_default-fp8_kv")
load_recipe("general/ptq/fp8_default-fp8_kv.yml")
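The resolution strategy can be sketched as follows. This is an illustrative stand-in, not the actual ModelOpt implementation; the helper name and the built-in root location are assumptions:

```python
from pathlib import Path

# Hypothetical location of the bundled recipe library.
BUILTIN_ROOT = Path("modelopt_recipes")
SUFFIXES = ("", ".yml", ".yaml")


def resolve_recipe_path(path_str, builtin_root=BUILTIN_ROOT):
    """Resolve a recipe reference: absolute paths are used directly,
    relative paths are probed against the built-in library first and
    then the current filesystem, trying .yml and .yaml suffixes."""
    path = Path(path_str)
    if path.is_absolute():
        return path
    for root in (builtin_root, Path(".")):
        for suffix in SUFFIXES:
            candidate = root / (path_str + suffix)
            if candidate.exists():
                return candidate
    raise FileNotFoundError(f"Recipe not found: {path_str}")
```

Probing the suffix list in order is why the suffix is optional when referencing built-in recipes.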
Writing a custom recipe
To create a custom recipe:
Start from an existing recipe that is close to your target configuration.
Copy it and modify the quant_cfg entries as needed (see Quantization Configuration (quant_cfg) for entry format details).
Update the metadata.description to describe your changes.
Save the file and pass its path to load_recipe() or --recipe.
Example – creating an INT8 per-channel recipe:
# my_int8_recipe.yml
metadata:
recipe_type: ptq
description: INT8 per-channel weight, per-tensor activation.
ptq_cfg:
algorithm: max
quant_cfg:
- quantizer_path: '*'
enable: false
- quantizer_path: '*weight_quantizer'
cfg:
num_bits: 8
axis: 0
- quantizer_path: '*input_quantizer'
cfg:
num_bits: 8
axis:
- quantizer_path: '*lm_head*'
enable: false
- quantizer_path: '*output_layer*'
enable: false
Recipe repository layout
The modelopt_recipes/ package is organized as follows:
modelopt_recipes/
+-- __init__.py
+-- general/ # Model-agnostic recipes
| +-- ptq/
| +-- fp8_default-fp8_kv.yml
| +-- nvfp4_default-fp8_kv.yml
| +-- nvfp4_mlp_only-fp8_kv.yml
| +-- nvfp4_experts_only-fp8_kv.yml
| +-- nvfp4_omlp_only-fp8_kv.yml
+-- models/ # Model-specific recipes
| +-- Step3.5-Flash/
| +-- nvfp4-mlp-only.yaml
+-- configs/ # Shared configuration fragments
Recipe data model
Recipes are validated at load time using Pydantic models:
ModelOptRecipeBase – Base class for all recipe types. Contains recipe_type and description.
ModelOptPTQRecipe – PTQ-specific recipe. Adds the ptq_cfg field (a dict with quant_cfg and algorithm).
RecipeType – Enum of supported recipe types. Currently only PTQ.
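A rough sketch of this data model, using standard-library dataclasses for illustration (the real implementation uses Pydantic; only the class names mirror the ones above):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class RecipeType(str, Enum):
    """Supported optimization categories. Currently only PTQ."""
    PTQ = "ptq"


@dataclass
class ModelOptRecipeBase:
    """Base class for all recipe types: carries the metadata section."""
    recipe_type: RecipeType
    description: str = ""

    def __post_init__(self):
        # Mimic load-time validation: reject unknown recipe types.
        self.recipe_type = RecipeType(self.recipe_type)


@dataclass
class ModelOptPTQRecipe(ModelOptRecipeBase):
    """PTQ recipe: adds the ptq_cfg dict (quant_cfg + algorithm)."""
    ptq_cfg: dict[str, Any] = field(default_factory=dict)
```

Validating at load time means a typo such as recipe_type: ptqq fails immediately with a clear error instead of surfacing later during quantization.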
Future directions
The recipe system is designed to grow:
QAT recipes – recipe_type: qat with training hyperparameters, distillation settings, and dataset configuration.
Sparsity recipes – structured and unstructured pruning configurations.
Speculative decoding recipes – draft model and vocabulary calibration settings.
Composite recipes – chaining multiple optimization stages (e.g., quantize then prune) in a single recipe.
Dataset configuration – standardized dataset section for calibration data specification.
Recipe merging and override utilities – programmatic tools to compose and customize recipes.
Unified entry point – a nv-modelopt CLI that accepts --recipe as the primary configuration mechanism, replacing per-example scripts.