.. _recipes:

Recipes
#######

A **recipe** is a declarative YAML specification that fully describes how to optimize a model.
Recipes decouple optimization settings from Python code, enabling reuse, sharing, version
control, and reproducibility. Instead of editing Python scripts to change quantization
parameters, you author (or select) a recipe file and pass it to the ModelOpt tooling.

.. contents:: On this page
   :local:
   :depth: 2

Motivation
==========

Without recipes, optimization settings are scattered across command-line arguments, Python
constants, and ad-hoc code edits. This makes it difficult to:

* **Reproduce** a published result -- the exact configuration is buried in script arguments.
* **Share** a configuration -- there is no single artifact to hand off.
* **Version-control** changes -- diffs are mixed in with unrelated code changes.
* **Onboard new models** -- inference engineers must read source code to discover which
  settings to tweak.

Recipes solve these problems by capturing **all** the configuration needed to optimize a model
in a single YAML file (or a small directory of files).

Design overview
===============

The recipe system is part of the :mod:`modelopt.recipe` package and consists of three layers:

1. **Recipe files** -- YAML documents stored in the ``modelopt_recipes/`` directory (shipped
   with the package) or on the user's filesystem.
2. **Config loader** -- :func:`~modelopt.recipe.load_config` reads YAML files, resolves paths,
   and performs automatic ``ExMy`` floating-point notation conversion.
3. **Recipe loader** -- :func:`~modelopt.recipe.load_recipe` validates the YAML against
   Pydantic models and returns a typed recipe object ready for use.

Recipe file format
==================

A recipe is a YAML file with two top-level sections: ``metadata`` and a type-specific
configuration section (currently ``ptq_cfg`` for PTQ recipes).

Single-file format
------------------

The simplest form is a single ``.yml`` or ``.yaml`` file:

.. code-block:: yaml

    # modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
    metadata:
      recipe_type: ptq
      description: FP8 per-tensor weight and activation (W8A8), FP8 KV cache, max calibration.
    ptq_cfg:
      algorithm: max
      quant_cfg:
        - quantizer_path: '*'
          enable: false
        - quantizer_path: '*input_quantizer'
          cfg:
            num_bits: e4m3
            axis:
        - quantizer_path: '*weight_quantizer'
          cfg:
            num_bits: e4m3
            axis:
        - quantizer_path: '*[kv]_bmm_quantizer'
          enable: true
          cfg:
            num_bits: e4m3
        # ... standard exclusions omitted for brevity

Directory format
----------------

For larger recipes or when you want to keep metadata separate from the quantization
configuration, use a directory with two files:

.. code-block:: text

    my_recipe/
      recipe.yml    # metadata section
      ptq_cfg.yml   # ptq_cfg section (quant_cfg + algorithm)

``recipe.yml``:

.. code-block:: yaml

    metadata:
      recipe_type: ptq
      description: My custom NVFP4 recipe.

``ptq_cfg.yml``:

.. code-block:: yaml

    algorithm: max
    quant_cfg:
      - quantizer_path: '*'
        enable: false
      - quantizer_path: '*weight_quantizer'
        cfg:
          num_bits: e2m1
          block_sizes: {-1: 16, type: dynamic, scale_bits: e4m3}
      - quantizer_path: '*input_quantizer'
        cfg:
          num_bits: e4m3
          axis:

Metadata section
================

Every recipe file must contain a ``metadata`` mapping with at least a ``recipe_type`` field:

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Field
     - Required
     - Description
   * - ``recipe_type``
     - Yes
     - The optimization category. Currently only ``"ptq"`` is supported.
   * - ``description``
     - No
     - A human-readable summary of what the recipe does.

PTQ configuration section
=========================

For PTQ recipes (``recipe_type: ptq``), the ``ptq_cfg`` mapping contains:

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Field
     - Required
     - Description
   * - ``quant_cfg``
     - Yes
     - An ordered list of :class:`~modelopt.torch.quantization.config.QuantizerCfgEntry` dicts.
       See :ref:`quant-cfg` for the full specification of entries, ordering semantics, and
       atomicity rules.
   * - ``algorithm``
     - No
     - The calibration algorithm: ``"max"`` (default), ``"mse"``, ``"smoothquant"``,
       ``"awq_lite"``, ``"awq_full"``, ``"awq_clip"``, ``"gptq"``, or ``null`` for formats
       that need no calibration (e.g. MX formats).

ExMy floating-point notation
=============================

Recipe files support a convenient shorthand for floating-point bit formats in ``num_bits`` and
``scale_bits`` fields. Instead of writing a Python tuple, you write the format name directly:

.. code-block:: yaml

    num_bits: e4m3      # automatically converted to (4, 3)
    scale_bits: e8m0    # automatically converted to (8, 0)

The notation is case-insensitive (``E4M3``, ``e4m3``, ``E4m3`` all work). The conversion is
performed by :func:`~modelopt.recipe.load_config` when loading any YAML file, so it works in
both recipe files and standalone config files.

Common formats:

.. list-table::
   :header-rows: 1
   :widths: 15 15 70

   * - Notation
     - Tuple
     - Description
   * - ``e4m3``
     - ``(4, 3)``
     - FP8 E4M3 -- standard FP8 weight/activation format
   * - ``e5m2``
     - ``(5, 2)``
     - FP8 E5M2 -- wider dynamic range, used for gradients
   * - ``e2m1``
     - ``(2, 1)``
     - FP4 E2M1 -- NVFP4 weight format
   * - ``e8m0``
     - ``(8, 0)``
     - E8M0 -- MX block scaling format

Built-in recipes
================

ModelOpt ships a library of built-in recipes under the ``modelopt_recipes/`` package. These are
bundled with the Python distribution and can be referenced by their relative path (without the
``modelopt_recipes/`` prefix).

General PTQ recipes
-------------------

General recipes are model-agnostic and apply to any supported architecture:

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Recipe path
     - Description
   * - ``general/ptq/fp8_default-fp8_kv``
     - FP8 per-tensor W8A8, FP8 KV cache, max calibration
   * - ``general/ptq/nvfp4_default-fp8_kv``
     - NVFP4 W4A4 with FP8 KV cache, max calibration
   * - ``general/ptq/nvfp4_mlp_only-fp8_kv``
     - NVFP4 for MLP layers only, FP8 KV cache
   * - ``general/ptq/nvfp4_experts_only-fp8_kv``
     - NVFP4 for MoE expert layers only, FP8 KV cache
   * - ``general/ptq/nvfp4_omlp_only-fp8_kv``
     - NVFP4 for output projection + MLP layers, FP8 KV cache

Model-specific recipes
----------------------

Model-specific recipes are tuned for a particular architecture and live under ``models//``:

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Recipe path
     - Description
   * - ``models/Step3.5-Flash/nvfp4-mlp-only``
     - NVFP4 MLP-only for Step 3.5 Flash MoE model

Loading recipes
===============

Python API
----------

Use :func:`~modelopt.recipe.load_recipe` to load a recipe. The path is resolved against the
built-in library first, then the filesystem:

.. code-block:: python

    from modelopt.recipe import load_recipe, ModelOptPTQRecipe

    # Load a built-in recipe by relative path (suffix optional)
    recipe = load_recipe("general/ptq/fp8_default-fp8_kv")
    assert isinstance(recipe, ModelOptPTQRecipe)

    # The ptq_cfg dict can be passed directly to mtq.quantize()
    import modelopt.torch.quantization as mtq

    model = mtq.quantize(model, recipe.ptq_cfg, forward_loop)

.. code-block:: python

    # Load a custom recipe from the filesystem
    recipe = load_recipe("/path/to/my_custom_recipe.yml")
    model = mtq.quantize(model, recipe.ptq_cfg, forward_loop)

Command-line usage
------------------

The ``hf_ptq.py`` example accepts a ``--recipe`` flag:

.. code-block:: bash

    python examples/llm_ptq/hf_ptq.py \
        --model Qwen/Qwen3-8B \
        --recipe general/ptq/fp8_default-fp8_kv \
        --export_path build/fp8 \
        --calib_size 512 \
        --export_fmt hf

When ``--recipe`` is provided, the script loads the recipe and uses its ``ptq_cfg`` directly,
bypassing the ``--qformat`` / ``--kv_cache_qformat`` flags.

Loading standalone configs
--------------------------

:func:`~modelopt.recipe.load_config` loads arbitrary YAML config files with automatic ``ExMy``
conversion and built-in path resolution. This is useful for loading shared configuration
fragments:

.. code-block:: python

    from modelopt.recipe import load_config

    cfg = load_config("configs/some_shared_config")

Path resolution
===============

Both :func:`~modelopt.recipe.load_recipe` and :func:`~modelopt.recipe.load_config` resolve
paths using the same strategy:

1. If the path is absolute, use it directly.
2. If relative, check the **built-in recipes library** first (``modelopt_recipes/``), probing
   ``.yml`` and ``.yaml`` suffixes.
3. Then check the **filesystem**, probing the same suffixes.

This means built-in recipes can be referenced without any prefix:

.. code-block:: python

    # These are all equivalent:
    load_recipe("general/ptq/fp8_default-fp8_kv")
    load_recipe("general/ptq/fp8_default-fp8_kv.yml")

Writing a custom recipe
=======================

To create a custom recipe:

1. Start from an existing recipe that is close to your target configuration.
2. Copy it and modify the ``quant_cfg`` entries as needed (see :ref:`quant-cfg` for entry
   format details).
3. Update the ``metadata.description`` to describe your changes.
4. Save the file and pass its path to ``load_recipe()`` or ``--recipe``.

Example -- creating an INT8 per-channel recipe:

.. code-block:: yaml

    # my_int8_recipe.yml
    metadata:
      recipe_type: ptq
      description: INT8 per-channel weight, per-tensor activation.
    ptq_cfg:
      algorithm: max
      quant_cfg:
        - quantizer_path: '*'
          enable: false
        - quantizer_path: '*weight_quantizer'
          cfg:
            num_bits: 8
            axis: 0
        - quantizer_path: '*input_quantizer'
          cfg:
            num_bits: 8
            axis:
        - quantizer_path: '*lm_head*'
          enable: false
        - quantizer_path: '*output_layer*'
          enable: false

Recipe repository layout
========================

The ``modelopt_recipes/`` package is organized as follows:

.. code-block:: text

    modelopt_recipes/
    +-- __init__.py
    +-- general/                     # Model-agnostic recipes
    |   +-- ptq/
    |       +-- fp8_default-fp8_kv.yml
    |       +-- nvfp4_default-fp8_kv.yml
    |       +-- nvfp4_mlp_only-fp8_kv.yml
    |       +-- nvfp4_experts_only-fp8_kv.yml
    |       +-- nvfp4_omlp_only-fp8_kv.yml
    +-- models/                      # Model-specific recipes
    |   +-- Step3.5-Flash/
    |       +-- nvfp4-mlp-only.yaml
    +-- configs/                     # Shared configuration fragments

Recipe data model
=================

Recipes are validated at load time using Pydantic models:

:class:`~modelopt.recipe.config.ModelOptRecipeBase`
    Base class for all recipe types. Contains ``recipe_type`` and ``description``.

:class:`~modelopt.recipe.config.ModelOptPTQRecipe`
    PTQ-specific recipe. Adds the ``ptq_cfg`` field (a dict with ``quant_cfg`` and
    ``algorithm``).

:class:`~modelopt.recipe.config.RecipeType`
    Enum of supported recipe types. Currently only ``PTQ``.

Future directions
=================

The recipe system is designed to grow:

* **QAT recipes** -- ``recipe_type: qat`` with training hyperparameters, distillation
  settings, and dataset configuration.
* **Sparsity recipes** -- structured and unstructured pruning configurations.
* **Speculative decoding recipes** -- draft model and vocabulary calibration settings.
* **Composite recipes** -- chaining multiple optimization stages (e.g., quantize then prune)
  in a single recipe.
* **Dataset configuration** -- standardized ``dataset`` section for calibration data
  specification.
* **Recipe merging and override utilities** -- programmatic tools to compose and customize
  recipes.
* **Unified entry point** -- a ``nv-modelopt`` CLI that accepts ``--recipe`` as the primary
  configuration mechanism, replacing per-example scripts.
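
As a supplementary illustration of the ``ExMy`` shorthand described earlier on this page, the conversion from a string such as ``e4m3`` to the tuple ``(4, 3)`` can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code used by ``load_config``: the helper names ``parse_exmy`` and ``convert_exmy_fields`` are hypothetical, and the sketch assumes the YAML has already been parsed into nested dicts and lists.

```python
import re

# Hypothetical helpers for illustration only; not part of modelopt.recipe.
_EXMY_PATTERN = re.compile(r"e(\d+)m(\d+)", re.IGNORECASE)


def parse_exmy(value):
    """Return (exponent_bits, mantissa_bits) for an ExMy string.

    Any value that is not an ExMy string (for example the integer 8)
    is returned unchanged.
    """
    if isinstance(value, str):
        match = _EXMY_PATTERN.fullmatch(value.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return value


def convert_exmy_fields(node):
    """Recursively convert ``num_bits`` and ``scale_bits`` in a parsed YAML tree."""
    if isinstance(node, dict):
        return {
            key: parse_exmy(val)
            if key in ("num_bits", "scale_bits")
            else convert_exmy_fields(val)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [convert_exmy_fields(item) for item in node]
    return node


# Example input mirroring the NVFP4 weight entry shown in the directory format.
raw_cfg = {
    "num_bits": "E2M1",
    "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": "e4m3"},
}
converted = convert_exmy_fields(raw_cfg)
```

In this sketch only the two named fields are rewritten, so other string values (such as ``type: dynamic``) pass through untouched, and integer formats such as ``num_bits: 8`` are left as integers.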