.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/05_ensemble_workflow_extend.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_05_ensemble_workflow_extend.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_05_ensemble_workflow_extend.py:


Single Variable Perturbation Method
===================================

Intermediate ensemble inference using a custom perturbation method.

This example demonstrates how to run an ensemble inference workflow
with a custom perturbation method that applies noise only to a specific variable.

In this example you will learn:

- How to extend an existing perturbation method with custom code
- How to instantiate a built-in prognostic model
- How to create a data source and IO object
- How to run a simple built-in workflow
- How to post-process the results

.. GENERATED FROM PYTHON SOURCE LINES 36-43

.. code-block:: Python

    # /// script
    # dependencies = [
    #   "earth2studio[dlwp,perturbation] @ git+https://github.com/NVIDIA/earth2studio.git",
    #   "matplotlib",
    # ]
    # ///


.. GENERATED FROM PYTHON SOURCE LINES 44-49

Set Up
------
All workflows inside Earth2Studio require constructed components to be
handed to them. In this example, we will use the built-in ensemble workflow
:py:meth:`earth2studio.run.ensemble`.

.. GENERATED FROM PYTHON SOURCE LINES 51-55

.. literalinclude:: ../../earth2studio/run.py
   :language: python
   :start-after: # sphinx - ensemble start
   :end-before: # sphinx - ensemble end

.. GENERATED FROM PYTHON SOURCE LINES 57-63

We need the following:

- Prognostic Model: Use the built-in DLWP model :py:class:`earth2studio.models.px.DLWP`.
- Perturbation Method: Extend the Spherical Gaussian method :py:class:`earth2studio.perturbation.SphericalGaussian`.
- Datasource: Pull data from the GFS data API :py:class:`earth2studio.data.GFS`.
- IO Backend: Save the outputs into a Zarr store :py:class:`earth2studio.io.ZarrBackend`.

.. GENERATED FROM PYTHON SOURCE LINES 65-89

.. code-block:: Python

    import os

    os.makedirs("outputs", exist_ok=True)
    from dotenv import load_dotenv

    load_dotenv()  # TODO: make common example prep function

    import numpy as np
    import torch

    from earth2studio.data import GFS
    from earth2studio.io import ZarrBackend
    from earth2studio.models.px import DLWP
    from earth2studio.perturbation import Perturbation, SphericalGaussian
    from earth2studio.run import ensemble
    from earth2studio.utils.type import CoordSystem

    # Load the default model package, which downloads the checkpoint from NGC
    package = DLWP.load_default_package()
    model = DLWP.load_model(package)

    # Create the data source
    data = GFS()


.. GENERATED FROM PYTHON SOURCE LINES 90-93

The perturbation method in :ref:`sphx_glr_examples_03_ensemble_workflow.py` is naive because it
applies the same noise amplitude to every variable. We can create a custom wrapper
that only applies the perturbation method to a particular variable instead.

.. GENERATED FROM PYTHON SOURCE LINES 96-130

.. code-block:: Python

    class ApplyToVariable:
        """Apply a perturbation to only a particular variable."""

        def __init__(self, pm: Perturbation, variable: str | list[str]):
            self.pm = pm
            if isinstance(variable, str):
                variable = [variable]
            self.variable = variable

        @torch.inference_mode()
        def __call__(
            self,
            x: torch.Tensor,
            coords: CoordSystem,
        ) -> tuple[torch.Tensor, CoordSystem]:
            # Apply the wrapped perturbation to the full tensor
            xp, _ = self.pm(x, coords)
            # Copy only the selected variables back into the original tensor (in place)
            ind = np.isin(coords["variable"], self.variable)
            x[..., ind, :, :] = xp[..., ind, :, :]
            return x, coords


    # Wrap SphericalGaussian so that only 't2m' is perturbed, with a 1 K noise amplitude
    avsg = ApplyToVariable(SphericalGaussian(noise_amplitude=1.0), "t2m")

    # Create the IO handler, writing to a Zarr store on disk
    chunks = {"ensemble": 1, "time": 1, "lead_time": 1}
    io = ZarrBackend(
        file_name="outputs/05_ensemble_avsg.zarr",
        chunks=chunks,
        backend_kwargs={"overwrite": True},
    )
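
The masking step inside ``ApplyToVariable`` can be checked in isolation. The following
is a minimal sketch on a toy NumPy array that stands in for the tensor. The variable
names, the array shape and the stand-in noise are assumptions made for illustration
and are not taken from the model.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy coordinates and data with an assumed layout [ensemble, variable, lat, lon]
    variables = np.array(["t2m", "tcwv", "z500"])
    x = np.zeros((2, 3, 4, 8))

    # Hypothetical stand-in for a perturbation method: noise on every variable
    xp = x + rng.normal(size=x.shape)

    # Boolean mask over the variable axis, True only for the targeted variable
    ind = np.isin(variables, ["t2m"])

    # Copy perturbed values back only where the mask is True
    out = x.copy()
    out[..., ind, :, :] = xp[..., ind, :, :]

    # Per-variable check: which variables differ from the unperturbed input?
    print(np.abs(out - x).max(axis=(0, 2, 3)) > 0)

Unlike the class above, this sketch writes into a copy so that the unperturbed input
stays available for comparison.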


.. GENERATED FROM PYTHON SOURCE LINES 131-140

Execute the Workflow
--------------------
With all components initialized, running the workflow is a single line of Python code.
The workflow returns the provided IO object back to the user, which can then be used
for post-processing. Some IO backends have additional APIs that can be handy for
post-processing or saving to file. Check the API docs for more information.

For the forecast we will predict for 10 steps (for DLWP, this is 60 hours) with 8 ensemble
members, which will be run in 2 batches with batch size 4.
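
The numbers above follow from simple arithmetic, sketched below. The 6-hour step is
an assumption taken from the factor used in the plot titles later in this example,
and the batching shown here is only an illustration. The workflow's internal logic
may differ.

.. code-block:: Python

    import math

    nsteps = 10
    nensemble = 8
    batch_size = 4
    step_hours = 6  # assumed model time step in hours

    # Number of batches needed to cover all ensemble members
    n_batches = math.ceil(nensemble / batch_size)
    # Index of the first member in each batch
    batch_starts = list(range(0, nensemble, batch_size))
    # Final lead time reached by the forecast
    lead_hours = nsteps * step_hours
    # Stored lead times: the initial state plus one per forecast step
    n_outputs = nsteps + 1

    print(n_batches, batch_starts, lead_hours, n_outputs)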

.. GENERATED FROM PYTHON SOURCE LINES 142-157

.. code-block:: Python

    nsteps = 10
    nensemble = 8
    batch_size = 4
    io = ensemble(
        ["2024-01-01"],
        nsteps,
        nensemble,
        model,
        data,
        io,
        avsg,
        batch_size=batch_size,
        output_coords={"variable": np.array(["t2m", "tcwv"])},
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2025-07-18 21:19:56.523 | INFO     | earth2studio.run:ensemble:315 - Running ensemble inference!
    2025-07-18 21:19:56.523 | INFO     | earth2studio.run:ensemble:323 - Inference device: cuda
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.539 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 397402829-996456
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.562 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 294691465-856457
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.584 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 329116923-847018
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.607 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 208052937-721817
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.629 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 251230645-803982
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.650 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 408062467-879185
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.672 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20231231/18/atmos/gfs.t18z.pgrb2.0p25.f000 420029701-1181204
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]    Fetching GFS data:  14%|█▍        | 1/7 [00:00<00:00,  6.44it/s]    Fetching GFS data: 100%|██████████| 7/7 [00:00<00:00, 44.99it/s]
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.699 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 391722290-987401
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.724 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 204118947-720169
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.749 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 402321768-876246
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.774 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 246334297-805355
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.802 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 289307267-851916
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.829 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 323956279-837771
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]                                                            2025-07-18 21:19:56.856 | DEBUG    | earth2studio.data.gfs:fetch_array:380 - Fetching GFS grib file: noaa-gfs-bdp-pds/gfs.20240101/00/atmos/gfs.t00z.pgrb2.0p25.f000 414179964-1179422
    Fetching GFS data:   0%|          | 0/7 [00:00<?, ?it/s]    Fetching GFS data:  14%|█▍        | 1/7 [00:00<00:01,  5.42it/s]    Fetching GFS data: 100%|██████████| 7/7 [00:00<00:00, 37.87it/s]
    2025-07-18 21:19:56.915 | SUCCESS  | earth2studio.run:ensemble:345 - Fetched data from GFS
    2025-07-18 21:19:56.921 | WARNING  | earth2studio.io.zarr:add_array:200 - Datetime64 not supported in zarr 3.0, converting to int64 nanoseconds since epoch
    2025-07-18 21:19:56.924 | WARNING  | earth2studio.io.zarr:add_array:206 - Timedelta64 not supported in zarr 3.0, converting to int64 nanoseconds since epoch
    2025-07-18 21:19:56.937 | INFO     | earth2studio.run:ensemble:373 - Starting 8 Member Ensemble Inference with             2 number of batches.


    Total Ensemble Batches:   0%|          | 0/2 [00:00<?, ?it/s]
    Running batch 0 inference:   0%|          | 0/11 [00:00<?, ?it/s]
    Running batch 0 inference:   9%|▉         | 1/11 [00:00<00:03,  2.73it/s]
    Running batch 0 inference:  18%|█▊        | 2/11 [00:00<00:03,  2.76it/s]
    Running batch 0 inference:  27%|██▋       | 3/11 [00:01<00:02,  2.87it/s]
    Running batch 0 inference:  36%|███▋      | 4/11 [00:01<00:02,  2.87it/s]
    Running batch 0 inference:  45%|████▌     | 5/11 [00:01<00:02,  2.92it/s]
    Running batch 0 inference:  55%|█████▍    | 6/11 [00:02<00:01,  2.91it/s]
    Running batch 0 inference:  64%|██████▎   | 7/11 [00:02<00:01,  2.93it/s]
    Running batch 0 inference:  73%|███████▎  | 8/11 [00:02<00:01,  2.89it/s]
    Running batch 0 inference:  82%|████████▏ | 9/11 [00:03<00:00,  2.91it/s]
    Running batch 0 inference:  91%|█████████ | 10/11 [00:03<00:00,  2.85it/s]
    Running batch 0 inference: 100%|██████████| 11/11 [00:03<00:00,  2.89it/s]
                                                                              

    Total Ensemble Batches:  50%|█████     | 1/2 [00:07<00:07,  7.44s/it]
    Running batch 4 inference:   0%|          | 0/11 [00:00<?, ?it/s]
    Running batch 4 inference:   9%|▉         | 1/11 [00:00<00:03,  2.96it/s]
    Running batch 4 inference:  18%|█▊        | 2/11 [00:00<00:03,  2.86it/s]
    Running batch 4 inference:  27%|██▋       | 3/11 [00:01<00:02,  2.89it/s]
    Running batch 4 inference:  36%|███▋      | 4/11 [00:01<00:02,  2.86it/s]
    Running batch 4 inference:  45%|████▌     | 5/11 [00:01<00:02,  2.89it/s]
    Running batch 4 inference:  55%|█████▍    | 6/11 [00:02<00:01,  2.88it/s]
    Running batch 4 inference:  64%|██████▎   | 7/11 [00:02<00:01,  2.91it/s]
    Running batch 4 inference:  73%|███████▎  | 8/11 [00:02<00:01,  2.89it/s]
    Running batch 4 inference:  82%|████████▏ | 9/11 [00:03<00:00,  2.91it/s]
    Running batch 4 inference:  91%|█████████ | 10/11 [00:03<00:00,  2.89it/s]
    Running batch 4 inference: 100%|██████████| 11/11 [00:03<00:00,  2.91it/s]
                                                                              

    Total Ensemble Batches: 100%|██████████| 2/2 [00:14<00:00,  7.39s/it]    Total Ensemble Batches: 100%|██████████| 2/2 [00:14<00:00,  7.39s/it]
    2025-07-18 21:20:11.726 | SUCCESS  | earth2studio.run:ensemble:423 - Inference complete


.. GENERATED FROM PYTHON SOURCE LINES 158-165

Post Processing
---------------
The last step is to post-process our results. Let's plot both the perturbed t2m field
and the unperturbed tcwv field. First, to confirm the perturbation method works as
expected, the initial state is plotted.

Notice that the Zarr IO backend has additional APIs to interact with the stored data.
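
The plots reduce each stored array over the ensemble axis. The sketch below shows the
same indexing and reduction on a synthetic array. The layout
``[ensemble, time, lead_time, lat, lon]`` is an assumption inferred from the chunk
keys and the indexing used in this example, and the grid size and values are made up.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in: 8 members, 1 initial time, 11 lead times, small grid
    t2m = 280.0 + rng.normal(size=(8, 1, 11, 6, 12))

    step = 0
    # Select all members at the first initial time and the chosen lead time
    members = t2m[:, 0, step]
    # Reduce over the ensemble axis to get one 2D field each
    mean = np.mean(members, axis=0)
    std = np.std(members, axis=0)

    print(members.shape, mean.shape, std.shape)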

.. GENERATED FROM PYTHON SOURCE LINES 167-211

.. code-block:: Python

    import matplotlib.pyplot as plt

    forecast = "2024-01-01"


    def plot_(axi, data, title, cmap):
        """Simple plot util function"""
        im = axi.imshow(data, cmap=cmap)
        plt.colorbar(im, ax=axi, shrink=0.5, pad=0.04)
        axi.set_title(title)


    step = 0  # lead time = 0 hrs
    plt.close("all")

    # Create a figure with a 2x2 grid of axes
    fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))
    plot_(
        ax[0, 0],
        np.mean(io["t2m"][:, 0, step], axis=0),
        f"{forecast} - t2m - Lead time: {6*step}hrs - Mean",
        "coolwarm",
    )
    plot_(
        ax[0, 1],
        np.std(io["t2m"][:, 0, step], axis=0),
        f"{forecast} - t2m - Lead time: {6*step}hrs - Std",
        "coolwarm",
    )
    plot_(
        ax[1, 0],
        np.mean(io["tcwv"][:, 0, step], axis=0),
        f"{forecast} - tcwv - Lead time: {6*step}hrs - Mean",
        "Blues",
    )
    plot_(
        ax[1, 1],
        np.std(io["tcwv"][:, 0, step], axis=0),
        f"{forecast} - tcwv - Lead time: {6*step}hrs - Std",
        "Blues",
    )

    plt.savefig(f"outputs/05_{forecast}_{step}_ensemble.jpg")


.. image-sg:: /examples/images/sphx_glr_05_ensemble_workflow_extend_001.png
   :alt: 2024-01-01 - t2m - Lead time: 0hrs - Mean, 2024-01-01 - t2m - Lead time: 0hrs - Std, 2024-01-01 - tcwv - Lead time: 0hrs - Mean, 2024-01-01 - tcwv - Lead time: 0hrs - Std
   :srcset: /examples/images/sphx_glr_05_ensemble_workflow_extend_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 212-216

Due to the intrinsic coupling between all fields, we should expect all variables to
have some uncertainty at later lead times. Here the total column water vapor is
plotted at a lead time of 24 hours. Note the variance across the members even though
only the temperature field was perturbed.
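
As a toy illustration of why an unperturbed variable picks up spread, consider a
hypothetical two-variable linear system in which each variable feeds into the other
at every step. The coupling matrix ``A`` and the variable meanings are assumptions
made for this sketch and are unrelated to the DLWP model itself.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    nensemble, nsteps = 8, 4

    # State per member: [temperature-like, moisture-like], identical at the start
    state = np.tile(np.array([1.0, 1.0]), (nensemble, 1))
    # Perturb only the first variable
    state[:, 0] += rng.normal(scale=0.1, size=nensemble)

    # Hypothetical coupling: each variable depends on both at the previous step
    A = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

    # Track the ensemble standard deviation of each variable over time
    spread = [state.std(axis=0)]
    for _ in range(nsteps):
        state = state @ A.T
        spread.append(state.std(axis=0))

    for i, s in enumerate(spread):
        print(i, s)

The second variable starts with zero spread and gains a nonzero spread after the first
step, because its update depends on the perturbed first variable.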

.. GENERATED FROM PYTHON SOURCE LINES 218-237

.. code-block:: Python

    step = 4  # lead time = 24 hrs
    plt.close("all")

    # Create a figure with two side-by-side axes
    fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 3))
    plot_(
        ax[0],
        np.mean(io["tcwv"][:, 0, step], axis=0),
        f"{forecast} - tcwv - Lead time: {6*step}hrs - Mean",
        "Blues",
    )
    plot_(
        ax[1],
        np.std(io["tcwv"][:, 0, step], axis=0),
        f"{forecast} - tcwv - Lead time: {6*step}hrs - Std",
        "Blues",
    )

    plt.savefig(f"outputs/05_{forecast}_{step}_ensemble.jpg")


.. image-sg:: /examples/images/sphx_glr_05_ensemble_workflow_extend_002.png
   :alt: 2024-01-01 - tcwv - Lead time: 24hrs - Mean, 2024-01-01 - tcwv - Lead time: 24hrs - Std
   :srcset: /examples/images/sphx_glr_05_ensemble_workflow_extend_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 17.691 seconds)


.. _sphx_glr_download_examples_05_ensemble_workflow_extend.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 05_ensemble_workflow_extend.ipynb <05_ensemble_workflow_extend.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 05_ensemble_workflow_extend.py <05_ensemble_workflow_extend.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 05_ensemble_workflow_extend.zip <05_ensemble_workflow_extend.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_