HealDA Global Data Assimilation

Producing a global weather analysis from satellite and in-situ observations.

This example demonstrates how to use the HealDA data assimilation model to produce a global weather analysis on a HEALPix grid from sparse in-situ (conventional) and satellite radiance observations sourced from the NOAA UFS replay archive. Three runs are compared (conventional observations only, satellite observations only, and both combined) to illustrate the impact of each observation type.

In this example you will learn:

  • How to load and initialise the HealDA data assimilation model

  • How to fetch UFS conventional and satellite observation DataFrames

  • How to run the model with different observation combinations

  • How to compare the assimilated global fields against ERA5 data

# /// script
# dependencies = [
#   "earth2studio[da-healda] @ git+https://github.com/NVIDIA/earth2studio.git",
#   "cartopy",
# ]
# ///

Set Up

This example uses the HealDA model together with the UFSObsConv, UFSObsSat and NCAR_ERA5 data sources.

HealDA is a stateless neural-network-based data assimilation model that ingests conventional (radiosonde, surface station, GPS-RO, etc.) and satellite radiance observations and produces a single global weather analysis on a HEALPix level-6 grid.
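
To give a feel for the native resolution, the sketch below computes the pixel count and mean pixel spacing of a HEALPix grid. It assumes that "level-6" follows the usual HEALPix convention of nside = 2^level (so nside = 64); that mapping is an assumption about HealDA's naming, not something stated in this example. The helper names are hypothetical.

```python
import math


def healpix_npix(level: int) -> int:
    """Number of pixels for a HEALPix grid, assuming nside = 2**level."""
    nside = 2**level
    return 12 * nside * nside


def mean_pixel_spacing_deg(level: int) -> float:
    """Square root of the mean pixel area, expressed in degrees."""
    pixel_area_sr = 4.0 * math.pi / healpix_npix(level)
    return math.degrees(math.sqrt(pixel_area_sr))


npix = healpix_npix(6)
spacing = mean_pixel_spacing_deg(6)
print(f"pixels: {npix}, mean spacing: {spacing:.3f} deg")
```

Under that assumption the grid spacing is a little under one degree.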

import os

os.makedirs("outputs", exist_ok=True)
from dotenv import load_dotenv

load_dotenv()  # TODO: make common example prep function

from datetime import timedelta

import numpy as np
import torch
from loguru import logger
from tqdm import tqdm

logger.remove()
logger.add(lambda msg: tqdm.write(msg, end=""), colorize=True)

from earth2studio.data import NCAR_ERA5, UFSObsConv, UFSObsSat, fetch_dataframe
from earth2studio.models.da import HealDA

# Load the default model package (downloads checkpoint from HuggingFace)
# Setting lat_lon=True regrids the native HEALPix output to a regular lat-lon grid.
package = HealDA.load_default_package()
model = HealDA.load_model(package, lat_lon=True)
model = model.to("cuda:0")

Fetch Observations

Pull conventional and satellite observation DataFrames for the analysis time. The UFS data sources return pandas DataFrames that match the schemas expected by HealDA.input_coords(). We use earth2studio.data.fetch_dataframe(), which attaches the request_time metadata required by the model. The time_tolerance parameter defines the window around the analysis time within which observations are retrieved; here it spans 21 hours before to 3 hours after.
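
As a rough illustration of what the (-21 h, +3 h) tolerance means in absolute time, the sketch below turns an analysis time and a tolerance tuple into window bounds. The helper names are hypothetical, and treating both bounds as inclusive is an assumption; the data source may handle the edges differently.

```python
from datetime import datetime, timedelta


def observation_window(analysis_time, tolerance):
    """Return the absolute (start, end) of the window around analysis_time."""
    lower, upper = tolerance
    return analysis_time + lower, analysis_time + upper


def in_window(obs_time, analysis_time, tolerance):
    """Check whether an observation time falls inside the window."""
    # Inclusive bounds are an assumption made for this sketch
    start, end = observation_window(analysis_time, tolerance)
    return start <= obs_time <= end


analysis = datetime(2024, 1, 1, 0, 0)
tolerance = (timedelta(hours=-21), timedelta(hours=3))
start, end = observation_window(analysis, tolerance)
print(start, "->", end)
```

The window is 24 hours long and is deliberately asymmetric: most of it lies before the analysis time.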

analysis_time = np.array([np.datetime64("2024-01-01T00:00")])

conv_source = UFSObsConv(time_tolerance=(timedelta(hours=-21), timedelta(hours=3)))
conv_schema, sat_schema = model.input_coords()
conv_df = fetch_dataframe(
    conv_source,
    time=analysis_time,
    variable=np.array(conv_schema["variable"]),
    fields=np.array(list(conv_schema.keys())),
)
logger.info(f"Fetched {len(conv_df)} conventional observations")

sat_source = UFSObsSat(time_tolerance=(timedelta(hours=-21), timedelta(hours=3)))
sat_df = fetch_dataframe(
    sat_source,
    time=analysis_time,
    variable=np.array(sat_schema["variable"]),
    fields=np.array(list(sat_schema.keys())),
)
logger.info(f"Fetched {len(sat_df)} satellite observations")
Fetching GSI files:   0%|          | 0/25 [00:00<?, ?it/s]
Fetching GSI files: 100%|██████████| 25/25 [00:00<00:00, 21500.43it/s]
2026-03-23 20:52:55.285 | INFO     | __main__:<module>:108 - Fetched 8515112 conventional observations

Fetching GSI files:   0%|          | 0/90 [00:00<?, ?it/s]
2026-03-23 20:52:55.599 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsub_n16_ges.2023123106_control.nc4 not found
2026-03-23 20:52:55.606 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsub_n17_ges.2023123100_control.nc4 not found
2026-03-23 20:52:55.668 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsub_n15_ges.2023123106_control.nc4 not found
2026-03-23 20:52:55.677 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsub_n16_ges.2024010100_control.nc4 not found
2026-03-23 20:52:55.738 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsua_n17_ges.2024010100_control.nc4 not found
2026-03-23 20:52:55.750 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_mhs_n18_ges.2023123112_control.nc4 not found
2026-03-23 20:52:55.809 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsub_n16_ges.2023123100_control.nc4 not found
2026-03-23 20:52:55.819 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsua_metop-a_ges.2024010100_control.nc4 not found
2026-03-23 20:52:55.880 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsua_metop-a_ges.2023123112_control.nc4 not found
2026-03-23 20:52:55.895 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_mhs_n18_ges.2023123100_control.nc4 not found
2026-03-23 20:52:55.949 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsua_n17_ges.2023123100_control.nc4 not found
2026-03-23 20:52:55.974 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsub_n17_ges.2023123118_control.nc4 not found
2026-03-23 20:52:56.020 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_mhs_metop-a_ges.2023123112_control.nc4 not found
2026-03-23 20:52:56.047 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsua_metop-a_ges.2023123118_control.nc4 not found
2026-03-23 20:52:56.088 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsua_n17_ges.2023123118_control.nc4 not found
2026-03-23 20:52:56.118 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsub_n17_ges.2024010100_control.nc4 not found
2026-03-23 20:52:56.156 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_mhs_metop-a_ges.2024010100_control.nc4 not found
2026-03-23 20:52:56.191 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_mhs_n18_ges.2023123106_control.nc4 not found
2026-03-23 20:52:56.227 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsua_n16_ges.2023123100_control.nc4 not found
2026-03-23 20:52:56.262 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsua_n17_ges.2023123112_control.nc4 not found
2026-03-23 20:52:56.294 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsua_n17_ges.2023123106_control.nc4 not found
2026-03-23 20:52:56.361 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsua_n16_ges.2024010100_control.nc4 not found
2026-03-23 20:52:56.363 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsub_n16_ges.2023123118_control.nc4 not found
2026-03-23 20:52:56.429 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_mhs_metop-a_ges.2023123118_control.nc4 not found
2026-03-23 20:52:56.433 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_mhs_metop-a_ges.2023123100_control.nc4 not found
2026-03-23 20:52:56.497 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsub_n15_ges.2023123100_control.nc4 not found
2026-03-23 20:52:56.503 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsub_n17_ges.2023123112_control.nc4 not found
2026-03-23 20:52:56.565 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsua_n16_ges.2023123112_control.nc4 not found
2026-03-23 20:52:56.576 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsua_metop-a_ges.2023123106_control.nc4 not found
2026-03-23 20:52:56.650 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsub_n15_ges.2024010100_control.nc4 not found
2026-03-23 20:52:56.656 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsub_n16_ges.2023123112_control.nc4 not found
2026-03-23 20:52:56.718 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_mhs_n18_ges.2024010100_control.nc4 not found
2026-03-23 20:52:59.772 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsub_n17_ges.2023123106_control.nc4 not found
2026-03-23 20:52:59.792 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_mhs_metop-a_ges.2023123106_control.nc4 not found
2026-03-23 20:52:59.796 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_mhs_n18_ges.2023123118_control.nc4 not found
2026-03-23 20:52:59.825 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsua_n16_ges.2023123106_control.nc4 not found
2026-03-23 20:52:59.827 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsub_n15_ges.2023123118_control.nc4 not found
2026-03-23 20:52:59.834 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsua_metop-a_ges.2023123100_control.nc4 not found
2026-03-23 20:52:59.861 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsua_n16_ges.2023123118_control.nc4 not found
2026-03-23 20:52:59.938 | WARNING  | earth2studio.data.ufs:_handle_missing_file:734 - File s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsub_n15_ges.2023123112_control.nc4 not found
Fetching GSI files: 100%|██████████| 90/90 [00:04<00:00, 19.58it/s]
2026-03-23 20:53:02.507 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_mhs_metop-a_ges.2023123100_control.nc4
2026-03-23 20:53:02.508 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_mhs_metop-a_ges.2023123106_control.nc4
2026-03-23 20:53:02.508 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_mhs_metop-a_ges.2023123112_control.nc4
2026-03-23 20:53:02.508 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_mhs_metop-a_ges.2023123118_control.nc4
2026-03-23 20:53:02.508 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_mhs_metop-a_ges.2024010100_control.nc4
2026-03-23 20:53:04.146 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_mhs_n18_ges.2023123100_control.nc4
2026-03-23 20:53:04.147 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_mhs_n18_ges.2023123106_control.nc4
2026-03-23 20:53:04.147 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_mhs_n18_ges.2023123112_control.nc4
2026-03-23 20:53:04.147 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_mhs_n18_ges.2023123118_control.nc4
2026-03-23 20:53:04.147 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_mhs_n18_ges.2024010100_control.nc4
2026-03-23 20:53:04.933 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsua_metop-a_ges.2023123100_control.nc4
2026-03-23 20:53:04.934 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsua_metop-a_ges.2023123106_control.nc4
2026-03-23 20:53:04.934 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsua_metop-a_ges.2023123112_control.nc4
2026-03-23 20:53:04.934 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsua_metop-a_ges.2023123118_control.nc4
2026-03-23 20:53:04.934 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsua_metop-a_ges.2024010100_control.nc4
2026-03-23 20:53:08.352 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsua_n16_ges.2023123100_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsua_n16_ges.2023123106_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsua_n16_ges.2023123112_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsua_n16_ges.2023123118_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsua_n16_ges.2024010100_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsua_n17_ges.2023123100_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsua_n17_ges.2023123106_control.nc4
2026-03-23 20:53:08.353 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsua_n17_ges.2023123112_control.nc4
2026-03-23 20:53:08.354 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsua_n17_ges.2023123118_control.nc4
2026-03-23 20:53:08.354 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsua_n17_ges.2024010100_control.nc4
2026-03-23 20:53:10.287 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsub_n15_ges.2023123100_control.nc4
2026-03-23 20:53:10.287 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsub_n15_ges.2023123106_control.nc4
2026-03-23 20:53:10.287 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsub_n15_ges.2023123112_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsub_n15_ges.2023123118_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsub_n15_ges.2024010100_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsub_n16_ges.2023123100_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsub_n16_ges.2023123106_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsub_n16_ges.2023123112_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsub_n16_ges.2023123118_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsub_n16_ges.2024010100_control.nc4
2026-03-23 20:53:10.288 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123100/gsi/diag_amsub_n17_ges.2023123100_control.nc4
2026-03-23 20:53:10.289 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123106/gsi/diag_amsub_n17_ges.2023123106_control.nc4
2026-03-23 20:53:10.289 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123112/gsi/diag_amsub_n17_ges.2023123112_control.nc4
2026-03-23 20:53:10.289 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2023/12/2023123118/gsi/diag_amsub_n17_ges.2023123118_control.nc4
2026-03-23 20:53:10.289 | WARNING  | earth2studio.data.ufs:_compile_dataframe:233 - Cached file missing for s3://noaa-ufs-gefsv13replay-pds/2024/01/2024010100/gsi/diag_amsub_n17_ges.2024010100_control.nc4
2026-03-23 20:53:10.426 | INFO     | __main__:<module>:117 - Fetched 4378286 satellite observations

Observation Locations

Plot the spatial distribution of conventional and satellite observations to visualise their coverage before running the assimilation. The model's 24-hour time window typically contains 12-14 million observations.

import cartopy.crs as ccrs
import matplotlib.pyplot as plt

plt.close("all")
fig, axes = plt.subplots(
    1,
    2,
    subplot_kw={"projection": ccrs.Robinson()},
    figsize=(16, 4),
)

# Conventional observations
ax = axes[0]
ax.set_global()
ax.coastlines(linewidth=0.5)
ax.gridlines(linewidth=0.3, alpha=0.5)
ax.scatter(
    conv_df["lon"].values[::10],
    conv_df["lat"].values[::10],
    s=0.1,
    alpha=0.3,
    c="tab:blue",
    transform=ccrs.PlateCarree(),
)
ax.set_title(f"Conventional obs (n={len(conv_df):,})", fontsize=13)

# Satellite observations
ax = axes[1]
ax.set_global()
ax.coastlines(linewidth=0.5)
ax.gridlines(linewidth=0.3, alpha=0.5)
ax.scatter(
    sat_df["lon"].values[::10],
    sat_df["lat"].values[::10],
    s=0.1,
    alpha=0.3,
    c="tab:orange",
    transform=ccrs.PlateCarree(),
)
ax.set_title(f"Satellite obs (n={len(sat_df):,})", fontsize=13)
fig.suptitle(
    f"Observation Locations {str(analysis_time[0])[:16]} UTC",
    fontsize=15,
)
plt.tight_layout()
plt.savefig("outputs/22_healda_obs_locations.jpg", dpi=150)
Observation Locations 2024-01-01T00:00 UTC, Conventional obs (n=8,515,112), Satellite obs (n=4,378,286)
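
The counts in the figure title can be checked against the typical 12-14 million range, and the `[::10]` slices used for the scatter plots determine how many points are actually drawn. The short sketch below does that arithmetic using the counts logged in this run; `strided_count` is a hypothetical helper, not part of Earth2Studio.

```python
# Observation counts taken from the log output of this run
n_conv = 8_515_112
n_sat = 4_378_286
total = n_conv + n_sat


def strided_count(n: int, step: int) -> int:
    """Number of elements kept by values[::step] for a sequence of length n."""
    return len(range(0, n, step))


plotted_conv = strided_count(n_conv, 10)
plotted_sat = strided_count(n_sat, 10)
print(f"total observations: {total:,}")
print(f"plotted: {plotted_conv:,} conventional, {plotted_sat:,} satellite")
```

Thinning by a factor of ten keeps the scatter plots responsive while still showing the coverage pattern.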

DA models can be called directly for stateless inference or via create_generator() for stateful (iterative) assimilation workflows. Here we use the direct call API to invoke the model.

HealDA is designed to work with the (-21, 3) hour observation window from the UFS replay archive using both conventional and satellite observations. However, the DA model interface is flexible enough to accept different time windows and observation sources. Below we test three configurations (both combined, satellite only, and conventional only) to illustrate the impact each observation type has on the analysis.

torch.manual_seed(42)
result_both = model(conv_obs=conv_df, sat_obs=sat_df)
logger.info(f"Combined analysis shape: {result_both.shape}")
2026-03-23 20:53:50.762 | INFO     | __main__:<module>:188 - Combined analysis shape: (1, 74, 181, 360)
torch.manual_seed(42)
result_sat = model(sat_obs=sat_df)
logger.info(f"Sat-only analysis shape: {result_sat.shape}")
2026-03-23 20:53:54.321 | INFO     | __main__:<module>:193 - Sat-only analysis shape: (1, 74, 181, 360)
torch.manual_seed(42)
result_conv = model(conv_obs=conv_df)
logger.info(f"Conv-only analysis shape: {result_conv.shape}")
2026-03-23 20:53:59.067 | INFO     | __main__:<module>:198 - Conv-only analysis shape: (1, 74, 181, 360)
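
The three calls above follow the same pattern, so they could also be driven from a small table of keyword arguments. The sketch below shows one way to do that. `run_configurations` and `fake_model` are hypothetical names introduced here; only the `conv_obs` and `sat_obs` keyword names come from the calls above, and the stand-in model exists purely so the snippet runs without a GPU or checkpoint.

```python
def run_configurations(model, conv_obs, sat_obs, reseed=None):
    """Call `model` once per observation configuration and collect the results."""
    configs = {
        "Conv + Sat": {"conv_obs": conv_obs, "sat_obs": sat_obs},
        "Sat only": {"sat_obs": sat_obs},
        "Conv only": {"conv_obs": conv_obs},
    }
    results = {}
    for name, kwargs in configs.items():
        if reseed is not None:
            reseed()  # e.g. lambda: torch.manual_seed(42), as in the calls above
        results[name] = model(**kwargs)
    return results


def fake_model(conv_obs=None, sat_obs=None):
    """Stand-in for the real model: reports which inputs were provided."""
    return (conv_obs is not None, sat_obs is not None)


demo = run_configurations(fake_model, conv_obs=[1.0], sat_obs=[2.0])
print(demo)
```

With the real model, the returned dictionary would hold the three analysis arrays keyed by configuration name.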

Post Processing

Because we loaded the model with lat_lon=True, the output is already on a regular equiangular lat-lon grid, so no manual regridding is needed. Compare the three runs for 2 m temperature (t2m) and 500 hPa geopotential (z500). Each row shows a different observation configuration.

plt.close("all")
plot_vars = ["t2m", "z500"]
titles = ["Conv + Sat", "Sat only", "Conv only"]
results = [result_both, result_sat, result_conv]
projection = ccrs.Robinson()

fig, axes = plt.subplots(
    len(results),
    len(plot_vars),
    subplot_kw={"projection": projection},
    figsize=(14, 8),
)
fig.subplots_adjust(wspace=0.02, hspace=0.08, left=0.1, right=0.9)

lat = results[0].coords["lat"].values
lon = results[0].coords["lon"].values
cmaps = ["Spectral_r", "PRGn"]

for row, (title, da) in enumerate(zip(titles, results)):
    for col, var in enumerate(plot_vars):
        ax = axes[row, col]
        field = da.sel(variable=var).data[0].get()  # [nlat, nlon] cupy -> numpy
        im = ax.pcolormesh(
            lon,
            lat,
            field,
            transform=ccrs.PlateCarree(),
            cmap=cmaps[col],
        )
        ax.coastlines(linewidth=0.5)
        ax.gridlines(linewidth=0.3, alpha=0.5)
        fig.colorbar(im, ax=ax, shrink=0.6)
        if row == 0:
            ax.set_title(var, fontsize=14)
        if col == 0:
            ax.text(
                -0.05,
                0.5,
                title,
                fontsize=12,
                va="bottom",
                ha="center",
                rotation="vertical",
                rotation_mode="anchor",
                transform=ax.transAxes,
            )

fig.suptitle(f"HealDA Analysis {str(analysis_time[0])[:16]} UTC", fontsize=18, y=0.97)
plt.tight_layout()
plt.savefig("outputs/22_healda_analysis.jpg", dpi=150)
HealDA Analysis 2024-01-01T00:00 UTC, t2m, z500

HealDA vs ERA5

Next, fetch ERA5 reanalysis at 0.25° resolution from the NCAR archive to compare against the assimilated fields. HealDA outputs standard Earth2Studio variable names, so we can query ERA5 with the same identifiers. We expect the runs that are missing an observation source to show larger errors, while the combined run yields the most accurate global analysis.

era5_ds = NCAR_ERA5()
era5_da = era5_ds(analysis_time, plot_vars)
era5_interp = era5_da.interp(lat=lat, lon=lon, method="nearest")
Fetching NCAR ERA5 data:   0%|          | 0/2 [00:00<?, ?it/s]

2026-03-23 20:54:32.879 | DEBUG    | earth2studio.data.ncar:fetch_array:402 - Fetching NCAR ERA5 variable: 2T in file s3://nsf-ncar-era5/e5.oper.an.sfc/202401/e5.oper.an.sfc.128_167_2t.ll025sc.2024010100_2024013123.nc

Fetching NCAR ERA5 data:   0%|          | 0/2 [00:00<?, ?it/s]

2026-03-23 20:54:32.880 | DEBUG    | earth2studio.data.ncar:fetch_array:402 - Fetching NCAR ERA5 variable: Z in file s3://nsf-ncar-era5/e5.oper.an.pl/202401/e5.oper.an.pl.128_129_z.ll025sc.2024010100_2024010123.nc

Fetching NCAR ERA5 data:   0%|          | 0/2 [00:00<?, ?it/s]
Fetching NCAR ERA5 data: 100%|██████████| 2/2 [00:00<00:00, 23.89it/s]
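
The nearest-neighbour `interp` call above maps the 0.25° ERA5 grid onto the analysis grid. When the target nodes coincide with source nodes, nearest-neighbour selection amounts to plain subsampling. The sketch below illustrates this along the latitude axis, assuming a 1° target axis running from 90 to -90 (inferred from the 181 latitude points in the logged shape; the actual orientation is an assumption). The helper names are hypothetical.

```python
def regular_axis(start: float, step: float, count: int) -> list:
    """Evenly spaced coordinate axis as a list of floats."""
    return [start + step * i for i in range(count)]


def nearest_indices(source: list, target: list) -> list:
    """For each target coordinate, the index of the closest source coordinate."""
    return [
        min(range(len(source)), key=lambda i: abs(source[i] - t)) for t in target
    ]


era5_lat = regular_axis(90.0, -0.25, 721)  # 0.25 degree source axis
target_lat = regular_axis(90.0, -1.0, 181)  # assumed 1 degree target axis
idx = nearest_indices(era5_lat, target_lat)
print(idx[:5])
```

Under these assumptions every fourth ERA5 latitude is selected and no values are blended, so the comparison uses unmodified ERA5 grid-point values.
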
diff_titles = ["Conv+Sat - ERA5", "Sat - ERA5", "Conv - ERA5"]
diff_results = [result_both, result_sat, result_conv]
for title, da_pred in zip(diff_titles, diff_results):
    for var in plot_vars:
        field_pred = da_pred.sel(variable=var).data[0]
        if hasattr(field_pred, "get"):
            field_pred = field_pred.get()
        field_era5 = era5_interp.sel(variable=var).data[0]
        mae = float(np.abs(field_pred - field_era5).mean())
        logger.info(f"{title} | {var} MAE: {mae:.4f}")
2026-03-23 20:54:33.022 | INFO     | __main__:<module>:284 - Conv+Sat - ERA5 | t2m MAE: 0.7994
2026-03-23 20:54:33.023 | INFO     | __main__:<module>:284 - Conv+Sat - ERA5 | z500 MAE: 51.2045
2026-03-23 20:54:33.024 | INFO     | __main__:<module>:284 - Sat - ERA5 | t2m MAE: 0.8488
2026-03-23 20:54:33.024 | INFO     | __main__:<module>:284 - Sat - ERA5 | z500 MAE: 164.9523
2026-03-23 20:54:33.025 | INFO     | __main__:<module>:284 - Conv - ERA5 | t2m MAE: 4.3312
2026-03-23 20:54:33.026 | INFO     | __main__:<module>:284 - Conv - ERA5 | z500 MAE: 571.0181
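
The MAE above is an unweighted mean over all lat-lon grid points, which gives polar rows the same weight as equatorial rows even though they cover far less area. A common alternative, not used in this example, is to weight each row by the cosine of its latitude. The sketch below contrasts the two on synthetic data; the function names and the 1° grid with latitude running from 90 to -90 are assumptions made for illustration.

```python
import numpy as np


def mae(pred: np.ndarray, ref: np.ndarray) -> float:
    """Unweighted mean absolute error over all grid points."""
    return float(np.abs(pred - ref).mean())


def lat_weighted_mae(pred: np.ndarray, ref: np.ndarray, lat_deg: np.ndarray) -> float:
    """MAE with cos(latitude) row weights; pred and ref are [nlat, nlon]."""
    weights = np.clip(np.cos(np.deg2rad(lat_deg)), 0.0, None)
    zonal_mean_err = np.abs(pred - ref).mean(axis=1)  # mean over longitude
    return float((zonal_mean_err * weights).sum() / weights.sum())


# Synthetic fields: an error of 1.0 confined to the ten northernmost rows
lat = np.linspace(90.0, -90.0, 181)
ref = np.zeros((lat.size, 360))
pred = ref.copy()
pred[:10, :] = 1.0

plain = mae(pred, ref)
weighted = lat_weighted_mae(pred, ref, lat)
print(f"plain MAE: {plain:.4f}, cos-lat weighted MAE: {weighted:.4f}")
```

Errors concentrated near the poles contribute much less to the weighted score, so the two metrics can rank runs differently.
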
plt.close("all")

diff_ranges = {"t2m": (-10, 10), "z500": (-500, 500)}
fig, axes = plt.subplots(
    len(diff_results),
    len(plot_vars),
    subplot_kw={"projection": projection},
    figsize=(14, 8),
)
fig.subplots_adjust(wspace=0.02, hspace=0.08, left=0.1, right=0.9)

for row, (title, da_pred) in enumerate(zip(diff_titles, diff_results)):
    for col, var in enumerate(plot_vars):
        ax = axes[row, col]
        field_pred = (
            da_pred.sel(variable=var).data[0].get()
        )  # [nlat, nlon] cupy -> numpy
        field_era5 = era5_interp.sel(variable=var).data[0]  # [nlat, nlon]
        diff = field_pred - field_era5
        im = ax.pcolormesh(
            lon,
            lat,
            diff,
            transform=ccrs.PlateCarree(),
            cmap="RdBu_r",
            vmin=diff_ranges[var][0],
            vmax=diff_ranges[var][1],
        )
        ax.coastlines(linewidth=0.5)
        ax.gridlines(linewidth=0.3, alpha=0.5)
        fig.colorbar(im, ax=ax, shrink=0.6)
        if row == 0:
            ax.set_title(var, fontsize=14)
        if col == 0:
            ax.text(
                -0.05,
                0.5,
                title,
                fontsize=12,
                va="bottom",
                ha="center",
                rotation="vertical",
                rotation_mode="anchor",
                transform=ax.transAxes,
            )

fig.suptitle(
    f"HealDA Analysis Error {str(analysis_time[0])[:16]} UTC",
    fontsize=18,
    y=0.97,
)
plt.savefig("outputs/22_healda_differences.jpg", dpi=150, bbox_inches="tight")
HealDA Analysis Error 2024-01-01T00:00 UTC, t2m, z500
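
To put the logged MAE values in perspective, the sketch below expresses each run's error relative to the combined run and ranks the configurations per variable. The numbers are copied from the log output above; they describe this single analysis time only.

```python
# MAE values copied from the log output of this run
mae_log = {
    "Conv+Sat": {"t2m": 0.7994, "z500": 51.2045},
    "Sat": {"t2m": 0.8488, "z500": 164.9523},
    "Conv": {"t2m": 4.3312, "z500": 571.0181},
}

baseline = mae_log["Conv+Sat"]

# Error of each run relative to the combined run (1.0 means identical)
ratios = {
    name: {var: values[var] / baseline[var] for var in values}
    for name, values in mae_log.items()
}

# Configurations ordered from lowest to highest MAE, per variable
ranking = {
    var: sorted(mae_log, key=lambda name: mae_log[name][var]) for var in baseline
}

for name, per_var in ratios.items():
    print(name, {var: round(r, 2) for var, r in per_var.items()})
print(ranking)
```

For both variables the combined run has the lowest error and the conventional-only run the highest, in line with the expectation stated earlier.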

Total running time of the script: (2 minutes 20.222 seconds)
