earth2studio.data.CMIP6

class earth2studio.data.CMIP6(experiment_id, source_id, table_id, variant_label, file_start=None, file_end=None, cache=True, verbose=True, exact_time_match=False)

CMIP6 data source for Earth2Studio.

This class provides a thin convenience wrapper around the intake-esgf catalog that hosts the Coupled Model Intercomparison Project Phase 6 (CMIP6) archive. It is meant to provide a seamless interface to the CMIP6 archive for Earth2Studio; however, because the archive is very large, some datasets may break this interface. Both atmospheric and oceanic datasets are currently supported.

Parameters:
  • experiment_id (str) – CMIP6 experiment identifier (e.g. “historical”, “ssp585”)

  • source_id (str) – CMIP6 model identifier (e.g. “MPI-ESM1-2-LR”)

  • table_id (str) – CMOR table describing variable realm/frequency (“Amon”, “Omon”, “SImon”)

  • variant_label (str) – Ensemble member / initial-condition label such as “r1i1p1f1”.

  • file_start (str, optional) – Filename prefix filter forwarded to ESGFCatalog.search to constrain the final dataset selection. Leave as None to accept all files, by default None

  • file_end (str, optional) – Filename suffix filter forwarded to ESGFCatalog.search to constrain the final dataset selection. Leave as None to accept all files, by default None

  • cache (bool, optional) – Cache the data source locally, by default True. Multiple CMIP6 instances can safely share the same cache directory, because intake-esgf automatically organizes files into subdirectories by project, model, experiment, variant, variable, and version, which prevents conflicts.

  • verbose (bool, optional) – Print download progress, by default True

  • exact_time_match (bool, optional) – If True, raise an error when requested times do not match dataset times exactly. If False, use nearest-neighbor time matching and issue a warning, by default False

Warning

This is a remote data source and can potentially download a large amount of data to your local machine for large requests. The source data used is provided in an unoptimized format.

Note

Additional information on the CMIP6 data repository can be referenced here:

The intake-esgf package is used to search the CMIP6 data repository and load the data into an xarray dataset. Additional information on the intake-esgf package can be referenced here:

Note

By default, this data source retrieves the closest available time using nearest-neighbor matching. Depending on the experiment and temporal resolution, the returned time may differ significantly from the one requested. Set exact_time_match=True to enforce exact time matching if precise timestamps are critical.
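The two matching modes can be illustrated with a small, self-contained sketch. The helper match_time and the sample monthly time axis below are hypothetical and are not part of Earth2Studio; they only mimic the behavior described in this note (nearest-neighbor selection with a warning, or an error when exact matching is requested).

```python
import warnings
from datetime import datetime


def match_time(requested, available, exact_time_match=False):
    """Hypothetical helper: pick the dataset time used for a request."""
    if requested in available:
        return requested
    if exact_time_match:
        raise ValueError(f"No exact match for {requested} in dataset times")
    # Nearest-neighbor fallback: the smallest absolute time difference wins.
    nearest = min(available, key=lambda t: abs(t - requested))
    warnings.warn(f"Requested {requested}, using nearest time {nearest}")
    return nearest


# Assumed sample axis: monthly means stamped mid-month for the year 2000.
dataset_times = [datetime(2000, month, 16, 12) for month in range(1, 13)]
request = datetime(2000, 3, 1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    selected = match_time(request, dataset_times)

# Distance between what was asked for and what was returned.
gap = abs(selected - request)
```

With monthly data, the selected timestamp in this sketch is roughly two weeks away from the request, which is the kind of gap the note warns about.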

__call__(time, variable)

Retrieve data for the requested times and variables.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Timestamps to return data for.

  • variable (str | list[str] | VariableArray) – Strings or list of strings that refer to variables to return.

Returns:

Loaded data array

Return type:

xr.DataArray
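A minimal usage sketch follows, built from the constructor and call signatures documented on this page. The wrapper function load_example is hypothetical, and the variable name "t2m" is an assumption about the Earth2Studio variable naming; check the lexicon for the names actually supported. Calling the function requires the earth2studio package and network access, and may download a large amount of data.

```python
from datetime import datetime


def load_example():
    """Hypothetical wrapper: fetch one variable for one timestamp."""
    # Imported lazily so the sketch can be read without the package installed.
    from earth2studio.data import CMIP6

    ds = CMIP6(
        experiment_id="historical",
        source_id="MPI-ESM1-2-LR",
        table_id="Amon",
        variant_label="r1i1p1f1",
        exact_time_match=False,  # fall back to the nearest available time
    )
    # "t2m" is an assumed variable name, not confirmed by this page.
    return ds(time=datetime(2000, 1, 16, 12), variable="t2m")
```

The identifiers passed to the constructor are the example values given in the parameter descriptions above; other combinations may not exist in the archive.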

classmethod available(time, experiment_id, source_id, table_id, variant_label)

Check whether the requested timestamp falls within the time range of the matching dataset in the ESGF archive.

Parameters:
  • time (datetime | np.datetime64) – Timestamp to test (UTC).

  • experiment_id (str) – CMIP6 experiment identifier (e.g. “historical”, “ssp585”).

  • source_id (str) – CMIP6 model identifier (e.g. “MPI-ESM1-2-LR”).

  • table_id (str) – CMOR table describing variable realm/frequency (“Amon”, “Omon”, “SImon” …).

  • variant_label (str) – Ensemble member / initial-condition label such as “r1i1p1f1”.

Return type:

bool

Warning

This method downloads data from ESGF servers to verify the time coordinate range. It is not a lightweight metadata-only check.

Notes

The check performs an ESGF search and downloads at least one file to read its time coordinate bounds. If the target timestamp lies within the dataset’s time span, True is returned. Otherwise, False is returned.
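The span test described in these notes can be sketched as follows. The helper within_span and the sample time axis are hypothetical and do not reproduce the library's code; they only show that the result depends on the first and last timestamps, not on whether the target coincides with a stored time step.

```python
from datetime import datetime


def within_span(target, time_axis):
    """Hypothetical check: is target inside [first, last] of the time axis?"""
    return min(time_axis) <= target <= max(time_axis)


# Assumed sample axis: mid-month stamps from January 1850 to December 2014.
time_axis = [
    datetime(year, month, 16, 12)
    for year in range(1850, 2015)
    for month in range(1, 13)
]

inside = within_span(datetime(2000, 6, 1), time_axis)
outside = within_span(datetime(2015, 1, 1), time_axis)
```

In this sketch, a target between two stored time steps still counts as inside the span, which is why a True result does not guarantee an exact time match.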