earth2studio.data.CMIP6

class earth2studio.data.CMIP6(experiment_id, source_id, table_id, variant_label, file_start=None, file_end=None, cache=True, verbose=True)

CMIP6 data source for Earth2Studio.

This class provides a thin convenience wrapper around the intake-esgf catalog that hosts the Coupled Model Intercomparison Project Phase 6 (CMIP6) archive. It is meant to provide a seamless interface to the CMIP6 archive for Earth2Studio. However, the archive is very large, and some datasets may break this interface. Both atmospheric and oceanic datasets are currently supported.

Parameters:
  • experiment_id (str) – CMIP6 experiment identifier (e.g. “historical”, “ssp585”)

  • source_id (str) – CMIP6 model identifier (e.g. “MPI-ESM1-2-LR”)

  • table_id (str) – CMOR table describing variable realm/frequency (e.g. “Amon”, “Omon”, “SImon”)

  • variant_label (str) – Ensemble member / initial-condition label such as “r1i1p1f1”.

  • file_start (str, optional) – Filename prefix filter forwarded to ESGFCatalog.search to constrain the final dataset selection. Leave as None to accept all files, by default None

  • file_end (str, optional) – Filename suffix filter forwarded to ESGFCatalog.search to constrain the final dataset selection. Leave as None to accept all files, by default None

  • cache (bool, optional) – Cache data source on local memory, by default True

  • verbose (bool, optional) – Print download progress, by default True

Warning

This is a remote data source and can potentially download a large amount of data to your local machine for large requests. The source data used is provided in an unoptimized format.

Note

Additional information on the CMIP6 data repository can be referenced here:

The intake-esgf package is used to search the CMIP6 data repository and load the data into an xarray dataset. Additional information on the intake-esgf package can be referenced here:

Note

This data source retrieves the closest available time. Depending on the experiment, this may differ significantly from the time that was requested.
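The effect of closest-time retrieval can be illustrated with a small standalone sketch. It does not reproduce the library's internal selection logic; the helper nearest_time and the mid-month timestamps are hypothetical, and real CMIP6 files may use non-standard calendars that plain datetime objects cannot represent.

```python
from datetime import datetime, timedelta


def nearest_time(requested, available):
    """Return the available timestamp closest to the requested one (hypothetical helper)."""
    return min(available, key=lambda t: abs(t - requested))


# Hypothetical monthly-mean timestamps, one per month, stamped mid-month
monthly = [datetime(2000, month, 16) for month in range(1, 13)]

requested = datetime(2000, 3, 1, 6)
picked = nearest_time(requested, monthly)
offset = abs(picked - requested)

# With monthly data the returned time can be about two weeks from the request
is_far = offset > timedelta(days=1)
print(f"requested={requested}, picked={picked}, offset={offset}, far={is_far}")
```

For a monthly table such as “Amon”, a request for a sub-daily timestamp can therefore resolve to a time stamp many days away, so it is worth inspecting the time coordinate of the returned data.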

__call__(time, variable)

Retrieve data for the requested times and variables.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Timestamps to return data for.

  • variable (str | list[str] | VariableArray) – Strings or list of strings that refer to variables to return.

Returns:

Loaded data array

Return type:

xr.DataArray
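A minimal usage sketch, built from the signatures documented above. The identifier values are the examples given in the parameter descriptions. The variable name "t2m" is an assumption about what the CMIP6 variable mapping accepts, and the helper fetch_sample is hypothetical. The import is placed inside the function because running it requires earth2studio with its CMIP6 dependencies installed, plus network access to ESGF.

```python
from datetime import datetime

# Example identifiers taken from the parameter descriptions above
CMIP6_KWARGS = {
    "experiment_id": "historical",
    "source_id": "MPI-ESM1-2-LR",
    "table_id": "Amon",
    "variant_label": "r1i1p1f1",
}


def fetch_sample():
    """Build the data source and request one variable at one time (hypothetical helper)."""
    # Imported lazily: needs earth2studio installed and a network connection
    from earth2studio.data import CMIP6

    ds = CMIP6(**CMIP6_KWARGS, cache=True, verbose=False)
    # "t2m" is an assumed variable name; substitute one valid for your table
    return ds(time=[datetime(2000, 1, 15)], variable=["t2m"])
```

Calling fetch_sample() should return an xr.DataArray. Because the source retrieves the closest available time, the time coordinate of the result may not match the requested timestamp exactly.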

classmethod available(time, experiment_id, source_id, table_id, variant_label)

Check if the requested exact timestamp exists in the ESGF archive.

Parameters:
  • time (datetime | np.datetime64) – Timestamp to test (UTC).

  • experiment_id (str) – CMIP6 experiment identifier (e.g. “historical”, “ssp585”).

  • source_id (str) – CMIP6 model identifier (e.g. “MPI-ESM1-2-LR”).

  • table_id (str) – CMOR table describing variable realm/frequency (“Amon”, “Omon”, “SImon” …).

  • variant_label (str) – Ensemble member / initial-condition label such as “r1i1p1f1”.

Return type:

bool

Notes

The check performs a lightweight ESGF search restricted to a one-day window surrounding time. If any dataset is returned and the target timestamp lies within the dataset’s time span, True is returned. Otherwise returns False.
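The two-step check described in these notes can be sketched in plain Python. This illustrates the logic only and is not the library's implementation: the ESGF search is replaced by a hypothetical list of (start, end) dataset spans, and centering the one-day window on the timestamp (12 hours either side) is an assumption.

```python
from datetime import datetime, timedelta


def search_window(time, half_width=timedelta(hours=12)):
    """One-day window around the timestamp (symmetric split is an assumption)."""
    return time - half_width, time + half_width


def is_available(time, dataset_spans):
    """Mimic the documented check against hypothetical (start, end) dataset spans."""
    lo, hi = search_window(time)
    # Step 1: stand-in for the ESGF search, keep datasets overlapping the window
    hits = [(start, end) for start, end in dataset_spans if start <= hi and end >= lo]
    # Step 2: the timestamp itself must lie within a returned dataset's time span
    return any(start <= time <= end for start, end in hits)


# Hypothetical span resembling a historical experiment
spans = [(datetime(1850, 1, 1), datetime(2014, 12, 31))]
inside = is_available(datetime(2000, 1, 1), spans)
outside = is_available(datetime(2020, 1, 1), spans)
nothing_found = is_available(datetime(2000, 1, 1), [])
print(inside, outside, nothing_found)
```

The second step matters when a dataset overlaps the search window but ends shortly before the target timestamp: the search returns it, yet the result is still False.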