GHCNDaily

class earth2studio.data.GHCNDaily(
stations,
time_tolerance=numpy.timedelta64(0),
cache=True,
verbose=True,
async_timeout=600,
async_workers=16,
retries=3,
)

NOAA’s Global Historical Climatology Network Daily (GHCN-D) is a dataset that contains daily climate summaries from land surface stations across the globe.

Parameters:
  • stations (list[str]) – GHCN station IDs (11-character strings, e.g. "USW00023273") to attempt to fetch data from.

  • time_tolerance (TimeTolerance, optional) – Time tolerance window for filtering observations. Accepts a single value (symmetric +/- window) or a tuple (lower, upper) for asymmetric windows, by default np.timedelta64(0)

  • cache (bool, optional) – Cache data source in local memory, by default True

  • verbose (bool, optional) – Print download progress and missing data warnings, by default True

  • async_timeout (int, optional) – Time in seconds after which the download is cancelled if it has not finished successfully, by default 600

  • async_workers (int, optional) – Maximum number of concurrent async fetch tasks, by default 16

  • retries (int, optional) – Number of retry attempts per failed fetch task, using exponential backoff between attempts, by default 3
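
The time_tolerance parameter defines a window around each requested timestamp. The sketch below shows one plausible reading of the description above, assuming a single value expands to [t - tol, t + tol] and a tuple (lower, upper) is applied as signed offsets [t + lower, t + upper], so lower is normally negative. The helpers tolerance_window and filter_times are hypothetical and are not part of the earth2studio API.

```python
from datetime import datetime, timedelta

import numpy as np


def tolerance_window(time, tolerance):
    """Hypothetical helper: return (start, end) bounds of a tolerance window.

    Assumption: a single value is a symmetric +/- window, and a tuple
    (lower, upper) holds signed offsets that are added to ``time``.
    """
    t = np.datetime64(time, "s")
    if isinstance(tolerance, tuple):
        # Asymmetric window: convert both offsets to numpy timedeltas
        lower, upper = (np.timedelta64(v) for v in tolerance)
    else:
        # Symmetric window: +/- the magnitude of the single value
        tol = abs(np.timedelta64(tolerance))
        lower, upper = -tol, tol
    return t + lower, t + upper


def filter_times(obs_times, time, tolerance):
    """Hypothetical helper: boolean mask of observations inside the window."""
    start, end = tolerance_window(time, tolerance)
    obs = np.asarray(obs_times, dtype="datetime64[s]")
    return (obs >= start) & (obs <= end)


obs = [datetime(2023, 12, 31, 6), datetime(2024, 1, 1), datetime(2024, 1, 3)]
print(filter_times(obs, datetime(2024, 1, 1), timedelta(days=1)))
```

With a one-day tolerance the first two observations fall inside the window and the third does not; with the default np.timedelta64(0), only an observation stamped exactly at the requested time would be kept.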

Warning

This is a remote data source and can potentially download a large amount of data to your local machine for large requests.

Note

To help obtain a list of candidate station IDs, this class provides GHCNDaily.get_stations_bbox(), which accepts a lat-lon bounding box and returns the known station IDs within it. For more information on the stations, consult the ghcnd-stations.txt file, which can be accessed with GHCNDaily.get_station_metadata().
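
Conceptually, a bounding-box station lookup reduces to a range check on each station's coordinates. The sketch below assumes the (lat min, lon min, lat max, lon max) order used in this page's example, inclusive bounds, and a box that does not cross the antimeridian. The station records are hypothetical (id, lat, lon) tuples with made-up IDs rather than parsed rows of ghcnd-stations.txt, and stations_in_bbox is an illustrative helper, not the library's implementation.

```python
from typing import Iterable, List, Tuple

# Hypothetical station record: (station_id, latitude, longitude)
Station = Tuple[str, float, float]


def stations_in_bbox(
    stations: Iterable[Station],
    bbox: Tuple[float, float, float, float],
) -> List[str]:
    """Return IDs of stations inside bbox = (lat_min, lon_min, lat_max, lon_max).

    Assumes inclusive bounds and a box that does not cross the antimeridian.
    """
    lat_min, lon_min, lat_max, lon_max = bbox
    return [
        sid
        for sid, lat, lon in stations
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


# Made-up station records for illustration only
records = [
    ("HYP00000001", 33.6, -84.4),  # inside the Southeast US box
    ("HYP00000002", 40.7, -74.0),  # north of the box, excluded
    ("HYP00000003", 30.0, -90.0),  # on the corner, kept (inclusive bounds)
]
print(stations_in_bbox(records, (30, -90, 36, -80)))
# -> ['HYP00000001', 'HYP00000003']
```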

Example

from datetime import datetime, timedelta

from earth2studio.data import GHCNDaily

# Southeast US, lat lon bounding box (lat min, lon min, lat max, lon max)
stations = GHCNDaily.get_stations_bbox((30, -90, 36, -80))
ds = GHCNDaily(stations, time_tolerance=timedelta(days=1))
df = ds(datetime(2024, 1, 1), ["t2m_max", "tp"])

__call__(time, variable, fields=None)

Retrieve GHCN-D station observations for the requested times and variables.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Timestamps to return data for (UTC).

  • variable (str | list[str] | VariableArray) – String, list of strings or array of strings that refer to variables to return. Must be in the GHCN lexicon.

  • fields (str | list[str] | pa.Schema | None, optional) – Fields to include in output, by default None (all fields).

Returns:

GHCN data frame

Return type:

pd.DataFrame
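
Both variable and fields accept several input types. A minimal sketch of how such arguments could be normalized to a plain list of names before column selection follows, assuming fields simply selects output columns and that a pa.Schema contributes its field names through its names attribute (pyarrow.Schema does expose one). The helpers as_name_list and resolve_fields are hypothetical, and the column names used with them are placeholders rather than the actual GHCNDaily output schema.

```python
from typing import List, Sequence

import numpy as np


def as_name_list(value) -> List[str]:
    """Normalize a str, list of str, or numpy array of str to a list of str."""
    if isinstance(value, str):
        return [value]
    return [str(v) for v in np.asarray(value).ravel()]


def resolve_fields(fields, all_columns: Sequence[str]) -> List[str]:
    """Map a hypothetical `fields` argument to the output columns to keep.

    None keeps every column; a schema-like object is read through its
    ``names`` attribute; otherwise the value is treated as column names.
    """
    if fields is None:
        return list(all_columns)
    if hasattr(fields, "names"):
        names = list(fields.names)
    else:
        names = as_name_list(fields)
    # Reject names that are not available in the output
    missing = [n for n in names if n not in all_columns]
    if missing:
        raise KeyError(f"Unknown fields: {missing}")
    return names


columns = ["time", "lat", "lon", "observation"]  # placeholder column names
print(resolve_fields(["time", "observation"], columns))
```

Validating names up front turns a typo in fields into an immediate KeyError instead of a silently missing column.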

async fetch(time, variable, fields=None)

Async counterpart of __call__: retrieve GHCN-D station observations for the requested times and variables.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Timestamps to return data for (UTC).

  • variable (str | list[str] | VariableArray) – String, list of strings or array of strings that refer to variables (column ids) to return. Must be in the GHCNDaily lexicon.

  • fields (str | list[str] | pa.Schema | None, optional) – Fields to include in output, by default None (all fields).

Returns:

GHCN data frame

Return type:

pd.DataFrame
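
Because fetch is a coroutine, it can be awaited from existing asyncio code, and the async_workers and retries constructor parameters bound how many download tasks run at once and how often a failed task is retried. The sketch below illustrates that general pattern in plain asyncio with a semaphore and exponential backoff. It is not earth2studio's internal implementation: fetch_with_retries, run_all, and base_delay are hypothetical names, and the sketch assumes retries counts extra attempts after the first one.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def fetch_with_retries(
    task: Callable[[], Awaitable[T]],
    semaphore: asyncio.Semaphore,
    retries: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Run task() while holding a semaphore slot, retrying on failure.

    Assumption: `retries` counts extra attempts, so at most retries + 1 calls.
    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    """
    async with semaphore:
        for attempt in range(retries + 1):
            try:
                return await task()
            except Exception:
                if attempt == retries:
                    raise  # out of attempts: propagate the last error
                await asyncio.sleep(base_delay * 2**attempt)


async def run_all(
    tasks: List[Callable[[], Awaitable[T]]],
    async_workers: int = 16,
    retries: int = 3,
) -> List[T]:
    """Run every task with at most `async_workers` of them in flight at once."""
    semaphore = asyncio.Semaphore(async_workers)
    return await asyncio.gather(
        *(fetch_with_retries(t, semaphore, retries) for t in tasks)
    )
```

Holding the semaphore slot during the backoff sleep keeps the number of outstanding requests strictly bounded by async_workers; releasing it while sleeping would raise throughput at the cost of a looser bound.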

classmethod available(time)

Check whether the given date time is available by verifying that the corresponding partition exists on S3.

Parameters:

time (datetime | np.datetime64) – Date time to check

Returns:

True if the date time is available, otherwise False

Return type:

bool
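
An availability check of this kind usually reduces to three steps: normalize the input to a datetime, derive the storage partition that would hold it, and test whether that partition exists. The sketch below assumes one partition per calendar year named YEAR=<yyyy>, which is a hypothetical layout rather than the documented GHCN-D bucket structure, and it takes the existence test as a callable so no S3 client is needed (in practice this could wrap something like an fsspec/s3fs filesystem's exists method). The names partition_key and is_available are illustrative only.

```python
from datetime import datetime
from typing import Callable, Union

import numpy as np


def partition_key(time: Union[datetime, np.datetime64]) -> str:
    """Hypothetical partition key: one partition per calendar year."""
    # np.datetime64(..., "s").item() yields a datetime.datetime for both inputs
    dt = np.datetime64(time, "s").item()
    return f"YEAR={dt.year}"


def is_available(
    time: Union[datetime, np.datetime64],
    key_exists: Callable[[str], bool],
) -> bool:
    """Return True if the partition holding `time` exists, per key_exists."""
    return key_exists(partition_key(time))


known = {"YEAR=2023", "YEAR=2024"}  # stand-in for a listing of the bucket
print(is_available(datetime(2024, 1, 1), known.__contains__))
```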