GHCNDaily
- class earth2studio.data.GHCNDaily(stations, time_tolerance=numpy.timedelta64(0), cache=True, verbose=True, async_timeout=600, async_workers=16, retries=3)
- Global
NOAA’s Global Historical Climatology Network Daily (GHCN-D) is a dataset that contains daily climate summaries from land surface stations across the globe.
- Parameters:
stations (list[str]) – GHCN station IDs (11-character strings, e.g. "USW00023273") to attempt to fetch data from.
time_tolerance (TimeTolerance, optional) – Time tolerance window for filtering observations. Accepts a single value (symmetric +/- window) or a tuple (lower, upper) for asymmetric windows, by default np.timedelta64(0)
cache (bool, optional) – Cache data source on local memory, by default True
verbose (bool, optional) – Print download progress and missing data warnings, by default True
async_timeout (int, optional) – Time in seconds after which the download is cancelled if it has not finished successfully, by default 600
async_workers (int, optional) – Maximum number of concurrent async fetch tasks, by default 16
retries (int, optional) – Number of retry attempts per failed fetch task with exponential backoff, by default 3
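As an illustration of the time_tolerance semantics described above, the following is a minimal sketch of how a single value and a (lower, upper) tuple could each be turned into a window around a requested time. The helpers normalize_tolerance and within_window are hypothetical and are not part of earth2studio. The sketch also assumes inclusive window edges and that the lower element of a tuple is a signed offset (zero or negative to reach back in time); the library may differ on both points.

```python
from datetime import datetime, timedelta


def normalize_tolerance(tol):
    """Return a (lower, upper) pair of signed offsets. Hypothetical helper."""
    if isinstance(tol, tuple):
        # Asymmetric window: assumed to already be signed offsets.
        lower, upper = tol
        return lower, upper
    # Single value: symmetric +/- window around the requested time.
    return -abs(tol), abs(tol)


def within_window(obs_time, request_time, tol):
    """True if obs_time falls inside the tolerance window (inclusive edges assumed)."""
    lower, upper = normalize_tolerance(tol)
    return request_time + lower <= obs_time <= request_time + upper


request = datetime(2024, 1, 1)
observations = [
    datetime(2023, 12, 31),
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
    datetime(2024, 1, 3),
]

# Symmetric window of +/- 1 day.
symmetric = [t for t in observations if within_window(t, request, timedelta(days=1))]
# Asymmetric window: nothing before the request, up to 2 days after.
asymmetric = [
    t for t in observations
    if within_window(t, request, (timedelta(0), timedelta(days=2)))
]
# Zero tolerance: exact timestamp matches only.
exact = [t for t in observations if within_window(t, request, timedelta(0))]
```

With a zero tolerance only the exact timestamp is kept, which mirrors the documented default of np.timedelta64(0).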
Warning
This is a remote data source and can potentially download a large amount of data to your local machine for large requests.
Note
To help get a list of possible station IDs, this class includes GHCNDaily.get_stations_bbox(), which accepts a lat-lon bounding box and returns known station IDs. For more information on the stations, users should consult the ghcnd-stations.txt file, which can be accessed with GHCNDaily.get_station_metadata().
Note
Additional information on the data repository can be referenced here:
Example
from datetime import datetime, timedelta
from earth2studio.data import GHCNDaily
# Southeast US, lat lon bounding box (lat min, lon min, lat max, lon max)
stations = GHCNDaily.get_stations_bbox((30, -90, 36, -80))
ds = GHCNDaily(stations, time_tolerance=timedelta(days=1))
df = ds(datetime(2024, 1, 1), ["t2m_max", "tp"])
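To make the bounding-box convention in the example above concrete, here is a small sketch of what a (lat min, lon min, lat max, lon max) lookup amounts to. It does not call earth2studio: the Station record, the stations_in_bbox helper, and the station IDs and coordinates in the catalog are all made up for illustration, and the real get_stations_bbox() may handle edge cases (such as boxes crossing the antimeridian) differently.

```python
from typing import NamedTuple


class Station(NamedTuple):
    """Hypothetical station record; not the earth2studio metadata schema."""
    station_id: str
    lat: float
    lon: float


def stations_in_bbox(stations, bbox):
    """Return IDs of stations inside (lat_min, lon_min, lat_max, lon_max)."""
    lat_min, lon_min, lat_max, lon_max = bbox
    return [
        s.station_id
        for s in stations
        if lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max
    ]


# Made-up 11-character IDs and coordinates, for illustration only.
catalog = [
    Station("XX000000001", 33.5, -84.5),  # inside the box
    Station("XX000000002", 40.0, -84.5),  # north of the box
    Station("XX000000003", 31.0, -95.0),  # west of the box
]

selected = stations_in_bbox(catalog, (30, -90, 36, -80))
```

Only the first record satisfies both the latitude and longitude bounds, so it is the only ID selected.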
- __call__(time, variable, fields=None)
Function to get data.
- Parameters:
time (datetime | list[datetime] | TimeArray) – Timestamps to return data for (UTC).
variable (str | list[str] | VariableArray) – String, list of strings or array of strings that refer to variables (column ids) to return. Must be in the GHCNDaily lexicon.
fields (str | list[str] | pa.Schema | None, optional) – Fields to include in output, by default None (all fields).
- Returns:
GHCN data frame
- Return type:
pd.DataFrame
- async fetch(time, variable, fields=None)
Async function to get data.
- Parameters:
time (datetime | list[datetime] | TimeArray) – Timestamps to return data for (UTC).
variable (str | list[str] | VariableArray) – String, list of strings or array of strings that refer to variables (column ids) to return. Must be in the GHCNDaily lexicon.
fields (str | list[str] | pa.Schema | None, optional) – Fields to include in output, by default None (all fields).
- Returns:
GHCN data frame
- Return type:
pd.DataFrame
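The class parameters async_workers, retries and async_timeout describe bounded concurrency, retries with exponential backoff, and an overall timeout. The sketch below shows one common way such a pattern can be written with asyncio; it is not the earth2studio implementation. The names fetch_with_retries, gather_bounded, make_flaky and base_delay are hypothetical, the station IDs are made up, and the sketch assumes that retries counts attempts made after the first failure.

```python
import asyncio


async def fetch_with_retries(task, retries=3, base_delay=0.01):
    """Await task(), retrying up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await task()
        except Exception:
            if attempt == retries:
                raise  # Out of retries: surface the last error.
            # Delay doubles after each failed attempt.
            await asyncio.sleep(base_delay * 2 ** attempt)


async def gather_bounded(tasks, workers=16, timeout=600, retries=3):
    """Run task factories with at most `workers` in flight and a global timeout."""
    semaphore = asyncio.Semaphore(workers)

    async def run(task):
        async with semaphore:
            return await fetch_with_retries(task, retries=retries)

    return await asyncio.wait_for(
        asyncio.gather(*(run(t) for t in tasks)), timeout=timeout
    )


attempts = {}


def make_flaky(station_id, failures):
    """Build a fake fetch that fails `failures` times before succeeding."""
    async def task():
        attempts[station_id] = attempts.get(station_id, 0) + 1
        if attempts[station_id] <= failures:
            raise ConnectionError(f"simulated failure for {station_id}")
        return station_id
    return task


results = asyncio.run(
    gather_bounded(
        [make_flaky("XX000000001", 0), make_flaky("XX000000002", 2)],
        workers=2,
        timeout=5,
    )
)
```

The semaphore caps how many fetches run at once, while the retry loop stays local to each task so one flaky station does not restart the others.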