NNJAObsConv

class earth2studio.data.NNJAObsConv(
source='prepbufr',
time_tolerance=numpy.timedelta64(0, 'm'),
cache=True,
verbose=True,
async_timeout=600,
async_workers=24,
decode_workers=8,
retries=3,
)
Global

NNJA conventional (in-situ + GPS RO) observational data source. The NOAA-NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis is well suited to developing observation-driven weather forecasting tools: it covers a wide cross-section of sensing platforms (satellites, surface stations, weather balloons, and more) and spans 1979 to the present.

Parameters:
  • source ({"prepbufr", "convbufr", "prepbufr.acft_profiles"}, optional) – Which encoding family of the NNJA conventional archive to read, by default "prepbufr".

  • time_tolerance (TimeTolerance, optional) – Time tolerance window for filtering observations. Accepts a single value (symmetric ± window) or a tuple (lower, upper) for asymmetric windows, by default np.timedelta64(0, 'm').

  • cache (bool, optional) – Cache downloaded files in the local filesystem cache, by default True.

  • verbose (bool, optional) – Show progress bars, by default True.

  • async_timeout (int, optional) – Total timeout in seconds for the async fetch, by default 600.

  • async_workers (int, optional) – Maximum number of concurrent async fetch tasks, by default 24.

  • decode_workers (int, optional) – Number of parallel processes for BUFR message decoding. Higher values speed up decoding of large PrepBUFR files at the cost of more memory. Set to 1 to disable multiprocessing, by default 8.

  • retries (int, optional) – Number of retry attempts per failed fetch task with exponential backoff, by default 3.
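The two accepted forms of time_tolerance can be pictured with a small sketch. It is not the library's implementation: tolerance_window is a hypothetical helper, the standard-library timedelta stands in for numpy.timedelta64 to keep the example dependency-free, and the tuple form is assumed to hold signed offsets (lower, upper) that are added to the cycle time.

```python
from datetime import datetime, timedelta


def tolerance_window(cycle, tolerance):
    """Return the (start, end) datetimes that bracket `cycle`.

    Hypothetical helper for illustration only. Assumption: a tuple is read
    as signed offsets (lower, upper) added to the cycle time, while a single
    value is applied symmetrically on both sides.
    """
    if isinstance(tolerance, tuple):
        lower, upper = tolerance
    else:
        lower, upper = -tolerance, tolerance
    return cycle + lower, cycle + upper


cycle = datetime(2024, 1, 1, 12)

# Default tolerance of zero: the window collapses to the cycle time itself.
print(tolerance_window(cycle, timedelta(0)))

# Symmetric window of 3 hours on either side of the cycle.
print(tolerance_window(cycle, timedelta(hours=3)))

# Asymmetric window: 3 hours before the cycle to 1 hour after it.
print(tolerance_window(cycle, (timedelta(hours=-3), timedelta(hours=1))))
```

With the default tolerance of zero the window has no width, so a wider tolerance is likely needed when observations within a few hours of the cycle are wanted.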

Warning

This is a remote data source and can potentially download a large amount of data to your local machine for large requests.

__call__(time, variable, fields=None)

Fetch observations for a set of timestamps.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Cycle timestamps (UTC). Each must align to a 6-hour cycle (00Z, 06Z, 12Z, or 18Z); the time tolerance brackets the cycle when selecting observations.

  • variable (str | list[str] | VariableArray) – Variable ids defined in earth2studio.lexicon.NNJAObsConvLexicon.

  • fields (str | list[str] | pa.Schema | None, optional) – Output column subset. None (default) returns all schema fields.

Returns:

Observation DataFrame with columns matching the resolved schema.

Return type:

pd.DataFrame
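A minimal usage sketch, assuming earth2studio and numpy are installed and the archive is reachable. The helpers is_cycle_aligned and fetch_observations are hypothetical wrappers, not part of the library; the variable ids are left as an argument because valid ids come from earth2studio.lexicon.NNJAObsConvLexicon, and the 3-hour tolerance is an arbitrary illustrative choice.

```python
from datetime import datetime

CYCLE_HOURS = (0, 6, 12, 18)


def is_cycle_aligned(t):
    """True if `t` falls exactly on a 00, 06, 12, or 18 UTC cycle."""
    return (
        t.hour in CYCLE_HOURS
        and t.minute == 0
        and t.second == 0
        and t.microsecond == 0
    )


def fetch_observations(cycles, variables, fields=None):
    """Validate cycle alignment locally, then call the data source.

    Hypothetical wrapper for illustration. `variables` must be ids defined
    in the NNJAObsConv lexicon; none are assumed here.
    """
    misaligned = [t for t in cycles if not is_cycle_aligned(t)]
    if misaligned:
        raise ValueError(f"Not on a 6-hour cycle: {misaligned}")

    # Imported lazily so the alignment check works without the packages.
    import numpy as np
    from earth2studio.data import NNJAObsConv

    source = NNJAObsConv(
        source="prepbufr",
        time_tolerance=np.timedelta64(3, "h"),  # illustrative choice
        verbose=False,
    )
    # Returns a DataFrame; fields=None keeps every schema column.
    return source(cycles, variables, fields=fields)


print(is_cycle_aligned(datetime(2024, 1, 1, 6)))
print(is_cycle_aligned(datetime(2024, 1, 1, 7)))
```

Checking alignment before constructing the data source means a bad timestamp fails fast, before any download starts.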

async fetch(time, variable, fields=None)

Asynchronously fetch observations. Takes the same arguments as __call__.

Parameters:
  • time (datetime | list[datetime] | ndarray[datetime64])

  • variable (str | list[str] | ndarray[str])

  • fields (str | list[str] | Schema | None)

Return type:

DataFrame

classmethod available(time)

Check whether the given date time is available.

Parameters:

time (datetime | np.datetime64) – Date time to check

Returns:

True if the date time is available, False otherwise.

Return type:

bool