NNJAObsConv
- class earth2studio.data.NNJAObsConv(source='prepbufr', time_tolerance=numpy.timedelta64(0, 'm'), cache=True, verbose=True, async_timeout=600, async_workers=24, decode_workers=8, retries=3)
- Global
NNJA conventional (in-situ + GPS RO) observational data source. The NOAA-NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis is well suited to developing observation-driven weather forecasting tools: it includes a wide cross-section of data from many sensing platforms (satellites, surface stations, weather balloons, and more) and covers 1979 to the present.
- Parameters:
source ({"prepbufr", "convbufr", "prepbufr.acft_profiles"}, optional) – Which encoding family of the NNJA conventional archive to read, by default "prepbufr". These sources are different stages of the NCEP observation-processing pipeline, not independent replacement datasets:
- "convbufr" points at raw dump streams grouped by family, such as aircft/aircar/adpupa/adpsfc. These files preserve source-native schemas and generally require family-specific decoding and QC interpretation before they resemble GSI-ready observations. They are listed here for completeness, but the generic NNJAObsConv PrepBUFR decoder does not yet implement those raw family schemas.
- "prepbufr" points at the merged PrepBUFR cycle file. This is the preferred source for GSI-like conventional observations because upstream obsproc has already merged dump families, standardized many mnemonics, and attached report types / quality marks.
- "prepbufr.acft_profiles" points at an aircraft-only PrepBUFR profile product. It groups aircraft points into flight-level, ascending, and descending profile report types that GSI remaps back to ordinary aircraft report types during processing.
time_tolerance (TimeTolerance, optional) – Time tolerance window for filtering observations. Accepts a single value (symmetric ± window) or a tuple (lower, upper) for asymmetric windows, by default np.timedelta64(0, 'm').
cache (bool, optional) – Cache downloaded files in the local filesystem cache, by default True.
verbose (bool, optional) – Show progress bars, by default True.
async_timeout (int, optional) – Total timeout in seconds for the async fetch, by default 600.
async_workers (int, optional) – Maximum number of concurrent async fetch tasks, by default 24.
decode_workers (int, optional) – Number of parallel processes for BUFR message decoding. Higher values speed up decoding of large PrepBUFR files at the cost of more memory. Set to 1 to disable multiprocessing, by default 8.
retries (int, optional) – Number of retry attempts per failed fetch task with exponential backoff, by default 3.
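The time_tolerance semantics described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the helper name tolerance_window is hypothetical, standard-library timedelta stands in for np.timedelta64, and it assumes that the (lower, upper) tuple holds signed offsets relative to the cycle time.

```python
from datetime import datetime, timedelta, timezone


def tolerance_window(cycle, tolerance):
    """Return the (start, end) window that brackets a cycle time.

    Assumptions (not confirmed by the documentation above):
    - a single value means a symmetric +/- window around the cycle;
    - a tuple is (lower, upper) with signed offsets, e.g. (-3 h, +1 h).
    """
    if isinstance(tolerance, tuple):
        lower, upper = tolerance
    else:
        lower, upper = -abs(tolerance), abs(tolerance)
    if lower > upper:
        raise ValueError("lower offset must not exceed upper offset")
    return cycle + lower, cycle + upper


cycle = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)

# Symmetric window: 09:00 to 15:00 UTC around the 12z cycle.
symmetric = tolerance_window(cycle, timedelta(hours=3))

# Asymmetric window: 09:00 to 13:00 UTC.
asymmetric = tolerance_window(cycle, (timedelta(hours=-3), timedelta(hours=1)))

# A zero tolerance (the documented default) collapses to the cycle time itself.
default = tolerance_window(cycle, timedelta(0))
```

Under these assumptions, a zero tolerance keeps only observations stamped exactly at the cycle time, so a wider window is usually what a request for "observations around 12z" calls for.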
Warning
This is a remote data source and can potentially download a large amount of data to your local machine for large requests.
Note
Additional information on the data repository can be referenced here:
- __call__(time, variable, fields=None)
Fetch observations for a set of timestamps.
- Parameters:
time (datetime | list[datetime] | TimeArray) – Cycle timestamps (UTC). Must align to a 6-hour cycle (00, 06, 12, 18z); the time tolerance is used to bracket the cycle when selecting observations.
variable (str | list[str] | VariableArray) – Variable ids defined in earth2studio.lexicon.NNJAObsConvLexicon.
fields (str | list[str] | pa.Schema | None, optional) – Output column subset. None (default) returns all schema fields.
- Returns:
Observation DataFrame with columns matching the resolved schema.
- Return type:
pd.DataFrame
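Because __call__ requires timestamps aligned to the 6-hour cycle, it can help to validate them before starting a potentially large download. The sketch below is illustrative only: is_cycle_aligned and fetch_conventional_obs are hypothetical helper names, naive datetimes are assumed to already be in UTC, and the variable ids must come from earth2studio.lexicon.NNJAObsConvLexicon (none are invented here). The constructor and call use only the arguments documented above.

```python
from datetime import datetime

CYCLE_HOURS = (0, 6, 12, 18)  # 00, 06, 12, 18z cycles


def is_cycle_aligned(t):
    """True if a timestamp sits exactly on a 6-hour cycle boundary."""
    return (
        t.hour in CYCLE_HOURS
        and t.minute == 0
        and t.second == 0
        and t.microsecond == 0
    )


def fetch_conventional_obs(times, variables):
    """Validate cycle alignment, then fetch observations as a DataFrame.

    `variables` must be ids defined in NNJAObsConvLexicon. Calling this
    function needs earth2studio installed and network access.
    """
    misaligned = [t for t in times if not is_cycle_aligned(t)]
    if misaligned:
        raise ValueError(f"timestamps not on a 6-hour cycle: {misaligned}")

    # Imported lazily so the alignment check works without earth2studio.
    from earth2studio.data import NNJAObsConv

    ds = NNJAObsConv(source="prepbufr", cache=True, verbose=False)
    # fields=None (the default) returns all schema fields.
    return ds(times, variables)


ok = is_cycle_aligned(datetime(2024, 1, 1, 6))
not_ok = is_cycle_aligned(datetime(2024, 1, 1, 7, 30))
```

Checking alignment up front keeps a malformed request from reaching the remote archive; the fetch itself is left inside a function so that nothing is downloaded until it is called.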