GOESGLM
- class earth2studio.data.GOESGLM(satellite='east', lat_lon_bbox=None, time_tolerance=numpy.timedelta64(2, 'm'), cache=True, verbose=True, async_timeout=600, async_workers=24, retries=3)
- NASA
NOAA GOES Geostationary Lightning Mapper (GLM) Level 2 Lightning Cluster-Filter Algorithm (LCFA) event data source.
Returns per-event lightning observations from the GLM instrument on GOES-16/17/18/19, served as point observations in a pandas DataFrame. Each row corresponds to a single optical event detected by GLM with a sub-second timestamp, latitude/longitude, and the requested measurement (flashe for optical energy in Joules, flashc for a constant 1.0 per detected event suitable for density aggregation).
Files in the public NOAA AWS bucket are NetCDFs produced at roughly 20 second cadence covering the GOES full-disk field of view. A spatial bounding box can be supplied to restrict events at parse time and reduce memory usage for large windows.
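The roughly 20 second cadence gives a quick way to gauge how many files a request may touch. The sketch below is a back-of-the-envelope estimate only: `FILE_CADENCE` and `estimate_file_count` are hypothetical names, not part of earth2studio, and the calculation ignores files that straddle the window edges or are shared between overlapping windows.

```python
import math
from datetime import timedelta

# Assumption: one file every 20 seconds, per the "roughly 20 second cadence" above.
FILE_CADENCE = timedelta(seconds=20)


def estimate_file_count(lower: timedelta, upper: timedelta, n_times: int = 1) -> int:
    """Rough number of GLM files overlapping [t - lower, t + upper] for n_times timestamps."""
    window = lower + upper
    per_time = math.ceil(window / FILE_CADENCE)
    return per_time * n_times


# Default symmetric tolerance of 2 minutes (a 4 minute window)
default_files = estimate_file_count(timedelta(minutes=2), timedelta(minutes=2))

# A full hour centred on one timestamp
hour_files = estimate_file_count(timedelta(minutes=30), timedelta(minutes=30))
```

Under these assumptions the default tolerance touches about a dozen files per timestamp, while an hour-long window touches well over a hundred.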
- Parameters:
satellite (str, optional) – Source satellite selector. Pass "east" (default) or "west" to auto-select the active GOES-East / GOES-West platform for each requested timestamp; pass "G16", "G17", "G18" or "G19" to pin a single platform.
lat_lon_bbox (tuple[float, float, float, float] | None, optional) – Bounding box (lat_min, lon_min, lat_max, lon_max) in degrees, applied at parse time. Accepts either [-180, 180) or [0, 360) longitude convention (auto-detected when lon_max >= 180). None (default) returns the full disk. For example, CONUS in the [-180, 180) convention is (24.5, -125.0, 49.5, -66.0).
time_tolerance (TimeTolerance, optional) – Time tolerance window for selecting events around each requested timestamp. Accepts a single value (symmetric ± window) or a tuple (lower, upper) for asymmetric windows, by default np.timedelta64(2, "m").
cache (bool, optional) – Cache downloaded NetCDF files on local disk, by default True.
verbose (bool, optional) – Show download progress bar, by default True.
async_timeout (int, optional) – Total timeout in seconds for the entire fetch operation, by default 600.
async_workers (int, optional) – Maximum number of concurrent S3 fetch tasks, by default 24.
retries (int, optional) – Number of retry attempts per failed fetch task with exponential backoff, by default 3.
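To make the two longitude conventions for lat_lon_bbox concrete, here is a minimal sketch of a bounding-box test that works in either convention by mapping everything to [0, 360) first. It is an illustration, not the library's parsing code: `to_0_360` and `in_bbox` are hypothetical helpers.

```python
def to_0_360(lon: float) -> float:
    """Map a longitude in degrees onto [0, 360)."""
    return lon % 360.0


def in_bbox(lat: float, lon: float, bbox: tuple[float, float, float, float]) -> bool:
    """Return True if (lat, lon) lies inside (lat_min, lon_min, lat_max, lon_max)."""
    lat_min, lon_min, lat_max, lon_max = bbox
    if not (lat_min <= lat <= lat_max):
        return False
    lo, hi, x = to_0_360(lon_min), to_0_360(lon_max), to_0_360(lon)
    if lo <= hi:
        return lo <= x <= hi
    # The box wraps across the 0/360 seam
    return x >= lo or x <= hi


# CONUS box from the parameter description, in the [-180, 180) convention
CONUS = (24.5, -125.0, 49.5, -66.0)

inside_west_convention = in_bbox(40.0, -100.0, CONUS)  # event given as -100 degrees
inside_east_convention = in_bbox(40.0, 260.0, CONUS)   # same point given as 260 degrees
outside = in_bbox(40.0, 10.0, CONUS)
```

Mapping both the box and the event to one convention before comparing avoids having to branch on how the caller expressed the longitudes.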
Warning
GLM produces hundreds of files per hour. Large time windows can download tens to hundreds of gigabytes of NetCDFs. Use lat_lon_bbox to discard out-of-region events on parse and keep time_tolerance bounded.
Note
Output longitudes are normalised to [0, 360) (Earth2Studio convention). Each event’s timestamp is computed from the file’s event_time_offset variable so per-event precision (~ms) is preserved.
Note
Additional information on the data repository:
Example
from datetime import datetime

import numpy as np

from earth2studio.data import GOESGLM

ds = GOESGLM(
    satellite="east",
    lat_lon_bbox=(24.5, -125.0, 49.5, -66.0),  # CONUS
    time_tolerance=np.timedelta64(5, "m"),
)
df = ds(datetime(2024, 6, 1, 18, 0), ["flashe", "flashc"])
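Because output longitudes are normalised to [0, 360), results from the example above will not match the [-180, 180) values used in its bounding box without a conversion. A minimal sketch of that conversion follows; `to_pm180` is a hypothetical helper, not an earth2studio function.

```python
def to_pm180(lon_0_360: float) -> float:
    """Map a longitude in [0, 360) onto [-180, 180)."""
    return (lon_0_360 + 180.0) % 360.0 - 180.0


# The CONUS box edges as they would appear in the [0, 360) output convention
west_edge = to_pm180(235.0)
east_edge = to_pm180(294.0)
```

The same expression can be applied to a whole DataFrame column at once, since it only uses arithmetic and the modulo operator.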
- __call__(time, variable, fields=None)
Fetch GLM lightning events for a set of timestamps.
- Parameters:
time (datetime | list[datetime] | TimeArray) – Timestamps to return events for (UTC). Timezone-aware datetimes are converted to UTC automatically.
variable (str | list[str] | VariableArray) – Variable ids defined in earth2studio.lexicon.GOESGLMLexicon ("flashe" and/or "flashc").
fields (str | list[str] | pa.Schema | None, optional) – Output column subset. None (default) returns all schema fields.
- Returns:
Event-level lightning observations with columns matching the resolved schema.
- Return type:
pd.DataFrame
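Since flashc is a constant 1.0 per event, summing it over spatial cells yields an event density. The sketch below illustrates the idea on plain dictionaries to stay dependency-free. The keys "lat", "lon" and "flashc" are assumptions for illustration, since the actual column names depend on the resolved schema, and `grid_density` is a hypothetical helper. With the returned DataFrame, the same binning would typically be expressed as a groupby.

```python
import math
from collections import Counter


def grid_density(events, cell_deg: float = 1.0) -> Counter:
    """Sum the per-event flashc value into (lat_index, lon_index) grid cells."""
    counts = Counter()
    for ev in events:
        cell = (math.floor(ev["lat"] / cell_deg), math.floor(ev["lon"] / cell_deg))
        counts[cell] += ev["flashc"]
    return counts


# Synthetic events (not real GLM data), longitudes in the [0, 360) convention
events = [
    {"lat": 30.2, "lon": 265.1, "flashc": 1.0},
    {"lat": 30.7, "lon": 265.9, "flashc": 1.0},
    {"lat": 31.1, "lon": 265.4, "flashc": 1.0},
]
density = grid_density(events)
```

Each key is the integer index of a cell's south-west corner, so the first two synthetic events fall in the same one-degree cell.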
- async fetch(time, variable, fields=None)
Async function to fetch GLM events.
- Parameters:
time (datetime | list[datetime] | TimeArray) – Timestamps to return events for (UTC).
variable (str | list[str] | VariableArray) – Variable ids defined in GOESGLMLexicon.
fields (str | list[str] | pa.Schema | None, optional) – Output column subset. None (default) returns all schema fields.
- Returns:
Event-level lightning observations.
- Return type:
pd.DataFrame
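The fetch description only states that timestamps are UTC and does not mention the automatic conversion documented for __call__. A cautious caller can normalise timezone-aware datetimes first. The sketch below shows one way to do so with the standard library. `to_utc_naive` is a hypothetical helper, and whether fetch expects naive or aware datetimes is an assumption here.

```python
from datetime import datetime, timedelta, timezone


def to_utc_naive(t: datetime) -> datetime:
    """Convert an aware datetime to UTC and drop tzinfo; naive input is assumed to be UTC."""
    if t.tzinfo is None:
        return t
    return t.astimezone(timezone.utc).replace(tzinfo=None)


# 13:00 at a fixed UTC-5 offset corresponds to 18:00 UTC
local = datetime(2024, 6, 1, 13, 0, tzinfo=timezone(timedelta(hours=-5)))
utc_time = to_utc_naive(local)
```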