GOESGLM

class earth2studio.data.GOESGLM(
satellite='east',
lat_lon_bbox=None,
time_tolerance=numpy.timedelta64(2, 'm'),
cache=True,
verbose=True,
async_timeout=600,
async_workers=24,
retries=3,
)
NASA

NOAA GOES Geostationary Lightning Mapper (GLM) Level 2 Lightning Cluster-Filter Algorithm (LCFA) event data source.

Returns per-event lightning observations from the GLM instrument on GOES-16/17/18/19 as point observations in a pandas DataFrame. Each row corresponds to a single optical event detected by GLM, with a sub-second timestamp, latitude/longitude, and the requested measurement: flashe for optical energy in joules, or flashc for a constant 1.0 per detected event, suitable for density aggregation.
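As a rough illustration of the density aggregation that flashc is meant for, the sketch below bins events onto a regular latitude/longitude grid and sums the per-event value in each cell. The `bin_events` helper, the event tuples and the 0.5 degree cell size are illustrative assumptions, not part of earth2studio; in practice the values would be read from the returned DataFrame.

```python
import math
from collections import defaultdict


def bin_events(events, cell_deg=0.5):
    """Sum a per-event value on a regular lat/lon grid (hypothetical helper).

    events: iterable of (lat, lon, value) tuples; value is 1.0 for flashc.
    Returns {(lat_index, lon_index): summed value}, where each index is
    floor(coordinate / cell_deg).
    """
    grid = defaultdict(float)
    for lat, lon, value in events:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[key] += value
    return dict(grid)


# Made-up events: two fall in the same 0.5 degree cell, one elsewhere.
events = [(30.1, 270.2, 1.0), (30.4, 270.3, 1.0), (35.0, 260.0, 1.0)]
density = bin_events(events)
```

Because flashc is 1.0 for every event, the summed value in a cell is simply the event count for that cell.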

Files in the public NOAA AWS bucket are NetCDF files produced at a roughly 20-second cadence, each covering the GOES full-disk field of view. A spatial bounding box can be supplied to filter events at parse time and reduce memory usage for large windows.

Parameters:
  • satellite (str, optional) – Source satellite selector. Pass "east" (default) or "west" to auto-select the active GOES-East / GOES-West platform for each requested timestamp; pass "G16", "G17", "G18" or "G19" to pin a single platform.

  • lat_lon_bbox (tuple[float, float, float, float] | None, optional) – Bounding box (lat_min, lon_min, lat_max, lon_max) in degrees, applied at parse time. Accepts either [-180, 180) or [0, 360) longitude convention (auto-detected when lon_max >= 180). None (default) returns the full disk. For example, CONUS in the [-180, 180) convention is (24.5, -125.0, 49.5, -66.0).

  • time_tolerance (TimeTolerance, optional) – Time tolerance window for selecting events around each requested timestamp. Accepts a single value (symmetric ± window) or a tuple (lower, upper) for asymmetric windows, by default np.timedelta64(2, "m").

  • cache (bool, optional) – Cache downloaded NetCDF files on local disk, by default True.

  • verbose (bool, optional) – Show download progress bar, by default True.

  • async_timeout (int, optional) – Total timeout in seconds for the entire fetch operation, by default 600.

  • async_workers (int, optional) – Maximum number of concurrent S3 fetch tasks, by default 24.

  • retries (int, optional) – Number of retry attempts per failed fetch task with exponential backoff, by default 3.
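The interplay of async_workers, retries and async_timeout can be pictured with the generic asyncio sketch below. It is not the earth2studio implementation: `fetch_with_retries`, `fetch_all`, `make_flaky` and the `base_delay` value are hypothetical, and the sketch assumes that retries counts additional attempts after the first failure.

```python
import asyncio


async def fetch_with_retries(task, retries=3, base_delay=0.5):
    """Await task() and retry on failure with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await task()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted, surface the last error
            # Delay doubles on each attempt: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * 2**attempt)


async def fetch_all(tasks, workers=24, timeout=600, retries=3, base_delay=0.5):
    """Run all tasks with bounded concurrency under one overall timeout."""
    sem = asyncio.Semaphore(workers)  # at most `workers` tasks in flight

    async def bounded(task):
        async with sem:
            return await fetch_with_retries(task, retries, base_delay)

    # The timeout applies to the whole batch, not to each task.
    return await asyncio.wait_for(
        asyncio.gather(*(bounded(t) for t in tasks)), timeout=timeout
    )


def make_flaky(name, failures):
    """Build a stand-in fetch task that fails `failures` times, then succeeds."""
    state = {"left": failures}

    async def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError(f"{name} failed")
        return name

    return task


results = asyncio.run(
    fetch_all(
        [make_flaky("a", 0), make_flaky("b", 2)],
        workers=2,
        timeout=5,
        base_delay=0.01,
    )
)
```

In this sketch a single slow batch can hit the overall timeout even when each individual task would eventually succeed, which is why a total timeout and a per-task retry count are tuned together.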

Warning

GLM produces hundreds of files per hour (about one every 20 seconds per satellite). Large time windows can download tens to hundreds of gigabytes of NetCDF files. Use lat_lon_bbox to discard out-of-region events at parse time, and keep time_tolerance bounded.
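To get a feel for how quickly the file count grows, the sketch below resolves a tolerance into a time window and divides it by the approximate 20-second cadence. The helpers `resolve_window` and `estimate_file_count` are hypothetical, `datetime.timedelta` stands in for `np.timedelta64` to keep the example dependency-free, and the sign convention for the tuple form (a negative lower offset) is an assumption.

```python
import math
from datetime import datetime, timedelta

FILE_CADENCE = timedelta(seconds=20)  # approximate cadence, per satellite


def resolve_window(time, tolerance):
    """Return (start, end) for a symmetric or (lower, upper) tolerance.

    Assumption: in the tuple form the lower offset is zero or negative,
    e.g. (-5 min, +1 min).
    """
    if isinstance(tolerance, tuple):
        lower, upper = tolerance
    else:
        lower, upper = -tolerance, tolerance
    return time + lower, time + upper


def estimate_file_count(time, tolerance):
    """Rough number of files overlapping the window for one satellite."""
    start, end = resolve_window(time, tolerance)
    return math.ceil((end - start) / FILE_CADENCE)


t = datetime(2024, 6, 1, 18, 0)
n_default = estimate_file_count(t, timedelta(minutes=2))  # 4-minute window
n_hour = estimate_file_count(t, timedelta(minutes=30))  # 1-hour window
n_asym = estimate_file_count(t, (timedelta(minutes=-5), timedelta(minutes=1)))
```

Under these assumptions the default tolerance touches about 12 files per requested timestamp, while a one-hour window touches about 180.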

Note

Output longitudes are normalised to [0, 360) (Earth2Studio convention). Each event’s timestamp is computed from the file’s event_time_offset variable so per-event precision (~ms) is preserved.
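The longitude handling can be sketched as follows: a bounding box given in the [-180, 180) convention is mapped to [0, 360) before comparing it with output longitudes. `normalize_lon` and `in_bbox` are hypothetical helpers written for illustration, not the library's parsing code, and a box spanning the full 360 degrees is not handled.

```python
def normalize_lon(lon):
    """Map a longitude in degrees to the [0, 360) convention."""
    return lon % 360.0


def in_bbox(lat, lon, bbox):
    """Check a point against (lat_min, lon_min, lat_max, lon_max).

    Both the point and the box may use either longitude convention.
    """
    lat_min, lon_min, lat_max, lon_max = bbox
    if not (lat_min <= lat <= lat_max):
        return False
    lon = normalize_lon(lon)
    lo, hi = normalize_lon(lon_min), normalize_lon(lon_max)
    if lo <= hi:
        return lo <= lon <= hi
    # The box crosses the 0/360 seam, e.g. (-10, 10) becomes (350, 10).
    return lon >= lo or lon <= hi


CONUS = (24.5, -125.0, 49.5, -66.0)

west_edge = normalize_lon(-125.0)  # 235.0
east_edge = normalize_lon(-66.0)  # 294.0
inside = in_bbox(40.0, 255.0, CONUS)  # output-style longitude, 105 W
outside = in_bbox(40.0, 10.0, CONUS)
```

When post-filtering the returned DataFrame, the same normalization applies: compare against 235.0 to 294.0 for CONUS rather than -125.0 to -66.0.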

Example

from datetime import datetime
import numpy as np
from earth2studio.data import GOESGLM

ds = GOESGLM(
    satellite="east",
    lat_lon_bbox=(24.5, -125.0, 49.5, -66.0),  # CONUS
    time_tolerance=np.timedelta64(5, "m"),
)
df = ds(datetime(2024, 6, 1, 18, 0), ["flashe", "flashc"])

__call__(time, variable, fields=None)

Fetch GLM lightning events for a set of timestamps.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Timestamps to return events for (UTC). Timezone-aware datetimes are converted to UTC automatically.

  • variable (str | list[str] | VariableArray) – Variable ids defined in earth2studio.lexicon.GOESGLMLexicon ("flashe" and/or "flashc").

  • fields (str | list[str] | pa.Schema | None, optional) – Output column subset. None (default) returns all schema fields.

Returns:

Event-level lightning observations with columns matching the resolved schema.

Return type:

pd.DataFrame

async fetch(time, variable, fields=None)

Asynchronously fetch GLM lightning events for a set of timestamps.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Timestamps to return events for (UTC).

  • variable (str | list[str] | VariableArray) – Variable ids defined in GOESGLMLexicon.

  • fields (str | list[str] | pa.Schema | None, optional) – Output column subset. None (default) returns all schema fields.

Returns:

Event-level lightning observations.

Return type:

pd.DataFrame

classmethod available(time)

Check whether data is available for a given time.

This is an offline check against the GLM archive window only. Per-slot platform cutovers are enforced later, when the source is called.

Parameters:

time (datetime | np.datetime64) – Date-time to check.

Return type:

bool
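One way to use an availability check is to filter a list of requested timestamps before fetching. The sketch below takes the predicate as an argument so that it runs offline; `filter_available` and `stub_available` are hypothetical, and the cutoff date in the stub is made up rather than the real archive start.

```python
from datetime import datetime


def filter_available(times, is_available):
    """Keep only the timestamps for which is_available(time) is True."""
    return [t for t in times if is_available(t)]


# With earth2studio installed, the classmethod could be passed directly:
#     from earth2studio.data import GOESGLM
#     usable = filter_available(requested, GOESGLM.available)
def stub_available(time):
    """Stand-in predicate with a made-up cutoff, for illustration only."""
    return time >= datetime(2020, 1, 1)


requested = [datetime(2010, 1, 1), datetime(2024, 6, 1, 18, 0)]
usable = filter_available(requested, stub_available)
```

Since the real check is offline and does not enforce platform cutovers, a timestamp that passes it may still fail when the source is called.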