earth2studio.data.DataArrayPathList
- class earth2studio.data.DataArrayPathList(paths, **xr_args)
A local data source backed by multiple xarray-compatible DataArray files.
This class lets you work with multiple xarray-compatible files (e.g., netCDF) as a single data source. All input files must have consistent dimensions and variables. Under the hood, it uses xarray’s open_mfdataset, which relies on Dask for parallel, memory-efficient data processing.
- Parameters:
paths (str | list[str]) – Either a string glob pattern (e.g., “path/to/files/*.nc”) or an explicit list of files. All specified files must exist and be readable.
xr_args (Any) – Additional keyword arguments passed to xarray’s open_mfdataset method.
- Raises:
FileNotFoundError – If no files match the provided path pattern or if any specified file doesn’t exist.
ValueError – If the files have inconsistent dimensions or variables.
RuntimeError – If there are issues opening or processing the dataset.
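The path rules above (a glob pattern or an explicit list, with FileNotFoundError when nothing matches or a file is missing) can be illustrated with a small standard-library sketch. The helper name resolve_paths is hypothetical and is not part of earth2studio; it only mirrors the documented contract and does not show how the class is implemented.

```python
import glob
import os


def resolve_paths(paths):
    """Hypothetical helper: expand a glob pattern or check an explicit list."""
    if isinstance(paths, str):
        # A string is treated as a glob pattern; sort for a stable file order.
        matches = sorted(glob.glob(paths))
        if not matches:
            raise FileNotFoundError(f"No files match pattern: {paths}")
        return matches
    # Anything else is treated as an explicit list of file paths.
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"Files not found: {missing}")
    return list(paths)
```

Running a check like this before constructing the data source can give a clearer error message when a pattern is mistyped.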
Notes
The class uses Dask arrays internally through xarray’s open_mfdataset, providing efficient parallel processing and lazy evaluation. Operations are only computed when data is actually requested through the __call__ method.
All files must share the same coordinate system and variable structure.
Required dimensions are: time, variable, lat, and lon.
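A minimal usage sketch, assuming earth2studio is installed and that files with the required dimensions exist at the given location. The wrapper function load_fields, the path, the dates, and the variable names are all placeholders chosen for illustration; only the constructor and call signatures come from this page.

```python
from datetime import datetime


def load_fields(pattern, times, variables):
    """Open all matching files and return the requested selection."""
    # Imported lazily so this sketch can be defined without earth2studio installed.
    from earth2studio.data import DataArrayPathList

    source = DataArrayPathList(pattern)
    # Per the notes above, data is only computed when the source is called.
    return source(times, variables)


# Hypothetical request: path, dates, and variable names are placeholders.
example_request = {
    "pattern": "path/to/files/*.nc",
    "times": [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 6)],
    "variables": ["t2m", "u10m"],
}
# da = load_fields(**example_request)  # expected to be an xr.DataArray
```

Extra keyword arguments for xarray’s open_mfdataset could be forwarded through the constructor's xr_args if the files need non-default options.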
- __call__(time, variable)
Retrieve data for specified timestamps and variables.
- Parameters:
time (datetime | list[datetime] | TimeArray) – Single timestamp or list of timestamps to retrieve data for.
variable (str | list[str] | VariableArray) – Single variable name or list of variable names to retrieve.
- Returns:
Data array containing the requested time and variable selections.
- Return type:
xr.DataArray
- Raises:
ValueError – If requested time or variable values are not present in the dataset.
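The call contract above (single values or lists are accepted, and ValueError is raised for anything not in the dataset) can be sketched with plain Python types. The helpers as_list and check_request are hypothetical and are not the library's code; the sketch also handles only datetime and str inputs, not the TimeArray or VariableArray types.

```python
from datetime import datetime


def as_list(value):
    """Wrap a single datetime or string in a list; copy sequences to a list."""
    if isinstance(value, (str, datetime)):
        return [value]
    return list(value)


def check_request(time, variable, available_times, available_variables):
    """Hypothetical check: raise ValueError for values not in the dataset."""
    times = as_list(time)
    variables = as_list(variable)
    missing_times = [t for t in times if t not in available_times]
    missing_vars = [v for v in variables if v not in available_variables]
    if missing_times or missing_vars:
        raise ValueError(
            f"Not in dataset: times={missing_times}, variables={missing_vars}"
        )
    return times, variables
```

A pre-check of this kind lets a caller report every missing timestamp and variable at once instead of failing on the first one.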