earth2studio.data.DataArrayPathList

class earth2studio.data.DataArrayPathList(paths, **xr_args)

A local xarray DataArray data source that reads from multiple files.

This class lets you work with multiple xarray-compatible files (e.g., netCDF) as a single data source. All input files must have consistent dimensions and variables. Under the hood, it uses xarray’s open_mfdataset, which relies on Dask for parallel, memory-efficient data processing.

Parameters:
  • paths (str | list[str]) – Either a string glob pattern (e.g., “path/to/files/*.nc”) or an explicit list of files. All specified files must exist and be readable.

  • xr_args (Any) – Additional keyword arguments passed to xarray’s open_mfdataset method.

Raises:
  • FileNotFoundError – If no files match the provided path pattern or if any specified file doesn’t exist.

  • ValueError – If the files have inconsistent dimensions or variables.

  • RuntimeError – If there are issues opening or processing the dataset.
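The path handling and FileNotFoundError behaviour described above can be pictured with a small standard-library sketch. The helper name resolve_paths is hypothetical and is not part of earth2studio; it only illustrates, under that assumption, how a glob pattern or an explicit file list might be expanded and validated before the files are handed to open_mfdataset.

```python
import glob
import os


def resolve_paths(paths):
    """Hypothetical helper: expand a glob pattern or validate an explicit file list."""
    if isinstance(paths, str):
        # A string is treated as a glob pattern; sort for a deterministic order.
        files = sorted(glob.glob(paths))
        if not files:
            raise FileNotFoundError(f"No files match pattern: {paths}")
        return files

    # Otherwise assume an explicit list of file paths and check each one.
    files = list(paths)
    if not files:
        raise FileNotFoundError("No files were provided")
    missing = [p for p in files if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"Files not found: {missing}")
    return files
```

Checking the files up front, as in this sketch, reports a missing input as a clear FileNotFoundError instead of a less specific failure later when the dataset is opened.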

Notes

  • The class uses Dask arrays internally through xarray’s open_mfdataset, providing efficient parallel processing and lazy evaluation. Operations are only computed when data is actually requested through the __call__ method.

  • All files must share the same coordinate system and variable structure.

  • Required dimensions are: time, variable, lat, and lon.

__call__(time, variable)

Retrieve data for specified timestamps and variables.

Parameters:
  • time (datetime | list[datetime] | TimeArray) – Single timestamp or list of timestamps to retrieve data for.

  • variable (str | list[str] | VariableArray) – Single variable name or list of variable names to retrieve.

Returns:

Data array containing the requested time and variable selections.

Return type:

xr.DataArray

Raises:

ValueError – If requested time or variable values are not present in the dataset.
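As a rough illustration of the selection semantics described for __call__, the sketch below builds a small in-memory xarray DataArray with the four required dimensions and selects from it by label. The select function, the variable names t2m and u10m, and the coordinate values are all hypothetical; this is not earth2studio’s implementation, only a minimal stand-in that assumes label-based selection with a ValueError for missing entries.

```python
from datetime import datetime

import numpy as np
import xarray as xr

# Toy array with the four required dimensions (values are arbitrary).
times = np.array(["2024-01-01T00:00", "2024-01-01T06:00"], dtype="datetime64[ns]")
variables = ["t2m", "u10m"]  # hypothetical variable names
lat = np.linspace(90.0, -90.0, 4)
lon = np.linspace(0.0, 270.0, 4)
da = xr.DataArray(
    np.zeros((len(times), len(variables), len(lat), len(lon))),
    dims=["time", "variable", "lat", "lon"],
    coords={"time": times, "variable": variables, "lat": lat, "lon": lon},
)


def select(da, time, variable):
    """Hypothetical stand-in for __call__: select by label, ValueError on misses."""
    # Accept a single timestamp or a list of timestamps.
    time = np.atleast_1d(np.asarray(time, dtype="datetime64[ns]"))
    # Accept a single variable name or a list of names.
    variable = [variable] if isinstance(variable, str) else list(variable)
    try:
        return da.sel(time=time, variable=variable)
    except KeyError as err:
        raise ValueError(f"Requested time or variable not in dataset: {err}") from err


out = select(da, datetime(2024, 1, 1, 6), "t2m")
print(out.dims, out.shape)
```

Because both arguments are normalised to lists before selection, the result in this sketch keeps all four dimensions even when a single timestamp and a single variable are requested.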