zarr_writer
Zarr writer sink for xarray DataArrays.
Writes incoming xarray.DataArray objects to a Zarr store,
creating one Zarr group per variable with dimensions
(time, lat, lon). Supports user-specified chunking and Zarr v3
sharding.
Classes
ZarrSink | Write xarray.DataArray fields to a Zarr store.
Module Contents
- class physicsnemo_curator.domains.da.sinks.zarr_writer.ZarrSink( )
Bases: physicsnemo_curator.core.base.Sink[xarray.DataArray]
Write xarray.DataArray fields to a Zarr store.
Each incoming DataArray is expected to carry coordinate metadata (e.g. time, variable, lat, lon). The sink uses these coordinates, not the pipeline index, to organise the output.
DataArrays with a variable dimension are split along it so that each variable gets its own Zarr group: <output_path>/<variable_name>/, with dimensions (time, lat, lon). Subsequent calls append along the time dimension, so the sink accumulates data across pipeline indices based on the time coordinate in the incoming data.
- Parameters:
output_path (str) – Path to the output Zarr store directory.
chunks (dict[str, int] | None) – Chunk sizes per dimension for the Zarr arrays. Defaults to {"time": 1, "lat": 721, "lon": 1440} (one time-step per chunk, full spatial extent).
shards (dict[str, int] | None) – Shard sizes per dimension (Zarr v3 only). When provided, each shard is a container for multiple chunks. Requires zarr>=3.0. If None, sharding is not used.
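The relationship between chunks and shards can be checked before constructing the sink. The following is a minimal sketch, not part of ZarrSink: the helpers chunks_per_shard and chunk_nbytes, the 24-step shard layout, and the 4-byte (float32) item size are all assumptions made for illustration. It relies on the Zarr v3 sharding rule that each shard size must be a whole multiple of the chunk size along that dimension.

```python
from math import prod

# Default chunk sizes quoted in the parameter description above.
DEFAULT_CHUNKS = {"time": 1, "lat": 721, "lon": 1440}


def chunks_per_shard(chunks, shards):
    """Return how many chunks fit into one shard along each dimension.

    Raises ValueError if a shard size is not a whole multiple of the
    chunk size for the same dimension.
    """
    counts = {}
    for dim, chunk in chunks.items():
        # Assumption: a dimension missing from `shards` holds one chunk per shard.
        shard = shards.get(dim, chunk)
        if shard % chunk != 0:
            raise ValueError(
                f"shard size {shard} for {dim!r} is not a multiple of chunk size {chunk}"
            )
        counts[dim] = shard // chunk
    return counts


def chunk_nbytes(chunks, itemsize=4):
    """Uncompressed size of one chunk in bytes (itemsize=4 assumes float32)."""
    return prod(chunks.values()) * itemsize


# Hypothetical shard layout: 24 time steps per shard, full spatial extent.
example_shards = {"time": 24, "lat": 721, "lon": 1440}

print(chunks_per_shard(DEFAULT_CHUNKS, example_shards))  # {'time': 24, 'lat': 1, 'lon': 1}
print(chunk_nbytes(DEFAULT_CHUNKS))  # 4152960
```

With the default chunking, one uncompressed float32 chunk is about 4 MB, so a 24-step shard in this hypothetical layout would group roughly 100 MB of uncompressed data into a single stored object.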
Examples
>>> from physicsnemo_curator.domains.da.sinks.zarr_writer import ZarrSink
>>> sink = ZarrSink(
...     output_path="output.zarr",
...     chunks={"time": 1, "lat": 721, "lon": 1440},
... )
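The per-variable split described in the class docstring can be illustrated in memory with plain xarray. This is a sketch under stated assumptions, not the sink's actual implementation: split_by_variable is a hypothetical helper, and the variable names t2m and u10 and the tiny 2x3 grid are made up for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical input: one time step, two variables, a tiny 2x3 grid.
da = xr.DataArray(
    np.zeros((1, 2, 2, 3), dtype="float32"),
    dims=("time", "variable", "lat", "lon"),
    coords={
        "time": np.array(["2020-01-01T00"], dtype="datetime64[ns]"),
        "variable": ["t2m", "u10"],
        "lat": [0.0, 1.0],
        "lon": [0.0, 1.0, 2.0],
    },
)


def split_by_variable(field):
    """Return {variable_name: DataArray with dims (time, lat, lon)}."""
    per_variable = {}
    for name in field["variable"].values:
        # Selecting a single label with drop=True removes the
        # `variable` dimension and its scalar coordinate.
        per_variable[str(name)] = field.sel({"variable": name}, drop=True)
    return per_variable


per_variable = split_by_variable(da)
for name, sub in per_variable.items():
    print(name, sub.dims, sub.shape)
```

Each resulting array could then be written to its own group and extended along time, for example with the group and append_dim arguments of xarray's Dataset.to_zarr; whether ZarrSink does this internally is not stated here.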
- classmethod params() -> list[physicsnemo_curator.core.base.Param]
Return parameter descriptors for the Zarr sink.
- description: ClassVar[str] = 'Write DataArrays to a Zarr store with configurable chunking and sharding'
Short description shown in the interactive CLI.
- property output_path: pathlib.Path
Return the output Zarr store path.
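Since output_path is exposed as a pathlib.Path and each variable is written to <output_path>/<variable_name>/, the expected group locations can be derived with ordinary path arithmetic. The sketch below assumes a directory-backed store; group_paths and the variable names are hypothetical and are not part of the ZarrSink API.

```python
from pathlib import Path


def group_paths(output_path, variable_names):
    """Map each variable name to the directory of its Zarr group."""
    root = Path(output_path)
    return {name: root / name for name in variable_names}


# Hypothetical variable names, used only for illustration.
paths = group_paths("output.zarr", ["t2m", "u10"])
for name, path in paths.items():
    print(name, path.as_posix())
```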