netcdf_writer#
NetCDF4 writer sink for xarray DataArrays.
Writes incoming xarray.DataArray objects to NetCDF4 files,
creating one file per variable. Files are split by a configurable
coordinate dimension (default: time, grouped by year) so that each
output file covers a self-contained slice of the data.
The directory layout is:
<output_dir>/
<variable>/
<split_key>.nc # e.g. 2020.nc, 2021.nc
<variable>/
<split_key>.nc
When no variable dimension is present, the variable subdirectory
is called data.
Supports user-specified chunking (HDF5 chunk sizes) and zlib compression.
Classes#
NetCDF4Sink | Write xarray.DataArray fields to NetCDF4 files.
Module Contents#
- class physicsnemo_curator.domains.da.sinks.netcdf_writer.NetCDF4Sink(
- output_dir: str,
- chunks: dict[str, int] | None = None,
- compression_level: int = 4,
- unlimited_dims: list[str] | None = None,
- split_dim: str | None = 'time',
- split_func: collections.abc.Callable[[Any], str] | None = None,
- )
Bases: physicsnemo_curator.core.base.Sink[xarray.DataArray]

Write xarray.DataArray fields to NetCDF4 files.

Each incoming DataArray is expected to carry coordinate metadata (e.g. time, variable, lat, lon). The sink uses these coordinates, not the pipeline index, to organise the output.

DataArrays with a variable dimension are first split along it so that each variable gets its own subdirectory. The data is then further split along split_dim (default "time") using split_func (default: extract the year), so that each distinct split key becomes a separate .nc file:

<output_dir>/<variable>/<split_key>.nc

Within each file, data is appended along split_dim (which is marked as an unlimited dimension), so multiple pipeline indices that share the same split key accumulate into the same file.
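How records accumulate per split key can be illustrated with a minimal, self-contained sketch; group_by_split_key is a hypothetical helper written for illustration, not part of the sink's API:

```python
from collections import defaultdict
from datetime import datetime

def group_by_split_key(times, split_func):
    # Hypothetical helper: bucket time coordinates by their split key.
    # All entries sharing a key would accumulate in the same .nc file.
    groups = defaultdict(list)
    for t in times:
        groups[split_func(t)].append(t)
    return dict(groups)

times = [datetime(2020, 1, 1), datetime(2020, 7, 1), datetime(2021, 1, 1)]
groups = group_by_split_key(times, lambda t: str(t.year))
# two keys: "2020" (two entries) and "2021" (one entry)
```

With the default year-based split, both 2020 timestamps would be appended to the same 2020.nc file, while the 2021 timestamp starts a new file.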
- Parameters:
  - output_dir (str) – Path to the output directory where .nc files are created.
  - chunks (dict[str, int] | None) – Chunk sizes per dimension for NetCDF4 internal chunking. Defaults to {"time": 1, "lat": 721, "lon": 1440} (one time-step per chunk, full spatial extent). These control the HDF5 chunk layout used for on-disk storage and compression.
  - compression_level (int) – Zlib compression level (0–9). 0 disables compression, 9 is maximum compression. Defaults to 4 (a good speed/size trade-off).
  - unlimited_dims (list[str] | None) – Dimensions to mark as unlimited in the NetCDF4 file. Defaults to ["time"] so new timesteps can be appended.
  - split_dim (str | None) – Coordinate dimension along which to split the data into separate files. Defaults to "time". Set to None to disable splitting (all data goes into a single file per variable).
  - split_func (Callable[[Any], str] | None) – A function that takes a single coordinate value from split_dim and returns a string used as the file name (without the .nc extension). Defaults to year extraction for "time" (e.g. datetime(2020, 6, 1) → "2020").
Examples
>>> # Default: split by year
>>> sink = NetCDF4Sink(output_dir="output_nc")
>>>
>>> # Split by month
>>> sink = NetCDF4Sink(
...     output_dir="output_nc",
...     split_func=lambda t: f"{t.astype('datetime64[M]')}",
... )
>>>
>>> # No splitting: one file per variable
>>> sink = NetCDF4Sink(output_dir="output_nc", split_dim=None)
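As a quick check of the month-splitting example, the lambda maps each time coordinate to a "YYYY-MM" key (a self-contained sketch, assuming numpy datetime64 time coordinates):

```python
import numpy as np

# The monthly split_func from the example maps a time coordinate to a
# "YYYY-MM" string, so each calendar month becomes its own .nc file.
monthly = lambda t: f"{t.astype('datetime64[M]')}"

monthly(np.datetime64("2020-06-15"))  # "2020-06"
```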
- classmethod params() list[physicsnemo_curator.core.base.Param]#
Return parameter descriptors for the NetCDF4 sink.
- description: ClassVar[str] = 'Write DataArrays to NetCDF4 files with chunking and compression'#
Short description shown in the interactive CLI.
- property output_dir: pathlib.Path#
Return the output directory path.
- property split_func: collections.abc.Callable[[Any], str]#
Return the function used to compute split keys.
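A minimal sketch of the documented default (year extraction) follows; default_split_func is a hypothetical name chosen for illustration, not the library's actual attribute:

```python
from datetime import datetime

import numpy as np

def default_split_func(value):
    # Hypothetical sketch of the documented default behaviour: map a
    # time coordinate value to its year as a string,
    # e.g. datetime(2020, 6, 1) -> "2020".
    if isinstance(value, np.datetime64):
        return str(value.astype("datetime64[Y]"))
    return str(value.year)

default_split_func(datetime(2020, 6, 1))         # "2020"
default_split_func(np.datetime64("2021-03-02"))  # "2021"
```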