netcdf_writer

NetCDF4 writer sink for xarray DataArrays.

Writes incoming xarray.DataArray objects to NetCDF4 files, with one subdirectory per variable. Within each variable's subdirectory, files are split by a configurable coordinate dimension (default: time, grouped by year), so that each output file covers a self-contained slice of the data.

The directory layout is:

<output_dir>/
    <variable>/
        <split_key>.nc      # e.g. 2020.nc, 2021.nc
    <variable>/
        <split_key>.nc

When a DataArray has no variable dimension, its subdirectory is named data.
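A minimal sketch of how a target path could be derived from this layout. `output_path` is a hypothetical helper written for illustration and is not part of the sink's documented API; the variable name `t2m` is likewise only an example.

```python
from __future__ import annotations

from pathlib import Path


def output_path(output_dir: str, variable: str | None, split_key: str) -> Path:
    """Build <output_dir>/<variable>/<split_key>.nc (hypothetical helper)."""
    # DataArrays without a variable dimension go into the "data" subdirectory.
    subdir = variable if variable is not None else "data"
    return Path(output_dir) / subdir / f"{split_key}.nc"


print(output_path("output_nc", "t2m", "2020").as_posix())  # output_nc/t2m/2020.nc
print(output_path("output_nc", None, "2021").as_posix())   # output_nc/data/2021.nc
```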

Supports user-specified chunking (HDF5 chunk sizes) and zlib compression.
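xarray's netCDF4 backend accepts per-variable encoding entries such as chunksizes, zlib and complevel. The sketch below shows one way the sink's chunks and compression_level options could map onto those keys. `build_encoding` is hypothetical, and filling in missing dimensions with their full length (and clamping oversized chunks) is an assumption rather than documented behaviour.

```python
from __future__ import annotations


def build_encoding(
    name: str,
    dims: tuple[str, ...],
    shape: tuple[int, ...],
    chunks: dict[str, int],
    compression_level: int = 4,
) -> dict[str, dict]:
    """Return an xarray encoding entry for one variable (hypothetical helper)."""
    if not 0 <= compression_level <= 9:
        raise ValueError("compression_level must be between 0 and 9")
    # Dimensions absent from `chunks` use their full length; chunks larger
    # than the dimension are clamped to it (assumed behaviour).
    chunksizes = tuple(min(chunks.get(d, n), n) for d, n in zip(dims, shape))
    enc: dict = {"chunksizes": chunksizes, "zlib": compression_level > 0}
    if compression_level > 0:
        enc["complevel"] = compression_level  # level 0 means no compression
    return {name: enc}


enc = build_encoding("t2m", ("time", "lat", "lon"), (24, 721, 1440),
                     {"time": 1, "lat": 721, "lon": 1440})
print(enc)  # {'t2m': {'chunksizes': (1, 721, 1440), 'zlib': True, 'complevel': 4}}
```

With a Dataset ds, such a dict could be passed as ds.to_netcdf(path, engine="netcdf4", encoding=enc, unlimited_dims=["time"]); encoding, engine and unlimited_dims are existing to_netcdf parameters.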

Classes

NetCDF4Sink

Write xarray.DataArray fields to NetCDF4 files.

Module Contents

class physicsnemo_curator.domains.da.sinks.netcdf_writer.NetCDF4Sink(output_dir: str, chunks: dict[str, int] | None = None, compression_level: int = 4, unlimited_dims: list[str] | None = None, split_dim: str | None = 'time', split_func: collections.abc.Callable[[Any], str] | None = None)

Bases: physicsnemo_curator.core.base.Sink[xarray.DataArray]

Write xarray.DataArray fields to NetCDF4 files.

Each incoming DataArray is expected to carry coordinate metadata (e.g. time, variable, lat, lon). The sink uses these coordinates — not the pipeline index — to organise the output.

DataArrays with a variable dimension are first split along it so that each variable gets its own subdirectory. The data is then split along split_dim (default "time") using split_func (default: extract the year), so that each distinct split key becomes a separate .nc file:

<output_dir>/<variable>/<split_key>.nc
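The split along split_dim amounts to grouping coordinate positions by key. In the sketch below, `year_key` reproduces the documented default example (datetime(2020, 6, 1) → "2020") for values numpy can convert to datetime64; how the real default treats other coordinate types is not stated here. Both `year_key` and `group_by_split_key` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Callable, Iterable
from typing import Any

import numpy as np


def year_key(t: Any) -> str:
    """Assumed default split_func: the calendar year as a string."""
    return str(np.datetime64(t).astype("datetime64[Y]"))


def group_by_split_key(
    values: Iterable[Any], split_func: Callable[[Any], str] = year_key
) -> dict[str, list[int]]:
    """Map each split key to the positions along split_dim that belong to it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for i, v in enumerate(values):
        groups[split_func(v)].append(i)
    return dict(groups)


times = np.array(["2020-12-31T18", "2021-01-01T00", "2021-06-01T00"], dtype="datetime64[h]")
print(group_by_split_key(times))  # {'2020': [0], '2021': [1, 2]}
```

With xarray, each group of positions could then be selected with da.isel({split_dim: positions}) before writing it to the corresponding <split_key>.nc file.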

Within each file, data is appended along split_dim (which is marked as an unlimited dimension), so multiple pipeline indices that share a split key accumulate in the same file.
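Appending along an unlimited dimension means each write to a given file starts where the previous one ended. The hypothetical `AppendTracker` below sketches only that bookkeeping (no file I/O); it illustrates the idea and is not the sink's actual mechanism.

```python
from __future__ import annotations

from pathlib import Path


class AppendTracker:
    """Track how many steps each file already holds along its unlimited dimension."""

    def __init__(self) -> None:
        self._lengths: dict[Path, int] = {}

    def reserve(self, path: Path, n_steps: int) -> slice:
        """Return the index range the next n_steps should occupy in `path`."""
        start = self._lengths.get(path, 0)
        self._lengths[path] = start + n_steps
        return slice(start, start + n_steps)


tracker = AppendTracker()
f2020 = Path("output_nc/t2m/2020.nc")
print(tracker.reserve(f2020, 4))  # slice(0, 4, None)
print(tracker.reserve(f2020, 2))  # slice(4, 6, None): same key, appended after
print(tracker.reserve(Path("output_nc/t2m/2021.nc"), 1))  # slice(0, 1, None)
```

In netCDF4-python, a dimension created with createDimension("time", None) is unlimited, and assigning to var[s] beyond its current length grows it, which is the kind of write such a slice would drive.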

Parameters:

output_dir (str) – Path to the output directory where .nc files are created.
chunks (dict[str, int] | None) – Chunk sizes per dimension for NetCDF4 internal chunking. Defaults to {"time": 1, "lat": 721, "lon": 1440} (one time-step per chunk, full spatial extent). These control the HDF5 chunk layout used for on-disk storage and compression.
compression_level (int) – Zlib compression level (0–9). 0 disables compression, 9 is maximum compression. Defaults to 4 (a good speed/size trade-off).
unlimited_dims (list[str] | None) – Dimensions to mark as unlimited in the NetCDF4 file. Defaults to ["time"] so new timesteps can be appended.
split_dim (str | None) – Coordinate dimension along which to split the data into separate files. Defaults to "time". Set to None to disable splitting (all data goes into a single file per variable).
split_func (Callable[[Any], str] | None) – A function that takes a single coordinate value from split_dim and returns a string used as the file name (without the .nc extension). Defaults to year extraction for "time" (e.g. datetime(2020, 6, 1) → "2020").
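As a worked example of the default chunks: 721 × 1440 is the size of a global 0.25° latitude/longitude grid (180/0.25 + 1 latitudes including both poles, 360/0.25 longitudes), so each chunk holds one full field for a single time step. Assuming float32 data (4 bytes per value; the actual dtype depends on the input), the uncompressed chunk size works out as follows. `chunk_nbytes` is a hypothetical helper.

```python
from __future__ import annotations

import math


def chunk_nbytes(chunks: dict[str, int], itemsize: int = 4) -> int:
    """Uncompressed bytes in one HDF5 chunk; itemsize=4 assumes float32."""
    return math.prod(chunks.values()) * itemsize


default_chunks = {"time": 1, "lat": 721, "lon": 1440}
nbytes = chunk_nbytes(default_chunks)
print(nbytes, f"{nbytes / 2**20:.2f} MiB")  # 4152960 3.96 MiB
```

A larger time chunk multiplies this figure, trading fewer chunks for more data read when only a single time step is needed.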

Examples

>>> from physicsnemo_curator.domains.da.sinks.netcdf_writer import NetCDF4Sink
>>>
>>> # Default: split by year
>>> sink = NetCDF4Sink(output_dir="output_nc")
>>>
>>> # Split by month
>>> sink = NetCDF4Sink(
...     output_dir="output_nc",
...     split_func=lambda t: f"{t.astype('datetime64[M]')}",
... )
>>>
>>> # No splitting: one file per variable
>>> sink = NetCDF4Sink(output_dir="output_nc", split_dim=None)
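The month example relies on astype, so it assumes split_func receives numpy.datetime64 scalars (an assumption; the accepted coordinate types are not documented here). Under that assumption, the keys, and therefore the file names, look like this. `month_key` is a hypothetical named version of the lambda from the month example.

```python
import numpy as np


def month_key(t: np.datetime64) -> str:
    # Same conversion as the month lambda: truncate to month precision.
    return str(t.astype("datetime64[M]"))


for t in np.array(["2020-06-15T12", "2021-01-01T00"], dtype="datetime64[h]"):
    print(month_key(t) + ".nc")  # 2020-06.nc, then 2021-01.nc
```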

classmethod params() → list[physicsnemo_curator.core.base.Param]

Return parameter descriptors for the NetCDF4 sink.

Returns: Descriptors for output_dir, chunks, compression_level, and split_dim.
Return type: list[Param]

property compression_level: int

Return the configured zlib compression level.

description: ClassVar[str] = 'Write DataArrays to NetCDF4 files with chunking and compression'

Short description shown in the interactive CLI.

name: ClassVar[str] = 'NetCDF4 Writer'

Human-readable display name for the interactive CLI.

property output_dir: pathlib.Path

Return the output directory path.

property split_dim: str | None

Return the dimension used for file splitting.

property split_func: collections.abc.Callable[[Any], str]

Return the function used to compute split keys.

property unlimited_dims: list[str]

Return the list of unlimited dimensions.
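The property types suggest that None arguments are replaced by concrete values at construction time: unlimited_dims and split_func are typed without | None. A minimal sketch of that pattern using a hypothetical `SinkOptions` class; the compression-level check and the year-based fallback for split_func are assumptions, not confirmed behaviour of NetCDF4Sink.

```python
from __future__ import annotations

from collections.abc import Callable
from pathlib import Path
from typing import Any

import numpy as np


def _year_key(t: Any) -> str:
    # Assumed default split_func: calendar year of a datetime64-compatible value.
    return str(np.datetime64(t).astype("datetime64[Y]"))


class SinkOptions:
    """Hypothetical stand-in mirroring NetCDF4Sink's documented defaults."""

    def __init__(
        self,
        output_dir: str,
        chunks: dict[str, int] | None = None,
        compression_level: int = 4,
        unlimited_dims: list[str] | None = None,
        split_dim: str | None = "time",
        split_func: Callable[[Any], str] | None = None,
    ) -> None:
        if not 0 <= compression_level <= 9:  # assumed validation
            raise ValueError("compression_level must be between 0 and 9")
        self._output_dir = Path(output_dir)
        self._chunks = dict(chunks) if chunks is not None else {"time": 1, "lat": 721, "lon": 1440}
        self._compression_level = compression_level
        self._unlimited_dims = list(unlimited_dims) if unlimited_dims is not None else ["time"]
        self._split_dim = split_dim
        self._split_func = split_func if split_func is not None else _year_key

    @property
    def output_dir(self) -> Path:
        return self._output_dir

    @property
    def compression_level(self) -> int:
        return self._compression_level

    @property
    def unlimited_dims(self) -> list[str]:
        # Return a copy so callers cannot mutate the stored configuration.
        return list(self._unlimited_dims)

    @property
    def split_dim(self) -> str | None:
        return self._split_dim

    @property
    def split_func(self) -> Callable[[Any], str]:
        return self._split_func


opts = SinkOptions("output_nc")
print(opts.output_dir.as_posix(), opts.unlimited_dims, opts.split_func(np.datetime64("2020-06-01")))
# output_nc ['time'] 2020
```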