netcdf_writer

NetCDF4 writer sink for xarray DataArrays.

Writes incoming xarray.DataArray objects to NetCDF4 files, creating one subdirectory per variable. Within each subdirectory, files are split by a configurable coordinate dimension (default: time, grouped by year) so that each output file covers a self-contained slice of the data.

The directory layout is:

<output_dir>/
    <variable>/
        <split_key>.nc      # e.g. 2020.nc, 2021.nc
    <variable>/
        <split_key>.nc

When the DataArray has no variable dimension, the subdirectory is named data.
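
The layout above can be sketched as a small path-building function. This is only an illustration: output_path is a hypothetical helper, not part of the module, and "t2m" is an invented variable name.

```python
from pathlib import Path


def output_path(output_dir, variable, split_key):
    """Hypothetical helper: build <output_dir>/<variable>/<split_key>.nc."""
    # Fall back to "data" when the array has no variable dimension.
    subdir = variable if variable is not None else "data"
    return Path(output_dir) / subdir / f"{split_key}.nc"


# as_posix() keeps the printed form identical across platforms.
print(output_path("output_nc", "t2m", "2020").as_posix())  # output_nc/t2m/2020.nc
print(output_path("output_nc", None, "2021").as_posix())   # output_nc/data/2021.nc
```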

Supports user-specified chunking (HDF5 chunk sizes) and zlib compression.
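
To make the chunking and compression options concrete, the sketch below builds an encoding dictionary in the form that xarray's netCDF4 backend accepts (keys zlib, complevel, chunksizes). The helper build_encoding is hypothetical, and whether the sink builds its encoding this way internally is an assumption. Clamping each chunk size to the dimension length is a design choice of this sketch.

```python
def build_encoding(sizes, chunks=None, compression_level=4):
    """Hypothetical helper: turn chunk/compression options into an encoding dict.

    sizes maps each dimension name to its length, in on-disk order.
    """
    if not 0 <= compression_level <= 9:
        raise ValueError("compression_level must be between 0 and 9")
    encoding = {
        # Level 0 means "no compression", so zlib is switched off entirely.
        "zlib": compression_level > 0,
        "complevel": compression_level,
    }
    if chunks:
        # Dimensions missing from chunks use their full length; a chunk is
        # never larger than the dimension it covers.
        encoding["chunksizes"] = tuple(
            min(chunks.get(dim, size), size) for dim, size in sizes.items()
        )
    return encoding


sizes = {"time": 24, "lat": 721, "lon": 1440}
enc = build_encoding(sizes, chunks={"time": 1, "lat": 721, "lon": 1440})
print(enc)
```

A dictionary like this could be passed per variable through the encoding argument of xarray's to_netcdf when writing with the netCDF4 engine.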

Classes

NetCDF4Sink

Write xarray.DataArray fields to NetCDF4 files.

Module Contents

class physicsnemo_curator.domains.da.sinks.netcdf_writer.NetCDF4Sink(
output_dir: str,
chunks: dict[str, int] | None = None,
compression_level: int = 4,
unlimited_dims: list[str] | None = None,
split_dim: str | None = 'time',
split_func: collections.abc.Callable[[Any], str] | None = None,
)

Bases: physicsnemo_curator.core.base.Sink[xarray.DataArray]

Write xarray.DataArray fields to NetCDF4 files.

Each incoming DataArray is expected to carry coordinate metadata (e.g. time, variable, lat, lon). The sink uses these coordinates — not the pipeline index — to organise the output.

DataArrays with a variable dimension are first split along it so that each variable gets its own subdirectory. Then the data is further split along split_dim (default "time") using split_func (default: extract year) so that each distinct split key becomes a separate .nc file:

<output_dir>/<variable>/<split_key>.nc

Within each file, data is appended along the split_dim (which is marked as an unlimited dimension), so multiple pipeline indices that share the same split key accumulate into the same file.
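
The accumulation behaviour described here can be illustrated without any file I/O. In the sketch below, plain lists stand in for .nc files and list.extend stands in for appending along split_dim. The names year_key and group_by_split_key are hypothetical, and the real sink's internals may differ.

```python
from collections import defaultdict
from datetime import datetime


def year_key(value):
    """Assumed stand-in for the default split_func: the year as a string."""
    return f"{value.year:04d}"


def group_by_split_key(values, split_func=year_key):
    """Group coordinate values by the key that split_func assigns to them."""
    groups = defaultdict(list)
    for value in values:
        groups[split_func(value)].append(value)
    return dict(groups)


# Two pipeline items whose time coordinates straddle a year boundary.
batch_1 = [datetime(2020, 12, 30), datetime(2020, 12, 31)]
batch_2 = [datetime(2020, 12, 31, 12), datetime(2021, 1, 1)]

files = defaultdict(list)  # split key -> accumulated time steps
for batch in (batch_1, batch_2):
    for key, values in group_by_split_key(batch).items():
        files[key].extend(values)  # stands in for appending to <key>.nc

for key in sorted(files):
    print(key, len(files[key]))
```

Both batches contribute to the "2020" entry, which mirrors how several pipeline indices can end up in the same file.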

Parameters:
  • output_dir (str) – Path to the output directory where .nc files are created.

  • chunks (dict[str, int] | None) – Chunk sizes per dimension for NetCDF4 internal chunking. Defaults to {"time": 1, "lat": 721, "lon": 1440} (one time-step per chunk, full spatial extent). These control the HDF5 chunk layout used for on-disk storage and compression.

  • compression_level (int) – Zlib compression level (0–9). 0 disables compression, 9 is maximum compression. Defaults to 4 (a good speed/size trade-off).

  • unlimited_dims (list[str] | None) – Dimensions to mark as unlimited in the NetCDF4 file. Defaults to ["time"] so new timesteps can be appended.

  • split_dim (str | None) – Coordinate dimension along which to split the data into separate files. Defaults to "time". Set to None to disable splitting (all data goes into a single file per variable).

  • split_func (Callable[[Any], str] | None) – A function that takes a single coordinate value from split_dim and returns a string used as the file name (without the .nc extension). Defaults to year extraction for "time" (e.g. datetime(2020, 6, 1) → "2020").
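
As an illustration of the split_func contract (one coordinate value in, one file-name stem out), here are a few candidate functions. They are sketches, not the library's defaults. They assume the coordinate value has an ISO-style string form with at least month resolution, which holds for datetime.datetime and for numpy.datetime64 values at month or finer resolution.

```python
from datetime import datetime


def by_year(value):
    """Return the year, e.g. '2020'."""
    # The string form begins with YYYY, so the first four characters suffice.
    return str(value)[:4]


def by_month(value):
    """Return year and month, e.g. '2020-06'."""
    return str(value)[:7]


def by_decade(value):
    """Return the decade, e.g. '1980s'."""
    year = int(str(value)[:4])
    return f"{year - year % 10}s"


stamp = datetime(2020, 6, 1)
print(by_year(stamp), by_month(stamp), by_decade(datetime(1987, 3, 5)))
```

Any of these could be passed as split_func, provided the returned strings are valid file names on the target file system.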

Examples

>>> # Default: split by year
>>> sink = NetCDF4Sink(output_dir="output_nc")
>>>
>>> # Split by month
>>> sink = NetCDF4Sink(
...     output_dir="output_nc",
...     split_func=lambda t: f"{t.astype('datetime64[M]')}",
... )
>>>
>>> # No splitting — one file per variable
>>> sink = NetCDF4Sink(output_dir="output_nc", split_dim=None)

classmethod params() → list[physicsnemo_curator.core.base.Param]

Return parameter descriptors for the NetCDF4 sink.

Returns:

Descriptors for output_dir, chunks, compression_level, and split_dim.

Return type:

list[Param]

property compression_level: int

Return the configured zlib compression level.

description: ClassVar[str] = 'Write DataArrays to NetCDF4 files with chunking and compression'

Short description shown in the interactive CLI.

name: ClassVar[str] = 'NetCDF4 Writer'

Human-readable display name for the interactive CLI.

property output_dir: pathlib.Path

Return the output directory path.

property split_dim: str | None

Return the dimension used for file splitting.

property split_func: collections.abc.Callable[[Any], str]

Return the function used to compute split keys.

property unlimited_dims: list[str]

Return the list of unlimited dimensions.