earth2studio.io.AsyncZarrBackend

class earth2studio.io.AsyncZarrBackend(file_name, parallel_coords, fs_factory=<class 'fsspec.implementations.local.LocalFileSystem'>, blocking=True, pool_size=8, async_timeout=600, zarr_kwargs={'mode': 'a'}, zarr_codecs=None)

Async Zarr v3 IO Backend

Warning

This IO backend has a non-blocking mode that executes IO writes in separate threads. It assumes that the data being written is not re-allocated while a write is in progress.
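
The assumption above can be illustrated with a minimal, standard-library sketch. It is not the backend's implementation: the thread pool, the `deferred_write` helper, and the event used to force the ordering are all hypothetical stand-ins. It shows one way the assumption can be violated, namely reusing the same memory for the next step before a queued write has read it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def deferred_write(buffer, sink, may_start):
    """Hypothetical stand-in for a background IO write: waits, then reads the buffer."""
    may_start.wait()
    sink.append(bytes(buffer))


def run(copy_before_submit):
    buffer = bytearray(b"step-0")
    sink = []
    may_start = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Either hand the writer its own immutable copy, or the shared buffer.
        payload = bytes(buffer) if copy_before_submit else buffer
        future = pool.submit(deferred_write, payload, sink, may_start)
        # The caller reuses the same memory for the next step
        # before the queued write has had a chance to read it.
        buffer[:] = b"step-1"
        may_start.set()
        future.result()
    return sink[0]


unsafe = run(copy_before_submit=False)  # the writer sees the overwritten data
safe = run(copy_before_submit=True)     # the writer sees the data as submitted
print(unsafe, safe)
```

In practice this suggests giving each non-blocking write its own tensor rather than filling one preallocated buffer in place across steps.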

Warning

This IO backend does not presently support overwriting existing Zarr stores. It supports only creating new arrays or writing to existing ones.

Parameters:
  • file_name (str) – Path at which to place the Zarr store

  • parallel_coords (CoordSystem) – Coordinates that enable parallel writes during inference. These coordinates specify which dimensions will be written in parallel via async operations, typically representing dimensions that are iteratively generated (such as time or lead_time). The chunk size for each of these dimensions will be set to 1. These coordinates should contain the complete set of values needed for the entire inference pipeline. The remaining coordinates of a given array will be populated upon the first write to the respective array.

  • fs_factory (Callable[..., fsspec.spec.AbstractFileSystem], optional) – FSSpec file system factory method. This is a callable object that should return an instance of the desired filesystem to use, by default LocalFileSystem

  • blocking (bool, optional) – Whether write calls in the synchronous API block. When set to False, the IO backend executes write calls in separate threads. Users should call the close() API to ensure all threads have finished and been cleaned up, by default True

  • pool_size (int, optional) – Size of the thread / async loop pool used with the synchronous write API in non-blocking mode, by default 8

  • async_timeout (int, optional) – Async operation timeout for a given write operation, by default 600

  • zarr_kwargs (dict[str, Any], optional) – Additional keyword arguments to provide to the `zarr.api.asynchronous.open` function, by default {"mode": "a"}

  • zarr_codecs (CompressorsLike, optional) – Compression codec to use when creating any new arrays. Sharding is currently not supported, for thread-safety reasons. If None, no compressor is used, by default None

Raises:
  • ImportError – If Zarr 2.0 is installed. This IO backend only supports Zarr 3.0.

  • TypeError – If fs_factory is not callable. It should be a callable that returns a filesystem, not a filesystem object itself.
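
The fs_factory requirement can be sketched as follows. This is an illustration only, under stated assumptions: `DummyFileSystem` and `check_fs_factory` are hypothetical names, and the check is a guess at the documented behaviour rather than the backend's actual code.

```python
from functools import partial


class DummyFileSystem:
    """Hypothetical stand-in for an fsspec filesystem class."""

    def __init__(self, **storage_options):
        self.storage_options = storage_options


def check_fs_factory(fs_factory):
    """Mirror the documented rule: the factory must be callable."""
    if not callable(fs_factory):
        raise TypeError("fs_factory must be a callable that returns a filesystem")
    return fs_factory()


# A class is callable, and so is a partial that pre-binds options.
fs_default = check_fs_factory(DummyFileSystem)
fs_custom = check_fs_factory(partial(DummyFileSystem, anon=True))

# An already-built instance is not callable, so it is rejected.
try:
    check_fs_factory(DummyFileSystem())
    rejected = False
except TypeError:
    rejected = True
```

Passing a factory rather than an instance presumably lets the backend build a filesystem wherever it needs one; `functools.partial` is a convenient way to bind options to a filesystem class ahead of time.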

add_array(coords, array_name, **kwargs)

Pass-through; arrays are initialized lazily in this IO object.

Parameters:
  • coords (OrderedDict[str, ndarray])

  • array_name (str | list[str])

  • kwargs (dict[str, Any])

Return type:

None

write(x, coords, array_name)

Write data

Parameters:
  • x (torch.Tensor | list[torch.Tensor]) – Tensor(s) to be written to zarr store.

  • coords (OrderedDict) – Coordinates of the passed data.

  • array_name (str | list[str]) – Name(s) of the array(s) that will be written to.

Return type:

None
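
A minimal usage sketch of the constructor, write() and close() as documented on this page. It assumes earth2studio, torch, numpy and Zarr 3 are installed; the store path, the array name "t2m", the grid sizes, and the `lead_hours` and `run_sketch` helpers are hypothetical choices for illustration, and the snippet has not been checked against a specific earth2studio release.

```python
from collections import OrderedDict


def lead_hours(n_steps, step_hours=6):
    """Lead times in hours for n_steps forecast steps, including step 0."""
    return [i * step_hours for i in range(n_steps + 1)]


def run_sketch(store_path="outputs/sketch.zarr", n_steps=2):
    # Heavy dependencies are imported here so the helper above stays
    # importable without them (assumed installed when this is called).
    import numpy as np
    import torch
    from earth2studio.io import AsyncZarrBackend

    time = np.array([np.datetime64("2024-01-01T00:00")])
    lead_time = np.array([np.timedelta64(h, "h") for h in lead_hours(n_steps)])
    lat = np.linspace(90, -90, 4)
    lon = np.linspace(0, 360, 8, endpoint=False)

    # Complete set of values for the dimensions written one slice at a time.
    parallel_coords = OrderedDict(time=time, lead_time=lead_time)
    io = AsyncZarrBackend(store_path, parallel_coords=parallel_coords, blocking=False)
    try:
        for lt in lead_time:
            # A fresh tensor per step, so a queued write never shares
            # memory with the data produced for the next step.
            x = torch.randn(1, 1, lat.size, lon.size)
            coords = OrderedDict(
                time=time, lead_time=np.array([lt]), lat=lat, lon=lon
            )
            io.write(x, coords, "t2m")
    finally:
        # In non-blocking mode, close() waits for outstanding writes.
        io.close()
```

Calling `run_sketch()` would write one slice per lead time; the remaining coordinates (here lat and lon) are taken from the first write to the array, as described for parallel_coords above.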

Examples using earth2studio.io.AsyncZarrBackend

IO Backend Performance
