Higher-Level Libraries

The MSC adapters for higher-level libraries use shortcuts under the hood.
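As a rough mental model of what "shortcuts under the hood" could mean, the sketch below resolves an msc:// URL into a profile name and a path, then reuses one client per profile. The names FakeClient and resolve_client are hypothetical stand-ins chosen for illustration; this is not MSC's actual implementation.

```python
from urllib.parse import urlsplit


class FakeClient:
    """Hypothetical stand-in for a per-profile storage client."""

    def __init__(self, profile: str) -> None:
        self.profile = profile


# One cached client per profile name.
_clients: dict[str, FakeClient] = {}


def resolve_client(url: str) -> tuple[FakeClient, str]:
    """Split an msc:// URL into (client, path), creating the client on first use."""
    parts = urlsplit(url)
    if parts.scheme != "msc":
        raise ValueError(f"expected an msc:// URL, got {url!r}")
    profile, path = parts.netloc, parts.path.lstrip("/")
    if profile not in _clients:
        _clients[profile] = FakeClient(profile)
    return _clients[profile], path


# Two URLs under the same profile share a single client.
first, path = resolve_client("msc://data-s3-iad/animal-photos/red-panda.png")
second, _ = resolve_client("msc://data-s3-iad/numpy-arrays/ndarray-1.npz")
```

This mirrors the "create a client" and "reuse the client" comments in the examples on this page: the first URL for a profile pays the setup cost, and later URLs for the same profile do not.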

fsspec

multistorageclient.async_fs aliases the multistorageclient.contrib.async_fs module.

This module provides the multistorageclient.contrib.async_fs.MultiStorageAsyncFileSystem class, which implements fsspec’s AsyncFileSystem interface.

Note

The msc:// protocol is registered with fsspec automatically when the multi-storage-client package is installed (for example, with pip install multi-storage-client).

import multistorageclient as msc

# Create an MSC-based AsyncFileSystem instance.
fs = msc.async_fs.MultiStorageAsyncFileSystem()

# Create a client for the data-s3-iad profile and open a file.
file = fs.open("msc://data-s3-iad/animal-photos/red-panda.png")

# Reuse the client for the data-s3-iad profile and download a file.
fs.get_file(
    rpath="msc://data-s3-iad/animal-photos/red-panda.png",
    lpath="/tmp/animal-photos/red-panda.png"
)
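get_file writes to a local path, and this page does not say whether the local parent directory is created for you. A defensive wrapper could create it first. The helper download_to below is hypothetical; it only assumes that fs exposes the fsspec-style get_file(rpath, lpath) method used above.

```python
import os


def download_to(fs, rpath: str, lpath: str) -> str:
    """Download rpath to lpath, creating the local parent directory first."""
    parent = os.path.dirname(lpath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    fs.get_file(rpath=rpath, lpath=lpath)
    return lpath
```

With the filesystem from the example above, a call might look like download_to(fs, "msc://data-s3-iad/animal-photos/red-panda.png", "/tmp/animal-photos/red-panda.png").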

NumPy

multistorageclient.numpy aliases the multistorageclient.contrib.numpy module.

This module provides load, memmap, and save functions for loading, memory-mapping, and saving NumPy arrays.

import multistorageclient as msc
import numpy

# Create a client for the data-s3-iad profile and load an array.
array = msc.numpy.load("msc://data-s3-iad/numpy-arrays/ndarray-1.npz")

# Reuse the client for the data-s3-iad profile and load a memory-mapped array.
mmarray = msc.numpy.memmap("msc://data-s3-iad/numpy-arrays/ndarray-1.bin")

# Reuse the client for the data-s3-iad profile and save an array.
msc.numpy.save(
    numpy.array([1, 2, 3, 4, 5], dtype=numpy.int32),
    "msc://data-s3-iad/numpy-arrays/ndarray-2.npz"
)
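A memory-mapped file is raw bytes, so the element type and shape come from the caller rather than from the file. The sketch below uses plain numpy.memmap on a local temporary file to show the difference. Whether msc.numpy.memmap accepts the same dtype and shape keyword arguments is an assumption to verify against the API reference; the function name memmap_views is hypothetical.

```python
import os
import tempfile

import numpy


def memmap_views() -> tuple[int, list[int]]:
    """Write five int32 values to a raw file and read them back two ways."""
    values = numpy.array([1, 2, 3, 4, 5], dtype=numpy.int32)
    with tempfile.TemporaryDirectory() as tmp:
        bin_path = os.path.join(tmp, "ndarray-1.bin")
        values.tofile(bin_path)  # raw bytes, no header

        # The default dtype is uint8: one element per byte.
        as_bytes = numpy.memmap(bin_path, mode="r")
        # An explicit dtype and shape recover the original values.
        as_int32 = numpy.memmap(bin_path, dtype=numpy.int32, mode="r", shape=(5,))

        result = (int(as_bytes.shape[0]), as_int32.tolist())
        # Drop the maps before the temporary directory is removed.
        del as_bytes, as_int32
    return result


byte_count, restored = memmap_views()
```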

PyTorch

multistorageclient.torch aliases the multistorageclient.contrib.torch module.

This module provides load and save functions for loading and saving PyTorch data.

import multistorageclient as msc
import torch

# Create a client for the data-s3-iad profile and load a tensor.
tensor = msc.torch.load("msc://data-s3-iad/pytorch-tensors/tensor-1.pt")

# Reuse the client for the data-s3-iad profile and save a tensor.
msc.torch.save(
    torch.tensor([1, 2, 3, 4]),
    "msc://data-s3-iad/pytorch-tensors/tensor-2.pt"
)

In addition to the load and save functions, the torch module provides the MultiStorageFileSystemReader and MultiStorageFileSystemWriter classes, which can be passed to torch.distributed.checkpoint as the storage reader and writer so that checkpoints are read from and written to any configured storage backend.

import multistorageclient as msc
import torch
import torch.distributed.checkpoint as dcp

# Example state to checkpoint; substitute your own model or optimizer state.
state_dict = {"weights": torch.tensor([1.0, 2.0, 3.0, 4.0])}

# Create a MultiStorageFileSystemWriter for the data-s3-iad profile.
writer = msc.torch.MultiStorageFileSystemWriter("msc://data-s3-iad/checkpoint/1")
dcp.save(
    state_dict=state_dict,
    storage_writer=writer,
)

# dcp.load fills tensors in place, so pre-allocate them with the saved shapes and dtypes.
loaded_state_dict = {"weights": torch.zeros(4)}

# Create a MultiStorageFileSystemReader for the data-s3-iad profile.
reader = msc.torch.MultiStorageFileSystemReader("msc://data-s3-iad/checkpoint/1")
dcp.load(
    state_dict=loaded_state_dict,
    storage_reader=reader,
)

Xarray

multistorageclient.xz aliases the multistorageclient.contrib.xarray module.

This module provides open_zarr for reading Xarray datasets from Zarr files/objects.

import multistorageclient as msc

# Create a client for the data-s3-iad profile and load a Zarr array into an Xarray dataset.
xarray_dataset = msc.xz.open_zarr("msc://data-s3-iad/abc.zarr")

Note

Xarray supports fsspec URLs natively, so you can use the standard Xarray interface with msc:// URLs.

import xarray

# Use the native Xarray interface to load a Zarr array into an Xarray dataset.
xarray_dataset = xarray.open_zarr("msc://data-s3-iad/abc.zarr")

Zarr

multistorageclient.zarr aliases the multistorageclient.contrib.zarr module.

This module provides open_consolidated for reading Zarr groups from files/objects.

import multistorageclient as msc

# Create a client for the data-s3-iad profile and open a consolidated Zarr group.
z = msc.zarr.open_consolidated("msc://data-s3-iad/abc.zarr")

Note

Zarr supports fsspec URLs natively, so you can use the standard Zarr interface with msc:// URLs.

import zarr

# Use the native Zarr interface to load a Zarr array.
z = zarr.open("msc://data-s3-iad/abc.zarr")

Path

multistorageclient.path aliases the multistorageclient.contrib.path module.

This module provides the Path class for working with paths in a way similar to pathlib.Path.

import multistorageclient as msc

# Create a Path object for a file in the data-s3-iad profile
path = msc.Path("msc://data-s3-iad/data/file.txt")

# Get parent directory
parent = path.parent  # msc://data-s3-iad/data

# Get file name
name = path.name  # file.txt

# Join paths
new_path = path.parent / "other.txt"  # msc://data-s3-iad/data/other.txt

# Check if path exists
exists = path.exists()

# List contents of a directory
for child in msc.Path("msc://data-s3-iad/data").iterdir():
    print(child)

# Find files matching a pattern
for matched in msc.Path("msc://data-s3-iad/data").glob("*.txt"):
    print(matched)

Note

The Path class implements much of the same interface as pathlib.Path, making it familiar to use while working with remote storage.
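Because the interface overlaps with pathlib.Path, a helper can be written against the shared subset and exercised locally first. The sketch below runs against a temporary local directory using pathlib.Path; the helper name text_file_names is hypothetical. Passing msc.Path("msc://data-s3-iad/data") instead is assumed to behave the same way but is not verified here.

```python
import pathlib
import tempfile


def text_file_names(root) -> list[str]:
    """Return the sorted names of *.txt files directly under root."""
    # Relies only on glob() and .name, both shown for msc.Path above.
    return sorted(match.name for match in root.glob("*.txt"))


with tempfile.TemporaryDirectory() as tmp:
    local_root = pathlib.Path(tmp)
    (local_root / "file.txt").write_text("hello")
    (local_root / "other.txt").write_text("world")
    (local_root / "image.png").write_bytes(b"")
    names = text_file_names(local_root)
```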