Higher-Level Libraries
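The MSC adapters for higher-level libraries are built on MSC shortcuts under the hood: each adapter accepts msc:// URLs, creates a client for a profile on first use, and reuses that client afterwards.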
fsspec
multistorageclient.async_fs aliases the multistorageclient.contrib.async_fs module.
This module provides the multistorageclient.contrib.async_fs.MultiStorageAsyncFileSystem class which implements fsspec’s AsyncFileSystem class.
Note
The msc:// protocol is registered automatically when multi-storage-client is installed (for example, with pip install multi-storage-client).
import multistorageclient as msc

# Create an MSC-based AsyncFileSystem instance.
fs = msc.async_fs.MultiStorageAsyncFileSystem()

# Create a client for the data-s3-iad profile and open a file.
file = fs.open("msc://data-s3-iad/animal-photos/red-panda.png")

# Reuse the client for the data-s3-iad profile and download a file.
fs.get_file(
    rpath="msc://data-s3-iad/animal-photos/red-panda.png",
    lpath="/tmp/animal-photos/red-panda.png"
)
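The comments in the example above distinguish creating a client for a profile from reusing it, and the URLs appear to follow the pattern msc://<profile>/<path>. The sketch below illustrates that idea using only the standard library. It is not MSC's implementation: split_msc_url, FakeClient, and client_for are hypothetical names introduced here for illustration.

```python
from urllib.parse import urlsplit


def split_msc_url(url: str) -> tuple[str, str]:
    """Split msc://<profile>/<path> into (profile, path). Hypothetical helper."""
    parts = urlsplit(url)
    if parts.scheme != "msc":
        raise ValueError(f"not an msc:// URL: {url}")
    return parts.netloc, parts.path.lstrip("/")


class FakeClient:
    """Stand-in for a storage client; hypothetical, not an MSC class."""

    def __init__(self, profile: str) -> None:
        self.profile = profile


# One client per profile, created lazily and then reused.
_clients: dict[str, FakeClient] = {}


def client_for(url: str) -> FakeClient:
    """Return the cached client for the URL's profile, creating it on first use."""
    profile, _ = split_msc_url(url)
    if profile not in _clients:
        _clients[profile] = FakeClient(profile)
    return _clients[profile]


first = client_for("msc://data-s3-iad/animal-photos/red-panda.png")
second = client_for("msc://data-s3-iad/numpy-arrays/ndarray-1.npz")
print(first is second)  # True: both URLs name the data-s3-iad profile
```

Keying the cache on the profile name rather than the full URL is what lets two different objects in the same profile share one client.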
Hydra
The MSC Hydra plugin enables loading Hydra configurations directly from object storage using msc:// URLs.
Note
The plugin is automatically registered by Hydra when both multistorageclient and hydra-core are installed.
import hydra
import multistorageclient as msc
from omegaconf import DictConfig

# Load config directly from object storage.
@hydra.main(version_base=None, config_path="msc://profile/configs", config_name="training")
def your_app(cfg: DictConfig) -> None:
    print(f"Loaded config: {cfg}")

if __name__ == "__main__":
    your_app()

The same config path and name can also be passed on the command line:

python your_app.py --config-path="msc://profile/configs" --config-name=training
NumPy
multistorageclient.numpy aliases the multistorageclient.contrib.numpy module.
This module provides load, memmap, and save methods for loading and saving NumPy arrays.
import multistorageclient as msc
import numpy

# Create a client for the data-s3-iad profile and load an array.
array = msc.numpy.load("msc://data-s3-iad/numpy-arrays/ndarray-1.npz")

# Reuse the client for the data-s3-iad profile and load a memory-mapped array.
mmarray = msc.numpy.memmap("msc://data-s3-iad/numpy-arrays/ndarray-1.bin")

# Reuse the client for the data-s3-iad profile and save an array.
msc.numpy.save(
    numpy.array([1, 2, 3, 4, 5], dtype=numpy.int32),
    "msc://data-s3-iad/numpy-arrays/ndarray-2.npz"
)
PyTorch
multistorageclient.torch aliases the multistorageclient.contrib.torch module.
This module provides load and save methods for loading and saving PyTorch data.
import multistorageclient as msc
import torch

# Create a client for the data-s3-iad profile and load a tensor.
tensor = msc.torch.load("msc://data-s3-iad/pytorch-tensors/tensor-1.pt")

# Reuse the client for the data-s3-iad profile and save a tensor.
msc.torch.save(
    torch.tensor([1, 2, 3, 4]),
    "msc://data-s3-iad/pytorch-tensors/tensor-2.pt"
)
In addition to the load and save methods, the torch module provides the MultiStorageFileSystemReader and MultiStorageFileSystemWriter classes. These serve as the storage reader and writer for torch.distributed.checkpoint, so checkpoints can be read from and written to any storage backend that MSC supports.
import multistorageclient as msc
import torch
import torch.distributed.checkpoint as dcp

# Placeholder state dict for illustration; use your model or optimizer state in practice.
state_dict = {"weight": torch.zeros(4)}

# Create a MultiStorageFileSystemWriter for the data-s3-iad profile.
writer = msc.torch.MultiStorageFileSystemWriter("msc://data-s3-iad/checkpoint/1")
dcp.save(
    state_dict=state_dict,
    storage_writer=writer,
)

# dcp.load fills the given state dict in place, so it must already hold matching tensors.
loaded_state_dict = {"weight": torch.empty(4)}

# Create a MultiStorageFileSystemReader for the data-s3-iad profile.
reader = msc.torch.MultiStorageFileSystemReader("msc://data-s3-iad/checkpoint/1")
dcp.load(
    state_dict=loaded_state_dict,
    storage_reader=reader,
)
Xarray
multistorageclient.xz aliases the multistorageclient.contrib.xarray module.
This module provides open_zarr for reading Xarray datasets from Zarr files/objects.
import multistorageclient as msc

# Create a client for the data-s3-iad profile and load a Zarr array into an Xarray dataset.
xarray_dataset = msc.xz.open_zarr("msc://data-s3-iad/abc.zarr")
Note: Xarray supports fsspec URLs natively, so you can also use Xarray's standard interface with msc:// URLs.
import xarray

# Use Xarray's native interface to load a Zarr array into an Xarray dataset.
xarray_dataset = xarray.open_zarr("msc://data-s3-iad/abc.zarr")
Zarr
multistorageclient.zarr aliases the multistorageclient.contrib.zarr module.
This module provides open_consolidated for reading Zarr groups from files/objects.
import multistorageclient as msc

# Create a client for the data-s3-iad profile and open a consolidated Zarr group.
z = msc.zarr.open_consolidated("msc://data-s3-iad/abc.zarr")
Note
Zarr supports fsspec URLs natively, so you can also use Zarr's standard interface with msc:// URLs.
import zarr

# Use Zarr's native interface to load a Zarr array.
z = zarr.open("msc://data-s3-iad/abc.zarr")
Path
multistorageclient.Path aliases the multistorageclient.pathlib.MultiStoragePath class.
The Path class lets you work with msc:// paths in a way similar to pathlib.Path.
import multistorageclient as msc

# Create a Path object for a file in the data-s3-iad profile.
path = msc.Path("msc://data-s3-iad/data/file.txt")

# Get parent directory
parent = path.parent  # msc://data-s3-iad/data

# Get file name
name = path.name  # file.txt

# Join paths
new_path = path.parent / "other.txt"  # msc://data-s3-iad/data/other.txt

# Check if path exists
exists = path.exists()

# List contents of a directory
for child in msc.Path("msc://data-s3-iad/data").iterdir():
    print(child)

# Find files matching a pattern
for matched in msc.Path("msc://data-s3-iad/data").glob("*.txt"):
    print(matched)
Note
The Path class implements much of the same interface as pathlib.Path, making it familiar to use while working with remote storage.
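For comparison, here are the same pure-path operations with the standard library's pathlib, which needs no storage access. This is only a sketch of the interface that the note above refers to. That msc.Path behaves the same way for each of these operations is an assumption based on that note, not something verified here.

```python
from pathlib import PurePosixPath

# A local, purely lexical path standing in for msc://data-s3-iad/data/file.txt.
path = PurePosixPath("/data/file.txt")

parent = path.parent                  # directory containing the file
name = path.name                      # final path component
new_path = path.parent / "other.txt"  # join with the / operator
is_txt = path.match("*.txt")          # lexical pattern match, no I/O

print(parent, name, new_path, is_txt)
```

Methods such as exists(), iterdir(), and glob() differ from these in that they must query the storage backend, so with remote storage they involve network calls rather than string manipulation.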