Higher-Level Libraries
The MSC adapters for higher-level libraries use shortcuts under the hood.
fsspec
multistorageclient.async_fs aliases the multistorageclient.contrib.async_fs module.
This module provides the multistorageclient.contrib.async_fs.MultiStorageAsyncFileSystem class, which implements fsspec's AsyncFileSystem class.
Note
The msc:// protocol is registered automatically when the package is installed with pip install multi-storage-client.
import multistorageclient as msc

# Create an MSC-based AsyncFileSystem instance.
fs = msc.async_fs.MultiStorageAsyncFileSystem()

# Create a client for the data-s3-iad profile and open a file.
file = fs.open("msc://data-s3-iad/animal-photos/red-panda.png")

# Reuse the client for the data-s3-iad profile and download a file.
fs.get_file(
    rpath="msc://data-s3-iad/animal-photos/red-panda.png",
    lpath="/tmp/animal-photos/red-panda.png"
)
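The comments in the example above speak of creating a client for the data-s3-iad profile and then reusing it. The following is a minimal sketch of that idea, not MSC's actual implementation. It assumes that the first component of an msc:// URL names the profile and that the remainder is the path within that profile. ProfileClient, get_client, and resolve are hypothetical names introduced only for illustration.

```python
from functools import lru_cache
from typing import Tuple
from urllib.parse import urlparse


class ProfileClient:
    """Hypothetical stand-in for a per-profile storage client."""

    def __init__(self, profile: str) -> None:
        self.profile = profile


@lru_cache(maxsize=None)
def get_client(profile: str) -> ProfileClient:
    # The cache returns the same client object for repeated profile names.
    return ProfileClient(profile)


def resolve(url: str) -> Tuple[ProfileClient, str]:
    """Split an msc:// URL into (client for its profile, path within the profile)."""
    parsed = urlparse(url)
    if parsed.scheme != "msc":
        raise ValueError(f"expected an msc:// URL, got: {url}")
    # Assumption: the URL authority component is the profile name.
    return get_client(parsed.netloc), parsed.path.lstrip("/")


first_client, first_path = resolve("msc://data-s3-iad/animal-photos/red-panda.png")
second_client, second_path = resolve("msc://data-s3-iad/numpy-arrays/ndarray-1.npz")
```

Both calls use the same profile, so the second one gets the cached client back instead of a new one.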
NumPy
multistorageclient.numpy aliases the multistorageclient.contrib.numpy module.
This module provides load, memmap, and save functions for loading and saving NumPy arrays.
import multistorageclient as msc
import numpy

# Create a client for the data-s3-iad profile and load an array.
array = msc.numpy.load("msc://data-s3-iad/numpy-arrays/ndarray-1.npz")

# Reuse the client for the data-s3-iad profile and load a memory-mapped array.
mmarray = msc.numpy.memmap("msc://data-s3-iad/numpy-arrays/ndarray-1.bin")

# Reuse the client for the data-s3-iad profile and save an array.
msc.numpy.save(
    numpy.array([1, 2, 3, 4, 5], dtype=numpy.int32),
    "msc://data-s3-iad/numpy-arrays/ndarray-2.npz"
)
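In plain NumPy, numpy.memmap maps a raw binary file rather than an .npy or .npz archive, so the caller supplies the dtype (and optionally the shape); the default dtype is uint8. The sketch below shows this with NumPy alone on a local temporary file. It assumes, without confirmation from the text above, that msc.numpy.memmap accepts the same keyword arguments as numpy.memmap. memmap_round_trip is a hypothetical helper name.

```python
import os
import tempfile

import numpy


def memmap_round_trip() -> numpy.ndarray:
    """Write raw int32 values to a .bin file and read them back through a memory map."""
    values = numpy.arange(6, dtype=numpy.int32)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ndarray-1.bin")
        # tofile writes the raw bytes only, with no header describing dtype or shape.
        values.tofile(path)
        # The dtype must match what was written; the shape is inferred from the file size.
        mapped = numpy.memmap(path, dtype=numpy.int32, mode="r")
        # Copy into regular memory before the temporary file is removed.
        result = numpy.array(mapped)
        del mapped
    return result


restored = memmap_round_trip()
```

Reading the same file with the default dtype would give 24 uint8 values instead of 6 int32 values, which is why the dtype is passed explicitly.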
PyTorch
multistorageclient.torch aliases the multistorageclient.contrib.torch module.
This module provides load and save functions for loading and saving PyTorch data.
import multistorageclient as msc
import torch

# Create a client for the data-s3-iad profile and load a tensor.
tensor = msc.torch.load("msc://data-s3-iad/pytorch-tensors/tensor-1.pt")

# Reuse the client for the data-s3-iad profile and save a tensor.
msc.torch.save(
    torch.tensor([1, 2, 3, 4]),
    "msc://data-s3-iad/pytorch-tensors/tensor-2.pt"
)
In addition to load and save, the torch module provides the MultiStorageFileSystemReader and MultiStorageFileSystemWriter classes for use with torch.distributed.checkpoint. They read and write checkpoints across multiple storage backends.
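import multistorageclient as msc
import torch
import torch.distributed.checkpoint as dcp

# Example state to checkpoint (a small placeholder model, used here only for illustration).
state_dict = {"model": torch.nn.Linear(4, 2).state_dict()}

# Create a MultiStorageFileSystemWriter for the data-s3-iad profile.
writer = msc.torch.MultiStorageFileSystemWriter("msc://data-s3-iad/checkpoint/1")
dcp.save(
    state_dict=state_dict,
    storage_writer=writer,
)

# dcp.load fills the given state dict in place, so it must already have the expected structure.
loaded_state_dict = {"model": torch.nn.Linear(4, 2).state_dict()}

# Create a MultiStorageFileSystemReader for the data-s3-iad profile.
reader = msc.torch.MultiStorageFileSystemReader("msc://data-s3-iad/checkpoint/1")
dcp.load(
    state_dict=loaded_state_dict,
    storage_reader=reader,
)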
Xarray
multistorageclient.xz aliases the multistorageclient.contrib.xarray module.
This module provides open_zarr for reading Xarray datasets from Zarr files/objects.
import multistorageclient as msc

# Create a client for the data-s3-iad profile and load a Zarr array into an Xarray dataset.
xarray_dataset = msc.xz.open_zarr("msc://data-s3-iad/abc.zarr")
Note
Xarray supports fsspec URLs natively, so you can also use Xarray's standard interface with msc:// URLs.
import xarray

# Use Xarray's native interface to load a Zarr array into an Xarray dataset.
xarray_dataset = xarray.open_zarr("msc://data-s3-iad/abc.zarr")
Zarr
multistorageclient.zarr aliases the multistorageclient.contrib.zarr module.
This module provides open_consolidated for reading Zarr groups from files/objects.
import multistorageclient as msc

# Create a client for the data-s3-iad profile and open a consolidated Zarr group.
z = msc.zarr.open_consolidated("msc://data-s3-iad/abc.zarr")
Note
Zarr supports fsspec URLs natively, so you can also use Zarr's standard interface with msc:// URLs.
import zarr

# Use Zarr's native interface to load a Zarr array.
z = zarr.open("msc://data-s3-iad/abc.zarr")
Path
multistorageclient.path aliases the multistorageclient.contrib.path module.
This module provides the Path class for working with paths in a way similar to pathlib.Path.
import multistorageclient as msc

# Create a Path object for a file in the data-s3-iad profile.
path = msc.Path("msc://data-s3-iad/data/file.txt")

# Get the parent directory.
parent = path.parent  # msc://data-s3-iad/data

# Get the file name.
name = path.name  # file.txt

# Join paths.
new_path = path.parent / "other.txt"  # msc://data-s3-iad/data/other.txt

# Check whether the path exists.
exists = path.exists()

# List the contents of a directory.
for child in msc.Path("msc://data-s3-iad/data").iterdir():
    print(child)

# Find files matching a pattern.
for matched in msc.Path("msc://data-s3-iad/data").glob("*.txt"):
    print(matched)
Note
The Path class implements much of the same interface as pathlib.Path, so it feels familiar when working with remote storage.
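For comparison, here are the same pure-path operations in the standard library, using pathlib.PurePosixPath so that nothing touches a real filesystem. This is only a reference for the pathlib behavior that the Path class is described as resembling. The local path /data/file.txt is an illustrative stand-in for the msc:// URL, and match is shown as a pathlib feature only, not as something the text above attributes to msc.Path.

```python
from pathlib import PurePosixPath

# A pure path: string manipulation only, no filesystem access.
local_path = PurePosixPath("/data/file.txt")

# Parent directory and final path component.
local_parent = local_path.parent
local_name = local_path.name

# The / operator joins path segments.
local_sibling = local_path.parent / "other.txt"

# Pattern matching on the path itself, without listing a directory.
is_text_file = local_path.match("*.txt")
```

The parent, name, and join results correspond to the values shown in the comments of the msc.Path example above, minus the msc://data-s3-iad prefix.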