API Reference

Core

class multistorageclient.CacheConfig(size: str, use_etag: bool = True, eviction_policy: ~multistorageclient.caching.cache_config.EvictionPolicyConfig = <factory>, backend: ~multistorageclient.caching.cache_config.CacheBackendConfig = <factory>)[source]

Configuration for the CacheManager.

This class defines the complete configuration for the cache system, including size limits, etag usage, eviction policy, and backend settings.

Parameters:
  • size (str)

  • use_etag (bool)

  • eviction_policy (EvictionPolicyConfig)

  • backend (CacheBackendConfig)

backend: CacheBackendConfig

Cache backend configuration. Default is filesystem.

eviction_policy: EvictionPolicyConfig

Cache eviction policy configuration. Default is LRU with 300s refresh.

get_eviction_policy() str[source]

Get the eviction policy.

Returns:

The current eviction policy type.

Return type:

str

get_storage_provider_profile() str | None[source]

Get the storage provider profile.

Returns:

The storage provider profile name if set, None otherwise.

Return type:

str | None

size: str

The maximum size of the cache in megabytes.

size_bytes() int[source]

Convert cache size to bytes.

Returns:

The cache size in bytes.

Return type:

int

use_etag: bool = True

Use etag to update the cached files. Default is True.

multistorageclient.Path

alias of MultiStoragePath

class multistorageclient.StorageClient(config: StorageClientConfig)[source]

A client for interacting with different storage providers.

Initializes the StorageClient with the given configuration.

Parameters:

config (StorageClientConfig) – The configuration object for the storage client.

commit_metadata(prefix: str | None = None) None[source]

Commits any pending updates to the metadata provider. No-op if not using a metadata provider.

Parameters:

prefix (str | None) – If provided, scans the prefix to find files to commit.

Return type:

None

copy(src_path: str, dest_path: str) None[source]

Copies an object from source to destination in the storage provider.

Parameters:
  • src_path (str) – The virtual path of the source object to copy.

  • dest_path (str) – The virtual path of the destination.

Return type:

None

delete(path: str, recursive: bool = False) None[source]

Deletes an object from the storage provider at the specified path.

Parameters:
  • path (str) – The virtual path of the object to delete.

  • recursive (bool) – Whether to delete objects in the path recursively.

Return type:

None

download_file(**kwargs: Any) Any
Return type:

Any

glob(pattern: str, include_url_prefix: bool = False) list[str][source]

Matches and retrieves a list of objects in the storage provider that match the specified pattern.

Parameters:
  • pattern (str) – The pattern to match object paths against, supporting wildcards (e.g., *.txt).

  • include_url_prefix (bool) – Whether to include the URL prefix msc://profile in the result.

Returns:

A list of object paths that match the pattern.

Return type:

list[str]
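
As a rough illustration of wildcard matching and of the include_url_prefix flag, the sketch below filters an in-memory list of keys with the standard-library fnmatch module. It is a stand-in, not the library's implementation: the helper glob_keys, the sample keys, and the profile name are hypothetical, and fnmatch lets * match across "/", which the real client may handle differently.

```python
from fnmatch import fnmatchcase


def glob_keys(keys, pattern, profile="my-profile", include_url_prefix=False):
    """Return the keys matching pattern, optionally prefixed with msc://profile/."""
    matches = sorted(key for key in keys if fnmatchcase(key, pattern))
    if include_url_prefix:
        # Hypothetical profile name, used only to show the prefixed form.
        return [f"msc://{profile}/{key}" for key in matches]
    return matches


keys = ["data/a.txt", "data/b.bin", "logs/c.txt"]
text_files = glob_keys(keys, "data/*.txt")
prefixed = glob_keys(keys, "*.txt", include_url_prefix=True)
```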

info(path: str, strict: bool = True) ObjectMetadata[source]

Retrieves metadata or information about an object stored at the specified path.

Parameters:
  • path (str) – The path to the object for which metadata or information is being retrieved.

  • strict (bool) – If True, performs additional validation to determine whether the path refers to a directory.

Returns:

An ObjectMetadata instance containing metadata about the object.

Return type:

ObjectMetadata

is_default_profile() bool[source]

Return True if the storage client is using the default profile.

Return type:

bool

is_empty(path: str) bool[source]

Checks whether the specified path is empty. A path is considered empty if there are no objects whose keys start with the given path as a prefix.

Parameters:

path (str) – The path to check. This is typically a prefix representing a directory or folder.

Returns:

True if no objects exist under the specified path prefix, False otherwise.

Return type:

bool

is_file(path: str) bool[source]

Checks whether the specified path points to a file (rather than a directory or folder).

Parameters:

path (str) – The path to check.

Returns:

True if the path points to a file, False otherwise.

Return type:

bool

list(prefix: str = '', start_after: str | None = None, end_at: str | None = None, include_directories: bool = False, include_url_prefix: bool = False) Iterator[ObjectMetadata][source]

Lists objects in the storage provider under the specified prefix.

Parameters:
  • prefix (str) – The prefix to list objects under.

  • start_after (str | None) – The key to start after (i.e. exclusive). An object with this key doesn’t have to exist.

  • end_at (str | None) – The key to end at (i.e. inclusive). An object with this key doesn’t have to exist.

  • include_directories (bool) – Whether to include directories in the result. When True, directories are returned alongside objects.

  • include_url_prefix (bool) – Whether to include the URL prefix msc://profile in the result.

Returns:

An iterator over objects.

Return type:

Iterator[ObjectMetadata]
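
The start_after (exclusive) and end_at (inclusive) bounds can be pictured with a small in-memory sketch. It assumes keys are compared lexicographically; list_keys and the sample keys are hypothetical and do not reflect how the library talks to a storage provider.

```python
def list_keys(keys, prefix="", start_after=None, end_at=None):
    """Yield keys under prefix in lexicographic order, honoring both bounds."""
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        if start_after is not None and key <= start_after:
            continue  # exclusive lower bound
        if end_at is not None and key > end_at:
            break  # inclusive upper bound; sorted order lets us stop early
        yield key


keys = ["a/1", "a/2", "a/3", "a/4", "b/1"]
window = list(list_keys(keys, prefix="a/", start_after="a/1", end_at="a/3"))
```

Neither bound has to name an existing key: start_after="a/0" and end_at="a/35" select a/1 through a/3 from the same sample.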

open(path: str, mode: str = 'rb', buffering: int = -1, encoding: str | None = None, disable_read_cache: bool = False, memory_load_limit: int = 536870912, atomic: bool = True, check_source_version: SourceVersionCheckMode = SourceVersionCheckMode.INHERIT) PosixFile | ObjectFile[source]

Returns a file-like object from the storage provider at the specified path.

Parameters:
  • path (str) – The path of the object to read.

  • mode (str) – The file mode, only “w”, “r”, “a”, “wb”, “rb” and “ab” are supported.

  • buffering (int) – The buffering mode. Only applies to PosixFile.

  • encoding (str | None) – The encoding to use for text files.

  • disable_read_cache (bool) – When set to True, disables caching for the file content. This parameter is only applicable to ObjectFile when the mode is “r” or “rb”.

  • memory_load_limit (int) – Size limit in bytes for loading files into memory. Defaults to 512MB. This parameter is only applicable to ObjectFile when the mode is “r” or “rb”.

  • atomic (bool) – When set to True, the file will be written atomically (rename upon close). This parameter is only applicable to PosixFile in write mode.

  • check_source_version (SourceVersionCheckMode) – Whether to check the source version of cached objects.

Returns:

A file-like object (PosixFile or ObjectFile) for the specified path.

Return type:

PosixFile | ObjectFile

property profile: str
read(**kwargs: Any) Any
Return type:

Any

sync_from(source_client: StorageClient, source_path: str = '', target_path: str = '', delete_unmatched_files: bool = False, description: str = 'Syncing', num_worker_processes: int | None = None) None[source]

Syncs files from source_path on the source storage client to target_path on this client.

Parameters:
  • source_client (StorageClient) – The source storage client.

  • source_path (str) – The path to sync from.

  • target_path (str) – The path to sync to.

  • delete_unmatched_files (bool) – Whether to delete files at the target that are not present at the source.

  • description (str) – Description of sync process for logging purposes.

  • num_worker_processes (int | None) – The number of worker processes to use.

Return type:

None
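
The effect of delete_unmatched_files can be sketched with two dictionaries standing in for the source and target stores. This is only an illustration under stated assumptions: sync_dicts is hypothetical, and it compares whole bodies, whereas the library decides what to copy by its own rules.

```python
def sync_dicts(source, target, delete_unmatched_files=False):
    """Copy entries from source into target; optionally drop target-only entries."""
    for key, body in source.items():
        if target.get(key) != body:
            target[key] = body  # new or changed at the source
    if delete_unmatched_files:
        for key in list(target):
            if key not in source:
                del target[key]  # present at the target only
    return target


source = {"a": b"1", "b": b"2"}
target = {"b": b"old", "c": b"3"}
merged = dict(sync_dicts(source, dict(target)))
mirrored = dict(sync_dicts(source, dict(target), delete_unmatched_files=True))
```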

upload_file(**kwargs: Any) Any
Return type:

Any

write(**kwargs: Any) Any
Return type:

Any

class multistorageclient.StorageClientConfig(profile: str, storage_provider: StorageProvider, credentials_provider: CredentialsProvider | None = None, metadata_provider: MetadataProvider | None = None, cache_config: CacheConfig | None = None, cache_manager: CacheBackend | None = None, retry_config: RetryConfig | None = None)[source]

Configuration class for the multistorageclient.StorageClient.

cache_config: CacheConfig | None
cache_manager: CacheBackend | None
credentials_provider: CredentialsProvider | None
static from_dict(config_dict: dict[str, Any], profile: str = 'default', skip_validation: bool = False, telemetry: Telemetry | None = None) StorageClientConfig[source]
Return type:

StorageClientConfig

static from_file(profile: str = 'default', telemetry: Telemetry | None = None) StorageClientConfig[source]
Return type:

StorageClientConfig

static from_json(config_json: str, profile: str = 'default', telemetry: Telemetry | None = None) StorageClientConfig[source]
Return type:

StorageClientConfig

static from_provider_bundle(config_dict: dict[str, Any], provider_bundle: ProviderBundle, telemetry: Telemetry | None = None) StorageClientConfig[source]
Return type:

StorageClientConfig

static from_yaml(config_yaml: str, profile: str = 'default', telemetry: Telemetry | None = None) StorageClientConfig[source]
Return type:

StorageClientConfig

metadata_provider: MetadataProvider | None
profile: str
static read_msc_config() dict[str, Any] | None[source]

Get the MSC configuration dictionary.

Returns:

The MSC configuration dictionary, or an empty dict if no config was found.

Return type:

dict[str, Any] | None

static read_path_mapping() PathMapping[source]

Get the path mapping defined in the MSC configuration.

Path mappings create a nested structure of protocol -> bucket -> [(prefix, profile)] where entries are sorted by prefix length (longest first) for optimal matching. Longer paths take precedence when matching.

Returns:

A PathMapping instance with translation mappings

Return type:

PathMapping
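
The nested protocol -> bucket -> [(prefix, profile)] structure and its longest-prefix-first lookup can be sketched as follows. This is a minimal stand-in, not the PathMapping class: build_mapping, find_profile, and the sample URLs and profile names are hypothetical.

```python
from urllib.parse import urlparse


def build_mapping(entries):
    """Build protocol -> bucket -> [(prefix, profile)] from (source_url, profile) pairs."""
    mapping = {}
    for source_url, profile in entries:
        parsed = urlparse(source_url)
        prefix = parsed.path.lstrip("/")
        mapping.setdefault(parsed.scheme, {}).setdefault(parsed.netloc, []).append((prefix, profile))
    for buckets in mapping.values():
        for pairs in buckets.values():
            # Longest prefix first, so the most specific mapping wins.
            pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    return mapping


def find_profile(mapping, url):
    """Return the profile of the first (longest) matching prefix, or None."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    for prefix, profile in mapping.get(parsed.scheme, {}).get(parsed.netloc, []):
        if path.startswith(prefix):
            return profile
    return None


mapping = build_mapping([
    ("s3://bucket1/a/", "profile-a"),
    ("s3://bucket1/a/b/", "profile-ab"),
])
```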

retry_config: RetryConfig | None
storage_provider: StorageProvider
multistorageclient.commit_metadata(url: str) None[source]

Commits the metadata updates for the specified storage client profile.

Parameters:

url (str) – The URL of the path to commit metadata for.

Return type:

None

multistorageclient.delete(url: str, recursive: bool = False) None[source]

Deletes the specified object(s) from the storage provider.

This function retrieves the corresponding multistorageclient.StorageClient for the given URL and deletes the object(s) at the specified path.

Parameters:
  • url (str) – The URL of the object to delete. (example: msc://profile/prefix/file.txt)

  • recursive (bool) – Whether to delete objects in the path recursively.

Return type:

None

multistorageclient.download_file(url: str, local_path: str) None[source]

Downloads the file at the given URL to a local path.

The function utilizes the multistorageclient.StorageClient to download a file (object) at the provided path. The URL is parsed, and the corresponding multistorageclient.StorageClient is retrieved or built.

Parameters:
  • url (str) – The URL of the file to download. (example: msc://profile/prefix/dataset.tar)

  • local_path (str) – The local path where the file should be downloaded.

Raises:

ValueError – If the URL’s protocol does not match the expected protocol msc.

Return type:

None

multistorageclient.get_telemetry() Telemetry | None[source]

Get the Telemetry instance to use for storage clients created by shortcuts.

Returns:

A telemetry instance.

Return type:

Telemetry | None

multistorageclient.glob(pattern: str) list[str][source]

Return a list of files matching a pattern.

This function supports glob-style patterns for matching multiple files within a storage system. The pattern is parsed, and the associated multistorageclient.StorageClient is used to retrieve the list of matching files.

Parameters:

pattern (str) – The glob-style pattern to match files. (example: msc://profile/prefix/**/*.tar)

Returns:

A list of file paths matching the pattern.

Raises:

ValueError – If the URL’s protocol does not match the expected protocol msc.

Return type:

list[str]

multistorageclient.is_empty(url: str) bool[source]

Checks whether the specified URL contains any objects.

Parameters:

url (str) – The URL to check, typically pointing to a storage location.

Returns:

True if there are no objects/files under this URL, False otherwise.

Raises:

ValueError – If the URL’s protocol does not match the expected protocol msc.

Return type:

bool

multistorageclient.is_file(url: str) bool[source]

Checks whether the specified URL points to a file (rather than a directory or folder).

The function utilizes the multistorageclient.StorageClient to check if a file (object) exists at the provided path. The URL is parsed, and the corresponding multistorageclient.StorageClient is retrieved or built.

Parameters:

url (str) – The URL to check the existence of a file. (example: msc://profile/prefix/dataset.tar)

Return type:

bool

multistorageclient.list(url: str, start_after: str | None = None, end_at: str | None = None, include_directories: bool = False) Iterator[ObjectMetadata][source]

Lists the contents of the specified URL prefix.

This function retrieves the corresponding multistorageclient.StorageClient for the given URL and returns an iterator of objects (files or directories) stored under the provided prefix.

Parameters:
  • url (str) – The prefix to list objects under.

  • start_after (str | None) – The key to start after (i.e. exclusive). An object with this key doesn’t have to exist.

  • end_at (str | None) – The key to end at (i.e. inclusive). An object with this key doesn’t have to exist.

  • include_directories (bool) – Whether to include directories in the result. When True, directories are returned alongside objects.

Returns:

An iterator of ObjectMetadata objects representing the files (and optionally directories) accessible under the specified URL prefix. The returned keys will always be prefixed with msc://.

Return type:

Iterator[ObjectMetadata]

multistorageclient.open(url: str, mode: str = 'rb', **kwargs: Any) PosixFile | ObjectFile[source]

Open a file at the given URL using the specified mode.

The function utilizes the multistorageclient.StorageClient to open a file at the provided path. The URL is parsed, and the corresponding multistorageclient.StorageClient is retrieved or built.

Parameters:
  • url (str) – The URL of the file to open. (example: msc://profile/prefix/dataset.tar)

  • mode (str) – The file mode to open the file in.

  • kwargs (Any)

Returns:

A file-like object that allows interaction with the file.

Raises:

ValueError – If the URL’s protocol does not match the expected protocol msc.

Return type:

PosixFile | ObjectFile

multistorageclient.resolve_storage_client(url: str) tuple[StorageClient, str][source]

Build and return a multistorageclient.StorageClient instance based on the provided URL or path.

This function parses the given URL or path and determines the appropriate storage profile and path. It supports URLs with the protocol msc://, as well as POSIX paths or file:// URLs for local file system access. If the profile has already been instantiated, it returns the cached client. Otherwise, it creates a new StorageClient and caches it.

The function also supports implicit profiles for non-MSC URLs. When a non-MSC URL is provided (like s3://, gs://, ais://, file://), MSC will infer the storage provider based on the URL protocol and create an implicit profile with the naming convention “_protocol-bucket” (e.g., “_s3-bucket1”, “_gs-bucket1”).

Path mappings defined in the MSC configuration are also applied before creating implicit profiles. This allows for explicit mappings between source paths and destination MSC profiles.

Parameters:

url (str) – The storage location, which can be:

  • A URL in the format msc://profile/path for object storage.

  • A local file system path (absolute POSIX path) or a file:// URL.

  • A non-MSC URL with a supported protocol (s3://, gs://, ais://).

Returns:

A tuple containing the multistorageclient.StorageClient instance and the parsed path.

Raises:

ValueError – If the URL is not an msc:// URL, a valid local file system path, or a URL with a supported non-MSC protocol.

Return type:

tuple[StorageClient, str]
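
The “_protocol-bucket” naming convention for implicit profiles can be sketched with the standard library. The helper implicit_profile_name is hypothetical and only derives the name; it does not create a client, and it does not cover file:// URLs or POSIX paths, whose naming is not described here.

```python
from urllib.parse import urlparse


def implicit_profile_name(url):
    """Derive the "_protocol-bucket" name for a non-MSC URL such as s3://bucket1/key."""
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Expected protocol://bucket/path, got {url!r}")
    return f"_{parsed.scheme}-{parsed.netloc}"


s3_profile = implicit_profile_name("s3://bucket1/path/file.txt")
gs_profile = implicit_profile_name("gs://bucket1/other.bin")
```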

multistorageclient.set_telemetry(telemetry: Telemetry | None) None[source]

Set the Telemetry instance to use for storage clients created by shortcuts.

Parameters:

telemetry (Telemetry | None) – A telemetry instance.

Return type:

None

multistorageclient.sync(source_url: str, target_url: str, delete_unmatched_files: bool = False) None[source]

Syncs files from the source storage to the target storage.

Parameters:
  • source_url (str) – The URL for the source storage.

  • target_url (str) – The URL for the target storage.

  • delete_unmatched_files (bool) – Whether to delete files at the target that are not present at the source.

Return type:

None

multistorageclient.upload_file(url: str, local_path: str) None[source]

Upload a file to the given URL from a local path.

The function utilizes the multistorageclient.StorageClient to upload a file (object) to the provided path. The URL is parsed, and the corresponding multistorageclient.StorageClient is retrieved or built.

Parameters:
  • url (str) – The URL of the file. (example: msc://profile/prefix/dataset.tar)

  • local_path (str) – The local path of the file.

Raises:

ValueError – If the URL’s protocol does not match the expected protocol msc.

Return type:

None

multistorageclient.write(url: str, body: bytes) None[source]

Writes an object to the storage provider at the specified path.

Parameters:
  • url (str) – The URL where the object should be written.

  • body (bytes) – The content to write to the object.

Return type:

None

Types

class multistorageclient.types.Credentials(access_key: str, secret_key: str, token: str | None, expiration: str | None, custom_fields: dict[str, ~typing.Any] = <factory>)[source]

A data class representing the credentials needed to access a storage provider.

access_key: str

The access key for authentication.

custom_fields: dict[str, Any]

A dictionary for storing custom key-value pairs.

expiration: str | None

The expiration time of the credentials in ISO 8601 format.

get_custom_field(key: str, default: Any | None = None) Any[source]

Retrieves a value from custom fields by its key.

Parameters:
  • key (str) – The key to look up in custom fields.

  • default (Any | None) – The default value to return if the key is not found.

Returns:

The value associated with the key, or the default value if not found.

Return type:

Any

is_expired() bool[source]

Checks if the credentials are expired based on the expiration time.

Returns:

True if the credentials are expired, False otherwise.

Return type:

bool
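
An expiration check against an ISO 8601 timestamp might look like the sketch below. It is a stand-alone illustration, not the library's code, and it rests on two assumptions: a missing expiration means the credentials never expire, and a timestamp without a time zone is interpreted as UTC.

```python
from datetime import datetime, timezone


def is_expired(expiration, now=None):
    """Return True if the ISO 8601 timestamp is not in the future."""
    if expiration is None:
        return False  # assumption: no expiration means never expired
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    expires_at = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    current = now if now is not None else datetime.now(timezone.utc)
    return expires_at <= current
```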

secret_key: str

The secret key for authentication.

token: str | None

An optional security token for temporary credentials.

class multistorageclient.types.CredentialsProvider[source]

Abstract base class for providing credentials to access a storage provider.

abstract get_credentials() Credentials[source]

Retrieves the current credentials.

Returns:

The current credentials used for authentication.

Return type:

Credentials

abstract refresh_credentials() None[source]

Refreshes the credentials if they are expired or about to expire.

Return type:

None

class multistorageclient.types.MetadataProvider[source]

Abstract base class for accessing file metadata.

abstract add_file(path: str, metadata: ObjectMetadata) None[source]

Add a file to be tracked by the MetadataProvider. Does not have to be reflected in listing until a MetadataProvider.commit_updates() forces a persist. This function must tolerate duplicate calls (idempotent behavior).

Parameters:
  • path (str) – User-supplied virtual path

  • metadata (ObjectMetadata) – physical file metadata from StorageProvider

Return type:

None

abstract commit_updates() None[source]

Commits any newly added files. Used in conjunction with MetadataProvider.add_file(). The MetadataProvider will persistently record any metadata changes.

Return type:

None

abstract get_object_metadata(path: str, include_pending: bool = False) ObjectMetadata[source]

Retrieves metadata or information about an object stored in the provider.

Parameters:
  • path (str) – The path of the object.

  • include_pending (bool) – Whether to include metadata that is not yet committed.

Returns:

A metadata object containing the information about the object.

Return type:

ObjectMetadata

abstract glob(pattern: str) list[str][source]

Matches and retrieves a list of object keys in the storage provider that match the specified pattern.

Parameters:

pattern (str) – The pattern to match object keys against, supporting wildcards (e.g., *.txt).

Returns:

A list of object keys that match the specified pattern.

Return type:

list[str]

abstract is_writable() bool[source]

Returns True if the MetadataProvider supports writes else False.

Return type:

bool

abstract list_objects(prefix: str, start_after: str | None = None, end_at: str | None = None, include_directories: bool = False) Iterator[ObjectMetadata][source]

Lists objects in the storage provider under the specified prefix.

Parameters:
  • prefix (str) – The prefix or path to list objects under.

  • start_after (str | None) – The key to start after (i.e. exclusive). An object with this key doesn’t have to exist.

  • end_at (str | None) – The key to end at (i.e. inclusive). An object with this key doesn’t have to exist.

  • include_directories (bool) – Whether to include directories in the result. When True, directories are returned alongside objects.

Returns:

An iterator over object metadata under the specified prefix.

Return type:

Iterator[ObjectMetadata]

abstract realpath(path: str) tuple[str, bool][source]

Returns the canonical, full real physical path for use by a StorageProvider. This provides translation from user-visible paths to the canonical paths needed by a StorageProvider.

Parameters:

path (str) – user-supplied virtual path

Returns:

The canonical physical path and a flag indicating whether the object at the path is valid.

Return type:

tuple[str, bool]

abstract remove_file(path: str) None[source]

Remove a file tracked by the MetadataProvider. Does not have to be reflected in listing until a MetadataProvider.commit_updates() forces a persist. This function must tolerate duplicate calls (idempotent behavior).

Parameters:

path (str) – User-supplied virtual path

Return type:

None
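
The interplay of add_file(), remove_file(), commit_updates(), and include_pending can be sketched with an in-memory class. PendingMetadata is a hypothetical stand-in that does not subclass MetadataProvider; it uses plain dictionaries for metadata and raises FileNotFoundError for unknown paths, which is an assumption of this sketch.

```python
class PendingMetadata:
    """Tracks pending additions and removals until commit_updates() is called."""

    def __init__(self):
        self._committed = {}
        self._pending_adds = {}
        self._pending_removes = set()

    def add_file(self, path, metadata):
        self._pending_removes.discard(path)
        self._pending_adds[path] = metadata  # repeated calls overwrite: idempotent

    def remove_file(self, path):
        self._pending_adds.pop(path, None)
        self._pending_removes.add(path)  # set semantics: idempotent

    def commit_updates(self):
        self._committed.update(self._pending_adds)
        for path in self._pending_removes:
            self._committed.pop(path, None)
        self._pending_adds.clear()
        self._pending_removes.clear()

    def get_object_metadata(self, path, include_pending=False):
        if include_pending:
            if path in self._pending_adds:
                return self._pending_adds[path]
            if path in self._pending_removes:
                raise FileNotFoundError(path)
        if path in self._committed:
            return self._committed[path]
        raise FileNotFoundError(path)
```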

exception multistorageclient.types.NotModifiedError[source]

Raised when a conditional operation fails because the resource has not been modified.

This typically occurs when using if-none-match with a specific generation/etag and the resource’s current generation/etag matches the specified one.

class multistorageclient.types.ObjectMetadata(key: str, content_length: int, last_modified: datetime, type: str = 'file', content_type: str | None = None, etag: str | None = None, storage_class: str | None = None, metadata: dict[str, Any] | None = None)[source]

A data class that represents the metadata associated with an object stored in a cloud storage service. This metadata includes both required and optional information about the object.

content_length: int

The size of the object in bytes.

content_type: str | None = None

The MIME type of the object.

etag: str | None = None

The entity tag (ETag) of the object.

static from_dict(data: dict) ObjectMetadata[source]

Creates an ObjectMetadata instance from a dictionary (parsed from JSON).

Parameters:

data (dict)

Return type:

ObjectMetadata

key: str

Relative path of the object.

last_modified: datetime

The timestamp indicating when the object was last modified.

metadata: dict[str, Any] | None = None
storage_class: str | None = None

The storage class of the object.

to_dict() dict[source]
Return type:

dict

type: str = 'file'
exception multistorageclient.types.PreconditionFailedError[source]

Exception raised when a precondition (e.g. if-match, if-none-match) fails.

class multistorageclient.types.ProviderBundle[source]

Abstract base class that serves as a container for various providers (storage, credentials, and metadata) that interact with a storage service. The ProviderBundle abstracts access to these providers, allowing for flexible implementations of cloud storage solutions.

abstract property credentials_provider: CredentialsProvider | None
Returns:

The credentials provider responsible for managing authentication credentials required to access the storage service.

abstract property metadata_provider: MetadataProvider | None
Returns:

The metadata provider responsible for retrieving metadata about objects in the storage service.

abstract property storage_provider_config: StorageProviderConfig
Returns:

The configuration for the storage provider, which includes the provider name/type and additional options.

class multistorageclient.types.Range(offset: int, size: int)[source]

Byte-range read.

offset: int
size: int
class multistorageclient.types.RetryConfig(attempts: int = 3, delay: float = 1.0)[source]

A data class that represents the configuration for retry strategy.

attempts: int = 3

The number of attempts before giving up. Must be at least 1.

delay: float = 1.0

The delay (in seconds) between retry attempts. Must be a non-negative value.

exception multistorageclient.types.RetryableError[source]

Exception raised for errors that should trigger a retry.
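
How attempts and delay might drive a retry loop is sketched below. This is an assumption-laden illustration rather than the library's retry logic: RetryableError is redefined locally as a stand-in, call_with_retry is hypothetical, and the delay is held constant between attempts.

```python
import time


class RetryableError(Exception):
    """Local stand-in for the library's RetryableError."""


def call_with_retry(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except RetryableError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
```

The sleep parameter exists only so callers (and tests) can replace the real wait with a no-op.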

class multistorageclient.types.SourceVersionCheckMode(value)[source]

Enum for controlling source version checking behavior.

DISABLE = 'disable'
ENABLE = 'enable'
INHERIT = 'inherit'
class multistorageclient.types.StorageProvider[source]

Abstract base class for interacting with a storage provider.

abstract copy_object(src_path: str, dest_path: str) None[source]

Copies an object from source to destination in the storage provider.

Parameters:
  • src_path (str) – The path of the source object to copy.

  • dest_path (str) – The path of the destination.

Return type:

None

abstract delete_object(path: str, if_match: str | None = None) None[source]

Deletes an object from the storage provider.

Parameters:
  • path (str) – The path of the object to delete.

  • if_match (str | None) – Optional if-match value to use for conditional deletion.

Return type:

None

abstract download_file(remote_path: str, f: str | IO, metadata: ObjectMetadata | None = None) None[source]

Downloads a file from the storage provider to the local file system.

Parameters:
  • remote_path (str) – The path of the file to download.

  • f (str | IO) – The destination for the downloaded file. This can either be a string representing the local file path where the file will be saved, or a file-like object to write the downloaded content into.

  • metadata (ObjectMetadata | None) – Metadata about the object to download.

Return type:

None

abstract get_object(path: str, byte_range: Range | None = None) bytes[source]

Retrieves an object from the storage provider.

Parameters:
  • path (str) – The path where the object is stored.

  • byte_range (Range | None)

Returns:

The content of the retrieved object.

Return type:

bytes
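
A byte-range read described by offset and size amounts to slicing the object body. The sketch below uses a local dataclass that mirrors the fields of Range; read_range is a hypothetical helper operating on bytes already in memory, not a provider call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Range:
    """Local stand-in mirroring the offset and size fields of the Range type."""
    offset: int
    size: int


def read_range(body: bytes, byte_range: Optional[Range] = None) -> bytes:
    """Return the whole body, or size bytes starting at offset."""
    if byte_range is None:
        return body
    return body[byte_range.offset : byte_range.offset + byte_range.size]


chunk = read_range(b"0123456789", Range(offset=2, size=3))
```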

abstract get_object_metadata(path: str, strict: bool = True) ObjectMetadata[source]

Retrieves metadata or information about an object stored in the provider.

Parameters:
  • path (str) – The path of the object.

  • strict (bool) – If True, performs additional validation to determine whether the path refers to a directory.

Returns:

A metadata object containing the information about the object.

Return type:

ObjectMetadata

abstract glob(pattern: str) list[str][source]

Matches and retrieves a list of object keys in the storage provider that match the specified pattern.

Parameters:

pattern (str) – The pattern to match object keys against, supporting wildcards (e.g., *.txt).

Returns:

A list of object keys that match the specified pattern.

Return type:

list[str]

abstract is_file(path: str) bool[source]

Checks whether the specified key in the storage provider points to a file (as opposed to a folder or directory).

Parameters:

path (str) – The path to check.

Returns:

True if the key points to a file, False if it points to a directory or folder.

Return type:

bool

abstract list_objects(prefix: str, start_after: str | None = None, end_at: str | None = None, include_directories: bool = False) Iterator[ObjectMetadata][source]

Lists objects in the storage provider under the specified prefix.

Parameters:
  • prefix (str) – The prefix or path to list objects under.

  • start_after (str | None) – The key to start after (i.e. exclusive). An object with this key doesn’t have to exist.

  • end_at (str | None) – The key to end at (i.e. inclusive). An object with this key doesn’t have to exist.

  • include_directories (bool) – Whether to include directories in the result. When True, directories are returned alongside objects.

Returns:

An iterator over object metadata under the specified prefix.

Return type:

Iterator[ObjectMetadata]

abstract put_object(path: str, body: bytes, metadata: dict[str, str] | None = None, if_match: str | None = None, if_none_match: str | None = None) None[source]

Uploads an object to the storage provider.

Parameters:
  • path (str) – The path where the object will be stored.

  • body (bytes) – The content of the object to store.

  • metadata (dict[str, str] | None) – Metadata to associate with the object.

  • if_match (str | None)

  • if_none_match (str | None)

Return type:

None
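
The if_match and if_none_match parameters are not described above. Assuming they follow HTTP-style conditional request semantics (write only if the current ETag matches, or only if no object exists when if_none_match is "*"), a conditional write could be sketched with an in-memory store. InMemoryStore is hypothetical, PreconditionFailedError is redefined locally as a stand-in, and the sketch returns the new ETag for convenience even though put_object is documented as returning None.

```python
import hashlib


class PreconditionFailedError(Exception):
    """Local stand-in for the library's PreconditionFailedError."""


class InMemoryStore:
    def __init__(self):
        self._objects = {}  # path -> (body, etag)

    def put_object(self, path, body, if_match=None, if_none_match=None):
        current = self._objects.get(path)
        current_etag = current[1] if current else None
        if if_match is not None and current_etag != if_match:
            raise PreconditionFailedError(f"etag mismatch for {path}")
        if if_none_match == "*" and current is not None:
            raise PreconditionFailedError(f"{path} already exists")
        etag = hashlib.sha256(body).hexdigest()  # assumption: content hash as ETag
        self._objects[path] = (body, etag)
        return etag
```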

abstract upload_file(remote_path: str, f: str | IO) None[source]

Uploads a file from the local file system to the storage provider.

Parameters:
  • remote_path (str) – The path where the object will be stored.

  • f (str | IO) – The source file to upload. This can either be a string representing the local file path, or a file-like object (e.g., an open file handle).

Return type:

None

class multistorageclient.types.StorageProviderConfig(type: str, options: dict[str, Any] | None = None)[source]

A data class that represents the configuration needed to initialize a storage provider.

options: dict[str, Any] | None = None

Additional options required to configure the storage provider (e.g., endpoint URLs, region, etc.).

type: str

The name or type of the storage provider (e.g., s3, gcs, oci, azure).

Providers

class multistorageclient.providers.posix_file.PosixFileStorageProvider(base_path: str, metric_counters: dict[CounterName, Counter] = {}, metric_gauges: dict[GaugeName, Gauge] = {}, metric_attributes_providers: Sequence[AttributesProvider] = (), **kwargs: Any)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with POSIX file systems.

Parameters:
  • base_path (str) – The root prefix path within the POSIX file system where all operations will be scoped.

  • metric_counters (dict[CounterName, Counter]) – Metric counters.

  • metric_gauges (dict[GaugeName, Gauge]) – Metric gauges.

  • metric_attributes_providers (Sequence[AttributesProvider]) – Metric attributes providers.

  • kwargs (Any)

glob(pattern: str) list[str][source]

Matches and retrieves a list of object keys in the storage provider that match the specified pattern.

Parameters:

pattern (str) – The pattern to match object keys against, supporting wildcards (e.g., *.txt).

Returns:

A list of object keys that match the specified pattern.

Return type:

list[str]

is_file(path: str) bool[source]

Checks whether the specified key in the storage provider points to a file (as opposed to a folder or directory).

Parameters:

path (str) – The path to check.

Returns:

True if the key points to a file, False if it points to a directory or folder.

Return type:

bool

rmtree(path: str) None[source]
Parameters:

path (str)

Return type:

None

multistorageclient.providers.posix_file.atomic_write(source: str | IO, destination: str)[source]

Writes the contents of a file to the specified destination path.

This function ensures that the file write operation is atomic, meaning the output file is either fully written or not modified at all. This is achieved by writing to a temporary file first and then renaming it to the destination path.

Parameters:
  • source (str | IO) – The input file to read from. It can be a string representing the path to a file, or an open file-like object (IO).

  • destination (str) – The path to the destination file where the contents should be written.
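The write-to-temporary-then-rename technique described above can be sketched with the standard library alone. This is an illustration and not the library's implementation: the name atomic_write_sketch is hypothetical, and the sketch assumes a binary file-like object when source is not a path.

```python
import io
import os
import shutil
import tempfile
from typing import IO, Union


def atomic_write_sketch(source: Union[str, IO], destination: str) -> None:
    """Copy source to destination so readers never observe a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(destination))
    # Create the temporary file next to the destination so the final rename
    # stays on one file system, which is what makes the rename atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            if isinstance(source, str):
                with open(source, "rb") as src:
                    shutil.copyfileobj(src, tmp)
            else:
                shutil.copyfileobj(source, tmp)
        os.replace(tmp_path, destination)
    except BaseException:
        # Remove the temporary file; the destination is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "out.bin")
atomic_write_sketch(io.BytesIO(b"hello"), demo_path)
with open(demo_path, "rb") as written:
    demo_content = written.read()
print(demo_content)
```

Using os.replace rather than os.rename means an existing destination is overwritten on both POSIX and Windows.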

class multistorageclient.providers.manifest_metadata.Manifest(version: str, parts: list[ManifestPartReference])[source]

A data class representing a dataset manifest.

Parameters:
  • version (str)

  • parts (list[ManifestPartReference])

static from_dict(data: dict) Manifest[source]

Creates a Manifest instance from a dictionary (parsed from JSON).

Parameters:

data (dict)

Return type:

Manifest

parts: list[ManifestPartReference]

References to manifest parts.

to_json() str[source]
Return type:

str

version: str

Defines the version of the manifest schema.

class multistorageclient.providers.manifest_metadata.ManifestMetadataProvider(storage_provider: StorageProvider, manifest_path: str, writable: bool = False)[source]

Creates a ManifestMetadataProvider.

Parameters:
  • storage_provider (StorageProvider) – Storage provider.

  • manifest_path (str) – Main manifest file path.

  • writable (bool) – If true, allows modifications and new manifests to be written.

add_file(path: str, metadata: ObjectMetadata) None[source]

Add a file to be tracked by the MetadataProvider. Does not have to be reflected in listing until a MetadataProvider.commit_updates() forces a persist. This function must tolerate duplicate calls (idempotent behavior).

Parameters:
  • path (str) – User-supplied virtual path

  • metadata (ObjectMetadata) – physical file metadata from StorageProvider

Return type:

None

commit_updates() None[source]

Commits any newly added files. Used in conjunction with MetadataProvider.add_file(). The MetadataProvider will persistently record any metadata changes.

Return type:

None

get_object_metadata(path: str, include_pending: bool = False) ObjectMetadata[source]

Retrieves metadata or information about an object stored in the provider.

Parameters:
  • path (str) – The path of the object.

  • include_pending (bool) – Whether to include metadata that is not yet committed.

Returns:

A metadata object containing the information about the object.

Return type:

ObjectMetadata

glob(pattern: str) list[str][source]

Matches and retrieves a list of object keys in the storage provider that match the specified pattern.

Parameters:

pattern (str) – The pattern to match object keys against, supporting wildcards (e.g., *.txt).

Returns:

A list of object keys that match the specified pattern.

Return type:

list[str]

is_writable() bool[source]

Returns True if the MetadataProvider supports writes else False.

Return type:

bool

list_objects(prefix: str, start_after: str | None = None, end_at: str | None = None, include_directories: bool = False) Iterator[ObjectMetadata][source]

Lists objects in the storage provider under the specified prefix.

Parameters:
  • prefix (str) – The prefix or path to list objects under.

  • start_after (str | None) – The key to start after (i.e. exclusive). An object with this key doesn’t have to exist.

  • end_at (str | None) – The key to end at (i.e. inclusive). An object with this key doesn’t have to exist.

  • include_directories (bool) – Whether to include directories in the result. When True, directories are returned alongside objects.

Returns:

An iterator over the metadata of objects under the specified prefix.

Return type:

Iterator[ObjectMetadata]
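The exclusive start_after and inclusive end_at bounds can be pictured with a small filter over plain string keys. This is only a sketch: list_keys and the sample keys are hypothetical, and lexicographic ordering of keys is an assumption here.

```python
from typing import Iterator, Optional


def list_keys(
    keys: list[str],
    prefix: str,
    start_after: Optional[str] = None,
    end_at: Optional[str] = None,
) -> Iterator[str]:
    """Yield keys under prefix that fall within (start_after, end_at]."""
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        if start_after is not None and key <= start_after:
            continue  # exclusive lower bound
        if end_at is not None and key > end_at:
            continue  # inclusive upper bound
        yield key


sample_keys = ["logs/a", "logs/b", "logs/c", "logs/d", "tmp/x"]
print(list(list_keys(sample_keys, "logs/", start_after="logs/a", end_at="logs/c")))
```

Because the bounds are compared rather than looked up, neither key has to exist: start_after="logs/aa" would still begin the listing at logs/b.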

realpath(path: str) tuple[str, bool][source]

Returns the canonical, full real physical path for use by a StorageProvider. This provides translation from user-visible paths to the canonical paths needed by a StorageProvider.

Parameters:

path (str) – user-supplied virtual path

Returns:

A canonical physical path and whether the object at that path is valid.

Return type:

tuple[str, bool]

remove_file(path: str) None[source]

Remove a file tracked by the MetadataProvider. Does not have to be reflected in listing until a MetadataProvider.commit_updates() forces a persist. This function must tolerate duplicate calls (idempotent behavior).

Parameters:

path (str) – User-supplied virtual path

Return type:

None
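The interplay of add_file(), remove_file(), and commit_updates() can be sketched with an in-memory stand-in. Everything below is hypothetical (the class name, storing only a size per path, and the list_paths helper); it only illustrates pending changes that stay invisible until committed and calls that tolerate repetition.

```python
class PendingMetadataSketch:
    """Hypothetical in-memory stand-in; not the ManifestMetadataProvider."""

    def __init__(self) -> None:
        self._committed: dict[str, int] = {}  # path -> size in bytes
        self._pending_adds: dict[str, int] = {}
        self._pending_removes: set[str] = set()

    def add_file(self, path: str, size: int) -> None:
        # Keyed by path, so repeating the same call changes nothing.
        self._pending_removes.discard(path)
        self._pending_adds[path] = size

    def remove_file(self, path: str) -> None:
        # Removing an unknown or already removed path is not an error.
        self._pending_adds.pop(path, None)
        self._pending_removes.add(path)

    def commit_updates(self) -> None:
        self._committed.update(self._pending_adds)
        for path in self._pending_removes:
            self._committed.pop(path, None)
        self._pending_adds.clear()
        self._pending_removes.clear()

    def list_paths(self, include_pending: bool = False) -> list[str]:
        visible = dict(self._committed)
        if include_pending:
            visible.update(self._pending_adds)
            for path in self._pending_removes:
                visible.pop(path, None)
        return sorted(visible)


provider = PendingMetadataSketch()
provider.add_file("a.bin", 10)
provider.add_file("a.bin", 10)  # a duplicate call is tolerated
before_commit = provider.list_paths()
provider.commit_updates()
after_commit = provider.list_paths()
print(before_commit, after_commit)
```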

class multistorageclient.providers.manifest_metadata.ManifestPartReference(path: str)[source]

A data class representing a reference to a dataset manifest part.

Parameters:

path (str)

static from_dict(data: dict[str, Any]) ManifestPartReference[source]

Creates a ManifestPartReference instance from a dictionary.

Parameters:

data (dict[str, Any])

Return type:

ManifestPartReference

path: str

The path of the manifest part relative to the main manifest.

to_dict() dict[source]

Converts ManifestPartReference instance to a dictionary.

Return type:

dict
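A minimal sketch of how data classes like Manifest and ManifestPartReference might round-trip through JSON is shown below. The class names ManifestSketch and PartRefSketch are hypothetical, and the JSON keys are assumed to mirror the attribute names (version, parts, path); the real on-disk schema may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PartRefSketch:
    path: str

    @staticmethod
    def from_dict(data: dict) -> "PartRefSketch":
        return PartRefSketch(path=data["path"])

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class ManifestSketch:
    version: str
    parts: list[PartRefSketch]

    @staticmethod
    def from_dict(data: dict) -> "ManifestSketch":
        parts = [PartRefSketch.from_dict(part) for part in data["parts"]]
        return ManifestSketch(version=data["version"], parts=parts)

    def to_json(self) -> str:
        # asdict() recurses into the nested part references.
        return json.dumps(asdict(self))


manifest = ManifestSketch.from_dict(
    {"version": "1", "parts": [{"path": "parts/msc_manifest_part000001.jsonl"}]}
)
print(manifest.to_json())
```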

class multistorageclient.providers.ais.AIStoreStorageProvider(endpoint: str = '', provider: str = 'ais', skip_verify: bool = True, ca_cert: str | None = None, timeout: float | tuple[float, float] | None = None, retry: dict[str, Any] | None = None, base_path: str = '', credentials_provider: CredentialsProvider | None = None, metric_counters: dict[CounterName, Counter] = {}, metric_gauges: dict[GaugeName, Gauge] = {}, metric_attributes_providers: Sequence[AttributesProvider] = (), **kwargs: Any)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with NVIDIA AIStore.

AIStore client for managing buckets, objects, and ETL jobs.

Parameters:
  • endpoint (str) – The AIStore endpoint.

  • skip_verify (bool) – Whether to skip SSL certificate verification.

  • ca_cert (str | None) – Path to a CA certificate file for SSL verification.

  • timeout (float | tuple[float, float] | None) – Request timeout in seconds; a single float for both connect/read timeouts (e.g., 5.0), a tuple for separate connect/read timeouts (e.g., (3.0, 10.0)), or None to disable timeout.

  • retry (dict[str, Any] | None) – urllib3.util.Retry parameters.

  • token – Authorization token. If not provided, the AIS_AUTHN_TOKEN environment variable will be used.

  • base_path (str) – The root prefix path within the bucket where all operations will be scoped.

  • credentials_provider (CredentialsProvider | None) – The provider to retrieve AIStore credentials.

  • metric_counters (dict[CounterName, Counter]) – Metric counters.

  • metric_gauges (dict[GaugeName, Gauge]) – Metric gauges.

  • metric_attributes_providers (Sequence[AttributesProvider]) – Metric attributes providers.

  • provider (str)

  • kwargs (Any)

class multistorageclient.providers.ais.StaticAISCredentialProvider(username: str | None = None, password: str | None = None, authn_endpoint: str | None = None, token: str | None = None, skip_verify: bool = True, ca_cert: str | None = None)[source]

A concrete implementation of the multistorageclient.types.CredentialsProvider that provides static AIStore credentials.

Initializes the StaticAISCredentialProvider with the given credentials.

Parameters:
  • username (str | None) – The username for the AIStore authentication.

  • password (str | None) – The password for the AIStore authentication.

  • authn_endpoint (str | None) – The AIStore authentication endpoint.

  • token (str | None) – The AIStore authentication token. This is used for authentication if username, password and authn_endpoint are not provided.

  • skip_verify (bool) – If true, skip SSL certificate verification.

  • ca_cert (str | None) – Path to a CA certificate file for SSL verification.

get_credentials() Credentials[source]

Retrieves the current credentials.

Returns:

The current credentials used for authentication.

Return type:

Credentials

refresh_credentials() None[source]

Refreshes the credentials if they are expired or about to expire.

Return type:

None

class multistorageclient.providers.azure.AzureBlobStorageProvider(endpoint_url: str, base_path: str = '', credentials_provider: CredentialsProvider | None = None, metric_counters: dict[CounterName, Counter] = {}, metric_gauges: dict[GaugeName, Gauge] = {}, metric_attributes_providers: Sequence[AttributesProvider] = (), **kwargs: dict[str, Any])[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with Azure Blob Storage.

Initializes the AzureBlobStorageProvider with the endpoint URL and optional credentials provider.

Parameters:
  • endpoint_url (str) – The Azure storage account URL.

  • base_path (str) – The root prefix path within the container where all operations will be scoped.

  • credentials_provider (CredentialsProvider | None) – The provider to retrieve Azure credentials.

  • metric_counters (dict[CounterName, Counter]) – Metric counters.

  • metric_gauges (dict[GaugeName, Gauge]) – Metric gauges.

  • metric_attributes_providers (Sequence[AttributesProvider]) – Metric attributes providers.

  • kwargs (dict[str, Any])

class multistorageclient.providers.azure.StaticAzureCredentialsProvider(connection: str)[source]

A concrete implementation of the multistorageclient.types.CredentialsProvider that provides static Azure credentials.

Initializes the StaticAzureCredentialsProvider with the provided connection string.

Parameters:

connection (str) – The connection string for Azure Blob Storage authentication.

get_credentials() Credentials[source]

Retrieves the current credentials.

Returns:

The current credentials used for authentication.

Return type:

Credentials

refresh_credentials() None[source]

Refreshes the credentials if they are expired or about to expire.

Return type:

None

class multistorageclient.providers.gcs.GoogleIdentityPoolCredentialsProvider(audience: str, token_supplier: str)[source]

A concrete implementation of the multistorageclient.types.CredentialsProvider that provides Google’s identity pool credentials.

Initializes the GoogleIdentityPoolCredentialsProvider with the audience and token supplier.

Parameters:
  • audience (str) – The audience for the Google Identity Pool.

  • token_supplier (str) – The token supplier for the Google Identity Pool.

get_credentials() Credentials[source]

Retrieves the current credentials.

Returns:

The current credentials used for authentication.

Return type:

Credentials

refresh_credentials() None[source]

Refreshes the credentials if they are expired or about to expire.

Return type:

None

class multistorageclient.providers.gcs.GoogleStorageProvider(project_id: str = '', endpoint_url: str = '', base_path: str = '', credentials_provider: CredentialsProvider | None = None, metric_counters: dict[CounterName, Counter] = {}, metric_gauges: dict[GaugeName, Gauge] = {}, metric_attributes_providers: Sequence[AttributesProvider] = (), **kwargs: Any)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with Google Cloud Storage.

Initializes the GoogleStorageProvider with the project ID and optional credentials provider.

Parameters:
  • project_id (str) – The Google Cloud project ID.

  • endpoint_url (str) – The custom endpoint URL for the GCS service.

  • base_path (str) – The root prefix path within the bucket where all operations will be scoped.

  • credentials_provider (CredentialsProvider | None) – The provider to retrieve GCS credentials.

  • metric_counters (dict[CounterName, Counter]) – Metric counters.

  • metric_gauges (dict[GaugeName, Gauge]) – Metric gauges.

  • metric_attributes_providers (Sequence[AttributesProvider]) – Metric attributes providers.

  • kwargs (Any)

class multistorageclient.providers.gcs.StringTokenSupplier(token: str)[source]

Supply a string token to the Google Identity Pool.

Parameters:

token (str)

get_subject_token(context, request)[source]

Returns the requested subject token. The subject token must be valid.

Parameters:
  • context (google.auth.externalaccount.SupplierContext) – The context object containing information about the requested audience and subject token type.

  • request (google.auth.transport.Request) – The object used to make HTTP requests.

Raises:

google.auth.exceptions.RefreshError – If an error is encountered during subject token retrieval logic.

Returns:

The requested subject token string.

Return type:

str

class multistorageclient.providers.gcs_s3.GoogleS3StorageProvider(*args, **kwargs)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with GCS via its S3 interface.

Initializes the S3StorageProvider with the region, endpoint URL, and optional credentials provider.

Parameters:
  • region_name – The AWS region where the S3 bucket is located.

  • endpoint_url – The custom endpoint URL for the S3 service.

  • base_path – The root prefix path within the S3 bucket where all operations will be scoped.

  • credentials_provider – The provider to retrieve S3 credentials.

  • metric_counters – Metric counters.

  • metric_gauges – Metric gauges.

  • metric_attributes_providers – Metric attributes providers.

class multistorageclient.providers.oci.OracleStorageProvider(namespace: str, base_path: str = '', credentials_provider: CredentialsProvider | None = None, retry_strategy: dict[str, Any] | None = None, metric_counters: dict[CounterName, Counter] = {}, metric_gauges: dict[GaugeName, Gauge] = {}, metric_attributes_providers: Sequence[AttributesProvider] = (), **kwargs: Any)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with Oracle Cloud Infrastructure (OCI) Object Storage.

Initializes an instance of OracleStorageProvider.

Parameters:
  • namespace (str) – The OCI Object Storage namespace. This is a unique identifier assigned to each tenancy.

  • base_path (str) – The root prefix path within the bucket where all operations will be scoped.

  • credentials_provider (CredentialsProvider | None) – The provider to retrieve OCI credentials.

  • retry_strategy (dict[str, Any] | None) – oci.retry.RetryStrategyBuilder parameters.

  • metric_counters (dict[CounterName, Counter]) – Metric counters.

  • metric_gauges (dict[GaugeName, Gauge]) – Metric gauges.

  • metric_attributes_providers (Sequence[AttributesProvider]) – Metric attributes providers.

  • kwargs (Any)

class multistorageclient.providers.s3.S3StorageProvider(region_name: str = '', endpoint_url: str = '', base_path: str = '', credentials_provider: CredentialsProvider | None = None, metric_counters: dict[CounterName, Counter] = {}, metric_gauges: dict[GaugeName, Gauge] = {}, metric_attributes_providers: Sequence[AttributesProvider] = (), **kwargs: Any)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with Amazon S3 or S3-compatible object stores.

Initializes the S3StorageProvider with the region, endpoint URL, and optional credentials provider.

Parameters:
  • region_name (str) – The AWS region where the S3 bucket is located.

  • endpoint_url (str) – The custom endpoint URL for the S3 service.

  • base_path (str) – The root prefix path within the S3 bucket where all operations will be scoped.

  • credentials_provider (CredentialsProvider | None) – The provider to retrieve S3 credentials.

  • metric_counters (dict[CounterName, Counter]) – Metric counters.

  • metric_gauges (dict[GaugeName, Gauge]) – Metric gauges.

  • metric_attributes_providers (Sequence[AttributesProvider]) – Metric attributes providers.

  • kwargs (Any)

class multistorageclient.providers.s3.StaticS3CredentialsProvider(access_key: str, secret_key: str, session_token: str | None = None)[source]

A concrete implementation of the multistorageclient.types.CredentialsProvider that provides static S3 credentials.

Initializes the StaticS3CredentialsProvider with the provided access key, secret key, and optional session token.

Parameters:
  • access_key (str) – The access key for S3 authentication.

  • secret_key (str) – The secret key for S3 authentication.

  • session_token (str | None) – An optional session token for temporary credentials.

get_credentials() Credentials[source]

Retrieves the current credentials.

Returns:

The current credentials used for authentication.

Return type:

Credentials

refresh_credentials() None[source]

Refreshes the credentials if they are expired or about to expire.

Return type:

None
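The static credentials providers in this section share one shape: credentials are fixed at construction, returned unchanged, and never need refreshing. The sketch below illustrates that shape with hypothetical stand-in classes; it does not use the library's Credentials type, and treating refresh_credentials() as a no-op is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CredentialsSketch:
    """Hypothetical stand-in for a credentials value object."""

    access_key: str
    secret_key: str
    token: Optional[str] = None


class StaticCredentialsProviderSketch:
    """Hypothetical provider that always returns the same credentials."""

    def __init__(
        self, access_key: str, secret_key: str, session_token: Optional[str] = None
    ) -> None:
        self._credentials = CredentialsSketch(access_key, secret_key, session_token)

    def get_credentials(self) -> CredentialsSketch:
        return self._credentials

    def refresh_credentials(self) -> None:
        # Assumption: static credentials do not expire, so nothing changes here.
        pass


static_provider = StaticCredentialsProviderSketch("example-access", "example-secret")
static_provider.refresh_credentials()
print(static_provider.get_credentials().access_key)
```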

class multistorageclient.providers.s8k.S8KStorageProvider(*args, **kwargs)[source]

A concrete implementation of the multistorageclient.types.StorageProvider for interacting with SwiftStack.

Initializes the S3StorageProvider with the region, endpoint URL, and optional credentials provider.

Parameters:
  • region_name – The AWS region where the S3 bucket is located.

  • endpoint_url – The custom endpoint URL for the S3 service.

  • base_path – The root prefix path within the S3 bucket where all operations will be scoped.

  • credentials_provider – The provider to retrieve S3 credentials.

  • metric_counters – Metric counters.

  • metric_gauges – Metric gauges.

  • metric_attributes_providers – Metric attributes providers.

Telemetry

class multistorageclient.telemetry.Telemetry[source]

Provides telemetry resources.

Instances shouldn’t be copied between processes. Not fork-safe or pickleable.

Instances can be shared between processes by registering with a multiprocessing.managers.BaseManager and using proxy objects.

class multistorageclient.telemetry.TelemetryManager(address=None, authkey=None, serializer='pickle', ctx=None)[source]

A multiprocessing.managers.BaseManager for telemetry resources.

The OpenTelemetry Python SDK isn’t fork-safe since telemetry sample buffers can be duplicated.

In addition, Python ≤3.12 doesn’t call exit handlers for forked processes. This causes the OpenTelemetry Python SDK to not flush telemetry before exiting.

Forking is multiprocessing’s default start method for non-macOS POSIX systems until Python 3.14.

To fully support multiprocessing, resampling + publishing is handled by a single process that’s (ideally) a child of (i.e. directly under) the main process. This:

  • Relieves other processes of this work.

    • Avoids issues with duplicate samples when forking and unpublished samples when exiting forks.

  • Allows cross-process resampling.

  • Reuses a single connection pool to telemetry backends.

The downside is that this essentially reintroduces a global interpreter lock (GIL), with additional IPC overhead. Telemetry operations should be lightweight, however, so this isn’t expected to be a problem. Remote data store latency should still be the primary throughput limiter for storage clients.

multiprocessing.managers.BaseManager is used for this since it creates a separate server process for shared objects.

Telemetry resources are provided as proxy objects for location transparency.

The documentation isn’t particularly detailed, but others have written comprehensively on this.

By specification, metric and tracer providers must call shutdown on any underlying metric readers + span processors + exporters.

In the OpenTelemetry Python SDK, provider shutdown is called automatically by exit handlers (when they work at least). Consequently, clients should:

  • Only receive proxy objects.

    • Enables metric reader + span processor + exporter re-use across processes.

  • Never call shutdown on the proxy objects.

    • The shutdown exit handler is registered on the manager’s server process.

    • ⚠️ We expect a finite number of providers (i.e. no dynamic configs) so we don’t leak them.
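The server-process-plus-proxies arrangement can be sketched with multiprocessing.managers.BaseManager directly. SampleBuffer and SketchManager are hypothetical stand-ins rather than the library's telemetry classes, and the sketch is meant to be run as a script so the __main__ guard keeps worker processes from re-running it.

```python
from multiprocessing.managers import BaseManager


class SampleBuffer:
    """Hypothetical resource that should live in exactly one process."""

    def __init__(self) -> None:
        self._samples: list[float] = []

    def record(self, value: float) -> None:
        self._samples.append(value)

    def count(self) -> int:
        return len(self._samples)


class SketchManager(BaseManager):
    """Hypothetical manager; not the library's TelemetryManager."""


# Registering a type id adds a same-named factory method to the manager class.
SketchManager.register("SampleBuffer", SampleBuffer)


def main() -> None:
    # Entering the context starts a separate server process that owns the
    # real object; leaving it shuts that process down.
    with SketchManager() as manager:
        buffer = manager.SampleBuffer()  # a proxy, not the object itself
        buffer.record(1.5)  # forwarded over IPC to the server process
        buffer.record(2.5)
        print(buffer.count())


if __name__ == "__main__":
    main()
```

Client code only ever holds the proxy, so the buffer is never duplicated by a fork and its cleanup stays with the server process.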

class multistorageclient.telemetry.TelemetryMode(value)[source]

How to create a Telemetry object.

CLIENT = 'client'

Connect to a telemetry IPC server.

LOCAL = 'local'

Keep everything local to the process (not fork-safe).

SERVER = 'server'

Start + connect to a telemetry IPC server.

multistorageclient.telemetry.init(mode: TelemetryMode = TelemetryMode.SERVER, address: str | tuple[str, int] | None = None) Telemetry[source]

Create or return an existing Telemetry instance or Telemetry proxy object.

Parameters:
  • mode (TelemetryMode)

  • address (str | tuple[str, int] | None)

Returns:

A telemetry instance.

Return type:

Telemetry
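One way to picture "create or return an existing" instance is a factory that memoizes on its arguments. The sketch below is hypothetical throughout: ModeSketch, TelemetrySketch, and init_sketch are illustrative names, and keying the cache on (mode, address) is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Optional, Union

Address = Union[str, tuple[str, int]]


class ModeSketch(Enum):
    LOCAL = "local"
    SERVER = "server"
    CLIENT = "client"


class TelemetrySketch:
    """Hypothetical stand-in for a telemetry instance or proxy."""

    def __init__(self, mode: ModeSketch, address: Optional[Address]) -> None:
        self.mode = mode
        self.address = address


_instances: dict[tuple[ModeSketch, Optional[Address]], TelemetrySketch] = {}


def init_sketch(
    mode: ModeSketch = ModeSketch.SERVER, address: Optional[Address] = None
) -> TelemetrySketch:
    # Assumption: one instance per (mode, address) pair, created on first use.
    key = (mode, address)
    if key not in _instances:
        _instances[key] = TelemetrySketch(mode, address)
    return _instances[key]


print(init_sketch() is init_sketch())
```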

Attributes

class multistorageclient.telemetry.attributes.base.AttributesProvider[source]

Provides opentelemetry.util.types.Attributes.

abstract attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None

class multistorageclient.telemetry.attributes.environment_variables.EnvironmentVariablesAttributesProvider(attributes: Mapping[str, str])[source]

Provides opentelemetry.util.types.Attributes from environment variables.

Parameters:

attributes (Mapping[str, str]) – Map of attribute key to environment variable key.

attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None
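The attribute-key-to-environment-variable mapping can be sketched as follows. EnvAttributesSketch and the variable names are hypothetical, and skipping unset variables is an assumption about the desired behavior.

```python
import os
from collections.abc import Mapping


class EnvAttributesSketch:
    """Hypothetical provider that resolves attributes from environment variables."""

    def __init__(self, attributes: Mapping[str, str]) -> None:
        self._attributes = dict(attributes)

    def attributes(self) -> dict[str, str]:
        # Assumption: variables that are not set are left out of the result.
        return {
            attribute_key: os.environ[variable_key]
            for attribute_key, variable_key in self._attributes.items()
            if variable_key in os.environ
        }


os.environ["SKETCH_JOB_ID"] = "job-42"
os.environ.pop("SKETCH_UNSET_VARIABLE", None)
env_provider = EnvAttributesSketch(
    {"job.id": "SKETCH_JOB_ID", "team": "SKETCH_UNSET_VARIABLE"}
)
print(env_provider.attributes())
```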

class multistorageclient.telemetry.attributes.host.HostAttributesProvider(attributes: Mapping[str, str])[source]

Provides opentelemetry.util.types.Attributes from host information.

Parameters:

attributes (Mapping[str, str]) – Map of attribute key to host attribute.

class HostAttribute(value)[source]

Host attribute.

Use the enum value in the attributes dictionary values.

NAME = 'name'

Hostname.

attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None

class multistorageclient.telemetry.attributes.msc_config.MSCConfigAttributesProvider(attributes: Mapping[str, AttributeValueOptions], config_dict: Mapping[str, Any])[source]

Provides opentelemetry.util.types.Attributes from a multi-storage client configuration.

Parameters:
  • attributes (Mapping[str, AttributeValueOptions])

  • config_dict (Mapping[str, Any])

class AttributeValueOptions[source]

MSC configuration attribute value options.

expression: str

JMESPath expression.

Additional JMESPath functions:

  • hash(algorithm: str, value: str)
    • Calculate the hash digest of a value using a specific hash algorithm (e.g. sha3-256).

    • See hashlib.new() for algorithms.
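The hash helper described above presumably wraps hashlib.new(). A standalone sketch of such a function follows; the name hash_sketch, the UTF-8 encoding of the input, and the hexadecimal output are assumptions, and registering it as a JMESPath function is not shown.

```python
import hashlib


def hash_sketch(algorithm: str, value: str) -> str:
    """Return the digest of value under the named hashlib algorithm."""
    # Assumptions: the input is encoded as UTF-8 and the digest is hexadecimal.
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


print(hash_sketch("sha256", "my-bucket"))
```

Hashing a configuration value this way lets it be attached as a telemetry attribute without exposing the original string.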

attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None

class multistorageclient.telemetry.attributes.process.ProcessAttributesProvider(attributes: Mapping[str, str])[source]

Provides opentelemetry.util.types.Attributes from current process information.

Parameters:

attributes (Mapping[str, str]) – Map of attribute key to process attribute.

class ProcessAttribute(value)[source]

Process attribute.

Use the enum value in the attributes dictionary values.

PID = 'pid'

Process ID.

attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None

class multistorageclient.telemetry.attributes.static.StaticAttributesProvider(attributes: Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None)[source]

Provides opentelemetry.util.types.Attributes from static attributes.

Parameters:

attributes (Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None) – Map of attribute key to static attribute value.

attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None

class multistorageclient.telemetry.attributes.thread.ThreadAttributesProvider(attributes: Mapping[str, str])[source]

Provides opentelemetry.util.types.Attributes from current thread information.

Parameters:

attributes (Mapping[str, str]) – Map of attribute key to thread attribute.

class ThreadAttribute(value)[source]

Thread attribute.

Use the enum value in the attributes dictionary values.

IDENT = 'ident'

Python thread ID.

NATIVE_ID = 'native_id'

OS thread ID.

attributes() Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None[source]

Collect attributes.

Return type:

Mapping[str, str | bool | int | float | Sequence[str] | Sequence[bool] | Sequence[int] | Sequence[float]] | None
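The enum values above line up with attributes of threading.Thread, so resolving them can be sketched with getattr on the current thread. ThreadAttributeSketch and thread_attributes are hypothetical names, and this lookup strategy is an assumption about how such a provider could work.

```python
import threading
from enum import Enum


class ThreadAttributeSketch(Enum):
    IDENT = "ident"
    NATIVE_ID = "native_id"


def thread_attributes(attributes: dict[str, str]) -> dict[str, int]:
    """Resolve attribute keys to values read from the current thread."""
    current = threading.current_thread()
    resolved = {}
    for key, name in attributes.items():
        attribute = ThreadAttributeSketch(name)  # ValueError for unknown names
        resolved[key] = getattr(current, attribute.value)
    return resolved


print(thread_attributes({"thread.id": "ident", "thread.os_id": "native_id"}))
```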

Metrics

Readers

class multistorageclient.telemetry.metrics.readers.diperiodic_exporting.DiperiodicExportingMetricReader(exporter: MetricExporter, collect_interval_millis: float | None = None, collect_timeout_millis: float | None = None, export_interval_millis: float | None = None, export_timeout_millis: float | None = None)[source]

opentelemetry.sdk.metrics.export.MetricReader that collects + exports metrics on separate user-configurable time intervals. This is in contrast with opentelemetry.sdk.metrics.export.PeriodicExportingMetricReader which couples them with a 1 minute default.

The metrics collection interval limits the temporal resolution. Most metric backends have 1 millisecond or finer temporal resolution.

Parameters:
  • exporter (MetricExporter) – Metrics exporter.

  • collect_interval_millis (float | None) – Collect interval in milliseconds.

  • collect_timeout_millis (float | None) – Collect timeout in milliseconds.

  • export_interval_millis (float | None) – Export interval in milliseconds.

  • export_timeout_millis (float | None) – Export timeout in milliseconds.
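To see why separate intervals matter, the sketch below simulates collecting and exporting on a logical millisecond clock. The simulate function is hypothetical and says nothing about how the reader schedules its work internally; it only shows that each export batch carries every sample collected since the previous export.

```python
def simulate(
    collect_interval_ms: int, export_interval_ms: int, duration_ms: int
) -> list[list[int]]:
    """Return the collection timestamps carried by each export batch."""
    batches: list[list[int]] = []
    buffer: list[int] = []
    for now in range(1, duration_ms + 1):
        if now % collect_interval_ms == 0:
            buffer.append(now)  # collect: sample the instruments
        if now % export_interval_ms == 0:
            batches.append(buffer)  # export: ship everything collected so far
            buffer = []
    return batches


print(simulate(collect_interval_ms=1000, export_interval_ms=4000, duration_ms=8000))
```

With a 1 second collect interval and a 4 second export interval, each export carries four data points, so temporal resolution stays at 1 second while the backend sees a quarter of the requests.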

force_flush(timeout_millis: float = 40000) bool[source]
Parameters:

timeout_millis (float)

Return type:

bool

shutdown(timeout_millis: float = 40000, **kwargs) None[source]

Shuts down the MetricReader. This method provides a way for the MetricReader to do any cleanup required. A metric reader can only be shut down once; any subsequent calls are ignored and return a failure status.

When a MetricReader is registered on a MeterProvider, shutdown() will invoke this automatically.

Parameters:

timeout_millis (float)

Return type:

None

Generators

class multistorageclient.generators.ManifestMetadataGenerator[source]

Generates a file metadata manifest for use with a multistorageclient.providers.ManifestMetadataProvider.

static generate_and_write_manifest(data_storage_client: StorageClient, manifest_storage_client: StorageClient, partition_keys: list[str] | None = None) None[source]

Generates a file metadata manifest.

The data storage client’s base path should be set to the root path for data objects (e.g. my-bucket/my-data-prefix).

The manifest storage client’s base path should be set to the root path for manifest objects (e.g. my-bucket/my-manifest-prefix).

The following manifest objects will be written with the manifest storage client (the total number of manifest parts varies):

.msc_manifests/
├── msc_manifest_index.json
└── parts/
    ├── msc_manifest_part000001.jsonl
    ├── ...
    └── msc_manifest_part999999.jsonl
Parameters:
  • data_storage_client (StorageClient) – Storage client for reading data objects.

  • manifest_storage_client (StorageClient) – Storage client for writing manifest objects.

  • partition_keys (list[str] | None) – Optional list of keys to partition the listing operation. If provided, objects will be listed concurrently using these keys as boundaries.

Return type:

None
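One plausible reading of using partition keys as boundaries is that they split the key space into ranges that can be listed independently. The sketch below derives (start_after, end_at) pairs, following the exclusive and inclusive bounds that list_objects documents. The helper partition_ranges is hypothetical, and the generator's actual partitioning logic may differ.

```python
from typing import Optional


def partition_ranges(
    partition_keys: list[str],
) -> list[tuple[Optional[str], Optional[str]]]:
    """Turn boundary keys into (start_after, end_at) pairs covering all keys."""
    bounds: list[Optional[str]] = [None, *sorted(partition_keys), None]
    # Pair each bound with its successor; None means the range is open-ended.
    return list(zip(bounds[:-1], bounds[1:]))


print(partition_ranges(["g", "p"]))
```

Because start_after is exclusive and end_at is inclusive, adjacent ranges never overlap, so each range could be listed by a separate worker without double counting.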

Higher-Level Libraries

fsspec

class multistorageclient.contrib.async_fs.MultiStorageAsyncFileSystem(*args, **kwargs)[source]

Custom fsspec.asyn.AsyncFileSystem implementation for MSC protocol (msc://). Uses multistorageclient.StorageClient for backend operations.

Initializes the MultiStorageAsyncFileSystem.

Parameters:

kwargs – Additional arguments for the fsspec.asyn.AsyncFileSystem.

static asynchronize_sync(func: Callable[[...], Any], *args: Any, **kwargs: Any) Any[source]

Runs a synchronous function asynchronously using asyncio.

Parameters:
  • func (Callable[[...], Any]) – The synchronous function to be executed asynchronously.

  • args (Any) – Positional arguments to pass to the function.

  • kwargs (Any) – Keyword arguments to pass to the function.

Returns:

The result of the asynchronous execution of the function.

Return type:

Any
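A common way to run a synchronous function without blocking the event loop is to hand it to an executor. The sketch below shows that pattern with asyncio alone; run_sync_in_executor and slow_add are hypothetical names, and this is not necessarily how asynchronize_sync is implemented.

```python
import asyncio
import functools
from typing import Any, Callable


async def run_sync_in_executor(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Any:
    """Run a blocking function in the default thread pool and await its result."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so bind kwargs first.
    return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))


def slow_add(a: int, b: int = 0) -> int:
    return a + b


print(asyncio.run(run_sync_in_executor(slow_add, 2, b=3)))
```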

cat_file(path: str, **kwargs: Any) bytes[source]

Reads the contents of a file at the given path.

Parameters:
  • path (str) – The file path to read from.

  • kwargs (Any) – Additional arguments for file reading functionality.

Returns:

The contents of the file as bytes.

Return type:

bytes

cp_file(path1: str, path2: str, **kwargs: Any)[source]

Copies a file from the source path to the destination path.

Parameters:
  • path1 (str) – The source file path.

  • path2 (str) – The destination file path.

  • kwargs (Any) – Additional arguments for copy functionality.

Raises:

AttributeError – If the source and destination paths are associated with different profiles.

get_file(rpath: str, lpath: str, **kwargs: Any) None[source]

Downloads a file from the remote path to the local path.

Parameters:
  • rpath (str) – The remote path of the file to download.

  • lpath (str) – The local path to store the file.

  • kwargs (Any) – Additional arguments for file retrieval functionality.

Return type:

None

info(path: str, **kwargs: Any) dict[str, Any][source]

Retrieves metadata information for a file.

Parameters:
  • path (str) – The file path to retrieve information for.

  • kwargs (Any) – Additional arguments for info functionality.

Returns:

A dictionary containing file metadata such as ETag, last modified, and size.

Return type:

dict[str, Any]

ls(path: str, detail: bool = True, **kwargs: Any) list[dict[str, Any]] | list[str][source]

Lists the contents of a directory.

Parameters:
  • path (str) – The directory path to list.

  • detail (bool) – Whether to return detailed information for each file.

  • kwargs (Any) – Additional arguments for list functionality.

Returns:

A list of file names or detailed information depending on the ‘detail’ argument.

Return type:

list[dict[str, Any]] | list[str]

open(path: str, mode: str = 'rb', **kwargs: Any) PosixFile | ObjectFile[source]

Opens a file at the given path.

Parameters:
  • path (str) – The file path to open.

  • mode (str) – The mode in which to open the file.

  • kwargs (Any) – Additional arguments for file opening.

Returns:

A ManagedFile object representing the opened file.

Return type:

PosixFile | ObjectFile

pipe_file(path: str, value: bytes, **kwargs: Any) None[source]

Writes a value (bytes) directly to a file at the given path.

Parameters:
  • path (str) – The file path to write the value to.

  • value (bytes) – The bytes to write to the file.

  • kwargs (Any) – Additional arguments for writing functionality.

Return type:

None

protocol: ClassVar[str | tuple[str, ...]] = 'msc'
put_file(lpath: str, rpath: str, **kwargs: Any) None[source]

Uploads a local file to the remote path.

Parameters:
  • lpath (str) – The local path of the file to upload.

  • rpath (str) – The remote path to store the file.

  • kwargs (Any) – Additional arguments for file upload functionality.

Return type:

None

resolve_path_and_storage_client(path: str | PathLike) tuple[StorageClient, str][source]

Resolves the path and retrieves the associated multistorageclient.StorageClient.

Parameters:

path (str | PathLike) – The file path to resolve.

Returns:

A tuple containing the multistorageclient.StorageClient and the resolved path.

Return type:

tuple[StorageClient, str]
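Resolving an msc:// path involves separating the part that selects a storage client from the path within it. The sketch below assumes a msc://<profile>/<path> layout, which is not spelled out in this section, and uses a hypothetical helper split_msc_url; looking up the StorageClient for the profile is not shown.

```python
from urllib.parse import urlparse


def split_msc_url(url: str) -> tuple[str, str]:
    """Split an msc:// URL into (profile, path) under the assumed layout."""
    parsed = urlparse(url)
    if parsed.scheme != "msc":
        raise ValueError(f"not an msc:// URL: {url!r}")
    # Assumption: the authority component names the profile.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_msc_url("msc://my-profile/datasets/train.bin"))
```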

rm_file(path: str, **kwargs: Any)[source]

Removes a file.

Parameters:
  • path (str) – The file or directory path to remove.

  • kwargs (Any) – Additional arguments for remove functionality.
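
The file-level methods above (pipe_file, open, rm_file) compose into a simple write, read, and remove round trip. The sketch below takes any object exposing those method names. InMemoryFS is a hypothetical stand-in included only so the example runs without a configured storage profile; it is not part of the library, and it assumes the object returned by open can be used as a context manager.

```python
import io


class InMemoryFS:
    """Hypothetical stand-in that mimics the method names documented above."""

    def __init__(self):
        self._files = {}

    def pipe_file(self, path, value, **kwargs):
        self._files[path] = bytes(value)

    def open(self, path, mode="rb", **kwargs):
        # Read-only for brevity; returns an in-memory binary stream.
        return io.BytesIO(self._files[path])

    def ls(self, path, detail=True, **kwargs):
        prefix = path.rstrip("/") + "/"
        names = sorted(name for name in self._files if name.startswith(prefix))
        if detail:
            return [{"name": n, "size": len(self._files[n]), "type": "file"} for n in names]
        return names

    def rm_file(self, path, **kwargs):
        del self._files[path]


def write_read_remove(fs, path, payload):
    """Write bytes, read them back, then delete the file."""
    fs.pipe_file(path, payload)
    with fs.open(path, mode="rb") as handle:
        data = handle.read()
    fs.rm_file(path)
    return data
```

Passing the filesystem in as an argument keeps the helper independent of how the filesystem object is constructed.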

NumPy

multistorageclient.contrib.numpy.load(*args: Any, **kwargs: Any) ndarray | dict[str, ndarray] | NpzFile[source]

Adapt numpy.load.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

ndarray | dict[str, ndarray] | NpzFile

multistorageclient.contrib.numpy.memmap(*args: Any, **kwargs: Any) memmap[source]

Adapt numpy.memmap.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

memmap

multistorageclient.contrib.numpy.save(*args: Any, **kwargs: Any) None[source]

Adapt numpy.save.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

None

PyTorch

class multistorageclient.contrib.torch.MultiStorageFileSystem[source]

A filesystem implementation that uses the MultiStoragePath class to handle paths.

concat_path(path: str | PathLike, suffix: str) str | PathLike[source]
Parameters:
  • path (str | PathLike)

  • suffix (str)

Return type:

str | PathLike

create_stream(path: str | PathLike, mode: str) Generator[IOBase, None, None][source]
Parameters:
  • path (str | PathLike)

  • mode (str)

Return type:

Generator[IOBase, None, None]

exists(path: str | PathLike) bool[source]
Parameters:

path (str | PathLike)

Return type:

bool

init_path(path: str | PathLike) str | PathLike[source]
Parameters:

path (str | PathLike)

Return type:

str | PathLike

ls(path: str | PathLike) list[str][source]
Parameters:

path (str | PathLike)

Return type:

list[str]

mkdir(path: str | PathLike) None[source]
Parameters:

path (str | PathLike)

Return type:

None

rename(path: str | PathLike, new_path: str | PathLike) None[source]
Parameters:
  • path (str | PathLike)

  • new_path (str | PathLike)

Return type:

None

rm_file(path: str | PathLike) None[source]
Parameters:

path (str | PathLike)

Return type:

None

classmethod validate_checkpoint_id(checkpoint_id: str | PathLike) bool[source]
Parameters:

checkpoint_id (str | PathLike)

Return type:

bool

class multistorageclient.contrib.torch.MultiStorageFileSystemReader(path: str | PathLike, thread_count: int = 1)[source]

A reader implementation that uses the MultiStorageFileSystem class to handle file system operations.

Initialize the MultiStorageFileSystemReader with the MultiStorageFileSystem.

Parameters:
  • path (str | PathLike) – The path to the checkpoint.

  • thread_count (int) – The number of threads to use for prefetching.

read_data(plan: LoadPlan, planner: LoadPlanner) Future[None][source]

Override the method to prefetch objects from object storage.

Parameters:
  • plan (LoadPlan)

  • planner (LoadPlanner)

Return type:

Future[None]

classmethod validate_checkpoint_id(checkpoint_id: str | PathLike) bool[source]

Check if the given checkpoint_id is supported by the storage. This allows us to enable automatic storage selection.

Parameters:

checkpoint_id (str | PathLike)

Return type:

bool

class multistorageclient.contrib.torch.MultiStorageFileSystemWriter(path: str | PathLike, single_file_per_rank: bool = True, sync_files: bool = True, thread_count: int = 1, per_thread_copy_ahead: int = 10000000, cache_staged_state_dict: bool = False, overwrite: bool = True)[source]

A writer implementation that uses the MultiStorageFileSystem class to handle file system operations.

Initialize the MultiStorageFileSystemWriter with the MultiStorageFileSystem.

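Parameters:
  • path (str | PathLike)

  • single_file_per_rank (bool)

  • sync_files (bool)

  • thread_count (int)

  • per_thread_copy_ahead (int)

  • cache_staged_state_dict (bool)

  • overwrite (bool)

classmethod validate_checkpoint_id(checkpoint_id: str | PathLike) bool[source]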

Check if the given checkpoint_id is supported by the storage. This allows us to enable automatic storage selection.

Parameters:

checkpoint_id (str | PathLike)

Return type:

bool
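
The reader and writer classes above are storage backends for PyTorch distributed checkpointing. The sketch below shows one way they might be passed to torch.distributed.checkpoint. It assumes a PyTorch version that provides dcp.save and dcp.load, and a checkpoint_url that points at a configured profile. The function names and the thread_count value are illustrative. Imports are deferred into the functions so the module can be imported without either package installed.

```python
from typing import Any


def save_checkpoint(state_dict: dict[str, Any], checkpoint_url: str) -> None:
    """Save a state dict through MultiStorageFileSystemWriter (sketch)."""
    import torch.distributed.checkpoint as dcp
    from multistorageclient.contrib.torch import MultiStorageFileSystemWriter

    writer = MultiStorageFileSystemWriter(checkpoint_url)
    dcp.save(state_dict, storage_writer=writer)


def load_checkpoint(state_dict: dict[str, Any], checkpoint_url: str) -> None:
    """Load into an existing state dict, in place, through MultiStorageFileSystemReader (sketch)."""
    import torch.distributed.checkpoint as dcp
    from multistorageclient.contrib.torch import MultiStorageFileSystemReader

    # thread_count controls prefetching, per the parameter description above; 4 is arbitrary.
    reader = MultiStorageFileSystemReader(checkpoint_url, thread_count=4)
    dcp.load(state_dict, storage_reader=reader)
```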

multistorageclient.contrib.torch.load(f: str | PathLike[str] | IO[bytes], *args: Any, **kwargs: Any) Any[source]

Adapt torch.load.

Parameters:
  • f (str | PathLike[str] | IO[bytes])

  • args (Any)

  • kwargs (Any)

Return type:

Any

multistorageclient.contrib.torch.save(obj: object, f: str | PathLike[str] | IO[bytes], *args: Any, **kwargs: Any) Any[source]

Adapt torch.save.

Parameters:
  • obj (object)

  • f (str | PathLike[str] | IO[bytes])

  • args (Any)

  • kwargs (Any)

Return type:

Any

Xarray

multistorageclient.contrib.xarray.open_zarr(*args: Any, **kwargs: Any) Dataset[source]

Adapt xarray.open_zarr to use multistorageclient.contrib.zarr.LazyZarrStore when path matches the msc protocol.

If the path starts with the MSC protocol, it uses multistorageclient.contrib.zarr.LazyZarrStore with a resolved storage client and prefix, passing msc_max_workers if provided. Otherwise, it directly calls xarray.open_zarr.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Dataset

Zarr

class multistorageclient.contrib.zarr.LazyZarrStore(storage_client: StorageClient, prefix: str = '', msc_max_workers: int | None = None)[source]
Parameters:
  • storage_client (StorageClient)

  • prefix (str)

  • msc_max_workers (int | None)

getitems(keys: Sequence[str], *, contexts: Any) Mapping[str, Any][source]

Retrieve data from multiple keys.

Parameters:
  • keys (Iterable[str]) – The keys to retrieve.

  • contexts (Mapping[str, Context]) – A mapping of keys to their context. Each context is a mapping of store-specific information. For example, a context could be a dict telling the store the preferred output array type: {"meta_array": cupy.empty(())}

Returns:

A collection mapping the input keys to their results.

Notes

This default implementation uses __getitem__() to read each key sequentially and ignores contexts. Override this method to implement concurrent reads of multiple keys and/or to utilize the contexts.

Return type:

Mapping[str, Any]
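
The notes for getitems mention concurrent reads of multiple keys. The sketch below illustrates that general pattern with a thread pool over any mapping-like store. It is not LazyZarrStore's actual implementation; getitems_concurrently and its max_workers default are hypothetical, and omitting missing keys from the result is an assumption about the desired behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Mapping, Sequence


def getitems_concurrently(
    store: Mapping[str, Any], keys: Sequence[str], max_workers: int = 8
) -> dict[str, Any]:
    """Fetch several keys from a store in parallel, omitting keys that are missing."""

    def fetch(key: str) -> tuple[str, Any, bool]:
        try:
            return key, store[key], True
        except KeyError:
            return key, None, False

    # Each lookup runs in a worker thread; map() preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, keys))
    return {key: value for key, value, found in results if found}
```

Threads suit this workload because object-storage reads are I/O-bound, so the lookups can overlap while waiting on the network.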

keys() Iterator[str][source]

Iterate over the keys in the store.

Return type:

Iterator[str]

multistorageclient.contrib.zarr.open_consolidated(*args: Any, **kwargs: Any) Group[source]

Adapt zarr.open_consolidated to use LazyZarrStore when path matches the msc protocol.

If the path starts with the MSC protocol, it uses LazyZarrStore with a resolved storage client and prefix, passing msc_max_workers if provided. Otherwise, it directly calls zarr.open_consolidated.

Parameters:
  • args (Any)

  • kwargs (Any)

Return type:

Group