distributed

Utility functions for using torch.distributed.

Classes

`DistributedProcessGroup`	A convenient wrapper around torch.distributed.ProcessGroup objects.
`ParallelState`	A class to manage various parallel groups such as data parallel, tensor parallel etc.

Functions

`backend`	Returns the distributed backend.
`barrier`	Synchronizes all processes.
`is_available`	Returns whether the distributed package is available.
`is_initialized`	Returns whether the distributed package is initialized.
`is_master`	Returns whether the current process is the master process.
`rank`	Returns the rank of the current process.
`size`	Returns the number of processes.

class DistributedProcessGroup

Bases: object

A convenient wrapper around torch.distributed.ProcessGroup objects.

__init__(group=None)

Initialize the distributed process group.

Parameters: group (ProcessGroup | int | None)

static get_dist_syncd_obj(obj, groups, op)

Get the distributed synchronized object across the specified distributed groups.

Parameters:

obj (Any)
groups (DistributedProcessGroup | list[DistributedProcessGroup])
op (Callable)
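The exact semantics of `get_dist_syncd_obj` are not spelled out above. The following is a minimal pure-Python model, assuming that for each group the object is gathered from every member rank and the gathered values are folded with `op` (here `op` is assumed to take the list of gathered values, e.g. `max` or `sum`), so that all members end up holding the same result. No communication happens: `simulate_syncd_obj`, the rank-to-value dictionary, and the rank layout are hypothetical stand-ins for real process groups, and each call below models one kind of group (tensor parallel, then data parallel) applied across all ranks.

```python
from typing import Any, Callable, Dict, List, Sequence


def simulate_syncd_obj(
    values: Dict[int, Any],
    groups: Sequence[Sequence[int]],
    op: Callable[[List[Any]], Any],
) -> Dict[int, Any]:
    """Model of the assumed semantics: every rank in a group replaces its
    value with op(values of all members of that group)."""
    current = dict(values)
    for group in groups:
        gathered = [current[r] for r in group]  # analogous to all_gather_object
        reduced = op(gathered)
        for r in group:
            current[r] = reduced
    return current


# Four ranks: tensor-parallel pairs {0,1}, {2,3}; data-parallel pairs {0,2}, {1,3}.
local_loss = {0: 1.0, 1: 4.0, 2: 2.0, 3: 3.0}
tp_groups = [[0, 1], [2, 3]]
dp_groups = [[0, 2], [1, 3]]

after_tp = simulate_syncd_obj(local_loss, tp_groups, max)
after_both = simulate_syncd_obj(after_tp, dp_groups, max)
print(after_tp)    # {0: 4.0, 1: 4.0, 2: 3.0, 3: 3.0}
print(after_both)  # {0: 4.0, 1: 4.0, 2: 4.0, 3: 4.0}
```

Syncing over both kinds of group in turn gives every rank the global maximum, which is why passing a list of groups can be useful.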

is_initialized()

Check if the distributed process group is initialized.

Return type: bool

rank()

Get the rank of the current process within this process group.

Return type: int

world_size()

Get the world size (number of processes) of this process group.

Return type: int
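A wrapper of this kind can be sketched on top of `torch.distributed`. This minimal sketch assumes that `None` selects the default (world) group, that the integer `-1` marks an unused group (the convention described for `ParallelState`), and that the methods fall back to single-process values (rank 0, world size 1) when the distributed package is missing or not initialized. `SketchProcessGroup` is a hypothetical name, not this module's class.

```python
from typing import Any, Optional

try:
    import torch.distributed as dist
except ImportError:  # PyTorch is not installed
    dist = None

UNUSED = -1  # assumed sentinel meaning "this group is not used"


class SketchProcessGroup:
    """Hypothetical stand-in for DistributedProcessGroup, not the real class."""

    def __init__(self, group: Optional[Any] = None) -> None:
        # None selects the default (world) group; UNUSED disables the group;
        # anything else is treated as a torch.distributed.ProcessGroup.
        self.group = group

    def _unused(self) -> bool:
        return isinstance(self.group, int) and self.group == UNUSED

    def is_initialized(self) -> bool:
        # An unused group is never initialized; otherwise defer to torch.distributed.
        if self._unused():
            return False
        return dist is not None and dist.is_available() and dist.is_initialized()

    def rank(self) -> int:
        # Rank of this process *within* the wrapped group (0 when not distributed).
        return dist.get_rank(group=self.group) if self.is_initialized() else 0

    def world_size(self) -> int:
        # Number of processes in the wrapped group (1 when not distributed).
        return dist.get_world_size(group=self.group) if self.is_initialized() else 1


pg = SketchProcessGroup()
print(pg.is_initialized(), pg.rank(), pg.world_size())  # False 0 1 in a plain process
```

Falling back to rank 0 and size 1 lets the same training script run unchanged on a single process.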

class ParallelState

Bases: object

A class to manage various parallel groups such as data parallel, tensor parallel etc.

Specify the parallel groups, each of type torch.distributed.ProcessGroup, for the current module. If a parallel group is not used, set it to -1. If a parallel group is None, the default PyTorch distributed process group is used, which spans the whole world (all processes).

__init__(data_parallel_group=None, tensor_parallel_group=-1)

Initialize the parallel state.

Parameters:

data_parallel_group (ProcessGroup | int | None)
tensor_parallel_group (ProcessGroup | int | None)
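The `None` / `-1` convention can be made concrete with a small resolver. This is a sketch based only on the description above: `-1` disables a parallel dimension and `None` means the default world group. `WORLD`, `is_used`, `effective_group`, and `SketchParallelState` are hypothetical names; a real implementation would pass the resolved group on to `torch.distributed` calls.

```python
from dataclasses import dataclass
from typing import Any, Dict

UNUSED = -1      # parallel dimension not used
WORLD = "WORLD"  # hypothetical marker for the default (whole-world) group


def is_used(group: Any) -> bool:
    # -1 disables a parallel dimension entirely.
    return not (isinstance(group, int) and group == UNUSED)


def effective_group(group: Any) -> Any:
    # None falls back to the default group covering every process.
    return WORLD if group is None else group


@dataclass
class SketchParallelState:
    # Same defaults as the ParallelState signature above.
    data_parallel_group: Any = None
    tensor_parallel_group: Any = UNUSED

    def active_groups(self) -> Dict[str, Any]:
        # Only the parallel dimensions that are actually in use.
        dims = {
            "data_parallel": self.data_parallel_group,
            "tensor_parallel": self.tensor_parallel_group,
        }
        return {name: effective_group(g) for name, g in dims.items() if is_used(g)}


print(SketchParallelState().active_groups())  # {'data_parallel': 'WORLD'}
```

With the default arguments, only data parallelism over the whole world is active; passing `tensor_parallel_group=None` would enable tensor parallelism over the world group as well.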

backend()

Returns the distributed backend.

Return type: str | None

barrier(group=None)

Synchronizes all processes.

Return type: None

is_available()

Returns whether the distributed package is available.

Return type: bool

is_initialized()

Returns whether the distributed package is initialized.

Return type: bool
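These two checks are usually combined into a guard before any collective call: a PyTorch build may lack the distributed package entirely (`is_available()` is false), or it may have it without `init_process_group` having been called yet. The sketch below uses only `torch.distributed` calls, with a fallback when PyTorch itself is missing. It also shows one plausible reading of `backend()` returning `None` outside a distributed run; that is an assumption, consistent with its `str | None` return type. All three functions are hypothetical re-implementations, not this module's code.

```python
from typing import Optional

try:
    import torch.distributed as dist
except ImportError:  # PyTorch is not installed
    dist = None


def is_available() -> bool:
    # False if PyTorch is missing or was built without distributed support.
    return dist is not None and dist.is_available()


def is_initialized() -> bool:
    # Only call dist.is_initialized() when the package is available;
    # builds without distributed support may not define it.
    return is_available() and dist.is_initialized()


def backend() -> Optional[str]:
    # e.g. "nccl" or "gloo" once a default process group exists, otherwise None.
    return str(dist.get_backend()) if is_initialized() else None


print(is_initialized(), backend())  # False None in a freshly started process
```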

is_master(group=None)

Returns whether the current process is the master process.

Return type: bool

rank(group=None)

Returns the rank of the current process.

Return type: int

size(group=None)

Returns the number of processes.

Return type: int
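A typical use of `rank`, `size`, and `is_master` together is to give each process a disjoint slice of the work and to let only one process log or save checkpoints. The sketch below reimplements the three helpers on `torch.distributed` under two assumptions: they fall back to rank 0 and size 1 when no process group is initialized, and "master" means rank 0 of the given group. `shard_for` and `shard` are hypothetical helpers for illustration.

```python
from typing import Any, List, Optional, Sequence

try:
    import torch.distributed as dist
except ImportError:  # PyTorch is not installed
    dist = None


def _initialized() -> bool:
    return dist is not None and dist.is_available() and dist.is_initialized()


def rank(group: Optional[Any] = None) -> int:
    return dist.get_rank(group=group) if _initialized() else 0


def size(group: Optional[Any] = None) -> int:
    return dist.get_world_size(group=group) if _initialized() else 1


def is_master(group: Optional[Any] = None) -> bool:
    return rank(group) == 0


def shard_for(items: Sequence[Any], r: int, n: int) -> List[Any]:
    # Round-robin split: process r of n takes items r, r + n, r + 2n, ...
    return list(items[r::n])


def shard(items: Sequence[Any], group: Optional[Any] = None) -> List[Any]:
    return shard_for(items, rank(group), size(group))


print(shard_for(list(range(10)), r=1, n=4))  # [1, 5, 9]
print(shard(list(range(10))))                # all 10 items in a single process
if is_master():
    print("only rank 0 writes checkpoints and logs")
```

The round-robin split gives every item to exactly one process, and shard sizes differ by at most one.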