distributed

Utility functions for using torch.distributed.

Classes

ParallelState

A class to manage parallel groups, such as the data parallel and tensor parallel groups.

DistributedProcessGroup

A convenient wrapper around torch.distributed.ProcessGroup objects.

Functions

backend

Returns the distributed backend.

barrier

Synchronizes all processes.

is_available

Returns whether the distributed package is available.

is_initialized

Returns whether the distributed package is initialized.

is_master

Returns whether the current process is the master process.

rank

Returns the rank of the current process.

size

Returns the number of processes.

class DistributedProcessGroup

Bases: object

A convenient wrapper around torch.distributed.ProcessGroup objects.

__init__(group=None)

Initialize the distributed process group.

Parameters:

group (ProcessGroup | int | None) – The process group to wrap.

static get_dist_syncd_obj(obj, groups, op)

Get the distributed synchronized object across the specified distributed groups.

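Parameters:
  • obj – The object to synchronize.

  • groups – The distributed groups to synchronize across.

  • op – The operation used to synchronize the object.
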
is_initialized()

Check if the distributed process group is initialized.

Return type:

bool

rank()

Get the rank of the current process within the process group.

Return type:

int

world_size()

Get the world size (the number of processes) of the process group.

Return type:

int
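
The wrapper methods above can be sketched as follows. This is a minimal illustration, not the library's implementation: GroupWrapperSketch and its sync method are hypothetical names, and several behaviors are assumptions of the sketch (that a group of -1 means "not used", that -1 is returned when the group is unusable, and that op is a callable that reduces the list of objects gathered from all ranks). Only standard torch.distributed calls are used, and the code falls back to single-process behavior when torch is missing or the process group is not initialized.

```python
try:
    import torch.distributed as dist
except ImportError:  # torch is not installed: nothing can be initialized
    dist = None


def _dist_ready() -> bool:
    """True only if torch.distributed is importable, available, and initialized."""
    return dist is not None and dist.is_available() and dist.is_initialized()


class GroupWrapperSketch:
    """Hypothetical stand-in for a process-group wrapper (not the library's class)."""

    def __init__(self, group=None):
        # Assumption: None selects the default (world) group, -1 means "not used".
        self.group = group

    def is_initialized(self) -> bool:
        unused = isinstance(self.group, int) and self.group == -1
        return _dist_ready() and not unused

    def rank(self) -> int:
        # Assumption: report -1 when the group is unusable.
        return dist.get_rank(group=self.group) if self.is_initialized() else -1

    def world_size(self) -> int:
        # Assumption: report -1 when the group is unusable.
        return dist.get_world_size(group=self.group) if self.is_initialized() else -1

    @staticmethod
    def sync(obj, groups, op):
        """Gather obj from every rank of each usable group, then reduce it with op."""
        for g in groups:
            if g.is_initialized():
                gathered = [None] * g.world_size()
                dist.all_gather_object(gathered, obj, group=g.group)
                obj = op(gathered)
        return obj
```

In a single process without an initialized process group, sync returns the object unchanged, so the same call site works with and without distributed training.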

class ParallelState

Bases: object

A class to manage parallel groups, such as the data parallel and tensor parallel groups.

Specify the parallel groups of type torch.distributed.ProcessGroup for the current module. If a parallel group is not used, set it to -1. If a parallel group is None, the default PyTorch distributed process group (the whole world) is used.

__init__(data_parallel_group=None, tensor_parallel_group=-1)

Initialize the parallel state.

Parameters:
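  • data_parallel_group (ProcessGroup | int | None) – The data parallel group. None selects the default process group; -1 marks the group as unused.

  • tensor_parallel_group (ProcessGroup | int | None) – The tensor parallel group, with the same None and -1 conventions.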
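
The None and -1 conventions, together with the default arguments in the signature above, can be illustrated with a small pure-Python sketch. ParallelStateSketch, describe, and UNUSED are hypothetical names introduced for illustration; the sketch only classifies the arguments and does not create or query any real process group.

```python
from dataclasses import dataclass
from typing import Any

UNUSED = -1  # sentinel from the text: the parallel group is not used


def describe(group: Any) -> str:
    """Classify a group argument according to the documented convention."""
    if group is None:
        return "default"  # the default PyTorch process group (whole world)
    if isinstance(group, int) and group == UNUSED:
        return "unused"
    return "explicit"  # a user-supplied process group object


@dataclass
class ParallelStateSketch:
    """Hypothetical container mirroring the documented defaults."""

    data_parallel_group: Any = None
    tensor_parallel_group: Any = UNUSED

    def summary(self) -> dict:
        return {
            "data_parallel": describe(self.data_parallel_group),
            "tensor_parallel": describe(self.tensor_parallel_group),
        }


print(ParallelStateSketch().summary())
```

With the defaults, data parallelism spans the whole world and tensor parallelism is disabled, which matches a plain data-parallel setup.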

backend()

Returns the distributed backend.

Return type:

str | None

barrier(group=None)

Synchronizes all processes.

Return type:

None

is_available()

Returns whether the distributed package is available.

Return type:

bool

is_initialized()

Returns whether the distributed package is initialized.

Return type:

bool

is_master(group=None)

Returns whether the current process is the master process.

Return type:

bool
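
A common use of a master check is to run a side effect, such as writing a checkpoint, on one process only and then make the other processes wait. The sketch below shows that pattern with standard torch.distributed calls. run_on_master is a hypothetical helper, and treating rank 0 as the master process is an assumption of this sketch, not something the text above states.

```python
try:
    import torch.distributed as dist
except ImportError:  # torch is not installed: behave as a single process
    dist = None


def _initialized() -> bool:
    """True only if torch.distributed is importable, available, and initialized."""
    return dist is not None and dist.is_available() and dist.is_initialized()


def run_on_master(action, group=None):
    """Run action() on rank 0 only, then have every rank wait until it is done."""
    result = None
    # Assumption: rank 0 is the master; a lone process always counts as master.
    if not _initialized() or dist.get_rank(group=group) == 0:
        result = action()
    if _initialized():
        dist.barrier(group=group)  # non-master ranks block here until rank 0 arrives
    return result
```

Non-master ranks receive None, so a result that every rank needs would still have to be broadcast separately.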

rank(group=None)

Returns the rank of the current process.

Return type:

int

size(group=None)

Returns the number of processes.

Return type:

int
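
Helpers of this kind are often thin wrappers over torch.distributed that degrade gracefully in a single-process run. The following is a sketch under that assumption, not the module's actual code: the fallback values (rank 0, size 1, backend None) and the rank-0 definition of "master" are choices made for illustration.

```python
try:
    import torch.distributed as dist
except ImportError:  # torch is not installed: behave as a single process
    dist = None


def is_available() -> bool:
    return dist is not None and dist.is_available()


def is_initialized() -> bool:
    return is_available() and dist.is_initialized()


def backend():
    # None signals that no distributed backend is active.
    return dist.get_backend() if is_initialized() else None


def rank(group=None) -> int:
    # Assumption: a non-distributed run is treated as rank 0.
    return dist.get_rank(group=group) if is_initialized() else 0


def size(group=None) -> int:
    # Assumption: a non-distributed run is treated as a world of one process.
    return dist.get_world_size(group=group) if is_initialized() else 1


def is_master(group=None) -> bool:
    # Assumption: the master process is the one with rank 0.
    return rank(group=group) == 0


def barrier(group=None) -> None:
    # Nothing to synchronize when there is only one process.
    if size(group=group) > 1:
        dist.barrier(group=group)
```

With fallbacks like these, calling code does not need to branch on whether it was started by a distributed launcher.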