distributed
Utility functions for using torch.distributed.
Classes
ParallelState | A class to manage various parallel groups such as data parallel, tensor parallel, etc.
DistributedProcessGroup | A convenient wrapper around torch.distributed.ProcessGroup objects.
Functions
backend | Returns the distributed backend.
barrier | Synchronizes all processes.
is_available | Returns whether the distributed package is available.
is_initialized | Returns whether the distributed package is initialized.
is_master | Returns whether the current process is the master process.
rank | Returns the rank of the current process.
size | Returns the number of processes.
- class DistributedProcessGroup
Bases:
object
A convenient wrapper around torch.distributed.ProcessGroup objects.
- __init__(group=None)
Initialize the distributed process group.
- Parameters:
group (ProcessGroup | int | None) –
- static get_dist_syncd_obj(obj, groups, op)
Get the distributed synchronized object across the specified distributed groups.
- Parameters:
obj (Any) –
groups (DistributedProcessGroup | list[DistributedProcessGroup]) –
op (Callable) –
- is_initialized()
Check if the distributed process group is initialized.
- Return type:
bool
- rank()
Get the rank of the current process within this process group.
- Return type:
int
- world_size()
Get the world size of the current process group.
- Return type:
int
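The wrapper methods above can be illustrated with a small standalone sketch. It is not the library's implementation: `GroupHandle` and `sync_object` are hypothetical names, the `-1` "unused" sentinel is assumed to carry over from the ParallelState description, and the single-process fallbacks (rank 0, world size 1) are assumptions. The only real calls used are `torch.distributed.get_rank`, `get_world_size`, and `all_gather_object`.

```python
from typing import Any, Callable, List, Union

try:  # torch is optional so the sketch also runs where it is not installed
    import torch.distributed as dist
except ImportError:
    dist = None


class GroupHandle:
    """Hypothetical stand-in for a process-group wrapper (not the real class)."""

    def __init__(self, group: Any = None):
        # Assumed convention: None -> default (world) group, -1 -> group not used.
        self.group = group

    def is_initialized(self) -> bool:
        if isinstance(self.group, int) and self.group == -1:
            return False
        return dist is not None and dist.is_available() and dist.is_initialized()

    def rank(self) -> int:
        # Assumed fallback: behave like a single process when nothing is initialized.
        return dist.get_rank(group=self.group) if self.is_initialized() else 0

    def world_size(self) -> int:
        return dist.get_world_size(group=self.group) if self.is_initialized() else 1


def sync_object(
    obj: Any,
    groups: Union[GroupHandle, List[GroupHandle]],
    op: Callable[[list], Any],
) -> Any:
    """Gather obj from every rank of each group in turn and reduce it with op."""
    if isinstance(groups, GroupHandle):
        groups = [groups]
    for handle in groups:
        if not handle.is_initialized():
            continue  # nothing to synchronize for unused or uninitialized groups
        gathered = [None] * handle.world_size()
        dist.all_gather_object(gathered, obj, group=handle.group)
        obj = op(gathered)
    return obj
```

In a single, uninitialized process, `sync_object(3, GroupHandle(), max)` simply returns the input unchanged. With an initialized group, `op` receives a list holding one object per rank.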
- class ParallelState
Bases:
object
A class to manage various parallel groups such as data parallel, tensor parallel etc.
Specify the parallel groups of type torch.distributed.ProcessGroup for the current module. If a parallel group is not used, set it to -1. If a parallel group is None, the default PyTorch distributed process group, which spans the whole world, is used.
- __init__(data_parallel_group=None, tensor_parallel_group=-1)
Initialize the parallel state.
- Parameters:
data_parallel_group (ProcessGroup | int | None) –
tensor_parallel_group (ProcessGroup | int | None) –
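The `-1` versus `None` convention described for the two group arguments can be made concrete with a short sketch. `describe_group` and `ParallelConfigSketch` are hypothetical names introduced only for illustration. They mirror the documented defaults (`data_parallel_group=None`, `tensor_parallel_group=-1`) without touching any real process group.

```python
from dataclasses import dataclass
from typing import Any, Dict


def describe_group(group: Any) -> str:
    """Classify a group argument according to the documented convention."""
    if group is None:
        return "default (world) group"
    if isinstance(group, int) and group == -1:
        return "unused"
    return "explicit process group"


@dataclass
class ParallelConfigSketch:
    """Hypothetical container mirroring the documented __init__ defaults."""

    data_parallel_group: Any = None   # None -> default PyTorch process group
    tensor_parallel_group: Any = -1   # -1 -> tensor parallelism not used

    def summary(self) -> Dict[str, str]:
        return {
            "data_parallel": describe_group(self.data_parallel_group),
            "tensor_parallel": describe_group(self.tensor_parallel_group),
        }


print(ParallelConfigSketch().summary())
```

With the defaults, data parallelism resolves to the whole world and tensor parallelism is reported as unused. Passing an actual `ProcessGroup` for either field would be classified as an explicit group.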
- backend()
Returns the distributed backend.
- Return type:
str | None
- barrier(group=None)
Synchronizes all processes.
- Return type:
None
- is_available()
Returns whether the distributed package is available.
- Return type:
bool
- is_initialized()
Returns whether the distributed package is initialized.
- Return type:
bool
- is_master(group=None)
Returns whether the current process is the master process.
- Return type:
bool
- rank(group=None)
Returns the rank of the current process.
- Return type:
int
- size(group=None)
Returns the number of processes.
- Return type:
int
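The module-level helpers listed above could plausibly be thin wrappers over `torch.distributed`. The sketch below is an assumption about their behavior, not the module's actual source. In particular, the fallbacks for an uninitialized process (rank 0, size 1, backend `None`, no-op barrier) and the value returned by `backend()` are guesses consistent with the documented return types.

```python
from typing import Optional

try:  # torch is optional so the sketch also runs where it is not installed
    import torch.distributed as dist
except ImportError:
    dist = None


def is_available() -> bool:
    return dist is not None and dist.is_available()


def is_initialized() -> bool:
    return is_available() and dist.is_initialized()


def backend() -> Optional[str]:
    # Assumption: report the torch.distributed backend name, or None if not initialized.
    return str(dist.get_backend()) if is_initialized() else None


def rank(group=None) -> int:
    # Assumed fallback: a lone, uninitialized process acts as rank 0.
    return dist.get_rank(group=group) if is_initialized() else 0


def size(group=None) -> int:
    # Assumed fallback: a lone, uninitialized process has world size 1.
    return dist.get_world_size(group=group) if is_initialized() else 1


def is_master(group=None) -> bool:
    return rank(group=group) == 0


def barrier(group=None) -> None:
    # Only block when there is actually something to synchronize with.
    if is_initialized():
        dist.barrier(group=group)


if is_master():
    print(f"rank {rank()} of {size()}, backend={backend()}")
```

Guarding every call with `is_initialized()` lets the same script run unchanged both under a distributed launcher and as an ordinary single-process program.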