Documentation contents:
API documentation
CallWrapper
CallWrapper.atomic()
CallWrapper.disable_hang_protection()
CallWrapper.iteration
CallWrapper.ping()
Wrapper
Initialize
Initialize.__call__()
RetryController
Abort
Abort.__call__()
AbortPersistentCheckpointProcesses
AbortTorchDistributed
AbortTransformerEngine
Finalize
Finalize.__call__()
ThreadedFinalize
HealthCheck
HealthCheck.__call__()
ChainedGPUHealthCheck
ChainedNVLHealthCheck
ChainedNicHealthCheck
CudaHealthCheck
FaultCounter
FaultCounterExceeded
HealthCheckError
InternalError
RestartAbort
RestartError
TimeoutError