Exceptions
- exception nvidia_resiliency_ext.inprocess.exception.HealthCheckError
RestartError exception to indicate that inprocess.health_check.HealthCheck raised errors, and execution shouldn’t be restarted on this distributed rank.
- exception nvidia_resiliency_ext.inprocess.exception.InternalError
inprocess.Wrapper internal error.
- exception nvidia_resiliency_ext.inprocess.exception.RestartAbort
A terminal Python BaseException indicating that the inprocess.Wrapper should be aborted immediately, bypassing any further restart attempts.
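
Because RestartAbort is documented as a BaseException, a broad `except Exception` handler in user code would not be expected to swallow it. The sketch below illustrates that Python behavior with local stand-in classes. It does not import the library, the classes are not the library's definitions, and the base class chosen for the stand-in RestartError, along with the helpers `run_step` and `classify`, are assumptions made only for this example.

```python
# Stand-in classes for illustration only; they mimic the documented
# relationships but are NOT the library's actual definitions.
class RestartError(Exception):  # assumption: modeled here as an Exception
    pass


class HealthCheckError(RestartError):  # documented as a RestartError
    pass


class RestartAbort(BaseException):  # documented as a BaseException
    pass


def run_step(exc):
    """Hypothetical user step that raises `exc` inside a broad handler."""
    try:
        raise exc
    except Exception as err:
        # `except Exception` matches Exception subclasses only.
        return f"handled {type(err).__name__}"


def classify(exc):
    """Report whether `exc` was handled in run_step or propagated out of it."""
    try:
        return run_step(exc)
    except BaseException as err:
        return f"propagated {type(err).__name__}"


print(classify(HealthCheckError()))
print(classify(RestartAbort()))
```

In this sketch the Exception-derived stand-in is caught by the broad handler, while the BaseException-derived stand-in passes through it and reaches the outer caller, which is the property that lets an abort signal bypass ordinary error handling.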