Exceptions

exception nvidia_resiliency_ext.inprocess.exception.HealthCheckError

A RestartError raised to indicate that inprocess.health_check.HealthCheck raised errors and that execution should not be restarted on this distributed rank.

exception nvidia_resiliency_ext.inprocess.exception.InternalError

inprocess.Wrapper internal error.

exception nvidia_resiliency_ext.inprocess.exception.RestartAbort

A terminal Python BaseException indicating that the inprocess.Wrapper should be aborted immediately, bypassing any further restart attempts.
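Because RestartAbort derives from BaseException rather than Exception, an ordinary `except Exception` handler will not swallow it. The sketch below illustrates this with local stand-in classes that only mirror the documented base classes; in real code the classes would be imported from nvidia_resiliency_ext.inprocess.exception, and the helpers `run`, `raise_restart_error` and `raise_abort` are hypothetical.

```python
# Stand-in classes mirroring the documented base classes (not the real library).
class RestartError(Exception):
    """Stand-in: documented as the base Exception for inprocess.Wrapper errors."""


class RestartAbort(BaseException):
    """Stand-in: documented as a terminal BaseException."""


def run(step):
    """Hypothetical helper: call `step` and swallow ordinary exceptions."""
    try:
        step()
    except Exception as exc:
        # RestartError (an Exception subclass) is caught here...
        return f"handled {type(exc).__name__}"
    # ...but RestartAbort is a BaseException, so it propagates past the handler.
    return "ok"


def raise_restart_error():
    raise RestartError("recoverable")


def raise_abort():
    raise RestartAbort("stop now")


print(run(raise_restart_error))  # handled RestartError
try:
    run(raise_abort)
except RestartAbort:
    print("RestartAbort escaped `except Exception`")
```

A consequence of this design is that cleanup code written with broad `except Exception` clauses cannot accidentally turn an abort request into a retry.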

exception nvidia_resiliency_ext.inprocess.exception.RestartError

Base Exception for exceptions raised by inprocess.Wrapper.
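Since RestartError is the common base, a single `except RestartError` clause can handle every wrapper error that derives from it. The sketch below uses local stand-in classes; HealthCheckError is documented as a RestartError, while InternalError deriving from RestartError is an assumption here, and the `describe` helper is hypothetical.

```python
# Stand-ins for the documented exception classes (not the real library).
class RestartError(Exception):
    """Stand-in: base Exception for inprocess.Wrapper errors."""


class HealthCheckError(RestartError):
    """Stand-in: documented as a RestartError."""


class InternalError(RestartError):
    """Stand-in: base class assumed, not stated on this page."""


def describe(exc):
    """Hypothetical helper: report which handler catches `exc`."""
    try:
        raise exc
    except RestartError as err:
        # One clause covers every RestartError subclass.
        return f"wrapper error: {type(err).__name__}"
    except Exception:
        return "unrelated error"


print(describe(HealthCheckError("health check failed")))  # wrapper error: HealthCheckError
print(describe(InternalError("unexpected state")))        # wrapper error: InternalError
print(describe(ValueError("bad input")))                  # unrelated error
```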

exception nvidia_resiliency_ext.inprocess.exception.TimeoutError

inprocess.Wrapper timeout error.
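This class shares its name with Python's built-in TimeoutError, so `from nvidia_resiliency_ext.inprocess.exception import TimeoutError` shadows the builtin in the importing module. The sketch below assumes, without confirmation from this page, that the wrapper's TimeoutError derives from RestartError rather than from the builtin; under that assumption a handler for the builtin does not catch it. The classes are local stand-ins and `which_handler` is a hypothetical helper.

```python
import builtins


class RestartError(Exception):
    """Stand-in for the documented base class."""


# Stand-in: deliberately shadows the builtin name, as an unqualified import would.
# Deriving from RestartError (not builtins.TimeoutError) is an assumption.
class TimeoutError(RestartError):
    """Stand-in for inprocess.exception.TimeoutError."""


def which_handler(exc):
    """Hypothetical helper: report which except clause catches `exc`."""
    try:
        raise exc
    except builtins.TimeoutError:
        return "builtin TimeoutError handler"
    except RestartError:
        return "RestartError handler"


print(which_handler(TimeoutError("wrapper timed out")))       # RestartError handler
print(which_handler(builtins.TimeoutError("socket timeout")))  # builtin TimeoutError handler
```

Importing the module instead, for example `from nvidia_resiliency_ext.inprocess import exception` and then referring to `exception.TimeoutError`, keeps both names available and makes it explicit which one a handler targets.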