Exceptions
- exception nvidia_resiliency_ext.inprocess.exception.HealthCheckError
A RestartError exception indicating that inprocess.health_check.HealthCheck raised errors and that execution shouldn’t be restarted on this distributed rank.
- exception nvidia_resiliency_ext.inprocess.exception.InternalError
inprocess.Wrapper internal error.
- exception nvidia_resiliency_ext.inprocess.exception.RestartAbort
A terminal Python BaseException indicating that the inprocess.Wrapper should be aborted immediately, bypassing any further restart attempts.
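
The practical consequence of RestartAbort deriving from BaseException rather than Exception can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins defined locally, not imports from nvidia_resiliency_ext; they only mirror the hierarchy described above, and the assumption that RestartError derives from Exception is ours. The helper run_once is likewise illustrative and is not part of the library.

```python
# Stand-in classes mirroring the documented hierarchy (NOT the library's own).
class RestartError(Exception):
    """Assumed here to derive from Exception."""


class HealthCheckError(RestartError):
    """Stand-in: a health check raised errors on this rank."""


class RestartAbort(BaseException):
    """Stand-in: terminal signal; derives from BaseException as documented."""


def run_once(fn):
    """Run fn and report which handler intercepted its exception, if any."""
    try:
        fn()
    except Exception as exc:
        # Ordinary errors, including HealthCheckError, land here.
        return f"caught by 'except Exception': {type(exc).__name__}"
    except BaseException as exc:
        # RestartAbort skips the handler above and is only seen here.
        return f"escaped 'except Exception': {type(exc).__name__}"
    return "ok"


def raise_health_check():
    raise HealthCheckError("health check failed")


def raise_abort():
    raise RestartAbort("abort requested")


results = [run_once(raise_health_check), run_once(raise_abort)]
for line in results:
    print(line)
```

Because a generic `except Exception` handler in user code does not intercept a BaseException subclass, an abort signal of this kind is not swallowed accidentally by broad error handling inside the wrapped function.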