nvidia-resiliency-ext


API documentation

  • Wrapper
  • Compose
  • State
  • Rank Assignment
    • Rank Assignment
      • Base class
      • Tree
      • Composable Rank Assignments
    • Rank Filtering
      • Base class
      • Rank Filters
  • Rank Filter
  • Initialize
  • Abort
  • Finalize
    • Finalize
      • Finalize.__call__()
    • ThreadedFinalize
  • Health Check
    • Enhanced Health Check Features
  • Exceptions
    • HealthCheckError
    • InternalError
    • RestartAbort
    • RestartError
    • TimeoutError

© Copyright 2024, NVIDIA Corporation.
