nvidia-resiliency-ext

Documentation contents:

  • Fault Tolerance
  • Inprocess Restart
  • Async Checkpointing
  • Local Checkpointing
  • Straggler Detection
    • Usage guide
    • API documentation
      • Straggler
      • Reporting
      • Statistics
      • Callback
    • Examples
nvidia-resiliency-ext
  • Straggler Detection
  • API documentation
  • View page source

API documentation

API documentation

  • Straggler
  • Reporting
  • Statistics
  • Callback
    • StragglerDetectionCallback
      • StragglerDetectionCallback.on_train_batch_end()
      • StragglerDetectionCallback.setup()
      • StragglerDetectionCallback.teardown()
Previous Next

© Copyright 2024, NVIDIA Corporation.

Built with Sphinx using a theme provided by Read the Docs.