nvidia-resiliency-ext

Documentation contents:

  • Fault Tolerance
  • Inprocess Restart
  • Async Checkpointing
  • Local Checkpointing
  • Straggler Detection
    • Usage guide
    • API documentation
      • Straggler
      • Reporting
      • Statistics
      • Callback
    • Examples
nvidia-resiliency-ext
  • Straggler Detection
  • API documentation
  • View page source

API documentation

API documentation

  • Straggler
    • CallableId
      • CallableId.obj
      • CallableId.name
      • CallableId.__str__()
    • CustomSection
    • Detector
      • Detector.initialized
      • Detector.scores_to_compute
      • Detector.gather_on_rank0
      • Detector.profiling_interval
      • Detector.report_time_interval
      • Detector.custom_sections
      • Detector.cupti_manager
      • Detector.reporter
      • Detector.report_interval_tracker
      • Detector.original_callables
      • Detector.detection_section()
      • Detector.generate_report()
      • Detector.generate_report_if_interval_elapsed()
      • Detector.initialize()
      • Detector.is_interval_elapsed()
      • Detector.restore_original_callables()
      • Detector.shutdown()
      • Detector.wrap_callables()
  • Reporting
    • Report
      • Report.identify_stragglers()
    • ReportGenerator
      • ReportGenerator.generate_report()
    • StragglerId
  • Statistics
    • Statistic
  • Callback
Previous Next

© Copyright 2024, NVIDIA Corporation.

Built with Sphinx using a theme provided by Read the Docs.