memory_monitor

GPU Memory Monitoring Utilities.

This module provides utilities for monitoring GPU memory usage in real time using the NVIDIA Management Library (NVML). It includes a GPUMemoryMonitor class that tracks peak memory usage on every available GPU and can be started and stopped while sampling in a background thread.

Classes:: GPUMemoryMonitor: A class that monitors GPU memory usage and tracks peak memory consumption.
Functions:: launch_memory_monitor: Helper function to create and start a GPU memory monitor instance.

Example

>>> monitor = launch_memory_monitor(monitor_interval=1.0)
>>> # Run your GPU operations
>>> monitor.stop()  # Will print peak memory usage per GPU

Note

This module requires the NVIDIA Management Library (NVML) through the pynvml package. It automatically initializes NVML when creating a monitor instance and shuts it down when monitoring is stopped.
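The NVML lifecycle described in this note can be sketched as a one-shot query. This is a minimal illustration, not the module's implementation: `read_used_bytes` is a hypothetical helper, and taking the `pynvml` module as an argument is an assumption made so the function can also run against a stand-in object on machines without a GPU. Unlike the monitor, which keeps NVML open between start and stop, this sketch initializes and shuts down NVML on every call.

```python
from typing import Any, Dict


def read_used_bytes(nvml: Any) -> Dict[int, int]:
    """Return {gpu_index: used_bytes} for every GPU visible to NVML.

    `nvml` is expected to be the imported `pynvml` module (or any object
    exposing the same functions, which is an assumption of this sketch).
    """
    nvml.nvmlInit()
    try:
        used: Dict[int, int] = {}
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            # nvmlDeviceGetMemoryInfo reports total, free and used bytes.
            used[index] = nvml.nvmlDeviceGetMemoryInfo(handle).used
        return used
    finally:
        # Always release NVML, even if one of the queries fails.
        nvml.nvmlShutdown()
```

On a machine with an NVIDIA driver and pynvml installed, this would be called as `read_used_bytes(pynvml)` after `import pynvml`.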

Dependencies:

pynvml: For accessing NVIDIA GPU metrics
threading: For running the monitor in a background thread
atexit: For ensuring proper cleanup when the program exits

Classes

GPUMemoryMonitor

GPU Memory Monitor for tracking NVIDIA GPU memory usage.

Functions

launch_memory_monitor

Launch a GPU memory monitor in a separate thread.

class GPUMemoryMonitor

Bases: object

GPU Memory Monitor for tracking NVIDIA GPU memory usage.

This class provides functionality to monitor and track peak memory usage across all available NVIDIA GPUs on the system. It runs in a separate thread and periodically samples memory usage.
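Peak tracking from periodic samples amounts to keeping a running maximum per GPU. The sketch below shows that idea in isolation; `update_peaks` is a hypothetical name, and the sample values are made up for illustration rather than measured.

```python
from typing import Dict


def update_peaks(peak_memory: Dict[int, int], sample: Dict[int, int]) -> None:
    """Raise each GPU's recorded peak to the sampled value when it is higher."""
    for index, used in sample.items():
        peak_memory[index] = max(peak_memory.get(index, 0), used)


peaks: Dict[int, int] = {}
# Two illustrative samples: {gpu_index: used_bytes}.
for sample in ({0: 100, 1: 50}, {0: 80, 1: 70}):
    update_peaks(peaks, sample)
print(peaks)  # {0: 100, 1: 70}
```

Because the value is only observed once per interval, a spike that begins and ends between two samples would not appear in the recorded peak; a shorter interval narrows that gap at the cost of more frequent queries.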

__init__(monitor_interval=10.0)

Initialize an NVIDIA GPU memory monitor.

The monitor samples the memory usage of NVIDIA GPUs at the specified interval. The constructor initializes the NVIDIA Management Library (NVML) and queries the number of available GPUs.

Parameters:: monitor_interval (float, optional) – Time interval in seconds between memory usage checks. Defaults to 10.0.

monitor_interval

Time interval between memory checks.

Type:: float

peak_memory

Dictionary mapping GPU indices to their peak memory usage.

Type:: dict

is_running

Flag indicating if the monitor is currently running.

Type:: bool

monitor_thread: Thread object for memory monitoring.

device_count

Number of NVIDIA GPUs available in the system.

Type:: int

Raises:: NVMLError – If NVIDIA Management Library initialization fails.

start()

Start the GPU memory monitoring in a separate daemon thread.

This method initializes and starts a daemon thread that continuously monitors GPU memory usage at the specified interval. The thread will run until stop() is called or the program exits.
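The threading behavior described here can be sketched with the standard library alone. `PeriodicSampler` is a hypothetical stand-in, not the module's class. One deliberate substitution: the documented class exposes an is_running flag, whereas this sketch uses a `threading.Event`, so that stopping does not have to wait out a full interval.

```python
import threading
from typing import Callable


class PeriodicSampler:
    """Hypothetical stand-in for the monitor's background-thread behavior."""

    def __init__(self, sample: Callable[[], object], interval: float) -> None:
        self._sample = sample
        self._interval = interval
        self._stop_event = threading.Event()
        # daemon=True means the thread will not keep the process alive.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Sample, then sleep for one interval or until stop() is requested.
        while not self._stop_event.is_set():
            self._sample()
            self._stop_event.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Wake the loop immediately and wait for the thread to finish.
        self._stop_event.set()
        self._thread.join()
```

A daemon thread is terminated abruptly at interpreter exit, so any cleanup that must run (such as shutting down NVML) has to happen through an explicit stop rather than inside the thread.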

stop()

Stop the GPU memory monitoring and display peak memory usage.

This method stops the monitoring thread, prints the peak memory usage for each GPU that was monitored, and properly shuts down the NVML interface. It will wait for the monitoring thread to complete before returning.

The peak memory usage is displayed in GB for each GPU index.
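The conversion from the byte counts NVML reports to the GB figures printed by stop() might look like the following sketch. The divisor of 1024**3 and the message wording are assumptions; the text does not say whether binary or decimal gigabytes are used, nor what the printed line looks like.

```python
from typing import Dict, List

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes (GiB)


def format_peaks(peak_memory: Dict[int, int]) -> List[str]:
    """Build one report line per GPU index, in ascending index order."""
    return [
        f"GPU {index}: peak memory usage {used / BYTES_PER_GB:.2f} GB"
        for index, used in sorted(peak_memory.items())
    ]


# Illustrative values only: 8 GB on GPU 0 and 1.5 GB on GPU 1.
for line in format_peaks({0: 8 * 1024 ** 3, 1: 3 * 1024 ** 3 // 2}):
    print(line)
# GPU 0: peak memory usage 8.00 GB
# GPU 1: peak memory usage 1.50 GB
```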

launch_memory_monitor(monitor_interval=1.0)

Launch a GPU memory monitor in a separate thread.

Parameters:: monitor_interval (float, optional) – Time interval in seconds between memory checks. Defaults to 1.0.
Returns:: The monitor instance that was launched
Return type:: GPUMemoryMonitor
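A helper of this shape could be sketched as below. Several things here are assumptions: the text lists atexit as a dependency but does not say where the exit hook is registered, so placing it in the helper is a guess; `_StubMonitor` and `launch_monitor_sketch` are hypothetical names; and the stub only mimics the documented start()/stop() interface without touching NVML.

```python
import atexit


class _StubMonitor:
    """Hypothetical stand-in exposing the documented start()/stop() interface."""

    def __init__(self, monitor_interval: float = 10.0) -> None:
        self.monitor_interval = monitor_interval
        self.is_running = False

    def start(self) -> None:
        self.is_running = True

    def stop(self) -> None:
        # Idempotent, so a manual stop() followed by the exit hook is harmless.
        if self.is_running:
            self.is_running = False


def launch_monitor_sketch(monitor_interval: float = 1.0, monitor_cls=_StubMonitor):
    """Create a monitor, start it, and make sure it is stopped at exit."""
    monitor = monitor_cls(monitor_interval=monitor_interval)
    monitor.start()
    atexit.register(monitor.stop)
    return monitor
```

If stop() were not idempotent, calling it manually and again from the exit hook could shut down NVML twice, which is why the stub guards on is_running.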