NVTX Range Wrapper Example

The NVTX range wrapper annotates code with NVTX ranges so that it can be profiled with Nsight Systems. It follows the same singleton pattern as the stopwatch (see Stopwatch Example): enable it once, then push and pop ranges from any part of your code.

Compared to using NVTX ranges directly, it offers the following advantages:
  • CUDA synchronization on range push/pop can be configured centrally. The ranges themselves add minimal overhead when no profiling is taking place, but the synchronization adds further overhead and would otherwise have to be handled manually.

  • Range push/pop mismatches can be checked. This check has an overhead, so enable it only for debugging and leave it disabled otherwise. Checking for unexpected pops on a few key ranges can be very handy, as searching for mismatches manually is tedious in large codebases.
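
The mismatch check can be pictured as a plain stack of range names. The following is a minimal sketch of that idea, not the wrapper's actual implementation; the class `RangeStackChecker` and its methods are hypothetical and exist only for illustration.

```python
class RangeStackChecker:
    """Hypothetical illustration of push/pop mismatch checking (not the library's code)."""

    def __init__(self):
        # Names of the currently open ranges, innermost last.
        self._stack = []

    def push(self, name):
        self._stack.append(name)

    def pop(self, expected=None):
        if not self._stack:
            raise RuntimeError("pop without a matching push")
        name = self._stack.pop()
        # Only verify the name if the caller states which range it expects to close.
        if expected is not None and name != expected:
            raise RuntimeError(f"expected to pop '{expected}', but popped '{name}'")
        return name


checker = RangeStackChecker()
checker.push("meas1")
checker.push("unexpected range")  # pushed but never popped
try:
    checker.pop("meas1")  # the innermost open range is not "meas1"
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
```

Keeping such a stack costs a little time on every push and pop, which is consistent with the advice to enable the check only while debugging.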

See also

The code of this example can be found in the repository under packages/optim_test_tools/examples/nvtx_range_wrapper_example.py.

Overview

  • Singleton: NVTXRangeWrapper() returns the global instance.

  • Call enable(...) once to activate; otherwise calls are no-ops with minimal overhead.

  • Push/pop named ranges; optionally verify that the popped name matches expectations.
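
The singleton and enable-once behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class `RangeWrapperSketch`, its `sync` parameter, and the `events` list are hypothetical, and the sketch records events instead of emitting real NVTX ranges.

```python
class RangeWrapperSketch:
    """Hypothetical sketch of the singleton / enable-once pattern (not the library's code)."""

    _instance = None

    def __new__(cls):
        # Always hand out the same object, so any module can obtain it
        # without passing a reference around.
        if cls._instance is None:
            inst = super().__new__(cls)
            inst._enabled = False
            inst._sync = None
            inst.events = []
            cls._instance = inst
        return cls._instance

    def enable(self, sync=None):
        # `sync` is an assumed hook; in a real setup it could be
        # `torch.cuda.synchronize`.
        self._enabled = True
        self._sync = sync

    def range_push(self, name):
        if not self._enabled:
            return  # disabled: effectively an empty call
        if self._sync is not None:
            self._sync()
        self.events.append(("push", name))

    def range_pop(self):
        if not self._enabled:
            return
        if self._sync is not None:
            self._sync()
        self.events.append(("pop", None))


first = RangeWrapperSketch()
first.range_push("ignored")  # no effect: not enabled yet

sync_calls = []
first.enable(sync=lambda: sync_calls.append(1))

second = RangeWrapperSketch()  # same instance, already enabled
second.range_push("meas1")
second.range_pop()
```

Because the synchronization hook is set once in `enable`, call sites only push and pop; they do not need to know whether synchronization is active.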

Example

Please see the notes in the code for more details.

packages/optim_test_tools/examples/nvtx_range_wrapper_example.py
# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import time
import torch
from accvlab.optim_test_tools import NVTXRangeWrapper

# @NOTE
# In this example, the individual sections (such as Main Script, Code Part I, Code Part II, etc.)
# indicate parts of the code which, in the actual use case, would be in different files, and would rely
# on the `NVTXRangeWrapper` being a singleton to enable its use throughout the files as shown here.


# --------------------------- Main Script ---------------------------

nvtx_wrp = NVTXRangeWrapper()
# @NOTE
# To activate the NVTX wrapper, it needs to be enabled.
# If the following step is omitted, pushing and popping ranges will have no effect.
# Try commenting out the following call to `enable()`.
#
# Also, note that the `keep_track_of_range_order` parameter should be set to `False` during actual profiling,
# as it has an overhead and should only be enabled for debugging purposes.
nvtx_wrp.enable(
    sync_on_push=True,
    sync_on_pop=True,
    keep_track_of_range_order=True,  # Only set to `True` for debugging purposes (adds overhead)
)
# @NOTE
# If the wrapper is not enabled, calling its methods has minimal overhead
# (call to an empty method).

# -------------------------------------------------------------------


# @NOTE
# If a code part (see below) is used in isolation (meaning that there is no other code which already
# enabled the NVTX wrapper), the wrapper will be disabled and any related calls will have no effect.
# The overhead for the wrapper is minimal in this case (call to an empty function).

num_iters = 16

# "Initialize" the GPU
torch.cuda.synchronize()

for i in range(num_iters):
    # --------------------------- Code Part I ---------------------------

    # @NOTE
    # This will not create a new instance, but re-use the instance created above.
    nvtx_wrp = NVTXRangeWrapper()
    nvtx_wrp.range_push("meas1")
    time.sleep(0.02)
    # ... continue and at some point call code part II

    # -------------------------------------------------------------------

    # --------------------------- Code Part II --------------------------

    nvtx_wrp = NVTXRangeWrapper()
    nvtx_wrp.range_push("meas2")
    time.sleep(0.05)
    nvtx_wrp.range_pop()
    # @NOTE
    # If the "unexpected range" range is pushed but not popped, then
    # `nvtx_wrp.range_pop("meas1")` (see below) will trigger an error.
    # Try uncommenting the "unexpected range" push below to see this.

    # >> nvtx_wrp.range_push("unexpected range")

    # ... continue and at some point call code part III

    # -------------------------------------------------------------------

    # -------------------------- Code Part III --------------------------

    # @NOTE
    # Here we want to check whether the range that we are popping is the expected one.
    # This can be done by specifying the expected range name when popping.
    # This will trigger an error if we pushed an "unexpected range" in Code Part II.
    #
    # Note that keeping track of the range stack internally adds overhead and should only be enabled for
    # debugging purposes, not when actually performing the profiling (can be configured using the
    # `keep_track_of_range_order` parameter when calling `enable()` for the wrapper). If you set this
    # parameter to `False`, the mismatch check will be skipped (and no error will be raised even if the
    # "unexpected range" is pushed).
    nvtx_wrp.range_pop("meas1")
    # ... continue and at some point call code part IV

    # -------------------------------------------------------------------

    # -------------------------- Code Part IV --------------------------

    nvtx_wrp = NVTXRangeWrapper()
    if i % 3 == 2:
        nvtx_wrp.range_push("meas3")
        time.sleep(0.01)
        nvtx_wrp.range_pop()
    nvtx_wrp.range_push("meas2")
    time.sleep(0.01)
    nvtx_wrp.range_pop()

    # -------------------------------------------------------------------

print("Script ran without errors")