API Reference

Complete API documentation for the optim_test_tools package.

class accvlab.optim_test_tools.Stopwatch(*args, **kwargs)[source]

Bases: SingletonBase

Stopwatch for performing runtime evaluations.

This is a singleton class for performing runtime measurements and obtaining the total as well as the average runtime of the measurements:

  1. per measurement.

  2. per (training) iteration, where a single measurement can be performed multiple times, or only in some iterations.

A warm-up phase can be defined when configuring the stopwatch. During the warm-up phase, no measurements are performed.

Multiple measurements can be performed, and are distinguished by name. The end of a (training) iteration is indicated explicitly (by calling the method finish_iter()). This is done to automatically start measurements after the warm-up phase is finished, to average the measurements over the iterations, and to print the measurements in certain intervals.

The CPU usage can be measured for one “type” of measurement (i.e. one measurement name). This is done by calling set_cpu_usage_meas_name() before the first measurement with the corresponding name is started. The CPU usage is then measured whenever the measurement is running, and the average CPU usage is printed together with the other measurements.

One-time measurements can be performed at any point in the code (see start_one_time_measurement(), end_one_time_measurement()). They are not affected by the warm-up phase and are reported as such (i.e. in own section and without averages etc.). Each one-time measurement (i.e. measurement with a given name) can be performed only once.

Warning

The CPU usage is measured using psutil.cpu_percent(). To ensure that the interval for which the CPU usage is measured is correct, the function psutil.cpu_percent() must not be called outside of the stopwatch during the measurement.

The stopwatch must first be enabled before any measurements are performed. If not enabled, calls to its methods have minimal overhead: in the disabled state, the methods are empty, and they are replaced with methods implementing the actual functionality upon enabling. This means that the runtime overhead of the stopwatch is negligible when it is not enabled. Enabling can be done from any part of the code (as this is a singleton).
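
The enable-time method swap described above can be sketched in plain Python. This is a simplified illustration of the technique, not the actual implementation:

```python
import time

class SketchStopwatch:
    """Sketch: methods are no-ops until enable() swaps in the real ones."""

    def __init__(self):
        self.times = {}

    # Disabled state: empty methods, so calls have negligible overhead.
    def start_meas(self, name):
        pass

    def end_meas(self, name):
        pass

    def enable(self):
        # Replace the no-ops with the implementations on this instance.
        self.start_meas = self._start_meas_impl
        self.end_meas = self._end_meas_impl

    def _start_meas_impl(self, name):
        self.times[name] = time.perf_counter()

    def _end_meas_impl(self, name):
        self.times[name] = time.perf_counter() - self.times[name]

sw = SketchStopwatch()
sw.start_meas("step")  # ignored: the stopwatch is not enabled yet
sw.enable()
sw.start_meas("step")
sw.end_meas("step")    # sw.times["step"] now holds the elapsed seconds
```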

Note

When obtaining an object using Stopwatch(), the existing singleton is returned if it has already been created.

If parameters are provided when calling Stopwatch(), this will enable the stopwatch (equivalent to calling enable()). Note that enabling can only be done once, and will lead to an error if attempted a second time.

Parameters:
  • num_warmup_iters – The number of warmup iterations to be performed before the runtime measurement is started.

  • print_every_n_iters – The interval, in iterations, at which the measured runtime is printed. If None, the runtime is not printed automatically (but can still be printed manually by calling print_eval_times()).

  • do_cuda_sync – Whether to synchronize the CUDA device every time a measurement is started or stopped.
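
A typical call pattern might look like this. To keep the snippet self-contained it uses a small recording stand-in; in real code the same methods would be called on accvlab.optim_test_tools.Stopwatch, and the parameter values shown are arbitrary examples:

```python
class StopwatchStub:
    """Stand-in recording method calls; mirrors the documented method names."""
    def __init__(self):
        self.calls = []
    def __getattr__(self, name):
        return lambda *args, **kwargs: self.calls.append(name)

stopwatch = StopwatchStub()
# Enable once, e.g. at program start:
stopwatch.enable(num_warmup_iters=10, print_every_n_iters=100, do_cuda_sync=False)

for _ in range(3):                  # the (training) loop
    stopwatch.start_meas("forward")
    # ... model forward pass ...
    stopwatch.end_meas("forward")
    stopwatch.finish_iter()         # marks the end of one iteration
```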

enable(num_warmup_iters, print_every_n_iters, do_cuda_sync)[source]

Enable the stopwatch

This method can be called only once and enables the Stopwatch singleton. Any measurements started or performed before calling this method are ignored.

Parameters:
  • num_warmup_iters (int) – The number of warmup iterations to be performed before the runtime measurement is started

  • print_every_n_iters (Optional[int]) – The interval, in iterations, at which the measured runtime is printed. If None, the runtime is not printed automatically (but can still be printed manually by calling print_eval_times()).

  • do_cuda_sync (bool) – Whether to synchronize the CUDA device every time a measurement is started or stopped.

property is_enabled: bool

Whether the stopwatch is enabled

print_eval_times()[source]

Print the evaluation times

set_cpu_usage_meas_name(name)[source]

Set the name of the CPU usage measurement

This method must be called before the first measurement with the corresponding name is started.

Important

If the CPU usage measurement is already set, it cannot be changed. Calling this method with a different name will raise an error.

Calling this method with the same name as the current CPU usage measurement name is allowed and will have no effect. This is useful to set the CPU usage name right before starting the first measurement with the corresponding name, even if the corresponding code region is called iteratively.

Warning

The CPU usage is measured using psutil.cpu_percent(). To ensure that the interval for which the CPU usage is measured is correct, the function psutil.cpu_percent() must not be called outside of the stopwatch during the measurement.

Parameters:

name (str) – Name of the measurement

start_meas(name)[source]

Start a measurement with given name.

Parameters:

name (str) – Name for the measurement

end_meas(name)[source]

End a measurement with given name.

Parameters:

name (str) – Name of the measurement

start_one_time_measurement(name)[source]

Start a one-time measurement with given name.

Parameters:

name (str) – Name of the measurement

end_one_time_measurement(name)[source]

End a one-time measurement with given name.

Parameters:

name (str) – Name of the measurement

finish_iter()[source]

Finish the current iteration.

get_num_nonwarmup_iters_measured()[source]

Get the number of non-warmup iterations performed.

Returns:

int – Number of measured non-warmup iterations

class accvlab.optim_test_tools.NVTXRangeWrapper(*args, **kwargs)[source]

Bases: SingletonBase

Wrapper for NVTX ranges.

This is a singleton class which allows for enabling the use of NVTX ranges and configuring how the ranges are used from any part of the implementation.

The wrapper must first be enabled before any measurements are performed. If not enabled, calls to its methods have minimal overhead. Enabling can be done from any part of the code (as this is a singleton).

Compared to using the NVTX range push/pop functionality directly, it offers the following advantages:

  • It is possible to easily configure whether CUDA synchronization is performed when pushing/popping a range. The synchronization is part of the push/pop methods and so can be turned on and off without changes to the code where the ranges are used, and is not performed if not needed.

  • If not enabled, calls to push/pop have minimal overhead (a call to an empty function). Note that while the pushing/popping of ranges itself also has negligible overhead when using NVTX directly, profiling-related CUDA synchronizations need to be handled manually in that case.

  • Range mismatch checks: The wrapper allows for checks whether the popped range corresponds to the range that is expected to be popped. This functionality can be turned on or off as part of the configuration when enabling the wrapper. This functionality has an overhead, and so should be only enabled for debugging purposes, and be turned off when actual profiling is performed.

Note

When obtaining an object using NVTXRangeWrapper(), the existing singleton is returned if it has already been created.

If parameters are provided when calling NVTXRangeWrapper(), this will enable the NVTX range wrapper (equivalent to calling enable()). Note that enabling can only be done once, and will lead to an error if attempted a second time.

Parameters:
  • sync_on_push – Whether to synchronize the CUDA device every time before pushing a range

  • sync_on_pop – Whether to synchronize the CUDA device every time before popping a range

  • keep_track_of_range_order – Whether to keep track of the range stack internally. A range name may be specified optionally when popping a range, and a check is performed whether the popped range corresponds to the range that is expected to be popped if this is set to True. Note that this has an overhead and so should be only enabled for debugging purposes, and be turned off when performing the actual profiling.
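
The range-order check enabled by keep_track_of_range_order can be pictured as a simple name stack. This is an illustration of the documented check, not the actual implementation; the real wrapper additionally pushes/pops NVTX ranges and optionally synchronizes CUDA:

```python
class RangeOrderTracker:
    """Sketch of the mismatch check: ranges must be popped in LIFO order."""
    def __init__(self):
        self.stack = []
    def range_push(self, range_name):
        self.stack.append(range_name)
    def range_pop(self, range_name=None):
        popped = self.stack.pop()
        if range_name is not None and popped != range_name:
            raise AssertionError(f"popped {popped!r}, expected {range_name!r}")

tracker = RangeOrderTracker()
tracker.range_push("forward")
tracker.range_push("loss")
tracker.range_pop("loss")     # ok: matches the innermost open range
tracker.range_pop("forward")  # ok: the stack is now empty
```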

enable(sync_on_push, sync_on_pop, keep_track_of_range_order)[source]

Enable the NVTX range wrapper.

This method can be called only once and enables the NVTXRangeWrapper singleton. Any use of the singleton before enabling it is ignored.

Parameters:
  • sync_on_push (bool) – Whether to synchronize the CUDA device every time before pushing a range

  • sync_on_pop (bool) – Whether to synchronize the CUDA device every time before popping a range

  • keep_track_of_range_order (bool) – Whether to keep track of the range stack internally. A range name may be specified optionally when popping a range, and a check is performed whether the popped range corresponds to the range that is expected to be popped if this is set to True. Note that this has an overhead and so should be only enabled for debugging purposes, and be turned off when performing the actual profiling.

property is_enabled: bool

Whether the NVTXRangeWrapper is enabled

range_push(range_name)[source]

Push a NVTX range

Parameters:

range_name (str) – Range name

range_pop(range_name=None)[source]

Pop a NVTX range and optionally check if the popped range is the expected range to be popped.

Note that the check is performed only if configured to be used when calling enable().

Parameters:

range_name (Optional[str], default: None) – Range name. If set, will be used to check whether the popped range name corresponds to the given name and raise an assertion error if not.

class accvlab.optim_test_tools.TensorDumper(*args, **kwargs)[source]

Bases: SingletonBase

Singleton class for dumping tensor & gradient data to a directory and comparing to previously dumped data.

This class provides a way to dump tensor data to a directory in a structured format.

The dumper is able to dump tensors, gradients, RaggedBatch objects, as well as data with user-defined, automatically applied converters. Furthermore, it supports custom processing prior to dumping (e.g. converting bounding boxes to images containing the bounding boxes), which is performed only if the dumper is enabled and incurs no overhead otherwise.

Main JSON files are created for each dump (one for the data and one for the gradients). The individual tensors (or converted data) can be stored inside the main JSON file, or in separate binary/image files (can be configured, and can vary for individual data entries). In case of the binary/image files, the main JSON file contains a reference to the file, and the file is stored in the same directory as the main JSON file.

The dumper can also be used to compare to previously dumped data, to detect mismatches. This can be useful for debugging, e.g. to rerun the same code multiple times while always comparing to the same dumped data. This can be used when modifying (e.g. optimizing) the implementation, or to check for determinism.

Important

The dumper is a singleton, so that it can be used in different source files without having to pass the instance around.

Note

The comparison is only supported if all data is dumped in the Type.JSON format. This can be enforced by calling set_dump_type_for_all() before dumping/comparing the data (so easy switching between dumping for manual inspection and comparison is possible).

Note

When in the disabled state, all dumping-related methods (dump, add data, compare to dumped data etc) are empty methods, which means they have no effect and minimal overhead.

Note

When obtaining an object using TensorDumper(), the existing singleton is returned if it has already been created.

If parameters are provided when calling TensorDumper(), this will enable the dumper (equivalent to calling enable()). Note that enabling can only be done once, and will lead to an error if attempted a second time.

Parameters:

dump_dir – The directory to dump the data to. If provided, the dumper will be enabled automatically. If not provided, the dumper will be disabled and can be enabled later by calling enable().
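
A typical dump workflow might look like this. A recording stand-in keeps the snippet self-contained; in real code the same calls go to the TensorDumper singleton, dump_type would be TensorDumper.Type.JSON, and the directory and tensor values are arbitrary examples:

```python
class DumperStub:
    """Stand-in recording method calls; mirrors the documented method names."""
    def __init__(self):
        self.log = []
    def __getattr__(self, name):
        return lambda *args, **kwargs: self.log.append(name)

dumper = DumperStub()
dumper.enable("/tmp/dumps")   # example dump_dir
for it in range(2):
    dumper.push_range("iter_{}", it)   # disambiguates names reused across iterations
    dumper.add_tensor_data("inputs/image", [[1, 2], [3, 4]],
                           dump_type="JSON")  # Type.JSON in real code
    dumper.pop_range()
    dumper.dump()             # writes the main JSON file(s) for this iteration
```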

class Type(value)[source]

Bases: Enum

Dump format types.

The format type determines how tensor data is serialized when dumped.

Note

For binary types except PICKLE (i.e. BINARY, IMAGE_RGB, IMAGE_BGR, IMAGE_I), additional files containing meta-data are created and stored in the same directory as the main JSON file. For BINARY, the meta-data is the shape and dtype of the tensor. For IMAGE_*, the meta-data is the original range of the image data (min and max value) and the image format (RGB, BGR, Intensity). For PICKLE, no meta-data is written as the pickled object is self-contained.

Note

For BINARY, IMAGE_* and PICKLE formats, entries are added to the main JSON file indicating the filenames of the stored data. The filenames for these cases are:

  • blob/image data: [<main_json_file_name>]<path_to_data_in_dumped_structure>.<file_type>

  • meta-data (if applicable): [<main_json_file_name>]<path_to_data_in_dumped_structure>.<file_type>.meta.json

Note

For images containing multiple channels, the color channel is assumed to be the last dimension. If this is not the case, permutation of the axes needs to be applied to move the color channel to the last dimension. The permutation can be applied using the permute_axes parameter, e.g. of add_tensor_data().

If a tensor contains more than the necessary number of dimensions (3 for color images, 2 for grayscale images), the leading dimensions are treated as iterating over the images, and multiple images are dumped (with the indices of the leading dimensions indicated in the filename).

JSON = 0

Tensor data is serialized into the JSON file as nested lists. Suitable for small tensors and provides human-readable output.

BINARY = 1

Tensor data saved as binary files with metadata in separate JSON files. Efficient for large tensors; preserves exact numerical precision.

IMAGE_RGB = 2

Tensor data converted to PNG image format (RGB, 3 channels). Channel must be the last dimension; permute axes if necessary.

IMAGE_BGR = 3

Tensor data converted to PNG image format (BGR, 3 channels). Channel must be the last dimension; permute axes if necessary.

IMAGE_I = 4

Tensor data converted to PNG image format (grayscale). Single channel; no explicit channel dimension.

PICKLE = 5

Tensor data saved as pickle files.

classmethod is_image(dump_type)[source]

Whether the given dump type is one of the image types (IMAGE_RGB, IMAGE_BGR or IMAGE_I).

Return type:

bool

enable(dump_dir)[source]

Enable the TensorDumper singleton.

This method can be called only once and enables the TensorDumper singleton. Any use of the singleton before enabling it is ignored.

Parameters:

dump_dir (str) – The directory to dump the data to.

push_range(range_name, *args)[source]

Push a range to the range stack.

Multiple ranges can be pushed and popped in a nested manner. The ranges will be prepended to the dump path (see e.g. the path argument in add_tensor_data()) in the order in which they were pushed.

The ranges can be used to conveniently disambiguate the names of data entries where the same name is used in multiple contexts (e.g. multiple iterations of a loop, function called from multiple places, etc.).

Important

To ensure that the formatting is performed only if the tensor dumper is enabled, the range should not be formatted when passing the argument. Instead, the formatting happens inside the method, and the additional arguments (args) are used to format the range name.

Parameters:
  • range_name (Union[str, Callable[[], str]]) – The name of the range to push.

  • *args – Additional arguments to format the range name. If not provided, the range name is used as is.
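
The reason for passing format arguments separately can be sketched as follows: when the dumper is disabled, push_range is an empty method, so the (possibly expensive) string formatting never runs. This is a simplified illustration, not the actual implementation:

```python
def make_push_range(enabled):
    if not enabled:
        return lambda range_name, *args: None   # disabled: nothing is formatted
    def push_range(range_name, *args):
        # Formatting happens only inside the enabled implementation.
        name = range_name() if callable(range_name) else range_name
        return name.format(*args) if args else name
    return push_range

push_disabled = make_push_range(False)
push_enabled = make_push_range(True)

assert push_disabled("iter_{}", 7) is None     # no formatting cost when disabled
assert push_enabled("iter_{}", 7) == "iter_7"
assert push_enabled(lambda: "lazy") == "lazy"  # callables are also supported
```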

pop_range()[source]

Pop the last range from the range stack.


set_dump_is_compare(eps_numerical_data=1e-06, num_errors_per_tensor_to_show=1, allow_missing_data_in_current=False, allow_missing_data_in_previous=False, as_warning=False)[source]

Automatically replace calls to dump() with calls to compare_to_dumped_data().

Note

The parameters defined in this method will be forwarded to compare_to_dumped_data(). Note that compare_if_empty is not passed here. Instead, the value passed to dump() is used.

See also

Please see the documentation of compare_to_dumped_data() for more details.

Parameters:
  • eps_numerical_data (float, default: 1e-06) – The numerical tolerance for the comparison of numerical data.

  • num_errors_per_tensor_to_show (int, default: 1) – The number of most significant errors to show per tensor.

  • allow_missing_data_in_current (bool, default: False) – If True, the comparison will not raise an error if the current data is missing some keys which are present in the reference data.

  • allow_missing_data_in_previous (bool, default: False) – If True, the comparison will not raise an error if the reference data is missing some keys which are present in the current data.

  • as_warning (bool, default: False) – If True, no error is raised in case of a mismatch and instead, a warning is printed. If False, an error is raised.

Return type:

bool

property is_enabled: bool

Whether the TensorDumper is enabled

add_tensor_data(path, data, dump_type, dump_type_override=None, permute_axes=None, permute_axes_override=None, exclude=None)[source]

Add tensor data to the dump.

The data is formatted and inserted into the dump structure.

Parameters:
  • path (str) – Path where the data will be inserted. If the path does not exist, it will be created. If data is a dictionary, the path may be already present in the structure, but the direct children of data need to be non-existent in the element the path points to. If data is not a dictionary, the path must not be present in the structure and the data will be inserted at the path.

  • data (Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]], Callable[[], Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]]]]]) – The tensor data to add

  • dump_type (Type) – The type of dump to use

  • dump_type_override (Optional[Dict[str, Type]], default: None) – A dictionary mapping names to dump types. If a name is present in the dictionary, the dump type for all tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • permute_axes (Optional[Sequence[int]], default: None) – Permutation of axes to apply to the tensor data. If None, no permutation is applied.

  • permute_axes_override (Optional[Dict[str, Optional[Sequence[int]]]], default: None) – A dictionary mapping names to permute axes. If a name is present in the dictionary, the permute axes for all tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • exclude (Optional[Sequence[str]], default: None) – List of entries to exclude from the dump. These entries are specified by name and may apply to any level of the data structure.
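
The "closest to the tensor wins" rule for dump_type_override (and analogously permute_axes_override) can be sketched as walking the path from the innermost name outwards and taking the first override hit. This illustrates the documented matching rule only, with dump types shown as strings for brevity:

```python
def resolve_override(path, override, default):
    """path is like 'inputs/image'; the deepest matching name wins."""
    for name in reversed(path.split("/")):
        if name in override:
            return override[name]
    return default

override = {"inputs": "BINARY", "image": "IMAGE_RGB"}
assert resolve_override("inputs/image", override, "JSON") == "IMAGE_RGB"  # deeper match wins
assert resolve_override("inputs/mask", override, "JSON") == "BINARY"      # parent name matches
assert resolve_override("labels/ids", override, "JSON") == "JSON"         # no match: default
```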

add_grad_data(path, data, dump_type, dump_type_override=None, permute_grad_axes=None, permute_grad_axes_override=None, exclude=None)[source]

Add gradient data of the given tensor(s) to dump.

Note that if this method is called, set_gradients() must be called before the next dump.

The gradients are computed using torch.autograd.grad(), and do not influence the gradients as computed/used elsewhere in the code (e.g. in the training loop).

Note that tensors which do not require gradients or which are not part of the computation graph can be included in the dump, but no actual gradients will be computed for them. Instead, a note is written to the JSON dump if requires_grad is False. If the tensor is not part of the computation graph, the written gradient will be null, and no image/binary file will be written for that tensor regardless of the dump_type setting.

Parameters:
  • path (str) – Path where the gradient data will be inserted. See add_tensor_data() for more details.

  • data (Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]], Callable[[], Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]]]]]) – The tensor data for which to dump the gradients.

  • dump_type (Type) – The type of dump to use

  • dump_type_override (Optional[Dict[str, Type]], default: None) – A dictionary mapping names to dump types. If a name is present in the dictionary, the dump type for all gradients of tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • permute_grad_axes (Optional[Sequence[int]], default: None) – Permutation of axes to apply to the gradient data. If None, no permutation is applied.

  • permute_grad_axes_override (Optional[Dict[str, Optional[Sequence[int]]]], default: None) – A dictionary mapping names to permute axes. If a name is present in the dictionary, the permute axes for all gradients of tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • exclude (Optional[Sequence[str]], default: None) – List of entries to exclude from the dump. These entries are specified by name and may apply to any level of the data structure.

set_dump_type_for_all(dump_type, include_tensors=True, include_grads=True)[source]

Set the dump type for all tensors and gradients.

This method is e.g. useful to quickly change the dump type to Type.JSON to generate reference data for comparison (using compare_to_dumped_data()) without the need to go through the code and change the dump type for each tensor manually.

Important

This method sets the dump type only for the data which has already been added. The dump type of data added after this method is called will not be affected.

Parameters:
  • dump_type (Type) – The type of dump to use

  • include_tensors (bool, default: True) – Whether to include tensors in the dump

  • include_grads (bool, default: True) – Whether to include gradients in the dump

dump(dump_if_empty=True)[source]

Dump the data to the dump directory.

Note

Setting dump_if_empty to False is useful to not count iterations where no data is dumped as a separate iteration.

Parameters:

dump_if_empty (bool, default: True) – If True, the data will be dumped even if it is empty. If False, the data will not be dumped if it is empty, and the dump count will not be incremented.

compare_to_dumped_data(eps_numerical_data=1e-06, num_errors_per_tensor_to_show=1, allow_missing_data_in_current=False, allow_missing_data_in_previous=False, as_warning=False, compare_if_empty=True)[source]

Compare the data to previously dumped data.

In case of a mismatch, a ValueError is raised with a detailed error message.

Important

Only comparisons to data stored in the JSON format (Type.JSON) are supported. Therefore, the reference data must be stored with the Type.JSON both when generating the reference data and when comparing to it.

An easy way to ensure that the reference data is stored in the JSON format without modifying multiple places in the code is to call set_dump_type_for_all() when generating the reference data.

Important

The compare_if_empty parameter needs to be consistent with the dump_if_empty parameter of the dump() calls which were used to dump the reference data being compared to.

Note

The comparison can be set to allow missing keys in the current and/or reference data by setting allow_missing_data_in_current and/or allow_missing_data_in_previous to True. This is e.g. useful if the current data is based on an implementation in progress, so that some of the data is not yet available, or if the current run produces additional data which is not needed for the comparison (and not present in the reference).

Parameters:
  • eps_numerical_data (float, default: 1e-06) – The numerical tolerance for the comparison of numerical data.

  • num_errors_per_tensor_to_show (int, default: 1) – The number of most significant errors to show per tensor.

  • allow_missing_data_in_current (bool, default: False) – If True, the comparison will not raise an error if the current data is missing some keys which are present in the reference data.

  • allow_missing_data_in_previous (bool, default: False) – If True, the comparison will not raise an error if the reference data is missing some keys which are present in the current data.

  • as_warning (bool, default: False) – If True, no error is raised in case of a mismatch and instead, a warning is printed. If False, an error is raised.

  • compare_if_empty (bool, default: True) – If True, the comparison will be performed even if the current data is empty. If False, the comparison will not be performed if the current data is empty, and the dump count will not be incremented.
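
The numerical comparison can be pictured as an element-wise tolerance check that reports the most significant deviations first. This is a simplified sketch of the documented eps_numerical_data and num_errors_per_tensor_to_show behavior, not the actual implementation:

```python
def find_mismatches(current, reference, eps=1e-06, num_errors_to_show=1):
    """Return up to num_errors_to_show (index, abs_diff) pairs, largest first."""
    errors = [(i, abs(c - r)) for i, (c, r) in enumerate(zip(current, reference))]
    errors = [e for e in errors if e[1] > eps]
    errors.sort(key=lambda e: e[1], reverse=True)
    return errors[:num_errors_to_show]

reference = [1.0, 2.0, 3.0]
current = [1.0, 2.5, 3.0 + 1e-09]   # one real mismatch, one within tolerance
assert find_mismatches(current, reference) == [(1, 0.5)]
assert find_mismatches(reference, reference) == []
```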

set_gradients(function_values)[source]

Set gradients for the tensors in the dump.

The gradients are computed using torch.autograd.grad(), and do not influence the gradients computed elsewhere (e.g. in the training loop).

This method must be called before dumping if add_grad_data() was called since the last dump.

Parameters:

function_values (Union[Tensor, List[Tensor]]) – The value(s) of the function(s) to compute the gradients for. This can be a single tensor or a list of tensors.

reset_dump_count()[source]

Reset the dump count.

This method can be used to reset the dump count to 0. This is useful for debugging (e.g. when comparing to previously dumped data) to start from the first dump.

See also

This method is equivalent to calling set_dump_count() with a value of 0. Please see the documentation of set_dump_count() for more details.

set_dump_count(count)[source]

Set the dump count.

This method can be used to set the dump count to a specific value. This is useful for debugging (e.g. when comparing to previously dumped data) to jump to a specific iteration.

Note

If any actions are registered to be performed after a given number of dumps, they will be triggered if the count corresponds to the number of dumps set.

Parameters:

count (int) – The dump count to set.

perform_after_dump_count(count, action)[source]

Register an action to be performed after a given number of dumps.

The action will be performed after the dump is completed.

This can e.g. be used to automatically exit the program after a given number of iterations have been dumped (by passing the exit() function as the action).

Important

The action is performed after the dump count reaches the given count value. If set_dump_count() is called, the dump count is adjusted to a given value, and this also influences when the action is performed. For example, if set_dump_count() is called with a value of 3 and an action is registered to be performed after 5 dumps, the action will be performed after another 2 dumps.

Important

This method can be called multiple times with the same count. In this case, the action will be overwritten.

Note that, as with the other methods, this method has no effect if the TensorDumper is not enabled.

Parameters:
  • count (int) – The number of dumps after which the action should be performed.

  • action (Callable[[], None]) – The action to perform.
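
The interaction between the dump count and registered actions can be sketched with a simple counter. This illustrates the documented behavior only; the real TensorDumper tracks this internally:

```python
class DumpCounter:
    """Sketch: actions fire when the dump count reaches their registered value."""
    def __init__(self):
        self.count = 0
        self.actions = {}   # count -> action; re-registering a count overwrites
        self.fired_at = []
    def perform_after_dump_count(self, count, action):
        self.actions[count] = action
    def set_dump_count(self, count):
        self.count = count
    def dump(self):
        self.count += 1
        if self.count in self.actions:
            self.actions[self.count]()
            self.fired_at.append(self.count)

counter = DumpCounter()
counter.perform_after_dump_count(5, lambda: None)  # e.g. exit after 5 dumps
counter.set_dump_count(3)                          # jump ahead to count 3
counter.dump()
counter.dump()                                     # count reaches 5: action fires
assert counter.fired_at == [5]
```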

register_custom_converter(data_type, converter_func)[source]

Register a custom converter for a given data type.

This method can be used to register a custom converter function for a given data type. The converter function must take a single argument of type data_type and return one of the following, or a nested list/dict structure containing elements of the following types:

  • either a JSON-serializable object,

  • or a tensor,

  • or a numpy array

  • or an object for which a custom converter is registered

The conversion is performed iteratively, so that chains of conversions can be followed through.

The conversion is performed before any other processing steps. This means that if the converter returns tensors, these are handled in the same way as tensors which are directly added to the dumper.

Note

This is useful when the data to dump is not JSON-serializable by default. This may e.g. be the case for custom data types used in the training.

Parameters:
  • data_type (type) – The type of the data to convert.

  • converter_func (Callable) – The function to use for converting the data.
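
The iterative conversion chain can be sketched as follows, with two hypothetical custom types (BoxList, Box) standing in for user-defined training data types; the real registry lives inside the TensorDumper:

```python
converters = {}  # maps type -> converter function

def register_custom_converter(data_type, converter_func):
    converters[data_type] = converter_func

def convert(obj):
    """Apply registered converters until the result is plain data, then recurse."""
    while type(obj) in converters:
        obj = converters[type(obj)](obj)
    if isinstance(obj, list):
        return [convert(item) for item in obj]
    if isinstance(obj, dict):
        return {key: convert(value) for key, value in obj.items()}
    return obj

class BoxList:                       # hypothetical custom container type
    def __init__(self, boxes):
        self.boxes = boxes

class Box:                           # hypothetical custom element type
    def __init__(self, x, y):
        self.x, self.y = x, y

register_custom_converter(BoxList, lambda bl: bl.boxes)         # -> list of Box
register_custom_converter(Box, lambda b: {"x": b.x, "y": b.y})  # -> JSON-serializable

assert convert(BoxList([Box(1, 2)])) == [{"x": 1, "y": 2}]
```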

enable_ragged_batch_dumping(as_per_sample=False)[source]

Enable dumping of RaggedBatch data.

Note

It is possible to dump some RaggedBatch data as per sample, and some as a RaggedBatch structure. This can be achieved by calling this method multiple times with different values for as_per_sample, before adding the data which should be dumped with the desired format.

Parameters:

as_per_sample (bool, default: False) – If True, the RaggedBatch data is dumped as per sample. Otherwise, it is dumped as a RaggedBatch structure.

run_if_enabled(func)[source]

Run a function if the TensorDumper is enabled.

This method can be used to run a function only if the TensorDumper is enabled. This is useful to avoid running code which is only relevant for debugging.

The typical use-case for this method is the dumping of data which needs to be pre-processed first (e.g. drawing of bounding boxes into an image). This is done as follows:

  • Encapsulate the pre-processing logic in a function (inside the function which uses the dumper). Note that this means func will close over the data accessible in that function and therefore does not need any arguments. The function func should

    • Perform any debugging-related pre-processing needed

    • Add the pre-processed data to the dump (e.g. using add_tensor_data())

  • Call run_if_enabled() with the function func as its argument. This will ensure that the pre-processing is only performed if the dumper is enabled. Otherwise, the pre-processing is omitted, and there is no overhead (apart from calling an empty function).

Parameters:

func (Callable[[], None]) – The function to run. The function must take no arguments.
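
The intended closure pattern can be sketched as follows. This is a simplified illustration: the real method lives on the TensorDumper singleton, and the bounding-box drawing is a hypothetical pre-processing step:

```python
def make_run_if_enabled(enabled):
    if enabled:
        return lambda func: func()
    return lambda func: None   # disabled: func (and its pre-processing) never runs

processed = []

def training_step(image, boxes, run_if_enabled):
    def preprocess_and_add():
        # Closes over image and boxes, so it needs no arguments; runs only
        # when the dumper is enabled (here it would call add_tensor_data()).
        processed.append((image, len(boxes)))
    run_if_enabled(preprocess_and_add)

training_step("img0", [1, 2, 3], make_run_if_enabled(False))  # no pre-processing cost
training_step("img1", [1, 2, 3], make_run_if_enabled(True))
assert processed == [("img1", 3)]
```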

accvlab.optim_test_tools.numba_nvtx.register_string(name)[source]

Register a string with NVTX once and return an integer handle.

Returns 0 if the profiler is not attached (the handle is still safe to pass to range_push(), which treats 0 as a no-op).

Return type:

int

accvlab.optim_test_tools.numba_nvtx.range_push(handle)[source]

Push an NVTX range using a previously-registered handle.

This function can be called from within Numba @njit functions.

Return type:

None

accvlab.optim_test_tools.numba_nvtx.range_pop()[source]

Pop an NVTX range.

This function can be called from within Numba @njit functions.

Return type:

None
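
The handle semantics of these functions can be sketched in plain Python. This illustrates the documented contract only; the real implementation registers the string with NVTX and is callable from Numba @njit code:

```python
registered = {}
profiler_attached = False   # assumption for this sketch: no profiler is attached
active_ranges = []

def register_string(name):
    """Register once, outside the jitted code; 0 means 'no profiler, no-op'."""
    if not profiler_attached:
        return 0
    return registered.setdefault(name, len(registered) + 1)

def range_push(handle):
    if handle == 0:
        return            # documented: a 0 handle is safe and does nothing
    active_ranges.append(handle)

def range_pop():
    if active_ranges:
        active_ranges.pop()

handle = register_string("my_kernel")  # hypothetical range name, registered once
range_push(handle)                     # safe even though handle is 0
range_pop()
assert handle == 0 and active_ranges == []
```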