API Reference

Complete API documentation for the optim_test_tools package.

class accvlab.optim_test_tools.Stopwatch(*args, **kwargs)[source]

Bases: SingletonBase

Stopwatch for performing runtime evaluations.

This is a singleton class for performing runtime measurements and obtaining the total as well as the average runtime for the measurements:

  1. per measurement.

  2. per (training) iteration, where a single measurement can be performed multiple times, or only in some iterations.

A warm-up phase can be defined when configuring the stopwatch. During the warm-up phase, no measurements are performed.

Multiple measurements can be performed, and are distinguished by name. The end of a (training) iteration is indicated explicitly (by calling the method finish_iter()). This is done to automatically start measurements after the warm-up phase is finished, to average the measurements over the iterations, and to print the measurements in certain intervals.

The CPU usage can be measured for one “type” of measurement (i.e. one measurement name). This is done by calling the set_cpu_usage_meas_name() before the first measurement with the corresponding name is started. The CPU usage is then measured whenever the measurement is running and the average CPU usage is printed together with the other measurements.

One-time measurements can be performed at any point in the code (see start_one_time_measurement(), end_one_time_measurement()). They are not affected by the warm-up phase and are reported as such (i.e. in own section and without averages etc.). Each one-time measurement (i.e. measurement with a given name) can be performed only once.

Warning

The CPU usage is measured using psutil.cpu_percent(). To ensure that the interval for which the CPU usage is measured is correct, the function psutil.cpu_percent() must not be called outside of the stopwatch during the measurement.

The stopwatch must first be enabled before any measurements are performed. In the disabled state, all methods are empty and are only replaced with methods implementing the functionality when the stopwatch is enabled, so calls to a disabled stopwatch have negligible runtime overhead. Enabling can be done from any part of the code (as this is a singleton).

Note

When obtaining an object via Stopwatch(), the existing singleton is returned if it has already been created.

If parameters are provided when calling Stopwatch(), this will enable the stopwatch (equivalent to calling enable()). Note that enabling can only be done once, and will lead to an error if attempted a second time.

Parameters:
  • num_warmup_iters – The number of warmup iterations to be performed before the runtime measurement is started.

  • print_every_n_iters – Once in how many iterations to print the measured runtime. If None, the runtime is not printed automatically (but can still be printed manually by calling print_eval_times()).

  • do_cuda_sync – Whether to synchronize the CUDA device every time a measurement is started or stopped.
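The documented call protocol (start_meas()/end_meas() per measurement, finish_iter() per iteration, warm-up iterations skipped) can be sketched with a minimal stand-in. This is not the real class; the actual Stopwatch is a singleton and additionally handles CUDA synchronization, CPU usage measurement and periodic printing:

```python
import time
from collections import defaultdict

class MiniStopwatch:
    """Stand-in mirroring the documented start_meas / end_meas / finish_iter
    protocol and the warm-up behaviour. Illustrative only, not the real API."""

    def __init__(self, num_warmup_iters):
        self.num_warmup_iters = num_warmup_iters
        self.iter_count = 0
        self.totals = defaultdict(float)  # summed seconds per measurement name
        self.counts = defaultdict(int)    # number of measurements per name
        self._starts = {}

    def start_meas(self, name):
        # During the warm-up phase, no measurements are performed.
        if self.iter_count >= self.num_warmup_iters:
            self._starts[name] = time.perf_counter()

    def end_meas(self, name):
        if name in self._starts:
            self.totals[name] += time.perf_counter() - self._starts.pop(name)
            self.counts[name] += 1

    def finish_iter(self):
        # The end of a (training) iteration is indicated explicitly.
        self.iter_count += 1

    def avg_per_measurement(self, name):
        return self.totals[name] / self.counts[name]

sw = MiniStopwatch(num_warmup_iters=2)
for _ in range(5):            # 2 warm-up iterations + 3 measured iterations
    sw.start_meas("forward")
    time.sleep(0.001)         # stands in for the actual work being timed
    sw.end_meas("forward")
    sw.finish_iter()

assert sw.counts["forward"] == 3   # warm-up iterations are not measured
```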

enable(num_warmup_iters, print_every_n_iters, do_cuda_sync)[source]

Enable the stopwatch.

This method can be called only once and enables the Stopwatch singleton. Any measurements started or performed before calling this method are ignored.

Parameters:
  • num_warmup_iters (int) – The number of warmup iterations to be performed before the runtime measurement is started

  • print_every_n_iters (Optional[int]) – Once in how many iterations to print the measured runtime. If None, the runtime is not printed automatically (but can still be printed manually by calling print_eval_times()).

  • do_cuda_sync (bool) – Whether to synchronize the CUDA device every time a measurement is started or stopped.

property is_enabled: bool

Whether the stopwatch is enabled

print_eval_times()[source]

Print the evaluation times

set_cpu_usage_meas_name(name)[source]

Set the name of the CPU usage measurement

This method must be called before the first measurement with the corresponding name is started.

Important

If the CPU usage measurement is already set, it cannot be changed. Calling this method with a different name will raise an error.

Calling this method with the same name as the current CPU usage measurement name is allowed and will have no effect. This is useful to set the CPU usage name right before starting the first measurement with the corresponding name, even if the corresponding code region is called iteratively.

Warning

The CPU usage is measured using psutil.cpu_percent(). To ensure that the interval for which the CPU usage is measured is correct, the function psutil.cpu_percent() must not be called outside of the stopwatch during the measurement.

Parameters:

name (str) – Name of the measurement

start_meas(name)[source]

Start a measurement with the given name.

Parameters:

name (str) – Name for the measurement

end_meas(name)[source]

End a measurement with the given name.

Parameters:

name (str) – Name of the measurement

start_one_time_measurement(name)[source]

Start a one-time measurement with the given name.

Parameters:

name (str) – Name of the measurement

end_one_time_measurement(name)[source]

End a one-time measurement with the given name.

Parameters:

name (str) – Name of the measurement

finish_iter()[source]

Finish the current iteration.

get_num_nonwarmup_iters_measured()[source]

Get the number of non-warmup iterations performed.

Returns:

int – Number of measured non-warmup iterations

class accvlab.optim_test_tools.NVTXRangeWrapper(*args, **kwargs)[source]

Bases: SingletonBase

Wrapper for NVTX ranges.

This is a singleton class which allows for enabling the use of NVTX ranges and configuring how the ranges are used from any part of the implementation.

The wrapper must be first enabled before any measurements are performed. If not enabled, calls to any methods have minimal overhead. Enabling can be done from any part of the code (as this is a singleton).

Compared to using the NVTX range push/pop functionality directly, it offers the following advantages:

  • It is possible to easily configure whether CUDA synchronization is performed when pushing/popping a range. The synchronization is part of the push/pop methods and so can be turned on and off without changes to the code where the ranges are used, and is not performed if not needed.

  • If not enabled, calls to push/pop have minimal overhead (a call to an empty function). Note that while pushing/popping ranges directly via NVTX also has negligible overhead, profiling-related CUDA synchronizations then need to be handled manually.

  • Range mismatch checks: The wrapper allows for checks whether the popped range corresponds to the range that is expected to be popped. This functionality can be turned on or off as part of the configuration when enabling the wrapper. This functionality has an overhead, and so should be only enabled for debugging purposes, and be turned off when actual profiling is performed.

Note

When obtaining an object via NVTXRangeWrapper(), the existing singleton is returned if it has already been created.

If parameters are provided when calling NVTXRangeWrapper(), this will enable the NVTX range wrapper (equivalent to calling enable()). Note that enabling can only be done once, and will lead to an error if attempted a second time.

Parameters:
  • sync_on_push – Whether to synchronize the CUDA device every time before pushing a range

  • sync_on_pop – Whether to synchronize the CUDA device every time before popping a range

  • keep_track_of_range_order – Whether to keep track of the range stack internally. A range name may be specified optionally when popping a range, and a check is performed whether the popped range corresponds to the range that is expected to be popped if this is set to True. Note that this has an overhead and so should be only enabled for debugging purposes, and be turned off when performing the actual profiling.
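The push/pop protocol and the optional range-order check can be illustrated with a small stand-in. This is not the real wrapper (which additionally emits NVTX markers and performs the configured CUDA synchronizations); it only mirrors the documented range_push()/range_pop() behaviour when keep_track_of_range_order is True:

```python
class MiniNVTXWrapper:
    """Stand-in mirroring the documented range_push / range_pop protocol
    with the optional range-order check. Illustrative only."""

    def __init__(self, keep_track_of_range_order):
        self.keep_track = keep_track_of_range_order
        self._stack = []

    def range_push(self, range_name):
        # Real wrapper: optional CUDA sync, then nvtx range push.
        if self.keep_track:
            self._stack.append(range_name)

    def range_pop(self, range_name=None):
        # Real wrapper: optional CUDA sync, then nvtx range pop.
        if self.keep_track:
            popped = self._stack.pop()
            if range_name is not None:
                assert popped == range_name, (
                    f"expected to pop {popped!r}, got {range_name!r}")

w = MiniNVTXWrapper(keep_track_of_range_order=True)
w.range_push("iteration")
w.range_push("forward")
w.range_pop("forward")   # names match, no error
w.range_pop()            # name check is skipped when no name is given
```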

enable(sync_on_push, sync_on_pop, keep_track_of_range_order)[source]

Enable the NVTX range wrapper.

This method can be called only once and enables the NVTXRangeWrapper singleton. Any use of the singleton before enabling it is ignored.

Parameters:
  • sync_on_push (bool) – Whether to synchronize the CUDA device every time before pushing a range

  • sync_on_pop (bool) – Whether to synchronize the CUDA device every time before popping a range

  • keep_track_of_range_order (bool) – Whether to keep track of the range stack internally. A range name may be specified optionally when popping a range, and a check is performed whether the popped range corresponds to the range that is expected to be popped if this is set to True. Note that this has an overhead and so should be only enabled for debugging purposes, and be turned off when performing the actual profiling.

property is_enabled: bool

Whether the NVTXRangeWrapper is enabled

range_push(range_name)[source]

Push an NVTX range.

Parameters:

range_name (str) – Range name

range_pop(range_name=None)[source]

Pop an NVTX range and optionally check whether the popped range is the expected one.

Note that the check is performed only if configured to be used when calling enable().

Parameters:

range_name (Optional[str], default: None) – Range name. If set, will be used to check whether the popped range name corresponds to the given name and raise an assertion error if not.

class accvlab.optim_test_tools.TensorDumper(*args, **kwargs)[source]

Bases: SingletonBase

Singleton class for dumping tensor & gradient data to a directory and comparing to previously dumped data.

This class provides a way to dump tensor data to a directory in a structured format.

The dumper is able to dump tensors, gradients, RaggedBatch objects, as well as data with user-defined & auto-applied converters. Furthermore, it supports custom processing prior to dumping (e.g. converting of bounding boxes to images containing the bounding boxes), which is performed only if the dumper is enabled, and does not incur overhead if the dumper is not enabled.

Main JSON files are created for each dump (one for the data and one for the gradients). The individual tensors (or converted data) can be stored inside the main JSON file, or in separate binary/image files (can be configured, and can vary for individual data entries). In case of the binary/image files, the main JSON file contains a reference to the file, and the file is stored in the same directory as the main JSON file.

The dumper can also be used to compare to previously dumped data, to detect mismatches. This is useful for debugging, e.g. to rerun the same code multiple times while always comparing to the same dumped data. It can be used when modifying (e.g. optimizing) the implementation, or to check for determinism.

Important

The dumper is a singleton, so that it can be used in different source files without having to pass the instance around.

Note

The comparison is only supported if all data is dumped in the Type.JSON format. This can be enforced by calling set_dump_type_for_all() before dumping/comparing the data (so easy switching between dumping for manual inspection and comparison is possible).

Note

When in the disabled state, all dumping-related methods (dump, add data, compare to dumped data, etc.) are empty methods, which means they have no effect and minimal overhead.

Note

When obtaining an object via TensorDumper(), the existing singleton is returned if it has already been created.

If parameters are provided when calling TensorDumper(), this will enable the dumper (equivalent to calling enable()). Note that enabling can only be done once, and will lead to an error if attempted a second time.

Parameters:

dump_dir – The directory to dump the data to. If provided, the dumper will be enabled automatically. If not provided, the dumper will be disabled and can be enabled later by calling enable().

class Type(value)[source]

Bases: Enum

Dump format types.

The format type determines how tensor data is serialized when dumped.

Note

For binary types (BINARY, IMAGE_RGB, IMAGE_BGR, IMAGE_I), files containing meta-data are created and stored in the same directory as the data files. For BINARY, the meta-data is the shape and dtype of the tensor. For IMAGE_*, the meta-data is the original range of the image data (min and max value) and the image format (RGB, BGR, Intensity).

Note

For BINARY and IMAGE_* formats, entries are added to the main JSON file indicating the filenames of the stored data. The filenames for these cases are:

  • blob/image data: [<main_json_file_name>]<path_to_data_in_dumped_structure>.<file_type>

  • meta-data: [<main_json_file_name>]<path_to_data_in_dumped_structure>.<file_type>.meta.json

Note

For images containing multiple channels, the color channel is the last dimension. If this is not the case, permutation of the axes needs to be applied to move the color channel to the last dimension. The permutation can be applied using the permute_axes parameter, e.g. of add_tensor_data().

If a tensor contains more than the necessary number of dimensions (3 for color images, 2 for grayscale images), the leading dimensions are treated as iterating over the images, and multiple images are dumped (with the indices of the leading dimensions indicated in the filename).

JSON = 0

Tensor data is serialized into the JSON file as nested lists. Suitable for small tensors and provides human-readable output.

BINARY = 1

Tensor data saved as binary files with metadata in separate JSON files. Efficient for large tensors; preserves exact numerical precision.

IMAGE_RGB = 2

Tensor data converted to PNG image format (RGB, 3 channels). Channel must be the last dimension; permute axes if necessary.

IMAGE_BGR = 3

Tensor data converted to PNG image format (BGR, 3 channels). Channel must be the last dimension; permute axes if necessary.

IMAGE_I = 4

Tensor data converted to PNG image format (grayscale). Single channel; no explicit channel dimension.

classmethod is_image(dump_type)[source]

Whether the given dump type is one of the image formats (IMAGE_RGB, IMAGE_BGR or IMAGE_I).
Return type:

bool

enable(dump_dir)[source]

Enable the TensorDumper singleton.

This method can be called only once and enables the TensorDumper singleton. Any use of the singleton before enabling it is ignored.

Parameters:

dump_dir (str) – The directory to dump the data to.

property is_enabled: bool

Whether the TensorDumper is enabled

add_tensor_data(path, data, dump_type, dump_type_override=None, permute_axes=None, permute_axes_override=None, exclude=None)[source]

Add tensor data to the dump.

The data is formatted and inserted into the dump structure.

Parameters:
  • path (str) – Path where the data will be inserted. If the path does not exist, it will be created. If data is a dictionary, the path may be already present in the structure, but the direct children of data need to be non-existent in the element the path points to. If data is not a dictionary, the path must not be present in the structure and the data will be inserted at the path.

  • data (Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]]]) – The tensor data to add

  • dump_type (Type) – The type of dump to use

  • dump_type_override (Optional[Dict[str, Type]], default: None) – A dictionary mapping names to dump types. If a name is present in the dictionary, the dump type for all tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • permute_axes (Optional[Sequence[int]], default: None) – Permutation of axes to apply to the tensor data. If None, no permutation is applied.

  • permute_axes_override (Optional[Dict[str, Optional[Sequence[int]]]], default: None) – A dictionary mapping names to permute axes. If a name is present in the dictionary, the permute axes for all tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • exclude (Optional[Sequence[str]], default: None) – List of entries to exclude from the dump. These entries are specified by name and may apply to any level of the data structure.
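The override lookup described for dump_type_override and permute_axes_override ("the match closest to the tensor wins") can be illustrated with a small hypothetical helper. The slash-separated path format is an assumption for illustration; this helper is not part of the API:

```python
def resolve_override(path, overrides, default):
    """Illustrates the documented override lookup: names in the overrides
    dict may match any component of the path, and the match closest to the
    tensor (furthest inside the structure) wins. Hypothetical helper,
    assuming slash-separated paths; not part of the package API."""
    if overrides:
        # Walk the path from the tensor outwards so the closest match wins.
        for name in reversed(path.split("/")):
            if name in overrides:
                return overrides[name]
    return default

# "conv1" is closer to the tensor than "backbone", so its override wins.
assert resolve_override(
    "model/backbone/conv1",
    {"backbone": "BINARY", "conv1": "IMAGE_I"},
    "JSON",
) == "IMAGE_I"
```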

add_grad_data(path, data, dump_type, dump_type_override=None, permute_grad_axes=None, permute_grad_axes_override=None, exclude=None)[source]

Add gradient data of the given tensor(s) to dump.

Note that if this method is called, set_gradients() must be called before the next dump.

The gradients are computed using torch.autograd.grad(), and do not influence the gradients as computed/used elsewhere in the code (e.g. in the training loop).

Note that tensors which do not require gradients or which are not part of the computation graph can be included in the dump, but no actual gradients will be computed for them. Instead, a note is written to the JSON dump in case requires_grad is False. If the tensor is not part of the computation graph, the written gradient will be null, and no image/binary file will be written for that tensor regardless of the dump_type setting.

Parameters:
  • path (str) – Path where the gradient data will be inserted. See add_tensor_data() for more details.

  • data (Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]]]) – The tensor data for which to dump the gradients.

  • dump_type (Type) – The type of dump to use

  • dump_type_override (Optional[Dict[str, Type]], default: None) – A dictionary mapping names to dump types. If a name is present in the dictionary, the dump type for all gradients of tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • permute_grad_axes (Optional[Sequence[int]], default: None) – Permutation of axes to apply to the gradient data. If None, no permutation is applied.

  • permute_grad_axes_override (Optional[Dict[str, Optional[Sequence[int]]]], default: None) – A dictionary mapping names to permute axes. If a name is present in the dictionary, the permute axes for all gradients of tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.

  • exclude (Optional[Sequence[str]], default: None) – List of entries to exclude from the dump. These entries are specified by name and may apply to any level of the data structure.

set_dump_type_for_all(dump_type, include_tensors=True, include_grads=True)[source]

Set the dump type for all tensors and gradients.

This method is e.g. useful to quickly change the dump type to Type.JSON to generate reference data for comparison (using compare_to_dumped_data()) without the need to go through the code and change the dump type for each tensor manually.

Important

This method only sets the dump type for data which has already been added. The dump type of data added after this method is called is not affected.

Parameters:
  • dump_type (Type) – The type of dump to use

  • include_tensors (bool, default: True) – Whether to include tensors in the dump

  • include_grads (bool, default: True) – Whether to include gradients in the dump

dump()[source]

Dump the data to the dump directory.

compare_to_dumped_data(eps_numerical_data=1e-06, num_errors_per_tensor_to_show=1, allow_missing_data_in_current=False, as_warning=False)[source]

Compare the data to previously dumped data.

In case of a mismatch, a ValueError is raised with a detailed error message.

Important

Only comparisons to data stored in the JSON format (Type.JSON) are supported. Therefore, the reference data must be stored with the Type.JSON both when generating the reference data and when comparing to it.

An easy way to ensure that the reference data is stored in the JSON format without modifying multiple places in the code is to call set_dump_type_for_all() when generating the reference data.

Note

The comparison can be set to allow missing data in the current data by setting allow_missing_data_in_current to True. This is e.g. useful if the current data is based on an implementation in progress, so that some of the data is not yet available. In this case, the comparison will not raise an error if the current data is missing some data which is present in the reference data. Instead, a warning will be printed.

Parameters:
  • eps_numerical_data (float, default: 1e-06) – The numerical tolerance for the comparison of numerical data.

  • num_errors_per_tensor_to_show (int, default: 1) – The number of most significant errors to show per tensor.

  • allow_missing_data_in_current (bool, default: False) – If True, the comparison will not raise an error if the current data is missing some data which is present in the reference data.

  • as_warning (bool, default: False) – If True, no error is raised in case of a mismatch and instead, a warning is printed. If False, an error is raised.
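The comparison semantics (numerical tolerance via eps_numerical_data, tolerated gaps via allow_missing_data_in_current) can be sketched as a stand-alone function over nested JSON-like data. This is illustrative only and not the actual implementation, which also reports the most significant per-tensor errors:

```python
def compare_nested(current, reference, eps=1e-6, allow_missing=False):
    """Sketch of the documented comparison semantics: floats are compared
    within eps, and keys present in the reference but missing from the
    current data are tolerated only when allow_missing is True.
    Returns a list of mismatch descriptions (empty means match)."""
    mismatches = []

    def walk(cur, ref, path):
        if isinstance(ref, dict):
            for key, ref_val in ref.items():
                if key not in cur:
                    if not allow_missing:
                        mismatches.append(f"{path}/{key}: missing in current")
                else:
                    walk(cur[key], ref_val, f"{path}/{key}")
        elif isinstance(ref, list):
            for i, ref_val in enumerate(ref):
                walk(cur[i], ref_val, f"{path}[{i}]")
        elif isinstance(ref, float):
            if abs(cur - ref) > eps:
                mismatches.append(f"{path}: {cur} != {ref}")
        elif cur != ref:
            mismatches.append(f"{path}: {cur!r} != {ref!r}")

    walk(current, reference, "")
    return mismatches

reference = {"loss": 0.5, "acc": [0.1, 0.2]}
# A tiny numerical deviation within eps is accepted.
assert compare_nested({"loss": 0.5 + 1e-8, "acc": [0.1, 0.2]}, reference) == []
```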

set_gradients(function_values)[source]

Set gradients for the tensors in the dump.

The gradients are computed using torch.autograd.grad(), and do not influence the gradients computed elsewhere (e.g. in the training loop).

This method must be called before dumping if add_grad_data() was called since the last dump.

Parameters:

function_values (Union[Tensor, List[Tensor]]) – The value(s) of the function(s) to compute the gradients for. This can be a single tensor or a list of tensors.

reset_dump_count()[source]

Reset the dump count.

Important

Resetting the dump count means that:

  • In case of dumping: the next dump will overwrite a previous dump (starting from the first dump).

  • In case of comparing to previously dumped data: the next comparison will start from the first dump.

This method is useful for debugging e.g. to rerun the same code multiple times to check for determinism, while always comparing to the same dumped data.

perform_after_dump_count(count, action)[source]

Register an action to be performed after a given number of dumps.

The action will be performed after the dump is completed.

This can e.g. be used to automatically exit the program after a given number of iterations have been dumped (by passing the exit() function as the action).

Important

If reset_dump_count() is called, the dump count is reset to 0, and the action will be performed after the count-th dump after the reset.

Note that this also means that the action can be performed multiple times if the dump count is reset after the action has been performed.

Important

This method can be called multiple times with the same count. In this case, the action will be overwritten.

Note that as in case of other methods, this method has no effect if the TensorDumper is not enabled.

Parameters:
  • count (int) – The number of dumps after which the action should be performed.

  • action (Callable[[], None]) – The action to perform.
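The interaction between registered actions, the dump count, and reset_dump_count() can be sketched with a stand-in counter. This is illustrative only; the real TensorDumper performs the actual dumping before running the action:

```python
class MiniDumpCounter:
    """Stand-in for the documented dump-count behaviour: an action
    registered for a given count runs after that many dumps, a later
    registration for the same count overwrites the earlier one, and
    reset_dump_count() re-arms the action. Illustrative only."""

    def __init__(self):
        self.dump_count = 0
        self._actions = {}  # count -> action (last registration wins)

    def perform_after_dump_count(self, count, action):
        self._actions[count] = action

    def dump(self):
        self.dump_count += 1
        if self.dump_count in self._actions:
            self._actions[self.dump_count]()  # performed after the dump

    def reset_dump_count(self):
        self.dump_count = 0

events = []
d = MiniDumpCounter()
d.perform_after_dump_count(2, lambda: events.append("after-2"))
d.dump(); d.dump(); d.dump()
d.reset_dump_count()
d.dump(); d.dump()              # the action fires again after the reset
assert events == ["after-2", "after-2"]
```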

register_custom_converter(data_type, converter_func)[source]

Register a custom converter for a given data type.

This method can be used to register a custom converter function for a given data type. The converter function must take a single argument of type data_type and return one of the following, or a nested list/dict structure containing elements of the following types:

  • a JSON-serializable object,

  • a tensor,

  • a numpy array, or

  • an object for which a custom converter is registered.

The conversion is performed iteratively, so that chains of conversions can be followed through.

The conversion is performed before any other processing steps. This means that if the converter returns tensors, these are handled in the same way as tensors which are directly added to the dumper.

Note

This is useful when the data to dump is not JSON-serializable by default. This may e.g. be the case for custom data types which are used in the training.

Parameters:
  • data_type (type) – The type of the data to convert.

  • converter_func (Callable) – The function to use for converting the data.
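A converter for a hypothetical user-defined type might look as follows; the type and function names are illustrative, and only the registration call at the end uses the documented API:

```python
# Hypothetical user-defined type, standing in for a custom data type
# used in training that is not JSON-serializable by default.
class BoundingBox:
    def __init__(self, x0, y0, x1, y1):
        self.coords = (x0, y0, x1, y1)

def bbox_to_jsonable(box):
    # Per the documented contract, the converter takes one object of the
    # registered type and returns JSON-serializable data (or tensors,
    # numpy arrays, or further convertible objects).
    return {"x0y0x1y1": list(box.coords)}

# With the real dumper this would be registered as:
#   TensorDumper().register_custom_converter(BoundingBox, bbox_to_jsonable)
assert bbox_to_jsonable(BoundingBox(0, 0, 4, 2)) == {"x0y0x1y1": [0, 0, 4, 2]}
```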

enable_ragged_batch_dumping(as_per_sample=False)[source]

Enable dumping of RaggedBatch data.

Note

It is possible to dump some RaggedBatch data as per sample, and some as a RaggedBatch structure. This can be achieved by calling this method multiple times with different values for as_per_sample, before adding the data which should be dumped with the desired format.

Parameters:

as_per_sample (bool, default: False) – If True, the RaggedBatch data is dumped as per sample. Otherwise, it is dumped as a RaggedBatch structure.

run_if_enabled(func)[source]

Run a function if the TensorDumper is enabled.

This method can be used to run a function only if the TensorDumper is enabled. This is useful to avoid running code which is only relevant for debugging.

The typical use-case for this method is the dumping of data which needs to be pre-processed first (e.g. drawing of bounding boxes into an image). This is done as follows:

  • Encapsulate the pre-processing logic in a function (inside the function which uses the dumper). Note that this means that func will close over the data accessible in that function and therefore does not need to take any arguments. The function func should

    • Perform any debugging-related pre-processing needed

    • Add the pre-processed data to the dump (e.g. using add_tensor_data())

  • Call run_if_enabled() with the function func as its argument. This will ensure that the pre-processing is only performed if the dumper is enabled. Otherwise, the pre-processing is omitted, and there is no overhead (apart from calling an empty function).

Parameters:

func (Callable[[], None]) – The function to run. The function must take no arguments.
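The closure pattern described above can be sketched with a stand-in for the dumper. Only the run_if_enabled() call shape mirrors the documented API; everything else (the factory, the batch function, the recorded events) is illustrative:

```python
# Stand-in for the enabled/disabled dumper: when disabled, the passed
# function is never called, so the pre-processing is skipped entirely.
def make_run_if_enabled(enabled):
    def run_if_enabled(func):
        if enabled:
            func()  # real dumper: pre-process and add data to the dump
    return run_if_enabled

calls = []

def process_batch(run_if_enabled, boxes):
    def dump_debug_view():              # closes over `boxes`, takes no arguments
        # Stands in for drawing boxes into an image + add_tensor_data().
        calls.append(("drawn", boxes))
    run_if_enabled(dump_debug_view)

process_batch(make_run_if_enabled(False), [1, 2])  # pre-processing skipped
process_batch(make_run_if_enabled(True), [3, 4])   # pre-processing runs
assert calls == [("drawn", [3, 4])]
```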