API Reference
Complete API documentation for the optim_test_tools package.
- class accvlab.optim_test_tools.Stopwatch(*args, **kwargs)[source]
Bases: SingletonBase
Stopwatch for performing runtime evaluations.
This is a singleton class for performing runtime measurements and obtaining the total as well as the average run-time per measurement and per (training) iteration, where a single measurement can be performed multiple times, or only in some iterations.
A warm-up phase can be defined when configuring the stopwatch. During the warm-up phase, no measurements are performed.
Multiple measurements can be performed, and are distinguished by name. The end of a (training) iteration is indicated explicitly (by calling the method finish_iter()). This is done to automatically start measurements after the warm-up phase is finished, to average the measurements over the iterations, and to print the measurements at certain intervals.
The CPU usage can be measured for one “type” of measurement (i.e. one measurement name). This is done by calling set_cpu_usage_meas_name() before the first measurement with the corresponding name is started. The CPU usage is then measured whenever the measurement is running, and the average CPU usage is printed together with the other measurements.
One-time measurements can be performed at any point in the code (see start_one_time_measurement(), end_one_time_measurement()). They are not affected by the warm-up phase and are reported separately (i.e. in their own section, without averages etc.). Each one-time measurement (i.e. a measurement with a given name) can be performed only once.
Warning
The CPU usage is measured using psutil.cpu_percent(). To ensure that the interval for which the CPU usage is measured is correct, the function psutil.cpu_percent() must not be called outside of the stopwatch during a measurement.
The stopwatch must first be enabled before any measurements are performed. If not enabled, calls to any of its methods have minimal overhead. To ensure this, the methods are empty in the disabled state and are replaced with methods implementing the functionality when the stopwatch is enabled. This means that the runtime overhead of using the stopwatch is negligible when it is not enabled. Enabling can be done from any part of the code (as this is a singleton).
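The enable-time method swap described above can be sketched as follows. This is a minimal, self-contained illustration of the technique (not the package's actual implementation): while disabled, the public methods are no-ops; enable() rebinds them to the real implementations, so the disabled state costs only an empty function call.

```python
# Sketch of the "empty methods until enabled" pattern; SketchStopwatch
# and its members are illustrative names, not part of the package.
import time

class SketchStopwatch:
    def __init__(self):
        self._starts = {}
        self.totals = {}

    # Disabled state: no-op methods with minimal overhead.
    def start_meas(self, name):
        pass

    def end_meas(self, name):
        pass

    def enable(self):
        # Rebind the public methods to the real implementations.
        self.start_meas = self._start_meas_impl
        self.end_meas = self._end_meas_impl

    def _start_meas_impl(self, name):
        self._starts[name] = time.perf_counter()

    def _end_meas_impl(self, name):
        elapsed = time.perf_counter() - self._starts.pop(name)
        self.totals[name] = self.totals.get(name, 0.0) + elapsed

sw = SketchStopwatch()
sw.start_meas("step")   # ignored: not enabled yet
sw.end_meas("step")
sw.enable()
sw.start_meas("step")
sw.end_meas("step")
print(sorted(sw.totals))  # ['step']
```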
Note
When obtaining an object using Stopwatch(), the singleton is returned if already created. If parameters are provided when calling Stopwatch(), this will enable the stopwatch (equivalent to calling enable()). Note that enabling can only be done once; attempting it a second time will lead to an error.
- Parameters:
num_warmup_iters – The number of warmup iterations to be performed before the runtime measurement is started.
print_every_n_iters – Once in how many iterations to print the measured runtime. If None, the runtime is not printed automatically (but can still be printed manually by calling print_eval_times()).
do_cuda_sync – Whether to synchronize the CUDA device every time a measurement is started or stopped.
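A typical training-loop usage of the documented interface might look as follows. This is a hypothetical sketch; the fallback stub only keeps the snippet runnable when the accvlab package is not installed.

```python
# Hypothetical usage sketch of the documented Stopwatch API.
try:
    from accvlab.optim_test_tools import Stopwatch
except ImportError:
    class Stopwatch:  # stand-in mirroring the documented interface
        def __init__(self, *args, **kwargs): pass
        def start_meas(self, name): pass
        def end_meas(self, name): pass
        def finish_iter(self): pass

# Passing parameters enables the singleton (equivalent to calling enable()).
sw = Stopwatch(num_warmup_iters=10, print_every_n_iters=100, do_cuda_sync=True)

for _ in range(3):  # training loop
    sw.start_meas("forward")
    # ... model forward pass ...
    sw.end_meas("forward")

    sw.start_meas("backward")
    # ... backward pass ...
    sw.end_meas("backward")

    # Mark the end of the iteration so per-iteration averages are correct.
    sw.finish_iter()
```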
- enable(num_warmup_iters, print_every_n_iters, do_cuda_sync)[source]
Enable the stopwatch
This method can be called only once and enables the Stopwatch singleton. Any measurements started or performed before calling this method are ignored.
- Parameters:
num_warmup_iters (int) – The number of warmup iterations to be performed before the runtime measurement is started
print_every_n_iters (Optional[int]) – Once in how many iterations to print the measured runtime. If None, the runtime is not printed automatically (but can still be printed manually by calling print_eval_times()).
do_cuda_sync (bool) – Whether to synchronize the CUDA device every time a measurement is started or stopped.
- set_cpu_usage_meas_name(name)[source]
Set the name of the CPU usage measurement
This method must be called before the first measurement with the corresponding name is started.
Important
If the CPU usage measurement is already set, it cannot be changed. Calling this method with a different name will raise an error.
Calling this method with the same name as the current CPU usage measurement name is allowed and will have no effect. This is useful to set the CPU usage name right before starting the first measurement with the corresponding name, even if the corresponding code region is called iteratively.
Warning
The CPU usage is measured using psutil.cpu_percent(). To ensure that the interval for which the CPU usage is measured is correct, the function psutil.cpu_percent() must not be called outside of the stopwatch during a measurement.
- Parameters:
name (str) – Name of the measurement
- start_meas(name)[source]
Start a measurement with the given name.
- Parameters:
name (str) – Name for the measurement
- end_meas(name)[source]
End a measurement with the given name.
- Parameters:
name (str) – Name of the measurement
- start_one_time_measurement(name)[source]
Start a one-time measurement with the given name.
- Parameters:
name (str) – Name of the measurement
- class accvlab.optim_test_tools.NVTXRangeWrapper(*args, **kwargs)[source]
Bases: SingletonBase
Wrapper for NVTX ranges.
This is a singleton class which allows for enabling the use of NVTX ranges and configuring how the ranges are used from any part of the implementation.
The wrapper must first be enabled before any measurements are performed. If not enabled, calls to any of its methods have minimal overhead. Enabling can be done from any part of the code (as this is a singleton).
Compared to using the NVTX range push/pop functionality directly, it offers the following advantages:
- It is possible to easily configure whether CUDA synchronization is performed when pushing/popping a range. The synchronization is part of the push/pop methods, so it can be turned on and off without changes to the code where the ranges are used, and it is not performed if not needed.
- If not enabled, calls to push/pop have minimal overhead (a call to an empty function). Note that while pushing/popping ranges directly via NVTX also has negligible overhead, profiling-related CUDA synchronizations need to be handled manually in that case.
- Range mismatch checks: the wrapper can check whether a popped range corresponds to the range that is expected to be popped. This functionality can be turned on or off as part of the configuration when enabling the wrapper. It has an overhead, and so should only be enabled for debugging purposes and turned off when actual profiling is performed.
Note
When obtaining an object using NVTXRangeWrapper(), the singleton is returned if already created. If parameters are provided when calling NVTXRangeWrapper(), this will enable the NVTX range wrapper (equivalent to calling enable()). Note that enabling can only be done once; attempting it a second time will lead to an error.
- Parameters:
sync_on_push – Whether to synchronize the CUDA device every time before pushing a range
sync_on_pop – Whether to synchronize the CUDA device every time before popping a range
keep_track_of_range_order – Whether to keep track of the range stack internally. A range name may optionally be specified when popping a range; if this is set to True, a check is performed whether the popped range corresponds to the range that is expected to be popped. Note that this has an overhead, and so should only be enabled for debugging purposes and turned off when performing the actual profiling.
- enable(sync_on_push, sync_on_pop, keep_track_of_range_order)[source]
Enable the NVTX range wrapper.
This method can be called only once and enables the NVTXRangeWrapper singleton. Any use of the singleton before enabling it is ignored.
- Parameters:
sync_on_push (bool) – Whether to synchronize the CUDA device every time before pushing a range
sync_on_pop (bool) – Whether to synchronize the CUDA device every time before popping a range
keep_track_of_range_order (bool) – Whether to keep track of the range stack internally. A range name may optionally be specified when popping a range; if this is set to True, a check is performed whether the popped range corresponds to the range that is expected to be popped. Note that this has an overhead, and so should only be enabled for debugging purposes and turned off when performing the actual profiling.
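The range-order check enabled by keep_track_of_range_order can be sketched as a name stack. This is an assumed illustration of the documented behavior, not the wrapper's actual code; the RangeTracker class and method names are hypothetical.

```python
# Sketch of a stack-based range mismatch check: pushes record the range
# name, and a named pop verifies it matches the innermost open range.
class RangeTracker:
    def __init__(self):
        self._stack = []

    def push(self, name):
        self._stack.append(name)
        # ...here the real wrapper would also push the NVTX range...

    def pop(self, name=None):
        top = self._stack.pop()
        if name is not None and name != top:
            raise RuntimeError(
                f"popped range {name!r}, but innermost open range is {top!r}")
        # ...here the real wrapper would also pop the NVTX range...

rt = RangeTracker()
rt.push("iteration")
rt.push("forward")
rt.pop("forward")        # OK: matches the innermost open range
try:
    rt.pop("backward")   # mismatch: "iteration" is the open range
except RuntimeError as e:
    print("mismatch detected:", e)
```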
- class accvlab.optim_test_tools.TensorDumper(*args, **kwargs)[source]
Bases: SingletonBase
Singleton class for dumping tensor & gradient data to a directory and comparing to previously dumped data.
This class provides a way to dump tensor data to a directory in a structured format.
The dumper is able to dump tensors, gradients, RaggedBatch objects, as well as data with user-defined & auto-applied converters. Furthermore, it supports custom processing prior to dumping (e.g. converting bounding boxes to images containing the bounding boxes), which is performed only if the dumper is enabled and incurs no overhead otherwise.
Main JSON files are created for each dump (one for the data and one for the gradients). The individual tensors (or converted data) can be stored inside the main JSON file, or in separate binary/image files (this can be configured, and can vary for individual data entries). In the case of binary/image files, the main JSON file contains a reference to the file, and the file is stored in the same directory as the main JSON file.
The dumper can also be used to compare against previously dumped data, to detect mismatches. This is useful for debugging, e.g. to rerun the same code multiple times while always comparing to the same dumped data. It can be used when modifying (e.g. optimizing) the implementation, or to check for determinism.
Important
The dumper is a singleton, so that it can be used in different source files without having to pass the instance around.
Note
The comparison is only supported if all data is dumped in the Type.JSON format. This can be enforced by calling set_dump_type_for_all() before dumping/comparing the data (so easy switching between dumping for manual inspection and dumping for comparison is possible).
Note
When in the disabled state, all dumping-related methods (dump, add data, compare to dumped data etc) are empty methods, which means they have no effect and minimal overhead.
Note
When obtaining an object using TensorDumper(), the singleton is returned if already created. If parameters are provided when calling TensorDumper(), this will enable the dumper (equivalent to calling enable()). Note that enabling can only be done once; attempting it a second time will lead to an error.
- Parameters:
dump_dir – The directory to dump the data to. If provided, the dumper will be enabled automatically. If not provided, the dumper will be disabled and can be enabled later by calling enable().
- class Type(value)[source]
Bases: Enum
Dump format types.
The format type determines how tensor data is serialized when dumped.
Note
For binary types (BINARY, IMAGE_RGB, IMAGE_BGR, IMAGE_I), entries are added to the main JSON file indicating the filenames of the stored data. Also, files containing meta-data are created and stored in the same directory. For BINARY, the meta-data is the shape and dtype of the tensor. For IMAGE_*, the meta-data is the original range of the image data (min and max value) and the image format (RGB, BGR, Intensity).
Note
For the BINARY and IMAGE_* formats, the filenames are:
blob/image data: [<main_json_file_name>]<path_to_data_in_dumped_structure>.<file_type>
meta-data: [<main_json_file_name>]<path_to_data_in_dumped_structure>.<file_type>.meta.json
Note
For images containing multiple channels, the color channel must be the last dimension. If this is not the case, a permutation of the axes needs to be applied to move the color channel to the last dimension. The permutation can be applied using the permute_axes parameter, e.g. of add_tensor_data().
If a tensor contains more than the necessary number of dimensions (3 for color images, 2 for grayscale images), the leading dimensions are treated as iterating over the images, and multiple images are dumped (with the indices of the leading dimensions indicated in the filenames).
- JSON = 0
Tensor data is serialized into the JSON file as nested lists. Suitable for small tensors and provides human-readable output.
- BINARY = 1
Tensor data saved as binary files with metadata in separate JSON files. Efficient for large tensors; preserves exact numerical precision.
- IMAGE_RGB = 2
Tensor data converted to PNG image format (RGB, 3 channels). Channel must be the last dimension; permute axes if necessary.
- IMAGE_BGR = 3
Tensor data converted to PNG image format (BGR, 3 channels). Channel must be the last dimension; permute axes if necessary.
- IMAGE_I = 4
Tensor data converted to PNG image format (grayscale). Single channel; no explicit channel dimension.
- enable(dump_dir)[source]
Enable the TensorDumper singleton.
This method can be called only once and enables the TensorDumper singleton. Any use of the singleton before enabling it is ignored.
- Parameters:
dump_dir (str) – The directory to dump the data to.
- add_tensor_data(path, data, dump_type, dump_type_override=None, permute_axes=None, permute_axes_override=None, exclude=None)[source]
Add tensor data to the dump.
The data is formatted and inserted into the dump structure.
- Parameters:
path (str) – Path where the data will be inserted. If the path does not exist, it will be created. If data is a dictionary, the path may already be present in the structure, but the direct children of data must not yet exist in the element the path points to. If data is not a dictionary, the path must not be present in the structure, and the data will be inserted at the path.
data (Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]]]) – The tensor data to add
dump_type (Type) – The type of dump to use
dump_type_override (Optional[Dict[str, Type]], default: None) – A dictionary mapping names to dump types. If a name is present in the dictionary, the dump type for all tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.
permute_axes (Optional[Sequence[int]], default: None) – Permutation of axes to apply to the tensor data. If None, no permutation is applied.
permute_axes_override (Optional[Dict[str, Optional[Sequence[int]]]], default: None) – A dictionary mapping names to permute axes. If a name is present in the dictionary, the permute axes for all tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.
exclude (Optional[Sequence[str]], default: None) – List of entries to exclude from the dump. These entries are specified by names and may apply to any level of the data structure.
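The "match closest to the tensor wins" rule for the name-based overrides can be sketched as follows. This is an assumed illustration of the documented resolution order (the helper name and the '/'-separated path format are hypothetical, chosen only for the example).

```python
# Sketch of name-based override resolution, as used by
# dump_type_override and permute_axes_override: every path component
# that appears in the override dict matches, and the match closest to
# the tensor (deepest in the path) wins.
def resolve_override(path, override):
    """path: e.g. 'batch/images/input'; override: {name: value}."""
    for name in reversed(path.split("/")):  # deepest component first
        if name in override:
            return override[name]
    return None  # no override applies

override = {"images": "BINARY", "input": "IMAGE_RGB"}
print(resolve_override("batch/images/input", override))  # IMAGE_RGB
print(resolve_override("batch/images/mask", override))   # BINARY
print(resolve_override("batch/labels", override))        # None
```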
- add_grad_data(path, data, dump_type, dump_type_override=None, permute_grad_axes=None, permute_grad_axes_override=None, exclude=None)[source]
Add gradient data of the given tensor(s) to dump.
Note that if this method is called, set_gradients() must be called before dumping the next time.
The gradients are computed using torch.autograd.grad(), and do not influence the gradients as computed/used elsewhere in the code (e.g. in the training loop).
Note that tensors which do not require gradients, or which are not part of the computation graph, can be included in the dump, but no actual gradients will be computed for them. Instead, a note is written to the JSON dump in case requires_grad is False. If a tensor is not part of the computation graph, the written gradient will be null, and no image/binary file will be written for that tensor regardless of the dump_type setting.
- Parameters:
path (str) – Path where the gradient data will be inserted. See add_tensor_data() for more details.
data (Union[Tensor, Any, Sequence[Union[Tensor, Any, Sequence, Dict]], Dict[str, Union[Tensor, Any, Sequence, Dict]]]) – The tensor data for which to dump the gradients.
dump_type (Type) – The type of dump to use
dump_type_override (Optional[Dict[str, Type]], default: None) – A dictionary mapping names to dump types. If a name is present in the dictionary, the dump type for all gradients of tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.
permute_grad_axes (Optional[Sequence[int]], default: None) – Permutation of axes to apply to the gradient data. If None, no permutation is applied.
permute_grad_axes_override (Optional[Dict[str, Optional[Sequence[int]]]], default: None) – A dictionary mapping names to permute axes. If a name is present in the dictionary, the permute axes for all gradients of tensors with that name in the path (i.e. either the name itself or the name of a parent along the path) will be overridden with the value in the dictionary. If multiple names match the path, the match closest to the tensor (i.e. further inside the structure) is used. If None, no override is applied.
exclude (Optional[Sequence[str]], default: None) – List of entries to exclude from the dump. These entries are specified by names and may apply to any level of the data structure.
- set_dump_type_for_all(dump_type, include_tensors=True, include_grads=True)[source]
Set the dump type for all tensors and gradients.
This method is useful e.g. to quickly change the dump type to Type.JSON in order to generate reference data for comparison (using compare_to_dumped_data()) without having to go through the code and change the dump type for each tensor manually.
Important
This method sets the dump type only for data which has already been added. The dump type of data added after this method is called will not be affected.
- compare_to_dumped_data(eps_numerical_data=1e-06, num_errors_per_tensor_to_show=1, allow_missing_data_in_current=False, as_warning=False)[source]
Compare the data to previously dumped data.
In case of a mismatch, a ValueError is raised with a detailed error message.
Important
Only comparisons to data stored in the JSON format (Type.JSON) are supported. Therefore, the data must be stored with Type.JSON both when generating the reference data and when comparing to it. An easy way to ensure this without modifying multiple places in the code is to call set_dump_type_for_all() when generating the reference data.
Note
The comparison can be set to allow missing data in the current data by setting allow_missing_data_in_current to True. This is useful e.g. if the current data is based on an implementation in progress, so that some of the data is not yet available. In this case, the comparison will not raise an error if the current data is missing some data which is present in the reference data. Instead, a warning is printed.
- Parameters:
eps_numerical_data (float, default: 1e-06) – The numerical tolerance for the comparison of numerical data.
num_errors_per_tensor_to_show (int, default: 1) – The number of most significant errors to show per tensor.
allow_missing_data_in_current (bool, default: False) – If True, the comparison will not raise an error if the current data is missing some data which is present in the reference data.
as_warning (bool, default: False) – If True, no error is raised in case of a mismatch; instead, a warning is printed. If False, an error is raised.
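The tolerance-based comparison described above can be sketched as follows. This is an assumed illustration of the documented behavior (flat lists instead of nested tensor structures, and a hypothetical helper name): values differing by more than eps count as mismatches, and only the most significant errors are reported.

```python
# Sketch of eps-tolerant comparison with "show top-k errors" reporting.
def compare(current, reference, eps=1e-6, num_errors_to_show=1):
    errors = []
    for i, (c, r) in enumerate(zip(current, reference)):
        diff = abs(c - r)
        if diff > eps:
            errors.append((diff, i, c, r))
    if errors:
        errors.sort(reverse=True)  # most significant error first
        shown = errors[:num_errors_to_show]
        msg = "; ".join(f"index {i}: {c} != {r} (|diff|={d:g})"
                        for d, i, c, r in shown)
        raise ValueError(f"{len(errors)} mismatch(es): {msg}")

compare([1.0, 2.0], [1.0, 2.0 + 1e-9])  # within tolerance: passes
try:
    compare([1.0, 2.5], [1.0, 2.0])
except ValueError as e:
    print(e)  # 1 mismatch(es): index 1: 2.5 != 2.0 (|diff|=0.5)
```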
- set_gradients(function_values)[source]
Set gradients for the tensors in the dump.
The gradients are computed using torch.autograd.grad(), and do not influence the gradients computed elsewhere (e.g. in the training loop).
This method must be called before dumping if add_grad_data() was called since the last dump.
- reset_dump_count()[source]
Reset the dump count.
Important
Resetting the dump count means that:
- In case of dumping: the next dump will overwrite a previous dump (starting from the first dump).
- In case of comparing to previously dumped data: the next comparison will start from the first dump.
This method is useful for debugging, e.g. to rerun the same code multiple times to check for determinism, while always comparing to the same dumped data.
- perform_after_dump_count(count, action)[source]
Register an action to be performed after a given number of dumps.
The action will be performed after the dump is completed.
This can e.g. be used to automatically exit the program after a given number of iterations have been dumped (by passing the exit() function as the action).
Important
If reset_dump_count() is called, the dump count is reset to 0, and the action will be performed after the count-th dump following the reset. Note that this also means the action can be performed multiple times if the dump count is reset after the action has been performed.
Important
This method can be called multiple times with the same count. In this case, the previously registered action will be overwritten.
Note that, as with the other methods, this method has no effect if the TensorDumper is not enabled.
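The count/reset/overwrite mechanics described above can be sketched as follows. This is an assumed illustration of the documented behavior (the DumpCounter class and its dump() trigger are hypothetical stand-ins for the dumper's internal bookkeeping).

```python
# Sketch of perform_after_dump_count semantics: actions are keyed by
# dump count, re-registering for the same count overwrites, and
# reset_dump_count() makes an already-performed action fire again.
class DumpCounter:
    def __init__(self):
        self._count = 0
        self._actions = {}

    def perform_after_dump_count(self, count, action):
        self._actions[count] = action  # same count: overwrite

    def dump(self):
        self._count += 1
        action = self._actions.get(self._count)
        if action is not None:
            action()  # performed after the dump is completed

    def reset_dump_count(self):
        self._count = 0

fired = []
dc = DumpCounter()
dc.perform_after_dump_count(2, lambda: fired.append("old"))
dc.perform_after_dump_count(2, lambda: fired.append("new"))  # overwrites
dc.dump(); dc.dump()          # fires after the 2nd dump
dc.reset_dump_count()
dc.dump(); dc.dump()          # fires again after the reset
print(fired)  # ['new', 'new']
```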
- register_custom_converter(data_type, converter_func)[source]
Register a custom converter for a given data type.
This method can be used to register a custom converter function for a given data type. The converter function must take a single argument of type data_type and return one of the following, or a nested list/dict structure containing elements of the following types:
- a JSON-serializable object,
- a tensor,
- a numpy array, or
- an object for which a custom converter is registered.
The conversion is performed iteratively, so that chains of conversions can be followed through.
The conversion is performed before any other processing steps. This means that if the converter returns tensors, these are handled in the same way as tensors which are directly added to the dumper.
Note
This is useful when the data to dump is not JSON-serializable by default. This may e.g. be the case for custom data types used in the training.
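The iterative conversion described above can be sketched as follows. This is an assumed illustration, not the dumper's actual code; the Box type, the module-level registry, and the convert() helper are hypothetical.

```python
# Sketch of chained custom converters: converters are applied
# repeatedly until the value is no longer of a registered type, so a
# chain Box -> dict -> JSON-serializable values is followed through.
converters = {}

def register_custom_converter(data_type, converter_func):
    converters[data_type] = converter_func

def convert(obj):
    while type(obj) in converters:          # follow converter chains
        obj = converters[type(obj)](obj)
    if isinstance(obj, dict):
        return {k: convert(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [convert(v) for v in obj]
    return obj  # assumed JSON-serializable (or tensor/array) by now

class Box:  # hypothetical custom data type used in training
    def __init__(self, x0, y0, x1, y1):
        self.coords = (x0, y0, x1, y1)

register_custom_converter(Box, lambda b: {"coords": list(b.coords)})
print(convert([Box(0, 0, 4, 4)]))  # [{'coords': [0, 0, 4, 4]}]
```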
- enable_ragged_batch_dumping(as_per_sample=False)[source]
Enable dumping of RaggedBatch data.
Note
It is possible to dump some RaggedBatch data per sample, and some as a RaggedBatch structure. This can be achieved by calling this method multiple times with different values for as_per_sample, before adding the data which should be dumped in the corresponding format.
- Parameters:
as_per_sample (bool, default: False) – If True, the RaggedBatch data is dumped per sample. Otherwise, it is dumped as a RaggedBatch structure.
- run_if_enabled(func)[source]
Run a function if the TensorDumper is enabled.
This method can be used to run a function only if the TensorDumper is enabled. This is useful to avoid running code which is only relevant for debugging.
The typical use-case for this method is the dumping of data which needs to be pre-processed first (e.g. drawing of bounding boxes into an image). This is done as follows:
Encapsulate the pre-processing logic in a function (inside the function which uses the dumper). Note that this means that func will close over the data accessible in that function and therefore does not need to take any arguments. The function func should:
- perform any debugging-related pre-processing needed, and
- add the pre-processed data to the dump (e.g. using add_tensor_data()).
Call run_if_enabled() with the function func as its argument. This ensures that the pre-processing is only performed if the dumper is enabled. Otherwise, the pre-processing is omitted, and there is no overhead (apart from calling an empty function).
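The steps above can be sketched as follows. This is a hypothetical usage example: draw_boxes(), the "debug/boxes_vis" path, and the training_step() wrapper are invented for illustration, and the no-op stub only keeps the snippet runnable without the accvlab package (when the real dumper is disabled, run_if_enabled() likewise never calls func).

```python
# Usage sketch of the run_if_enabled() closure pattern.
try:
    from accvlab.optim_test_tools import TensorDumper
except ImportError:
    class TensorDumper:  # disabled-state stand-in: everything is a no-op
        def __init__(self, *args, **kwargs): pass
        def add_tensor_data(self, *args, **kwargs): pass
        def run_if_enabled(self, func): pass  # never calls func

def training_step(image, boxes):
    dumper = TensorDumper()

    def dump_debug_views():
        # Closes over image/boxes, so it needs no arguments.
        vis = draw_boxes(image, boxes)  # hypothetical pre-processing
        dumper.add_tensor_data("debug/boxes_vis", vis, dump_type=None)

    # The pre-processing above runs only when the dumper is enabled.
    dumper.run_if_enabled(dump_debug_views)

training_step(image=None, boxes=None)  # no overhead while disabled
```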