Debugging#
Printing Values#
Often one of the best debugging methods is to simply print values from kernels. Warp supports printing all built-in types using the print() function, e.g.:
v = wp.vec3(1.0, 2.0, 3.0)
print(v)
[1.0, 2.0, 3.0]
In addition, formatted C-style printing is available through the wp.printf() function, e.g.:
x = 1.0
i = 2
wp.printf("A float value %f, an int value: %d", x, i)
Note
Formatted printing is only available for scalar types (e.g. int and float), not vector types.
Printing Launches#
For complex applications, it can be difficult to understand the order of operations that leads to a bug. To help diagnose these issues, Warp supports a simple option to print out all launches and arguments to the console:
wp.config.print_launches = True
Step-Through Debugging#
It is possible to attach IDE debuggers such as Visual Studio to Warp processes to step through generated kernel code. Users should first compile the kernels in debug mode by setting:
wp.config.mode = "debug"
This setting ensures that line numbers and debug symbols are generated correctly. After launching the Python process, attach the debugger and insert a breakpoint into the generated code.
Note
Generated kernel code does not correspond 1:1 to the original Python code, but individual operations can still be replayed and variables inspected.
Also see warp/tests/walkthrough_debug.py for an example of how to debug Warp kernel code running on the CPU.
Generated Code#
Occasionally it can be useful to inspect the generated code for debugging or profiling.
The generated code for kernels is stored in a central cache location in the user’s home directory. The cache location is printed at startup when wp.init() is called, for example:
Warp 0.8.1 initialized:
CUDA Toolkit: 11.8, Driver: 11.8
Devices:
"cpu" | AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
"cuda:0" | NVIDIA GeForce RTX 3090 (sm_86)
"cuda:1" | NVIDIA GeForce RTX 2080 Ti (sm_75)
Kernel cache: C:\Users\LukasW\AppData\Local\NVIDIA Corporation\warp\Cache\0.8.1
The kernel cache has folders beginning with wp_ that contain the generated C++/CUDA code and the compiled binaries for each module that was compiled at runtime. The name of each folder ends with a hexadecimal hash constructed from the module contents to avoid potential conflicts when using multiple processes and to support the caching of runtime-defined kernels.
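To illustrate why a content-derived hash avoids conflicts, the sketch below builds a folder name from a module's source text. It is only an illustration of the idea: the helper cache_folder_name, the choice of SHA-256, and the seven-digit suffix are assumptions, not Warp's actual hashing scheme.

```python
import hashlib


def cache_folder_name(module_name: str, source: str, digits: int = 7) -> str:
    """Build a hypothetical cache folder name ending in a hex hash of the source."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return f"wp_{module_name}_{digest[:digits]}"


# Identical contents map to the same folder, so a cached build can be reused.
first = cache_folder_name("my_module", "def f(): pass")
second = cache_folder_name("my_module", "def f(): pass")

# Changed contents map to a different folder, so stale binaries are not picked up.
changed = cache_folder_name("my_module", "def f(): return 1")

print(first == second, first != changed)
```

Because the name depends only on the contents, two processes that compile the same module arrive at the same folder, while a kernel redefined at runtime gets a folder of its own.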
Bounds Checking#
Warp will perform bounds checking in debug build configurations to ensure that all array accesses lie within the defined shape.
CUDA Verification#
Poorly formed kernel code or inputs can cause out-of-bounds memory accesses. In this case, the CUDA runtime will detect the violation and put the CUDA context into an error state. Subsequent kernel launches may then fail silently, which can lead to hard-to-diagnose issues.
If a CUDA error is suspected, a simple verification method is to enable:
wp.config.verify_cuda = True
This setting will check the CUDA context after every wp.launch() to ensure that it is still valid.
If an error is encountered, an exception will be raised that often helps to narrow down the problematic kernel.
Note
Verifying CUDA state at each launch requires synchronizing the CPU and GPU, which has a significant overhead. Users should ensure this setting is only used during debugging.
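One way to keep such a setting confined to a debugging session is to enable it only around the suspect code and restore the previous value afterwards. The helper scoped_flag below is a hypothetical sketch, not part of Warp. It is shown with a stand-in configuration object so that it runs without a GPU.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def scoped_flag(config, name, value):
    """Temporarily set config.<name> to value and restore the old value on exit."""
    previous = getattr(config, name)
    setattr(config, name, value)
    try:
        yield
    finally:
        # Restore the original value even if the body raised an exception
        setattr(config, name, previous)


# Stand-in for a real configuration object (an assumption for this sketch)
fake_config = SimpleNamespace(verify_cuda=False)

with scoped_flag(fake_config, "verify_cuda", True):
    inside = fake_config.verify_cuda  # flag is enabled within the block

after = fake_config.verify_cuda  # flag is back to its previous value
```

With Warp, the same helper could be called as scoped_flag(wp.config, "verify_cuda", True) around the launches under investigation, assuming the flag is read at each launch rather than once at initialization.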