Debugging

Printing Values

Often one of the best debugging methods is to simply print values from kernels. Warp supports printing all built-in types using the print() function, e.g.:

v = wp.vec3(1.0, 2.0, 3.0)

print(v)

[1.0, 2.0, 3.0]

In addition, formatted C-style printing for scalar types is available through the wp.printf() function, e.g.:

@wp.kernel
def mykernel():
    x = 1.0
    i = 2

    wp.printf("A float value %f, an int value: %d\n", x, i)

Verbose Mode and Printing Launches

For complex applications, it can be difficult to understand the order of operations that leads to a bug. To help diagnose these issues, Warp supports a simple option to print out all launches and arguments to the console:

wp.config.print_launches = True

Verbose mode can also be enabled with:

wp.config.verbose = True

In verbose mode, additional messages will be printed to standard output regarding program progress and code generation, such as when operations may be non-differentiable.

Verbose warnings can be enabled with:

wp.config.verbose_warnings = True

This can be useful in identifying where a particular Warp UserWarning message is being emitted from.

Debug Mode Compilation

In debug mode, Warp kernels will perform the following additional checks:

Raise an assertion if there is an array access outside the defined shape.
Warn if wp.tid() will return an overflowed value on large grids.
(GPU-only) Warn if the CUDA grid dimensions have been capped due to an overflowed number of blocks.
(GPU-only) Generate line-number information for device code.

The easiest way to enable the compilation of Warp kernels in debug mode is to set:

wp.config.mode = "debug"

As an alternative to the previous global setting, debug mode can be turned on per module by setting:

wp.set_module_options({"mode": "debug"})

Assertions

assert statements can be inserted into Warp kernels and user-defined functions to interrupt the program execution when a provided Boolean expression evaluates to false. Assertions are only active for a module’s kernels when the module is compiled in debug mode.

In the following example, the module is compiled in debug mode and the assert statement in the expect_ones kernel expects every element of the input array to be 1. Because the kernel is launched with a single-element array of zeros, the assertion fails when the kernel runs:

import warp as wp

wp.config.mode = "debug"

@wp.kernel
def expect_ones(a: wp.array(dtype=int)):
    i = wp.tid()

    assert a[i] == 1, "Array element must be 1"

input_array = wp.zeros(1, dtype=int)

wp.launch(expect_ones, input_array.shape, inputs=[input_array])

wp.synchronize_device()

The output of the program will include a line like the following:

default_program:49: void expect_ones_133f9859_cuda_kernel_forward(wp::launch_bounds_t, wp::array_t<int>): block: [0,0,0], thread: [0,0,0] Assertion `("assert a[i] == 1, \"Array element must be 1\"",var_3)` failed.

Step-Through Debugging

It is possible to attach IDE debuggers such as Visual Studio to Warp processes to step through generated kernel code. Users should first compile the kernels in debug mode by setting:

wp.config.mode = "debug"

This setting ensures that line numbers and debug symbols are generated correctly. After launching the Python process, attach the debugger and insert a breakpoint into the generated code.

Note

Generated kernel code does not correspond 1:1 to the original Python code, but individual operations can still be replayed and variables inspected.

Also see warp/tests/walkthrough_debug.py for an example of how to debug Warp kernel code running on the CPU.

Generated Code

Occasionally, it can be useful to inspect the generated code for debugging or profiling. The generated code for kernels is stored in a central cache location in the user’s home directory by default. The cache location is printed at startup when wp.init() is called, for example:

Warp 0.8.1 initialized:
    CUDA Toolkit: 11.8, Driver: 11.8
    Devices:
    "cpu"    | AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
    "cuda:0" | NVIDIA GeForce RTX 3090 (sm_86)
    "cuda:1" | NVIDIA GeForce RTX 2080 Ti (sm_75)
    Kernel cache: C:\Users\LukasW\AppData\Local\NVIDIA Corporation\warp\Cache\0.8.1

The kernel cache has folders beginning with wp_ that contain the generated C++/CUDA code and the compiled binaries for each module that was compiled at runtime. The name of each folder ends with a hexadecimal hash constructed from the module contents to avoid potential conflicts when using multiple processes and to support the caching of runtime-defined kernels.
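
To locate the generated sources without browsing the cache by hand, a small helper along the following lines can be used. It is only a sketch: the function name list_generated_sources is hypothetical, the .cpp and .cu file extensions are an assumption about how the generated files are named, and the cache path must be supplied by the caller, for example by copying it from the startup message.

```python
from pathlib import Path


def list_generated_sources(cache_dir):
    """Map each wp_* module folder in cache_dir to its generated source files."""
    sources = {}
    for module_dir in sorted(Path(cache_dir).glob("wp_*")):
        if not module_dir.is_dir():
            continue
        # Assumption: generated C++ files end in .cpp and CUDA files in .cu
        sources[module_dir.name] = [
            path.name
            for path in sorted(module_dir.iterdir())
            if path.suffix in {".cpp", ".cu"}
        ]
    return sources


# Example usage, with the "Kernel cache" path printed by wp.init():
#
#     for folder, files in list_generated_sources(kernel_cache_path).items():
#         print(folder, files)
```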

If a bug with Warp’s kernel caching logic is suspected, kernel caching can be disabled by setting:

wp.config.cache_kernels = False

CUDA Error Verification

It is possible to generate out-of-bounds memory access violations through poorly formed kernel code or inputs. In this case, the CUDA runtime will detect the violation and put the CUDA context into an error state. Subsequent kernel launches may silently fail, which can lead to hard-to-diagnose issues.

If a CUDA error is suspected, a simple verification method is to enable:

wp.config.verify_cuda = True

This setting will check the CUDA context after every wp.launch() to ensure that it is still valid. If an error is encountered, an exception will be raised that often helps to narrow down the problematic kernel.

CUDA error verification cannot be used while a CUDA graph is being captured.

Note

Verifying CUDA state at each launch requires synchronizing CPU and GPU which has a significant overhead. Users should ensure this setting is only used during debugging.
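
One way to keep the overhead confined to the code under investigation is to enable the flag only around the suspicious launches and restore it afterwards. The context manager below is a hypothetical helper, not part of Warp. It is demonstrated on a stand-in object so that the sketch runs without a GPU; in an application, wp.config would be passed instead.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def scoped_flag(config, name, value):
    """Temporarily set config.<name> to value and restore it on exit."""
    previous = getattr(config, name)
    setattr(config, name, value)
    try:
        yield
    finally:
        # Restore the old value even if the body raised an exception,
        # e.g. because a CUDA error was detected after a launch
        setattr(config, name, previous)


# Stand-in for wp.config, used here only to keep the sketch self-contained
demo_config = SimpleNamespace(verify_cuda=False)

with scoped_flag(demo_config, "verify_cuda", True):
    # The suspicious wp.launch() calls would go here
    flag_inside = demo_config.verify_cuda

flag_after = demo_config.verify_cuda
print(flag_inside, flag_after)
```

This approach relies on the flag being read at launch time, so it is not suitable for settings that are applied when a module is compiled.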

Detecting Non-Finite Values

Setting wp.config.verify_fp = True can help identify where a calculation is producing non-finite values like NaN or infinity. When this flag is used on its own, messages will be printed to the standard output stream indicating the function that is detecting invalid values.

If combined with Debug Mode Compilation, an assertion will be raised when an invalid value is detected.

CUDA Toolkit Debugging Tools

Compute Sanitizer tools like initcheck and memcheck can also be used to detect subtle memory-access issues in Warp applications, e.g.:

compute-sanitizer --tool initcheck python sim.py

The Compute Sanitizer suite is available through the CUDA Toolkit.