Configuration

Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity of Warp programs. When a setting can be changed at more than one level (e.g., enable_backward), the setting at the most specific scope takes precedence.
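The precedence rule can be illustrated with a small, self-contained sketch. This is a plain-Python model of the lookup order only, not Warp's internal implementation; the function `resolve_setting` and the three dictionaries are hypothetical names introduced for illustration.

```python
def resolve_setting(name, global_config, module_options=None, kernel_options=None):
    """Return the effective value of `name`, preferring the most specific scope.

    Hypothetical helper: it models kernel > module > global precedence and
    is not part of the Warp API.
    """
    # Check the most specific scope first, then fall back outward.
    for scope in (kernel_options, module_options):
        if scope is not None and name in scope:
            return scope[name]
    return global_config[name]


# Hypothetical values standing in for the three configuration levels.
global_config = {"enable_backward": True, "mode": "release"}
module_options = {"enable_backward": False}
kernel_options = {"enable_backward": True}

# Only the global value is set for "mode", so it is used as-is.
print(resolve_setting("mode", global_config, module_options, kernel_options))

# The module-level value overrides the global one.
print(resolve_setting("enable_backward", global_config, module_options))

# The kernel-level value overrides both.
print(resolve_setting("enable_backward", global_config, module_options, kernel_options))
```

In this model, a setting that is absent from a narrower scope simply falls through to the next wider one, which matches the "Global setting" defaults listed for module options.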

Global Settings

To change a setting, prepend wp.config. to the name of the variable and assign a value to it. Some settings may be changed on the fly, while others need to be set prior to calling wp.init() to take effect.

For example, the location of the user kernel cache can be changed with:

import os

import warp as wp

example_dir = os.path.dirname(os.path.realpath(__file__))

# set default cache directory before wp.init()
wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

wp.init()

Basic Global Settings

Field	Type	Default Value	Description
`verify_fp`	Boolean	`False`	If `True`, Warp will check that inputs and outputs are finite before and/or after various operations. Has performance implications.
`verify_cuda`	Boolean	`False`	If `True`, Warp will check for CUDA errors after every launch and memory operation. CUDA error verification cannot be used during graph capture. Has performance implications.
`print_launches`	Boolean	`False`	If `True`, Warp will print details of every kernel launch to standard out (e.g. launch dimensions, inputs, outputs, device, etc.). Has performance implications.
`mode`	String	`"release"`	Controls whether to compile Warp kernels in debug or release mode. Valid choices are `"release"` or `"debug"`. Has performance implications.
`max_unroll`	Integer	`16`	The maximum fixed-size loop to unroll. Note that `max_unroll` does not consider the total number of iterations in nested loops. This can result in a large amount of automatically generated code if each nested loop is below the `max_unroll` threshold.
`verbose`	Boolean	`False`	If `True`, additional information will be printed to standard out during code generation, compilation, etc.
`verbose_warnings`	Boolean	`False`	If `True`, Warp warnings will include extra information such as the source file and line number.
`quiet`	Boolean	`False`	If `True`, Warp module initialization messages will be disabled. This setting does not affect error messages and warnings.
`kernel_cache_dir`	String	`None`	The path to the directory used for the user kernel cache. Subdirectories beginning with `wp_` will be created in this directory. If `None`, the directory is determined automatically from the environment variable `WARP_CACHE_PATH` or, if `WARP_CACHE_PATH` is also not set, from the user cache directory reported by `appdirs`. `kernel_cache_dir` will be updated to reflect the location of the cache directory used.
`enable_backward`	Boolean	`True`	If `True`, backward passes of kernels will be compiled by default. Disabling this setting can reduce kernel compilation times.
`enable_graph_capture_module_load_by_default`	Boolean	`True`	If `True`, `wp.capture_begin()` will call `wp.force_load()` to compile and load Warp kernels from all imported modules before graph capture if the `force_module_load` argument is not explicitly provided to `wp.capture_begin()`. This setting is ignored if the CUDA driver supports CUDA 12.3 or newer.
`enable_mempools_at_init`	Boolean	`False`	If `True`, `wp.init()` will enable pooled allocators on all CUDA devices that support memory pools. Pooled allocators are generally faster and can be used during CUDA graph capture. For caveats, see the CUDA Pooled Allocators documentation.
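The fallback order described for `kernel_cache_dir` can be sketched as follows. This is a simplified model rather than Warp's actual code: `resolve_kernel_cache_dir` is a hypothetical helper, and `fallback_dir` stands in for whatever user cache directory `appdirs` would report on the current platform.

```python
import os


def resolve_kernel_cache_dir(configured, environ, fallback_dir):
    """Model the documented lookup order for the kernel cache directory.

    Hypothetical helper, not part of the Warp API:
      1. an explicitly configured kernel_cache_dir,
      2. the WARP_CACHE_PATH environment variable,
      3. a platform user cache directory (passed in as `fallback_dir`).
    """
    if configured is not None:
        return configured
    env_path = environ.get("WARP_CACHE_PATH")
    if env_path is not None:
        return env_path
    return fallback_dir


# Example paths are placeholders chosen for illustration.
fallback = os.path.join("home", "user", ".cache", "warp")

# An explicit setting wins over the environment variable.
print(resolve_kernel_cache_dir("my_cache", {"WARP_CACHE_PATH": "env_cache"}, fallback))

# With no explicit setting, the environment variable is used.
print(resolve_kernel_cache_dir(None, {"WARP_CACHE_PATH": "env_cache"}, fallback))

# With neither, the platform default is used.
print(resolve_kernel_cache_dir(None, {}, fallback))
```

Passing the environment as a dictionary keeps the sketch easy to test; real code would read `os.environ`.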

Advanced Global Settings

Field	Type	Default Value	Description
`cache_kernels`	Boolean	`True`	If `True`, kernels that have already been compiled from previous application launches will not be recompiled.
`cuda_output`	String	`None`	The preferred CUDA output format for kernels. Valid choices are `None`, `"ptx"`, and `"cubin"`. If `None`, a format will be determined automatically.
`ptx_target_arch`	Integer	`70`	The target architecture for PTX generation.
`llvm_cuda`	Boolean	`False`	If `True`, Clang/LLVM will be used to compile CUDA code instead of NVRTC.

Module Settings

Module-level settings to control runtime compilation and code generation may be changed by passing a dictionary of option pairs to wp.set_module_options().

For example, compilation of backward passes for all kernels in a module can be disabled with:

wp.set_module_options({"enable_backward": False})

The options for a module can also be queried using wp.get_module_options().

Field	Type	Default Value	Description
`mode`	String	Global setting	Controls whether to compile the module’s kernels in debug or release mode by default. Valid choices are `"release"` or `"debug"`.
`max_unroll`	Integer	Global setting	The maximum fixed-size loop to unroll. Note that `max_unroll` does not consider the total number of iterations in nested loops. This can result in a large amount of automatically generated code if each nested loop is below the `max_unroll` threshold.
`enable_backward`	Boolean	Global setting	If `True`, backward passes of kernels will be compiled by default.
`fast_math`	Boolean	`False`	If `True`, CUDA kernels will be compiled with the `--use_fast_math` compiler option, which enables some fast math operations that are faster but less accurate.
`cuda_output`	String	`None`	The preferred CUDA output format for kernels. Valid choices are `None`, `"ptx"`, and `"cubin"`. If `None`, a format will be determined automatically. The module-level setting takes precedence over the global setting.
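To see why nested loops matter for `max_unroll`, the sketch below estimates how many copies of an innermost loop body end up in the generated code. It is a back-of-the-envelope model, not Warp's code generator: `unrolled_body_copies` is a hypothetical helper, and it assumes that a loop is fully unrolled when its fixed iteration count is less than or equal to the threshold and left as a loop otherwise.

```python
def unrolled_body_copies(loop_extents, max_unroll):
    """Estimate the number of copies of the innermost body after unrolling.

    Hypothetical helper for illustration. `loop_extents` lists the fixed
    iteration count of each nested loop, outermost first. Each loop is
    checked against `max_unroll` on its own, so the copies multiply.
    """
    copies = 1
    for extent in loop_extents:
        # Assumption: loops at or below the threshold are fully unrolled.
        if extent <= max_unroll:
            copies *= extent
    return copies


# Three nested loops of 16 iterations each, with an example threshold of 16:
# every loop qualifies individually, so the body is emitted 16 * 16 * 16 times.
print(unrolled_body_copies([16, 16, 16], max_unroll=16))  # 4096

# If the middle loop exceeds the threshold, it stays a loop and only the
# other two multiply the body.
print(unrolled_body_copies([16, 32, 16], max_unroll=16))  # 256
```

Lowering `max_unroll` for a module with deeply nested fixed-size loops is one way to keep the generated code size under control.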

Kernel Settings

enable_backward is currently the only setting that can also be configured at the kernel level. Backward-pass compilation can be disabled for a single kernel by passing an argument to the @wp.kernel decorator, as in the following example:

@wp.kernel(enable_backward=False)
def scale_2(
    x: wp.array(dtype=float),
    y: wp.array(dtype=float),
):
    y[0] = x[0] ** 2.0