Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity
of Warp programs. When a setting can be changed at multiple levels (e.g., enable_backward),
the setting at the more-specific scope takes precedence.
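The precedence rule can be pictured with a small pure-Python model. This is only an illustrative sketch: resolve_setting and the three dictionaries are hypothetical names introduced here, not part of Warp's API or its internal implementation.

```python
def resolve_setting(name, global_config, module_options=None, kernel_options=None):
    """Return the value of `name` from the most specific scope that defines it.

    Hypothetical helper: scopes are checked from most specific (kernel)
    to least specific (global).
    """
    for scope in (kernel_options, module_options, global_config):
        if scope is not None and name in scope:
            return scope[name]
    raise KeyError(name)


# Example scopes (illustrative values only).
global_config = {"enable_backward": True, "mode": "release"}
module_options = {"enable_backward": False}
kernel_options = {"enable_backward": True}

# The module-level value overrides the global one.
module_level = resolve_setting("enable_backward", global_config, module_options)

# The kernel-level value overrides both.
kernel_level = resolve_setting(
    "enable_backward", global_config, module_options, kernel_options
)

# A setting not overridden at a narrower scope falls back to the global value.
inherited_mode = resolve_setting("mode", global_config, module_options)
```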
To change a setting, prepend wp.config. to the name of the variable and assign a value to it.
Some settings may be changed on the fly, while others need to be set prior to calling wp.init() to take effect.
For example, the location of the user kernel cache can be changed with:
    import os

    import warp as wp

    example_dir = os.path.dirname(os.path.realpath(__file__))

    # set default cache directory before wp.init()
    wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

    wp.init()
Field
Type
Default Value
Description
verify_fp
Boolean
False
If True, Warp will check that inputs and outputs are finite before
and/or after various operations. Has performance implications.
verify_cuda
Boolean
False
If True, Warp will check for CUDA errors after every launch and
memory operation. CUDA error verification cannot be used during graph
capture. Has performance implications.
print_launches
Boolean
False
If True, Warp will print details of every kernel launch to standard
out (e.g. launch dimensions, inputs, outputs, device, etc.).
Has performance implications.
mode
String
"release"
Controls whether to compile Warp kernels in debug or release mode.
Valid choices are "release" or "debug".
Has performance implications.
max_unroll
Integer
16
The maximum fixed-size loop to unroll. Note that max_unroll does not
consider the total number of iterations in nested loops. This can result
in a large amount of automatically generated code if each nested loop is
below the max_unroll threshold.
verbose
Boolean
False
If True, additional information will be printed to standard out
during code generation, compilation, etc.
verbose_warnings
Boolean
False
If True, Warp warnings will include extra information such as
the source file and line number.
quiet
Boolean
False
If True, Warp module initialization messages will be disabled.
This setting does not affect error messages and warnings.
kernel_cache_dir
String
None
The path to the directory used for the user kernel cache. Subdirectories
beginning with wp_ will be created in this directory. If None,
a directory will be automatically determined using the value of the
environment variable WARP_CACHE_PATH or, if WARP_CACHE_PATH is also
not set, the appdirs user cache directory (appdirs.user_cache_dir).
kernel_cache_dir will be
updated to reflect the location of the cache directory used.
enable_backward
Boolean
True
If True, backward passes of kernels will be compiled by default.
Disabling this setting can reduce kernel compilation times.
enable_graph_capture_module_load_by_default
Boolean
True
If True, wp.capture_begin() will call wp.force_load() to
compile and load Warp kernels from all imported modules before graph
capture if the force_module_load argument is not explicitly provided
to wp.capture_begin(). This setting is ignored if the CUDA driver
supports CUDA 12.3 or newer.
enable_mempools_at_init
Boolean
False
If True, wp.init() will enable pooled allocators on all CUDA
devices that support memory pools.
Pooled allocators are generally faster and can be used during CUDA graph
capture. For the caveats, see the CUDA Pooled Allocators documentation.
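The lookup order described for kernel_cache_dir in the table above can be sketched as a three-step fallback. This is a simplified model, not Warp's actual code: resolve_cache_dir is a hypothetical helper, and the fallback directory is passed in as a parameter instead of being computed with appdirs.

```python
import os


def resolve_cache_dir(configured_dir, environ, fallback_dir):
    """Model the documented lookup order for the kernel cache directory.

    Hypothetical helper:
    1. an explicit kernel_cache_dir value, if one was set;
    2. otherwise the WARP_CACHE_PATH environment variable, if present;
    3. otherwise a per-user default cache directory.
    """
    if configured_dir is not None:
        return configured_dir
    if "WARP_CACHE_PATH" in environ:
        return environ["WARP_CACHE_PATH"]
    return fallback_dir


# Illustrative default location; the real default is chosen by appdirs.
assumed_default = os.path.join(os.path.expanduser("~"), ".cache", "warp")

# With no explicit setting, the environment variable (if any) wins,
# and the per-user default is used last.
chosen = resolve_cache_dir(None, os.environ, assumed_default)
```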
Module-level settings to control runtime compilation and code generation may be changed by passing a dictionary of
option pairs to wp.set_module_options().
For example, compilation of backward passes for all kernels in a module can be disabled with:
wp.set_module_options({"enable_backward":False})
The options for a module can also be queried using wp.get_module_options().
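A minimal sketch of setting and then querying module options, assuming Warp is installed and that the script itself is the module being configured:

```python
import warp as wp

# Disable backward-pass compilation for every kernel defined in this module.
wp.set_module_options({"enable_backward": False})

# Inspect the options currently in effect for this module.
options = wp.get_module_options()
print(options["enable_backward"])
```

Both calls apply to the module they are called from, so they are typically placed near the top of the file, before any kernels are defined.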
Field
Type
Default Value
Description
mode
String
Global setting
Controls whether to compile the module’s kernels in debug or release
mode by default. Valid choices are "release" or "debug".
max_unroll
Integer
Global setting
The maximum fixed-size loop to unroll. Note that max_unroll does not
consider the total number of iterations in nested loops. This can result
in a large amount of automatically generated code if each nested loop is
below the max_unroll threshold.
enable_backward
Boolean
Global setting
If True, backward passes of kernels will be compiled by default.
Disabling this setting can reduce kernel compilation times.
fast_math
Boolean
False
If True, CUDA kernels will be compiled with the --use_fast_math
compiler option, which enables some fast math operations that are faster
but less accurate.
cuda_output
String
None
The preferred CUDA output format for kernels. Valid choices are None,
"ptx", and "cubin". If None, a format will be determined
automatically. The module-level setting takes precedence over the global
setting.
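The max_unroll note about nested loops is easier to see with a little arithmetic. The sketch below uses a simplified model, and it assumes that a loop is fully unrolled when its fixed iteration count does not exceed max_unroll. The function unrolled_body_copies is a hypothetical helper, not part of Warp.

```python
def unrolled_body_copies(loop_counts, max_unroll):
    """Estimate how many copies of the innermost loop body are generated.

    Simplified model: each loop in the nest is considered on its own.
    A loop whose iteration count is within max_unroll multiplies the
    amount of generated code; a longer loop stays a regular loop.
    """
    copies = 1
    for count in loop_counts:
        if count <= max_unroll:
            copies *= count
    return copies


# Three nested loops of 16 iterations each all pass a threshold of 16,
# so the innermost body is replicated 16 * 16 * 16 times.
triple_nest = unrolled_body_copies([16, 16, 16], max_unroll=16)

# If the outer loop exceeds the threshold, only the inner loop is unrolled.
mixed_nest = unrolled_body_copies([32, 16], max_unroll=16)
```

Lowering max_unroll for a module with deep loop nests is one way to keep the generated code size in check.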
enable_backward is currently the only setting that can also be configured on a per-kernel level.
Backward-pass compilation can be disabled by passing an argument into the @wp.kernel decorator
as in the following example: