Configuration#
Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity
of Warp programs. In cases in which a setting can be changed at multiple levels (e.g.: enable_backward),
the setting at the more-specific scope takes precedence.
Global Settings#
Settings can be modified by direct assignment before or after calling wp.init(),
though some settings only take effect if set prior to initialization.
For example, the location of the user kernel cache can be changed with:
import os
import warp as wp
example_dir = os.path.dirname(os.path.realpath(__file__))
# set default cache directory before wp.init()
wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")
wp.init()
- warp.config.verify_fp: bool = False#
Enable floating-point verification for inputs and outputs.
When enabled, checks if all values are finite before and after operations.
Note: Enabling this flag impacts performance.
- warp.config.verify_cuda: bool = False#
Enable CUDA error checking after kernel launches.
This setting cannot be used during graph capture.
Note: Enabling this flag impacts performance.
- warp.config.print_launches: bool = False#
Enable detailed kernel launch logging.
Prints information about each kernel launch, including:
- Launch dimensions
- Input/output parameters
- Target device
Note: Enabling this flag impacts performance.
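For example, the three diagnostic flags above can be enabled together while debugging a kernel (all of them impact performance):
import warp as wp
# Enable runtime diagnostics before initializing Warp.
wp.config.verify_fp = True       # check that inputs/outputs are finite
wp.config.verify_cuda = True     # check for CUDA errors after each launch
wp.config.print_launches = True  # log every kernel launch
wp.init()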
- warp.config.mode: str = 'release'#
Compilation mode for Warp kernels.
- Parameters:
mode – Either "release" or "debug".
Note: Debug mode may impact performance.
This setting can be overridden at the module level by setting the
"mode" module option.
- warp.config.verbose_warnings: bool = False#
Enable extended warning messages with source location information.
- warp.config.quiet: bool = False#
Disable Warp module initialization messages.
Error messages and warnings remain unaffected.
- warp.config.verify_autograd_array_access: bool = False#
Enable warnings for array overwrites that may affect gradient computation.
- warp.config.enable_vector_component_overwrites: bool = False#
Allow multiple writes to vector/matrix/quaternion components.
Note: Enabling this may significantly increase kernel compilation time.
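As a sketch, this is the kind of pattern the flag concerns: a kernel that writes to the same vector component more than once (the kernel name is illustrative):
import warp as wp

@wp.kernel
def overwrite_component(v: wp.array(dtype=wp.vec3)):
    a = wp.vec3(0.0, 0.0, 0.0)
    a[0] = 1.0
    a[0] = 2.0  # second write to the same component
    v[0] = a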
- warp.config.kernel_cache_dir: str | None = None#
Directory path for storing compiled kernel cache.
If None, the path is determined in the following order:
1. The WARP_CACHE_PATH environment variable.
2. The system's user cache directory (via appdirs.user_cache_directory).
Note: Subdirectories prefixed with wp_ will be created in this location.
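The cache location can also be supplied through the WARP_CACHE_PATH environment variable; a sketch (the path is illustrative):
import os

# Set before initializing Warp for it to take effect.
os.environ["WARP_CACHE_PATH"] = "/tmp/warpcache_example"

import warp as wp

wp.init()
print(wp.config.kernel_cache_dir)  # the resolved cache location after init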
- warp.config.cuda_output: str | None = None#
Preferred CUDA output format for kernel compilation.
- Parameters:
cuda_output – One of {None, "ptx", "cubin"}. If None, the format is auto-determined.
- warp.config.ptx_target_arch: int | None = None#
Target architecture version for PTX generation, e.g., ptx_target_arch = 75.
If None, the architecture is determined by the devices present in the system.
- warp.config.lineinfo: bool = False#
Enable the compilation of modules with line information.
Modules compiled for GPU execution will be compiled with the --generate-line-info compiler option, which generates line-number information for device code. Line-number information is always included when compiling a module in "debug" mode, regardless of this setting.
This setting can be overridden at the module level by setting the
"lineinfo" module option.
- warp.config.line_directives: bool = True#
Enable Python source line mapping in generated code.
If True, #line directives are inserted in generated code for modules compiled with line information to map back to the original Python source file.
- warp.config.compile_time_trace: bool = False#
Enable the generation of Trace Event Format files for runtime module compilation.
These are JSON files that can be opened by tools like edge://tracing/ and chrome://tracing/.
This setting is currently only effective when compiling modules for the GPU with NVRTC (CUDA 12.8+).
This setting can be overridden at the module level by setting the
"compile_time_trace" module option.
- warp.config.enable_backward: bool = True#
Enable compilation of kernel backward passes.
This setting can be overridden at the module level by setting the
"enable_backward"module option.
- warp.config.enable_graph_capture_module_load_by_default: bool = True#
Enable automatic module loading before graph capture.
Only affects systems with CUDA driver versions below 12.3.
- warp.config.enable_mempools_at_init: bool = True#
Enable CUDA memory pools during device initialization when supported.
- warp.config.max_unroll: int = 16#
Maximum unroll factor for loops.
Note that max_unroll does not consider the total number of iterations in nested loops. This can result in a large amount of automatically generated code if each nested loop is below the max_unroll threshold.
This setting can be overridden at the module level by setting the
"max_unroll" module option.
Module Settings#
Module-level settings to control runtime compilation and code generation may be changed by passing a dictionary of
option pairs to wp.set_module_options().
For example, compilation of backward passes for all kernels in a module can be disabled with:
wp.set_module_options({"enable_backward": False})
The options for a module can also be queried using wp.get_module_options().
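For example, continuing the snippet above:
options = wp.get_module_options()
print(options["enable_backward"])  # False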
| Field | Type | Default Value | Description |
|---|---|---|---|
| "mode" | String | Global setting | A module-level override of the mode global setting. |
| "max_unroll" | Integer | Global setting | A module-level override of the max_unroll global setting. |
| "enable_backward" | Boolean | Global setting | A module-level override of the enable_backward global setting. |
| "fast_math" | Boolean | False | If True, CUDA kernels are compiled with the --use_fast_math compiler option, which enables faster but less accurate math operations. |
| "fuse_fp" | Boolean | True | If True, the compiler may emit fused floating-point operations such as FMA instructions, which can improve performance at the cost of exact reproducibility across devices. |
| "lineinfo" | Boolean | Global setting | A module-level override of the lineinfo global setting. |
| "compile_time_trace" | Boolean | Global setting | A module-level override of the compile_time_trace global setting. |
| "cuda_output" | String | None | A module-level override of the cuda_output global setting. |
| "block_dim" | Integer | 256 | The number of CUDA threads per block that kernels in the module will be compiled for. |
| "enable_vector_component_overwrites" | Boolean | False | If True, supports multiple writes to vector/matrix/quaternion components, at the cost of potentially much longer kernel compilation times (see the corresponding global setting). |
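For example, kernels in the current module can be compiled for 128-thread blocks with floating-point fusion disabled (a sketch using the options above):
wp.set_module_options({"block_dim": 128, "fuse_fp": False})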
Kernel Settings#
Backward-pass compilation can be disabled on a per-kernel basis by passing the enable_backward argument to the @wp.kernel decorator,
as in the following example:
@wp.kernel(enable_backward=False)
def scale_2(
    x: wp.array(dtype=float),
    y: wp.array(dtype=float),
):
    y[0] = x[0] ** 2.0
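Launching such a kernel is unchanged; only the backward pass is omitted. A sketch:
x = wp.array([3.0], dtype=float)
y = wp.zeros(1, dtype=float)

wp.launch(scale_2, dim=1, inputs=[x, y])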