Configuration#

Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity of Warp programs. When a setting can be changed at multiple levels (e.g., enable_backward), the setting at the more specific scope takes precedence.

Global Settings#

Settings can be modified by direct assignment before or after calling wp.init(), though some settings only take effect if set prior to initialization.

For example, the location of the user kernel cache can be changed with:

import os

import warp as wp

example_dir = os.path.dirname(os.path.realpath(__file__))

# set default cache directory before wp.init()
wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

wp.init()
warp.config.version: str = '1.6.1'#

Warp version string

warp.config.verify_fp: bool = False#

Enable floating-point verification for inputs and outputs.

When enabled, checks if all values are finite before and after operations.

Note: Enabling this flag impacts performance.
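As a minimal sketch, the verification flags are ordinary attributes of wp.config; for example, verify_fp can be switched on before initialization so it applies to all subsequently compiled kernels (slower, but useful when chasing non-finite values):

```python
import warp as wp

# check that kernel inputs/outputs contain only finite values;
# expect a noticeable slowdown while this is active
wp.config.verify_fp = True

wp.init()
```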

warp.config.verify_cuda: bool = False#

Enable CUDA error checking after kernel launches.

This setting cannot be used during graph capture.

Note: Enabling this flag impacts performance.

warp.config.print_launches: bool = False#

Enable detailed kernel launch logging.

Prints information about each kernel launch including:

  • Launch dimensions

  • Input/output parameters

  • Target device

Note: Enabling this flag impacts performance.

warp.config.mode: str = 'release'#

Compilation mode for Warp kernels.

Parameters:

mode – Either "release" or "debug".

Note: Debug mode may impact performance.

warp.config.verbose: bool = False#

Enable detailed logging during code generation and compilation.

warp.config.verbose_warnings: bool = False#

Enable extended warning messages with source location information.

warp.config.quiet: bool = False#

Disable Warp module initialization messages.

Error messages and warnings remain unaffected.

warp.config.verify_autograd_array_access: bool = False#

Enable warnings for array overwrites that may affect gradient computation.

warp.config.enable_vector_component_overwrites: bool = False#

Allow multiple writes to vector/matrix/quaternion components.

Note: Enabling this may significantly increase kernel compilation time.

warp.config.cache_kernels: bool = True#

Enable kernel caching between application launches.

warp.config.kernel_cache_dir: str | None = None#

Directory path for storing compiled kernel cache.

If None, the path is determined in the following order:

  1. WARP_CACHE_PATH environment variable.

  2. System’s user cache directory (via appdirs.user_cache_dir).

Note: Subdirectories prefixed with wp_ will be created in this location.
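The lookup order above can be sketched in plain Python. Note that resolve_kernel_cache_dir is a hypothetical helper for illustration, not part of the Warp API, and tempfile stands in for the appdirs fallback to keep the sketch dependency-free:

```python
import os
import tempfile

def resolve_kernel_cache_dir(configured=None):
    """Hypothetical sketch of the cache-directory lookup order."""
    if configured is not None:               # explicit wp.config.kernel_cache_dir
        return configured
    env = os.environ.get("WARP_CACHE_PATH")  # 1. environment variable
    if env is not None:
        return env
    # 2. fall back to a per-user cache directory (Warp uses appdirs here)
    return os.path.join(tempfile.gettempdir(), "warpcache")

os.environ["WARP_CACHE_PATH"] = "/tmp/my_warp_cache"
print(resolve_kernel_cache_dir())           # the env var wins when nothing is set
print(resolve_kernel_cache_dir("/opt/wc"))  # an explicit setting wins over everything
```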

warp.config.cuda_output: str | None = None#

Preferred CUDA output format for kernel compilation.

Parameters:

cuda_output – One of {None, "ptx", "cubin"}. If None, format is auto-determined.

warp.config.ptx_target_arch: int = 75#

Target architecture version for PTX generation.

Defaults to the minimum architecture version that supports all Warp features.

warp.config.enable_backward: bool = True#

Enable compilation of kernel backward passes.

warp.config.llvm_cuda: bool = False#

Use Clang/LLVM compiler instead of NVRTC for CUDA compilation.

warp.config.enable_graph_capture_module_load_by_default: bool = True#

Enable automatic module loading before graph capture.

Only affects systems with CUDA driver versions below 12.3.

warp.config.enable_mempools_at_init: bool = True#

Enable CUDA memory pools during device initialization when supported.

warp.config.max_unroll: int = 16#

Maximum unroll factor for loops.

Module Settings#

Module-level settings controlling runtime compilation and code generation can be changed by passing a dictionary of option pairs to wp.set_module_options().

For example, compilation of backward passes for all kernels in a module can be disabled with:

wp.set_module_options({"enable_backward": False})

The options for a module can also be queried using wp.get_module_options().
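For instance, several options can be set at once and then read back (a minimal sketch; wp.get_module_options() returns a dictionary of the effective values):

```python
import warp as wp

wp.init()

# disable backward passes and tighten loop unrolling for this module
wp.set_module_options({"enable_backward": False, "max_unroll": 8})

options = wp.get_module_options()
print(options["enable_backward"])  # False
```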

| Field | Type | Default Value | Description |
| --- | --- | --- | --- |
| mode | String | Global setting | Controls whether the module's kernels are compiled in debug or release mode by default. Valid choices are "release" or "debug". |
| max_unroll | Integer | Global setting | The maximum fixed-size loop to unroll. Note that max_unroll does not consider the total number of iterations in nested loops; this can result in a large amount of automatically generated code if each nested loop is below the max_unroll threshold. |
| enable_backward | Boolean | Global setting | If True, backward passes of kernels will be compiled by default. |
| fast_math | Boolean | False | If True, CUDA kernels will be compiled with the --use_fast_math compiler option, which enables some fast math operations that are faster but less accurate. |
| fuse_fp | Boolean | True | If True, allow compilers to emit fused floating-point operations such as fused multiply-add. This may improve numerical accuracy and is generally recommended. Setting it to False helps ensure that functionally equivalent kernels produce identical results, unaffected by the presence or absence of fused operations. |
| lineinfo | Boolean | False | If True, CUDA kernels will be compiled with the --generate-line-info compiler option, which generates line-number information for device code, e.g. to allow NVIDIA Nsight Compute to correlate CUDA-C source and SASS. Line-number information is always included when compiling kernels in "debug" mode, regardless of this setting. |
| cuda_output | String | None | The preferred CUDA output format for kernels. Valid choices are None, "ptx", and "cubin". If None, a format is determined automatically. The module-level setting takes precedence over the global setting. |
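Any row of the table maps directly onto the options dictionary. For example, fast math and line-number information could be requested together when profiling with Nsight Compute (a sketch, affecting only the calling module):

```python
import warp as wp

wp.init()

# faster, less accurate math plus line info for source/SASS correlation
wp.set_module_options({"fast_math": True, "lineinfo": True})
```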

Kernel Settings#

enable_backward is currently the only setting that can also be configured on a per-kernel level. Backward-pass compilation can be disabled by passing an argument into the @wp.kernel decorator as in the following example:

@wp.kernel(enable_backward=False)
def scale_2(
    x: wp.array(dtype=float),
    y: wp.array(dtype=float),
):
    y[0] = x[0] ** 2.0