Configuration#

Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity of Warp programs. When a setting can be changed at multiple levels (e.g., enable_backward), the setting at the more specific scope takes precedence.
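
As a minimal sketch of the precedence rules (wp.set_module_options() and the @wp.kernel decorator arguments are described in the sections below), a kernel-level setting overrides the module-level one, which in turn overrides the global default:

import warp as wp

wp.config.enable_backward = True  # global default

# module-level override: backward passes are disabled for this module's kernels
wp.set_module_options({"enable_backward": False})

# kernel-level override: the backward pass is re-enabled for this kernel only
@wp.kernel(enable_backward=True)
def square(x: wp.array(dtype=float), y: wp.array(dtype=float)):
    tid = wp.tid()
    y[tid] = x[tid] * x[tid]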

Global Settings#

Settings can be modified by direct assignment before or after calling wp.init(), though some settings only take effect if set prior to initialization.

For example, the location of the user kernel cache can be changed with:

import os

import warp as wp

example_dir = os.path.dirname(os.path.realpath(__file__))

# set default cache directory before wp.init()
wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

wp.init()

See warp.config for a complete list of global settings.
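
Other global settings are assigned the same way. A minimal sketch using a few settings from warp.config (the values shown are illustrative):

import warp as wp

wp.config.mode = "debug"           # compile kernels in debug mode
wp.config.verbose = True           # print more status information during codegen
wp.config.enable_backward = False  # skip generating backward passes by default

wp.init()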

Module Settings#

Module-level settings that control runtime compilation and code generation can be changed by passing a dictionary of option pairs to wp.set_module_options().

For example, compilation of backward passes for all kernels in a module can be disabled with:

wp.set_module_options({"enable_backward": False})

The options for a module can also be queried using wp.get_module_options().
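
For instance, a short sketch of round-tripping a module option (calling wp.get_module_options() with no arguments is assumed to return the options of the calling module):

import warp as wp

wp.set_module_options({"enable_backward": False})

options = wp.get_module_options()
print(options["enable_backward"])  # False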

| Field | Type | Default Value | Description |
|---|---|---|---|
| mode | String | None | A module-level override of the warp.config.mode setting. None defers to the global setting at compile time. |
| optimization_level | Integer | None | A module-level override of the warp.config.optimization_level setting. None defers to the global setting at compile time. |
| max_unroll | Integer | Global setting | A module-level override of the warp.config.max_unroll setting. |
| enable_backward | Boolean | Global setting | A module-level override of the warp.config.enable_backward setting. |
| fast_math | Boolean | False | If True, CUDA kernels are compiled with the --use_fast_math compiler option, which enables math operations that are faster but less accurate. |
| fuse_fp | Boolean | True | If True, allows the compiler to emit fused floating-point operations such as fused multiply-add. This may improve numerical accuracy and is generally recommended. Setting it to False helps ensure that functionally equivalent kernels produce identical results, unaffected by the presence or absence of fused operations. |
| lineinfo | Boolean | Global setting | A module-level override of the warp.config.lineinfo setting. |
| compile_time_trace | Boolean | Global setting | A module-level override of the warp.config.compile_time_trace setting. |
| cuda_output | String | None | A module-level override of the warp.config.cuda_output setting. |
| block_dim | Integer | 256 | The number of CUDA threads per block that kernels in the module will be compiled for. |
| strip_hash | Boolean | False | If True, avoids using a content-based hash to identify the module and its functions. |
| enable_mathdx_gemm | Boolean | None | A module-level override of the warp.config.enable_mathdx_gemm setting. None defers to the global setting at compile time. |
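
Several of these options can be combined in a single call. A minimal sketch using option names from the table above:

import warp as wp

wp.set_module_options(
    {
        "fast_math": True,  # trade numerical accuracy for speed in CUDA kernels
        "block_dim": 128,   # compile this module's kernels for 128 threads per block
    }
)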

Kernel Settings#

Kernel-level settings can be passed as arguments to the @wp.kernel decorator.

| Field | Type | Default Value | Description |
|---|---|---|---|
| enable_backward | Boolean | None | If False, the backward pass will not be generated for this kernel. If None, the setting is inherited from the module/global setting. |
| module | Module \| "unique" \| str | None | Controls which module the kernel belongs to. If "unique", the kernel is assigned to a new module named after the kernel (with a hash suffix). If a plain string is provided, the kernel is registered in the module with that name. If None, the module is inferred from the module in which the kernel function is defined. |
| launch_bounds | int \| tuple | None | CUDA __launch_bounds__ attribute for the kernel. Can be an int (maxThreadsPerBlock) or a tuple of 1–2 ints (maxThreadsPerBlock, minBlocksPerMultiprocessor). Only applies to CUDA kernels. The block_dim parameter in warp.launch() must not exceed the maxThreadsPerBlock value specified here. |

For example:

@wp.kernel(enable_backward=False)
def scale_2(
    x: wp.array(dtype=float),
    y: wp.array(dtype=float),
):
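    # no backward (adjoint) pass will be generated for this kernel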
    y[0] = x[0] ** 2.0


@wp.kernel(module="unique")
def isolated_kernel(a: wp.array(dtype=float), b: wp.array(dtype=float)):
    # This kernel will be registered in a new unique module created
    # just for this kernel and its dependent functions and structs
    tid = wp.tid()
    b[tid] = a[tid] + 1.0


@wp.kernel(launch_bounds=(256, 1))
def bounded_kernel(a: wp.array(dtype=float)):
    # CUDA __launch_bounds__ will be set to (256, 1)
    tid = wp.tid()
    a[tid] = a[tid] * 2.0
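
A minimal launch sketch for bounded_kernel above; the block_dim passed to wp.launch() must stay within the maxThreadsPerBlock value (256) declared in its launch_bounds (the array size of 1024 is an arbitrary choice):

import warp as wp

wp.init()

a = wp.zeros(1024, dtype=float)

# block_dim must not exceed the maxThreadsPerBlock value (256) set above
wp.launch(bounded_kernel, dim=a.shape[0], inputs=[a], block_dim=256)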