warp.config

Global configuration settings for Warp.

This module provides settings to control compilation behavior, debugging, performance, and runtime behavior of Warp kernels and modules. Settings exist at the global, module, and kernel levels, with more specific scopes taking precedence.

Settings can be modified by direct assignment before or after calling warp.init(), though some settings only take effect if set prior to initialization. See individual setting documentation for details.

For information on module-level and kernel-level settings, see Configuration.

API

cache_kernels

Enable kernel caching between application launches.

compile_time_trace

Enable the generation of Trace Event Format files for runtime module compilation.

cuda_output

Preferred CUDA output format for kernel compilation.

enable_backward

Enable compilation of kernel backward passes.

enable_graph_capture_module_load_by_default

Enable automatic module loading before graph capture.

enable_mempools_at_init

Enable CUDA memory pools during device initialization when supported.

enable_tiles_in_stack_memory

Use stack memory instead of static memory for tile allocations on the CPU.

enable_vector_component_overwrites

Allow multiple writes to vector/matrix/quaternion components.

kernel_cache_dir

Directory path for storing compiled kernel cache.

line_directives

Enable Python source line mapping in generated code.

lineinfo

Enable the compilation of modules with line information.

llvm_cuda

Use Clang/LLVM compiler instead of NVRTC for CUDA compilation.

load_module_max_workers

Default number of worker threads for compiling and loading modules in parallel.

max_unroll

Maximum unroll factor for loops.

mode

Compilation mode for Warp kernels.

optimization_level

Optimization level for Warp kernels.

print_launches

Enable detailed kernel launch logging.

ptx_target_arch

Target architecture version for PTX generation, e.g., ptx_target_arch = 75.

quiet

Disable Warp module initialization messages.

use_precompiled_headers

Enable the use of precompiled headers during kernel compilation.

verbose

Enable detailed logging during code generation and compilation.

verbose_warnings

Enable extended warning messages with source location information.

verify_autograd_array_access

Enable warnings for array overwrites that may affect gradient computation.

verify_cuda

Enable CUDA error checking after kernel launches.

verify_fp

Enable floating-point verification for inputs and outputs.

version

Warp version string.