warp.config

Global configuration settings for Warp.

This module provides settings that control the compilation, debugging, performance, and runtime behavior of Warp kernels and modules. Settings exist at the global, module, and kernel levels, with more specific scopes taking precedence.
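The precedence rule can be pictured as a layered lookup in which the most specific scope is searched first. The following is a minimal sketch of that idea only, not Warp's actual resolution logic: the `effective_settings` helper and the dictionary values are hypothetical, chosen for illustration.

```python
from collections import ChainMap

# Hypothetical values for illustration; they are not read from Warp itself.
global_settings = {"mode": "release", "enable_backward": True, "max_unroll": 16}
module_settings = {"max_unroll": 4}            # overrides for one module
kernel_settings = {"enable_backward": False}   # overrides for one kernel


def effective_settings(kernel, module, global_):
    """Resolve settings with the most specific scope searched first."""
    # ChainMap returns the value from the first mapping that defines a key,
    # so kernel values shadow module values, which shadow global values.
    return dict(ChainMap(kernel, module, global_))


resolved = effective_settings(kernel_settings, module_settings, global_settings)
print(resolved["mode"], resolved["max_unroll"], resolved["enable_backward"])
```

Here `mode` falls through to the global value, while `max_unroll` and `enable_backward` come from the module and kernel layers respectively.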

Settings can be modified by direct assignment before or after calling `warp.init()`, though some settings take effect only if set before initialization. See each setting's documentation for details.
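As a rough illustration of direct assignment, the sketch below sets a few options before initialization. Because assigning to a misspelled attribute name would silently create a new attribute instead of changing a setting, it also includes a small guard. The `apply_settings` and `configure_warp` helpers are hypothetical names introduced here, not part of Warp, and the stand-in object exists only so the guard can be run without Warp installed.

```python
from types import SimpleNamespace


def apply_settings(config, **settings):
    """Assign settings on a config object, rejecting unknown names.

    Every name is checked before anything is assigned, so a typo leaves
    the config object unchanged.
    """
    for name in settings:
        if not hasattr(config, name):
            raise AttributeError(f"unknown setting: {name!r}")
    for name, value in settings.items():
        setattr(config, name, value)


def configure_warp():
    """Set options before initialization (requires Warp to be installed)."""
    import warp as wp

    wp.config.quiet = True  # plain direct assignment
    apply_settings(wp.config, mode="debug", verify_cuda=True)
    wp.init()  # settings assigned above are in place before initialization


# Stand-in object (an assumption for this sketch) to exercise the guard.
demo_config = SimpleNamespace(mode="release", verify_cuda=False)
apply_settings(demo_config, mode="debug", verify_cuda=True)
print(demo_config.mode, demo_config.verify_cuda)
```

Calling `configure_warp()` once at program start keeps all assignments ahead of `warp.init()`, which matters for the settings that are read only during initialization.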

For information on module-level and kernel-level settings, see Configuration.

API

`cache_kernels`	Enable kernel caching between application launches.
`compile_time_trace`	Enable the generation of Trace Event Format files for runtime module compilation.
`cuda_arch_suffix`	CUDA architecture suffix for kernel compilation.
`cuda_output`	Preferred CUDA output format for kernel compilation.
`enable_backward`	Enable compilation of kernel backward passes.
`enable_graph_capture_module_load_by_default`	Enable automatic module loading before graph capture.
`enable_mathdx_gemm`	Use libmathdx (cuBLASDx) for tile_matmul on GPU when available.
`enable_mempools_at_init`	Enable CUDA memory pools during device initialization when supported.
`enable_tiles_in_stack_memory`	Use stack memory instead of static memory for tile allocations on the CPU.
`enable_vector_component_overwrites`	Allow multiple writes to vector/matrix/quaternion components.
`kernel_cache_dir`	Directory path for storing compiled kernel cache.
`legacy_scalar_return_types`	Use legacy scalar return types from built-in functions and indexing.
`line_directives`	Enable Python source line mapping in generated code.
`lineinfo`	Enable the compilation of modules with line information.
`llvm_cuda`	Use Clang/LLVM compiler instead of NVRTC for CUDA compilation.
`load_module_max_workers`	Default number of worker threads for compiling and loading modules in parallel.
`max_unroll`	Maximum unroll factor for loops.
`mode`	Compilation mode for Warp kernels.
`optimization_level`	Optimization level for Warp kernels.
`print_launches`	Enable detailed kernel launch logging.
`ptx_target_arch`	Target architecture version for PTX generation, e.g., `ptx_target_arch = 75`.
`quiet`	Disable Warp module initialization messages.
`use_precompiled_headers`	Enable the use of precompiled headers during kernel compilation.
`verbose`	Enable detailed logging during code generation and compilation.
`verbose_warnings`	Enable extended warning messages with source location information.
`verify_autograd_array_access`	Enable warnings for array overwrites that may affect gradient computation.
`verify_cuda`	Enable CUDA error checking after kernel launches.
`verify_fp`	Enable floating-point verification for inputs and outputs.
`version`	Warp version string.
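Diagnostic settings such as `verbose` or `print_launches` are often wanted only around one part of a program. One way to scope such a change is a context manager that restores the previous values on exit. This is a sketch under stated assumptions: `temporary_settings` is a hypothetical helper, not a Warp API, and the stand-in object and its values are illustrative and are not Warp's defaults.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_settings(config, **overrides):
    """Apply overrides to config, then restore the previous values on exit."""
    # Reading the current values first raises AttributeError for an unknown
    # name before anything has been modified.
    saved = {name: getattr(config, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(config, name, value)
        yield config
    finally:
        for name, value in saved.items():
            setattr(config, name, value)


# Stand-in for warp.config so the sketch runs without Warp installed.
demo_config = SimpleNamespace(verbose=False, print_launches=False)

with temporary_settings(demo_config, verbose=True, print_launches=True):
    inside = (demo_config.verbose, demo_config.print_launches)

after = (demo_config.verbose, demo_config.print_launches)
print(inside, after)
```

With Warp installed, the same helper could be passed `warp.config` in place of the stand-in. It would not help for settings that take effect only when set before initialization.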