.. _Configuration:

Configuration
=============

Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity
of Warp programs. When a setting can be changed at multiple levels (e.g., ``enable_backward``),
the setting at the more-specific scope takes precedence.

.. _global-settings:

Global Settings
---------------

To change a setting, prepend ``wp.config.`` to the name of the variable and assign a value to it.
Some settings may be changed on the fly, while others need to be set prior to calling ``wp.init()`` to take effect.

For example, the location of the user kernel cache can be changed with:

.. code-block:: python

    import os

    import warp as wp

    example_dir = os.path.dirname(os.path.realpath(__file__))

    # set default cache directory before wp.init()
    wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

    wp.init()

Basic Global Settings
^^^^^^^^^^^^^^^^^^^^^

+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
| Field                                          | Type    |Default Value| Description                                                              |
+================================================+=========+=============+==========================================================================+
|``verify_fp``                                   | Boolean | ``False``   | If ``True``, Warp will check that inputs and outputs are finite before   |
|                                                |         |             | and/or after various operations. **Has performance implications.**       |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``verify_cuda``                                 | Boolean | ``False``   | If ``True``, Warp will check for CUDA errors after every launch and      |
|                                                |         |             | memory operation. CUDA error verification cannot be used during graph    |
|                                                |         |             | capture. **Has performance implications.**                               |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``print_launches``                              | Boolean | ``False``   | If ``True``, Warp will print details of every kernel launch to standard  |
|                                                |         |             | out (e.g. launch dimensions, inputs, outputs, device, etc.).             |
|                                                |         |             | **Has performance implications.**                                        |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``mode``                                        | String  |``"release"``| Controls whether to compile Warp kernels in debug or release mode.       |
|                                                |         |             | Valid choices are ``"release"`` or ``"debug"``.                          |
|                                                |         |             | **Has performance implications.**                                        |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``max_unroll``                                  | Integer | 16          | The maximum fixed-size loop to unroll. Note that ``max_unroll`` does not |
|                                                |         |             | consider the total number of iterations in nested loops. This can result |
|                                                |         |             | in a large amount of automatically generated code if each nested loop is |
|                                                |         |             | below the ``max_unroll`` threshold.                                      |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``verbose``                                     | Boolean | ``False``   | If ``True``, additional information will be printed to standard out      |
|                                                |         |             | during code generation, compilation, etc.                                |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``verbose_warnings``                            | Boolean | ``False``   | If ``True``, Warp warnings will include extra information such as        |
|                                                |         |             | the source file and line number.                                         |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``quiet``                                       | Boolean | ``False``   | If ``True``, Warp module initialization messages will be disabled.       |
|                                                |         |             | This setting does not affect error messages and warnings.                |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``kernel_cache_dir``                            | String  | ``None``    | The path to the directory used for the user kernel cache. Subdirectories |
|                                                |         |             | beginning with ``wp_`` will be created in this directory. If ``None``,   |
|                                                |         |             | a directory will be automatically determined using the value of the      |
|                                                |         |             | environment variable ``WARP_CACHE_PATH`` or, if that is also not set,    |
|                                                |         |             | the directory returned by ``appdirs.user_cache_dir()``.                  |
|                                                |         |             | ``kernel_cache_dir`` will be updated to reflect the location of the      |
|                                                |         |             | cache directory used.                                                    |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_backward``                             | Boolean | ``True``    | If ``True``, backward passes of kernels will be compiled by default.     |
|                                                |         |             | Disabling this setting can reduce kernel compilation times.              |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_graph_capture_module_load_by_default`` | Boolean | ``True``    | If ``True``, ``wp.capture_begin()`` will call ``wp.force_load()`` to     |
|                                                |         |             | compile and load Warp kernels from all imported modules before graph     |
|                                                |         |             | capture if the ``force_module_load`` argument is not explicitly provided |
|                                                |         |             | to ``wp.capture_begin()``. This setting is ignored if the CUDA driver    |
|                                                |         |             | supports CUDA 12.3 or newer.                                             |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_mempools_at_init``                     | Boolean | ``False``   | If ``True``, ``wp.init()`` will enable pooled allocators on all CUDA     |
|                                                |         |             | devices that support memory pools.                                       |
|                                                |         |             | Pooled allocators are generally faster and can be used during CUDA graph |
|                                                |         |             | capture. For caveats, see the CUDA Pooled Allocators documentation.      |
+------------------------------------------------+---------+-------------+--------------------------------------------------------------------------+

Advanced Global Settings
^^^^^^^^^^^^^^^^^^^^^^^^

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field              | Type    |Default Value| Description                                                              |
+====================+=========+=============+==========================================================================+
|``cache_kernels``   | Boolean | ``True``    | If ``True``, kernels that have already been compiled from previous       |
|                    |         |             | application launches will not be recompiled.                             |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``cuda_output``     | String  | ``None``    | The preferred CUDA output format for kernels. Valid choices are ``None``,|
|                    |         |             | ``"ptx"``, and ``"cubin"``. If ``None``, a format will be determined     |
|                    |         |             | automatically.                                                           |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``ptx_target_arch`` | Integer | 70          | The target architecture for PTX generation.                              |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``llvm_cuda``       | Boolean | ``False``   | If ``True``, Clang/LLVM will be used to compile CUDA code instead of     |
|                    |         |             | NVRTC.                                                                   |
+--------------------+---------+-------------+--------------------------------------------------------------------------+

Module Settings
---------------

Module-level settings to control runtime compilation and code generation may be changed by passing a dictionary of
option pairs to ``wp.set_module_options()``.

For example, compilation of backward passes for all kernels in a module can be disabled with:

.. code:: python

    wp.set_module_options({"enable_backward": False})

The options for a module can also be queried using ``wp.get_module_options()``.

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field              | Type    |Default Value| Description                                                              |
+====================+=========+=============+==========================================================================+
|``mode``            | String  | Global      | Controls whether to compile the module's kernels in debug or release     |
|                    |         | setting     | mode by default. Valid choices are ``"release"`` or ``"debug"``.         |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``max_unroll``      | Integer | Global      | The maximum fixed-size loop to unroll. Note that ``max_unroll`` does not |
|                    |         | setting     | consider the total number of iterations in nested loops. This can result |
|                    |         |             | in a large amount of automatically generated code if each nested loop is |
|                    |         |             | below the ``max_unroll`` threshold.                                      |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_backward`` | Boolean | Global      | If ``True``, backward passes of kernels will be compiled by default.     |
|                    |         | setting     | Disabling this setting can reduce kernel compilation times.              |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``fast_math``       | Boolean | ``False``   | If ``True``, CUDA kernels will be compiled with the ``--use_fast_math``  |
|                    |         |             | compiler option, which enables some fast math operations that are faster |
|                    |         |             | but less accurate.                                                       |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``cuda_output``     | String  | ``None``    | The preferred CUDA output format for kernels. Valid choices are ``None``,|
|                    |         |             | ``"ptx"``, and ``"cubin"``. If ``None``, a format will be determined     |
|                    |         |             | automatically. The module-level setting takes precedence over the global |
|                    |         |             | setting.                                                                 |
+--------------------+---------+-------------+--------------------------------------------------------------------------+

Kernel Settings
---------------

``enable_backward`` is currently the only setting that can also be configured on a per-kernel level.
Backward-pass compilation can be disabled by passing an argument into the ``@wp.kernel`` decorator
as in the following example:

.. code-block:: python

    @wp.kernel(enable_backward=False)
    def scale_2(
        x: wp.array(dtype=float),
        y: wp.array(dtype=float),
    ):
        y[0] = x[0] ** 2.0
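
The precedence rule stated in the introduction (the more-specific scope wins) can be illustrated
without Warp itself. The following is a minimal sketch in plain Python and is not Warp's actual
implementation: ``resolve_option`` and the three dictionaries are hypothetical names introduced
only for this illustration.

.. code-block:: python

    # Hypothetical stand-ins for the three configuration scopes (not Warp objects).
    global_config = {"enable_backward": True, "mode": "release"}
    module_options = {"enable_backward": False}  # e.g. set via wp.set_module_options()
    kernel_options = {}  # e.g. keyword arguments passed to @wp.kernel


    def resolve_option(name, kernel_opts, module_opts, global_opts):
        """Return the value from the most specific scope that defines ``name``."""
        for scope in (kernel_opts, module_opts, global_opts):
            if name in scope:
                return scope[name]
        raise KeyError(name)


    # The module-level override wins over the global default.
    effective_backward = resolve_option(
        "enable_backward", kernel_options, module_options, global_config
    )

    # No kernel or module override exists for "mode", so the global value applies.
    effective_mode = resolve_option("mode", kernel_options, module_options, global_config)

In this sketch, a kernel-level entry such as ``{"enable_backward": True}`` would in turn override
the module-level value, mirroring how ``@wp.kernel(enable_backward=...)`` is described above.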