warp.launch

warp.launch(kernel, dim, inputs=[], outputs=[], adj_inputs=[], adj_outputs=[], device=None, stream=None, adjoint=False, record_tape=True, record_cmd=False, max_blocks=0, block_dim=256)

Launch a Warp kernel on the target device.

Kernel launches are asynchronous with respect to the calling Python thread.

Parameters:

kernel – The Warp kernel function to launch, decorated with the @wp.kernel decorator
dim (int | Sequence[int]) – The number of threads to launch the kernel with. Can be an integer or a sequence of integers with a maximum of 4 dimensions.
inputs (Sequence) – The input parameters to the kernel (optional)
outputs (Sequence) – The output parameters (optional)
adj_inputs (Sequence) – The adjoint inputs (optional)
adj_outputs (Sequence) – The adjoint outputs (optional)
device (Device | str | None) – The device to launch on.
stream (Stream | None) – The stream to launch on.
adjoint (bool) – Whether to run the backward (adjoint) pass instead of the forward pass (typically use False).
record_tape (bool) – When True, the launch will be recorded to the global warp.Tape object when present.
record_cmd (bool) – When True, the launch will return a Launch object. The launch will not occur until the user calls Launch.launch().
max_blocks (int) – The maximum number of CUDA thread blocks to use. Only has an effect for CUDA kernel launches. If negative or zero, the maximum hardware value will be used.
block_dim (int) – The number of threads per block (always 1 for “cpu” devices).