warp#

The warp package provides array types and functions for creating and manipulating multi-dimensional data on CPU and CUDA devices. It includes kernel and function decorators (kernel(), func()) for defining parallel code, along with a comprehensive set of built-in types and functions for use within kernels (see Built-Ins).

The package provides device management, kernel launch and synchronization functions, automatic differentiation via Tape recording, type introspection and construction utilities, and module compilation and caching.

Additional functionality is available in optional submodules that must be explicitly imported, such as warp.render for visualization, warp.fem for finite element methods, and warp.sparse for sparse linear algebra.
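
A minimal sketch of the core workflow, defining a kernel with kernel() and launching it with launch(); the kernel, sizes, and device selection below are illustrative:

    import warp as wp

    # Optional functionality requires an explicit import, e.g.:
    # import warp.fem
    # import warp.sparse


    @wp.kernel
    def saxpy(x: wp.array(dtype=float), y: wp.array(dtype=float), a: float):
        tid = wp.tid()                 # index of the current thread
        y[tid] = a * x[tid] + y[tid]


    device = "cuda:0" if wp.is_cuda_available() else "cpu"

    n = 1024
    x = wp.full(n, 1.0, dtype=float, device=device)
    y = wp.zeros(n, dtype=float, device=device)

    wp.launch(saxpy, dim=n, inputs=[x, y, 2.0], device=device)
    print(y.numpy()[:4])               # expect [2. 2. 2. 2.]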

Submodules#

These modules are automatically available when you import warp.

Additional Submodules#

These modules must be explicitly imported (e.g., import warp.autograd).

Type Annotations#

DeviceLike

alias of Device | str | None

Float

Type variable.

Int

Type variable.

Scalar

Type variable.

Data Types#

Scalars#

Vectors#

Matrices#

Quaternions#

quat

alias of quatf

quatd

quatf

quath

quat_between_vectors

Compute the quaternion that rotates vector a to vector b.
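
A sketch of how these quaternion built-ins might be used inside a kernel (the kernel name and data are illustrative):

    import warp as wp


    @wp.kernel
    def align(src: wp.array(dtype=wp.vec3), dst: wp.array(dtype=wp.vec3),
              out: wp.array(dtype=wp.vec3)):
        tid = wp.tid()
        q = wp.quat_between_vectors(src[tid], dst[tid])  # rotation taking src onto dst
        out[tid] = wp.quat_rotate(q, src[tid])           # apply it to the source vector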

Transformations#

Spatial Vectors and Matrices#

Arrays#

array

A fixed-size multi-dimensional array containing values of the same type.

fixedarray

A fixed-size, stack-allocated array containing values of the same type.

tile

A Warp tile object.

array1d

array2d

array3d

array4d

clone

Clone an existing array, allocating a copy of the src memory.

copy

Copy array contents from src to dest.

empty

Return an uninitialized array.

empty_like

Return an uninitialized array with the same type and dimensions as another array.

from_ptr

full

Return an array with all elements initialized to the given value.

full_like

Return an array with the same type and dimensions as another array, with all elements initialized to the given value.

ones

Return a one-initialized array.

ones_like

Return a one-initialized array with the same type and dimensions as another array.

zeros

Return a zero-initialized array.

zeros_like

Return a zero-initialized array with the same type and dimensions as another array.
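
A few illustrative allocations (shapes, dtypes, and device names are placeholders):

    import warp as wp

    a = wp.zeros(1024, dtype=wp.vec3, device="cuda:0")     # zero-initialized
    b = wp.full((16, 16), 7.0, dtype=float, device="cpu")  # constant-filled 2D array
    c = wp.empty_like(a)                                    # uninitialized, same shape/dtype/device as a
    d = wp.clone(a)                                         # new allocation holding a copy of a
    wp.copy(c, a)                                           # copy the contents of a into c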

Indexed Arrays#

Spatial Acceleration#

Bvh

BvhQuery

Object used to track state during BVH traversal.

BvhQueryTiled

Object used to track state during thread-block parallel BVH traversal.

HashGrid

HashGridQuery

Object used to track state during neighbor traversal.

Mesh

MeshQueryAABB

Object used to track state during mesh traversal.

MeshQueryAABBTiled

Object used to track state during thread-block parallel mesh traversal.

MeshQueryPoint

Output for the mesh query point functions.

MeshQueryRay

Output for the mesh query ray functions.

Volume
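
For example, a HashGrid built on the host can be queried from within a kernel through its id; the point cloud, radius, and grid resolution below are illustrative:

    import numpy as np

    import warp as wp


    @wp.kernel
    def count_neighbors(grid: wp.uint64, points: wp.array(dtype=wp.vec3),
                        radius: float, counts: wp.array(dtype=int)):
        tid = wp.tid()
        x = points[tid]

        n = int(0)
        query = wp.hash_grid_query(grid, x, radius)
        index = int(0)
        while wp.hash_grid_query_next(query, index):
            if wp.length(points[index] - x) < radius:
                n += 1
        counts[tid] = n


    device = "cuda:0"
    points = wp.array(np.random.rand(1024, 3).astype(np.float32), dtype=wp.vec3, device=device)

    grid = wp.HashGrid(dim_x=128, dim_y=128, dim_z=128, device=device)
    grid.build(points=points, radius=0.1)

    counts = wp.zeros(len(points), dtype=int, device=device)
    wp.launch(count_neighbors, dim=len(points), inputs=[grid.id, points, 0.1, counts], device=device)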

Runtime#

clear_kernel_cache

Clear the kernel cache directory of previously generated source code and compiler artifacts.

clear_lto_cache

Clear the LTO cache directory of previously generated LTO code.

init

Initialize the Warp runtime.

is_cpu_available

is_cuda_available
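
A short sketch of explicit initialization and cache maintenance:

    import warp as wp

    wp.init()  # initialize the Warp runtime explicitly

    print(wp.is_cpu_available(), wp.is_cuda_available())

    # remove previously generated kernel binaries and compiler artifacts
    wp.clear_kernel_cache()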

Kernel Programming#

WarpCodegenAttributeError

WarpCodegenError

WarpCodegenIndexError

WarpCodegenKeyError

WarpCodegenTypeError

WarpCodegenValueError

func

func_grad

Decorator to register a custom gradient function for a given forward function.

func_native

Decorator to register a native code snippet.

func_replay

Decorator to register a custom replay function for a given forward function.

grad

Return a callable that computes the gradient of the given function.

kernel

Decorator to register a Warp kernel from a Python function.

map

Map a function over the elements of one or more arrays.

overload

Overload a generic kernel with the given argument types.

static

Evaluates a static expression and replaces the expression with its result.

struct
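
For example, user functions and structs usable inside kernels can be declared as follows (the types and names are illustrative):

    import warp as wp


    @wp.struct
    class Particle:
        pos: wp.vec3
        vel: wp.vec3
        mass: float


    @wp.func
    def lerp3(a: wp.vec3, b: wp.vec3, t: float) -> wp.vec3:
        return a * (1.0 - t) + b * t


    @wp.kernel
    def integrate(particles: wp.array(dtype=Particle), dt: float):
        tid = wp.tid()
        p = particles[tid]
        # @wp.func functions can be called from kernels like built-ins
        p.pos = lerp3(p.pos, p.pos + p.vel, dt)
        particles[tid] = p


    particles = wp.zeros(128, dtype=Particle)
    wp.launch(integrate, dim=128, inputs=[particles, 0.1])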

Kernel Execution#

Function

Kernel

Launch

Represents all data required for a kernel launch so that launches can be replayed quickly.

Module

launch

Launch a Warp kernel on the target device.

launch_tiled

A helper method for launching a grid with an extra trailing dimension equal to the block size.

synchronize

Manually synchronize the calling CPU thread with any outstanding CUDA work on all devices.

Automatic Differentiation#

Tape

Record kernel launches within a Tape scope to enable automatic differentiation.
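
A minimal differentiation sketch; the loss kernel and sizes are illustrative:

    import warp as wp


    @wp.kernel
    def loss_kernel(x: wp.array(dtype=float), loss: wp.array(dtype=float)):
        tid = wp.tid()
        wp.atomic_add(loss, 0, x[tid] * x[tid])   # loss = sum(x^2)


    n = 64
    x = wp.full(n, 2.0, dtype=float, requires_grad=True)
    loss = wp.zeros(1, dtype=float, requires_grad=True)

    tape = wp.Tape()
    with tape:
        wp.launch(loss_kernel, dim=n, inputs=[x, loss])

    tape.backward(loss=loss)      # propagate adjoints back through the recorded launches
    print(x.grad.numpy()[:4])     # d(loss)/dx = 2*x, so expect 4.0
    tape.zero()                   # clear gradients before the next backward pass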

Device Management#

Device

A device to allocate Warp arrays and to launch kernels on.

ScopedDevice

A context manager to temporarily change the current default device.

get_cuda_device

Returns the CUDA device with the given ordinal or the current CUDA device if ordinal is None.

get_cuda_device_count

Returns the number of CUDA devices supported in this environment.

get_cuda_devices

Returns a list of CUDA devices supported in this environment.

get_cuda_supported_archs

Return a sorted list of CUDA compute architectures that can be used as compilation targets.

get_device

Returns the device identified by the argument.

get_devices

Returns a list of devices supported in this environment.

get_preferred_device

Returns the preferred compute device, cuda:0 if available and cpu otherwise.

is_device_available

map_cuda_device

Assign a device alias to a CUDA context.

set_device

Sets the default device identified by the argument.

synchronize_device

Synchronize the calling CPU thread with any outstanding CUDA work on the specified device.

unmap_cuda_device

Remove a CUDA device with the given alias.
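
A sketch of common device-management calls (device names are illustrative and assume a CUDA-capable machine):

    import warp as wp

    print(wp.get_device())               # current default device
    print(wp.get_cuda_device_count())    # number of visible CUDA devices

    # temporarily change the default device for allocations and launches
    with wp.ScopedDevice("cuda:0"):
        a = wp.zeros(1024, dtype=float)  # allocated on cuda:0

    wp.set_device("cpu")                 # change the default device globally
    wp.synchronize_device("cuda:0")      # wait for outstanding work on cuda:0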

Module Management#

compile_aot_module

Compile a module ahead of time (AOT) for a given device.

force_load

Force user-defined kernels to be compiled and loaded (low-level API).

get_module

get_module_options

Returns the options for the current module.

load_aot_module

Load a module that was previously compiled ahead of time.

load_module

Force a user-defined module to be compiled and loaded.

set_module_options

Set options for the current module.
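
For example, per-module compilation options can be set before kernels are loaded (the option shown is illustrative):

    import warp as wp

    # disable adjoint (backward) code generation for kernels defined in this module
    wp.set_module_options({"enable_backward": False})
    print(wp.get_module_options())

    # eagerly compile and load everything defined so far
    wp.force_load()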

CUDA Stream Management#

ScopedStream

A context manager to temporarily change the current stream on a device.

Stream

get_stream

Return the stream currently used by the given device.

set_stream

Convenience function for calling Device.set_stream() on the given device.

synchronize_stream

Synchronize the calling CPU thread with any outstanding CUDA work on the specified stream.

wait_stream

Convenience function for calling Stream.wait_stream() on the current stream.
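
A sketch of issuing work on a non-default stream (kernel, size, and device are illustrative):

    import warp as wp


    @wp.kernel
    def fill(a: wp.array(dtype=float), value: float):
        a[wp.tid()] = value


    device = "cuda:0"
    stream = wp.Stream(device)

    # work issued inside the scope uses `stream` instead of the device's default stream
    with wp.ScopedStream(stream):
        a = wp.zeros(1_000_000, dtype=float, device=device)
        wp.launch(fill, dim=a.shape[0], inputs=[a, 1.0], device=device)

    wp.synchronize_stream(stream)  # block the calling thread until the stream is idle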

CUDA Event Management#

Event

A CUDA event that can be recorded onto a stream.

get_event_elapsed_time

Get the elapsed time between two recorded events.

record_event

Convenience function for calling Stream.record_event() on the current stream.

synchronize_event

Synchronize the calling CPU thread with an event recorded on a CUDA stream.

wait_event

Convenience function for calling Stream.wait_event() on the current stream.
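
For example, events created with timing enabled can measure elapsed GPU time between two points on a stream (kernel and sizes are illustrative):

    import warp as wp


    @wp.kernel
    def fill(a: wp.array(dtype=float), value: float):
        a[wp.tid()] = value


    with wp.ScopedDevice("cuda:0"):
        a = wp.zeros(1_000_000, dtype=float)

        start = wp.Event(enable_timing=True)
        end = wp.Event(enable_timing=True)

        wp.record_event(start)             # record on the device's current stream
        wp.launch(fill, dim=a.shape[0], inputs=[a, 1.0])
        wp.record_event(end)

        wp.synchronize_event(end)          # wait until `end` has been reached
        print(wp.get_event_elapsed_time(start, end), "ms")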

CUDA Memory Management#

ScopedMempool

ScopedMempoolAccess

ScopedPeerAccess

get_mempool_release_threshold

Get the CUDA memory pool release threshold on the device.

get_mempool_used_mem_current

Get the amount of memory from the device's memory pool that is currently in use by the application.

get_mempool_used_mem_high

Get the application's memory usage high-water mark from the device's CUDA memory pool.

is_mempool_access_enabled

Check if peer_device can currently access the memory pool of target_device.

is_mempool_access_supported

Check if peer_device can directly access the memory pool of target_device.

is_mempool_enabled

Check if CUDA memory pool allocators are enabled on the device.

is_mempool_supported

Check if CUDA memory pool allocators are available on the device.

is_peer_access_enabled

Check if peer_device can currently access the memory of target_device.

is_peer_access_supported

Check if peer_device can directly access the memory of target_device on this system.

set_mempool_access_enabled

Enable or disable access from peer_device to the memory pool of target_device.

set_mempool_enabled

Enable or disable CUDA memory pool allocators on the device.

set_mempool_release_threshold

Set the CUDA memory pool release threshold on the device.

set_peer_access_enabled

Enable or disable direct access from peer_device to the memory of target_device.
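
A sketch of enabling a memory pool and raising its release threshold (the 1 GiB value is illustrative):

    import warp as wp

    device = "cuda:0"

    if wp.is_mempool_supported(device):
        wp.set_mempool_enabled(device, True)
        # keep up to 1 GiB of freed memory cached in the pool between allocations
        wp.set_mempool_release_threshold(device, 1024 * 1024 * 1024)
        print(wp.get_mempool_used_mem_current(device))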

CUDA Graph Management#

ScopedCapture

capture_begin

Begin capture of a CUDA graph.

capture_debug_dot_print

Export a CUDA graph to a DOT file for visualization.

capture_end

End the capture of a CUDA graph.

capture_if

Create a dynamic branch based on a condition.

capture_launch

Launch a previously captured CUDA graph.

capture_while

Create a dynamic loop based on a condition.

is_conditional_graph_supported
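
A sketch of capturing a sequence of launches into a graph and replaying it (kernel and sizes are illustrative):

    import warp as wp


    @wp.kernel
    def scale(a: wp.array(dtype=float), s: float):
        tid = wp.tid()
        a[tid] = a[tid] * s


    with wp.ScopedDevice("cuda:0"):       # graph capture requires a CUDA device
        a = wp.ones(1024, dtype=float)    # allocate before capture begins

        # record the launches into a CUDA graph instead of executing them immediately
        with wp.ScopedCapture() as capture:
            for _ in range(10):
                wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0])

        wp.capture_launch(capture.graph)  # replay the captured graph
        wp.capture_launch(capture.graph)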

CUDA Interprocess Communication#

event_from_ipc_handle

Create an event from an IPC handle.

from_ipc_handle

Create an array from an IPC handle.

Profiling#

ScopedTimer

TimingResult

Timing result for a single activity.

timing_begin

Begin detailed activity timing.

timing_end

End detailed activity timing.

timing_print

Print timing results.
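
For example, ScopedTimer reports the time spent in the code it wraps (synchronize=True includes the GPU work rather than just the launch overhead):

    import warp as wp


    @wp.kernel
    def scale(a: wp.array(dtype=float), s: float):
        tid = wp.tid()
        a[tid] = a[tid] * s


    a = wp.ones(1_000_000, dtype=float, device="cuda:0")

    with wp.ScopedTimer("scale", synchronize=True):
        wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0], device="cuda:0")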

Timing Flags#

NumPy Interop#

dtype_from_numpy

Return the Warp dtype corresponding to a NumPy dtype.

dtype_to_numpy

Return the NumPy dtype corresponding to a Warp dtype.

from_numpy

Returns a Warp array created from a NumPy array.
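
A short sketch (shapes and device are illustrative):

    import numpy as np

    import warp as wp

    np_points = np.random.rand(128, 3).astype(np.float32)

    # interpret the (128, 3) float32 array as 128 vec3 elements
    points = wp.from_numpy(np_points, dtype=wp.vec3, device="cuda:0")

    print(wp.dtype_from_numpy(np.float32))   # corresponding Warp scalar type
    print(points.numpy().shape)              # copy back to the host as a NumPy array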

DLPack Interop#

from_dlpack

Convert a source array or DLPack capsule into a Warp array without copying.

to_dlpack

Convert a Warp array to a DLPack capsule that other DLPack-compatible frameworks can consume without copying.

JAX Interop#

device_from_jax

Return the Warp device corresponding to a JAX device.

device_to_jax

Return the JAX device corresponding to a Warp device.

dtype_from_jax

Return the Warp dtype corresponding to a JAX dtype.

dtype_to_jax

Return the JAX dtype corresponding to a Warp dtype.

from_jax

Convert a JAX array to a Warp array without copying the data.

to_jax

Convert a Warp array to a JAX array without copying the data.
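
A short interop sketch (assumes JAX and Warp target the same device):

    import jax
    import jax.numpy as jnp

    import warp as wp

    j = jnp.ones(1024, dtype=jnp.float32)

    w = wp.from_jax(j)    # share the JAX array's memory with Warp
    j2 = wp.to_jax(w)     # and expose a Warp array back to JAX

    print(wp.device_from_jax(jax.devices()[0]))
    print(wp.dtype_from_jax(jnp.float32))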

PyTorch Interop#

device_from_torch

Return the Warp device corresponding to a Torch device.

device_to_torch

Return the Torch device string corresponding to a Warp device.

dtype_from_torch

Return the Warp dtype corresponding to a Torch dtype.

dtype_to_torch

Return the Torch dtype corresponding to a Warp dtype.

from_torch

Convert a Torch tensor to a Warp array without copying the data.

stream_from_torch

Convert from a Torch CUDA stream to a Warp CUDA stream.

stream_to_torch

Convert from a Warp CUDA stream to a Torch CUDA stream.

to_torch

Convert a Warp array to a Torch tensor without copying the data.
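
A short interop sketch (tensor shape and device are illustrative):

    import torch

    import warp as wp

    t = torch.ones(1024, 3, dtype=torch.float32, device="cuda:0")

    a = wp.from_torch(t, dtype=wp.vec3)   # share memory, interpreting rows as vec3
    t2 = wp.to_torch(a)                   # back to a tensor, still without a copy

    print(wp.device_to_torch(wp.get_device("cuda:0")))
    print(wp.dtype_from_torch(torch.float32))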

Omniverse Runtime Fabric Interop#

Paddle Interop#

device_from_paddle

Return the Warp device corresponding to a Paddle device.

device_to_paddle

Return the Paddle device string corresponding to a Warp device.

dtype_from_paddle

Return the Warp dtype corresponding to a Paddle dtype.

dtype_to_paddle

Return the Paddle dtype corresponding to a Warp dtype.

from_paddle

Convert a Paddle tensor to a Warp array without copying the data.

stream_from_paddle

Convert from a Paddle CUDA stream to a Warp CUDA stream.

to_paddle

Convert a Warp array to a Paddle tensor without copying the data.

Constants#

constant

Function to declare compile-time constants accessible from Warp kernels.

E

HALF_PI

INF

LN2

LN10

LOG10E

LOG2E

NAN

PHI

PI

TAU

e

half_pi

inf

ln2

ln10

log10e

log2e

nan

phi

pi

tau
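
For example, user constants declared with constant() are folded into the generated kernel code, and the math constants above can be used from both Python and kernels (the kernel is illustrative):

    import warp as wp

    GRAVITY = wp.constant(wp.vec3(0.0, -9.8, 0.0))   # baked into the generated kernel code


    @wp.kernel
    def apply_gravity(vel: wp.array(dtype=wp.vec3), dt: float):
        tid = wp.tid()
        vel[tid] = vel[tid] + GRAVITY * dt


    vel = wp.zeros(256, dtype=wp.vec3)
    wp.launch(apply_gravity, dim=256, inputs=[vel, 1.0 / 60.0])

    print(wp.pi, wp.tau, wp.e)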

Misc#

MarchingCubes

A reusable context for marching cubes surface extraction.

RegisteredGLBuffer

Helper class to register a GL buffer with CUDA so that it can be mapped to a Warp array.