warp#

The warp package provides array types and functions for creating and manipulating multi-dimensional data on CPU and CUDA devices. It includes kernel and function decorators (kernel(), func()) for defining parallel code, along with a comprehensive set of built-in types and functions for use within kernels (see Built-Ins).

The package provides device management, kernel launch and synchronization functions, automatic differentiation via Tape recording, type introspection and construction utilities, and module compilation and caching.

Additional functionality is available in optional submodules that must be explicitly imported, such as warp.render for visualization, warp.fem for finite element methods, and warp.sparse for sparse linear algebra.

Submodules#

These modules are automatically available when you import warp.

Additional Submodules#

These modules must be explicitly imported (e.g., import warp.autograd).

Type Annotations#

DeviceLike

alias of Device | str | None

Float

Type variable.

Int

Type variable.

Scalar

Type variable.

Data Types#

Scalars#

bool

Boolean scalar type for use in Warp kernels and arrays.

float16

16-bit half-precision floating-point scalar type.

float32

32-bit single-precision floating-point scalar type.

float64

64-bit double-precision floating-point scalar type.

int8

8-bit signed integer scalar type.

int16

16-bit signed integer scalar type.

int32

32-bit signed integer scalar type.

int64

64-bit signed integer scalar type.

uint8

8-bit unsigned integer scalar type.

uint16

16-bit unsigned integer scalar type.

uint32

32-bit unsigned integer scalar type.

uint64

64-bit unsigned integer scalar type.

Vectors#

vec2

alias of vec2f

vec3

alias of vec3f

vec4

alias of vec4f

vec2b

2D vector with int8 (signed byte) components.

vec2d

2D vector with float64 (double-precision) components.

vec2f

2D vector with float32 (single-precision) components.

vec2h

2D vector with float16 (half-precision) components.

vec2i

2D vector with int32 (signed integer) components.

vec2l

2D vector with int64 (signed long) components.

vec2s

2D vector with int16 (signed short) components.

vec2ub

2D vector with uint8 (unsigned byte) components.

vec2ui

2D vector with uint32 (unsigned integer) components.

vec2ul

2D vector with uint64 (unsigned long) components.

vec2us

2D vector with uint16 (unsigned short) components.

vec3b

3D vector with int8 (signed byte) components.

vec3d

3D vector with float64 (double-precision) components.

vec3f

3D vector with float32 (single-precision) components.

vec3h

3D vector with float16 (half-precision) components.

vec3i

3D vector with int32 (signed integer) components.

vec3l

3D vector with int64 (signed long) components.

vec3s

3D vector with int16 (signed short) components.

vec3ub

3D vector with uint8 (unsigned byte) components.

vec3ui

3D vector with uint32 (unsigned integer) components.

vec3ul

3D vector with uint64 (unsigned long) components.

vec3us

3D vector with uint16 (unsigned short) components.

vec4b

4D vector with int8 (signed byte) components.

vec4d

4D vector with float64 (double-precision) components.

vec4f

4D vector with float32 (single-precision) components.

vec4h

4D vector with float16 (half-precision) components.

vec4i

4D vector with int32 (signed integer) components.

vec4l

4D vector with int64 (signed long) components.

vec4s

4D vector with int16 (signed short) components.

vec4ub

4D vector with uint8 (unsigned byte) components.

vec4ui

4D vector with uint32 (unsigned integer) components.

vec4ul

4D vector with uint64 (unsigned long) components.

vec4us

4D vector with uint16 (unsigned short) components.

Matrices#

mat22

alias of mat22f

mat33

alias of mat33f

mat44

alias of mat44f

mat22d

2x2 matrix with float64 (double-precision) components.

mat22f

2x2 matrix with float32 (single-precision) components.

mat22h

2x2 matrix with float16 (half-precision) components.

mat33d

3x3 matrix with float64 (double-precision) components.

mat33f

3x3 matrix with float32 (single-precision) components.

mat33h

3x3 matrix with float16 (half-precision) components.

mat44d

4x4 matrix with float64 (double-precision) components.

mat44f

4x4 matrix with float32 (single-precision) components.

mat44h

4x4 matrix with float16 (half-precision) components.

matrix_from_cols

Construct a matrix with each vector argument as a column.

matrix_from_rows

Construct a matrix with each vector argument as a row.

Quaternions#

quat

alias of quatf

quatd

Quaternion with float64 (double-precision) components for 3D rotations.

quatf

Quaternion with float32 (single-precision) components for 3D rotations.

quath

Quaternion with float16 (half-precision) components for 3D rotations.

quat_between_vectors

Compute the quaternion that rotates vector a to vector b.

Transformations#

transform

alias of transformf

transformd

Rigid-body transformation (position + quaternion) with float64 components.

transformf

Rigid-body transformation (position + quaternion) with float32 components.

transformh

Rigid-body transformation (position + quaternion) with float16 components.

transform_expand

Expand a flat 7-element sequence into a warp.transformf (position + quaternion).

Spatial Vectors and Matrices#

spatial_matrix

alias of spatial_matrixf

spatial_matrixd

6x6 spatial matrix with float64 components for rigid-body dynamics.

spatial_matrixf

6x6 spatial matrix with float32 components for rigid-body dynamics.

spatial_matrixh

6x6 spatial matrix with float16 components for rigid-body dynamics.

spatial_vector

alias of spatial_vectorf

spatial_vectord

6D spatial vector with float64 components for rigid-body dynamics.

spatial_vectorf

6D spatial vector with float32 components for rigid-body dynamics.

spatial_vectorh

6D spatial vector with float16 components for rigid-body dynamics.

Arrays#

array

A fixed-size multi-dimensional array containing values of the same type.

fixedarray

A fixed-size, stack-allocated array containing values of the same type.

tile

A Warp tile object.

array1d

Create or annotate a 1-dimensional warp.array.

array2d

Create or annotate a 2-dimensional warp.array.

array3d

Create or annotate a 3-dimensional warp.array.

array4d

Create or annotate a 4-dimensional warp.array.

clone

Clone an existing array, allocating a copy of the src memory.

copy

Copy array contents from src to dest.

empty

Return an uninitialized array.

empty_like

Return an uninitialized array with the same type and dimensions as another array.

from_ptr

Create an array from a raw memory pointer (deprecated).

full

Return an array with all elements initialized to the given value.

full_like

Return an array with the same type and dimensions as another array, with all elements initialized to the given value.

ones

Return a one-initialized array.

ones_like

Return a one-initialized array with the same type and dimensions as another array.

zeros

Return a zero-initialized array.

zeros_like

Return a zero-initialized array with the same type and dimensions as another array.

Indexed Arrays#

indexedarray

Array providing indexed access to a subset of elements in a source warp.array.

indexedarray1d

Create or annotate a 1-dimensional warp.indexedarray.

indexedarray2d

Create or annotate a 2-dimensional warp.indexedarray.

indexedarray3d

Create or annotate a 3-dimensional warp.indexedarray.

indexedarray4d

Create or annotate a 4-dimensional warp.indexedarray.

Spatial Acceleration#

Bvh

Bounding Volume Hierarchy (BVH) for accelerated spatial queries.

BvhQuery

Object used to track state during BVH traversal.

BvhQueryTiled

Object used to track state during thread-block parallel BVH traversal.

HashGrid

Hash-based spatial grid for accelerated neighbor queries on point data.

HashGridQuery

Object used to track state during neighbor traversal.

Mesh

Triangle mesh for collision detection, ray casting, and spatial queries.

MeshQueryAABB

Object used to track state during mesh traversal.

MeshQueryAABBTiled

Object used to track state during thread-block parallel mesh traversal.

MeshQueryPoint

Output for the mesh query point functions.

MeshQueryRay

Output for the mesh query ray functions.

Volume

Sparse volumetric data structure based on NanoVDB for efficient 3D sampling.

Runtime#

clear_kernel_cache

Clear the kernel cache directory of previously generated source code and compiler artifacts.

clear_lto_cache

Clear the LTO cache directory of previously generated LTO code.

init

Initialize the Warp runtime.

is_cpu_available

Check whether CPU execution is available.

is_cuda_available

Check whether CUDA execution is available.

Kernel Programming#

WarpCodegenAttributeError

Attribute error during Warp kernel code generation.

WarpCodegenError

General error during Warp kernel code generation.

WarpCodegenIndexError

Index error during Warp kernel code generation.

WarpCodegenKeyError

Key error during Warp kernel code generation.

WarpCodegenTypeError

Type error during Warp kernel code generation.

WarpCodegenValueError

Value error during Warp kernel code generation.

func

Decorator to define a Warp function callable from kernels and other Warp functions.

func_grad

Decorator to register a custom gradient function for a given forward function.

func_native

Decorator to register a native code snippet.

func_replay

Decorator to register a custom replay function for a given forward function.

grad

Return a callable that computes the gradient of the given function.

kernel

Decorator to register a Warp kernel from a Python function.

map

Map a function over the elements of one or more arrays.

overload

Overload a generic kernel with the given argument types.

static

Evaluate a static expression and replace the expression with its result.

struct

Decorator to define a Warp struct for use in kernels and functions.

Kernel Execution#

Function

Represents a Warp function decorated with @wp.func.

Kernel

Warp kernel object, typically created by decorating a Python function with @wp.kernel.

Launch

Represent all data required for a kernel launch so that launches can be replayed quickly.

Module

Warp module containing kernels and functions to be compiled.

launch

Launch a Warp kernel on the target device.

launch_tiled

A helper method for launching a grid with an extra trailing dimension equal to the block size.

synchronize

Manually synchronize the calling CPU thread with any outstanding CUDA work on all devices.

Automatic Differentiation#

Tape

Record kernel launches within a Tape scope to enable automatic differentiation.

Device Management#

Device

A device to allocate Warp arrays and to launch kernels on.

ScopedDevice

A context manager to temporarily change the current default device.

get_cuda_device

Return the CUDA device with the given ordinal or the current CUDA device if ordinal is None.

get_cuda_device_count

Return the number of CUDA devices supported in this environment.

get_cuda_devices

Return a list of CUDA devices supported in this environment.

get_cuda_supported_archs

Return a sorted list of CUDA compute architectures that can be used as compilation targets.

get_device

Return the device identified by the argument.

get_devices

Return a list of devices supported in this environment.

get_preferred_device

Return the preferred compute device, cuda:0 if available and cpu otherwise.

is_device_available

Check whether a device is available in the current environment.

map_cuda_device

Assign a device alias to a CUDA context.

set_device

Set the default device identified by the argument.

synchronize_device

Synchronize the calling CPU thread with any outstanding CUDA work on the specified device.

unmap_cuda_device

Remove a CUDA device with the given alias.

Module Management#

compile_aot_module

Compile a module (ahead of time) for a given device.

force_load

Force user-defined kernels to be compiled and loaded (low-level API).

get_module

Return or create the Warp module associated with a given name.

get_module_options

Return the options for the current module.

load_aot_module

Load a previously compiled module (ahead of time).

load_module

Force a user-defined module to be compiled and loaded.

set_module_options

Set options for the current module.

CUDA Stream Management#

ScopedStream

A context manager to temporarily change the current stream on a device.

Stream

CUDA stream wrapper for managing asynchronous GPU operations.

get_stream

Return the stream currently used by the given device.

set_stream

Convenience function for calling Device.set_stream() on the given device.

synchronize_stream

Synchronize the calling CPU thread with any outstanding CUDA work on the specified stream.

wait_stream

Convenience function for calling Stream.wait_stream() on the current stream.

CUDA Event Management#

Event

A CUDA event that can be recorded onto a stream.

get_event_elapsed_time

Get the elapsed time between two recorded events.

record_event

Convenience function for calling Stream.record_event() on the current stream.

synchronize_event

Synchronize the calling CPU thread with an event recorded on a CUDA stream.

wait_event

Convenience function for calling Stream.wait_event() on the current stream.

CUDA Memory Management#

ScopedMempool

Context manager to temporarily enable or disable memory pool allocators.

ScopedMempoolAccess

Context manager to temporarily enable or disable mempool access between devices.

ScopedPeerAccess

Context manager to temporarily enable or disable peer access between CUDA devices.

get_mempool_release_threshold

Get the CUDA memory pool release threshold on the device.

get_mempool_used_mem_current

Get the amount of memory from the device's memory pool that is currently in use by the application.

get_mempool_used_mem_high

Get the application's memory usage high-water mark from the device's CUDA memory pool.

is_mempool_access_enabled

Check if peer_device can currently access the memory pool of target_device.

is_mempool_access_supported

Check if peer_device can directly access the memory pool of target_device.

is_mempool_enabled

Check if CUDA memory pool allocators are enabled on the device.

is_mempool_supported

Check if CUDA memory pool allocators are available on the device.

is_peer_access_enabled

Check if peer_device can currently access the memory of target_device.

is_peer_access_supported

Check if peer_device can directly access the memory of target_device on this system.

set_mempool_access_enabled

Enable or disable access from peer_device to the memory pool of target_device.

set_mempool_enabled

Enable or disable CUDA memory pool allocators on the device.

set_mempool_release_threshold

Set the CUDA memory pool release threshold on the device.

set_peer_access_enabled

Enable or disable direct access from peer_device to the memory of target_device.

CUDA Graph Management#

ScopedCapture

Context manager to capture a sequence of operations into a CUDA graph.

capture_begin

Begin capture of a CUDA graph.

capture_debug_dot_print

Export a CUDA graph to a DOT file for visualization.

capture_end

End the capture of a CUDA graph.

capture_if

Create a dynamic branch based on a condition.

capture_launch

Launch a previously captured CUDA graph.

capture_while

Create a dynamic loop based on a condition.

is_conditional_graph_supported

Check whether conditional graph nodes are supported.

CUDA Interprocess Communication#

event_from_ipc_handle

Create an event from an IPC handle.

from_ipc_handle

Create an array from an IPC handle.

Profiling#

ScopedTimer

Context manager for timing code blocks.

TimingResult

Timing result for a single activity.

timing_begin

Begin detailed activity timing.

timing_end

End detailed activity timing.

timing_print

Print timing results.

Timing Flags#

TIMING_ALL

Timing flag to capture all CUDA activities.

TIMING_GRAPH

Timing flag for CUDA graph launches.

TIMING_KERNEL

Timing flag for user-defined kernel launches.

TIMING_KERNEL_BUILTIN

Timing flag for built-in kernel operations (e.g., array fill, copy).

TIMING_MEMCPY

Timing flag for memory copy operations.

TIMING_MEMSET

Timing flag for memory set operations.

NumPy Interop#

dtype_from_numpy

Return the Warp dtype corresponding to a NumPy dtype.

dtype_to_numpy

Return the NumPy dtype corresponding to a Warp dtype.

from_numpy

Return a Warp array created from a NumPy array.

DLPack Interop#

from_dlpack

Convert a source array or DLPack capsule into a Warp array without copying.

to_dlpack

Convert a Warp array to another type of DLPack-compatible array.

JAX Interop#

device_from_jax

Return the Warp device corresponding to a JAX device.

device_to_jax

Return the JAX device corresponding to a Warp device.

dtype_from_jax

Return the Warp dtype corresponding to a JAX dtype.

dtype_to_jax

Return the JAX dtype corresponding to a Warp dtype.

from_jax

Convert a JAX array to a Warp array without copying the data.

to_jax

Convert a Warp array to a JAX array without copying the data.

PyTorch Interop#

device_from_torch

Return the Warp device corresponding to a Torch device.

device_to_torch

Return the Torch device string corresponding to a Warp device.

dtype_from_torch

Return the Warp dtype corresponding to a Torch dtype.

dtype_to_torch

Return the Torch dtype corresponding to a Warp dtype.

from_torch

Convert a Torch tensor to a Warp array without copying the data.

stream_from_torch

Convert from a Torch CUDA stream to a Warp CUDA stream.

stream_to_torch

Convert from a Warp CUDA stream to a Torch CUDA stream.

to_torch

Convert a Warp array to a Torch tensor without copying the data.

Omniverse Runtime Fabric Interop#

fabricarray

Array type for accessing data stored in Omniverse Runtime Fabric.

indexedfabricarray

Indexed view into a fabricarray.

fabricarrayarray

Create a Fabric array of arrays (2D fabric array).

indexedfabricarrayarray

Create an indexed Fabric array of arrays (2D indexed fabric array).

Paddle Interop#

device_from_paddle

Return the Warp device corresponding to a Paddle device.

device_to_paddle

Return the Paddle device string corresponding to a Warp device.

dtype_from_paddle

Return the Warp dtype corresponding to a Paddle dtype.

dtype_to_paddle

Return the Paddle dtype corresponding to a Warp dtype.

from_paddle

Convert a Paddle tensor to a Warp array without copying the data.

stream_from_paddle

Convert from a Paddle CUDA stream to a Warp CUDA stream.

to_paddle

Convert a Warp array to a Paddle tensor without copying the data.

Constants#

constant

Declare a compile-time constant accessible from Warp kernels.

E

Euler's number e (approximately 2.718).

HALF_PI

Half of pi (approximately 1.571).

INF

Positive infinity.

LN2

Natural logarithm of 2 (approximately 0.693).

LN10

Natural logarithm of 10 (approximately 2.303).

LOG10E

Base-10 logarithm of e (approximately 0.434).

LOG2E

Base-2 logarithm of e (approximately 1.443).

NAN

Not a Number (NaN).

PHI

Golden ratio (approximately 1.618).

PI

Pi (approximately 3.14159).

TAU

Tau, the circle constant equal to 2*pi (approximately 6.283).

e

Euler's number e (approximately 2.718).

half_pi

Half of pi (approximately 1.571).

inf

Positive infinity.

ln2

Natural logarithm of 2 (approximately 0.693).

ln10

Natural logarithm of 10 (approximately 2.303).

log10e

Base-10 logarithm of e (approximately 0.434).

log2e

Base-2 logarithm of e (approximately 1.443).

nan

Not a Number (NaN).

phi

Golden ratio (approximately 1.618).

pi

Pi (approximately 3.14159).

tau

Tau, the circle constant equal to 2*pi (approximately 6.283).

Misc#

MarchingCubes

A reusable context for marching cubes surface extraction.

RegisteredGLBuffer

Helper class to register a GL buffer with CUDA so that it can be mapped to a Warp array.