warp#

The warp package provides array types and functions for creating and manipulating multi-dimensional data on CPU and CUDA devices. It includes kernel and function decorators (kernel(), func()) for defining parallel code, along with a comprehensive set of built-in types and functions for use within kernels (see Built-Ins).

The package provides device management, kernel launch and synchronization functions, automatic differentiation via Tape recording, type introspection and construction utilities, and module compilation and caching.

Additional functionality is available in optional submodules that must be explicitly imported, such as warp.render for visualization, warp.fem for finite element methods, and warp.sparse for sparse linear algebra.
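
A minimal sketch of the core workflow, defining a kernel with kernel() and launching it with launch(); the kernel, sizes, and device selection below are illustrative:

    import warp as wp

    # Optional functionality requires an explicit import, e.g.:
    # import warp.fem
    # import warp.sparse


    @wp.kernel
    def saxpy(x: wp.array(dtype=float), y: wp.array(dtype=float), a: float):
        tid = wp.tid()                 # index of the current thread
        y[tid] = a * x[tid] + y[tid]


    device = "cuda:0" if wp.is_cuda_available() else "cpu"

    n = 1024
    x = wp.full(n, 1.0, dtype=float, device=device)
    y = wp.zeros(n, dtype=float, device=device)

    wp.launch(saxpy, dim=n, inputs=[x, y, 2.0], device=device)
    print(y.numpy()[:4])               # expect [2. 2. 2. 2.]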

Submodules#

These modules are automatically available when you import warp.

Additional Submodules#

These modules must be explicitly imported (e.g., import warp.autograd).

Type Annotations#

DeviceLike

alias of Device | str | None

Float

Type variable.

Int

Type variable.

Scalar

Type variable.

Data Types#

Scalars#

Vectors#

Matrices#

Quaternions#

quat

alias of quatf

quatd

quatf

quath

quat_between_vectors

Compute the quaternion that rotates vector a to vector b.
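
A sketch of how these quaternion built-ins might be used inside a kernel (the kernel name and data are illustrative):

    import warp as wp


    @wp.kernel
    def align(src: wp.array(dtype=wp.vec3), dst: wp.array(dtype=wp.vec3),
              out: wp.array(dtype=wp.vec3)):
        tid = wp.tid()
        q = wp.quat_between_vectors(src[tid], dst[tid])  # rotation taking src onto dst
        out[tid] = wp.quat_rotate(q, src[tid])           # apply it to the source vector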

Transformations#

Spatial Vectors and Matrices#

Arrays#

array

A fixed-size multi-dimensional array containing values of the same type.

fixedarray

A fixed-size, stack-allocated array containing values of the same type.

tile

A Warp tile object.

array1d

array2d

array3d

array4d

clone

Clone an existing array, allocating a copy of the src memory.

copy

Copy array contents from src to dest.

empty

Return an uninitialized array.

empty_like

Return an uninitialized array with the same type and dimensions as another array.

from_ptr

full

Return an array with all elements initialized to the given value.

full_like

Return an array with the same type and dimensions as another array, with all elements initialized to the given value.

ones

Return a one-initialized array.

ones_like

Return a one-initialized array with the same type and dimensions as another array.

zeros

Return a zero-initialized array.

zeros_like

Return a zero-initialized array with the same type and dimensions as another array.
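
A few illustrative allocations (shapes, dtypes, and device names are placeholders):

    import warp as wp

    a = wp.zeros(1024, dtype=wp.vec3, device="cuda:0")     # zero-initialized
    b = wp.full((16, 16), 7.0, dtype=float, device="cpu")  # constant-filled 2D array
    c = wp.empty_like(a)                                    # uninitialized, same shape/dtype/device as a
    d = wp.clone(a)                                         # new allocation holding a copy of a
    wp.copy(c, a)                                           # copy the contents of a into c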

Indexed Arrays#

Spatial Acceleration#

Bvh

BvhQuery

Object used to track state during BVH traversal.

BvhQueryTiled

Object used to track state during thread-block parallel BVH traversal.

HashGrid

HashGridQuery

Object used to track state during neighbor traversal.

Mesh

MeshQueryAABB

Object used to track state during mesh traversal.

MeshQueryAABBTiled

Object used to track state during thread-block parallel mesh traversal.

MeshQueryPoint

Output for the mesh query point functions.

MeshQueryRay

Output for the mesh query ray functions.

Volume
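
For example, a HashGrid built on the host can be queried from within a kernel through its id; the point cloud, radius, and grid resolution below are illustrative:

    import numpy as np

    import warp as wp


    @wp.kernel
    def count_neighbors(grid: wp.uint64, points: wp.array(dtype=wp.vec3),
                        radius: float, counts: wp.array(dtype=int)):
        tid = wp.tid()
        x = points[tid]

        n = int(0)
        query = wp.hash_grid_query(grid, x, radius)
        index = int(0)
        while wp.hash_grid_query_next(query, index):
            if wp.length(points[index] - x) < radius:
                n += 1
        counts[tid] = n


    device = "cuda:0"
    points = wp.array(np.random.rand(1024, 3).astype(np.float32), dtype=wp.vec3, device=device)

    grid = wp.HashGrid(dim_x=128, dim_y=128, dim_z=128, device=device)
    grid.build(points=points, radius=0.1)

    counts = wp.zeros(len(points), dtype=int, device=device)
    wp.launch(count_neighbors, dim=len(points), inputs=[grid.id, points, 0.1, counts], device=device)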

Runtime#

clear_kernel_cache

Clear the kernel cache directory of previously generated source code and compiler artifacts.

clear_lto_cache

Clear the LTO cache directory of previously generated LTO code.

init

Initialize the Warp runtime.

is_cpu_available

is_cuda_available
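
A short sketch of explicit initialization and cache maintenance:

    import warp as wp

    wp.init()  # initialize the Warp runtime explicitly

    print(wp.is_cpu_available(), wp.is_cuda_available())

    # remove previously generated kernel binaries and compiler artifacts
    wp.clear_kernel_cache()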

Kernel Programming#

WarpCodegenAttributeError

WarpCodegenError

WarpCodegenIndexError

WarpCodegenKeyError

WarpCodegenTypeError

WarpCodegenValueError

func

func_grad

Decorator to register a custom gradient function for a given forward function.

func_native

Decorator to register a native code snippet.

func_replay

Decorator to register a custom replay function for a given forward function.

grad

Return a callable that computes the gradient of the given function.

kernel

Decorator to register a Warp kernel from a Python function.

map

Map a function over the elements of one or more arrays.

overload

Overload a generic kernel with the given argument types.

static

Evaluates a static expression and replaces the expression with its result.

struct
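
For example, user functions and structs usable inside kernels can be declared as follows (the types and names are illustrative):

    import warp as wp


    @wp.struct
    class Particle:
        pos: wp.vec3
        vel: wp.vec3
        mass: float


    @wp.func
    def lerp3(a: wp.vec3, b: wp.vec3, t: float) -> wp.vec3:
        return a * (1.0 - t) + b * t


    @wp.kernel
    def integrate(particles: wp.array(dtype=Particle), dt: float):
        tid = wp.tid()
        p = particles[tid]
        # @wp.func functions can be called from kernels like built-ins
        p.pos = lerp3(p.pos, p.pos + p.vel, dt)
        particles[tid] = p


    particles = wp.zeros(128, dtype=Particle)
    wp.launch(integrate, dim=128, inputs=[particles, 0.1])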

Kernel Execution#

Function

Kernel

Launch

Represents all data required for a kernel launch so that launches can be replayed quickly.

Module

launch

Launch a Warp kernel on the target device.

launch_tiled

A helper method for launching a grid with an extra trailing dimension equal to the block size.

synchronize

Manually synchronize the calling CPU thread with any outstanding CUDA work on all devices.

Automatic Differentiation#

Tape

Record kernel launches within a Tape scope to enable automatic differentiation.
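
A minimal differentiation sketch; the loss kernel and sizes are illustrative:

    import warp as wp


    @wp.kernel
    def loss_kernel(x: wp.array(dtype=float), loss: wp.array(dtype=float)):
        tid = wp.tid()
        wp.atomic_add(loss, 0, x[tid] * x[tid])   # loss = sum(x^2)


    n = 64
    x = wp.full(n, 2.0, dtype=float, requires_grad=True)
    loss = wp.zeros(1, dtype=float, requires_grad=True)

    tape = wp.Tape()
    with tape:
        wp.launch(loss_kernel, dim=n, inputs=[x, loss])

    tape.backward(loss=loss)      # propagate adjoints back through the recorded launches
    print(x.grad.numpy()[:4])     # d(loss)/dx = 2*x, so expect 4.0
    tape.zero()                   # clear gradients before the next backward pass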

Device Management#

Device

A device to allocate Warp arrays and to launch kernels on.

ScopedDevice

A context manager to temporarily change the current default device.

get_cuda_device

Returns the CUDA device with the given ordinal or the current CUDA device if ordinal is None.

get_cuda_device_count

Returns the number of CUDA devices supported in this environment.

get_cuda_devices

Returns a list of CUDA devices supported in this environment.

get_cuda_supported_archs

Return a sorted list of CUDA compute architectures that can be used as compilation targets.

get_device

Returns the device identified by the argument.

get_devices

Returns a list of devices supported in this environment.

get_preferred_device

Returns the preferred compute device, cuda:0 if available and cpu otherwise.

is_device_available

map_cuda_device

Assign a device alias to a CUDA context.

set_device

Sets the default device identified by the argument.

synchronize_device

Synchronize the calling CPU thread with any outstanding CUDA work on the specified device.

unmap_cuda_device

Remove a CUDA device with the given alias.
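
A sketch of common device-management calls (device names are illustrative and assume a CUDA-capable machine):

    import warp as wp

    print(wp.get_device())               # current default device
    print(wp.get_cuda_device_count())    # number of visible CUDA devices

    # temporarily change the default device for allocations and launches
    with wp.ScopedDevice("cuda:0"):
        a = wp.zeros(1024, dtype=float)  # allocated on cuda:0

    wp.set_device("cpu")                 # change the default device globally
    wp.synchronize_device("cuda:0")      # wait for outstanding work on cuda:0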

Module Management#

compile_aot_module

Compile a module ahead of time (AOT) for a given device.

force_load

Force user-defined kernels to be compiled and loaded (low-level API).

get_module

get_module_options

Returns the options for the current module.

load_aot_module

Load a module that was previously compiled ahead of time.

load_module

Force a user-defined module to be compiled and loaded.

set_module_options

Set options for the current module.
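
For example, per-module compilation options can be set before kernels are loaded (the option shown is illustrative):

    import warp as wp

    # disable adjoint (backward) code generation for kernels defined in this module
    wp.set_module_options({"enable_backward": False})
    print(wp.get_module_options())

    # eagerly compile and load everything defined so far
    wp.force_load()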

CUDA Stream Management#

ScopedStream

A context manager to temporarily change the current stream on a device.

Stream

get_stream

Return the stream currently used by the given device.

set_stream

Convenience function for calling Device.set_stream() on the given device.

synchronize_stream

Synchronize the calling CPU thread with any outstanding CUDA work on the specified stream.

wait_stream

Convenience function for calling Stream.wait_stream() on the current stream.
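
A sketch of issuing work on a non-default stream (kernel, size, and device are illustrative):

    import warp as wp


    @wp.kernel
    def fill(a: wp.array(dtype=float), value: float):
        a[wp.tid()] = value


    device = "cuda:0"
    stream = wp.Stream(device)

    # work issued inside the scope uses `stream` instead of the device's default stream
    with wp.ScopedStream(stream):
        a = wp.zeros(1_000_000, dtype=float, device=device)
        wp.launch(fill, dim=a.shape[0], inputs=[a, 1.0], device=device)

    wp.synchronize_stream(stream)  # block the calling thread until the stream is idle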

CUDA Event Management#

Event

A CUDA event that can be recorded onto a stream.

get_event_elapsed_time

Get the elapsed time between two recorded events.

record_event

Convenience function for calling Stream.record_event() on the current stream.

synchronize_event

Synchronize the calling CPU thread with an event recorded on a CUDA stream.

wait_event

Convenience function for calling Stream.wait_event() on the current stream.
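
For example, events created with timing enabled can measure elapsed GPU time between two points on a stream (kernel and sizes are illustrative):

    import warp as wp


    @wp.kernel
    def fill(a: wp.array(dtype=float), value: float):
        a[wp.tid()] = value


    with wp.ScopedDevice("cuda:0"):
        a = wp.zeros(1_000_000, dtype=float)

        start = wp.Event(enable_timing=True)
        end = wp.Event(enable_timing=True)

        wp.record_event(start)             # record on the device's current stream
        wp.launch(fill, dim=a.shape[0], inputs=[a, 1.0])
        wp.record_event(end)

        wp.synchronize_event(end)          # wait until `end` has been reached
        print(wp.get_event_elapsed_time(start, end), "ms")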

CUDA Memory Management#

ScopedMempool

ScopedMempoolAccess

ScopedPeerAccess

get_mempool_release_threshold

Get the CUDA memory pool release threshold on the device.

get_mempool_used_mem_current

Get the amount of memory from the device's memory pool that is currently in use by the application.

get_mempool_used_mem_high

Get the application's memory usage high-water mark from the device's CUDA memory pool.

is_mempool_access_enabled

Check if peer_device can currently access the memory pool of target_device.

is_mempool_access_supported

Check if peer_device can directly access the memory pool of target_device.

is_mempool_enabled

Check if CUDA memory pool allocators are enabled on the device.

is_mempool_supported

Check if CUDA memory pool allocators are available on the device.

is_peer_access_enabled

Check if peer_device can currently access the memory of target_device.

is_peer_access_supported

Check if peer_device can directly access the memory of target_device on this system.

set_mempool_access_enabled

Enable or disable access from peer_device to the memory pool of target_device.

set_mempool_enabled

Enable or disable CUDA memory pool allocators on the device.

set_mempool_release_threshold

Set the CUDA memory pool release threshold on the device.

set_peer_access_enabled

Enable or disable direct access from peer_device to the memory of target_device.
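
A sketch of enabling a memory pool and raising its release threshold (the 1 GiB value is illustrative):

    import warp as wp

    device = "cuda:0"

    if wp.is_mempool_supported(device):
        wp.set_mempool_enabled(device, True)
        # keep up to 1 GiB of freed memory cached in the pool between allocations
        wp.set_mempool_release_threshold(device, 1024 * 1024 * 1024)
        print(wp.get_mempool_used_mem_current(device))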

CUDA Graph Management#

ScopedCapture

capture_begin

Begin capture of a CUDA graph.

capture_debug_dot_print

Export a CUDA graph to a DOT file for visualization.

capture_end

End the capture of a CUDA graph.

capture_if

Create a dynamic branch based on a condition.

capture_launch

Launch a previously captured CUDA graph.

capture_while

Create a dynamic loop based on a condition.

is_conditional_graph_supported
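
A sketch of capturing a sequence of launches into a graph and replaying it (kernel and sizes are illustrative):

    import warp as wp


    @wp.kernel
    def scale(a: wp.array(dtype=float), s: float):
        tid = wp.tid()
        a[tid] = a[tid] * s


    with wp.ScopedDevice("cuda:0"):       # graph capture requires a CUDA device
        a = wp.ones(1024, dtype=float)    # allocate before capture begins

        # record the launches into a CUDA graph instead of executing them immediately
        with wp.ScopedCapture() as capture:
            for _ in range(10):
                wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0])

        wp.capture_launch(capture.graph)  # replay the captured graph
        wp.capture_launch(capture.graph)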

CUDA Interprocess Communication#

event_from_ipc_handle

Create an event from an IPC handle.

from_ipc_handle

Create an array from an IPC handle.

Profiling#

ScopedTimer

TimingResult

Timing result for a single activity.

timing_begin

Begin detailed activity timing.

timing_end

End detailed activity timing.

timing_print

Print timing results.
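
For example, ScopedTimer reports the time spent in the code it wraps (synchronize=True includes the GPU work rather than just the launch overhead):

    import warp as wp


    @wp.kernel
    def scale(a: wp.array(dtype=float), s: float):
        tid = wp.tid()
        a[tid] = a[tid] * s


    a = wp.ones(1_000_000, dtype=float, device="cuda:0")

    with wp.ScopedTimer("scale", synchronize=True):
        wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0], device="cuda:0")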

Timing Flags#

NumPy Interop#

dtype_from_numpy

Return the Warp dtype corresponding to a NumPy dtype.

dtype_to_numpy

Return the NumPy dtype corresponding to a Warp dtype.

from_numpy

Returns a Warp array created from a NumPy array.
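
A short sketch (shapes and device are illustrative):

    import numpy as np

    import warp as wp

    np_points = np.random.rand(128, 3).astype(np.float32)

    # interpret the (128, 3) float32 array as 128 vec3 elements
    points = wp.from_numpy(np_points, dtype=wp.vec3, device="cuda:0")

    print(wp.dtype_from_numpy(np.float32))   # corresponding Warp scalar type
    print(points.numpy().shape)              # copy back to the host as a NumPy array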

DLPack Interop#

from_dlpack

Convert a source array or DLPack capsule into a Warp array without copying.

to_dlpack

Convert a Warp array to a DLPack capsule that other DLPack-compatible frameworks can consume without copying.

JAX Interop#

device_from_jax

Return the Warp device corresponding to a JAX device.

device_to_jax

Return the JAX device corresponding to a Warp device.

dtype_from_jax

Return the Warp dtype corresponding to a JAX dtype.

dtype_to_jax

Return the JAX dtype corresponding to a Warp dtype.

from_jax

Convert a JAX array to a Warp array without copying the data.

to_jax

Convert a Warp array to a JAX array without copying the data.
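
A short interop sketch (assumes JAX and Warp target the same device):

    import jax
    import jax.numpy as jnp

    import warp as wp

    j = jnp.ones(1024, dtype=jnp.float32)

    w = wp.from_jax(j)    # share the JAX array's memory with Warp
    j2 = wp.to_jax(w)     # and expose a Warp array back to JAX

    print(wp.device_from_jax(jax.devices()[0]))
    print(wp.dtype_from_jax(jnp.float32))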

PyTorch Interop#

device_from_torch

Return the Warp device corresponding to a Torch device.

device_to_torch

Return the Torch device string corresponding to a Warp device.

dtype_from_torch

Return the Warp dtype corresponding to a Torch dtype.

dtype_to_torch

Return the Torch dtype corresponding to a Warp dtype.

from_torch

Convert a Torch tensor to a Warp array without copying the data.

stream_from_torch

Convert from a Torch CUDA stream to a Warp CUDA stream.

stream_to_torch

Convert from a Warp CUDA stream to a Torch CUDA stream.

to_torch

Convert a Warp array to a Torch tensor without copying the data.
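
A short interop sketch (tensor shape and device are illustrative):

    import torch

    import warp as wp

    t = torch.ones(1024, 3, dtype=torch.float32, device="cuda:0")

    a = wp.from_torch(t, dtype=wp.vec3)   # share memory, interpreting rows as vec3
    t2 = wp.to_torch(a)                   # back to a tensor, still without a copy

    print(wp.device_to_torch(wp.get_device("cuda:0")))
    print(wp.dtype_from_torch(torch.float32))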

Omniverse Runtime Fabric Interop#

Paddle Interop#

device_from_paddle

Return the Warp device corresponding to a Paddle device.

device_to_paddle

Return the Paddle device string corresponding to a Warp device.

dtype_from_paddle

Return the Warp dtype corresponding to a Paddle dtype.

dtype_to_paddle

Return the Paddle dtype corresponding to a Warp dtype.

from_paddle

Convert a Paddle tensor to a Warp array without copying the data.

stream_from_paddle

Convert from a Paddle CUDA stream to a Warp CUDA stream.

to_paddle

Convert a Warp array to a Paddle tensor without copying the data.

Constants#

constant

Function to declare compile-time constants accessible from Warp kernels.

E

HALF_PI

INF

LN2

LN10

LOG10E

LOG2E

NAN

PHI

PI

TAU

e

half_pi

inf

ln2

ln10

log10e

log2e

nan

phi

pi

tau
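
For example, user constants declared with constant() are folded into the generated kernel code, and the math constants above can be used from both Python and kernels (the kernel is illustrative):

    import warp as wp

    GRAVITY = wp.constant(wp.vec3(0.0, -9.8, 0.0))   # baked into the generated kernel code


    @wp.kernel
    def apply_gravity(vel: wp.array(dtype=wp.vec3), dt: float):
        tid = wp.tid()
        vel[tid] = vel[tid] + GRAVITY * dt


    vel = wp.zeros(256, dtype=wp.vec3)
    wp.launch(apply_gravity, dim=256, inputs=[vel, 1.0 / 60.0])

    print(wp.pi, wp.tau, wp.e)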

Misc#

MarchingCubes

A reusable context for marching cubes surface extraction.

RegisteredGLBuffer

Helper class to register a GL buffer with CUDA so that it can be mapped to a Warp array.