warp#
The warp package provides array types and functions for creating and manipulating
multi-dimensional data on CPU and CUDA devices. It includes kernel and function decorators
(kernel(), func()) for defining parallel code, along with a comprehensive set
of built-in types and functions for use within kernels (see Built-Ins).
The package provides device management, kernel launch and synchronization functions, automatic
differentiation via Tape recording, type introspection and construction utilities, and
module compilation and caching.
Additional functionality is available in optional submodules that must be explicitly
imported, such as warp.render for visualization, warp.fem for finite
element methods, and warp.sparse for sparse linear algebra.
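A minimal sketch of this workflow, defining a kernel with @wp.kernel and launching it with wp.launch(); the array contents and the kernel itself are illustrative:

```python
import numpy as np
import warp as wp

@wp.kernel
def scale(a: wp.array(dtype=float), s: float):
    i = wp.tid()          # index of the current thread
    a[i] = a[i] * s

a = wp.array(np.arange(8, dtype=np.float32))   # allocated on the default device
wp.launch(scale, dim=a.shape[0], inputs=[a, 2.0])
print(a.numpy())
```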
Submodules#
These modules are automatically available when you import warp.
Additional Submodules#
These modules must be explicitly imported (e.g., import warp.autograd).
Type Annotations#
Data Types#
Scalars#
Vectors#
Matrices#
Quaternions#
Transformations#
Spatial Vectors and Matrices#
Arrays#
- A fixed-size multi-dimensional array containing values of the same type.
- A fixed-size, stack-allocated array containing values of the same type.
- A Warp tile object.
- Clone an existing array, allocating a copy of the src memory.
- Copy array contents from src to dest.
- Return an uninitialized array.
- Return an uninitialized array with the same type and dimensions as another array.
- Return an array with all elements initialized to the given value.
- Return an array with all elements initialized to the given value, with the same type and dimensions as another array.
- Return a one-initialized array.
- Return a one-initialized array with the same type and dimensions as another array.
- Return a zero-initialized array.
- Return a zero-initialized array with the same type and dimensions as another array.
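A brief sketch of the array construction and copy utilities above; the shapes, values, and the "cpu" device are illustrative choices:

```python
import warp as wp

a = wp.zeros(shape=(4, 4), dtype=wp.float32, device="cpu")
b = wp.full(shape=(4, 4), value=2.0, dtype=wp.float32, device="cpu")
c = wp.empty_like(a)      # same shape, dtype, and device as a; contents uninitialized
d = wp.clone(a)           # new allocation holding a copy of a's contents
wp.copy(c, b)             # copy b into the existing allocation c
print(wp.ones_like(a).numpy())
```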
Indexed Arrays#
Spatial Acceleration#
- Object used to track state during BVH traversal.
- Object used to track state during thread-block parallel BVH traversal.
- Object used to track state during neighbor traversal.
- Object used to track state during mesh traversal.
- Object used to track state during thread-block parallel mesh traversal.
- Output for the mesh query point functions.
- Output for the mesh query ray functions.
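These query objects are returned by the corresponding built-ins inside kernels. A hedged sketch of a closest-point query against a wp.Mesh (the single-triangle mesh and the 1.0e6 maximum search distance are arbitrary illustrative values):

```python
import numpy as np
import warp as wp

@wp.kernel
def closest_point(mesh_id: wp.uint64,
                  points: wp.array(dtype=wp.vec3),
                  out: wp.array(dtype=wp.vec3)):
    tid = wp.tid()
    q = wp.mesh_query_point(mesh_id, points[tid], 1.0e6)   # returns a mesh query object
    if q.result:
        out[tid] = wp.mesh_eval_position(mesh_id, q.face, q.u, q.v)

verts = wp.array(np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=np.float32), dtype=wp.vec3)
faces = wp.array(np.array([0, 1, 2], dtype=np.int32), dtype=wp.int32)
mesh = wp.Mesh(points=verts, indices=faces)

pts = wp.array(np.array([[0.25, 0.25, 1.0]], dtype=np.float32), dtype=wp.vec3)
out = wp.empty_like(pts)
wp.launch(closest_point, dim=len(pts), inputs=[mesh.id, pts, out])
```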
Runtime#
- Clear the kernel cache directory of previously generated source code and compiler artifacts.
- Clear the LTO cache directory of previously generated LTO code.
- Initialize the Warp runtime.
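A short sketch; explicit initialization is optional since the runtime also initializes lazily, and clearing the cache simply forces recompilation on the next launch:

```python
import warp as wp

wp.init()                  # explicit initialization; otherwise it happens lazily on first use
wp.clear_kernel_cache()    # remove previously generated source and compiler artifacts
```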
Kernel Programming#
- Decorator to register a custom gradient function for a given forward function.
- Decorator to register a native code snippet (@func_native).
- Decorator to register a custom replay function for a given forward function.
- Return a callable that computes the gradient of the given function.
- Decorator to register a Warp kernel from a Python function.
- Map a function over the elements of one or more arrays.
- Overload a generic kernel with the given argument types.
- Evaluate a static expression and replace the expression with its result.
Kernel Execution#
- Represents all data required for a kernel launch so that launches can be replayed quickly.
- Launch a Warp kernel on the target device.
- A helper method for launching a grid with an extra trailing dimension equal to the block size.
- Manually synchronize the calling CPU thread with any outstanding CUDA work on all devices.
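A sketch of recording a launch for fast replay and synchronizing afterwards; it assumes a CUDA device named "cuda:0" and uses an illustrative saxpy kernel:

```python
import warp as wp

@wp.kernel
def saxpy(x: wp.array(dtype=float), y: wp.array(dtype=float), a: float):
    i = wp.tid()
    y[i] = a * x[i] + y[i]

x = wp.ones(1024, dtype=float, device="cuda:0")
y = wp.zeros_like(x)

# record_cmd=True returns a Launch object that can be replayed without re-packing arguments
cmd = wp.launch(saxpy, dim=len(x), inputs=[x, y, 2.0], device="cuda:0", record_cmd=True)
for _ in range(10):
    cmd.launch()

wp.synchronize()   # wait for all outstanding GPU work on all devices
```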
Automatic Differentiation#
Record kernel launches within a Tape scope to enable automatic differentiation. |
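A minimal autodiff sketch with wp.Tape; the loss kernel is illustrative:

```python
import warp as wp

@wp.kernel
def loss_kernel(x: wp.array(dtype=float), loss: wp.array(dtype=float)):
    i = wp.tid()
    wp.atomic_add(loss, 0, x[i] * x[i])

x = wp.array([1.0, 2.0, 3.0], dtype=float, requires_grad=True)
loss = wp.zeros(1, dtype=float, requires_grad=True)

tape = wp.Tape()
with tape:
    wp.launch(loss_kernel, dim=len(x), inputs=[x, loss])

tape.backward(loss)        # seeds d(loss) = 1 and replays the recorded launches in reverse
print(x.grad.numpy())      # gradient of sum(x*x) is 2*x
```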
Device Management#
- A device on which to allocate Warp arrays and launch kernels.
- A context manager to temporarily change the current default device.
- Return the CUDA device with the given ordinal, or the current CUDA device if ordinal is None.
- Return the number of CUDA devices supported in this environment.
- Return a list of CUDA devices supported in this environment.
- Return a sorted list of CUDA compute architectures that can be used as compilation targets.
- Return the device identified by the argument.
- Return a list of devices supported in this environment.
- Return the preferred compute device.
- Assign a device alias to a CUDA context.
- Set the default device identified by the argument.
- Synchronize the calling CPU thread with any outstanding CUDA work on the specified device.
- Remove a CUDA device with the given alias.
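A sketch of querying devices and temporarily scoping the default device; whether "cuda:0" exists depends on the machine:

```python
import warp as wp

print(wp.get_devices())             # all devices available in this environment
print(wp.get_preferred_device())

if wp.get_cuda_device_count() > 0:
    with wp.ScopedDevice("cuda:0"):     # temporarily change the default device
        a = wp.zeros(16, dtype=float)   # allocated on cuda:0
```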
Module Management#
- Compile a module (ahead of time) for a given device.
- Force user-defined kernels to be compiled and loaded (low-level API).
- Returns a list of options for the current module.
- Load a previously compiled module (ahead of time).
- Force a user-defined module to be compiled and loaded.
- Set options for the current module.
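A sketch of inspecting and adjusting options for the current module and loading it ahead of time; the "enable_backward" key is shown as an assumption about the available options:

```python
import warp as wp

wp.set_module_options({"enable_backward": False})   # e.g. skip adjoint code generation
print(wp.get_module_options())

wp.load_module()   # compile and load the current module now rather than at first launch
```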
CUDA Stream Management#
- A context manager to temporarily change the current stream on a device.
- Return the stream currently used by the given device.
- Convenience function for calling …
- Synchronize the calling CPU thread with any outstanding CUDA work on the specified stream.
- Convenience function for calling …
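A sketch of issuing work on an explicit stream; assumes a CUDA device named "cuda:0":

```python
import warp as wp

stream = wp.Stream("cuda:0")
a = wp.zeros(1 << 20, dtype=float, device="cuda:0")
b = wp.ones_like(a)

with wp.ScopedStream(stream):      # work issued in this scope goes onto `stream`
    wp.copy(a, b)

wp.synchronize_stream(stream)      # block the CPU until the stream's work is done
```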
CUDA Event Management#
- A CUDA event that can be recorded onto a stream.
- Get the elapsed time between two recorded events.
- Convenience function for calling …
- Synchronize the calling CPU thread with an event recorded on a CUDA stream.
- Convenience function for calling …
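A sketch of timing GPU work with events; events must be created with timing enabled, and a CUDA device is assumed:

```python
import warp as wp

with wp.ScopedDevice("cuda:0"):
    start = wp.Event(enable_timing=True)
    stop = wp.Event(enable_timing=True)

    a = wp.zeros(1 << 20, dtype=float)
    b = wp.ones_like(a)

    wp.record_event(start)               # record on the device's current stream
    wp.copy(a, b)
    wp.record_event(stop)
    wp.synchronize_event(stop)

    print(wp.get_event_elapsed_time(start, stop), "ms")
```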
CUDA Memory Management#
- Get the CUDA memory pool release threshold on the device.
- Get the amount of memory from the device's memory pool that is currently in use by the application.
- Get the application's memory usage high-water mark from the device's CUDA memory pool.
- Check if peer_device can currently access the memory pool of target_device.
- Check if peer_device can directly access the memory pool of target_device.
- Check if CUDA memory pool allocators are enabled on the device.
- Check if CUDA memory pool allocators are available on the device.
- Check if peer_device can currently access the memory of target_device.
- Check if peer_device can directly access the memory of target_device on this system.
- Enable or disable access from peer_device to the memory pool of target_device.
- Enable or disable CUDA memory pool allocators on the device.
- Set the CUDA memory pool release threshold on the device.
- Enable or disable direct access from peer_device to the memory of target_device.
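A sketch of configuring the CUDA memory pool allocator; the 1 GiB release threshold is an arbitrary illustrative value:

```python
import warp as wp

device = "cuda:0"
if wp.is_mempool_supported(device):
    wp.set_mempool_enabled(device, True)
    wp.set_mempool_release_threshold(device, 1024 * 1024 * 1024)  # keep up to 1 GiB cached
    print(wp.get_mempool_used_mem_current(device))
    print(wp.get_mempool_used_mem_high(device))
```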
CUDA Graph Management#
- Begin capture of a CUDA graph.
- Export a CUDA graph to a DOT file for visualization.
- End the capture of a CUDA graph.
- Create a dynamic branch based on a condition.
- Launch a previously captured CUDA graph.
- Create a dynamic loop based on a condition.
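A sketch of capturing a sequence of launches into a CUDA graph and replaying it; assumes a CUDA device and an illustrative kernel:

```python
import warp as wp

@wp.kernel
def inc(a: wp.array(dtype=float)):
    i = wp.tid()
    a[i] = a[i] + 1.0

device = "cuda:0"
a = wp.zeros(1024, dtype=float, device=device)

wp.capture_begin(device)                 # begin recording launches into a graph
for _ in range(10):
    wp.launch(inc, dim=len(a), inputs=[a], device=device)
graph = wp.capture_end(device)           # finish capture and obtain the graph

wp.capture_launch(graph)                 # replay the captured work
```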
CUDA Interprocess Communication#
- Create an event from an IPC handle.
- Create an array from an IPC handle.
Profiling#
- Timing result for a single activity.
- Begin detailed activity timing.
- End detailed activity timing.
- Print timing results.
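A sketch of the activity timing API; TIMING_ALL is one of the timing flags listed in the next section:

```python
import warp as wp

a = wp.zeros(1 << 20, dtype=float, device="cuda:0")
b = wp.ones_like(a)

wp.timing_begin(cuda_filter=wp.TIMING_ALL)   # start collecting per-activity CUDA timings
for _ in range(4):
    wp.copy(a, b)
results = wp.timing_end()                    # list of TimingResult objects
wp.timing_print(results)
```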
Timing Flags#
NumPy Interop#
- Return the Warp dtype corresponding to a NumPy dtype.
- Return the NumPy dtype corresponding to a Warp dtype.
- Returns a Warp array created from a NumPy array.
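A round-trip sketch of the NumPy helpers:

```python
import numpy as np
import warp as wp

n = np.arange(8, dtype=np.float32)
a = wp.from_numpy(n, device="cpu")        # Warp array created from the NumPy array
print(wp.dtype_from_numpy(np.float32))    # matching Warp dtype
print(wp.dtype_to_numpy(wp.float64))      # matching NumPy dtype
print(a.numpy())                          # and back to NumPy
```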
DLPack Interop#
- Convert a source array or DLPack capsule into a Warp array without copying.
- Convert a Warp array to another type of DLPack-compatible array.
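A round-trip sketch of the DLPack helpers; both arrays end up aliasing the same memory:

```python
import warp as wp

a = wp.zeros(16, dtype=wp.float32, device="cpu")
capsule = wp.to_dlpack(a)       # DLPack representation sharing a's memory, no copy
b = wp.from_dlpack(capsule)     # Warp array viewing the same memory
b.fill_(3.0)
print(a.numpy())                # all threes: a and b alias the same buffer
```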
JAX Interop#
- Return the Warp device corresponding to a JAX device.
- Return the JAX device corresponding to a Warp device.
- Return the Warp dtype corresponding to a JAX dtype.
- Return the JAX dtype corresponding to a Warp dtype.
- Convert a JAX array to a Warp array without copying the data.
- Convert a Warp array to a JAX array without copying the data.
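A sketch of the JAX helpers; zero-copy sharing assumes the arrays live on devices both libraries can address:

```python
import jax
import jax.numpy as jnp
import warp as wp

j = jnp.arange(8, dtype=jnp.float32)
w = wp.from_jax(j)                         # Warp view of the JAX array, no copy
j2 = wp.to_jax(w)                          # back to JAX, also without copying

print(wp.device_from_jax(jax.devices()[0]))
print(wp.dtype_from_jax(jnp.float32))
```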
PyTorch Interop#
- Return the Warp device corresponding to a Torch device.
- Return the Torch device string corresponding to a Warp device.
- Return the Warp dtype corresponding to a Torch dtype.
- Return the Torch dtype corresponding to a Warp dtype.
- Convert a Torch tensor to a Warp array without copying the data.
- Convert from a Torch CUDA stream to a Warp CUDA stream.
- Convert from a Warp CUDA stream to a Torch CUDA stream.
- Convert a Warp array to a Torch tensor without copying the data.
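A sketch of the PyTorch helpers; the CPU tensor is illustrative and the same calls work for CUDA tensors:

```python
import torch
import warp as wp

t = torch.arange(8, dtype=torch.float32)
w = wp.from_torch(t)                       # Warp array sharing the tensor's memory
w.fill_(1.0)
print(t)                                   # the tensor sees the write

t2 = wp.to_torch(w)                        # back to a tensor, still zero-copy
print(wp.device_from_torch(t.device), wp.dtype_from_torch(t.dtype))
```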
Omniverse Runtime Fabric Interop#
Paddle Interop#
- Return the Warp device corresponding to a Paddle device.
- Return the Paddle device string corresponding to a Warp device.
- Return the Warp dtype corresponding to a Paddle dtype.
- Return the Paddle dtype corresponding to a Warp dtype.
- Convert a Paddle tensor to a Warp array without copying the data.
- Convert from a Paddle CUDA stream to a Warp CUDA stream.
- Convert a Warp array to a Paddle tensor without copying the data.
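The Paddle helpers mirror the PyTorch ones; a brief, hedged sketch:

```python
import paddle
import warp as wp

t = paddle.arange(8, dtype="float32")
w = wp.from_paddle(t)          # Warp array sharing the tensor's memory
t2 = wp.to_paddle(w)           # back to a Paddle tensor, no copy
print(wp.dtype_from_paddle(t.dtype))
```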
Constants#
Misc#
- A reusable context for marching cubes surface extraction.
- Helper class to register a GL buffer with CUDA so that it can be mapped to a Warp array.
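A hedged sketch of marching cubes extraction on a procedurally generated signed distance field; the grid size, sphere radius, buffer sizes, and "cuda:0" device are illustrative assumptions:

```python
import warp as wp

N = 64

@wp.kernel
def sphere_sdf(field: wp.array3d(dtype=float)):
    i, j, k = wp.tid()
    p = wp.vec3(float(i), float(j), float(k)) / float(N) - wp.vec3(0.5, 0.5, 0.5)
    field[i, j, k] = wp.length(p) - 0.25          # signed distance to a sphere

field = wp.zeros(shape=(N, N, N), dtype=float, device="cuda:0")
wp.launch(sphere_sdf, dim=(N, N, N), inputs=[field], device="cuda:0")

mc = wp.MarchingCubes(nx=N, ny=N, nz=N, max_verts=100_000, max_tris=300_000, device="cuda:0")
mc.surface(field, 0.0)                            # extract the zero iso-surface
print(mc.verts.shape, mc.indices.shape)
```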