Changelog#
1.5.0 - 2024-12-02#
Added#
Support for cooperative tile-based primitives using cuBLASDx and cuFFTDx. Please see the tile documentation for details.
Expose a `reversed()` built-in for iterators (GH-311).
Support for saving Volumes into `.nvdb` files with the `save_to_nvdb` method.
warp.fem: Add `wp.fem.Trimesh3D` and `wp.fem.Quadmesh3D` geometry types for 3D surfaces with new `example_distortion_energy` example.
warp.fem: Add `"add"` option to `wp.fem.integrate()` for accumulating integration result to existing output.
warp.fem: Add `"assembly"` option to `wp.fem.integrate()` for selecting between more memory-efficient or more computationally efficient integration algorithms.
warp.fem: Add Nédélec (first kind) and Raviart-Thomas vector-valued function spaces providing conforming discretization of `curl` and `div` operators, respectively.
warp.sim: Add a graph coloring module that supports converting trimesh into a vertex graph and applying coloring. The `wp.sim.ModelBuilder` now includes methods to color particles for use with `wp.sim.VBDIntegrator()`; users should call `builder.color()` before finalizing assets.
warp.sim: Add support for a per-particle radius for soft-body triangle contact using the `wp.sim.Model.particle_radius` array (docs), replacing the previous hard-coded value of 0.01 (GH-329).
Add a `particle_radius` parameter to `wp.sim.ModelBuilder.add_cloth_mesh()` and `wp.sim.ModelBuilder.add_cloth_grid()` to set a uniform radius for the added particles.
Document `wp.array` attributes (GH-364).
Document time-to-compile tradeoffs when using vector component assignment statements in kernels.
Add introductory Jupyter notebooks to the `notebooks` directory.
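As a rough illustration of the graph coloring entry above, the following plain-Python sketch builds a vertex graph from triangle indices and colors it greedily, so that no two vertices sharing an edge receive the same color. The function names are hypothetical, and this is not the algorithm or API used by `warp.sim`; it only assumes that particles connected by a mesh edge must end up in different color groups.

```python
def trimesh_to_vertex_graph(triangles):
    """Build a vertex adjacency map from a list of (a, b, c) index triples."""
    adjacency = {}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


def greedy_coloring(adjacency):
    """Assign each vertex the smallest color not used by a colored neighbor."""
    colors = {}
    for vertex in sorted(adjacency):
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Two triangles sharing the edge (1, 2).
triangles = [(0, 1, 2), (1, 3, 2)]
graph = trimesh_to_vertex_graph(triangles)
colors = greedy_coloring(graph)
```

Vertices with the same color share no edge, which is the property a solver can rely on to process one color group at a time in parallel.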
Changed#
Drop support for Python 3.7; Python 3.8 is now the minimum-supported version.
Promote the `wp.Int`, `wp.Float`, and `wp.Scalar` generic annotation types to the public API.
warp.fem: Simplify querying neighboring cell quantities when integrating on sides using new `wp.fem.cells()`, `wp.fem.to_inner_cell()`, `wp.fem.to_outer_cell()` operators.
Show an error message when the type returned by a function differs from its annotation, which would have led to the compilation stage failing.
Clarify that `wp.randn()` samples a normal distribution of mean 0 and variance 1.
Raise error when passing more than 32 variadic arguments to the `wp.printf()` built-in.
Fixed#
Fix `place` setting of paddle backend.
warp.fem: Fix tri-cubic shape functions on quadrilateral meshes.
warp.fem: Fix caching of integrand kernels when changing code-generation options.
Fix `wp.expect_neq()` overloads missing for scalar types.
Fix an error when a `wp.kernel` or a `wp.func` object is annotated to return a `None` value.
Fix error when reading multi-volume, BLOSC-compressed `.nvdb` files.
Fix `wp.printf()` erroring out when no variadic arguments are passed (GH-333).
Fix memory access issues in soft-rigid contact collisions (GH-362).
Fix gradient propagation for in-place addition/subtraction operations on custom vector-type arrays.
Fix the OpenGL renderer’s window not closing when clicking the X button.
Fix the OpenGL renderer’s camera snapping to a different direction from the initial camera’s orientation when first looking around.
Fix custom colors being ignored when rendering meshes in OpenGL (GH-343).
Fix topology updates not being supported by the OpenGL renderer.
1.4.2 - 2024-11-13#
Changed#
Make the output of `wp.print()` in backward kernels consistent for all supported data types.
Fixed#
Fix to relax the integer types expected when indexing arrays (regression in `1.3.0`).
Fix printing vector and matrix adjoints in backward kernels.
Fix kernel compile error when printing structs.
Fix an incorrect user function being sometimes resolved when multiple overloads are available with array parameters with different `dtype` values.
Fix error being raised when static and dynamic for-loops are written in sequence with the same iteration variable names (GH-331).
Fix an issue with the `Texture Write` node, used in the Mandelbrot Omniverse sample, sometimes erroring out in multi-GPU environments.
Fix code generation of in-place multiplication and division operations (regression introduced in a69d061) (GH-342).
1.4.1 - 2024-10-15#
Fixed#
Fix `iter_reverse()` not working as expected for ranges with steps other than 1 (GH-311).
Fix potential out-of-bounds memory access when a `wp.sparse.BsrMatrix` object is reused for storing matrices of different shapes.
Fix robustness to very low desired tolerance in `wp.fem.utils.symmetric_eigenvalues_qr`.
Fix invalid code generation error messages when nesting dynamic and static for-loops.
Fix caching of kernels with static expressions.
Fix `ModelBuilder.add_builder(builder)` to correctly update `articulation_start` and thereby `articulation_count` when `builder` contains more than one articulation.
Re-introduce `wp.rand*()`, `wp.sample*()`, and `wp.poisson()` into the Python scope to revert a breaking change.
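To make the `iter_reverse()` entry above concrete, here is a plain-Python sketch of what reverse iteration over a stepped range is expected to produce. With a step other than 1, the reversed sequence has to start at the last element the forward range actually reaches, which is generally not `stop - step`. The helper `reversed_range` is hypothetical and only mirrors Python's own `reversed(range(...))` semantics; it is not Warp's implementation.

```python
def reversed_range(start, stop, step):
    """Yield the elements of range(start, stop, step) in reverse order."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # Number of elements the forward range produces.
    if step > 0:
        count = max(0, (stop - start + step - 1) // step)
    else:
        count = max(0, (start - stop - step - 1) // (-step))
    # Last element actually reached by the forward range.
    last = start + (count - 1) * step
    for i in range(count):
        yield last - i * step


forward = list(range(0, 10, 3))
backward = list(reversed_range(0, 10, 3))
```

Here the forward range stops at 9, so the reversed sequence begins at 9 rather than at `10 - 3 = 7`.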
1.4.0 - 2024-10-01#
Added#
Support for a new `wp.static(expr)` function that allows arbitrary Python expressions to be evaluated at the time of function/kernel definition (docs).
Support for stream priorities to hint to the device that it should process pending work in high-priority streams over pending work in low-priority streams when possible (docs).
Adaptive sparse grid geometry to `warp.fem` (docs).
Support for defining `wp.kernel` and `wp.func` objects from within closures.
Support for defining multiple versions of kernels, functions, and structs without manually assigning unique keys.
Support for default argument values for user functions decorated with `wp.func`.
Allow passing custom launch dimensions to `jax_kernel()` (GH-310).
JAX interoperability examples for sharding and matrix multiplication (docs).
Interoperability support for the PaddlePaddle ML framework (GH-318).
Support `wp.mod()` for vector types (GH-282).
Expose the modulo operator `%` to Python’s runtime scalar and vector types.
Support for fp64 `atomic_add`, `atomic_max`, and `atomic_min` (GH-284).
Support for quaternion indexing (e.g. `q.w`).
Support shadowing builtin functions (GH-308).
Support for redefining function overloads.
Add an ocean sample to the `omni.warp` extension.
`warp.sim.VBDIntegrator` now supports body-particle collision.
Add a contributing guide to the Sphinx docs.
Add documentation for dynamic code generation (docs).
Changed#
`wp.sim.Model.edge_indices` now includes boundary edges.
Unexposed `wp.rand*()`, `wp.sample*()`, and `wp.poisson()` from the Python scope.
Skip unused functions in module code generation, improving performance.
Avoid reloading modules if their content does not change, improving performance.
`wp.Mesh.points` is now a property instead of a raw data member; its reference can be changed after the mesh is initialized.
Improve error message when invalid objects are referenced in a Warp kernel.
`if`/`else`/`elif` statements with constant conditions are resolved at compile time with no branches being inserted in the generated code.
Include all non-hidden builtins in the stub file.
Improve accuracy of symmetric eigenvalues routine in `warp.fem`.
Fixed#
Fix for `wp.func` erroring out when defining a `Tuple` as a return type hint (GH-302).
Fix array in-place op (`+=`, `-=`) adjoints to compute gradients correctly in the backwards pass.
Fix vector, matrix in-place assignment adjoints to compute gradients correctly in the backwards pass, e.g.: `v[1] = x`.
Fix a bug in which Python docstrings would be created as local function variables in generated code.
Fix a bug with autograd array access validation in functions from different modules.
Fix a rare crash during error reporting on some systems due to glibc mismatches.
Handle `--num_tiles 1` in `example_render_opengl.py` (GH-306).
Fix the computation of body contact forces in `FeatherstoneIntegrator` when bodies and particles collide.
Fix bug in `FeatherstoneIntegrator` where `eval_rigid_jacobian` could give incorrect results or reach an infinite loop when the body and joint indices were not in the same order. Added `Model.joint_ancestor` to fix the indexing from a joint to its parent joint in the articulation.
Fix wrong vertex index passed to `add_edges()` called from `ModelBuilder.add_cloth_mesh()` (GH-319).
Add a workaround for uninitialized memory read warning in the `compute-sanitizer` initcheck tool when using `wp.Mesh`.
Fix name clashes when Warp functions and structs are returned from Python functions multiple times.
Fix name clashes between Warp functions and structs defined in different modules.
Fix code generation errors when overloading generic kernels defined in a Python function.
Fix issues with unrelated functions being treated as overloads (e.g., closures).
Fix handling of `stream` argument in `array.__dlpack__()`.
Fix a bug related to reloading CPU modules.
Fix a crash when kernel functions are not found in CPU modules.
Fix conditions not being evaluated as expected in `while` statements.
Fix printing Boolean and 8-bit integer values.
Fix array interface type strings used for Boolean and 8-bit integer values.
Fix initialization error when setting struct members.
Fix Warp not being initialized upon entering a `wp.Tape` context.
Use `kDLBool` instead of `kDLUInt` for DLPack interop of Booleans.
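The in-place adjoint fixes listed above can be pictured with a small hand-written reverse-mode example. The sketch below is plain Python, not Warp code, and the function names are hypothetical. It assumes a loss that is a weighted sum of array entries and checks that the adjoint of `x` in `a[i] += x` equals the incoming adjoint of `a[i]`, while the adjoint of the array itself passes through unchanged.

```python
def loss(a, w):
    """Weighted sum, so d(loss)/d(a[k]) = w[k]."""
    return sum(wk * ak for wk, ak in zip(w, a))


def forward(a0, i, x, w):
    a = list(a0)
    a[i] += x  # the in-place operation under test
    return loss(a, w)


def backward(i, w):
    adj_a = list(w)   # adjoint of the array after the in-place add
    adj_x = adj_a[i]  # a[i] += x forwards adj_a[i] to x
    return adj_a, adj_x  # adj_a is unchanged: the old a[i] has derivative 1


a0 = [1.0, 2.0, 3.0]
w = [0.5, -2.0, 4.0]
i, x, h = 1, 0.25, 1e-3

adj_a, adj_x = backward(i, w)
# Central finite difference with respect to x as an independent check.
fd_x = (forward(a0, i, x + h, w) - forward(a0, i, x - h, w)) / (2.0 * h)
```

Comparing `adj_x` with `fd_x` is the same kind of check a gradient-verification utility performs.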
1.3.3 - 2024-09-04#
Bug fixes
Fix an aliasing issue with zero-copy array initialization from NumPy introduced in Warp 1.3.0.
Fix `wp.Volume.load_from_numpy()` behavior when `bg_value` is a sequence of values (GH-312).
1.3.2 - 2024-08-30#
Bug fixes
Fix accuracy of 3x3 SVD `wp.svd3` with fp64 numbers (GH-281).
Fix module hashing when a kernel argument contained a struct array (GH-287).
Fix a bug in `wp.bvh_query_ray()` where the direction instead of the reciprocal direction was used (GH-288).
Fix errors when launching a CUDA graph after a module is reloaded. Modules that were used during graph capture will no longer be unloaded before the graph is released.
Fix a bug in `wp.sim.collide.triangle_closest_point_barycentric()` where the returned barycentric coordinates may be incorrect when the closest point lies on an edge.
Fix 32-bit overflow when array shape is specified using `np.int32`.
Fix handling of integer indices in the `input_output_mask` argument to `autograd.jacobian` and `autograd.jacobian_fd` (GH-289).
Fix `ModelBuilder.collapse_fixed_joints()` to correctly update the body centers of mass and the `ModelBuilder.articulation_start` array.
Fix precedence of closure constants over global constants.
Fix quadrature point indexing in `wp.fem.ExplicitQuadrature` (regression from 1.3.0).
Documentation improvements
Add missing return types for built-in functions.
Clarify that atomic operations also return the previous value.
Clarify that `wp.bvh_query_aabb()` returns parts that overlap the bounding volume.
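The `wp.bvh_query_ray()` fix in this release concerns passing the direction where the reciprocal direction is expected. The plain-Python sketch below shows why that matters for a standard ray/box slab test, which multiplies by `1 / direction` per axis. It is a generic textbook test with hypothetical function names, not Warp's traversal code, and it does not handle rays that start exactly on a slab plane with a zero direction component.

```python
import math


def reciprocal(direction):
    """Per-component reciprocal; zero components map to +inf."""
    return tuple(1.0 / d if d != 0.0 else math.inf for d in direction)


def ray_hits_aabb(origin, rcp_dir, lower, upper):
    """Slab test: intersect the ray's parameter intervals on every axis."""
    t_near, t_far = 0.0, math.inf
    for o, r, lo, hi in zip(origin, rcp_dir, lower, upper):
        t0 = (lo - o) * r
        t1 = (hi - o) * r
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
    return t_near <= t_far


origin = (0.0, 0.0, 0.0)
direction = (0.5, 0.0, 0.0)  # ray along +x
lower, upper = (2.0, -1.0, -1.0), (3.0, 1.0, 1.0)  # box straddling the x axis

hit_with_reciprocal = ray_hits_aabb(origin, reciprocal(direction), lower, upper)
hit_with_direction = ray_hits_aabb(origin, direction, lower, upper)  # the mistake
```

With the reciprocal the box is reported as hit; feeding the raw direction into the same test misses it.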
1.3.1 - 2024-07-27#
Remove `wp.synchronize()` from PyTorch autograd function example
`Tape.check_kernel_array_access()` and `Tape.reset_array_read_flags()` are now private methods.
Fix reporting unmatched argument types
1.3.0 - 2024-07-25#
Warp Core improvements
Update to CUDA 12.x by default (requires NVIDIA driver 525 or newer); please see README.md for commands to install CUDA 11.x binaries for older drivers
Add information to the module load print outs to indicate whether a module was compiled `(compiled)`, loaded from the cache `(cached)`, or was unable to be loaded `(error)`.
`wp.config.verbose = True` now also prints out a message upon the entry to a `wp.ScopedTimer`.
Add `wp.clear_kernel_cache()` to the public API. This is equivalent to `wp.build.clear_kernel_cache()`.
Add code-completion support for `wp.config` variables.
Remove usage of a static task (thread) index for CPU kernels to address multithreading concerns (GH-224)
Improve error messages for unsupported Python operations such as sequence construction in kernels
Update `wp.matmul()` CPU fallback to use dtype explicitly in `np.matmul()` call
Add support for PEP 563’s `from __future__ import annotations` (GH-256).
Allow passing external arrays/tensors to `wp.launch()` directly via `__cuda_array_interface__` and `__array_interface__`, up to 2.5x faster conversion from PyTorch
Add faster Torch interop path using `return_ctype` argument to `wp.from_torch()`
Handle incompatible CUDA driver versions gracefully
Add `wp.abs()` and `wp.sign()` for vector types
Expose scalar arithmetic operators to Python’s runtime (e.g.: `wp.float16(1.23) * wp.float16(2.34)`)
Add support for creating volumes with anisotropic transforms
Allow users to pass function arguments by keyword in a kernel using standard Python calling semantics
Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability
Add `__new__()` methods for all class `__del__()` methods to handle when a class instance is created but not instantiated before garbage collection
Implement the assignment operator for `wp.quat`
Make the geometry-related built-ins available only from within kernels
Rename the API-facing query types to remove their `_t` suffix: `wp.BVHQuery`, `wp.HashGridQuery`, `wp.MeshQueryAABB`, `wp.MeshQueryPoint`, and `wp.MeshQueryRay`
Add `wp.array(ptr=...)` to allow initializing arrays from pointer addresses inside of kernels (GH-206)
`warp.autograd` improvements:
New `warp.autograd` module with utility functions `gradcheck()`, `jacobian()`, and `jacobian_fd()` for debugging kernel Jacobians (docs)
Add array overwrite detection: if `wp.config.verify_autograd_array_access` is true, in-place operations on arrays on the Tape that could break gradient computation will be detected (docs)
Fix bug where modification of `@wp.func_replay` functions and native snippets would not trigger module recompilation
Add documentation for dynamic loop autograd limitations
`warp.sim` improvements:
Improve memory usage and performance for rigid body contact handling when `self.rigid_mesh_contact_max` is zero (default behavior).
The `mask` argument to `wp.sim.eval_fk()` now accepts both integer and boolean arrays to mask articulations.
Fix handling of `ModelBuilder.joint_act` in `ModelBuilder.collapse_fixed_joints()` (affected floating-base systems)
Fix and improve implementation of `ModelBuilder.plot_articulation()` to visualize the articulation tree of a rigid-body mechanism
Fix ShapeInstancer `__new__()` method (missing instance return and `*args` parameter)
Fix handling of `upaxis` variable in `ModelBuilder` and the rendering thereof in `OpenGLRenderer`
`warp.sparse` improvements:
Sparse matrix allocations (from `bsr_from_triplets()`, `bsr_axpy()`, etc.) can now be captured in CUDA graphs; exact number of non-zeros can be optionally requested asynchronously.
`bsr_assign()` now supports changing block shape (including CSR/BSR conversions)
Add Python operator overloads for common sparse matrix operations, e.g. `A += 0.5 * B`, `y = x @ C`
`warp.fem` new features and fixes:
Support for variable number of nodes per element
Global `wp.fem.lookup()` operator now supports `wp.fem.Tetmesh` and `wp.fem.Trimesh2D` geometries
Simplified defining custom subdomains (`wp.fem.Subdomain`), free-slip boundary conditions
New field types: `wp.fem.UniformField`, `wp.fem.ImplicitField` and `wp.fem.NonconformingField`
New `streamlines`, `magnetostatics` and `nonconforming_contact` examples, updated `mixed_elasticity` to use a nonlinear model
Function spaces can now export VTK-compatible cells for visualization
Fixed edge cases with NanoVDB function spaces
Fixed differentiability of `wp.fem.PicQuadrature` w.r.t. positions and measures
1.2.2 - 2024-07-04#
Fix hashing of replay functions and snippets
Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability
Add `__new__()` methods for all class `__del__()` methods to handle when a class instance is created but not instantiated before garbage collection.
Add documentation for dynamic loop autograd limitations
Allow users to pass function arguments by keyword in a kernel using standard Python calling semantics
Implement the assignment operator for `wp.quat`
Support for NumPy >= 2.0
1.2.1 - 2024-06-14#
Fix generic function caching
Fix Warp not being initialized when constructing arrays with `wp.array()`
Fix `wp.is_mempool_access_supported()` not resolving the provided device arguments to `wp.context.Device`
1.2.0 - 2024-06-06#
Add a not-a-number floating-point constant that can be used as `wp.NAN` or `wp.nan`.
Add `wp.isnan()`, `wp.isinf()`, and `wp.isfinite()` for scalars, vectors, matrices, etc.
Improve kernel cache reuse by hashing just the local module constants. Previously, a module’s hash was affected by all `wp.constant()` variables declared in a Warp program.
Revised module compilation process to allow multiple processes to use the same kernel cache directory. Cached kernels will now be stored in a hash-specific subdirectory.
Add runtime checks for `wp.MarchingCubes` on field dimensions and size
Fix memory leak in `wp.Mesh` BVH (GH-225)
Use C++17 when building the Warp library and user kernels
Increase PTX target architecture up to `sm_75` (from `sm_70`), enabling Turing ISA features
Extended NanoVDB support (see `warp.Volume`):
Add support for data-agnostic index grids, allocation at voxel granularity
New `wp.volume_lookup_index()`, `wp.volume_sample_index()` and generic `wp.volume_sample()`/`wp.volume_lookup()`/`wp.volume_store()` kernel-level functions
Zero-copy aliasing of in-memory grids, support for multi-grid buffers
Grid introspection and blind data access capabilities
`warp.fem` can now work directly on NanoVDB grids using `warp.fem.Nanogrid`
Fixed `wp.volume_sample_v()` and `wp.volume_store_*()` adjoints
Prevent `wp.volume_store()` from overwriting grid background values
Improve validation of user-provided fields and values in `warp.fem`
Support headless rendering of `wp.render.OpenGLRenderer` via `pyglet.options["headless"] = True`
`wp.render.RegisteredGLBuffer` can fall back to CPU-bound copying if CUDA/OpenGL interop is not available
Clarify terms for external contributions; please see CONTRIBUTING.md for details
Improve performance of `wp.sparse.bsr_mm()` by ~5x on benchmark problems
Fix for XPBD incorrectly indexing into the joint actuation `joint_act` arrays
Fix for mass matrix gradients computation in `wp.sim.FeatherstoneIntegrator()`
Fix for handling of `--msvc_path` in build scripts
Fix for `wp.copy()` params to record dest and src offset parameters on `wp.Tape()`
Fix for `wp.randn()` to ensure return values are finite
Fix for slicing of arrays with gradients in kernels
Fix for function overload caching, ensure module is rebuilt if any function overloads are modified
Fix for handling of `bool` types in generic kernels
Publish CUDA 12.5 binaries for Hopper support, see https://github.com/nvidia/warp?tab=readme-ov-file#installing for details
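The two kernel-cache entries in this release (hashing only a module's own constants, and storing cached kernels in a hash-specific subdirectory) can be pictured with the sketch below. Everything in it is an assumption made for illustration: the function name, the choice of SHA-256, the 7-character prefix, and the directory naming are hypothetical and do not describe Warp's actual cache layout.

```python
import hashlib
from pathlib import Path


def module_cache_dir(cache_root, module_name, source, local_constants):
    """Derive a per-hash cache directory from a module's source and constants."""
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8"))
    # Only constants local to this module contribute to the hash.
    for name in sorted(local_constants):
        digest.update(f"{name}={local_constants[name]!r}".encode("utf-8"))
    return Path(cache_root) / f"{module_name}_{digest.hexdigest()[:7]}"


src = "def scale(x): return x * FACTOR"
dir_a = module_cache_dir("kernel_cache", "my_module", src, {"FACTOR": 2.0})
dir_b = module_cache_dir("kernel_cache", "my_module", src, {"FACTOR": 3.0})
dir_a_again = module_cache_dir("kernel_cache", "my_module", src, {"FACTOR": 2.0})
```

Because each distinct hash maps to its own directory, two processes building different variants of a module never write to the same files, and an unchanged module resolves to the directory it used before.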
1.1.1 - 2024-05-24#
`wp.init()` is no longer required to be called explicitly and will be performed on first call to the API
Speed up `omni.warp.core`’s startup time
1.1.0 - 2024-05-09#
Support returning a value from `@wp.func_native` CUDA functions using type hints
Improved differentiability of the `wp.sim.FeatherstoneIntegrator`
Fix gradient propagation for rigid body contacts in `wp.sim.collide()`
Added support for event-based timing, see `wp.ScopedTimer()`
Added Tape visualization and debugging functions, see `wp.Tape.visualize()`
Support constructing Warp arrays from objects that define the `__cuda_array_interface__` attribute
Support copying a struct to another device, use `struct.to(device)` to migrate struct arrays
Allow rigid shapes to not have any collisions with other shapes in `wp.sim.Model`
Change default test behavior to test redundant GPUs (up to 2x)
Test each example in an individual subprocess
Polish and optimize various examples and tests
Allow non-contiguous point arrays to be passed to `wp.HashGrid.build()`
Upgrade LLVM to 18.1.3 for from-source builds and Linux x86-64 builds
Build DLL source code as C++17 and require GCC 9.4 as a minimum
Array clone, assign, and copy are now differentiable
Use `Ruff` for formatting and linting
Various documentation improvements (infinity, math constants, etc.)
Improve URDF importer, handle joint armature
Allow builtins.bool to be used in Warp data structures
Use external gradient arrays in backward passes when passed to `wp.launch()`
Add Conjugate Residual linear solver, see `wp.optim.linear.cr()`
Fix propagation of gradients on aliased copy of variables in kernels
Facilitate debugging and speed up `import warp` by eliminating raising any exceptions
Improve support for nested vec/mat assignments in structs
Recommend Python 3.9 or higher, which is required for JAX and soon PyTorch.
Support gradient propagation for indexing sliced multi-dimensional arrays, i.e. `a[i][j]` vs. `a[i, j]`
Provide an informative message if setting DLL C-types failed, instructing to try rebuilding the library
1.0.3 - 2024-04-17#
Add a `support_level` entry to the configuration file of the extensions
1.0.2 - 2024-03-22#
Make examples runnable from any location
Fix the examples not running directly from their Python file
Add the example gallery to the documentation
Update `README.md` examples USD location
Update `example_graph_capture.py` description
1.0.1 - 2024-03-15#
Document Device `total_memory` and `free_memory`
Documentation for allocators, streams, peer access, and generics
Changed example output directory to current working directory
Added `python -m warp.examples.browse` for browsing the examples folder
Print where the USD stage file is being saved
Added `examples/optim/example_walker.py` sample
Make the drone example not specific to USD
Reduce the time taken to run some examples
Optimise rendering points with a single colour
Clarify an error message around needing USD
Raise exception when module is unloaded during graph capture
Added `wp.synchronize_event()` for blocking the host thread until a recorded event completes
Flush C print buffers when ending `stdout` capture
Remove more unneeded CUTLASS files
Allow setting mempool release threshold as a fractional value
1.0.0 - 2024-03-07#
Add `FeatherstoneIntegrator` which provides more stable simulation of articulated rigid body dynamics in generalized coordinates (`State.joint_q` and `State.joint_qd`)
Introduce `warp.sim.Control` struct to store control inputs for simulations (optional, by default the `Model` control inputs are used as before); integrators now have a different simulation signature: `integrator.simulate(model: Model, state_in: State, state_out: State, dt: float, control: Control)`
`joint_act` can now behave in 3 modes: with `joint_axis_mode` set to `JOINT_MODE_FORCE` it behaves as a force/torque, with `JOINT_MODE_VELOCITY` it behaves as a velocity target, and with `JOINT_MODE_POSITION` it behaves as a position target; `joint_target` has been removed
Add adhesive contact to Euler integrators via `Model.shape_materials.ka` which controls the contact distance at which the adhesive force is applied
Improve handling of visual/collision shapes in URDF importer so visual shapes are not involved in contact dynamics
Experimental JAX kernel callback support
Improve module load exception message
Add `wp.ScopedCapture`
Removing `enable_backward` warning for callables
Copy docstrings and annotations from wrapped kernels, functions, structs
0.15.1 - 2024-03-05#
Add examples assets to the wheel packages
Fix broken image link in documentation
Fix codegen for custom grad functions calling their respective forward functions
Fix custom grad function handling for functions that have no outputs
Fix issues when `wp.config.quiet = True`
0.15.0 - 2024-03-04#
Add thumbnails to examples gallery
Apply colored lighting to examples
Moved `examples` directory under `warp/`
Add example usage to `python -m warp.tests --help`
Adding `torch.autograd.function` example + docs
Add error-checking to array shapes during creation
Adding `example_graph_capture`
Add a Diffsim Example of a Drone
Fix `verify_fp` causing compiler errors and support CPU kernels
Fix to enable `matmul` to be called in CUDA graph capture
Enable mempools by default
Update `wp.launch` to support tuple args
Fix BiCGSTAB and GMRES producing NaNs when converging early
Fix warning about backward codegen being disabled in `test_fem`
Fix `assert_np_equal` when NaNs and tolerance are involved
Improve error message to discern between CUDA being disabled or not supported
Support cross-module functions with user-defined gradients
Suppress superfluous CUDA error when ending capture after errors
Make output during initialization atomic
Add `warp.config.max_unroll`, fix custom gradient unrolling
Support native replay snippets using `@wp.func_native(snippet, replay_snippet=replay_snippet)`
Look for the CUDA Toolkit in default locations if the `CUDA_PATH` environment variable or `--cuda_path` build option are not used
Added `wp.ones()` to efficiently create one-initialized arrays
Rename `wp.config.graph_capture_module_load_default` to `wp.config.enable_graph_capture_module_load_by_default`
0.14.0 - 2024-02-19#
Add support for CUDA pooled (stream-ordered) allocators
Support memory allocation during graph capture
Support copying non-contiguous CUDA arrays during graph capture
Improved memory allocation/deallocation performance with pooled allocators
Use `wp.config.enable_mempools_at_init` to enable pooled allocators during Warp initialization (if supported)
`wp.is_mempool_supported()` - check if a device supports pooled allocators
`wp.is_mempool_enabled()`, `wp.set_mempool_enabled()` - enable or disable pooled allocators per device
`wp.set_mempool_release_threshold()`, `wp.get_mempool_release_threshold()` - configure memory pool release threshold
Add support for direct memory access between devices
Improved peer-to-peer memory transfer performance if access is enabled
Caveat: enabling peer access may impact memory allocation/deallocation performance and increase memory consumption
`wp.is_peer_access_supported()` - check if the memory of a device can be accessed by a peer device
`wp.is_peer_access_enabled()`, `wp.set_peer_access_enabled()` - manage peer access for memory allocated using default CUDA allocators
`wp.is_mempool_access_supported()` - check if the memory pool of a device can be accessed by a peer device
`wp.is_mempool_access_enabled()`, `wp.set_mempool_access_enabled()` - manage access for memory allocated using pooled CUDA allocators
Refined stream synchronization semantics
`wp.ScopedStream` can synchronize with the previous stream on entry and/or exit (only sync on entry by default)
Functions taking an optional stream argument do no implicit synchronization for max performance (e.g., `wp.copy()`, `wp.launch()`, `wp.capture_launch()`)
Support for passing a custom `deleter` argument when constructing arrays
Deprecation of `owner` argument - use `deleter` to transfer ownership
Optimizations for various core API functions (e.g., `wp.zeros()`, `wp.full()`, and more)
Fix `wp.matmul()` to always use the correct CUDA context
Fix memory leak in BSR transpose
Fix stream synchronization issues when copying non-contiguous arrays
API change: `wp.matmul()` no longer accepts a device as a parameter; instead, it infers the correct device from the arrays being multiplied
Updated DLPack utilities to the latest published standard
External arrays can be imported into Warp directly, e.g., `wp.from_dlpack(external_array)`
Warp arrays can be exported to consumer frameworks directly, e.g., `jax.dlpack.from_dlpack(warp_array)`
Added CUDA stream synchronization for CUDA arrays
The original DLPack protocol can still be used for better performance when stream synchronization is not required, see interoperability docs for details
`warp.to_dlpack()` is about 3-4x faster in common cases
`warp.from_dlpack()` is about 2x faster when called with a DLPack capsule
Fixed a small CPU memory leak related to DLPack interop
Improved performance of creating arrays
0.13.1 - 2024-02-22#
Ensure that the results from the `Noise Deform` are deterministic across different Kit sessions
0.13.0 - 2024-02-16#
Update the license to NVIDIA Software License, allowing commercial use (see `LICENSE.md`)
Add `CONTRIBUTING.md` guidelines (for NVIDIA employees)
Hash CUDA `snippet` and `adj_snippet` strings to fix caching
Fix `build_docs.py` on Windows
Add missing `.py` extension to `warp/tests/walkthrough_debug`
Allow `wp.bool` usage in vector and matrix types
0.12.0 - 2024-02-05#
Add a warning when the `enable_backward` setting is set to `False` upon calling `wp.Tape.backward()`
Fix kernels not being recompiled as expected when defined using a closure
Change the kernel cache appauthor subdirectory to just “NVIDIA”
Ensure that gradients attached to PyTorch tensors have compatible strides when calling `wp.from_torch()`
Add a `Noise Deform` node for OmniGraph that deforms points using a perlin/curl noise
0.11.0 - 2024-01-23#
Re-release 1.0.0-beta.7 as a non-pre-release 0.11.0 version so it gets selected by `pip install warp-lang`.
Introducing a new versioning and release process, detailed in `PACKAGING.md` and resembling that of Python itself:
The 0.11 release(s) can be found on the `release-0.11` branch.
Point releases (if any) go on the same minor release branch and only contain bug fixes, not new features.
The `public` branch, previously used to merge releases into and corresponding with the GitHub `main` branch, is retired.
1.0.0-beta.7 - 2024-01-23#
Ensure captures are always enclosed in `try`/`finally`
Only include .py files from the warp subdirectory into wheel packages
Fix an extension’s sample node failing at parsing some version numbers
Allow examples to run without USD when possible
Add a setting to disable the main Warp menu in Kit
Add iterative linear solvers, see `wp.optim.linear.cg`, `wp.optim.linear.bicgstab`, `wp.optim.linear.gmres`, and `wp.optim.linear.LinearOperator`
Improve error messages around global variables
Improve error messages around mat/vec assignments
Support conversion of scalars to native/ctypes, e.g.: `float(wp.float32(1.23))` or `ctypes.c_float(wp.float32(1.23))`
Add a constant for infinity, see `wp.inf`
Add a FAQ entry about array assignments
Add a mass spring cage diff simulation example, see `examples/example_diffsim_mass_spring_cage.py`
Add `-s`, `--suite` option for only running tests belonging to the given suites
Fix common spelling mistakes
Fix indentation of generated code
Show deprecation warnings only once
Improve `wp.render.OpenGLRenderer`
Create the extension’s symlink to the core library at runtime
Fix some built-ins failing to compile the backward pass when nested inside if/else blocks
Update examples with the new variants of the mesh query built-ins
Fix type members that weren’t zero-initialized
Fix missing adjoint function for `wp.mesh_query_ray()`
1.0.0-beta.6 - 2024-01-10#
Do not create CPU copy of grad array when calling `array.numpy()`
Fix `assert_np_equal()` bug
Support Linux AArch64 platforms, including Jetson/Tegra devices
Add parallel testing runner (invoke with `python -m warp.tests`, use `warp/tests/unittest_serial.py` for serial testing)
Fix support for function calls in `range()`
`wp.matmul()` adjoints now accumulate
Expand available operators (e.g. vector @ matrix, scalar as dividend) and improve support for calling native built-ins
Fix multi-gpu synchronization issue in `sparse.py`
Add depth rendering to `wp.render.OpenGLRenderer`, document `wp.render`
Make `wp.atomic_min()`, `wp.atomic_max()` differentiable
Fix error reporting using the exact source segment
Add user-friendly mesh query overloads, returning a struct instead of overwriting parameters
Address multiple differentiability issues
Fix backpropagation for returning array element references
Support passing the return value to adjoints
Add point basis space and explicit point-based quadrature for `wp.fem`
Support overriding the LLVM project source directory path using `build_lib.py --build_llvm --llvm_source_path=`
Fix the error message for accessing non-existing attributes
Flatten faces array for Mesh constructor in URDF parser
1.0.0-beta.5 - 2023-11-22#
Fix for kernel caching when function argument types change
Fix code-gen ordering of dependent structs
Fix for `wp.Mesh` build on MGPU systems
Fix for name clash bug with adjoint code: https://github.com/NVIDIA/warp/issues/154
Add `wp.frac()` for returning the fractional part of a floating point value
Add support for custom native CUDA snippets using `@wp.func_native` decorator
Add support for batched matmul with batch size > 2^16-1
Add support for transposed CUTLASS `wp.matmul()` and additional error checking
Add support for quad and hex meshes in `wp.fem`
Detect and warn when C++ runtime doesn’t match compiler during build, e.g.: ``libstdc++.so.6: version `GLIBCXX_3.4.30' not found``
Documentation update for `wp.BVH`
Documentation and simplified API for runtime kernel specialization `wp.Kernel`
1.0.0-beta.4 - 2023-11-01#
Add `wp.cbrt()` for cube root calculation
Add `wp.mesh_furthest_point_no_sign()` to compute furthest point on a surface from a query point
Add support for GPU BVH builds, 10-100x faster than CPU builds for large meshes
Add support for chained comparisons, e.g.: `0 < x < 2`
Add support for running `wp.fem` examples headless
Fix for unit test determinism
Fix for possible GC collection of array during graph capture
Fix for `wp.utils.array_sum()` output initialization when used with vector types
Coverage and documentation updates
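The chained-comparison entry above mirrors ordinary Python semantics. The sketch below is plain Python rather than Warp kernel code, and the helper names are hypothetical. It only illustrates that `0 < x < 2` is equivalent to joining the two comparisons with `and`.

```python
def in_open_interval(x: float) -> bool:
    # Chained form: both comparisons must hold.
    return 0 < x < 2


def in_open_interval_expanded(x: float) -> bool:
    # Equivalent expanded form using an explicit `and`.
    return 0 < x and x < 2


samples = [-1.0, 0.0, 0.5, 1.999, 2.0, 3.0]
chained = [in_open_interval(x) for x in samples]
expanded = [in_open_interval_expanded(x) for x in samples]
```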
1.0.0-beta.3 - 2023-10-19#
Add support for code coverage scans (test_coverage.py), coverage at 85% in `omni.warp.core`
Add support for named component access for vector types, e.g.: `a = v.x`
Add support for lvalue expressions, e.g.: `array[i] += b`
Add casting constructors for matrix and vector types
Add support for `type()` operator that can be used to return type inside kernels
Add support for grid-stride kernels to support kernels with > 2^31-1 thread blocks
Fix for multi-process initialization warnings
Fix alignment issues with empty `wp.struct`
Fix for return statement warning with tuple-returning functions
Fix for `wp.batched_matmul()` registering the wrong function in the Tape
Fix and document `wp.sim` forward + inverse kinematics
Fix for `wp.func` to return a default value if function does not return on all control paths
Refactor `wp.fem` support for new basis functions, decoupled function spaces
Optimizations for `wp.noise` functions, up to 10x faster in most cases
Optimizations for `type_size_in_bytes()` used in array construction
Breaking Changes#
To support grid-stride kernels, `wp.tid()` can no longer be called inside `wp.func` functions.
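For readers unfamiliar with the term, the general grid-stride pattern can be sketched in plain Python. This is a simulation of the idea only, not Warp's generated code, and the function names are hypothetical. A fixed number of threads each start at their own index and then jump ahead by the total thread count until all items are covered.

```python
def grid_stride_indices(thread_id: int, num_threads: int, total_items: int):
    """Yield the item indices that one simulated thread would process."""
    index = thread_id
    while index < total_items:
        yield index
        index += num_threads  # jump ahead by the size of the whole grid


def run_grid_stride(num_threads: int, total_items: int) -> list:
    """Simulate every thread and count how often each item is visited."""
    visits = [0] * total_items
    for thread_id in range(num_threads):
        for index in grid_stride_indices(thread_id, num_threads, total_items):
            visits[index] += 1
    return visits


# Four simulated threads cover ten items, each item exactly once.
visits = run_grid_stride(num_threads=4, total_items=10)
```

Because a single thread handles several logical indices in this scheme, the logical index is a loop variable rather than a fixed per-thread value.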
1.0.0-beta.2 - 2023-09-01#
Fix for passing bool into `wp.func` functions
Fix for deprecation warnings appearing on `stderr`, now redirected to `stdout`
Fix for using `for i in wp.hash_grid_query(..)` syntax
1.0.0-beta.1 - 2023-08-29#
Fix for `wp.float16` being passed as kernel arguments
Fix for compile errors with kernels using structs in backward pass
Fix for `wp.Mesh.refit()` not being CUDA graph capturable due to synchronous temp. allocs
Fix for dynamic texture example flickering / MGPU crashes demo in Kit by reusing `ui.DynamicImageProvider` instances
Fix for a regression that disabled bundle change tracking in samples
Fix for incorrect surface velocities when meshes are deforming in `OgnClothSimulate`
Fix for incorrect lower-case when setting USD stage “up_axis” in examples
Fix for incompatible gradient types when wrapping PyTorch tensor as a vector or matrix type
Fix for adding open edges when building cloth constraints from meshes in `wp.sim.ModelBuilder.add_cloth_mesh()`
Add support for `wp.fabricarray` to directly access Fabric data from Warp kernels, see https://docs.omniverse.nvidia.com/kit/docs/usdrt/latest/docs/usdrt_prim_selection.html for examples
Add support for user defined gradient functions, see `@wp.func_replay`, and `@wp.func_grad` decorators
Add support for more OG attribute types in `omni.warp.from_omni_graph()`
Add support for creating NanoVDB `wp.Volume` objects from dense NumPy arrays
Add support for `wp.volume_sample_grad_f()` which returns the value + gradient efficiently from an NVDB volume
Add support for LLVM fp16 intrinsics for half-precision arithmetic
Add implementation of stochastic gradient descent, see `wp.optim.SGD`
Add `wp.fem` framework for solving weak-form PDE problems (see https://nvidia.github.io/warp/modules/fem.html)
Optimizations for `omni.warp` extension load time (2.2s to 625ms cold start)
Make all `omni.ui` dependencies optional so that Warp unit tests can run headless
Deprecation of `wp.tid()` outside of kernel functions, users should pass `tid()` values to `wp.func` functions explicitly
Deprecation of `wp.sim.Model.flatten()` for returning all contained tensors from the model
Add support for clamping particle max velocity in `wp.sim.Model.particle_max_velocity`
Remove dependency on `urdfpy` package, improve MJCF parser handling of default values
0.10.1 - 2023-07-25#
Fix for large multidimensional kernel launches (> 2^32 threads)
Fix for module hashing with generics
Fix for unrolling loops with break or continue statements (will skip unrolling)
Fix for passing boolean arguments to build_lib.py (previously ignored)
Fix build warnings on Linux
Fix for creating array of structs from NumPy structured array
Fix for regression on kernel load times in Kit when using `wp.sim`
Update `wp.array.reshape()` to handle `-1` dimensions
Update margin used for mesh queries when using `wp.sim.create_soft_body_contacts()`
Improvements to gradient handling with `wp.from_torch()`, `wp.to_torch()` plus documentation
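The `-1` dimension mentioned for `wp.array.reshape()` is a placeholder inferred from the total element count. As a rough sketch of that inference, assuming it follows the NumPy convention, the plain-Python helper below fills in the missing dimension. `resolve_shape` is a hypothetical name, not a Warp function.

```python
from math import prod


def resolve_shape(size: int, shape: tuple) -> tuple:
    """Replace a single -1 entry with the dimension implied by `size`."""
    unknown = [i for i, dim in enumerate(shape) if dim == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension may be -1")
    if not unknown:
        if prod(shape) != size:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    # Product of the explicitly given dimensions (1 if there are none).
    known = prod(dim for dim in shape if dim != -1)
    if known == 0 or size % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    resolved = list(shape)
    resolved[unknown[0]] = size // known
    return tuple(resolved)


shape_a = resolve_shape(12, (3, -1))
shape_b = resolve_shape(12, (-1,))
shape_c = resolve_shape(12, (2, -1, 3))
```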
0.10.0 - 2023-07-05#
Add support for macOS universal binaries (x86 + aarch64) for M1+ support
Add additional methods for SDF generation; please see the following new methods:
  `wp.mesh_query_point_nosign()` - closest point query with no sign determination
  `wp.mesh_query_point_sign_normal()` - closest point query with sign from angle-weighted normal
  `wp.mesh_query_point_sign_winding_number()` - closest point query with fast winding number sign determination
Add CSR/BSR sparse matrix support, see `wp.sparse` module:
  `wp.sparse.BsrMatrix`
  `wp.sparse.bsr_zeros()`, `wp.sparse.bsr_set_from_triplets()` for construction
  `wp.sparse.bsr_mm()`, `wp.sparse.bsr_mv()` for matrix-matrix and matrix-vector products respectively
Add array-wide utilities:
  `wp.utils.array_scan()` - prefix sum (inclusive or exclusive)
  `wp.utils.array_sum()` - sum across array
  `wp.utils.radix_sort_pairs()` - in-place radix sort (key,value) pairs
Add support for calling `@wp.func` functions from Python (outside of kernel scope)
Add support for recording kernel launches using a `wp.Launch` object that can be replayed with low overhead, use `wp.launch(..., record_cmd=True)` to generate a command object
Optimizations for `wp.struct` kernel arguments, up to 20x faster launches for kernels with large structs or number of params
Refresh USD samples to use bundle based workflow + change tracking
Add Python API for manipulating mesh and point bundle data in OmniGraph, see `omni.warp.nodes` module, see `omni.warp.nodes.mesh_create_bundle()`, `omni.warp.nodes.mesh_get_points()`, etc
Improvements to `wp.array`:
  Fix a number of array methods misbehaving with empty arrays
  Fix a number of bugs and memory leaks related to gradient arrays
  Fix array construction when creating arrays in pinned memory from a data source in pageable memory
  `wp.empty()` no longer zeroes-out memory and returns an uninitialized array, as intended
  `array.zero_()` and `array.fill_()` work with non-contiguous arrays
  Support wrapping non-contiguous NumPy arrays without a copy
  Support preserving the outer dimensions of NumPy arrays when wrapping them as Warp arrays of vector or matrix types
  Improve PyTorch and DLPack interop with Warp arrays of arbitrary vectors and matrices
  `array.fill_()` can now take lists or other sequences when filling arrays of vectors or matrices, e.g. `arr.fill_([[1, 2], [3, 4]])`
  `array.fill_()` now works with arrays of structs (pass a struct instance)
  `wp.copy()` gracefully handles copying between non-contiguous arrays on different devices
  Add `wp.full()` and `wp.full_like()`, e.g., `a = wp.full(shape, value)`
  Add optional `device` argument to `wp.empty_like()`, `wp.zeros_like()`, `wp.full_like()`, and `wp.clone()`
  Add `indexedarray` methods `.zero_()`, `.fill_()`, and `.assign()`
  Fix `indexedarray` methods `.numpy()` and `.list()`
  Fix `array.list()` to work with arrays of any Warp data type
  Fix `array.list()` synchronization issue with CUDA arrays
  `array.numpy()` called on an array of structs returns a structured NumPy array with named fields
  Improve the performance of creating arrays
Fix for `Error: No module named 'omni.warp.core'` when running some Kit configurations (e.g.: stubgen)
Fix for `wp.struct` instance address being included in module content hash
Fix codegen with overridden function names
Fix for kernel hashing so it occurs after code generation and before loading to fix a bug with stale kernel cache
Fix for `wp.BVH.refit()` when executed on the CPU
Fix adjoint of `wp.struct` constructor
Fix element accessors for `wp.float16` vectors and matrices in Python
Fix `wp.float16` members in structs
Remove deprecated `wp.ScopedCudaGuard()`, please use `wp.ScopedDevice()` instead
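The `wp.utils.array_scan()` entry in this release distinguishes inclusive and exclusive prefix sums. The difference can be sketched in plain Python as follows. This illustrates the two definitions only, not Warp's implementation or call signature, and the function names are hypothetical.

```python
from itertools import accumulate


def inclusive_scan(values: list) -> list:
    """out[i] = values[0] + ... + values[i]"""
    return list(accumulate(values))


def exclusive_scan(values: list) -> list:
    """out[i] = values[0] + ... + values[i - 1], with out[0] = 0"""
    if not values:
        return []
    # Shift the inclusive result right by one and start from zero.
    return [0] + inclusive_scan(values)[:-1]


data = [3, 1, 4, 1, 5]
inclusive = inclusive_scan(data)
exclusive = exclusive_scan(data)
```

The exclusive form is the one typically used to turn per-item counts into write offsets, since `out[i]` is the number of elements that come before item `i`.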
0.9.0 - 2023-06-01#
Add support for in-place modifications to vector, matrix, and struct types inside kernels (will warn during backward pass with `wp.verbose` if using gradients)
Add support for step-through VSCode debugging of kernel code with standalone LLVM compiler, see `wp.breakpoint()`, and `walkthrough_debug.py`
Add support for default values on built-in functions
Add support for multi-valued `@wp.func` functions
Add support for `pass`, `continue`, and `break` statements
Add missing `__sincos_stret` symbol for macOS
Add support for gradient propagation through `wp.Mesh.points`, and other cases where arrays are passed to native functions
Add support for Python `@` operator as an alias for `wp.matmul()`
Add XPBD support for particle-particle collision
Add support for individual particle radii: `ModelBuilder.add_particle` has a new `radius` argument, `Model.particle_radius` is now a Warp array
Add per-particle flags as a `Model.particle_flags` Warp array, introduce `PARTICLE_FLAG_ACTIVE` to define whether a particle is being simulated and participates in contact dynamics
Add support for Python bitwise operators `&`, `|`, `~`, `<<`, `>>`
Switch to using standalone LLVM compiler by default for `cpu` devices
Split `omni.warp` into `omni.warp.core` for Omniverse applications that want to use the Warp Python module with minimal additional dependencies
Disable kernel gradient generation by default inside Omniverse for improved compile times
Fix for bounds checking on element access of vector/matrix types
Fix for stream initialization when a custom (non-primary) external CUDA context has been set on the calling thread
Fix for duplicate `@wp.struct` registration during hot reload
Fix for array `unot()` operator so kernel writers can use `if not array:` syntax
Fix for case where dynamic loops are nested within unrolled loops
Change `wp.hash_grid_point_id()` to return -1 if the `wp.HashGrid` has not been reserved before
Deprecate `wp.Model.soft_contact_distance` which is now replaced by `wp.Model.particle_radius`
Deprecate single scalar particle radius (should be a per-particle array)
0.8.2 - 2023-04-21#
Add `ModelBuilder.soft_contact_max` to control the maximum number of soft contacts that can be registered. Use `Model.allocate_soft_contacts(new_count)` to change count on existing `Model` objects.
Add support for `bool` parameters
Add support for logical boolean operators with `int` types
Fix for `wp.quat()` default constructor
Fix conditional reassignments
Add sign determination using angle weighted normal version of `wp.mesh_query_point()` as `wp.mesh_query_sign_normal()`
Add sign determination using winding number of `wp.mesh_query_point()` as `wp.mesh_query_sign_winding_number()`
Add query point without sign determination `wp.mesh_query_no_sign()`
0.8.1 - 2023-04-13#
Fix for regression when passing flattened numeric lists as matrix arguments to kernels
Fix for regressions when passing `wp.struct` types with uninitialized (`None`) member attributes
0.8.0 - 2023-04-05#
Add `Texture Write` node for updating dynamic RTX textures from Warp kernels / nodes
Add multi-dimensional kernel support to Warp Kernel Node
Add `wp.load_module()` to pre-load specific modules (pass `recursive=True` to load recursively)
Add `wp.poisson()` for sampling Poisson distributions
Add support for UsdPhysics schema, see `wp.sim.parse_usd()`
Add XPBD rigid body implementation plus diff. simulation examples
Add support for standalone CPU compilation (no host-compiler) with LLVM backend, enable with `--standalone` build option
Add support for per-timer color in `wp.ScopedTimer()`
Add support for row-based construction of matrix types outside of kernels
Add support for setting and getting row vectors for Python matrices, see `matrix.get_row()`, `matrix.set_row()`
Add support for instantiating `wp.struct` types within kernels
Add support for indexed arrays, `slice = array[indices]` will now generate a sparse slice of array data
Add support for generic kernel params, use `def compute(param: Any):`
Add support for `with wp.ScopedDevice("cuda") as device:` syntax (same for `wp.ScopedStream()`, `wp.Tape()`)
Add support for creating custom length vector/matrices inside kernels, see `wp.vector()`, and `wp.matrix()`
Add support for creating identity matrices in kernels with, e.g.: `I = wp.identity(n=3, dtype=float)`
Add support for unary plus operator (`wp.pos()`)
Add support for `wp.constant` variables to be used directly in Python without having to use `.val` member
Add support for nested `wp.struct` types
Add support for returning `wp.struct` from functions
Add `--quick` build for faster local dev. iteration (uses a reduced set of SASS arches)
Add optional `requires_grad` parameter to `wp.from_torch()` to override gradient allocation
Add type hints for generic vector / matrix types in Python stubs
Add support for custom user function recording in `wp.Tape()`
Add support for registering CUTLASS `wp.matmul()` with tape backward pass
Add support for grids with > 2^31 threads (each dimension may be up to INT_MAX in length)
Add CPU fallback for `wp.matmul()`
Optimizations for `wp.launch()`, up to 3x faster launches in common cases
Fix `wp.randf()` conversion to float to reduce bias for uniform sampling
Fix capture of `wp.func` and `wp.constant` types from inside Python closures
Fix for CUDA on WSL
Fix for matrices in structs
Fix for transpose indexing for some non-square matrices
Enable Python faulthandler by default
Update to VS2019
Breaking Changes#
`wp.constant` variables can now be treated as their true type, accessing the underlying value through `constant.val` is no longer supported
`wp.sim.model.ground_plane` is now a `wp.array` to support gradient, users should call `builder.set_ground_plane()` to create the ground
`wp.sim` capsule, cones, and cylinders are now aligned with the default USD up-axis
0.7.2 - 2023-02-15#
Reduce test time for vec/math types
Clean-up CUDA disabled build pipeline
Remove extension.gen.toml to make Kit packages Python version independent
Handle additional cases for array indexing inside Python
0.7.1 - 2023-02-14#
Disabling some slow tests for Kit
Make unit tests run on first GPU only by default
0.7.0 - 2023-02-13#
Add support for arbitrary length / type vector and matrices e.g.: `wp.vec(length=7, dtype=wp.float16)`, see `wp.vec()`, and `wp.mat()`
Add support for `array.flatten()`, `array.reshape()`, and `array.view()` with NumPy semantics
Add support for slicing `wp.array` types in Python
Add `wp.from_ptr()` helper to construct arrays from an existing allocation
Add support for `break` statements in ranged-for and while loops (backward pass support currently not implemented)
Add built-in mathematical constants, see `wp.pi`, `wp.e`, `wp.log2e`, etc.
Add built-in conversion between degrees and radians, see `wp.degrees()`, `wp.radians()`
Add security pop-up for Kernel Node
Improve error handling for kernel return values
0.6.3 - 2023-01-31#
Add DLPack utilities, see `wp.from_dlpack()`, `wp.to_dlpack()`
Add Jax utilities, see `wp.from_jax()`, `wp.to_jax()`, `wp.device_from_jax()`, `wp.device_to_jax()`
Fix for Linux Kit extensions OM-80132, OM-80133
0.6.2 - 2023-01-19#
Updated `wp.from_torch()` to support more data types
Updated `wp.from_torch()` to automatically determine the target Warp data type if not specified
Updated `wp.from_torch()` to support non-contiguous tensors with arbitrary strides
Add CUTLASS integration for dense GEMMs, see `wp.matmul()` and `wp.matmul_batched()`
Add QR and Eigen decompositions for `mat33` types, see `wp.qr3()`, and `wp.eig3()`
Add default (zero) constructors for matrix types
Add a flag to suppress all output except errors and warnings (set `wp.config.quiet = True`)
Skip recompilation when Kernel Node attributes are edited
Allow optional attributes for Kernel Node
Allow disabling backward pass code-gen on a per-kernel basis, use `@wp.kernel(enable_backward=False)`
Replace Python `imp` package with `importlib`
Fix for quaternion slerp gradients (`wp.quat_slerp()`)
0.6.1 - 2022-12-05#
Fix for non-CUDA builds
Fix strides computation in array_t constructor, fixes a bug with accessing mesh indices through mesh.indices[]
Disable backward pass code generation for kernel node (4-6x faster compilation)
Switch to linbuild for universal Linux binaries (affects TeamCity builds only)
0.6.0 - 2022-11-28#
Add support for CUDA streams, see `wp.Stream`, `wp.get_stream()`, `wp.set_stream()`, `wp.synchronize_stream()`, `wp.ScopedStream`
Add support for CUDA events, see `wp.Event`, `wp.record_event()`, `wp.wait_event()`, `wp.wait_stream()`, `wp.Stream.record_event()`, `wp.Stream.wait_event()`, `wp.Stream.wait_stream()`
Add support for PyTorch stream interop, see `wp.stream_from_torch()`, `wp.stream_to_torch()`
Add support for allocating host arrays in pinned memory for asynchronous data transfers, use `wp.array(..., pinned=True)` (default is non-pinned)
Add support for direct conversions between all scalar types, e.g.: `x = wp.uint8(wp.float64(3.0))`
Add per-module option to enable fast math, use `wp.set_module_options({"fast_math": True})`, fast math is now disabled by default
Add support for generating CUBIN kernels instead of PTX on systems with older drivers
Add user preference options for CUDA kernel output (“ptx” or “cubin”, e.g.: `wp.config.cuda_output = "ptx"` or per-module `wp.set_module_options({"cuda_output": "ptx"})`)
Add kernel node for OmniGraph
Add `wp.quat_slerp()`, `wp.quat_to_axis_angle()`, `wp.rotate_rodriguez()` and adjoints for all remaining quaternion operations
Add support for unrolling for-loops when range is a `wp.constant`
Add support for arithmetic operators on built-in vector / matrix types outside of `wp.kernel`
Add support for multiple solution variables in `wp.optim` Adam optimization
Add nested attribute support for `wp.struct` attributes
Add missing adjoint implementations for spatial math types, and document all functions with missing adjoints
Add support for retrieving NanoVDB tiles and voxel size, see `wp.Volume.get_tiles()`, and `wp.Volume.get_voxel_size()`
Add support for store operations on integer NanoVDB volumes, see `wp.volume_store_i()`
Expose `wp.Mesh` points, indices, as arrays inside kernels, see `wp.mesh_get()`
Optimizations for `wp.array` construction, 2-3x faster on average
Optimizations for URDF import
Fix various deployment issues by statically linking with all CUDA libs
Update warp.so/warp.dll to CUDA Toolkit 11.5
0.5.1 - 2022-11-01#
Fix for unit tests in Kit
0.5.0 - 2022-10-31#
Add smoothed particle hydrodynamics (SPH) example, see `example_sph.py`
Add support for accessing `array.shape` inside kernels, e.g.: `width = arr.shape[0]`
Add dependency tracking to hot-reload modules if dependencies were modified
Add lazy acquisition of CUDA kernel contexts (save ~300Mb of GPU memory in MGPU environments)
Add BVH object, see `wp.Bvh` and `bvh_query_ray()`, `bvh_query_aabb()` functions
Add component index operations for `spatial_vector`, `spatial_matrix` types
Add `wp.lerp()` and `wp.smoothstep()` builtins
Add `wp.optim` module with implementation of the Adam optimizer for float and vector types
Add support for transient Python modules (fix for Houdini integration)
Add `wp.length_sq()`, `wp.trace()` for vector / matrix types respectively
Add missing adjoints for `wp.quat_rpy()`, `wp.determinant()`
Add `wp.atomic_min()`, `wp.atomic_max()` operators
Add vectorized version of `wp.sim.model.add_cloth_mesh()`
Add NVDB volume allocation API, see `wp.Volume.allocate()`, and `wp.Volume.allocate_by_tiles()`
Add NVDB volume write methods, see `wp.volume_store_i()`, `wp.volume_store_f()`, `wp.volume_store_v()`
Add MGPU documentation
Add example showing how to compute Jacobian of multiple environments in parallel, see `example_jacobian_ik.py`
Add `wp.Tape.zero()` support for `wp.struct` types
Make SampleBrowser an optional dependency for Kit extension
Make `wp.Mesh` object accept both 1d and 2d arrays of face vertex indices
Fix for reloading of class member kernel / function definitions using `importlib.reload()`
Fix for hashing of `wp.constants()` not invalidating kernels
Fix for reload when multiple `.ptx` versions are present
Improved error reporting during code-gen
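For reference, the textbook definitions behind the `wp.lerp()` and `wp.smoothstep()` entries in this release can be sketched in plain Python. These are the standard formulas only. The argument order and clamping behavior of the Warp builtins are not stated in this changelog, so the signatures below are assumptions.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Cubic Hermite interpolation between 0 and 1 (assumes edge0 != edge1)."""
    t = (x - edge0) / (edge1 - edge0)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    return t * t * (3.0 - 2.0 * t)


quarter = lerp(0.0, 10.0, 0.25)
midpoint = smoothstep(0.0, 1.0, 0.5)
```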
0.4.3 - 2022-09-20#
Update all samples to use GPU interop path by default
Fix for arrays > 2GB in length
Add support for per-vertex USD mesh colors with `wp.render` class
0.4.2 - 2022-09-07#
Register Warp samples to the sample browser in Kit
Add NDEBUG flag to release mode kernel builds
Fix for particle solver node when using a large number of particles
Fix for broken cameras in Warp sample scenes
0.4.1 - 2022-08-30#
Add geometry sampling methods, see `wp.sample_unit_cube()`, `wp.sample_unit_disk()`, etc
Add `wp.lower_bound()` for searching sorted arrays
Add an option for disabling code-gen of backward pass to improve compilation times, see `wp.set_module_options({"enable_backward": False})`, True by default
Fix for using Warp from Script Editor or when module does not have a `__file__` attribute
Fix for hot reload of modules containing `wp.func()` definitions
Fix for debug flags not being set correctly on CUDA when `wp.config.mode == "debug"`, this enables bounds checking on CUDA kernels in debug mode
Fix for code gen of functions that do not return a value
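A lower-bound search on a sorted array conventionally returns the first position whose element is greater than or equal to the query. The plain-Python sketch below shows that convention using the standard library. Whether `wp.lower_bound()` matches it in every detail (for example, the return value when the query exceeds all elements) is an assumption, and the wrapper name is hypothetical.

```python
from bisect import bisect_left


def lower_bound(sorted_values: list, value) -> int:
    """Index of the first element >= value, or len(sorted_values) if none."""
    return bisect_left(sorted_values, value)


data = [1, 3, 3, 7, 9]
first_three = lower_bound(data, 3)   # lands on the first of the two 3s
between = lower_bound(data, 4)       # 4 is absent, so points at 7
past_end = lower_bound(data, 10)     # larger than everything
```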
0.4.0 - 2022-08-09#
Fix for FP16 conversions on GPUs without hardware support
Fix for `runtime = None` errors when reloading the Warp module
Fix for PTX architecture version when running with older drivers, see `wp.config.ptx_target_arch`
Fix for USD imports from `__init__.py`, defer them to individual functions that need them
Fix for robustness issues with sign determination for `wp.mesh_query_point()`
Fix for `wp.HashGrid` memory leak when creating/destroying grids
Add CUDA version checks for toolkit and driver
Add support for cross-module `@wp.struct` references
Support running even if CUDA initialization failed, use `wp.is_cuda_available()` to check availability
Statically linking with the CUDA runtime library to avoid deployment issues
Breaking Changes#
Removed `wp.runtime` reference from the top-level module, as it should be considered private
0.3.2 - 2022-07-19#
Remove Torch import from `__init__.py`, defer import to `wp.from_torch()`, `wp.to_torch()`
0.3.1 - 2022-07-12#
Fix for marching cubes reallocation after initialization
Add support for closest point between line segment tests, see `wp.closest_point_edge_edge()` builtin
Add support for per-triangle elasticity coefficients in simulation, see `wp.sim.ModelBuilder.add_cloth_mesh()`
Add support for specifying default device, see `wp.set_device()`, `wp.get_device()`, `wp.ScopedDevice`
Add support for multiple GPUs (e.g., `"cuda:0"`, `"cuda:1"`), see `wp.get_cuda_devices()`, `wp.get_cuda_device_count()`, `wp.get_cuda_device()`
Add support for explicitly targeting the current CUDA context using device alias `"cuda"`
Add support for using arbitrary external CUDA contexts, see `wp.map_cuda_device()`, `wp.unmap_cuda_device()`
Add PyTorch device aliasing functions, see `wp.device_from_torch()`, `wp.device_to_torch()`
Breaking Changes#
A CUDA device is used by default, if available (aligned with `wp.get_preferred_device()`)
`wp.ScopedCudaGuard` is deprecated, use `wp.ScopedDevice` instead
`wp.synchronize()` now synchronizes all devices; for finer-grained control, use `wp.synchronize_device()`
Device alias `"cuda"` now refers to the current CUDA context, rather than a specific device like `"cuda:0"` or `"cuda:1"`
0.3.0 - 2022-07-08#
Add support for FP16 storage type, see `wp.float16`
Add support for per-dimension byte strides, see `wp.array.strides`
Add support for passing Python classes as kernel arguments, see `@wp.struct` decorator
Add additional bounds checks for builtin matrix types
Add additional floating point checks, see `wp.config.verify_fp`
Add interleaved user source with generated code to aid debugging
Add generalized GPU marching cubes implementation, see `wp.MarchingCubes` class
Add additional scalar*matrix vector operators
Add support for retrieving a single row from builtin types, e.g.: `r = m33[i]`
Add `wp.log2()` and `wp.log10()` builtins
Add support for quickly instancing `wp.sim.ModelBuilder` objects to improve env. creation performance for RL
Remove custom CUB version and improve compatibility with CUDA 11.7
Fix to preserve external user-gradients when calling `wp.Tape.zero()`
Fix to only allocate gradient of a Torch tensor if `requires_grad=True`
Fix for missing `wp.mat22` constructor adjoint
Fix for ray-cast precision in edge case on GPU (watertightness issue)
Fix for kernel hot-reload when definition changes
Fix for NVCC warnings on Linux
Fix for generated function names when kernels are defined as class functions
Fix for reload of generated CPU kernel code on Linux
Fix for example scripts to output USD at 60 timecodes per-second (better Kit compatibility)
0.2.3 - 2022-06-13#
Fix for incorrect 4d array bounds checking
Fix for `wp.constant` changes not updating module hash
Fix for stale CUDA kernel cache when CPU kernels launched first
Array gradients are now allocated along with the arrays and accessible as `wp.array.grad`, users should take care to always call `wp.Tape.zero()` to clear gradients between different invocations of `wp.Tape.backward()`
Added `wp.array.fill_()` to set all entries to a scalar value (4-byte values only currently)
Breaking Changes#
Tape `capture` option has been removed, users can now capture tapes inside existing CUDA graphs (e.g.: inside Torch)
Scalar loss arrays should now explicitly set `requires_grad=True` at creation time
0.2.2 - 2022-05-30#
Fix for `from import *` inside Warp initialization
Fix for body space velocity when using deforming Mesh objects with scale
Fix for noise gradient discontinuities affecting `wp.curlnoise()`
Fix for `wp.from_torch()` to correctly preserve shape
Fix for URDF parser incorrectly passing density to scale parameter
Optimizations for startup time from 3s -> 0.3s
Add support for custom kernel cache location, Warp will now store generated binaries in the user’s application directory
Add support for cross-module function references, e.g.: call another module's `@wp.func` functions
Add support for overloading `@wp.func` functions based on argument type
Add support for calling built-in functions directly from Python interpreter outside kernels (experimental)
Add support for auto-complete and docstring lookup for builtins in IDEs like VSCode, PyCharm, etc
Add support for doing partial array copies, see `wp.copy()` for details
Add support for accessing mesh data directly in kernels, see `wp.mesh_get_point()`, `wp.mesh_get_index()`, `wp.mesh_eval_face_normal()`
Change to only compile for targets where kernel is launched (e.g.: will not compile CPU unless explicitly requested)
Breaking Changes#
Builtin methods such as `wp.quat_identity()` now call the Warp native implementation directly and will return a `wp.quat` object instead of NumPy array
NumPy implementations of many builtin methods have been moved to `wp.utils` and will be deprecated
Local `@wp.func` functions should not be namespaced when called, e.g.: previously `wp.myfunc()` would work even if `myfunc()` was not a builtin
Removed `wp.rpy2quat()`, please use `wp.quat_rpy()` instead
0.2.1 - 2022-05-11#
Fix for unit tests in Kit
0.2.0 - 2022-05-02#
Warp Core#
Fix for unrolling loops with negative bounds
Fix for unresolved symbol `hash_grid_build_device()` not found when lib is compiled without CUDA support
Fix for failure to load nvrtc-builtins64_113.dll when user has a newer CUDA toolkit installed on their machine
Fix for conversion of Torch tensors to `wp.array` with a vector dtype (incorrect row count)
Fix for `warp.dll` not found on some Windows installations
Fix for macOS builds on Clang 13.x
Fix for step-through debugging of kernels on Linux
Add argument type checking for user defined `@wp.func` functions
Add support for custom iterable types, supports ranges, hash grid, and mesh query objects
Add support for multi-dimensional arrays, for example use `x = array[i,j,k]` syntax to address a 3-dimensional array
Add support for multi-dimensional kernel launches, use `launch(kernel, dim=(i,j,k), ...` and `i,j,k = wp.tid()` to obtain thread indices
Add support for bounds-checking array memory accesses in debug mode, use `wp.config.mode = "debug"` to enable
Add support for differentiating through dynamic and nested for-loops
Add support for evaluating MLP neural network layers inside kernels with custom activation functions, see `wp.mlp()`
Add additional NVDB sampling methods and adjoints, see `wp.volume_sample_i()`, `wp.volume_sample_f()`, and `wp.volume_sample_vec()`
Add support for loading zlib compressed NVDB volumes, see `wp.Volume.load_from_nvdb()`
Add support for triangle intersection testing, see `wp.intersect_tri_tri()`
Add support for NVTX profile zones in `wp.ScopedTimer()`
Add support for additional transform and quaternion math operations, see `wp.inverse()`, `wp.quat_to_matrix()`, `wp.quat_from_matrix()`
Add fast math (`--fast-math`) to kernel compilation by default
Add `wp.torch` import by default (if PyTorch is installed)
Warp Kit#
Add Kit menu for browsing Warp documentation and example scenes under ‘Window->Warp’
Fix for OgnParticleSolver.py example when collider is coming from Read Prim into Bundle node
Warp Sim#
Fix for joint attachment forces
Fix for URDF importer and floating base support
Add examples showing how to use differentiable forward kinematics to solve inverse kinematics
Add examples for URDF cartpole and quadruped simulation
Breaking Changes#
`wp.volume_sample_world()` is now replaced by `wp.volume_sample_f/i/vec()` which operate in index (local) space. Users should use `wp.volume_world_to_index()` to transform points from world space to index space before sampling.
`wp.mlp()` expects multi-dimensional arrays instead of one-dimensional arrays for inference, all other semantics remain the same as earlier versions of this API.
`wp.array.length` member has been removed, please use `wp.array.shape` to access array dimensions, or use `wp.array.size` to get total element count
Marking `dense_gemm()`, `dense_chol()`, etc methods as experimental until we revisit them
0.1.25 - 2022-03-20#
Add support for class methods to be Warp kernels
Add HashGrid reserve() so it can be used with CUDA graphs
Add support for CUDA graph capture of tape forward/backward passes
Add support for Python 3.8.x and 3.9.x
Add hyperbolic trigonometric functions, see `wp.tanh()`, `wp.sinh()`, `wp.cosh()`
Add support for floored division on integer types
Move tests into core library so they can be run in Kit environment
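Floored division, mentioned in this release, differs from truncating division only when the operands have opposite signs. The plain-Python sketch below contrasts the two. Python's `//` operator floors, while truncation is what C-style integer division does. The helper names are hypothetical and this is not Warp code.

```python
def floor_div(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (Python's `//`)."""
    return a // b


def trunc_div(a: int, b: int) -> int:
    """Round the quotient toward zero (C-style integer division)."""
    return int(a / b)


floored = floor_div(-7, 2)
truncated = trunc_div(-7, 2)
```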
0.1.24 - 2022-03-03#
Warp Core#
Add NanoVDB support, see `wp.volume_sample*()` methods
Add support for reading compile-time constants in kernels, see `wp.constant()`
Add support for `__cuda_array_interface__` protocol for zero-copy interop with PyTorch, see `wp.torch.to_torch()`
Add support for additional numeric types, i8, u8, i16, u16, etc
Add better checks for device strings during allocation / launch
Add support for sampling random numbers with a normal distribution, see `wp.randn()`
Upgrade to CUDA 11.3
Update example scenes to Kit 103.1
Deduce array dtype from np.array when one is not provided
Fix for ranged for loops with negative step sizes
Fix for 3d and 4d spherical gradient distributions
0.1.23 - 2022-02-17#
Warp Core#
Fix for generated code folder being removed during Showroom installation
Fix for macOS support
Fix for dynamic for-loop code gen edge case
Add procedural noise primitives, see `wp.noise()`, `wp.pnoise()`, `wp.curlnoise()`
Move simulation helpers out of test into `wp.sim` module
0.1.22 - 2022-02-14#
Warp Core#
Fix for .so reloading on Linux
Fix for while loop code-gen in some edge cases
Add rounding functions `wp.round()`, `wp.rint()`, `wp.trunc()`, `wp.floor()`, `wp.ceil()`
Add support for printing strings and formatted strings from kernels
Add MSVC compiler version detection and require minimum
Warp Sim#
Add support for universal and compound joint types
0.1.21 - 2022-01-19#
Warp Core#
Fix for exception on shutdown in empty
wp.array
objectsFix for hot reload of CPU kernels in Kit
Add hash grid primitive for point-based spatial queries, see
wp.hash_grid_query()
,wp.hash_grid_query_next()
Add new PRNG methods using PCG-based generators, see
wp.rand_init()
,wp.randf()
,wp.randi()
Add support for AABB mesh queries, see wp.mesh_query_aabb(), wp.mesh_query_aabb_next()
Add support for all Python range() loop variants
Add builtin vec2 type and additional math operators, wp.pow(), wp.tan(), wp.atan(), wp.atan2()
Remove dependency on CUDA driver library at build time
Remove unused NVRTC binary dependencies (50 MB smaller Linux distribution)
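To give a feel for the PCG-based generators mentioned in the PRNG entry above, here is a minimal plain-Python sketch of a 32-bit PCG-style hash used as a stateful generator. The constants are those of a widely used PCG hash variant; that Warp uses exactly this variant is an assumption, and the `sketch_*` helpers are hypothetical names that only mirror the init/sample pattern, not Warp's actual signatures.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # keep all arithmetic within the 32-bit unsigned range


def pcg_hash(x: int) -> int:
    """One LCG step followed by a PCG-style output permutation."""
    b = (x * 747796405 + 2891336453) & MASK32
    c = (((b >> ((b >> 28) + 4)) ^ b) * 277803737) & MASK32
    return (c >> 22) ^ c


def sketch_rand_init(seed: int) -> int:
    """Hypothetical helper: derive an initial 32-bit state from a seed."""
    return pcg_hash(seed & MASK32)


def sketch_randi(state: int) -> Tuple[int, int]:
    """Hypothetical helper: advance the state, return (new_state, value)."""
    new_state = pcg_hash(state)
    return new_state, new_state


def sketch_randf(state: int) -> Tuple[int, float]:
    """Hypothetical helper: return (new_state, float in [0, 1))."""
    new_state, value = sketch_randi(state)
    return new_state, value / 2**32


if __name__ == "__main__":
    state = sketch_rand_init(42)
    for _ in range(3):
        state, u = sketch_randf(state)
        print(u)
```

Because the state is just a 32-bit integer, each thread in a kernel can carry its own generator cheaply, which is one common reason to prefer hash-style PRNGs on GPUs.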
Warp Sim#
Bundle import of multiple shapes for simulation nodes
New OgnParticleVolume node for sampling shapes -> particles
New OgnParticleSolver node for DEM style granular materials
0.1.20 - 2021-11-02#
Updates to the ripple solver for GTC (support for multiple colliders, buoyancy, etc)
0.1.19 - 2021-10-15#
Publish from 2021.3 to avoid omni.graph database incompatibilities
0.1.18 - 2021-10-08#
Enable Linux support (tested on 20.04)
0.1.17 - 2021-09-30#
Fix for 3x3 SVD adjoint
Fix for A6000 GPU (bump compute model to sm_52 minimum)
Fix for .dll unload on rebuild
Fix for possible array destruction warnings on shutdown
Rename spatial_transform -> transform
Documentation update
0.1.16 - 2021-09-06#
Fix for case where simple assignments (a = b) incorrectly generated reference rather than value copy
Handle passing zero-length (empty) arrays to kernels
0.1.15 - 2021-09-03#
Add additional math library functions (asin, etc)
Add builtin 3x3 SVD support
Add support for named constants (True, False, None)
Add support for if/else statements (differentiable)
Add custom memset kernel to avoid CPU overhead of cudaMemset()
Add rigid body joint model to wp.sim (based on Brax)
Add Linux, MacOS support in core library
Fix for incorrectly treating pure assignment as reference instead of value copy
Remove the need to transfer arrays to the CPU before NumPy conversion (now done implicitly)
Update the example OgnRipple wave equation solver to use bundles
0.1.14 - 2021-08-09#
Fix for out-of-bounds memory access in CUDA BVH
Better error checking after kernel launches (use wp.config.verify_cuda=True)
Fix for vec3 normalize adjoint code
0.1.13 - 2021-07-29#
Remove OgnShrinkWrap.py test node
0.1.12 - 2021-07-29#
Switch to Woop et al.’s watertight ray-tri intersection test
Disable --fast-math in CUDA compilation step for improved precision
0.1.11 - 2021-07-28#
Fix for wp.mesh_query_ray() returning incorrect t-value
0.1.10 - 2021-07-28#
Fix for OV extension fwatcher filters to avoid hot-reload loop due to OGN regeneration
0.1.9 - 2021-07-21#
Fix for loading sibling DLL paths
Better type checking for built-in function arguments
Added runtime docs, can now list all builtins using wp.print_builtins()
0.1.8 - 2021-07-14#
Fix for hot-reload of CUDA kernels
Add Tape object for replaying differentiable kernels
Add helpers for Torch interop (convert torch.Tensor to wp.Array)
0.1.7 - 2021-07-05#
Switch to NVRTC for CUDA runtime
Allow running without host compiler
Disable asserts in kernel release mode (small perf. improvement)
0.1.6 - 2021-06-14#
Look for CUDA toolchain in target-deps
0.1.5 - 2021-06-14#
Rename OgLang -> Warp
Improve CUDA environment error checking
Clean up some logging, add verbose mode (wp.config.verbose)
0.1.4 - 2021-06-10#
Add support for mesh raycast
0.1.3 - 2021-06-09#
Add support for unary negation operator
Add support for mutating variables during dynamic loops (non-differentiable)
Add support for in-place operators
Improve kernel cache start up times (avoids adjointing before cache check)
Update README.md with requirements / examples
0.1.2 - 2021-06-03#
Add support for querying mesh velocities
Add CUDA graph support, see wp.capture_begin(), wp.capture_end(), wp.capture_launch()
Add explicit initialization phase, wp.init()
Add variational Euler solver (sim)
Add contact caching, switch to nonlinear friction model (sim)
Fix for Linux/macOS support
0.1.1 - 2021-05-18#
Fix bug with conflicting CUDA contexts
0.1.0 - 2021-05-17#
Initial publish for alpha testing