Limitations#

This section summarizes various limitations and currently unsupported features in Warp. Problems, questions, and feature requests can be opened on GitHub Issues.

Unsupported Features#

To achieve good performance on GPUs, some dynamic language features are not supported:

Lambda functions
List comprehensions
Exceptions
Recursion
Runtime evaluation of expressions, e.g., eval()
Dynamic structures such as lists, sets, dictionaries, etc.

Kernels and User Functions#

Strings cannot be passed into kernels.
Short-circuit evaluation is not supported.
wp.atomic_add() does not support wp.float16 on GPUs with compute capability below 7.0. On such devices, the function will return 0.0 without modifying the target memory.
wp.tid() cannot be called from user functions.
Modifying the value of a wp.constant() during runtime will not trigger recompilation of the affected kernels if the modules have already been loaded (e.g. through a wp.launch() or a wp.load_module()).
A wp.constant() can suffer precision loss if used with wp.float64 as it is initially assigned to a wp.float32 variable in the generated code.
Python IntFlag values behave like raw integers in Warp kernels: bitwise negation (~) produces the integer negation, not a masked combination of flags as in standard Python IntFlag behavior.
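
The IntFlag item above can be illustrated in plain Python, without Warp. The sketch below uses a hypothetical `Access` flag type and compares raw integer negation, which is what the item says a kernel computes, with a complement restricted to the defined flag bits. The names `Access`, `ALL_FLAGS`, `raw_negation`, and `masked_complement` are assumptions made for this illustration only.

```python
from enum import IntFlag


class Access(IntFlag):
    """Hypothetical flag type used only for this illustration."""

    READ = 1
    WRITE = 2
    EXEC = 4


# Union of every defined flag bit, stored as a plain integer.
ALL_FLAGS = int(Access.READ | Access.WRITE | Access.EXEC)


def raw_negation(flags: int) -> int:
    """Integer negation, i.e. how `~` treats a raw integer."""
    return ~int(flags)


def masked_complement(flags: int) -> int:
    """Complement restricted to the defined flag bits."""
    return ~int(flags) & ALL_FLAGS


print(raw_negation(Access.READ))       # -2
print(masked_complement(Access.READ))  # 6, i.e. WRITE | EXEC
```

Applying an explicit mask of this kind inside a kernel is one possible way to recover the masked result. That use has not been verified against Warp here.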

Each dimension of the grid used to launch a kernel must be representable as a 32-bit signed integer. Therefore, no single dimension of a grid may exceed \(2^{31}-1\).

By default, Warp will try to process one element from the Warp grid in one CUDA thread. This is not always possible for kernels launched with multi-dimensional grid bounds, as there are hardware limitations on CUDA block dimensions.

Warp will automatically fall back to using grid-stride loops when it is not possible for a CUDA thread to process only one element from the Warp grid. When this happens, some CUDA threads may process more than one element from the Warp grid. Users can also set the max_blocks parameter to fine-tune the grid-striding behavior of kernels, even for kernels that are otherwise able to process one Warp-grid element per CUDA thread.
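
The grid-stride pattern described above can be pictured with a small pure-Python simulation. This is a conceptual sketch and not the code Warp generates: `num_threads` stands in for the total number of CUDA threads in the launch, and the function names are hypothetical.

```python
def grid_stride_indices(thread_id: int, num_threads: int, num_elements: int):
    """Yield the element indices that one simulated thread would process."""
    index = thread_id
    while index < num_elements:
        yield index
        index += num_threads  # stride by the total number of threads


def simulate_launch(num_threads: int, num_elements: int):
    """Return, for each simulated thread, the list of elements it handles."""
    return [
        list(grid_stride_indices(t, num_threads, num_elements))
        for t in range(num_threads)
    ]


work = simulate_launch(num_threads=4, num_elements=10)
print(work)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

With fewer threads than elements, some threads handle several elements, yet every element is still visited exactly once.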

Differentiability#

Please see the Limitations and Workarounds section in the Differentiability page for auto-differentiation limitations.

Arrays#

Arrays can have a maximum of four dimensions.
Each dimension of a Warp array cannot be greater than the maximum value representable by a 32-bit signed integer, \(2^{31}-1\).
There are currently no data types that support complex numbers.
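
The first two array limits above can be checked before allocation with a small helper. This is a sketch in plain Python that encodes only the limits stated in this section. `check_shape` and the two constants are hypothetical names and not part of Warp.

```python
MAX_ARRAY_DIMS = 4
MAX_DIM_SIZE = 2**31 - 1  # largest value of a 32-bit signed integer


def check_shape(shape):
    """Return `shape` as a tuple, or raise ValueError if it breaks a limit."""
    shape = tuple(shape)
    if len(shape) > MAX_ARRAY_DIMS:
        raise ValueError(
            f"{len(shape)} dimensions exceed the maximum of {MAX_ARRAY_DIMS}"
        )
    for axis, size in enumerate(shape):
        if not 0 <= size <= MAX_DIM_SIZE:
            raise ValueError(
                f"dimension {axis} has size {size}, outside [0, {MAX_DIM_SIZE}]"
            )
    return shape


print(check_shape((1024, 1024)))  # (1024, 1024)
```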

Structs#

Structs cannot have generic members, i.e. of type typing.Any.
Structs do not support inheritance. Consider using composition instead.

Volumes#

The sparse-volume topology cannot be changed after the tiles for the Volume have been allocated.

Multiple Processes#

A CUDA context created in the parent process cannot be used in a forked child process. Use the spawn start method instead, or avoid creating CUDA contexts in the parent process.
Issues can arise when multiple processes share the same kernel cache directory. A workaround is to use a separate cache directory for each process. See the Configuration section for how to change the cache directory.
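
Both points above can be combined in a sketch that uses only the Python standard library: worker processes are created with the spawn start method, and each worker derives its own cache directory from its process ID. The helper names and the directory layout are hypothetical. How the directory is handed to Warp is left as a comment, because the exact setting (assumed here to be `wp.config.kernel_cache_dir`) should be taken from the Configuration section.

```python
import multiprocessing as mp
import os
import tempfile


def per_process_cache_dir(base=None):
    """Create and return a cache directory unique to the current process."""
    base = base or os.path.join(tempfile.gettempdir(), "warp_cache_example")
    path = os.path.join(base, f"pid_{os.getpid()}")
    os.makedirs(path, exist_ok=True)
    return path


def worker(task_id):
    cache_dir = per_process_cache_dir()
    # Assumption: Warp would be pointed at `cache_dir` here, before any
    # kernel is compiled (e.g. through wp.config.kernel_cache_dir).
    return task_id, cache_dir


def run_tasks(num_tasks, num_procs=2):
    """Run `worker` in child processes created with the spawn start method."""
    ctx = mp.get_context("spawn")  # avoids inheriting a forked CUDA context
    with ctx.Pool(num_procs) as pool:
        return pool.map(worker, range(num_tasks))
```

Because spawn re-imports the main module in each child, `run_tasks()` should be called from a script under an `if __name__ == "__main__":` guard.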

Scalar Math Functions#

This section details some limitations and differences from CPython semantics for scalar math functions.

Modulus Operator#

The behavior of the modulus operator (%) can deviate from Python when the dividend or divisor is negative (also see wp.mod()). In a Warp kernel, the operator follows C++11: the sign of the result follows the sign of the dividend. In Python, the sign of the result follows the sign of the divisor:

@wp.kernel
def modulus_test():
    # Kernel-scope behavior:
    a = -3 % 2 # a is -1
    b = 3 % -2 # b is 1
    c = 3 % 0  # Undefined behavior

# Python-scope behavior:
a = -3 % 2 # a is 1
b = 3 % -2 # b is -1
c = 3 % 0  # ZeroDivisionError
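
The two conventions can be compared side by side in plain Python. The sketch below emulates the C++11-style truncated remainder with integer arithmetic and then rebuilds Python's floored result from it. Both function names are hypothetical helpers for illustration and not Warp functions.

```python
def c_style_mod(a: int, b: int) -> int:
    """Truncated remainder: the result takes the sign of the dividend `a`."""
    remainder = abs(a) % abs(b)
    return -remainder if a < 0 else remainder


def python_style_mod(a: int, b: int) -> int:
    """Floored remainder rebuilt from the truncated one: sign follows `b`."""
    remainder = c_style_mod(a, b)
    # Shift by the divisor when the remainder is nonzero and its sign
    # differs from the sign of the divisor.
    if remainder != 0 and (remainder < 0) != (b < 0):
        remainder += b
    return remainder


print(c_style_mod(-3, 2), c_style_mod(3, -2))            # -1 1
print(python_style_mod(-3, 2), python_style_mod(3, -2))  # 1 -1
```

The same sign adjustment is one way to obtain Python-style results from a truncated remainder. Note that a zero divisor raises ZeroDivisionError in this Python emulation, whereas it is undefined behavior in a kernel.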

Power Operator#

The power operator (**) in Warp kernels only works on floating-point numbers (also see wp.pow()). In Python, the power operator can also be used on integers.
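
When an integer power is needed, one option is to compute it with an explicit loop instead of the `**` operator. The sketch below is plain Python and uses only a loop and integer arithmetic, with no recursion. `int_pow` is a hypothetical helper, and whether it ports unchanged to a Warp user function is an assumption that has not been verified here.

```python
def int_pow(base: int, exponent: int) -> int:
    """Integer power by repeated squaring, for non-negative exponents."""
    if exponent < 0:
        # Python-only guard: exceptions are not available inside kernels.
        raise ValueError("exponent must be non-negative")
    result = 1
    while exponent > 0:
        if exponent % 2 == 1:  # lowest bit set: fold current base into result
            result *= base
        base *= base
        exponent //= 2
    return result


print(int_pow(2, 10))  # 1024
print(int_pow(-2, 3))  # -8
```

Python integers have arbitrary precision, so this version never overflows. A fixed-width integer type in a kernel would not share that property.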

Inverse Sine and Cosine#

wp.asin() and wp.acos() automatically clamp the input to fall in the range [-1, 1]. In Python, using math.asin() or math.acos() with an input outside [-1, 1] raises a ValueError exception.
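
To reproduce the clamping behavior in Python-scope code, for instance when comparing kernel results against a reference, the input can be clamped before calling the math functions. This is a minimal sketch, and `clamped_asin` and `clamped_acos` are hypothetical helper names.

```python
import math


def clamped_asin(x: float) -> float:
    """math.asin() with the input clamped to [-1, 1] first."""
    return math.asin(max(-1.0, min(1.0, x)))


def clamped_acos(x: float) -> float:
    """math.acos() with the input clamped to [-1, 1] first."""
    return math.acos(max(-1.0, min(1.0, x)))


# Slightly out-of-range input, e.g. from accumulated rounding error.
print(clamped_asin(1.0000001))  # same value as math.asin(1.0)

try:
    math.asin(1.0000001)
except ValueError as err:
    print("math.asin raised:", err)
```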

Rounding#

wp.round() rounds halfway cases away from zero, whereas Python’s round() rounds halfway cases to the nearest even value (banker’s rounding). Use wp.rint() when banker’s rounding is desired. Unlike Python’s round(), which returns an int when called with a single argument, both Warp functions return a value of the same type as their input:

@wp.kernel
def halfway_rounding_test():
    # Kernel-scope behavior:
    a = wp.round(0.5) # a is 1.0
    b = wp.rint(0.5)  # b is 0.0
    c = wp.round(1.5) # c is 2.0
    d = wp.rint(1.5)  # d is 2.0

# Python-scope behavior:
a = round(0.5) # a is 0
c = round(1.5) # c is 2
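
For Python-scope reference code that needs to match the kernel-scope results, both rounding modes can be emulated with the standard library. This is a sketch that assumes finite inputs of moderate magnitude. `round_half_away` and `round_half_even` are hypothetical helper names, and both return floats to mirror the "same type as the input" behavior.

```python
from decimal import ROUND_HALF_UP, Decimal


def round_half_away(x: float) -> float:
    """Round to the nearest integer, halfway cases away from zero."""
    # Decimal(x) holds the exact value of the float, and ROUND_HALF_UP in
    # the decimal module rounds ties away from zero.
    return float(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP))


def round_half_even(x: float) -> float:
    """Banker's rounding via the built-in round(), returned as a float."""
    return float(round(x))


for value in (0.5, 1.5, 2.5, -0.5):
    print(value, round_half_away(value), round_half_even(value))
```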

Variable Scope#

When writing Warp kernels, variable scope might behave differently than in standard Python. This can sometimes lead to unexpected results.

In standard Python, a local variable is visible throughout the enclosing function, but reading it before it has been assigned raises an error. Consider this example:

@wp.func
def foo(cond: bool):
    if cond:
        out = 123
    else:
        out = 234

    print(out)

This code works as expected in standard Python. Regardless of the value of cond, out is defined before being printed.

However, consider a slightly modified example:

@wp.func
def foo(cond: bool):
    if cond:
        out = 123

    print(out) # No error even when `cond` is `False`.

In standard Python, if cond is False, the call to print(out) would raise an UnboundLocalError because out is never assigned on that path.

In Warp, the behavior is different. The call to print(out) will not raise an error, even if cond is False. Warp effectively makes out accessible outside the if block. However, if cond is False, out will be uninitialized, leading to undefined behavior.
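
A simple way to avoid the uninitialized read is to assign a default value before the branch, so that the variable is defined on every path. The sketch below shows the pattern in plain Python so that it runs without Warp. The function names and the default value of 0 are chosen for illustration, and the same structure is assumed to carry over to a wp.func.

```python
def foo_unassigned(cond: bool) -> int:
    if cond:
        out = 123
    return out  # raises UnboundLocalError in Python when cond is False


def foo_initialized(cond: bool) -> int:
    out = 0  # default assigned on every path
    if cond:
        out = 123
    return out


print(foo_initialized(True))   # 123
print(foo_initialized(False))  # 0
```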

Arrays in Structs#

Modifying flags on arrays stored in structs may not trigger an update to the underlying struct memory, e.g.:

@wp.struct
class MyStruct:
    arr: wp.array(dtype=float)

a = wp.zeros(10, dtype=float)

s = MyStruct()
s.arr = a

# modify original array
a.requires_grad = True

In this case, the array stored in the struct will not have the requires_grad=True value propagated to it, which could lead to gradients not being computed during backward kernel launches.