warp.sparse#
Warp includes a sparse linear algebra module warp.sparse that implements common sparse matrix operations for simulation. The module provides efficient implementations of Block Sparse Row (BSR) matrices, with Compressed Sparse Row (CSR) matrices supported as a special case with 1x1 block size.
Working with Sparse Matrices#
Creating Sparse Matrices#
The most common way to create a sparse matrix is using coordinate (COO) format triplets:
import warp as wp
from warp.sparse import bsr_from_triplets
# Create a 4x4 sparse matrix with 2x2 blocks
rows = wp.array([0, 0, 1], dtype=int)   # Block row indices (must be < rows_of_blocks)
cols = wp.array([0, 1, 1], dtype=int)   # Block column indices (must be < cols_of_blocks)
vals = wp.array([...], dtype=wp.mat22)  # Block values, one 2x2 block per triplet
# Create BSR matrix
A = bsr_from_triplets(
    rows_of_blocks=2,      # Number of rows of blocks
    cols_of_blocks=2,      # Number of columns of blocks
    rows=rows,             # Row indices
    columns=cols,          # Column indices
    values=vals,           # Block values
)
You can also create special matrices:
from warp.sparse import bsr_zeros, bsr_identity, bsr_diag
# Create empty matrix
A = bsr_zeros(rows_of_blocks=4, cols_of_blocks=4, block_type=wp.float32)
# Create identity matrix
I = bsr_identity(rows_of_blocks=4, block_type=wp.float32)
# Create diagonal matrix
D = bsr_diag(diag=wp.array([1.0, 2.0, 3.0, 4.0]))
Block Sizes#
BSR matrices support different block sizes. For example:
# 1x1 blocks (CSR format)
A = bsr_zeros(4, 4, block_type=wp.float32)
# 3x3 blocks
A = bsr_zeros(2, 2, block_type=wp.mat33)
# Rectangular block sizes
A = bsr_zeros(2, 3, block_type=wp.mat23)
The module also provides functions to convert between different block sizes and scalar types:
from warp.sparse import bsr_copy, bsr_assign
# Convert block size from 2x2 to 1x1 (BSR to CSR)
A_csr = bsr_copy(A, block_shape=(1, 1))
# Convert block size from 1x1 to 2x2 (CSR to BSR)
A_bsr = bsr_copy(A, block_shape=(2, 2))
# Convert scalar type from float32 to float64
A_double = bsr_copy(A, scalar_type=wp.float64)
# Convert both block size and scalar type
A_new = bsr_copy(A, block_shape=(2, 2), scalar_type=wp.float64)
# In-place conversion using bsr_assign
B = bsr_zeros(rows_of_blocks=4, cols_of_blocks=4, block_type=wp.mat22)
bsr_assign(src=A, dest=B)  # Converts A to 2x2 blocks and stores in B
Note
When converting block sizes, the source and destination block dimensions must be compatible:
- The new block dimensions must be either multiples or exact divisors of the original dimensions
- The total matrix dimensions must be evenly divisible by the new block size
For example:
# Valid conversions:
A = bsr_zeros(4, 4, block_type=wp.mat22)  # 8x8 matrix with 2x2 blocks
B = bsr_copy(A, block_shape=(1, 1))       # 8x8 matrix with 1x1 blocks
C = bsr_copy(A, block_shape=(4, 4))       # 8x8 matrix with 4x4 blocks
# Invalid conversion (will raise ValueError):
D = bsr_copy(A, block_shape=(3, 3))       # 3x3 blocks don't divide 8x8 evenly
Non-Zero Block Count#
The number of non-zero blocks in a BSR matrix is computed on the device and not automatically synchronized to the host to avoid performance overhead and allow graph capture. The nnz attribute of a BSR matrix is always an upper bound for the number of non-zero blocks, but the actual count is stored on the device at offsets[nrow].
To get the exact count on host, you can explicitly synchronize using the nnz_sync() method:
# The nnz attribute is an upper bound
upper_bound = A.nnz
# Get the exact number of non-zero blocks
exact_nnz = A.nnz_sync()
Note
The nnz_sync() method ensures that any ongoing transfer of the exact nnz number from the device offsets array to the host has completed and updates the nnz upper bound. This synchronization is only necessary when you need the exact count - for most operations, the upper bound is sufficient.
If the number of non-zeros has been changed from outside of the warp.sparse builtin functions, for instance by direct modifications to the offsets array, use the notify_nnz_changed() method to ensure consistency.
Converting back to COO Format#
You can convert a BSR matrix back to coordinate (COO) format using the matrix’s properties:
# Get row indices from compressed format
nnz = A.nnz_sync()
rows = A.uncompress_rows()[:nnz]  # Returns array of row indices for each block
# Get column indices and values
cols = A.columns[:nnz]            # Column indices for each block
vals = A.values[:nnz]             # Block values
# Now you have the COO format:
# - rows[i] is the row index of block i
# - cols[i] is the column index of block i
# - vals[i] is the value of block i
Matrix Operations#
The module supports common matrix operations with overloaded operators:
# Matrix addition
C = A + B
# Matrix subtraction
C = A - B
# Scalar multiplication
C = 2.0 * A
# Matrix multiplication
C = A @ B
# Matrix-vector multiplication
y = A @ x
# Transpose
AT = A.transpose()
# In-place operations
A += B
A -= B
A *= 2.0
A @= B
For more control over memory allocations, you may use the underlying lower-level functions:
from warp.sparse import bsr_mm, bsr_mv, bsr_axpy, bsr_scale
# Matrix-matrix multiplication
# C = alpha * A @ B + beta * C
bsr_mm(
    A, B, C,           # Input and output matrices
    alpha=1.0,         # Scale factor for A @ B
    beta=0.0,          # Scale factor for C
)
# Matrix-vector multiplication
# y = alpha * A @ x + beta * y
bsr_mv(
    A, x, y,           # Input matrix, vector, and output vector
    alpha=1.0,         # Scale factor for A @ x
    beta=0.0,          # Scale factor for y
)
# Matrix addition (y is updated in place and returned)
# y = alpha * A + beta * y
bsr_axpy(
    A, y,              # Input matrix and input/output matrix
    alpha=1.0,         # Scale factor for A
    beta=1.0,          # Scale factor for y
)
# Matrix scaling
# A = alpha * A
bsr_scale(
    A,                 # Input/output matrix
    alpha,             # Scale factor
)
Kernel-level utilities#
The following Warp functions are available for use in kernels:
- warp.sparse.bsr_row_index(offsets, row_count, block_index)#
 Returns the index of the row containing a given block, or -1 if no such row exists.
- warp.sparse.bsr_block_index(row, col, bsr_offsets, bsr_columns)#
 Returns the index of the block at block-coordinates (row, col), or -1 if no such block exists. Assumes that the segments of bsr_columns corresponding to each row are sorted.
- Parameters:
 row (int) – Row of the block.
col (int) – Column of the block.
bsr_offsets (array(ndim=1, dtype=int32)) – Array of size at least 1 + row containing the offsets of the blocks in each row.
bsr_columns (array(ndim=1, dtype=int32)) – Array of size at least equal to bsr_offsets[row + 1] containing the column indices of the blocks.
- Return type:
 int
Reference#
- class warp.sparse.BsrMatrix[source]#
 Untyped base class for BSR and CSR matrices.
Should not be constructed directly but through functions such as bsr_zeros().
- nnz#
 Upper bound for the number of non-zero blocks, used for dimensioning launches. The exact number is at offsets[nrow]. See also nnz_sync().
- Type:
 int
- offsets#
 Array of size at least 1 + nrow such that the start and end indices of the blocks of row r are offsets[r] and offsets[r+1], respectively.
- property block_size: int[source]#
 Size of the individual blocks, i.e. number of rows per block times number of columns per block.
- copy_nnz_async()[source]#
 Starts the asynchronous transfer of the exact nnz from the device offsets array to host and records an event for completion.
Deprecated; prefer notify_nnz_changed() instead, which will make sure to resize arrays if necessary.
- Return type:
 None
- property device: Device[source]#
 Device on which offsets, columns, and values are allocated – assumed to be the same for all three arrays.
- nnz_sync()[source]#
 Synchronize the number of non-zeros from the device offsets array to the host.
Ensures that any ongoing transfer of the exact nnz number from the device offsets array to the host has completed, or, if none has been scheduled yet, starts a new transfer and waits for it to complete.
Then updates the host-side nnz upper bound to match the exact one, and returns it.
See also notify_nnz_changed().
- Return type:
 int
- notify_nnz_changed(nnz=None)[source]#
 Notify the matrix that the number of non-zeros has been changed from outside of the warp.sparse builtin functions.
Should be called in particular when the offsets array has been modified, or when the nnz upper bound has changed. Makes sure that the matrix is properly resized and starts the asynchronous transfer of the actual non-zero count.
- Parameters:
 nnz (int | None) – The new upper-bound for the number of non-zeros. If not provided, it will be read from the device offsets array (requires a synchronization).
- Return type:
 None
- property requires_grad: bool[source]#
 Read-only property indicating whether the matrix participates in adjoint computations.
- property scalar_type: Scalar[source]#
 Scalar type for individual block coefficients. For CSR matrices, this is the same as the block type.
- warp.sparse.bsr_assign(dest, src, structure_only=False, masked=False)[source]#
 Copy the content of the src BSR matrix to dest.
- Parameters:
 src (BsrMatrix[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]] | _BsrExpression[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]]) – Matrix to be copied.
dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Destination matrix. May have a different block shape or scalar type than src, in which case the required casting will be performed.
structure_only (bool) – If True, only the non-zero indices are copied, and uninitialized value storage is allocated to accommodate at least src.nnz blocks. If structure_only is False, values are also copied with implicit casting if the two matrices use distinct scalar types.
masked (bool) – If True, keep the non-zero topology of dest unchanged.
- warp.sparse.bsr_axpy(
 - x,
 - y=None,
 - alpha=1.0,
 - beta=1.0,
 - masked=False,
 - work_arrays=None,
 Perform the sparse matrix addition y := alpha * x + beta * y on BSR matrices x and y and return y.
The x and y matrices are allowed to alias.
- Parameters:
 x (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Read-only first operand.
y (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable second operand and output matrix. If y is not provided, it will be allocated and treated as zero.
alpha (Scalar) – Uniform scaling factor for x.
beta (Scalar) – Uniform scaling factor for y.
masked (bool) – If True, keep the non-zero topology of y unchanged.
work_arrays (bsr_axpy_work_arrays | None) – In most cases, this function will require the use of temporary storage. This storage can be reused across calls by passing an instance of bsr_axpy_work_arrays in work_arrays.
- Return type:
 BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_copy(A, scalar_type=None, block_shape=None, structure_only=False)[source]#
 Return a copy of matrix A, possibly changing its scalar type and block shape.
- Parameters:
 A (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Matrix to be copied.
scalar_type (Scalar | None) – If provided, the returned matrix will use this scalar type instead of the one from A.
block_shape (tuple[int, int] | None) – If provided, the returned matrix will use blocks of this shape instead of the one from A. Both dimensions of block_shape must be either a multiple or an exact divisor of the ones from A.
structure_only (bool) – If True, only the non-zero indices are copied, and uninitialized value storage is allocated to accommodate at least A.nnz blocks. If structure_only is False, values are also copied with implicit casting if the two matrices use distinct scalar types.
- warp.sparse.bsr_diag(
 - diag=None,
 - rows_of_blocks=None,
 - cols_of_blocks=None,
 - block_type=None,
 - device=None,
 Create and return a block-diagonal BSR matrix from a given block value or array of block values.
- Parameters:
 diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) –
Specifies the values for diagonal blocks. Can be one of:
- A Warp array of block values: each element defines one block of the diagonal
- A constant block value: this value is assigned to all diagonal blocks
rows_of_blocks (int | None) – If not None, the new number of rows of blocks.
cols_of_blocks (int | None) – If not None, the new number of columns of blocks.
block_type (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | None) – If diag is None, block type of the matrix. Otherwise deduced from diag.
device – If diag is not a Warp array, device on which to allocate the matrix. Otherwise deduced from diag.
- Return type:
 BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
The shape of the matrix will be defined by one of the following, in this order:
- rows_of_blocks and cols_of_blocks, if provided. If only one is given, the second is assumed equal.
- The first dimension of diag, if diag is an array.
- warp.sparse.bsr_from_triplets(
 - rows_of_blocks,
 - cols_of_blocks,
 - rows,
 - columns,
 - values,
 - prune_numerical_zeros=True,
 Constructs a BSR matrix with values defined by coordinate-oriented (COO) triplets.
The first dimension of the three input arrays must match and indicates the number of COO triplets.
- Parameters:
 rows_of_blocks (int) – Number of rows of blocks.
cols_of_blocks (int) – Number of columns of blocks.
rows (Array[int]) – Row index for each non-zero.
columns (Array[int]) – Column index for each non-zero.
values (Array[Scalar | _MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Block values for each non-zero. Must be either a one-dimensional array with data type identical to the returned matrix's block type, or a 3d array with data type equal to the returned matrix's scalar type.
prune_numerical_zeros (bool) – If True, zero-valued blocks will be ignored.
- warp.sparse.bsr_get_diag(A, out=None)[source]#
 Return the array of blocks that constitute the diagonal of a sparse matrix.
- Parameters:
 A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – The sparse matrix from which to extract the diagonal.
out (Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – If provided, the array into which to store the diagonal blocks.
- Return type:
 Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_identity(rows_of_blocks, block_type, device=None)[source]#
 Create and return a square identity matrix.
- Parameters:
 - Return type:
 BsrMatrix[_MatrixBlockType[Rows, Rows, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_matrix_t(dtype)[source]#
 Returns the BsrMatrix class specialized to blocks of type dtype.
- Parameters:
 dtype (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar])
- warp.sparse.bsr_mm(
 - x,
 - y,
 - z=None,
 - alpha=1.0,
 - beta=0.0,
 - masked=False,
 - work_arrays=None,
 - reuse_topology=False,
 - tile_size=0,
 - max_new_nnz=None,
 Perform the sparse matrix-matrix multiplication z := alpha * x @ y + beta * z on BSR matrices x, y and z, and return z.
The x, y and z matrices are allowed to alias. If the matrix z is not provided as input, it will be allocated and treated as zero.
- This method can be graph-captured if any of the following holds:
 masked=True
reuse_topology=True
max_new_nnz is provided
- Parameters:
 x (BsrMatrix[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]]) – Read-only left operand of the matrix-matrix product.
y (BsrMatrix[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only right operand of the matrix-matrix product.
z (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable affine operand and result matrix. If z is not provided, it will be allocated and treated as zero.
alpha (Scalar) – Uniform scaling factor for the x @ y product.
beta (Scalar) – Uniform scaling factor for z.
masked (bool) – If True, keep the non-zero topology of z unchanged.
work_arrays (bsr_mm_work_arrays | None) – In most cases, this function will require the use of temporary storage. This storage can be reused across calls by passing an instance of bsr_mm_work_arrays in work_arrays.
reuse_topology (bool) – If True, reuse the product topology information stored in work_arrays rather than recompute it from scratch. The matrices x, y and z must be structurally similar to the previous call in which work_arrays were populated.
max_new_nnz (int | None) – If provided, the maximum number of non-zeros for the matrix-matrix product result (not counting the existing non-zeros in z).
tile_size (int) – If a positive integer, use tiles of this size to compute the matrix-matrix product. If negative, disable tile-based computation. Defaults to 0, which chooses whether to use tiles based on a heuristic involving the matrix shape and number of non-zeros.
- Return type:
 BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- class warp.sparse.bsr_mm_work_arrays[source]#
 Opaque structure for persisting bsr_mm() temporary work buffers across calls.
- warp.sparse.bsr_mv(
 - A,
 - x,
 - y=None,
 - alpha=1.0,
 - beta=0.0,
 - transpose=False,
 - work_buffer=None,
 - tile_size=0,
 Perform the sparse matrix-vector product y := alpha * A * x + beta * y and return y.
The x and y vectors are allowed to alias.
- Parameters:
 A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only, left matrix operand of the matrix-vector product.
x (Array[Vector[Cols, Scalar] | Scalar]) – Read-only, right vector operand of the matrix-vector product.
y (Array[Vector[Rows, Scalar] | Scalar] | None) – Mutable affine operand and result vector. If y is not provided, it will be allocated and treated as zero.
alpha (Scalar) – Uniform scaling factor for x. If zero, x will not be read and may be left uninitialized.
beta (Scalar) – Uniform scaling factor for y. If zero, y will not be read and may be left uninitialized.
transpose (bool) – If True, use the transpose of the matrix A. In this case the result is non-deterministic.
work_buffer (Array[Vector[Rows, Scalar] | Scalar] | None) – Temporary storage is required if and only if x and y are the same vector. If provided, the work_buffer array will be used for this purpose, otherwise a temporary allocation will be performed.
tile_size (int) – If a positive integer, use tiles of this size to compute the matrix-vector product. If negative, disable tile-based computation. Defaults to 0, which chooses whether to use tiles based on a heuristic involving the matrix shape and number of non-zeros.
- Return type:
 Array[Vector[Rows, Scalar] | Scalar]
- warp.sparse.bsr_scale(x, alpha)[source]#
 Perform the operation x := alpha * x on BSR matrix x and return x.
- warp.sparse.bsr_set_diag(A, diag, rows_of_blocks=None, cols_of_blocks=None)[source]#
 Set A as a block-diagonal matrix.
- Parameters:
 A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – The sparse matrix to modify.
diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) –
Specifies the values for diagonal blocks. Can be one of:
- A Warp array of type A.values.dtype: each element defines one block of the diagonal
- A constant value of type A.values.dtype: this value is assigned to all diagonal blocks
- None: diagonal block values are left uninitialized
rows_of_blocks (int | None) – If not None, the new number of rows of blocks.
cols_of_blocks (int | None) – If not None, the new number of columns of blocks.
- Return type:
 None
The shape of the matrix will be defined by one of the following, in this order:
- rows_of_blocks and cols_of_blocks, if provided. If only one is given, the second is assumed equal.
- The first dimension of diag, if diag is an array.
- The current dimensions of A otherwise.
- warp.sparse.bsr_set_from_triplets(
 - dest,
 - rows,
 - columns,
 - values=None,
 - count=None,
 - prune_numerical_zeros=True,
 - masked=False,
 Fill a BSR matrix with values defined by coordinate-oriented (COO) triplets, discarding existing blocks.
The first dimension of the three input arrays must match and indicates the number of COO triplets.
- Parameters:
 dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Sparse matrix to populate.
rows (Array[int]) – Row index for each non-zero.
columns (Array[int]) – Column index for each non-zero.
values (Array[Scalar | _MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Block values for each non-zero. Must be either a one-dimensional array with data type identical to the dest matrix's block type, or a 3d array with data type equal to the dest matrix's scalar type. If None, the values array of the resulting matrix will be allocated but uninitialized.
count (Array[int] | None) – Single-element array indicating the number of triplets. If None, the number of triplets is determined from the shape of the rows and columns arrays.
prune_numerical_zeros (bool) – If True, zero-valued blocks will be ignored.
masked (bool) – If True, ignore blocks that are not existing non-zeros of dest.
- warp.sparse.bsr_set_transpose(dest, src, masked=False)[source]#
 Assign the transposed matrix src to matrix dest.
- Parameters:
 dest (BsrMatrix[_MatrixBlockType[Cols, Rows, Scalar] | _ScalarBlockType[Scalar]]) – Sparse matrix to populate.
src (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Sparse matrix to transpose.
masked (bool) – If True, keep the non-zero topology of dest unchanged.
- warp.sparse.bsr_set_zero(bsr, rows_of_blocks=None, cols_of_blocks=None)[source]#
 Set a BSR matrix to zero, possibly changing its size.
- Parameters:
 bsr – The BSR or CSR matrix to set to zero.
rows_of_blocks (int | None) – If not None, the new number of rows of blocks.
cols_of_blocks (int | None) – If not None, the new number of columns of blocks.
- warp.sparse.bsr_zeros(rows_of_blocks, cols_of_blocks, block_type, device=None)[source]#
 Construct and return an empty BSR or CSR matrix with the given shape.
- Parameters:
 rows_of_blocks (int) – Number of rows of blocks.
cols_of_blocks (int) – Number of columns of blocks.
block_type (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]) – Type of individual blocks. For CSR matrices, this should be a scalar type. For BSR matrices, this should be a matrix type (e.g. from warp.mat()).
device (Device | str | None) – Device on which to allocate the matrix arrays.
- Return type:
 
Iterative Linear Solvers#
Warp provides several iterative linear solvers with optional preconditioning:
Conjugate Gradient (CG)
Conjugate Residual (CR)
Bi-Conjugate Gradient Stabilized (BiCGSTAB)
Generalized Minimal Residual (GMRES)
from warp.optim.linear import cg
# Solve Ax = b; the solution is written into x, which also provides the initial guess
x = wp.zeros_like(b)
cg(A, b, x, maxiter=100, tol=1e-6)
Note
While primarily intended for sparse matrices, these solvers also work with dense linear operators provided as 2D Warp arrays. Custom operators can be implemented using the LinearOperator interface.
- class warp.optim.linear.LinearOperator(shape, dtype, device, matvec)[source]#
 Linear operator to be used as left-hand-side of linear iterative solvers.
- Parameters:
 
The matrix-vector multiplication routine should have the following signature:

def matvec(x: wp.array, y: wp.array, z: wp.array, alpha: Scalar, beta: Scalar):
    '''Perform a generalized matrix-vector product.

    This function computes the operation z = alpha * (A @ x) + beta * y,
    where 'A' is the linear operator represented by this class.
    '''
    ...

For performance reasons, by default the iterative linear solvers in this module will try to capture the calls for one or more iterations in CUDA graphs. If the matvec routine of a custom LinearOperator cannot be graph-captured, the use_cuda_graph=False parameter should be passed to the solver function.
- warp.optim.linear.aslinearoperator(A)[source]#
 Casts the dense or sparse matrix A as a LinearOperator.
A must be of one of the following types:
- two-dimensional warp.array; then A is assumed to be a dense matrix
- one-dimensional warp.array; then A is assumed to be a diagonal matrix
- warp.sparse.BsrMatrix; then A is used as a sparse matrix operator
- warp.optim.linear.LinearOperator; no casting necessary
- Parameters:
 A (array | BsrMatrix | LinearOperator)
- Return type:
 
- warp.optim.linear.bicgstab(
 - A,
 - b,
 - x,
 - tol=None,
 - atol=None,
 - maxiter=0,
 - M=None,
 - callback=None,
 - check_every=10,
 - use_cuda_graph=True,
 - is_left_preconditioner=False,
 Computes an approximate solution to a linear system using the Biconjugate Gradient Stabilized method (BiCGSTAB).
- Parameters:
 A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size.
M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that M A (resp. A M) is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance. If check_every is 0, the callback should be a Warp kernel.
check_every – number of iterations between calls to callback and residual checks against the tolerance, which may terminate the algorithm. Setting check_every to 0 disables host-side residual checks, making the solver fully CUDA-graph capturable. If conditional CUDA graphs are supported, convergence checks are performed device-side; otherwise, the solver will always run to the maximum number of iterations.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
is_left_preconditioner – whether M should be used as a left or right preconditioner.
- Returns:
 - If check_every > 0: Tuple (final_iteration, residual_norm, absolute_tolerance)
  final_iteration: The number of iterations performed before convergence or reaching maxiter
residual_norm: The final residual norm ||b - Ax||
absolute_tolerance: The absolute tolerance used for convergence checking
- If check_every is 0: Tuple (final_iteration_array, residual_norm_squared_array, absolute_tolerance_squared_array)
 final_iteration_array: Device array containing the number of iterations performed
residual_norm_squared_array: Device array containing the squared residual norm ||b - Ax||²
absolute_tolerance_squared_array: Device array containing the squared absolute tolerance
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b)).
- warp.optim.linear.cg(
 - A,
 - b,
 - x,
 - tol=None,
 - atol=None,
 - maxiter=0,
 - M=None,
 - callback=None,
 - check_every=10,
 - use_cuda_graph=True,
 Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Gradient algorithm.
- Parameters:
 A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size.
M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that M A is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance. If check_every is 0, the callback should be a Warp kernel.
check_every – number of iterations between calls to callback and residual checks against the tolerance, which may terminate the algorithm. Setting check_every to 0 disables host-side residual checks, making the solver fully CUDA-graph capturable. If conditional CUDA graphs are supported, convergence checks are performed device-side; otherwise, the solver will always run to the maximum number of iterations.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
- Returns:
 - If check_every > 0: Tuple (final_iteration, residual_norm, absolute_tolerance)
  final_iteration: The number of iterations performed before convergence or reaching maxiter
residual_norm: The final residual norm ||b - Ax||
absolute_tolerance: The absolute tolerance used for convergence checking
- If check_every is 0: Tuple (final_iteration_array, residual_norm_squared_array, absolute_tolerance_squared_array)
 final_iteration_array: Device array containing the number of iterations performed
residual_norm_squared_array: Device array containing the squared residual norm ||b - Ax||²
absolute_tolerance_squared_array: Device array containing the squared absolute tolerance
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b)).
- warp.optim.linear.cr(
 - A,
 - b,
 - x,
 - tol=None,
 - atol=None,
 - maxiter=0,
 - M=None,
 - callback=None,
 - check_every=10,
 - use_cuda_graph=True,
 Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Residual algorithm.
- Parameters:
 A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.
M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that M A is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance. If check_every is 0, the callback should be a Warp kernel.
check_every – number of iterations between calls to callback and residual checks against the tolerance, which may terminate the algorithm. Setting check_every to 0 disables host-side residual checks, making the solver fully CUDA-graph capturable. If conditional CUDA graphs are supported, convergence checks are performed device-side; otherwise, the solver will always run to the maximum number of iterations.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
- Returns:
 - If check_every > 0: Tuple (final_iteration, residual_norm, absolute_tolerance)
  final_iteration: The number of iterations performed before convergence or reaching maxiter
residual_norm: The final residual norm ||b - Ax||
absolute_tolerance: The absolute tolerance used for convergence checking
- If check_every is 0: Tuple (final_iteration_array, residual_norm_squared_array, absolute_tolerance_squared_array)
 final_iteration_array: Device array containing the number of iterations performed
residual_norm_squared_array: Device array containing the squared residual norm ||b - Ax||²
absolute_tolerance_squared_array: Device array containing the squared absolute tolerance
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b)).
- warp.optim.linear.gmres(
 - A,
 - b,
 - x,
 - tol=None,
 - atol=None,
 - restart=31,
 - maxiter=0,
 - M=None,
 - callback=None,
 - check_every=31,
 - use_cuda_graph=True,
 - is_left_preconditioner=False,
 Computes an approximate solution to a linear system using the restarted Generalized Minimum Residual method (GMRES[k]).
- Parameters:
 A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
restart – The restart parameter, i.e., the k in GMRES[k]. In general, increasing this parameter reduces the number of iterations but increases memory consumption.
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs restart iterations at a time, and as a result may exceed the specified maximum number of iterations by restart-1.
M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that M A (resp. A M) is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance. If check_every is 0, the callback should be a Warp kernel.
check_every – number of iterations between calls to callback and residual checks against the tolerance, which may terminate the algorithm. Setting check_every to 0 disables host-side residual checks, making the solver fully CUDA-graph capturable. If conditional CUDA graphs are supported, convergence checks are performed device-side; otherwise, the solver will always run to the maximum number of iterations.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
is_left_preconditioner – whether M should be used as a left or right preconditioner.
- Returns:
 - If check_every > 0: Tuple (final_iteration, residual_norm, absolute_tolerance)
  final_iteration: The number of iterations performed before convergence or reaching maxiter
residual_norm: The final residual norm ||b - Ax||
absolute_tolerance: The absolute tolerance used for convergence checking
- If check_every is 0: Tuple (final_iteration_array, residual_norm_squared_array, absolute_tolerance_squared_array)
 final_iteration_array: Device array containing the number of iterations performed
residual_norm_squared_array: Device array containing the squared residual norm ||b - Ax||²
absolute_tolerance_squared_array: Device array containing the squared absolute tolerance
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b)).
- warp.optim.linear.preconditioner(A, ptype='diag')[source]#
 Constructs and returns a preconditioner for an input matrix.
- Parameters:
 A (array | BsrMatrix | LinearOperator) – The matrix for which to build the preconditioner
ptype (str) –
The type of preconditioner. Currently the following values are supported:
- "diag": Diagonal (a.k.a. Jacobi) preconditioner
- "diag_abs": Similar to Jacobi, but using the absolute value of diagonal coefficients
- "id": Identity (null) preconditioner
- Return type: