warp.sparse#
Warp includes a sparse linear algebra module warp.sparse
that implements some common sparse matrix operations that arise in simulation.
Sparse Matrices#
Currently, warp.sparse supports Block Sparse Row (BSR) matrices. The BSR format can also represent Compressed Sparse Row (CSR) matrices as a special case, using a 1x1 block size.
Overloaded Python mathematical operators are supported for sparse matrix addition (+), subtraction (-), multiplication by a scalar (*) and matrix-matrix or matrix-vector multiplication (@), including in-place variants where possible.
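For illustration, a minimal sketch of these operators, using bsr_diag() (documented below) to build two small block-diagonal matrices; the sizes and block values are arbitrary:
import warp as wp
import warp.sparse as sparse

wp.init()

n = 4  # number of 2x2 blocks along the diagonal (arbitrary)

A = sparse.bsr_diag(wp.mat22(2.0, 0.0, 0.0, 2.0), rows_of_blocks=n)
B = sparse.bsr_diag(wp.mat22(1.0, 0.0, 0.0, 1.0), rows_of_blocks=n)

C = A + B        # sparse matrix addition
D = 2.0 * A - B  # scaling and subtraction
E = A @ B        # sparse matrix-matrix product

x = wp.full(n, wp.vec2(1.0, 1.0))  # right-hand-side vector of 2d blocks
y = A @ x                          # sparse matrix-vector product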
- class warp.sparse.BsrMatrix[source]#
Untyped base class for BSR and CSR matrices.
Should not be constructed directly but through functions such as bsr_zeros().
- nnz#
Upper bound for the number of non-zero blocks, used for dimensioning launches; the exact number is at offsets[nrow-1]. See also nnz_sync().
- Type:
int
- offsets#
Array of size at least 1 + nrows such that the start and end indices of the blocks of row r are offsets[r] and offsets[r+1], respectively.
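As a host-side illustration, a small sketch of walking the blocks of one row using offsets together with the matrix's columns and values block arrays (assumed to be available on BsrMatrix alongside offsets); the matrix bsr is assumed to have been built elsewhere:
# Sketch: list the non-zero blocks of row r of an existing BsrMatrix `bsr`.
offsets = bsr.offsets.numpy()
columns = bsr.columns.numpy()  # column index of each block (assumed attribute)
values = bsr.values.numpy()    # block values (assumed attribute)

r = 0
for b in range(offsets[r], offsets[r + 1]):
    print(f"block ({r}, {columns[b]}):\n{values[b]}")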
- property scalar_type: Scalar[source]#
Scalar type for individual block coefficients. For CSR matrices, this is the same as the block type
- property block_size: int[source]#
Size of the individual blocks, i.e. number of rows per block times number of columns per block
- property shape: Tuple[int, int][source]#
Shape of the matrix, i.e. number of rows/columns of blocks times number of rows/columns per block
- property device: Device[source]#
Device on which offsets, columns and values are allocated – assumed to be the same for all three arrays
- nnz_sync()[source]#
Ensures that any ongoing transfer of the exact nnz number from the device offsets array to the host has completed, and updates the nnz upper bound.
See also copy_nnz_async().
- copy_nnz_async(known_nnz=None)[source]#
Starts the asynchronous transfer of the exact nnz from the device offsets array to the host, and records an event for completion. Needs to be called whenever the offsets array has been modified from outside warp.sparse.
See also nnz_sync().
- Parameters:
known_nnz (int | None)
- warp.sparse.bsr_matrix_t(dtype)[source]#
- Parameters:
dtype (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar])
- warp.sparse.bsr_zeros(rows_of_blocks, cols_of_blocks, block_type, device=None)[source]#
Constructs and returns an empty BSR or CSR matrix with the given shape
- Parameters:
rows_of_blocks (int) – Number of rows of blocks
cols_of_blocks (int) – Number of columns of blocks
block_type (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]) – Type of individual blocks. For CSR matrices, this should be a scalar type; for BSR matrices, this should be a matrix type (e.g. from warp.mat())
device (Device | str | None) – Device on which to allocate the matrix arrays
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
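For example, a minimal sketch allocating an empty BSR matrix with 3x3 blocks and an empty CSR-style matrix with scalar (1x1) blocks; the dimensions are arbitrary:
import warp as wp
import warp.sparse as sparse

wp.init()

bsr = sparse.bsr_zeros(128, 128, block_type=wp.mat33)    # BSR: 3x3 float blocks
csr = sparse.bsr_zeros(128, 256, block_type=wp.float32)  # CSR-style: scalar blocks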
- warp.sparse.bsr_set_zero(bsr, rows_of_blocks=None, cols_of_blocks=None)[source]#
Sets a BSR matrix to zero, possibly changing its size.
- Parameters:
bsr – The BSR or CSR matrix to set to zero
rows_of_blocks (int | None) – If not None, the new number of rows of blocks
cols_of_blocks (int | None) – If not None, the new number of columns of blocks
- warp.sparse.bsr_set_from_triplets(dest, rows, columns, values, prune_numerical_zeros=True)[source]#
Fills a BSR matrix with values defined by coordinate-oriented (COO) triplets, discarding existing blocks.
The first dimension of the three input arrays must match and indicates the number of COO triplets.
- Parameters:
dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Sparse matrix to populate
rows (Array[int]) – Row index for each non-zero
columns (Array[int]) – Column index for each non-zero
values (Array[Scalar | _MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Block values for each non-zero. Must be either a one-dimensional array with data type identical to the dest matrix’s block type, or a 3d array with data type equal to the dest matrix’s scalar type.
prune_numerical_zeros (bool) – If True, zero-valued blocks will be ignored
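A minimal sketch of filling a CSR-style matrix from COO triplets; the indices and values are arbitrary:
import warp as wp
import warp.sparse as sparse

wp.init()

rows = wp.array([0, 0, 1, 2], dtype=wp.int32)
cols = wp.array([0, 2, 1, 2], dtype=wp.int32)
vals = wp.array([1.0, 2.0, 3.0, 4.0], dtype=wp.float32)  # dtype matches the block type

A = sparse.bsr_zeros(3, 3, block_type=wp.float32)
sparse.bsr_set_from_triplets(A, rows, cols, vals)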
- warp.sparse.bsr_assign(dest, src, structure_only=False)[source]#
Copies the content of the src BSR matrix to dest.
- Parameters:
src (BsrMatrix[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]] | _BsrExpression[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]]) – Matrix to be copied
dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Destination matrix. May have a different block shape or scalar type than src, in which case the required casting will be performed.
structure_only (bool) – If True, only the non-zero indices are copied, and uninitialized value storage is allocated to accommodate at least src.nnz blocks. If structure_only is False, values are also copied with implicit casting if the two matrices use distinct scalar types.
- warp.sparse.bsr_copy(A, scalar_type=None, block_shape=None, structure_only=False)[source]#
Returns a copy of matrix A, possibly changing its scalar type.
- Parameters:
A (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Matrix to be copied
scalar_type (Scalar | None) – If provided, the returned matrix will use this scalar type instead of the one from A.
block_shape (Tuple[int, int] | None) – If provided, the returned matrix will use blocks of this shape instead of the one from A. Both dimensions of block_shape must be either a multiple or an exact divisor of the ones from A.
structure_only (bool) – If True, only the non-zero indices are copied, and uninitialized value storage is allocated to accommodate at least A.nnz blocks. If structure_only is False, values are also copied with implicit casting if the two matrices use distinct scalar types.
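For instance, a short sketch of re-blocking and re-typing a matrix with bsr_copy(); the source matrix here is a small block-diagonal example:
import warp as wp
import warp.sparse as sparse

wp.init()

A = sparse.bsr_diag(wp.mat22(2.0, 0.0, 0.0, 2.0), rows_of_blocks=8)
A_csr = sparse.bsr_copy(A, block_shape=(1, 1))      # same matrix, 1x1 (CSR-style) blocks
A_f64 = sparse.bsr_copy(A, scalar_type=wp.float64)  # same structure, float64 coefficients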
- warp.sparse.bsr_transposed(A)[source]#
Returns a copy of the transposed matrix A
- Parameters:
A (BsrMatrix[BlockType] | _BsrExpression[BlockType])
- warp.sparse.bsr_get_diag(A, out=None)[source]#
Returns the array of blocks that constitute the diagonal of a sparse matrix.
- Parameters:
A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – the sparse matrix from which to extract the diagonal
out (Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – if provided, the array into which to store the diagonal blocks
- Return type:
Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_set_diag(A, diag, rows_of_blocks=None, cols_of_blocks=None)[source]#
Sets A as a block-diagonal matrix
- Parameters:
A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – the sparse matrix to modify
diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Either a warp array of type A.values.dtype, in which case each element will define one block of the diagonal, or a constant value of type A.values.dtype, in which case it will get assigned to all diagonal blocks.
rows_of_blocks (int | None) – If not None, the new number of rows of blocks
cols_of_blocks (int | None) – If not None, the new number of columns of blocks
- The shape of the matrix will be defined by one of the following, in that order:
rows_of_blocks and cols_of_blocks, if provided. If only one is given, the second is assumed equal.
the first dimension of diag, if diag is an array
the current dimensions of A otherwise
- warp.sparse.bsr_diag(diag, rows_of_blocks=None, cols_of_blocks=None)[source]#
Creates and returns a block-diagonal BSR matrix from a given block value or array of block values.
- Parameters:
diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Either a warp array of block values, in which case each element will define one block of the diagonal, or a constant block value, in which case it will get assigned to all diagonal blocks.
rows_of_blocks (int | None) – If not None, the new number of rows of blocks
cols_of_blocks (int | None) – If not None, the new number of columns of blocks
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- The shape of the matrix will be defined by one of the following, in that order:
rows_of_blocks and cols_of_blocks, if provided. If only one is given, the second is assumed equal.
the first dimension of diag, if diag is an array
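A short sketch of both usages, with an array of blocks and with a constant block value, plus the identity helper documented next; the sizes are arbitrary:
import numpy as np
import warp as wp
import warp.sparse as sparse

wp.init()

# one 2x2 block per diagonal entry, here 8 identity blocks built via NumPy
blocks = wp.array(np.tile(np.eye(2, dtype=np.float32), (8, 1, 1)), dtype=wp.mat22)
D = sparse.bsr_diag(blocks)  # 8x8 blocks, i.e. a 16x16 matrix

# constant block value assigned to every diagonal block
S = sparse.bsr_diag(wp.mat22(2.0, 0.0, 0.0, 2.0), rows_of_blocks=8)

# square identity matrix with the same block layout
I = sparse.bsr_identity(8, block_type=wp.mat22)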
- warp.sparse.bsr_identity(rows_of_blocks, block_type, device=None)[source]#
Creates and returns a square identity matrix.
- Parameters:
rows_of_blocks (int) – Number of rows and columns of blocks in the created matrix
block_type (_MatrixBlockType[Rows, Rows, Scalar] | _ScalarBlockType[Scalar]) – Type of individual blocks. For CSR matrices, this should be a scalar type; for BSR matrices, this should be a square matrix type (e.g. from warp.mat())
device (Device | str | None) – Device on which to allocate the matrix arrays
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Rows, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_scale(x, alpha)[source]#
Performs the operation
x := alpha * x
on BSR matrix x and returns x
- class warp.sparse.bsr_axpy_work_arrays[source]#
Opaque structure for persisting
bsr_axpy()
temporary work buffers across calls
- warp.sparse.bsr_axpy(x, y=None, alpha=1.0, beta=1.0, work_arrays=None)[source]#
Performs the sparse matrix addition
y := alpha * x + beta * y
on BSR matrices x and y, and returns y.
The x and y matrices are allowed to alias.
- Parameters:
x (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Read-only right-hand-side.
y (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable left-hand-side. If y is not provided, it will be allocated and treated as zero.
alpha (Scalar) – Uniform scaling factor for x
beta (Scalar) – Uniform scaling factor for y
work_arrays (bsr_axpy_work_arrays | None) – In most cases this function will require the use of temporary storage; this storage can be reused across calls by passing an instance of
bsr_axpy_work_arrays
in work_arrays.
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
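A short sketch of the in-place sum, reusing the temporary buffers across calls (assuming bsr_axpy_work_arrays is default-constructible); the operator form shown last allocates a new matrix instead:
import warp as wp
import warp.sparse as sparse

wp.init()

x = sparse.bsr_identity(8, block_type=wp.mat22)
y = sparse.bsr_diag(wp.mat22(2.0, 0.0, 0.0, 2.0), rows_of_blocks=8)

work = sparse.bsr_axpy_work_arrays()
y = sparse.bsr_axpy(x, y, alpha=2.0, beta=1.0, work_arrays=work)  # y := 2*x + y

z = 2.0 * x + y  # operator form, allocating a new matrix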
- class warp.sparse.bsr_mm_work_arrays[source]#
Opaque structure for persisting
bsr_mm()
temporary work buffers across calls
- warp.sparse.bsr_mm(x, y, z=None, alpha=1.0, beta=0.0, work_arrays=None, reuse_topology=False)[source]#
Performs the sparse matrix-matrix multiplication
z := alpha * x * y + beta * z
on BSR matrices x, y and z, and returns z.
The x, y and z matrices are allowed to alias. If the matrix z is not provided as input, it will be allocated and treated as zero.
- Parameters:
x (BsrMatrix[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]]) – Read-only left factor of the matrix-matrix product.
y (BsrMatrix[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only right factor of the matrix-matrix product.
z (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable left-hand-side. If z is not provided, it will be allocated and treated as zero.
alpha (Scalar) – Uniform scaling factor for the x * y product
beta (Scalar) – Uniform scaling factor for z
work_arrays (bsr_mm_work_arrays | None) – In most cases this function will require the use of temporary storage; this storage can be reused across calls by passing an instance of bsr_mm_work_arrays in work_arrays.
reuse_topology (bool) – If True, reuse the product topology information stored in work_arrays rather than recomputing it from scratch. The matrices x, y and z must be structurally similar to the previous call in which work_arrays were populated. This is necessary for bsr_mm to be captured in a CUDA graph.
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
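A hedged sketch of reusing work arrays across products with the same sparsity topology (which also makes bsr_mm graph-capturable), assuming bsr_mm_work_arrays is default-constructible; the matrices are small block-diagonal examples:
import warp as wp
import warp.sparse as sparse

wp.init()

A = sparse.bsr_diag(wp.mat22(2.0, 0.0, 0.0, 2.0), rows_of_blocks=4)
B = sparse.bsr_identity(4, block_type=wp.mat22)

work = sparse.bsr_mm_work_arrays()
Z = sparse.bsr_mm(A, B, work_arrays=work)  # allocates Z and computes Z = A * B

# subsequent products with structurally similar A, B and Z can skip
# recomputing the topology stored in `work`
Z = sparse.bsr_mm(A, B, z=Z, work_arrays=work, reuse_topology=True)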
- warp.sparse.bsr_mv(A, x, y=None, alpha=1.0, beta=0.0, transpose=False, work_buffer=None)[source]#
Performs the sparse matrix-vector product
y := alpha * A * x + beta * y
and returns y.
The x and y vectors are allowed to alias.
- Parameters:
A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only, left matrix factor of the matrix-vector product.
x (Array[Vector[Cols, Scalar] | Scalar]) – Read-only, right vector factor of the matrix-vector product.
y (Array[Vector[Rows, Scalar] | Scalar] | None) – Mutable left-hand-side. If y is not provided, it will be allocated and treated as zero.
alpha (Scalar) – Uniform scaling factor for x. If zero, x will not be read and may be left uninitialized.
beta (Scalar) – Uniform scaling factor for y. If zero, y will not be read and may be left uninitialized.
transpose (bool) – If True, use the transpose of the matrix A. In this case the result is non-deterministic.
work_buffer (Array[Vector[Rows, Scalar] | Scalar] | None) – Temporary storage is required if and only if x and y are the same vector. If provided, the work_buffer array will be used for this purpose; otherwise a temporary allocation will be performed.
- Return type:
Array[Vector[Rows, Scalar] | Scalar]
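A brief sketch of a blocked matrix-vector product, first overwriting y and then accumulating into it; the matrix is a small block-diagonal example:
import warp as wp
import warp.sparse as sparse

wp.init()

n = 16
A = sparse.bsr_diag(wp.mat22(2.0, 0.0, 0.0, 2.0), rows_of_blocks=n)
x = wp.full(n, wp.vec2(1.0, 1.0))
y = wp.zeros(n, dtype=wp.vec2)

y = sparse.bsr_mv(A, x, y, alpha=1.0, beta=0.0)  # y := A x
y = sparse.bsr_mv(A, x, y, alpha=1.0, beta=1.0)  # y := A x + y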
Iterative Linear Solvers#
Warp provides a few common iterative linear solvers (cg(), cr(), bicgstab(), gmres()) with optional preconditioning.
Note
While primarily intended to work with sparse matrices, those solvers also accept dense linear operators provided as 2D Warp arrays.
It is also possible to provide custom operators; see LinearOperator.
- class warp.optim.linear.LinearOperator(shape, dtype, device, matvec)[source]#
Linear operator to be used as left-hand-side of linear iterative solvers.
- Parameters:
shape (Tuple[int, int]) – Number of rows and columns of the operator
dtype (type) – Data type of the operator coefficients
device (Device) – Device on which computations involving the operator should be performed
matvec (Callable) – Matrix-vector multiplication routine following the signature below
The matrix-vector multiplication routine should have the following signature:
def matvec(x: wp.array, y: wp.array, z: wp.array, alpha: Scalar, beta: Scalar):
    '''Performs the operation z = alpha * (A @ x) + beta * y, where A is this operator'''
    ...
For performance reasons, by default the iterative linear solvers in this module will try to capture the calls for one or more iterations in CUDA graphs. If the matvec routine of a custom LinearOperator cannot be graph-captured, the use_cuda_graph=False parameter should be passed to the solver function.
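As an illustration, a hedged sketch of a custom operator wrapping a diagonal matrix stored as a 1D array; the kernel and helper names (diag_matvec, matvec) are illustrative, and matvec follows the signature above:
import warp as wp
from warp.optim.linear import LinearOperator, cg

wp.init()

n = 64
d = wp.full(n, 2.0, dtype=wp.float32)  # diagonal entries of the illustrative operator


@wp.kernel
def diag_matvec(
    d: wp.array(dtype=wp.float32),
    x: wp.array(dtype=wp.float32),
    y: wp.array(dtype=wp.float32),
    z: wp.array(dtype=wp.float32),
    alpha: wp.float32,
    beta: wp.float32,
):
    i = wp.tid()
    if beta == 0.0:  # y may be left uninitialized when beta is zero
        z[i] = alpha * d[i] * x[i]
    else:
        z[i] = alpha * d[i] * x[i] + beta * y[i]


def matvec(x, y, z, alpha, beta):
    # z = alpha * (A @ x) + beta * y for the diagonal operator A = diag(d)
    wp.launch(diag_matvec, dim=n, inputs=[d, x, y, z, alpha, beta])


A = LinearOperator(shape=(n, n), dtype=wp.float32, device=d.device, matvec=matvec)

b = wp.full(n, 1.0, dtype=wp.float32)
x = wp.zeros_like(b)
niter, residual, atol = cg(A, b, x, tol=1.0e-6)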
- warp.optim.linear.aslinearoperator(A)[source]#
Casts the dense or sparse matrix A as a LinearOperator.
A must be of one of the following types:
two-dimensional warp.array; then A is assumed to be a dense matrix
one-dimensional warp.array; then A is assumed to be a diagonal matrix
warp.sparse.BsrMatrix or LinearOperator; no casting necessary
- Parameters:
A (array | BsrMatrix | LinearOperator)
- Return type:
LinearOperator
- warp.optim.linear.preconditioner(A, ptype='diag')[source]#
Constructs and returns a preconditioner for an input matrix.
- Parameters:
A (array | BsrMatrix | LinearOperator) – The matrix for which to build the preconditioner
ptype (str) –
The type of preconditioner. Currently the following values are supported:
"diag"
: Diagonal (a.k.a. Jacobi) preconditioner"diag_abs"
: Similar to Jacobi, but using the absolute value of diagonal coefficients"id"
: Identity (null) preconditioner
- Return type:
- warp.optim.linear.cg(A, b, x, tol=None, atol=None, maxiter=0, M=None, callback=None, check_every=10, use_cuda_graph=True)[source]#
Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Gradient algorithm.
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.
M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that M A is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance
check_every – interval, in iterations, at which to call callback, check the residual against the tolerance, and possibly terminate the algorithm.
use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
- Return type:
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).
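For reference, a minimal end-to-end sketch solving a diagonal (hence trivially SPD) system with a Jacobi preconditioner; the matrix and right-hand side are arbitrary:
import warp as wp
import warp.sparse as sparse
from warp.optim.linear import cg, preconditioner

wp.init()

n = 256
diag = wp.full(n, 2.0, dtype=wp.float32)
A = sparse.bsr_diag(diag)              # CSR-style diagonal matrix
b = wp.full(n, 1.0, dtype=wp.float32)
x = wp.zeros_like(b)

M = preconditioner(A, "diag")          # Jacobi preconditioner
niter, residual, atol = cg(A, b, x, tol=1.0e-8, maxiter=100, M=M)
print(f"converged in {niter} iterations, residual {residual}")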
- warp.optim.linear.cr(A, b, x, tol=None, atol=None, maxiter=0, M=None, callback=None, check_every=10, use_cuda_graph=True)[source]#
Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Residual algorithm.
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.
M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that M A is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance
check_every – interval, in iterations, at which to call callback, check the residual against the tolerance, and possibly terminate the algorithm.
use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
- Return type:
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).
- warp.optim.linear.bicgstab(A, b, x, tol=None, atol=None, maxiter=0, M=None, callback=None, check_every=10, use_cuda_graph=True, is_left_preconditioner=False)[source]#
Computes an approximate solution to a linear system using the Biconjugate Gradient Stabilized method (BiCGSTAB).
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size.
M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that M A (resp. A M) is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance
check_every – interval, in iterations, at which to call callback, check the residual against the tolerance, and possibly terminate the algorithm.
use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
is_left_preconditioner – whether M should be used as a left or right preconditioner.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).
- warp.optim.linear.gmres(A, b, x, tol=None, atol=None, restart=31, maxiter=0, M=None, callback=None, check_every=31, use_cuda_graph=True, is_left_preconditioner=False)[source]#
Computes an approximate solution to a linear system using the restarted Generalized Minimum Residual method (GMRES[k]).
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
restart – The restart parameter, i.e., the k in GMRES[k]. In general, increasing this parameter reduces the number of iterations but increases memory consumption.
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs restart iterations at a time, and as a result may exceed the specified maximum number of iterations by restart-1.
M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that M A (resp. A M) is close to identity.
callback (Callable | None) – function to be called every check_every iterations with the current iteration number, residual and tolerance
check_every – interval, in iterations, at which to call callback, check the residual against the tolerance, and possibly terminate the algorithm.
use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
is_left_preconditioner – whether M should be used as a left or right preconditioner.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).