warp.sparse#
Warp includes a sparse linear algebra module warp.sparse
that implements some common sparse matrix operations that arise in simulation.
Sparse Matrices#
Currently warp.sparse supports Block Sparse Row (BSR) matrices, the BSR format can also be used to represent Compressed Sparse Row (CSR) matrices as a special case with a 1x1 block size.
Overloaded Python mathematical operators are supported for sparse matrix addition (+), subtraction (-), multiplication by a scalar (*) and matrix-matrix or matrix-vector multiplication (@), including in-place variants where possible.
- class warp.sparse.BsrMatrix[source]#
Untyped base class for BSR and CSR matrices.
Should not be constructed directly but through functions such as
bsr_zeros()
.- nnz#
Upper bound for the number of non-zero blocks, used for dimensioning launches. The exact number is at
offsets[nrow-1]
. See alsonnz_sync()
.- Type:
- offsets#
Array of size at least
1 + nrow
such that the start and end indices of the blocks of rowr
areoffsets[r]
andoffsets[r+1]
, respectively.
- property scalar_type: Scalar[source]#
Scalar type for individual block coefficients. For CSR matrices, this is the same as the block type.
- property block_size: int[source]#
Size of the individual blocks, i.e. number of rows per block times number of columns per block.
- property shape: Tuple[int, int][source]#
Shape of the matrix, i.e. number of rows/columns of blocks times number of rows/columns per block.
- property device: Device[source]#
Device on which
offsets
,columns
, andvalues
are allocated – assumed to be the same for all three arrays.
- uncompress_rows(out=None)[source]#
Compute the row index for each non-zero block from the compressed row offsets.
- nnz_sync()[source]#
Ensure that any ongoing transfer of the exact nnz number from the device offsets array to the host has completed and update the nnz upper bound.
See also
copy_nnz_async()
.
- copy_nnz_async(known_nnz=None)[source]#
Start the asynchronous transfer of the exact nnz from the device offsets array to host and records an event for completion.
Needs to be called whenever the offsets array has been modified from outside
warp.sparse
.See also
nnz_sync()
.- Parameters:
known_nnz (int)
- warp.sparse.bsr_matrix_t(dtype)[source]#
- Parameters:
dtype (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar])
- warp.sparse.bsr_zeros(rows_of_blocks, cols_of_blocks, block_type, device=None)[source]#
Construct and return an empty BSR or CSR matrix with the given shape.
- Parameters:
bsr – The BSR or CSR matrix to set to zero.
rows_of_blocks (int) – Number of rows of blocks.
cols_of_blocks (int) – Number of columns of blocks.
block_type (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]) – Type of individual blocks. For CSR matrices, this should be a scalar type. For BSR matrices, this should be a matrix type (e.g. from
warp.mat()
).device (Device | str | None) – Device on which to allocate the matrix arrays.
- Return type:
- warp.sparse.bsr_set_zero(bsr, rows_of_blocks=None, cols_of_blocks=None)[source]#
Set a BSR matrix to zero, possibly changing its size.
- warp.sparse.bsr_set_from_triplets(
- dest,
- rows,
- columns,
- values=None,
- prune_numerical_zeros=True,
- masked=False,
Fill a BSR matrix with values defined by coordinate-oriented (COO) triplets, discarding existing blocks.
The first dimension of the three input arrays must match and indicates the number of COO triplets.
- Parameters:
dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Sparse matrix to populate.
rows (Array[int]) – Row index for each non-zero.
columns (Array[int]) – Columns index for each non-zero.
values (Array[Scalar | _MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Block values for each non-zero. Must be either a one-dimensional array with data type identical to the
dest
matrix’s block type, or a 3d array with data type equal to thedest
matrix’s scalar type. IfNone
, the values array of the resulting matrix will be allocated but uninitialized.prune_numerical_zeros (bool) – If
True
, will ignore the zero-valued blocks.masked (bool) – If
True
, ignore blocks that are not existing non-zeros ofdest
.
- warp.sparse.bsr_from_triplets(
- rows_of_blocks,
- cols_of_blocks,
- rows,
- columns,
- values,
- prune_numerical_zeros=True,
Constructs a BSR matrix with values defined by coordinate-oriented (COO) triplets.
The first dimension of the three input arrays must match and indicates the number of COO triplets.
- Parameters:
rows_of_blocks (int) – Number of rows of blocks.
cols_of_blocks (int) – Number of columns of blocks.
rows (Array[int]) – Row index for each non-zero.
columns (Array[int]) – Columns index for each non-zero.
values (Array[Scalar | _MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Block values for each non-zero. Must be either a one-dimensional array with data type identical to the
dest
matrix’s block type, or a 3d array with data type equal to thedest
matrix’s scalar type.prune_numerical_zeros (bool) – If
True
, will ignore the zero-valued blocks.
- warp.sparse.bsr_assign(dest, src, structure_only=False, masked=False)[source]#
Copy the content of the
src
BSR matrix todest
.- Parameters:
src (BsrMatrix[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]] | _BsrExpression[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]]) – Matrix to be copied.
dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Destination matrix. May have a different block shape or scalar type than
src
, in which case the required casting will be performed.structure_only (bool) – If
True
, only the non-zeros indices are copied, and uninitialized value storage is allocated to accommodate at leastsrc.nnz
blocks. Ifstructure_only
isFalse
, values are also copied with implicit casting if the two matrices use distinct scalar types.masked (bool) – If
True
, prevent the assignment operation from adding new non-zeros blocks todest
.
- warp.sparse.bsr_copy(A, scalar_type=None, block_shape=None, structure_only=False)[source]#
Return a copy of matrix
A
, possibly changing its scalar type.- Parameters:
A (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Matrix to be copied.
scalar_type (Scalar | None) – If provided, the returned matrix will use this scalar type instead of the one from
A
.block_shape (Tuple[int, int] | None) – If provided, the returned matrix will use blocks of this shape instead of the one from
A
. Both dimensions ofblock_shape
must be either a multiple or an exact divider of the ones fromA
.structure_only (bool) – If
True
, only the non-zeros indices are copied, and uninitialized value storage is allocated to accommodate at leastsrc.nnz
blocks. Ifstructure_only
isFalse
, values are also copied with implicit casting if the two matrices use distinct scalar types.
- warp.sparse.bsr_get_diag(A, out=None)[source]#
Return the array of blocks that constitute the diagonal of a sparse matrix.
- Parameters:
A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – The sparse matrix from which to extract the diagonal.
out (Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – If provided, the array into which to store the diagonal blocks.
- Return type:
Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_set_diag(A, diag, rows_of_blocks=None, cols_of_blocks=None)[source]#
Set
A
as a block-diagonal matrix.- Parameters:
A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – The sparse matrix to modify.
diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) –
Specifies the values for diagonal blocks. Can be one of:
A Warp array of type
A.values.dtype
: Each element defines one block of the diagonalA constant value of type
A.values.dtype
: This value is assigned to all diagonal blocksNone
: Diagonal block values are left uninitialized
rows_of_blocks (int | None) – If not
None
, the new number of rows of blocks.cols_of_blocks (int | None) – If not
None
, the new number of columns of blocks.
- Return type:
None
The shape of the matrix will be defined one of the following, in this order:
rows_of_blocks
andcols_of_blocks
, if provided. If only one is given, the second is assumed equal.The first dimension of
diag
, ifdiag
is an arrayThe current dimensions of
A
otherwise
- warp.sparse.bsr_diag(
- diag=None,
- rows_of_blocks=None,
- cols_of_blocks=None,
- block_type=None,
- device=None,
Create and return a block-diagonal BSR matrix from an given block value or array of block values.
- Parameters:
diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) –
Specifies the values for diagonal blocks. Can be one of:
A Warp array of type
A.values.dtype
: Each element defines one block of the diagonalA constant value of type
A.values.dtype
: This value is assigned to all diagonal blocks
rows_of_blocks (int | None) – If not
None
, the new number of rows of blockscols_of_blocks (int | None) – If not
None
, the new number of columns of blocksblock_type (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | None) – If
diag
isNone
, block type of the matrix. Otherwise deduced fromdiag
device – If
diag
is not a Warp array, device on which to allocate the matrix. Otherwise deduced fromdiag
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
The shape of the matrix will be defined one of the following, in this order:
rows_of_blocks
andcols_of_blocks
, if provided. If only one is given, the second is assumed equal.The first dimension of
diag
ifdiag
is an array.
- warp.sparse.bsr_identity(rows_of_blocks, block_type, device=None)[source]#
Create and return a square identity matrix.
- Parameters:
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Rows, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_scale(x, alpha)[source]#
Perform the operation
x := alpha * x
on BSR matrixx
and returnx
.
- class warp.sparse.bsr_axpy_work_arrays[source]#
Opaque structure for persisting
bsr_axpy()
temporary work buffers across calls.
- warp.sparse.bsr_axpy(
- x,
- y=None,
- alpha=1.0,
- beta=1.0,
- masked=False,
- work_arrays=None,
Perform the sparse matrix addition
y := alpha * X + beta * y
on BSR matricesx
andy
and returny
.The
x
andy
matrices are allowed to alias.- Parameters:
x (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Read-only right-hand-side.
y (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable left-hand-side. If
y
is not provided, it will be allocated and treated as zero.alpha (Scalar) – Uniform scaling factor for
x
.beta (Scalar) – Uniform scaling factor for
y
.masked (bool) – If
True
, discard all blocks fromx
which are not existing non-zeros ofy
.work_arrays (bsr_axpy_work_arrays | None) – In most cases, this function will require the use of temporary storage. This storage can be reused across calls by passing an instance of
bsr_axpy_work_arrays
inwork_arrays
.
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- class warp.sparse.bsr_mm_work_arrays[source]#
Opaque structure for persisting
bsr_mm()
temporary work buffers across calls.
- warp.sparse.bsr_mm(
- x,
- y,
- z=None,
- alpha=1.0,
- beta=0.0,
- masked=False,
- work_arrays=None,
- reuse_topology=False,
Perform the sparse matrix-matrix multiplication
z := alpha * x @ y + beta * z
on BSR matricesx
,y
andz
, and returnz
.The
x
,y
andz
matrices are allowed to alias. If the matrixz
is not provided as input, it will be allocated and treated as zero.- Parameters:
x (BsrMatrix[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]]) – Read-only left factor of the matrix-matrix product.
y (BsrMatrix[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only right factor of the matrix-matrix product.
z (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable left-hand-side. If
z
is not provided, it will be allocated and treated as zero.alpha (Scalar) – Uniform scaling factor for the
x @ y
productbeta (Scalar) – Uniform scaling factor for
z
masked (bool) – If
True
, ignore all blocks fromx @ y
which are not existing non-zeros ofy
work_arrays (bsr_mm_work_arrays | None) – In most cases, this function will require the use of temporary storage. This storage can be reused across calls by passing an instance of
bsr_mm_work_arrays
inwork_arrays
.reuse_topology (bool) – If
True
, reuse the product topology information stored inwork_arrays
rather than recompute it from scratch. The matricesx
,y
andz
must be structurally similar to the previous call in whichwork_arrays
were populated. This is necessary forbsr_mm
to be captured in a CUDA graph.
- Return type:
BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
- warp.sparse.bsr_mv(
- A,
- x,
- y=None,
- alpha=1.0,
- beta=0.0,
- transpose=False,
- work_buffer=None,
Perform the sparse matrix-vector product
y := alpha * A * x + beta * y
and returny
.The
x
andy
vectors are allowed to alias.- Parameters:
A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only, left matrix factor of the matrix-vector product.
x (Array[Vector[Cols, Scalar] | Scalar]) – Read-only, right vector factor of the matrix-vector product.
y (Array[Vector[Rows, Scalar] | Scalar] | None) – Mutable left-hand-side. If
y
is not provided, it will be allocated and treated as zero.alpha (Scalar) – Uniform scaling factor for
x
. If zero,x
will not be read and may be left uninitialized.beta (Scalar) – Uniform scaling factor for
y
. If zero,y
will not be read and may be left uninitialized.transpose (bool) – If
True
, use the transpose of the matrixA
. In this case the result is non-deterministic.work_buffer (Array[Vector[Rows, Scalar] | Scalar] | None) – Temporary storage is required if and only if
x
andy
are the same vector. If provided, thework_buffer
array will be used for this purpose, otherwise a temporary allocation will be performed.
- Return type:
Array[Vector[Rows, Scalar] | Scalar]
Iterative Linear Solvers#
Warp provides a few common iterative linear solvers (cg()
, cr()
, bicgstab()
, gmres()
) with optional preconditioning.
Note
While primarily intended to work with sparse matrices, those solvers also accept dense linear operators provided as 2D Warp arrays.
It is also possible to provide custom operators, see LinearOperator
.
- class warp.optim.linear.LinearOperator(shape, dtype, device, matvec)[source]#
Linear operator to be used as left-hand-side of linear iterative solvers.
- Parameters:
The matrix-vector multiplication routine should have the following signature:
def matvec(x: wp.array, y: wp.array, z: wp.array, alpha: Scalar, beta: Scalar): '''Performs the operation z = alpha * x + beta * y''' ...
For performance reasons, by default the iterative linear solvers in this module will try to capture the calls for one or more iterations in CUDA graphs. If the matvec routine of a custom
LinearOperator
cannot be graph-captured, theuse_cuda_graph=False
parameter should be passed to the solver function.
- warp.optim.linear.aslinearoperator(A)[source]#
Casts the dense or sparse matrix A as a
LinearOperator
A must be of one of the following types:
two-dimensional warp.array; then A is assumed to be a dense matrix
one-dimensional warp.array; then A is assumed to be a diagonal matrix
warp.sparse.LinearOperator
; no casting necessary
- Parameters:
A (array | BsrMatrix | LinearOperator)
- Return type:
- warp.optim.linear.preconditioner(A, ptype='diag')[source]#
Constructs and returns a preconditioner for an input matrix.
- Parameters:
A (array | BsrMatrix | LinearOperator) – The matrix for which to build the preconditioner
ptype (str) –
The type of preconditioner. Currently the following values are supported:
"diag"
: Diagonal (a.k.a. Jacobi) preconditioner"diag_abs"
: Similar to Jacobi, but using the absolute value of diagonal coefficients"id"
: Identity (null) preconditioner
- Return type:
- warp.optim.linear.cg(
- A,
- b,
- x,
- tol=None,
- atol=None,
- maxiter=0,
- M=None,
- callback=None,
- check_every=10,
- use_cuda_graph=True,
Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Gradient algorithm.
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.
M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that
M A
is close to identity.callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance
check_every – number of iterations every which to call callback, check the residual against the tolerance and possibility terminate the algorithm.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
- Return type:
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b))
.
- warp.optim.linear.cr(
- A,
- b,
- x,
- tol=None,
- atol=None,
- maxiter=0,
- M=None,
- callback=None,
- check_every=10,
- use_cuda_graph=True,
Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Residual algorithm.
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.
M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that
M A
is close to identity.callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance
check_every – number of iterations every which to call callback, check the residual against the tolerance and possibility terminate the algorithm.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
- Return type:
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b))
.
- warp.optim.linear.bicgstab(
- A,
- b,
- x,
- tol=None,
- atol=None,
- maxiter=0,
- M=None,
- callback=None,
- check_every=10,
- use_cuda_graph=True,
- is_left_preconditioner=False,
Computes an approximate solution to a linear system using the Biconjugate Gradient Stabilized method (BiCGSTAB).
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size.
M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that
M A
(respA M
) is close to identity.callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance
check_every – number of iterations every which to call callback, check the residual against the tolerance and possibility terminate the algorithm.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
is_left_preconditioner – whether M should be used as a left- or right- preconditioner.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b))
.
- warp.optim.linear.gmres(
- A,
- b,
- x,
- tol=None,
- atol=None,
- restart=31,
- maxiter=0,
- M=None,
- callback=None,
- check_every=31,
- use_cuda_graph=True,
- is_left_preconditioner=False,
Computes an approximate solution to a linear system using the restarted Generalized Minimum Residual method (GMRES[k]).
- Parameters:
A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side
b (array) – the linear system’s right-hand-side
x (array) – initial guess and solution vector
tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm
atol (float | None) – absolute tolerance for the residual
restart – The restart parameter, i.e, the k in GMRES[k]. In general, increasing this parameter reduces the number of iterations but increases memory consumption.
maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always perform restart iterations at a time, and as a result may exceed the specified maximum number of iterations by
restart-1
.M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that
M A
(respA M
) is close to identity.callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance
check_every – number of iterations every which to call callback, check the residual against the tolerance and possibility terminate the algorithm.
use_cuda_graph – If true and when run on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.
is_left_preconditioner – whether M should be used as a left- or right- preconditioner.
- Returns:
Tuple (final iteration number, residual norm, absolute tolerance)
If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is
max(atol, tol * norm(b))
.