warp.sparse#

Warp includes a sparse linear algebra module warp.sparse that implements some common sparse matrix operations that arise in simulation.

Sparse Matrices#

Currently, warp.sparse supports Block Sparse Row (BSR) matrices. The BSR format can also represent Compressed Sparse Row (CSR) matrices as a special case with a 1x1 block size.

Overloaded Python mathematical operators are supported for sparse matrix addition (+), subtraction (-), multiplication by a scalar (*) and matrix-matrix or matrix-vector multiplication (@), including in-place variants where possible.
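As a minimal sketch of these operators (the 4x4 CSR matrices below are built with bsr_zeros() and bsr_set_from_triplets(), documented further down; the sizes and values are illustrative only):

import warp as wp
import warp.sparse as sparse

wp.init()

# Two 4x4 CSR matrices, i.e. BSR matrices with scalar (1x1) blocks
A = sparse.bsr_zeros(4, 4, block_type=wp.float32)
B = sparse.bsr_zeros(4, 4, block_type=wp.float32)

rows = wp.array([0, 1, 2, 3], dtype=int)
cols = wp.array([0, 1, 2, 3], dtype=int)
vals = wp.array([1.0, 2.0, 3.0, 4.0], dtype=wp.float32)
sparse.bsr_set_from_triplets(A, rows, cols, vals)
sparse.bsr_set_from_triplets(B, rows, cols, vals)

C = A + B      # sparse matrix addition
D = 2.0 * A    # scaling by a scalar
E = A @ B      # sparse matrix-matrix product

x = wp.array([1.0, 1.0, 1.0, 1.0], dtype=wp.float32)
y = A @ x      # sparse matrix-vector product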

class warp.sparse.BsrMatrix[source]#

Untyped base class for BSR and CSR matrices.

Should not be constructed directly but through functions such as bsr_zeros().

nrow#

Number of rows of blocks

Type:

int

ncol#

Number of columns of blocks

Type:

int

nnz#

Upper bound for the number of non-zero blocks, used for dimensioning launches; the exact count is offsets[nrow]. See also nnz_sync().

Type:

int

offsets#

Array of size at least 1 + nrows such that the start and end indices of the blocks of row r are offsets[r] and offsets[r+1], respectively.

Type:

Array[int]

columns#

Array of size at least equal to nnz containing block column indices

Type:

Array[int]

values#

Array of size at least equal to nnz containing block values

Type:

Array[BlockType]

property scalar_type: Scalar[source]#

Scalar type for individual block coefficients. For CSR matrices, this is the same as the block type

property block_shape: Tuple[int, int][source]#

Shape of the individual blocks

property block_size: int[source]#

Size of the individual blocks, i.e. number of rows per block times number of columns per block

property shape: Tuple[int, int][source]#

Shape of the matrix, i.e. number of rows/columns of blocks times number of rows/columns per block

property dtype: type[source]#

Data type for individual block values

property device: Device[source]#

Device on which offsets, columns and values are allocated – assumed to be the same for all three arrays

nnz_sync()[source]#

Ensures that any ongoing transfer of the exact nnz number from the device offsets array to the host has completed, and updates the nnz upper bound.

See also copy_nnz_async()

copy_nnz_async(known_nnz=None)[source]#

Starts the asynchronous transfer of the exact nnz from the device offsets array to host, and records an event for completion. Needs to be called whenever the offsets array has been modified from outside warp.sparse.

See also nnz_sync()

Parameters:

known_nnz (int | None)

transpose()[source]#

Returns a transposed copy of this matrix

warp.sparse.bsr_matrix_t(dtype)[source]#
Parameters:

dtype (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar])

warp.sparse.bsr_zeros(rows_of_blocks, cols_of_blocks, block_type, device=None)[source]#

Constructs and returns an empty BSR or CSR matrix with the given shape

Parameters:

  • rows_of_blocks (int) – Number of rows of blocks

  • cols_of_blocks (int) – Number of columns of blocks

  • block_type (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]) – Type of individual blocks. For CSR matrices, this should be a scalar type; for BSR matrices, this should be a matrix type (e.g. from warp.mat())

  • device (Device | str | None) – Device on which to allocate the matrix arrays

Return type:

BsrMatrix
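A sketch of allocating empty matrices (block and scalar variants; the sizes are arbitrary):

import warp as wp
import warp.sparse as sparse

# 128x128 blocks of 3x3 float32 coefficients, i.e. a 384x384 matrix
A = sparse.bsr_zeros(128, 128, block_type=wp.mat33)

# CSR variant: scalar (1x1) blocks, explicitly allocated on the CPU
S = sparse.bsr_zeros(128, 128, block_type=wp.float32, device="cpu")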

warp.sparse.bsr_set_zero(bsr, rows_of_blocks=None, cols_of_blocks=None)[source]#

Sets a BSR matrix to zero, possibly changing its size

Parameters:
  • bsr (BsrMatrix) – The BSR or CSR matrix to set to zero

  • rows_of_blocks (int | None) – If not None, the new number of rows of blocks

  • cols_of_blocks (int | None) – If not None, the new number of columns of blocks

warp.sparse.bsr_set_from_triplets(
dest,
rows,
columns,
values,
prune_numerical_zeros=True,
)[source]#

Fills a BSR matrix with values defined by coordinate-oriented (COO) triplets, discarding existing blocks.

The first dimension of the three input arrays must match and indicates the number of COO triplets.

Parameters:
  • dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Sparse matrix to populate

  • rows (Array[int]) – Row index for each non-zero

  • columns (Array[int]) – Column index for each non-zero

  • values (Array[Scalar | _MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Block values for each non-zero. Must be either a one-dimensional array with data type identical to the dest matrix’s block type, or a 3d array with data type equal to the dest matrix’s scalar type.

  • prune_numerical_zeros (bool) – If True, zero-valued blocks will be ignored
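A sketch of populating a block matrix from COO triplets, here using the 3d scalar-valued form for values (NumPy is used only to build the host data; the indices and values are illustrative):

import numpy as np
import warp as wp
import warp.sparse as sparse

bsr = sparse.bsr_zeros(4, 4, block_type=wp.mat22)

# Three 2x2 blocks at (row, col) = (0, 0), (1, 2) and (3, 3)
rows = wp.array([0, 1, 3], dtype=int)
cols = wp.array([0, 2, 3], dtype=int)

# 3d array of shape (num_triplets, 2, 2) with the matrix's scalar type
vals = wp.array(np.ones((3, 2, 2), dtype=np.float32), dtype=wp.float32)

sparse.bsr_set_from_triplets(bsr, rows, cols, vals)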

warp.sparse.bsr_assign(dest, src, structure_only=False)[source]#

Copies the content of the src BSR matrix to dest.

Parameters:
  • src (BsrMatrix[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]] | _BsrExpression[_MatrixBlockType[Any, Any, Any] | _ScalarBlockType[Any]]) – Matrix to be copied

  • dest (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Destination matrix. May have a different block shape or scalar type than src, in which case the required casting will be performed.

  • structure_only (bool) – If True, only the non-zero indices are copied, and uninitialized value storage is allocated to accommodate at least src.nnz blocks. If structure_only is False, values are also copied with implicit casting if the two matrices use distinct scalar types.

warp.sparse.bsr_copy(A, scalar_type=None, block_shape=None, structure_only=False)[source]#

Returns a copy of matrix A, possibly changing its scalar type.

Parameters:
  • A (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Matrix to be copied

  • scalar_type (Scalar | None) – If provided, the returned matrix will use this scalar type instead of the one from A.

  • block_shape (Tuple[int, int] | None) – If provided, the returned matrix will use blocks of this shape instead of the one from A. Both dimensions of block_shape must be either a multiple or an exact divisor of the ones from A.

  • structure_only (bool) – If True, only the non-zero indices are copied, and uninitialized value storage is allocated to accommodate at least A.nnz blocks. If structure_only is False, values are also copied with implicit casting if the two matrices use distinct scalar types.

warp.sparse.bsr_set_transpose(dest, src)[source]#

Assigns the transpose of matrix src to matrix dest

Parameters:
  • dest (BsrMatrix[_MatrixBlockType[Cols, Rows, Scalar] | _ScalarBlockType[Scalar]])

  • src (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]])

warp.sparse.bsr_transposed(A)[source]#

Returns a transposed copy of matrix A

Parameters:

A (BsrMatrix[BlockType] | _BsrExpression[BlockType])

warp.sparse.bsr_get_diag(A, out=None)[source]#

Returns the array of blocks that constitute the diagonal of a sparse matrix.

Parameters:
  • A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – the sparse matrix from which to extract the diagonal

  • out (Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – if provided, the array into which to store the diagonal blocks

Return type:

Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]

warp.sparse.bsr_set_diag(A, diag, rows_of_blocks=None, cols_of_blocks=None)[source]#

Sets A as a block-diagonal matrix

Parameters:
  • A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – the sparse matrix to modify

  • diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Either a warp array of type A.values.dtype, in which case each element will define one block of the diagonal, or a constant value of type A.values.dtype, in which case it will get assigned to all diagonal blocks.

  • rows_of_blocks (int | None) – If not None, the new number of rows of blocks

  • cols_of_blocks (int | None) – If not None, the new number of columns of blocks

The shape of the matrix will be defined by one of the following, in that order:
  • rows_of_blocks and cols_of_blocks, if provided. If only one is given, the second is assumed equal.

  • the first dimension of diag, if diag is an array

  • the current dimensions of A otherwise

warp.sparse.bsr_diag(diag, rows_of_blocks=None, cols_of_blocks=None)[source]#

Creates and returns a block-diagonal BSR matrix from a given block value or array of block values.

Parameters:
  • diag (_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar] | Array[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Either a warp array of block values, in which case each element will define one block of the diagonal, or a constant block value, in which case it will get assigned to all diagonal blocks.

  • rows_of_blocks (int | None) – If not None, the new number of rows of blocks

  • cols_of_blocks (int | None) – If not None, the new number of columns of blocks

Return type:

BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]

The shape of the matrix will be defined by one of the following, in that order:
  • rows_of_blocks and cols_of_blocks, if provided. If only one is given, the second is assumed equal.

  • the first dimension of diag, if diag is an array
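A sketch of both forms (the block values and sizes are illustrative):

import warp as wp
import warp.sparse as sparse

# Same 2x2 block repeated on the diagonal of a 16x16-block matrix
block = wp.mat22(2.0, 0.0,
                 0.0, 2.0)
D = sparse.bsr_diag(block, rows_of_blocks=16)

# One block per diagonal entry, taken from an array of 16 blocks
blocks = wp.zeros(16, dtype=wp.mat22)
D2 = sparse.bsr_diag(blocks)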

warp.sparse.bsr_set_identity(A, rows_of_blocks=None)[source]#

Sets A as the identity matrix

Parameters:
  • A (BsrMatrix) – the sparse matrix to modify

  • rows_of_blocks (int | None) – if provided, the matrix will be resized as a square matrix with rows_of_blocks rows and columns.

warp.sparse.bsr_identity(rows_of_blocks, block_type, device=None)[source]#

Creates and returns a square identity matrix.

Parameters:
  • rows_of_blocks (int) – Number of rows and columns of blocks in the created matrix.

  • block_type (_MatrixBlockType[Rows, Rows, Scalar] | _ScalarBlockType[Scalar]) – Block type for the newly created matrix – must be square

  • device (Device | str | None) – Device onto which to allocate the data arrays

Return type:

BsrMatrix[_MatrixBlockType[Rows, Rows, Scalar] | _ScalarBlockType[Scalar]]
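A sketch:

import warp as wp
import warp.sparse as sparse

# Identity with 100 diagonal blocks of 3x3 float32 coefficients
I_bsr = sparse.bsr_identity(100, block_type=wp.mat33)

# CSR (scalar-block) identity
I_csr = sparse.bsr_identity(100, block_type=wp.float64)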

warp.sparse.bsr_scale(x, alpha)[source]#

Performs the operation x := alpha * x on BSR matrix x and returns x

Parameters:
  • x (BsrMatrix[BlockType] | _BsrExpression[BlockType])

  • alpha (Scalar)

Return type:

BsrMatrix

class warp.sparse.bsr_axpy_work_arrays[source]#

Opaque structure for persisting bsr_axpy() temporary work buffers across calls

__init__()[source]#
warp.sparse.bsr_axpy(x, y=None, alpha=1.0, beta=1.0, work_arrays=None)[source]#

Performs the sparse matrix addition y := alpha * x + beta * y on BSR matrices x and y and returns y.

The x and y matrices are allowed to alias.

Parameters:
  • x (BsrMatrix[BlockType] | _BsrExpression[BlockType]) – Read-only right-hand-side.

  • y (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable left-hand-side. If y is not provided, it will be allocated and treated as zero.

  • alpha (Scalar) – Uniform scaling factor for x

  • beta (Scalar) – Uniform scaling factor for y

  • work_arrays (bsr_axpy_work_arrays | None) – In most cases this function will require the use of temporary storage; this storage can be reused across calls by passing an instance of bsr_axpy_work_arrays in work_arrays.

Return type:

BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
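A sketch of repeated accumulation reusing the temporary buffers (x and y are assumed to be pre-built BSR matrices with matching block types and shapes):

import warp.sparse as sparse

work = sparse.bsr_axpy_work_arrays()

# y := 0.5 * x + y, reusing temporary storage across calls
for _ in range(10):
    sparse.bsr_axpy(x, y, alpha=0.5, beta=1.0, work_arrays=work)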

class warp.sparse.bsr_mm_work_arrays[source]#

Opaque structure for persisting bsr_mm() temporary work buffers across calls

__init__()[source]#
warp.sparse.bsr_mm(
x,
y,
z=None,
alpha=1.0,
beta=0.0,
work_arrays=None,
reuse_topology=False,
)[source]#

Performs the sparse matrix-matrix multiplication z := alpha * x * y + beta * z on BSR matrices x, y and z, and returns z.

The x, y and z matrices are allowed to alias. If the matrix z is not provided as input, it will be allocated and treated as zero.

Parameters:
  • x (BsrMatrix[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Any, Scalar] | _ScalarBlockType[Scalar]]) – Read-only left factor of the matrix-matrix product.

  • y (BsrMatrix[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Any, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only right factor of the matrix-matrix product.

  • z (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | None) – Mutable left-hand-side. If z is not provided, it will be allocated and treated as zero.

  • alpha (Scalar) – Uniform scaling factor for the x * y product

  • beta (Scalar) – Uniform scaling factor for z

  • work_arrays (bsr_mm_work_arrays | None) – In most cases this function will require the use of temporary storage; this storage can be reused across calls by passing an instance of bsr_mm_work_arrays in work_arrays.

  • reuse_topology (bool) – If True, reuse the product topology information stored in work_arrays rather than recompute it from scratch. The matrices x, y and z must be structurally similar to the previous call in which work_arrays were populated. This is necessary for bsr_mm to be captured in a CUDA graph.

Return type:

BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]
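A sketch of a repeated product with cached topology, as required for CUDA-graph capture (A and B are assumed to be pre-built, structurally constant matrices):

import warp.sparse as sparse

work = sparse.bsr_mm_work_arrays()

# First call computes the product topology and stores it in `work`
C = sparse.bsr_mm(A, B, work_arrays=work)

# Later calls on structurally similar matrices may reuse that topology
C = sparse.bsr_mm(A, B, z=C, alpha=1.0, beta=0.0,
                  work_arrays=work, reuse_topology=True)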

warp.sparse.bsr_mv(
A,
x,
y=None,
alpha=1.0,
beta=0.0,
transpose=False,
work_buffer=None,
)[source]#

Performs the sparse matrix-vector product y := alpha * A * x + beta * y and returns y.

The x and y vectors are allowed to alias.

Parameters:
  • A (BsrMatrix[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]] | _BsrExpression[_MatrixBlockType[Rows, Cols, Scalar] | _ScalarBlockType[Scalar]]) – Read-only, left matrix factor of the matrix-vector product.

  • x (Array[Vector[Cols, Scalar] | Scalar]) – Read-only, right vector factor of the matrix-vector product.

  • y (Array[Vector[Rows, Scalar] | Scalar] | None) – Mutable left-hand-side. If y is not provided, it will be allocated and treated as zero.

  • alpha (Scalar) – Uniform scaling factor for x. If zero, x will not be read and may be left uninitialized.

  • beta (Scalar) – Uniform scaling factor for y. If zero, y will not be read and may be left uninitialized.

  • transpose (bool) – If True, use the transpose of the matrix A. In this case the result is non-deterministic.

  • work_buffer (Array[Vector[Rows, Scalar] | Scalar] | None) – Temporary storage is required if and only if x and y are the same vector. If provided the work_buffer array will be used for this purpose, otherwise a temporary allocation will be performed.

Return type:

Array[Vector[Rows, Scalar] | Scalar]
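A sketch, assuming A is a pre-built BSR matrix with 3x3 blocks:

import warp as wp
import warp.sparse as sparse

x = wp.zeros(A.ncol, dtype=wp.vec3)
y = wp.zeros(A.nrow, dtype=wp.vec3)

# y := 1.0 * A @ x + 0.0 * y
sparse.bsr_mv(A, x, y, alpha=1.0, beta=0.0)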

Iterative Linear Solvers#

Warp provides a few common iterative linear solvers (cg(), cr(), bicgstab(), gmres()) with optional preconditioning.

Note

While primarily intended to work with sparse matrices, those solvers also accept dense linear operators provided as 2D Warp arrays. It is also possible to provide custom operators, see LinearOperator.

class warp.optim.linear.LinearOperator(shape, dtype, device, matvec)[source]#

Linear operator to be used as left-hand-side of linear iterative solvers.

Parameters:
  • shape (Tuple[int, int]) – Tuple containing the number of rows and columns of the operator

  • dtype (type) – Type of the operator elements

  • device (Device) – Device on which computations involving the operator should be performed

  • matvec (Callable) – Matrix-vector multiplication routine

The matrix-vector multiplication routine should have the following signature:

def matvec(x: wp.array, y: wp.array, z: wp.array, alpha: Scalar, beta: Scalar):
    '''Performs the operation z = alpha * (A @ x) + beta * y, where A is this operator'''
    ...

For performance reasons, by default the iterative linear solvers in this module will try to capture the calls for one or more iterations in CUDA graphs. If the matvec routine of a custom LinearOperator cannot be graph-captured, the use_cuda_graph=False parameter should be passed to the solver function.
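As a sketch, a custom operator applying a diagonal stored in a Warp array could look as follows (the kernel and helper names are illustrative assumptions, not part of the API):

import warp as wp
from warp.optim.linear import LinearOperator

diag = wp.array([1.0, 2.0, 3.0, 4.0], dtype=wp.float32)

@wp.kernel
def diag_matvec_kernel(
    d: wp.array(dtype=wp.float32),
    x: wp.array(dtype=wp.float32),
    y: wp.array(dtype=wp.float32),
    z: wp.array(dtype=wp.float32),
    alpha: wp.float32,
    beta: wp.float32,
):
    i = wp.tid()
    z[i] = alpha * d[i] * x[i] + beta * y[i]

def matvec(x, y, z, alpha, beta):
    # z = alpha * (D @ x) + beta * y, with D the diagonal operator
    wp.launch(diag_matvec_kernel, dim=diag.shape[0],
              inputs=[diag, x, y, z, alpha, beta], device=diag.device)

op = LinearOperator(shape=(4, 4), dtype=wp.float32,
                    device=diag.device, matvec=matvec)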

__init__(shape, dtype, device, matvec)[source]#
property shape: Tuple[int, int][source]#
property dtype: type[source]#
property device: Device[source]#
property matvec: Callable[source]#
property scalar_type[source]#
warp.optim.linear.aslinearoperator(A)[source]#

Casts the dense or sparse matrix A as a LinearOperator

A must be of one of the following types:

  • warp.sparse.BsrMatrix

  • two-dimensional warp.array; then A is assumed to be a dense matrix

  • one-dimensional warp.array; then A is assumed to be a diagonal matrix

  • warp.optim.linear.LinearOperator; no casting necessary

Parameters:

A (array | BsrMatrix | LinearOperator)

Return type:

LinearOperator
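A sketch:

import warp as wp
from warp.optim.linear import aslinearoperator

dense = wp.zeros((8, 8), dtype=wp.float32)   # dense matrix as a 2d array
diag = wp.full(8, 2.0, dtype=wp.float32)     # diagonal matrix as a 1d array

op_dense = aslinearoperator(dense)
op_diag = aslinearoperator(diag)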

warp.optim.linear.preconditioner(A, ptype='diag')[source]#

Constructs and returns a preconditioner for an input matrix.

Parameters:
  • A (array | BsrMatrix | LinearOperator) – The matrix for which to build the preconditioner

  • ptype (str) –

    The type of preconditioner. Currently the following values are supported:

    • "diag": Diagonal (a.k.a. Jacobi) preconditioner

    • "diag_abs": Similar to Jacobi, but using the absolute value of diagonal coefficients

    • "id": Identity (null) preconditioner

Return type:

LinearOperator

warp.optim.linear.cg(
A,
b,
x,
tol=None,
atol=None,
maxiter=0,
M=None,
callback=None,
check_every=10,
use_cuda_graph=True,
)[source]#

Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Gradient algorithm.

Parameters:
  • A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side

  • b (array) – the linear system’s right-hand-side

  • x (array) – initial guess and solution vector

  • tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm

  • atol (float | None) – absolute tolerance for the residual

  • maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.

  • M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that M A is close to identity.

  • callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance

  • check_every – interval, in iterations, at which to call callback, check the residual against the tolerance and possibly terminate the algorithm.

  • use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.

Returns:

Tuple (final iteration number, residual norm, absolute tolerance)

Return type:

Tuple[int, float, float]

If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).
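A minimal usage sketch, assuming A is a symmetric positive-definite CSR matrix (scalar float32 blocks) built with the warp.sparse functions above:

import warp as wp
from warp.optim.linear import cg, preconditioner

n = A.shape[0]
b = wp.full(n, 1.0, dtype=wp.float32)
x = wp.zeros(n, dtype=wp.float32)   # initial guess, overwritten with the solution

M = preconditioner(A, "diag")       # Jacobi preconditioner

niter, err, atol = cg(A, b, x, tol=1.0e-6, M=M)
print(f"terminated after {niter} iterations with residual norm {err}")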

warp.optim.linear.cr(
A,
b,
x,
tol=None,
atol=None,
maxiter=0,
M=None,
callback=None,
check_every=10,
use_cuda_graph=True,
)[source]#

Computes an approximate solution to a symmetric, positive-definite linear system using the Conjugate Residual algorithm.

Parameters:
  • A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side

  • b (array) – the linear system’s right-hand-side

  • x (array) – initial guess and solution vector

  • tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm

  • atol (float | None) – absolute tolerance for the residual

  • maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs iterations in pairs, and as a result may exceed the specified maximum number of iterations by one.

  • M (array | BsrMatrix | LinearOperator | None) – optional left-preconditioner, ideally chosen such that M A is close to identity.

  • callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance

  • check_every – interval, in iterations, at which to call callback, check the residual against the tolerance and possibly terminate the algorithm.

  • use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.

Returns:

Tuple (final iteration number, residual norm, absolute tolerance)

Return type:

Tuple[int, float, float]

If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).

warp.optim.linear.bicgstab(
A,
b,
x,
tol=None,
atol=None,
maxiter=0,
M=None,
callback=None,
check_every=10,
use_cuda_graph=True,
is_left_preconditioner=False,
)[source]#

Computes an approximate solution to a linear system using the Biconjugate Gradient Stabilized method (BiCGSTAB).

Parameters:
  • A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side

  • b (array) – the linear system’s right-hand-side

  • x (array) – initial guess and solution vector

  • tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm

  • atol (float | None) – absolute tolerance for the residual

  • maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size.

  • M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that M A (resp A M) is close to identity.

  • callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance

  • check_every – interval, in iterations, at which to call callback, check the residual against the tolerance and possibly terminate the algorithm.

  • use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.

  • is_left_preconditioner – whether M should be used as a left- or right- preconditioner.

Returns:

Tuple (final iteration number, residual norm, absolute tolerance)

If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).

warp.optim.linear.gmres(
A,
b,
x,
tol=None,
atol=None,
restart=31,
maxiter=0,
M=None,
callback=None,
check_every=31,
use_cuda_graph=True,
is_left_preconditioner=False,
)[source]#

Computes an approximate solution to a linear system using the restarted Generalized Minimum Residual method (GMRES[k]).

Parameters:
  • A (array | BsrMatrix | LinearOperator) – the linear system’s left-hand-side

  • b (array) – the linear system’s right-hand-side

  • x (array) – initial guess and solution vector

  • tol (float | None) – relative tolerance for the residual, as a ratio of the right-hand-side norm

  • atol (float | None) – absolute tolerance for the residual

  • restart – The restart parameter, i.e., the k in GMRES[k]. In general, increasing this parameter reduces the number of iterations but increases memory consumption.

  • maxiter (float | None) – maximum number of iterations to perform before aborting. Defaults to the system size. Note that the current implementation always performs restart iterations at a time, and as a result may exceed the specified maximum number of iterations by restart-1.

  • M (array | BsrMatrix | LinearOperator | None) – optional left- or right-preconditioner, ideally chosen such that M A (resp A M) is close to identity.

  • callback (Callable | None) – function to be called every check_every iteration with the current iteration number, residual and tolerance

  • check_every – interval, in iterations, at which to call callback, check the residual against the tolerance and possibly terminate the algorithm.

  • use_cuda_graph – If True and running on a CUDA device, capture the solver iteration as a CUDA graph for reduced launch overhead. The linear operator and preconditioner must only perform graph-friendly operations.

  • is_left_preconditioner – whether M should be used as a left- or right- preconditioner.

Returns:

Tuple (final iteration number, residual norm, absolute tolerance)

If both tol and atol are provided, the absolute tolerance used as the termination criterion for the residual norm is max(atol, tol * norm(b)).
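A usage sketch for a general (possibly non-symmetric) system, assuming A is a pre-built square CSR matrix with float32 scalar blocks:

import warp as wp
from warp.optim.linear import gmres, preconditioner

b = wp.full(A.shape[0], 1.0, dtype=wp.float32)
x = wp.zeros(A.shape[1], dtype=wp.float32)

M = preconditioner(A, "diag_abs")
niter, err, atol = gmres(A, b, x, tol=1.0e-5, restart=31, M=M,
                         is_left_preconditioner=False)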