cub::DeviceSpmv
Defined in cub/device/device_spmv.cuh
-
struct DeviceSpmv
DeviceSpmv provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV).
Overview
The SpMV computation performs the matrix-vector operation
y = A * x + y
, where:A
is anm * n
sparse matrix whose non-zero structure is specified in compressed-storage-row (CSR) format (i.e., three arrays:values
,row_offsets
, andcolumn_indices
)x
andy
are dense vectors
Usage Considerations
Dynamic parallelism. DeviceSpmv methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
CSR matrix operations
-
template<typename ValueT>
static inline cudaError_t CsrMV(void *d_temp_storage, size_t &temp_storage_bytes, const ValueT *d_values, const int *d_row_offsets, const int *d_column_indices, const ValueT *d_vector_x, ValueT *d_vector_y, int num_rows, int num_cols, int num_nonzeros, cudaStream_t stream = 0) This function performs the matrix-vector operation
y = A*x
.Snippet
The code snippet below illustrates SpMV upon a 9x9 CSR matrix
A
representing a 3x3 lattice (24 non-zeros).#include <cub/cub.cuh> // or equivalently <cub/device/device_spmv.cuh> // Declare, allocate, and initialize device-accessible pointers for input matrix A, input vector x, // and output vector y int num_rows = 9; int num_cols = 9; int num_nonzeros = 24; float* d_values; // e.g., [1, 1, 1, 1, 1, 1, 1, 1, // 1, 1, 1, 1, 1, 1, 1, 1, // 1, 1, 1, 1, 1, 1, 1, 1] int* d_column_indices; // e.g., [1, 3, 0, 2, 4, 1, 5, 0, // 4, 6, 1, 3, 5, 7, 2, 4, // 8, 3, 7, 4, 6, 8, 5, 7] int* d_row_offsets; // e.g., [0, 2, 5, 7, 10, 14, 17, 19, 22, 24] float* d_vector_x; // e.g., [1, 1, 1, 1, 1, 1, 1, 1, 1] float* d_vector_y; // e.g., [ , , , , , , , , ] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceSpmv::CsrMV(d_temp_storage, temp_storage_bytes, d_values, d_row_offsets, d_column_indices, d_vector_x, d_vector_y, num_rows, num_cols, num_nonzeros); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Run SpMV cub::DeviceSpmv::CsrMV(d_temp_storage, temp_storage_bytes, d_values, d_row_offsets, d_column_indices, d_vector_x, d_vector_y, num_rows, num_cols, num_nonzeros); // d_vector_y <-- [2, 3, 2, 3, 4, 3, 2, 3, 2]
- Template Parameters
ValueT – [inferred] Matrix and vector value type (e.g.,
float
,double
, etc.)- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to
temp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_values – [in] Pointer to the array of
num_nonzeros
values of the corresponding nonzero elements of matrixA
.d_row_offsets – [in] Pointer to the array of
m + 1
offsets demarcating the start of every row ind_column_indices
andd_values
(with the final entry being equal tonum_nonzeros
)d_column_indices – [in] Pointer to the array of
num_nonzeros
column-indices of the corresponding nonzero elements of matrixA
. (Indices are zero-valued.)d_vector_x – [in] Pointer to the array of
num_cols
values corresponding to the dense input vectorx
d_vector_y – [out] Pointer to the array of
num_rows
values corresponding to the dense output vectory
num_rows – [in] number of rows of matrix
A
.num_cols – [in] number of columns of matrix
A
.num_nonzeros – [in] number of nonzero elements of matrix
A
.stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.