cub::DeviceAdjacentDifference#

struct DeviceAdjacentDifference#

DeviceAdjacentDifference provides device-wide, parallel operations for computing the differences of adjacent elements residing within device-accessible memory.

Overview#

  • DeviceAdjacentDifference calculates the differences of adjacent elements in d_input. Because the binary operation may be noncommutative, there are two sets of methods. Methods named SubtractLeft subtract the left element *(i - 1) of the input sequence from the current element *i. Methods named SubtractRight subtract the right element *(i + 1) from the current element *i:

    int *d_values; // [1, 2, 3, 4]
    //...
    // d_subtract_left_result  <-- [  1,  1,  1,  1 ]
    // d_subtract_right_result <-- [ -1, -1, -1,  4 ]
    
  • For SubtractLeft, if the left element is out of bounds (that is, for the first element), the input element *i is assigned to *(result + (i - first)) without modification.

  • For SubtractRight, if the right element is out of bounds (that is, for the last element), the input element *i is assigned to *(result + (i - first)) without modification.
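The boundary rules above can be modelled on the host with a short sequential reference. This is only a sketch for reasoning about expected results, not a description of how CUB computes them; `subtract_left_ref` and `subtract_right_ref` are hypothetical helper names that do not exist in CUB.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of CUB): left differences.
// The first element has no left neighbour, so it is copied unchanged.
std::vector<int> subtract_left_ref(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = (i == 0) ? in[i] : in[i] - in[i - 1];
  }
  return out;
}

// Hypothetical host-side reference (not part of CUB): right differences.
// The last element has no right neighbour, so it is copied unchanged.
std::vector<int> subtract_right_ref(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = (i + 1 == in.size()) ? in[i] : in[i] - in[i + 1];
  }
  return out;
}
```

For the input [1, 2, 3, 4], `subtract_left_ref` yields [1, 1, 1, 1] and `subtract_right_ref` yields [-1, -1, -1, 4], which matches the example in the overview.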

Snippet#

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the left difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

// Declare, allocate, and initialize device-accessible pointers
int  num_items;       // e.g., 8
int  *d_values;       // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
//...

// Determine temporary device storage requirements
void     *d_temp_storage = nullptr;
size_t   temp_storage_bytes = 0;

cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes, d_values, num_items);

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes, d_values, num_items);

// d_values <-- [1, 1, -1, 1, -1, 1, -1, 1]

Public Static Functions

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = ::cuda::std::minus<>, typename NumItemsT = uint32_t>
static inline cudaError_t SubtractLeftCopy(
void *d_temp_storage,
size_t &temp_storage_bytes,
InputIteratorT d_input,
OutputIteratorT d_output,
NumItemsT num_items,
DifferenceOpT difference_op = {},
cudaStream_t stream = 0
)#

Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.

Overview#

  • Calculates the differences of adjacent elements in d_input. That is, *d_input is assigned to *d_output, and, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_output + (i - d_input)).

  • Note that the behavior is undefined if the input and output ranges overlap in any way.
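The assignment rule above can be written out as a sequential host-side model, which is handy for producing expected values in tests. This is a sketch under the assumption of plain pointers and non-overlapping buffers; `subtract_left_copy_ref` is a hypothetical name, not a CUB function.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical sequential model of the SubtractLeftCopy semantics (not CUB code).
// The input and output buffers are assumed not to overlap.
template <typename T, typename U, typename DifferenceOp = std::minus<>>
void subtract_left_copy_ref(const T* in, U* out, std::size_t num_items,
                            DifferenceOp difference_op = {})
{
  if (num_items == 0)
  {
    return;
  }
  out[0] = in[0]; // the first element has no left neighbour
  for (std::size_t i = 1; i < num_items; ++i)
  {
    // The current element is the left operand, its left neighbour the right operand
    out[i] = difference_op(in[i], in[i - 1]);
  }
}
```

Passing a noncommutative operation makes the operand order visible: with `lhs * 10 + rhs` and the input [1, 2, 3], the model writes [1, 21, 32].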

Snippet#

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
  template <typename DataType>
  __host__ __device__ DataType operator()(DataType &lhs, DataType &rhs)
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int  num_items;      // e.g., 8
int  *d_input;       // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
int  *d_output;
// ...

// Determine temporary device storage requirements
void     *d_temp_storage = nullptr;
size_t   temp_storage_bytes = 0;

cub::DeviceAdjacentDifference::SubtractLeftCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output,
  num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeftCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output,
  num_items, CustomDifference());

// d_input  <-- [1, 2, 1, 2, 1, 2, 1, 2]
// d_output <-- [1, 1, -1, 1, -1, 1, -1, 1]

Template Parameters:
  • InputIteratorT

    is a model of Input Iterator. If x and y are objects of InputIteratorT’s value_type, then x - y is defined, InputIteratorT’s value_type is convertible to a type in OutputIteratorT’s set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT’s set of value_types.

  • OutputIteratorT

    is a model of Output Iterator.

  • DifferenceOpT – Its result_type is convertible to a type in OutputIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[in] Pointer to the input sequence

  • d_output[out] Pointer to the output sequence

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream 0.

template<typename RandomAccessIteratorT, typename DifferenceOpT = ::cuda::std::minus<>, typename NumItemsT = uint32_t>
static inline cudaError_t SubtractLeft(
void *d_temp_storage,
size_t &temp_storage_bytes,
RandomAccessIteratorT d_input,
NumItemsT num_items,
DifferenceOpT difference_op = {},
cudaStream_t stream = 0
)#

Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.

Overview#

Calculates the differences of adjacent elements in d_input. That is, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_input + (i - d_input)).
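Because the result overwrites the input, every difference has to be computed from the original value of the left neighbour. The sequential sketch below models that by walking the buffer from back to front; it illustrates the expected result only and makes no claim about how CUB handles the hazard internally. `subtract_left_inplace_ref` is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical sequential model of in-place SubtractLeft (not CUB code).
// Iterating from the back keeps data[i - 2] unmodified until it has been read.
template <typename T, typename DifferenceOp = std::minus<>>
void subtract_left_inplace_ref(T* data, std::size_t num_items,
                               DifferenceOp difference_op = {})
{
  for (std::size_t i = num_items; i > 1; --i)
  {
    data[i - 1] = difference_op(data[i - 1], data[i - 2]);
  }
  // data[0] has no left neighbour and keeps its value
}
```

A front-to-back loop would instead read neighbours that were already overwritten and produce different values.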

Snippet#

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
  template <typename DataType>
  __host__ __device__ DataType operator()(DataType &lhs, DataType &rhs)
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int  num_items;     // e.g., 8
int  *d_data;       // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
// ...

// Determine temporary device storage requirements
void     *d_temp_storage = nullptr;
size_t   temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes,
  d_data, num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes,
  d_data, num_items, CustomDifference());

// d_data <-- [1, 1, -1, 1, -1, 1, -1, 1]

Template Parameters:
  • RandomAccessIteratorT

    is a mutable model of Random Access Iterator. If x and y are objects of RandomAccessIteratorT’s value_type and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT’s set of value_types.

  • DifferenceOpT – Its result_type is convertible to a type in RandomAccessIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[inout] Pointer to the input sequence and the result

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream 0.

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = ::cuda::std::minus<>, typename NumItemsT = uint32_t>
static inline cudaError_t SubtractRightCopy(
void *d_temp_storage,
size_t &temp_storage_bytes,
InputIteratorT d_input,
OutputIteratorT d_output,
NumItemsT num_items,
DifferenceOpT difference_op = {},
cudaStream_t stream = 0
)#

Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.

Overview#

  • Calculates the right differences of adjacent elements in d_input. That is, *(d_input + num_items - 1) is assigned to *(d_output + num_items - 1), and, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_output + (i - d_input)).

  • Note that the behavior is undefined if the input and output ranges overlap in any way.
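A sequential host-side model of this rule is sketched below, assuming plain pointers and non-overlapping buffers. `subtract_right_copy_ref` is a hypothetical name used for illustration and is not part of CUB.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical sequential model of the SubtractRightCopy semantics (not CUB code).
// The input and output buffers are assumed not to overlap.
template <typename T, typename U, typename DifferenceOp = std::minus<>>
void subtract_right_copy_ref(const T* in, U* out, std::size_t num_items,
                             DifferenceOp difference_op = {})
{
  if (num_items == 0)
  {
    return;
  }
  for (std::size_t i = 0; i + 1 < num_items; ++i)
  {
    // The current element is the left operand, its right neighbour the right operand
    out[i] = difference_op(in[i], in[i + 1]);
  }
  out[num_items - 1] = in[num_items - 1]; // the last element has no right neighbour
}
```

In both the left and the right variant the current element is passed as the first argument of difference_op; only the choice of neighbour changes.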

Snippet#

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
  template <typename DataType>
  __host__ __device__ DataType operator()(DataType &lhs, DataType &rhs)
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int  num_items;     // e.g., 8
int  *d_input;      // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
int  *d_output;
// ...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractRightCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output, num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractRightCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output, num_items, CustomDifference());

// d_input  <-- [1, 2, 1, 2, 1, 2, 1, 2]
// d_output <-- [-1, 1, -1, 1, -1, 1, -1, 2]

Template Parameters:
  • InputIteratorT

    is a model of Input Iterator. If x and y are objects of InputIteratorT’s value_type, then x - y is defined, InputIteratorT’s value_type is convertible to a type in OutputIteratorT’s set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT’s set of value_types.

  • OutputIteratorT

    is a model of Output Iterator.

  • DifferenceOpT – Its result_type is convertible to a type in OutputIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[in] Pointer to the input sequence

  • d_output[out] Pointer to the output sequence

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences.

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream 0.

template<typename RandomAccessIteratorT, typename DifferenceOpT = ::cuda::std::minus<>, typename NumItemsT = uint32_t>
static inline cudaError_t SubtractRight(
void *d_temp_storage,
size_t &temp_storage_bytes,
RandomAccessIteratorT d_input,
NumItemsT num_items,
DifferenceOpT difference_op = {},
cudaStream_t stream = 0
)#

Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.

Overview#

Calculates the right differences of adjacent elements in d_input. That is, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_input + (i - d_input)).
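For the in-place right difference, each result depends on the original value of the right neighbour, so a sequential model can walk the buffer front to back without reading overwritten data. The sketch below illustrates the expected result only; `subtract_right_inplace_ref` is a hypothetical name, not a CUB function.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical sequential model of in-place SubtractRight (not CUB code).
// Iterating from the front keeps data[i + 1] unmodified until it has been read.
template <typename T, typename DifferenceOp = std::minus<>>
void subtract_right_inplace_ref(T* data, std::size_t num_items,
                                DifferenceOp difference_op = {})
{
  for (std::size_t i = 0; i + 1 < num_items; ++i)
  {
    data[i] = difference_op(data[i], data[i + 1]);
  }
  // The last element has no right neighbour and keeps its value
}
```

Applied to [1, 2, 1, 2, 1, 2, 1, 2], the model leaves [-1, 1, -1, 1, -1, 1, -1, 2] in the buffer.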

Snippet#

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

// Declare, allocate, and initialize device-accessible pointers
int  num_items;    // e.g., 8
int  *d_data;      // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
// ...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractRight(
  d_temp_storage, temp_storage_bytes, d_data, num_items);

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractRight(
  d_temp_storage, temp_storage_bytes, d_data, num_items);

// d_data  <-- [-1, 1, -1, 1, -1, 1, -1, 2]

Template Parameters:
  • RandomAccessIteratorT

    is a mutable model of Random Access Iterator. If x and y are objects of RandomAccessIteratorT’s value_type and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT’s set of value_types.

  • DifferenceOpT – Its result_type is convertible to a type in RandomAccessIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters:
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[inout] Pointer to the input sequence and the result

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream 0.