cub::DeviceAdjacentDifference

Defined in cub/device/device_adjacent_difference.cuh

struct DeviceAdjacentDifference

DeviceAdjacentDifference provides device-wide, parallel operations for computing the differences of adjacent elements residing within device-accessible memory.

Overview

  • DeviceAdjacentDifference calculates the differences of adjacent elements in d_input. Because the binary operation could be noncommutative, there are two sets of methods. Methods named SubtractLeft subtract left element *(i - 1) of input sequence from current element *i. Methods named SubtractRight subtract current element *i from the right one *(i + 1):

    int *d_values;                 // [1, 2, 3, 4]
    // ...
    int *d_subtract_left_result;   // <-- [  1,  1,  1,  1 ]
    int *d_subtract_right_result;  // <-- [ -1, -1, -1,  4 ]
    
  • For SubtractLeft, if the left element is out of bounds (that is, *i is the first element), *i is assigned to *(result + (i - first)) without modification.

  • For SubtractRight, if the right element is out of bounds (that is, *i is the last element), *i is assigned to *(result + (i - first)) without modification.
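The two conventions can be checked against a sequential host-side reference. The sketch below is plain C++ rather than CUB code, and the names `subtract_left_ref`, `subtract_right_ref`, and `check_overview_example` are hypothetical helpers introduced only for illustration. It reproduces the results listed above for the input [1, 2, 3, 4].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of CUB):
// out[0] = in[0]; out[i] = in[i] - in[i - 1] for i > 0.
inline std::vector<int> subtract_left_ref(const std::vector<int>& in)
{
  std::vector<int> out(in); // the first element is copied unmodified
  for (std::size_t i = 1; i < in.size(); ++i)
  {
    out[i] = in[i] - in[i - 1];
  }
  return out;
}

// Hypothetical host-side reference (not part of CUB):
// out[i] = in[i] - in[i + 1] for i < n - 1; out[n - 1] = in[n - 1].
inline std::vector<int> subtract_right_ref(const std::vector<int>& in)
{
  std::vector<int> out(in); // the last element is copied unmodified
  for (std::size_t i = 0; i + 1 < in.size(); ++i)
  {
    out[i] = in[i] - in[i + 1];
  }
  return out;
}

// Checks the overview example for the input [1, 2, 3, 4].
inline void check_overview_example()
{
  const std::vector<int> values{1, 2, 3, 4};
  assert((subtract_left_ref(values) == std::vector<int>{1, 1, 1, 1}));
  assert((subtract_right_ref(values) == std::vector<int>{-1, -1, -1, 4}));
}
```

Calling `check_overview_example()` from a host program is enough to exercise both helpers; the same vectors can serve as expected values when validating device results copied back to the host.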

Snippet

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the left difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

// Declare, allocate, and initialize device-accessible pointers
int  num_items;       // e.g., 8
int  *d_values;       // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
//...

// Determine temporary device storage requirements
void     *d_temp_storage = nullptr;
size_t   temp_storage_bytes = 0;

cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes, d_values, num_items);

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes, d_values, num_items);

// d_values <-- [1, 1, -1, 1, -1, 1, -1, 1]

Public Static Functions

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = cub::Difference, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractLeftCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, NumItemsT num_items, DifferenceOpT difference_op = {}, cudaStream_t stream = 0)

Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.

Overview

  • Calculates the differences of adjacent elements in d_input. That is, *d_input is assigned to *d_output, and, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned to *(d_output + (i - d_input)).

  • Note that the behavior is undefined if the input and output ranges overlap in any way.
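On the host, the C++ standard library's `std::adjacent_difference` follows the same element-wise rule: it copies the first element and writes op(current, previous) for every later one. The sketch below uses it as a sequential reference against which SubtractLeftCopy results could be compared. It is plain C++, not CUB, and `subtract_left_copy_ref` and `check_left_copy_example` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical host-side reference (not a CUB function).
// std::adjacent_difference writes in[0] to out[0] and
// op(in[i], in[i - 1]) to out[i] for every i > 0.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_left_copy_ref(const std::vector<T>& in, DifferenceOp op = {})
{
  std::vector<T> out(in.size()); // separate output, so input and output never overlap
  std::adjacent_difference(in.begin(), in.end(), out.begin(), op);
  return out;
}

// Checks the values used in the snippet for this function.
inline void check_left_copy_example()
{
  const std::vector<int> in{1, 2, 1, 2, 1, 2, 1, 2};
  assert((subtract_left_copy_ref(in) == std::vector<int>{1, 1, -1, 1, -1, 1, -1, 1}));
}
```

Passing a different binary functor as the second argument, for example a ratio instead of a difference, shows why the argument order matters for noncommutative operations.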

Snippet

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
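  // The operator must be callable from device code, because it is invoked inside a kernel
  template <typename DataType>
  __host__ __device__ DataType operator()(DataType &lhs, DataType &rhs)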
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int  num_items;      // e.g., 8
int  *d_input;       // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
int  *d_output;
// ...

// Determine temporary device storage requirements
void     *d_temp_storage = nullptr;
size_t   temp_storage_bytes = 0;

cub::DeviceAdjacentDifference::SubtractLeftCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output,
  num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeftCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output,
  num_items, CustomDifference());

// d_input  <-- [1, 2, 1, 2, 1, 2, 1, 2]
// d_output <-- [1, 1, -1, 1, -1, 1, -1, 1]

Template Parameters
  • InputIteratorT

    is a model of Input Iterator. If x and y are objects of InputIteratorT’s value_type, then x - y is defined, InputIteratorT’s value_type is convertible to a type in OutputIteratorT’s set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT’s set of value_types.

  • OutputIteratorT

    is a model of Output Iterator.

  • DifferenceOpT – Its result_type is convertible to a type in OutputIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[in] Pointer to the input sequence

  • d_output[out] Pointer to the output sequence

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream 0.
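The d_temp_storage and temp_storage_bytes parameters imply a two-call pattern: a first call with d_temp_storage set to nullptr only reports the required size, and a second call with a real allocation does the work. The host-only mock below imitates that protocol so the control flow can be followed without a GPU. Everything in it is an assumption made for illustration: `mock_subtract_left_copy`, `run_two_phase`, the 64-byte scratch size, and the nonzero return for an undersized buffer are invented here and say nothing about CUB's actual storage requirements or error behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for the size-query / run protocol.
// Returns 0 on success (cudaSuccess is also 0) and nonzero otherwise.
inline int mock_subtract_left_copy(void* d_temp_storage,
                                   std::size_t& temp_storage_bytes,
                                   const int* d_input,
                                   int* d_output,
                                   std::size_t num_items)
{
  const std::size_t required = 64; // arbitrary value, for illustration only
  if (d_temp_storage == nullptr)
  {
    temp_storage_bytes = required; // phase 1: report the size, do no work
    return 0;
  }
  if (temp_storage_bytes < required)
  {
    return 1; // mock behavior: reject an undersized allocation
  }
  for (std::size_t i = 0; i < num_items; ++i)
  {
    d_output[i] = (i == 0) ? d_input[0] : d_input[i] - d_input[i - 1];
  }
  return 0;
}

// Drives both phases: query the size, allocate, then run.
inline std::vector<int> run_two_phase(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  std::size_t temp_storage_bytes = 0;

  int status = mock_subtract_left_copy(nullptr, temp_storage_bytes,
                                       in.data(), out.data(), in.size());
  assert(status == 0 && temp_storage_bytes > 0);

  std::vector<unsigned char> temp(temp_storage_bytes); // stands in for cudaMalloc
  status = mock_subtract_left_copy(temp.data(), temp_storage_bytes,
                                   in.data(), out.data(), in.size());
  assert(status == 0);
  (void) status; // silences unused-variable warnings when asserts are disabled
  return out;
}
```

In real code the same shape applies with the cudaError_t return value checked after each call and the temporary allocation released once the operation has completed.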

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractLeftCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, NumItemsT num_items, DifferenceOpT difference_op, cudaStream_t stream, bool debug_synchronous)

template<typename RandomAccessIteratorT, typename DifferenceOpT = cub::Difference, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractLeft(void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, NumItemsT num_items, DifferenceOpT difference_op = {}, cudaStream_t stream = 0)

Subtracts the left element of each adjacent pair of elements residing within device-accessible memory.

Overview

Calculates the differences of adjacent elements in d_input in place. That is, for each iterator i in the range [d_input + 1, d_input + num_items), the result of difference_op(*i, *(i - 1)) is assigned back to *i, where *(i - 1) is the original value of the left neighbor. The first element, *d_input, is left unchanged.
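Because the input is overwritten, every result has to be computed from the original left neighbor rather than from a value that has already been replaced. As a host-side illustration only (it says nothing about how CUB handles this internally), a sequential loop can satisfy that requirement by walking from the back. `subtract_left_in_place_ref` and `check_in_place_example` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference (not a CUB function).
template <typename T, typename DifferenceOp = std::minus<T>>
void subtract_left_in_place_ref(std::vector<T>& data, DifferenceOp op = {})
{
  // Walk backward so that data[i - 1] still holds its original value when it
  // is read. A forward loop would read already-overwritten neighbors instead.
  for (std::size_t i = data.size(); i-- > 1;)
  {
    data[i] = op(data[i], data[i - 1]);
  }
  // data[0] has no left neighbor and is left unchanged.
}

// Checks the values used in the snippet for this function.
inline void check_in_place_example()
{
  std::vector<int> data{1, 2, 1, 2, 1, 2, 1, 2};
  subtract_left_in_place_ref(data);
  assert((data == std::vector<int>{1, 1, -1, 1, -1, 1, -1, 1}));
}
```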

Snippet

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
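  // The operator must be callable from device code, because it is invoked inside a kernel
  template <typename DataType>
  __host__ __device__ DataType operator()(DataType &lhs, DataType &rhs)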
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int  num_items;     // e.g., 8
int  *d_data;       // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
// ...

// Determine temporary device storage requirements
void     *d_temp_storage = nullptr;
size_t   temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes,
  d_data, num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractLeft(
  d_temp_storage, temp_storage_bytes,
  d_data, num_items, CustomDifference());

// d_data <-- [1, 1, -1, 1, -1, 1, -1, 1]

Template Parameters
  • RandomAccessIteratorT

    is a model of Random Access Iterator and is mutable. If x and y are objects of RandomAccessIteratorT’s value_type, and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT’s set of value_types.

  • DifferenceOpT – Its result_type is convertible to a type in RandomAccessIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[inout] Pointer to the input sequence and the result

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream0.

template<typename RandomAccessIteratorT, typename DifferenceOpT, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractLeft(void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, NumItemsT num_items, DifferenceOpT difference_op, cudaStream_t stream, bool debug_synchronous)

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT = cub::Difference, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractRightCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, NumItemsT num_items, DifferenceOpT difference_op = {}, cudaStream_t stream = 0)

Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.

Overview

  • Calculates the right differences of adjacent elements in d_input. That is, *(d_input + num_items - 1) is assigned to *(d_output + num_items - 1), and, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned to *(d_output + (i - d_input)).

  • Note that the behavior is undefined if the input and output ranges overlap in any way.
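One way to read the rule above is that a right difference is a left difference taken over the reversed sequence. The host-side sketch below expresses that with reverse iterators and `std::adjacent_difference`, so the current element is always the first operand and its right neighbor the second. It is plain C++ for checking results, not CUB code, and `subtract_right_copy_ref` and `check_right_copy_example` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical host-side reference (not a CUB function).
// Traversing both ranges in reverse makes std::adjacent_difference copy the
// last element and write op(in[i], in[i + 1]) to out[i] for every i < n - 1.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_right_copy_ref(const std::vector<T>& in, DifferenceOp op = {})
{
  std::vector<T> out(in.size());
  std::adjacent_difference(in.rbegin(), in.rend(), out.rbegin(), op);
  return out;
}

// Checks the values used in the snippet for this function.
inline void check_right_copy_example()
{
  const std::vector<int> in{1, 2, 1, 2, 1, 2, 1, 2};
  assert((subtract_right_copy_ref(in) == std::vector<int>{-1, 1, -1, 1, -1, 1, -1, 2}));
}
```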

Snippet

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

struct CustomDifference
{
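  // The operator must be callable from device code, because it is invoked inside a kernel
  template <typename DataType>
  __host__ __device__ DataType operator()(DataType &lhs, DataType &rhs)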
  {
    return lhs - rhs;
  }
};

// Declare, allocate, and initialize device-accessible pointers
int  num_items;     // e.g., 8
int  *d_input;      // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
int  *d_output;
// ...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractRightCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output, num_items, CustomDifference());

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractRightCopy(
  d_temp_storage, temp_storage_bytes,
  d_input, d_output, num_items, CustomDifference());

// d_input  <-- [1, 2, 1, 2, 1, 2, 1, 2]
// d_output <-- [-1, 1, -1, 1, -1, 1, -1, 2]

Template Parameters
  • InputIteratorT

    is a model of Input Iterator. If x and y are objects of InputIteratorT’s value_type, then x - y is defined, InputIteratorT’s value_type is convertible to a type in OutputIteratorT’s set of value_types, and the return type of x - y is convertible to a type in OutputIteratorT’s set of value_types.

  • OutputIteratorT

    is a model of Output Iterator.

  • DifferenceOpT – Its result_type is convertible to a type in OutputIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[in] Pointer to the input sequence

  • d_output[out] Pointer to the output sequence

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences.

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream0.

template<typename InputIteratorT, typename OutputIteratorT, typename DifferenceOpT, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractRightCopy(void *d_temp_storage, std::size_t &temp_storage_bytes, InputIteratorT d_input, OutputIteratorT d_output, NumItemsT num_items, DifferenceOpT difference_op, cudaStream_t stream, bool debug_synchronous)

template<typename RandomAccessIteratorT, typename DifferenceOpT = cub::Difference, typename NumItemsT = std::uint32_t>
static inline cudaError_t SubtractRight(void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, NumItemsT num_items, DifferenceOpT difference_op = {}, cudaStream_t stream = 0)

Subtracts the right element of each adjacent pair of elements residing within device-accessible memory.

Overview

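Calculates the right differences of adjacent elements in d_input in place. That is, for each iterator i in the range [d_input, d_input + num_items - 1), the result of difference_op(*i, *(i + 1)) is assigned back to *i, where *(i + 1) is the original value of the right neighbor. The last element, *(d_input + num_items - 1), is left unchanged.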

Snippet

The code snippet below illustrates how to use DeviceAdjacentDifference to compute the difference between adjacent elements.

#include <cub/cub.cuh>
// or equivalently <cub/device/device_adjacent_difference.cuh>

// Declare, allocate, and initialize device-accessible pointers
int  num_items;    // e.g., 8
int  *d_data;      // e.g., [1, 2, 1, 2, 1, 2, 1, 2]
// ...

// Determine temporary device storage requirements
void *d_temp_storage = nullptr;
size_t temp_storage_bytes = 0;
cub::DeviceAdjacentDifference::SubtractRight(
  d_temp_storage, temp_storage_bytes, d_data, num_items);

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run operation
cub::DeviceAdjacentDifference::SubtractRight(
  d_temp_storage, temp_storage_bytes, d_data, num_items);

// d_data  <-- [-1, 1, -1, 1, -1, 1, -1, 2]

Template Parameters
  • RandomAccessIteratorT

    is a model of Random Access Iterator and is mutable. If x and y are objects of RandomAccessIteratorT’s value_type, and x - y is defined, then the return type of x - y should be convertible to a type in RandomAccessIteratorT’s set of value_types.

  • DifferenceOpT – Its result_type is convertible to a type in RandomAccessIteratorT’s set of value_types.

  • NumItemsT[inferred] Type of num_items

Parameters
  • d_temp_storage[in] Device-accessible allocation of temporary storage. When nullptr, the required allocation size is written to temp_storage_bytes and no work is done.

  • temp_storage_bytes[inout] Reference to size in bytes of d_temp_storage allocation

  • d_input[inout] Pointer to the input sequence and the result

  • num_items[in] Number of items in the input sequence

  • difference_op[in] The binary function used to compute differences

  • stream[in]

    [optional] CUDA stream to launch kernels within. Default is stream 0.

template<typename RandomAccessIteratorT, typename DifferenceOpT, typename NumItemsT>
static inline cudaError_t SubtractRight(void *d_temp_storage, std::size_t &temp_storage_bytes, RandomAccessIteratorT d_input, NumItemsT num_items, DifferenceOpT difference_op, cudaStream_t stream, bool debug_synchronous)