cub::DeviceHistogram#
-
struct DeviceHistogram#
DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of samples data residing within device-accessible memory.
Overview#
A histogram counts the number of observations that fall into each of the disjoint categories (known as bins).
Usage Considerations#
Dynamic parallelism. DeviceHistogram methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
Evenly-segmented bin ranges
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- LevelT lower_level,
- LevelT upper_level,
- OffsetT num_samples,
- cudaStream_t stream = 0
Computes an intensity histogram from a sequence of data samples using equal-width bins.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
The number of histogram bins is (
num_levels - 1)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1).If the common type of
SampleTandLevelTis of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level), round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.The ranges
[d_samples, d_samples + num_samples)and[d_histogram, d_histogram + num_levels - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of a six-bin histogram from a sequence of float samples
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histogram int num_samples; // e.g., 10 float* d_samples; // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, 0.3, 2.9, 2.1, 6.1, 999.5] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels; // e.g., 7 (seven level boundaries for six bins) float lower_level; // e.g., 0.0 (lower sample value boundary of lowest bin) float upper_level; // e.g., 12.0 (upper sample value boundary of upper bin) ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_samples – [in] The number of input samples (i.e., the length of
d_samples)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- LevelT lower_level,
- LevelT upper_level,
- OffsetT num_row_samples,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0
Computes an intensity histogram from a sequence of data samples using equal-width bins.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
A two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins is (
num_levels - 1)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1)If the common type of
SampleTandLevelTis of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level), round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
rin[0, num_rows), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)androw_end = row_begin + num_row_samples. The ranges[row_begin, row_end)and[d_histogram, d_histogram + num_levels - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of a six-bin histogram from a 2x5 region of interest within a flattened 2x7 array of float samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histogram int num_row_samples; // e.g., 5 int num_rows; // e.g., 2; size_t row_stride_bytes; // e.g., 7 * sizeof(float) float* d_samples; // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, -, -, // 0.3, 2.9, 2.1, 6.1, 999.5, -, -] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels; // e.g., 7 (seven level boundaries for six bins) float lower_level; // e.g., 0.0 (lower sample value boundary of lowest bin) float upper_level; // e.g., 12.0 (upper sample value boundary of upper bin) ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> lower_level,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> upper_level,
- OffsetT num_pixels,
- cudaStream_t stream = 0
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., only RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i]), round down to the nearest whole number. To protect against potential overflows, if, for any channeli, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.For a given channel
cin[0, NUM_ACTIVE_CHANNELS), the ranges[d_samples, d_samples + NUM_CHANNELS * num_pixels)and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of three 256-bin RGB histograms from a quad-channel sequence of RGBA pixels (8 bits per channel per pixel)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histograms int num_pixels; // e.g., 5 unsigned char* d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2), // (0, 6, 7, 5), (3, 0, 2, 6)] int* d_histogram[3]; // e.g., three device pointers to three device buffers, // each allocated with 256 integer counters int num_levels[3]; // e.g., {257, 257, 257}; unsigned int lower_level[3]; // e.g., {0, 0, 0}; unsigned int upper_level[3]; // e.g., {256, 256, 256}; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_pixels); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_pixels); // d_histogram <-- [ [1, 0, 1, 2, 0, 0, 0, 1, 0, 0, 0, ..., 0], // [0, 3, 0, 0, 0, 0, 2, 0, 0, 0, 0, ..., 0], // [0, 0, 2, 0, 0, 0, 1, 2, 0, 0, 0, ..., 0] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]should be num_levels[i] - 1`.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin in each active channel.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin in each active channel.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT lower_level[NUM_ACTIVE_CHANNELS],
- const LevelT upper_level[NUM_ACTIVE_CHANNELS],
- OffsetT num_pixels,
- cudaStream_t stream = 0
Deprecate [Since 3.0].
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> lower_level,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> upper_level,
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., only RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i]), round down to the nearest whole number. To protect against potential overflows, if, for any channeli, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
rin[0, num_rows), and samplesin[0, num_row_pixels), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT),sample_begin = row_begin + s * NUM_CHANNELS, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS. For a given channelcin[0, NUM_ACTIVE_CHANNELS), the ranges[sample_begin, sample_end)and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of three 256-bin RGB histograms from a 2x3 region of interest of within a flattened 2x4 array of quad-channel RGBA pixels (8 bits per channel per pixel).
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histograms int num_row_pixels; // e.g., 3 int num_rows; // e.g., 2 size_t row_stride_bytes; // e.g., 4 * sizeof(unsigned char) * NUM_CHANNELS unsigned char* d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2), (-, -, -, -), // (0, 6, 7, 5), (3, 0, 2, 6), (1, 1, 1, 1), (-, -, -, -)] int* d_histogram[3]; // e.g., three device pointers to three device buffers, // each allocated with 256 integer counters int num_levels[3]; // e.g., {257, 257, 257}; unsigned int lower_level[3]; // e.g., {0, 0, 0}; unsigned int upper_level[3]; // e.g., {256, 256, 256}; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes); // d_histogram <-- [ [1, 1, 1, 2, 0, 0, 0, 1, 0, 0, 0, ..., 0], // [0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, ..., 0], // [0, 1, 2, 0, 0, 0, 1, 2, 0, 0, 0, ..., 0] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]should benum_levels[i] - 1.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin in each active channel.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin in each active channel.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT lower_level[NUM_ACTIVE_CHANNELS],
- const LevelT upper_level[NUM_ACTIVE_CHANNELS],
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0
Deprecate [Since 3.0].
Custom bin ranges
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- const LevelT *d_levels,
- OffsetT num_samples,
- cudaStream_t stream = 0
Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
The number of histogram bins is (
num_levels - 1)The value range for bini is
[level[i], level[i+1])The range
[d_histogram, d_histogram + num_levels - 1)shall not overlap[d_samples, d_samples + num_samples)nor[d_levels, d_levels + num_levels)in any way. The ranges[d_levels, d_levels + num_levels)and[d_samples, d_samples + num_samples)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of an six-bin histogram from a sequence of float samples
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histogram int num_samples; // e.g., 10 float* d_samples; // e.g., [2.2, 6.0, 7.1, 2.9, 3.5, 0.3, 2.9, 2.0, 6.1, 999.5] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels // e.g., 7 (seven level boundaries for six bins) float* d_levels; // e.g., [0.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_samples); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_samples); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1.d_levels – [in] The pointer to the array of boundaries (levels). Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_samples – [in] The number of data samples per row in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- const LevelT *d_levels,
- OffsetT num_row_samples,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0
Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
A two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins is (
num_levels - 1)The value range for bini is
[level[i], level[i+1])For a given row
rin[0, num_rows), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)androw_end = row_begin + num_row_samples. The range[d_histogram, d_histogram + num_levels - 1)shall not overlap[row_begin, row_end)nor[d_levels, d_levels + num_levels). The ranges[d_levels, d_levels + num_levels)and[row_begin, row_end)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of a six-bin histogram from a 2x5 region of interest within a flattened 2x7 array of float samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input samples and // output histogram int num_row_samples; // e.g., 5 int num_rows; // e.g., 2; int row_stride_bytes; // e.g., 7 * sizeof(float) float* d_samples; // e.g., [2.2, 6.0, 7.1, 2.9, 3.5, -, -, // 0.3, 2.9, 2.0, 6.1, 999.5, -, -] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels // e.g., 7 (seven level boundaries for six bins) float *d_levels; // e.g., [0.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_samples, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_samples, num_rows, row_stride_bytes); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1.d_levels – [in] The pointer to the array of boundaries (levels). Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<const LevelT*, NUM_ACTIVE_CHANNELS> d_levels,
- OffsetT num_pixels,
- cudaStream_t stream = 0
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)For given channels
c1andc2in[0, NUM_ACTIVE_CHANNELS), the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)shall not overlap[d_samples, d_samples + NUM_CHANNELS * num_pixels)nor[d_levels[c2], d_levels[c2] + num_levels[c2])in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])and[d_samples, d_samples + NUM_CHANNELS * num_pixels)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of three 4-bin RGB histograms from a quad-channel sequence of RGBA pixels (8 bits per channel per pixel)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histograms int num_pixels; // e.g., 5 unsigned char *d_samples; // e.g., [(2, 6, 7, 5),(3, 0, 2, 1),(7, 0, 6, 2), // (0, 6, 7, 5),(3, 0, 2, 6)] unsigned int *d_histogram[3]; // e.g., [[ -, -, -, -],[ -, -, -, -],[ -, -, -, -]]; int num_levels[3]; // e.g., {5, 5, 5}; unsigned int *d_levels[3]; // e.g., [ [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8] ]; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_pixels); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_pixels); // d_histogram <-- [ [1, 3, 0, 1], // [3, 0, 0, 2], // [0, 2, 0, 3] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]should benum_levels[i] - 1.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1.d_levels – [in] The pointers to the arrays of boundaries (levels), one for each active channel. Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT *const d_levels[NUM_ACTIVE_CHANNELS],
- OffsetT num_pixels,
- cudaStream_t stream = 0
Deprecate [Since 3.0].
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<const LevelT*, NUM_ACTIVE_CHANNELS> d_levels,
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
Added in version 2.2.0: First appears in CUDA Toolkit 12.3.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)For a given row
rin[0, num_rows), and samplesin[0, num_row_pixels), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT),sample_begin = row_begin + s * NUM_CHANNELS, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS. For given channelsc1andc2in[0, NUM_ACTIVE_CHANNELS), the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)shall not overlap[sample_begin, sample_end)nor[d_levels[c2], d_levels[c2] + num_levels[c2])in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])and[sample_begin, sample_end)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
The code snippet below illustrates the computation of three 4-bin RGB histograms from a 2x3 region of interest of within a flattened 2x4 array of quad-channel RGBA pixels (8 bits per channel per pixel).
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histograms int num_row_pixels; // e.g., 3 int num_rows; // e.g., 2 size_t row_stride_bytes; // e.g., 4 * sizeof(unsigned char) * NUM_CHANNELS unsigned char* d_samples; // e.g., [(2, 6, 7, 5),(3, 0, 2, 1),(1, 1, 1, 1),(-, -, -, -), // (7, 0, 6, 2),(0, 6, 7, 5),(3, 0, 2, 6),(-, -, -, -)] int* d_histogram[3]; // e.g., [[ -, -, -, -],[ -, -, -, -],[ -, -, -, -]]; int num_levels[3]; // e.g., {5, 5, 5}; unsigned int* d_levels[3]; // e.g., [ [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8] ]; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes); // d_histogram <-- [ [2, 3, 0, 1], // [3, 0, 0, 2], // [1, 2, 0, 3] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
intwill typically yield better performance thansize_tin 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr, the required allocation size is written totemp_storage_bytesand no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storageallocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]should benum_levels[i] - 1.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1.d_levels – [in] The pointers to the arrays of boundaries (levels), one for each active channel. Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT *const d_levels[NUM_ACTIVE_CHANNELS],
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0
Deprecate [Since 3.0].
Environment-based overloads
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t HistogramEven( - SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- LevelT lower_level,
- LevelT upper_level,
- OffsetT num_samples,
- EnvT env = {}
Computes an intensity histogram from a sequence of data samples using equal-width bins.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceThe number of histogram bins is (
num_levels - 1)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1).If the common type of
SampleTandLevelTis of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level), round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.The ranges
[d_samples, d_samples + num_samples)and[d_histogram, d_histogram + num_levels - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
auto d_samples = thrust::device_vector<int>{0, 2, 1, 0, 3, 4, 2, 1}; int num_samples = static_cast<int>(d_samples.size()); int num_levels = 6; int lower_level = 0; int upper_level = 5; auto d_histogram = thrust::device_vector<int>(num_levels - 1, 0); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::HistogramEven( thrust::raw_pointer_cast(d_samples.data()), thrust::raw_pointer_cast(d_histogram.data()), num_levels, lower_level, upper_level, num_samples, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::HistogramEven failed with status: " << error << std::endl; } thrust::device_vector<int> expected{2, 2, 2, 1, 1};
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_samples – [in] The number of input samples (i.e., the length of
d_samples)env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t HistogramEven( - SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- LevelT lower_level,
- LevelT upper_level,
- OffsetT num_row_samples,
- OffsetT num_rows,
- size_t row_stride_bytes,
- EnvT env = {}
Computes an intensity histogram from a 2D region of data samples using equal-width bins.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceA two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins is (
num_levels - 1)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1)If the common type of
SampleTandLevelTis of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level), round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
rin[0, num_rows), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)androw_end = row_begin + num_row_samples. The ranges[row_begin, row_end)and[d_histogram, d_histogram + num_levels - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
// 2D region of interest: 2 rows, 3 samples per row, row stride includes 1 padding element // Row 0: [0, 1, 2, PAD] Row 1: [1, 2, 0, PAD] auto d_samples = thrust::device_vector<int>{0, 1, 2, -1, 1, 2, 0, -1}; int num_levels = 4; // 3 bins: [0,1), [1,2), [2,3) int lower_level = 0; int upper_level = 3; int num_row_samples = 3; int num_rows = 2; size_t row_stride_bytes = 4 * sizeof(int); auto d_histogram = thrust::device_vector<int>(num_levels - 1, 0); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::HistogramEven( thrust::raw_pointer_cast(d_samples.data()), thrust::raw_pointer_cast(d_histogram.data()), num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::HistogramEven (2D) failed with status: " << error << std::endl; } // Samples: 0,1,2, 1,2,0 → bin[0]=2, bin[1]=2, bin[2]=2 thrust::device_vector<int> expected{2, 2, 2};
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples.
lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MultiHistogramEven( - SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> lower_level,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> upper_level,
- OffsetT num_pixels,
- EnvT env = {}
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceThe input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., only RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i]), round down to the nearest whole number. To protect against potential overflows, if, for any channeli, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.For a given channel
cin[0, NUM_ACTIVE_CHANNELS), the ranges[d_samples, d_samples + NUM_CHANNELS * num_pixels)and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
// 4-channel RGBA pixels, histogram 3 active channels [[maybe_unused]] constexpr int NUM_CHANNELS = 4; [[maybe_unused]] constexpr int NUM_ACTIVE_CHANNELS = 3; // clang-format off // 2 pixels: (R=0, G=2, B=1, A=255), (R=3, G=4, B=2, A=128) auto d_samples = thrust::device_vector<unsigned char>{0, 2, 1, 255, 3, 4, 2, 128}; // clang-format on int num_pixels = 2; // 5 levels per channel → 4 bins per channel: [0,1), [1,2), [2,3), [3,4) cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels = {5, 5, 5}; cuda::std::array<unsigned char, NUM_ACTIVE_CHANNELS> lower_level = {0, 0, 0}; cuda::std::array<unsigned char, NUM_ACTIVE_CHANNELS> upper_level = {4, 4, 4}; auto d_histogram_r = thrust::device_vector<int>(4, 0); auto d_histogram_g = thrust::device_vector<int>(4, 0); auto d_histogram_b = thrust::device_vector<int>(4, 0); cuda::std::array<int*, NUM_ACTIVE_CHANNELS> d_histogram = { thrust::raw_pointer_cast(d_histogram_r.data()), thrust::raw_pointer_cast(d_histogram_g.data()), thrust::raw_pointer_cast(d_histogram_b.data())}; cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::MultiHistogramEven<NUM_CHANNELS, NUM_ACTIVE_CHANNELS>( thrust::raw_pointer_cast(d_samples.data()), d_histogram, num_levels, lower_level, upper_level, num_pixels, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::MultiHistogramEven failed with status: " << error << std::endl; } // R: 0→bin[0], 3→bin[3] thrust::device_vector<int> expected_r{1, 0, 0, 1}; // G: 2→bin[2], 4→out of range thrust::device_vector<int> expected_g{0, 0, 1, 0}; // B: 1→bin[1], 2→bin[2] thrust::device_vector<int> expected_b{0, 1, 1, 0};
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the multi-channel input sequence of data samples.
d_histogram – [out] Array of active channel histogram counter output arrays, each of length
num_levels[channel] - 1.num_levels – [in] Array of the number of boundaries (levels) for each active channel.
lower_level – [in] Array of the lower sample value bound (inclusive) for the lowest bin of each active channel.
upper_level – [in] Array of the upper sample value bound (exclusive) for the highest bin of each active channel.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS)env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MultiHistogramEven( - SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> lower_level,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> upper_level,
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- EnvT env = {}
Computes per-channel intensity histograms from a 2D region of multi-channel “pixel” data samples using equal-width bins.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceThe input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., only RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i]), round down to the nearest whole number. To protect against potential overflows, if, for any channeli, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)exceeds the number representable by anuint64_t, the cuda errorcudaErrorInvalidValueis returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValuewill only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
rin[0, num_rows), and samplesin[0, num_row_pixels), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT),sample_begin = row_begin + s * NUM_CHANNELS, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS. For a given channelcin[0, NUM_ACTIVE_CHANNELS), the ranges[sample_begin, sample_end)and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible tointand trivially copyable.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
// 4-channel RGBA pixels, histogram 3 active channels, 2D region [[maybe_unused]] constexpr int NUM_CHANNELS = 4; [[maybe_unused]] constexpr int NUM_ACTIVE_CHANNELS = 3; // 2 rows, 2 pixels per row, stride includes 1 extra padding pixel per row // Row 0: (R=0, G=2, B=1, A=255), (R=3, G=4, B=2, A=128), (PAD, PAD, PAD, PAD) // Row 1: (R=1, G=1, B=3, A=200), (R=2, G=3, B=0, A=100), (PAD, PAD, PAD, PAD) auto d_samples = thrust::device_vector<unsigned char>{ 0, 2, 1, 255, 3, 4, 2, 128, 0, 0, 0, 0, 1, 1, 3, 200, 2, 3, 0, 100, 0, 0, 0, 0}; int num_row_pixels = 2; int num_rows = 2; size_t row_stride_bytes = 3 * NUM_CHANNELS * sizeof(unsigned char); // 3 pixels wide, 2 used cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels = {5, 5, 5}; cuda::std::array<unsigned char, NUM_ACTIVE_CHANNELS> lower_level = {0, 0, 0}; cuda::std::array<unsigned char, NUM_ACTIVE_CHANNELS> upper_level = {4, 4, 4}; auto d_histogram_r = thrust::device_vector<int>(4, 0); auto d_histogram_g = thrust::device_vector<int>(4, 0); auto d_histogram_b = thrust::device_vector<int>(4, 0); cuda::std::array<int*, NUM_ACTIVE_CHANNELS> d_histogram = { thrust::raw_pointer_cast(d_histogram_r.data()), thrust::raw_pointer_cast(d_histogram_g.data()), thrust::raw_pointer_cast(d_histogram_b.data())}; cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::MultiHistogramEven<NUM_CHANNELS, NUM_ACTIVE_CHANNELS>( thrust::raw_pointer_cast(d_samples.data()), d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::MultiHistogramEven (2D) failed with status: " << error << std::endl; } // R: 0, 3, 1, 2 → bin[0]=1, bin[1]=1, bin[2]=1, bin[3]=1 thrust::device_vector<int> expected_r{1, 1, 1, 1}; // G: 2, 4, 1, 3 → bin[1]=1, bin[2]=1, bin[3]=1 (4 is out of range) thrust::device_vector<int> expected_g{0, 1, 1, 1}; // B: 1, 2, 3, 0 → bin[0]=1, bin[1]=1, bin[2]=1, bin[3]=1 thrust::device_vector<int> expected_b{1, 1, 1, 1};
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the multi-channel input sequence of data samples.
d_histogram – [out] Array of active channel histogram counter output arrays, each of length
num_levels[channel] - 1.num_levels – [in] Array of the number of boundaries (levels) for each active channel.
lower_level – [in] Array of the lower sample value bound (inclusive) for the lowest bin of each active channel.
upper_level – [in] Array of the upper sample value bound (exclusive) for the highest bin of each active channel.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t HistogramRange( - SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- const LevelT *d_levels,
- OffsetT num_samples,
- EnvT env = {}
Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceThe number of histogram bins is (
num_levels - 1)The value range for bini is
[level[i], level[i+1])The range
[d_histogram, d_histogram + num_levels - 1)shall not overlap[d_samples, d_samples + num_samples)nor[d_levels, d_levels + num_levels)in any way. The ranges[d_levels, d_levels + num_levels)and[d_samples, d_samples + num_samples)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
auto d_samples = thrust::device_vector<float>{2.2f, 6.1f, 7.5f, 2.9f, 3.5f, 0.3f, 2.9f, 2.1f}; int num_samples = static_cast<int>(d_samples.size()); auto d_levels = thrust::device_vector<float>{0.0f, 2.0f, 4.0f, 6.0f, 8.0f}; int num_levels = static_cast<int>(d_levels.size()); auto d_histogram = thrust::device_vector<int>(num_levels - 1, 0); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::HistogramRange( thrust::raw_pointer_cast(d_samples.data()), thrust::raw_pointer_cast(d_histogram.data()), num_levels, thrust::raw_pointer_cast(d_levels.data()), num_samples, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::HistogramRange failed with status: " << error << std::endl; } thrust::device_vector<int> expected{1, 5, 0, 2};
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1.d_levels – [in] The pointer to the array of boundaries (levels). Bins are defined by consecutive pairs.
num_samples – [in] The number of input samples (i.e., the length of
d_samples)env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t HistogramRange( - SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- const LevelT *d_levels,
- OffsetT num_row_samples,
- OffsetT num_rows,
- size_t row_stride_bytes,
- EnvT env = {}
Computes an intensity histogram from a 2D region of data samples using the specified bin boundary levels.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceA two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins is (
num_levels - 1)The value range for bini is
[level[i], level[i+1])For a given row
rin[0, num_rows), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)androw_end = row_begin + num_row_samples. The range[d_histogram, d_histogram + num_levels - 1)shall not overlap[row_begin, row_end)nor[d_levels, d_levels + num_levels). The ranges[d_levels, d_levels + num_levels)and[row_begin, row_end)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
// 2D region of interest: 2 rows, 3 samples per row, row stride includes 1 padding element // Row 0: [0, 1, 2, PAD] Row 1: [1, 2, 0, PAD] auto d_samples = thrust::device_vector<int>{0, 1, 2, -1, 1, 2, 0, -1}; auto d_levels = thrust::device_vector<int>{0, 1, 2, 3}; // 3 bins: [0,1), [1,2), [2,3) int num_levels = static_cast<int>(d_levels.size()); int num_row_samples = 3; int num_rows = 2; size_t row_stride_bytes = 4 * sizeof(int); auto d_histogram = thrust::device_vector<int>(num_levels - 1, 0); cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::HistogramRange( thrust::raw_pointer_cast(d_samples.data()), thrust::raw_pointer_cast(d_histogram.data()), num_levels, thrust::raw_pointer_cast(d_levels.data()), num_row_samples, num_rows, row_stride_bytes, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::HistogramRange (2D) failed with status: " << error << std::endl; } // Samples: 0,1,2, 1,2,0 → bin[0]=2, bin[1]=2, bin[2]=2 thrust::device_vector<int> expected{2, 2, 2};
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1.num_levels – [in] The number of boundaries (levels) for delineating histogram samples.
d_levels – [in] The pointer to the array of boundaries (levels). Bins are defined by consecutive pairs.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MultiHistogramRange( - SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<const LevelT*, NUM_ACTIVE_CHANNELS> d_levels,
- OffsetT num_pixels,
- EnvT env = {}
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceThe input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)For given channels
c1andc2in[0, NUM_ACTIVE_CHANNELS), the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)shall not overlap[d_samples, d_samples + NUM_CHANNELS * num_pixels)nor[d_levels[c2], d_levels[c2] + num_levels[c2])in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])and[d_samples, d_samples + NUM_CHANNELS * num_pixels)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
// 4-channel RGBA pixels, histogram 3 active channels [[maybe_unused]] constexpr int NUM_CHANNELS = 4; [[maybe_unused]] constexpr int NUM_ACTIVE_CHANNELS = 3; // 2 pixels: (R=0, G=2, B=1, A=255), (R=3, G=4, B=2, A=128) auto d_samples = thrust::device_vector<unsigned char>{0, 2, 1, 255, 3, 4, 2, 128}; int num_pixels = 2; // Custom bin boundaries per channel auto d_levels_r = thrust::device_vector<unsigned char>{0, 2, 4}; // 2 bins: [0,2), [2,4) auto d_levels_g = thrust::device_vector<unsigned char>{0, 3, 5}; // 2 bins: [0,3), [3,5) auto d_levels_b = thrust::device_vector<unsigned char>{0, 1, 2, 3}; // 3 bins: [0,1), [1,2), [2,3) cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels = {3, 3, 4}; cuda::std::array<const unsigned char*, NUM_ACTIVE_CHANNELS> d_levels = { thrust::raw_pointer_cast(d_levels_r.data()), thrust::raw_pointer_cast(d_levels_g.data()), thrust::raw_pointer_cast(d_levels_b.data())}; auto d_histogram_r = thrust::device_vector<int>(2, 0); auto d_histogram_g = thrust::device_vector<int>(2, 0); auto d_histogram_b = thrust::device_vector<int>(3, 0); cuda::std::array<int*, NUM_ACTIVE_CHANNELS> d_histogram = { thrust::raw_pointer_cast(d_histogram_r.data()), thrust::raw_pointer_cast(d_histogram_g.data()), thrust::raw_pointer_cast(d_histogram_b.data())}; cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::MultiHistogramRange<NUM_CHANNELS, NUM_ACTIVE_CHANNELS>( thrust::raw_pointer_cast(d_samples.data()), d_histogram, num_levels, d_levels, num_pixels, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::MultiHistogramRange failed with status: " << error << std::endl; } // R: 0→[0,2), 3→[2,4) thrust::device_vector<int> expected_r{1, 1}; // G: 2→[0,3), 4→[3,5) thrust::device_vector<int> expected_g{1, 1}; // B: 1→[1,2), 2→[2,3) thrust::device_vector<int> expected_b{0, 1, 1};
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the multi-channel input sequence of data samples.
d_histogram – [out] Array of active channel histogram counter output arrays, each of length
num_levels[channel] - 1.num_levels – [in] Array of the number of boundaries (levels) for each active channel.
d_levels – [in] Array of pointers to the arrays of boundaries (levels) for each active channel.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS)env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT, typename EnvT = ::cuda::std::execution::env<>>
static inline cudaError_t MultiHistogramRange( - SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<const LevelT*, NUM_ACTIVE_CHANNELS> d_levels,
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- EnvT env = {}
Computes per-channel intensity histograms from a 2D region of multi-channel “pixel” data samples using the specified bin boundary levels.
Added in version 3.4.0: First appears in CUDA Toolkit 13.4.
This is an environment-based API that allows customization of:
Stream: Query via
cuda::get_streamMemory resource: Query via
cuda::mr::get_memory_resourceThe input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELSconsecutive data samples (e.g., an RGBA pixel).NUM_CHANNELScan be up to 4.Of the
NUM_CHANNELSspecified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS(e.g., RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samplescan be specified using thenum_row_samples,num_rows, androw_stride_bytesparameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0.The number of histogram bins for channeli is
num_levels[i] - 1.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)For a given row
rin[0, num_rows), and samplesin[0, num_row_pixels), letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT),sample_begin = row_begin + s * NUM_CHANNELS, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS. For given channelsc1andc2in[0, NUM_ACTIVE_CHANNELS), the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)shall not overlap[sample_begin, sample_end)nor[d_levels[c2], d_levels[c2] + num_levels[c2])in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])and[sample_begin, sample_end)may overlap.When
d_temp_storageisnullptr, no work is done and the required allocation size is returned intemp_storage_bytes. See Determining Temporary Storage Requirements for usage guidance.
Snippet#
// 4-channel RGBA pixels, histogram 3 active channels, 2D region [[maybe_unused]] constexpr int NUM_CHANNELS = 4; [[maybe_unused]] constexpr int NUM_ACTIVE_CHANNELS = 3; // 2 rows, 2 pixels per row, stride includes 1 extra padding pixel per row // Row 0: (R=0, G=2, B=1, A=255), (R=3, G=4, B=2, A=128), (PAD, PAD, PAD, PAD) // Row 1: (R=1, G=1, B=3, A=200), (R=2, G=3, B=0, A=100), (PAD, PAD, PAD, PAD) auto d_samples = thrust::device_vector<unsigned char>{ 0, 2, 1, 255, 3, 4, 2, 128, 0, 0, 0, 0, 1, 1, 3, 200, 2, 3, 0, 100, 0, 0, 0, 0}; int num_row_pixels = 2; int num_rows = 2; size_t row_stride_bytes = 3 * NUM_CHANNELS * sizeof(unsigned char); // 3 pixels wide, 2 used auto d_levels_r = thrust::device_vector<unsigned char>{0, 2, 4}; // 2 bins: [0,2), [2,4) auto d_levels_g = thrust::device_vector<unsigned char>{0, 3, 5}; // 2 bins: [0,3), [3,5) auto d_levels_b = thrust::device_vector<unsigned char>{0, 1, 2, 3}; // 3 bins: [0,1), [1,2), [2,3) cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels = {3, 3, 4}; cuda::std::array<const unsigned char*, NUM_ACTIVE_CHANNELS> d_levels = { thrust::raw_pointer_cast(d_levels_r.data()), thrust::raw_pointer_cast(d_levels_g.data()), thrust::raw_pointer_cast(d_levels_b.data())}; auto d_histogram_r = thrust::device_vector<int>(2, 0); auto d_histogram_g = thrust::device_vector<int>(2, 0); auto d_histogram_b = thrust::device_vector<int>(3, 0); cuda::std::array<int*, NUM_ACTIVE_CHANNELS> d_histogram = { thrust::raw_pointer_cast(d_histogram_r.data()), thrust::raw_pointer_cast(d_histogram_g.data()), thrust::raw_pointer_cast(d_histogram_b.data())}; cuda::stream stream{cuda::devices[0]}; cuda::stream_ref stream_ref{stream}; auto env = cuda::std::execution::env{stream_ref}; auto error = cub::DeviceHistogram::MultiHistogramRange<NUM_CHANNELS, NUM_ACTIVE_CHANNELS>( thrust::raw_pointer_cast(d_samples.data()), d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes, env); if (error != cudaSuccess) { std::cerr << "cub::DeviceHistogram::MultiHistogramRange (2D) failed with status: " << error << std::endl; } // R: 0, 3, 1, 2 → [0,2)=2, [2,4)=2 thrust::device_vector<int> expected_r{2, 2}; // G: 2, 4, 1, 3 → [0,3)=2, [3,5)=2 thrust::device_vector<int> expected_g{2, 2}; // B: 1, 2, 3, 0 → [0,1)=1, [1,2)=1, [2,3)=1 (3 is out of range) thrust::device_vector<int> expected_b{1, 1, 1};
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc.
EnvT – [inferred] Environment type (e.g.,
cuda::std::execution::env<...>)
- Parameters:
d_samples – [in] The pointer to the multi-channel input sequence of data samples.
d_histogram – [out] Array of active channel histogram counter output arrays, each of length
num_levels[channel] - 1.num_levels – [in] Array of the number of boundaries (levels) for each active channel.
d_levels – [in] Array of pointers to the arrays of boundaries (levels) for each active channel.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
env – [in]
[optional] Execution environment. Default is
cuda::std::execution::env{}.