cub::DeviceHistogram
Defined in cub/device/device_histogram.cuh
-
struct DeviceHistogram
DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of samples data residing within device-accessible memory.
Overview
A histogram counts the number of observations that fall into each of the disjoint categories (known as bins).
Usage Considerations
Dynamic parallelism. DeviceHistogram methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
Evenly-segmented bin ranges
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramEven(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram, int num_levels, LevelT lower_level, LevelT upper_level, OffsetT num_samples, cudaStream_t stream = 0) Computes an intensity histogram from a sequence of data samples using equal-width bins.
The number of histogram bins is (
num_levels - 1
)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1)
.If the common type of
SampleT
andLevelT
is of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level)
, round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.The ranges
[d_samples, d_samples + num_samples)
and[d_histogram, d_histogram + num_levels - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of a six-bin histogram from a sequence of float samples
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histogram int num_samples; // e.g., 10 float* d_samples; // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, 0.3, 2.9, 2.1, 6.1, 999.5] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels; // e.g., 7 (seven level boundaries for six bins) float lower_level; // e.g., 0.0 (lower sample value boundary of lowest bin) float upper_level; // e.g., 12.0 (upper sample value boundary of upper bin) ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_samples – [in] The number of input samples (i.e., the length of
d_samples
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramEven(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram, int num_levels, LevelT lower_level, LevelT upper_level, OffsetT num_row_samples, OffsetT num_rows, size_t row_stride_bytes, cudaStream_t stream = 0) Computes an intensity histogram from a sequence of data samples using equal-width bins.
A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins is (
num_levels - 1
)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1)
If the common type of
SampleT
andLevelT
is of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level)
, round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
r
in[0, num_rows)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
androw_end = row_begin + num_row_samples
. The ranges[row_begin, row_end)
and[d_histogram, d_histogram + num_levels - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of a six-bin histogram from a 2x5 region of interest within a flattened 2x7 array of float samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histogram int num_row_samples; // e.g., 5 int num_rows; // e.g., 2; size_t row_stride_bytes; // e.g., 7 * sizeof(float) float* d_samples; // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, -, -, // 0.3, 2.9, 2.1, 6.1, 999.5, -, -] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels; // e.g., 7 (seven level boundaries for six bins) float lower_level; // e.g., 0.0 (lower sample value boundary of lowest bin) float upper_level; // e.g., 12.0 (upper sample value boundary of upper bin) ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram[NUM_ACTIVE_CHANNELS], const int num_levels[NUM_ACTIVE_CHANNELS], const LevelT lower_level[NUM_ACTIVE_CHANNELS], const LevelT upper_level[NUM_ACTIVE_CHANNELS], OffsetT num_pixels, cudaStream_t stream = 0) Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., only RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i])
, round down to the nearest whole number. To protect against potential overflows, if, for any channeli
, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.For a given channel
c
in[0, NUM_ACTIVE_CHANNELS)
, the ranges[d_samples, d_samples + NUM_CHANNELS * num_pixels)
and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of three 256-bin RGB histograms from a quad-channel sequence of RGBA pixels (8 bits per channel per pixel)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histograms int num_pixels; // e.g., 5 unsigned char* d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2), // (0, 6, 7, 5), (3, 0, 2, 6)] int* d_histogram[3]; // e.g., three device pointers to three device buffers, // each allocated with 256 integer counters int num_levels[3]; // e.g., {257, 257, 257}; unsigned int lower_level[3]; // e.g., {0, 0, 0}; unsigned int upper_level[3]; // e.g., {256, 256, 256}; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_pixels); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_pixels); // d_histogram <-- [ [1, 0, 1, 2, 0, 0, 0, 1, 0, 0, 0, ..., 0], // [0, 3, 0, 0, 0, 0, 2, 0, 0, 0, 0, ..., 0], // [0, 0, 2, 0, 0, 0, 1, 2, 0, 0, 0, ..., 0] ]
- Template Parameters
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should be num_levels[i] - 1`.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin in each active channel.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin in each active channel.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram[NUM_ACTIVE_CHANNELS], const int num_levels[NUM_ACTIVE_CHANNELS], const LevelT lower_level[NUM_ACTIVE_CHANNELS], const LevelT upper_level[NUM_ACTIVE_CHANNELS], OffsetT num_row_pixels, OffsetT num_rows, size_t row_stride_bytes, cudaStream_t stream = 0) Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., only RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i])
, round down to the nearest whole number. To protect against potential overflows, if, for any channeli
, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
r
in[0, num_rows)
, and samples
in[0, num_row_pixels)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
,sample_begin = row_begin + s * NUM_CHANNELS
, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS
. For a given channelc
in[0, NUM_ACTIVE_CHANNELS)
, the ranges[sample_begin, sample_end)
and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of three 256-bin RGB histograms from a 2x3 region of interest of within a flattened 2x4 array of quad-channel RGBA pixels (8 bits per channel per pixel).
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histograms int num_row_pixels; // e.g., 3 int num_rows; // e.g., 2 size_t row_stride_bytes; // e.g., 4 * sizeof(unsigned char) * NUM_CHANNELS unsigned char* d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2), (-, -, -, -), // (0, 6, 7, 5), (3, 0, 2, 6), (1, 1, 1, 1), (-, -, -, -)] int* d_histogram[3]; // e.g., three device pointers to three device buffers, // each allocated with 256 integer counters int num_levels[3]; // e.g., {257, 257, 257}; unsigned int lower_level[3]; // e.g., {0, 0, 0}; unsigned int upper_level[3]; // e.g., {256, 256, 256}; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes); // d_histogram <-- [ [1, 1, 1, 2, 0, 0, 0, 1, 0, 0, 0, ..., 0], // [0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, ..., 0], // [0, 1, 2, 0, 0, 0, 1, 2, 0, 0, 0, ..., 0] ]
- Template Parameters
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should benum_levels[i] - 1
.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin in each active channel.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin in each active channel.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
Custom bin ranges
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramRange(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram, int num_levels, const LevelT *d_levels, OffsetT num_samples, cudaStream_t stream = 0) Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
The number of histogram bins is (
num_levels - 1
)The value range for bini is
[level[i], level[i+1])
The range
[d_histogram, d_histogram + num_levels - 1)
shall not overlap[d_samples, d_samples + num_samples)
nor[d_levels, d_levels + num_levels)
in any way. The ranges[d_levels, d_levels + num_levels)
and[d_samples, d_samples + num_samples)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of an six-bin histogram from a sequence of float samples
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histogram int num_samples; // e.g., 10 float* d_samples; // e.g., [2.2, 6.0, 7.1, 2.9, 3.5, 0.3, 2.9, 2.0, 6.1, 999.5] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels // e.g., 7 (seven level boundaries for six bins) float* d_levels; // e.g., [0.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_samples); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_samples); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.d_levels – [in] The pointer to the array of boundaries (levels). Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_samples – [in] The number of data samples per row in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramRange(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram, int num_levels, const LevelT *d_levels, OffsetT num_row_samples, OffsetT num_rows, size_t row_stride_bytes, cudaStream_t stream = 0) Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins is (
num_levels - 1
)The value range for bini is
[level[i], level[i+1])
For a given row
r
in[0, num_rows)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
androw_end = row_begin + num_row_samples
. The range[d_histogram, d_histogram + num_levels - 1)
shall not overlap[row_begin, row_end)
nor[d_levels, d_levels + num_levels)
. The ranges[d_levels, d_levels + num_levels)
and[row_begin, row_end)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of a six-bin histogram from a 2x5 region of interest within a flattened 2x7 array of float samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input samples and // output histogram int num_row_samples; // e.g., 5 int num_rows; // e.g., 2; int row_stride_bytes; // e.g., 7 * sizeof(float) float* d_samples; // e.g., [2.2, 6.0, 7.1, 2.9, 3.5, -, -, // 0.3, 2.9, 2.0, 6.1, 999.5, -, -] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels // e.g., 7 (seven level boundaries for six bins) float *d_levels; // e.g., [0.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_samples, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_samples, num_rows, row_stride_bytes); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.d_levels – [in] The pointer to the array of boundaries (levels). Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram[NUM_ACTIVE_CHANNELS], const int num_levels[NUM_ACTIVE_CHANNELS], const LevelT *const d_levels[NUM_ACTIVE_CHANNELS], OffsetT num_pixels, cudaStream_t stream = 0) Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
For given channels
c1
andc2
in[0, NUM_ACTIVE_CHANNELS)
, the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)
shall not overlap[d_samples, d_samples + NUM_CHANNELS * num_pixels)
nor[d_levels[c2], d_levels[c2] + num_levels[c2])
in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])
and[d_samples, d_samples + NUM_CHANNELS * num_pixels)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of three 4-bin RGB histograms from a quad-channel sequence of RGBA pixels (8 bits per channel per pixel)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histograms int num_pixels; // e.g., 5 unsigned char *d_samples; // e.g., [(2, 6, 7, 5),(3, 0, 2, 1),(7, 0, 6, 2), // (0, 6, 7, 5),(3, 0, 2, 6)] unsigned int *d_histogram[3]; // e.g., [[ -, -, -, -],[ -, -, -, -],[ -, -, -, -]]; int num_levels[3]; // e.g., {5, 5, 5}; unsigned int *d_levels[3]; // e.g., [ [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8] ]; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_pixels); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_pixels); // d_histogram <-- [ [1, 3, 0, 1], // [3, 0, 0, 2], // [0, 2, 0, 3] ]
- Template Parameters
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should benum_levels[i] - 1
.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.d_levels – [in] The pointers to the arrays of boundaries (levels), one for each active channel. Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange(void *d_temp_storage, size_t &temp_storage_bytes, SampleIteratorT d_samples, CounterT *d_histogram[NUM_ACTIVE_CHANNELS], const int num_levels[NUM_ACTIVE_CHANNELS], const LevelT *const d_levels[NUM_ACTIVE_CHANNELS], OffsetT num_row_pixels, OffsetT num_rows, size_t row_stride_bytes, cudaStream_t stream = 0) Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
For a given row
r
in[0, num_rows)
, and samples
in[0, num_row_pixels)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
,sample_begin = row_begin + s * NUM_CHANNELS
, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS
. For given channelsc1
andc2
in[0, NUM_ACTIVE_CHANNELS)
, the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)
shall not overlap[sample_begin, sample_end)
nor[d_levels[c2], d_levels[c2] + num_levels[c2])
in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])
and[sample_begin, sample_end)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet
The code snippet below illustrates the computation of three 4-bin RGB histograms from a 2x3 region of interest of within a flattened 2x4 array of quad-channel RGBA pixels (8 bits per channel per pixel).
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histograms int num_row_pixels; // e.g., 3 int num_rows; // e.g., 2 size_t row_stride_bytes; // e.g., 4 * sizeof(unsigned char) * NUM_CHANNELS unsigned char* d_samples; // e.g., [(2, 6, 7, 5),(3, 0, 2, 1),(1, 1, 1, 1),(-, -, -, -), // (7, 0, 6, 2),(0, 6, 7, 5),(3, 0, 2, 6),(-, -, -, -)] int* d_histogram[3]; // e.g., [[ -, -, -, -],[ -, -, -, -],[ -, -, -, -]]; int num_levels[3]; // e.g., {5, 5, 5}; unsigned int* d_levels[3]; // e.g., [ [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8] ]; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes); // d_histogram <-- [ [2, 3, 0, 1], // [3, 0, 0, 2], // [1, 2, 0, 3] ]
- Template Parameters
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should benum_levels[i] - 1
.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.d_levels – [in] The pointers to the arrays of boundaries (levels), one for each active channel. Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.