cub::DeviceHistogram#
-
struct DeviceHistogram#
DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of samples data residing within device-accessible memory.
Overview#
A histogram counts the number of observations that fall into each of the disjoint categories (known as bins).
Usage Considerations#
Dynamic parallelism. DeviceHistogram methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.
Evenly-segmented bin ranges
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- LevelT lower_level,
- LevelT upper_level,
- OffsetT num_samples,
- cudaStream_t stream = 0,
Computes an intensity histogram from a sequence of data samples using equal-width bins.
The number of histogram bins is (
num_levels - 1
)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1)
.If the common type of
SampleT
andLevelT
is of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level)
, round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.The ranges
[d_samples, d_samples + num_samples)
and[d_histogram, d_histogram + num_levels - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of a six-bin histogram from a sequence of float samples
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histogram int num_samples; // e.g., 10 float* d_samples; // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, 0.3, 2.9, 2.1, 6.1, 999.5] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels; // e.g., 7 (seven level boundaries for six bins) float lower_level; // e.g., 0.0 (lower sample value boundary of lowest bin) float upper_level; // e.g., 12.0 (upper sample value boundary of upper bin) ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_samples); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_samples – [in] The number of input samples (i.e., the length of
d_samples
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- LevelT lower_level,
- LevelT upper_level,
- OffsetT num_row_samples,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0,
Computes an intensity histogram from a sequence of data samples using equal-width bins.
A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins is (
num_levels - 1
)All bins comprise the same width of sample values:
(upper_level - lower_level) / (num_levels - 1)
If the common type of
SampleT
andLevelT
is of integral type, the bin for a sample is computed as(sample - lower_level) * (num_levels - 1) / (upper_level - lower_level)
, round down to the nearest whole number. To protect against potential overflows, if the product(upper_level - lower_level) * (num_levels - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
r
in[0, num_rows)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
androw_end = row_begin + num_row_samples
. The ranges[row_begin, row_end)
and[d_histogram, d_histogram + num_levels - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of a six-bin histogram from a 2x5 region of interest within a flattened 2x7 array of float samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histogram int num_row_samples; // e.g., 5 int num_rows; // e.g., 2; size_t row_stride_bytes; // e.g., 7 * sizeof(float) float* d_samples; // e.g., [2.2, 6.1, 7.1, 2.9, 3.5, -, -, // 0.3, 2.9, 2.1, 6.1, 999.5, -, -] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels; // e.g., 7 (seven level boundaries for six bins) float lower_level; // e.g., 0.0 (lower sample value boundary of lowest bin) float upper_level; // e.g., 12.0 (upper sample value boundary of upper bin) ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramEven( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_samples, num_rows, row_stride_bytes); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> lower_level,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> upper_level,
- OffsetT num_pixels,
- cudaStream_t stream = 0,
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., only RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i])
, round down to the nearest whole number. To protect against potential overflows, if, for any channeli
, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.For a given channel
c
in[0, NUM_ACTIVE_CHANNELS)
, the ranges[d_samples, d_samples + NUM_CHANNELS * num_pixels)
and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of three 256-bin RGB histograms from a quad-channel sequence of RGBA pixels (8 bits per channel per pixel)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histograms int num_pixels; // e.g., 5 unsigned char* d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2), // (0, 6, 7, 5), (3, 0, 2, 6)] int* d_histogram[3]; // e.g., three device pointers to three device buffers, // each allocated with 256 integer counters int num_levels[3]; // e.g., {257, 257, 257}; unsigned int lower_level[3]; // e.g., {0, 0, 0}; unsigned int upper_level[3]; // e.g., {256, 256, 256}; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_pixels); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_pixels); // d_histogram <-- [ [1, 0, 1, 2, 0, 0, 0, 1, 0, 0, 0, ..., 0], // [0, 3, 0, 0, 0, 0, 2, 0, 0, 0, 0, ..., 0], // [0, 0, 2, 0, 0, 0, 1, 2, 0, 0, 0, ..., 0] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should be num_levels[i] - 1`.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin in each active channel.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin in each active channel.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT lower_level[NUM_ACTIVE_CHANNELS],
- const LevelT upper_level[NUM_ACTIVE_CHANNELS],
- OffsetT num_pixels,
- cudaStream_t stream = 0,
Deprecate [Since 3.0].
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> lower_level,
- ::cuda::std::array<LevelT, NUM_ACTIVE_CHANNELS> upper_level,
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0,
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using equal-width bins.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., only RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
If the common type of sample and level is of integral type, the bin for a sample is computed as
(sample - lower_level[i]) * (num_levels - 1) / (upper_level[i] - lower_level[i])
, round down to the nearest whole number. To protect against potential overflows, if, for any channeli
, the product(upper_level[i] - lower_level[i]) * (num_levels[i] - 1)
exceeds the number representable by anuint64_t
, the cuda errorcudaErrorInvalidValue
is returned. If the common type is 128 bits wide, bin computation will use 128-bit arithmetic andcudaErrorInvalidValue
will only be returned if bin computation would overflow for 128-bit arithmetic.For a given row
r
in[0, num_rows)
, and samples
in[0, num_row_pixels)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
,sample_begin = row_begin + s * NUM_CHANNELS
, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS
. For a given channelc
in[0, NUM_ACTIVE_CHANNELS)
, the ranges[sample_begin, sample_end)
and[d_histogram[c], d_histogram[c] + num_levels[c] - 1)
shall not overlap in any way.cuda::std::common_type<LevelT, SampleT>
must be valid, and both LevelT and SampleT must be valid arithmetic types. The common type must be convertible toint
and trivially copyable.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of three 256-bin RGB histograms from a 2x3 region of interest of within a flattened 2x4 array of quad-channel RGBA pixels (8 bits per channel per pixel).
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histograms int num_row_pixels; // e.g., 3 int num_rows; // e.g., 2 size_t row_stride_bytes; // e.g., 4 * sizeof(unsigned char) * NUM_CHANNELS unsigned char* d_samples; // e.g., [(2, 6, 7, 5), (3, 0, 2, 1), (7, 0, 6, 2), (-, -, -, -), // (0, 6, 7, 5), (3, 0, 2, 6), (1, 1, 1, 1), (-, -, -, -)] int* d_histogram[3]; // e.g., three device pointers to three device buffers, // each allocated with 256 integer counters int num_levels[3]; // e.g., {257, 257, 257}; unsigned int lower_level[3]; // e.g., {0, 0, 0}; unsigned int upper_level[3]; // e.g., {256, 256, 256}; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramEven<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, lower_level, upper_level, num_row_pixels, num_rows, row_stride_bytes); // d_histogram <-- [ [1, 1, 1, 2, 0, 0, 0, 1, 0, 0, 0, ..., 0], // [0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, ..., 0], // [0, 1, 2, 0, 0, 0, 1, 2, 0, 0, 0, ..., 0] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should benum_levels[i] - 1
.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.lower_level – [in] The lower sample value bound (inclusive) for the lowest histogram bin in each active channel.
upper_level – [in] The upper sample value bound (exclusive) for the highest histogram bin in each active channel.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramEven( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT lower_level[NUM_ACTIVE_CHANNELS],
- const LevelT upper_level[NUM_ACTIVE_CHANNELS],
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0,
Deprecate [Since 3.0].
Custom bin ranges
end member group
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- const LevelT *d_levels,
- OffsetT num_samples,
- cudaStream_t stream = 0,
Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
The number of histogram bins is (
num_levels - 1
)The value range for bini is
[level[i], level[i+1])
The range
[d_histogram, d_histogram + num_levels - 1)
shall not overlap[d_samples, d_samples + num_samples)
nor[d_levels, d_levels + num_levels)
in any way. The ranges[d_levels, d_levels + num_levels)
and[d_samples, d_samples + num_samples)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of an six-bin histogram from a sequence of float samples
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histogram int num_samples; // e.g., 10 float* d_samples; // e.g., [2.2, 6.0, 7.1, 2.9, 3.5, 0.3, 2.9, 2.0, 6.1, 999.5] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels // e.g., 7 (seven level boundaries for six bins) float* d_levels; // e.g., [0.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_samples); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_samples); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.d_levels – [in] The pointer to the array of boundaries (levels). Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_samples – [in] The number of data samples per row in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t HistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram,
- int num_levels,
- const LevelT *d_levels,
- OffsetT num_row_samples,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0,
Computes an intensity histogram from a sequence of data samples using the specified bin boundary levels.
A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins is (
num_levels - 1
)The value range for bini is
[level[i], level[i+1])
For a given row
r
in[0, num_rows)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
androw_end = row_begin + num_row_samples
. The range[d_histogram, d_histogram + num_levels - 1)
shall not overlap[row_begin, row_end)
nor[d_levels, d_levels + num_levels)
. The ranges[d_levels, d_levels + num_levels)
and[row_begin, row_end)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of a six-bin histogram from a 2x5 region of interest within a flattened 2x7 array of float samples.
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input samples and // output histogram int num_row_samples; // e.g., 5 int num_rows; // e.g., 2; int row_stride_bytes; // e.g., 7 * sizeof(float) float* d_samples; // e.g., [2.2, 6.0, 7.1, 2.9, 3.5, -, -, // 0.3, 2.9, 2.0, 6.1, 999.5, -, -] int* d_histogram; // e.g., [ -, -, -, -, -, -] int num_levels // e.g., 7 (seven level boundaries for six bins) float *d_levels; // e.g., [0.0, 2.0, 4.0, 6.0, 8.0, 12.0, 16.0] ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_samples, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::HistogramRange( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_samples, num_rows, row_stride_bytes); // d_histogram <-- [1, 5, 0, 3, 0, 0];
- Template Parameters:
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the input sequence of data samples.
d_histogram – [out] The pointer to the histogram counter output array of length
num_levels - 1
.num_levels – [in] The number of boundaries (levels) for delineating histogram samples. Implies that the number of bins is
num_levels - 1
.d_levels – [in] The pointer to the array of boundaries (levels). Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_row_samples – [in] The number of data samples per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<const LevelT*, NUM_ACTIVE_CHANNELS> d_levels,
- OffsetT num_pixels,
- cudaStream_t stream = 0,
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., RGB histograms from RGBA pixel samples).The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
For given channels
c1
andc2
in[0, NUM_ACTIVE_CHANNELS)
, the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)
shall not overlap[d_samples, d_samples + NUM_CHANNELS * num_pixels)
nor[d_levels[c2], d_levels[c2] + num_levels[c2])
in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])
and[d_samples, d_samples + NUM_CHANNELS * num_pixels)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of three 4-bin RGB histograms from a quad-channel sequence of RGBA pixels (8 bits per channel per pixel)
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for // input samples and output histograms int num_pixels; // e.g., 5 unsigned char *d_samples; // e.g., [(2, 6, 7, 5),(3, 0, 2, 1),(7, 0, 6, 2), // (0, 6, 7, 5),(3, 0, 2, 6)] unsigned int *d_histogram[3]; // e.g., [[ -, -, -, -],[ -, -, -, -],[ -, -, -, -]]; int num_levels[3]; // e.g., {5, 5, 5}; unsigned int *d_levels[3]; // e.g., [ [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8] ]; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_pixels); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_pixels); // d_histogram <-- [ [1, 3, 0, 1], // [3, 0, 0, 2], // [0, 2, 0, 3] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should benum_levels[i] - 1
.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.d_levels – [in] The pointers to the arrays of boundaries (levels), one for each active channel. Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_pixels – [in] The number of multi-channel pixels (i.e., the length of
d_samples / NUM_CHANNELS
)stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT *const d_levels[NUM_ACTIVE_CHANNELS],
- OffsetT num_pixels,
- cudaStream_t stream = 0,
Deprecate [Since 3.0].
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- ::cuda::std::array<CounterT*, NUM_ACTIVE_CHANNELS> d_histogram,
- ::cuda::std::array<int, NUM_ACTIVE_CHANNELS> num_levels,
- ::cuda::std::array<const LevelT*, NUM_ACTIVE_CHANNELS> d_levels,
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0,
Computes per-channel intensity histograms from a sequence of multi-channel “pixel” data samples using the specified bin boundary levels.
The input is a sequence of pixel structures, where each pixel comprises a record of
NUM_CHANNELS
consecutive data samples (e.g., an RGBA pixel).NUM_CHANNELS
can be up to 4.Of the
NUM_CHANNELS
specified, the function will only compute histograms for the firstNUM_ACTIVE_CHANNELS
(e.g., RGB histograms from RGBA pixel samples).A two-dimensional region of interest within
d_samples
can be specified using thenum_row_samples
,num_rows
, androw_stride_bytes
parameters.The row stride must be a whole multiple of the sample data type size, i.e.,
(row_stride_bytes % sizeof(SampleT)) == 0
.The number of histogram bins for channeli is
num_levels[i] - 1
.For channeli, the range of values for all histogram bins have the same width:
(upper_level[i] - lower_level[i]) / (num_levels[i] - 1)
For a given row
r
in[0, num_rows)
, and samples
in[0, num_row_pixels)
, letrow_begin = d_samples + r * row_stride_bytes / sizeof(SampleT)
,sample_begin = row_begin + s * NUM_CHANNELS
, andsample_end = sample_begin + NUM_ACTIVE_CHANNELS
. For given channelsc1
andc2
in[0, NUM_ACTIVE_CHANNELS)
, the range[d_histogram[c1], d_histogram[c1] + num_levels[c1] - 1)
shall not overlap[sample_begin, sample_end)
nor[d_levels[c2], d_levels[c2] + num_levels[c2])
in any way. The ranges[d_levels[c2], d_levels[c2] + num_levels[c2])
and[sample_begin, sample_end)
may overlap.When
d_temp_storage
isnullptr
, no work is done and the required allocation size is returned intemp_storage_bytes
.
Snippet#
The code snippet below illustrates the computation of three 4-bin RGB histograms from a 2x3 region of interest of within a flattened 2x4 array of quad-channel RGBA pixels (8 bits per channel per pixel).
#include <cub/cub.cuh> // or equivalently <cub/device/device_histogram.cuh> // Declare, allocate, and initialize device-accessible pointers for input // samples and output histograms int num_row_pixels; // e.g., 3 int num_rows; // e.g., 2 size_t row_stride_bytes; // e.g., 4 * sizeof(unsigned char) * NUM_CHANNELS unsigned char* d_samples; // e.g., [(2, 6, 7, 5),(3, 0, 2, 1),(1, 1, 1, 1),(-, -, -, -), // (7, 0, 6, 2),(0, 6, 7, 5),(3, 0, 2, 6),(-, -, -, -)] int* d_histogram[3]; // e.g., [[ -, -, -, -],[ -, -, -, -],[ -, -, -, -]]; int num_levels[3]; // e.g., {5, 5, 5}; unsigned int* d_levels[3]; // e.g., [ [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8], // [0, 2, 4, 6, 8] ]; ... // Determine temporary device storage requirements void* d_temp_storage = nullptr; size_t temp_storage_bytes = 0; cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes); // Allocate temporary storage cudaMalloc(&d_temp_storage, temp_storage_bytes); // Compute histograms cub::DeviceHistogram::MultiHistogramRange<4, 3>( d_temp_storage, temp_storage_bytes, d_samples, d_histogram, num_levels, d_levels, num_row_pixels, num_rows, row_stride_bytes); // d_histogram <-- [ [2, 3, 0, 1], // [3, 0, 0, 2], // [1, 2, 0, 3] ]
- Template Parameters:
NUM_CHANNELS – Number of channels interleaved in the input data (may be greater than the number of channels being actively histogrammed)
NUM_ACTIVE_CHANNELS – [inferred] Number of channels actively being histogrammed
SampleIteratorT – [inferred] Random-access input iterator type for reading input samples. (may be a simple pointer type)
CounterT – [inferred] Integer type for histogram bin counters
LevelT – [inferred] Type for specifying boundaries (levels)
OffsetT – [inferred] Signed integer type for sequence offsets, list lengths, pointer differences, etc. (Consider using 32-bit values as offsets/lengths/etc. For example,
int
will typically yield better performance thansize_t
in 64-bit memory mode.)
- Parameters:
d_temp_storage – [in] Device-accessible allocation of temporary storage. When
nullptr
, the required allocation size is written totemp_storage_bytes
and no work is done.temp_storage_bytes – [inout] Reference to size in bytes of
d_temp_storage
allocationd_samples – [in] The pointer to the multi-channel input sequence of data samples. The samples from different channels are assumed to be interleaved (e.g., an array of 32-bit pixels where each pixel consists of four RGBA 8-bit samples).
d_histogram – [out]
The pointers to the histogram counter output arrays, one for each active channel. For channeli, the allocation length of
d_histogram[i]
should benum_levels[i] - 1
.num_levels – [in]
The number of boundaries (levels) for delineating histogram samples in each active channel. Implies that the number of bins for channeli is
num_levels[i] - 1
.d_levels – [in] The pointers to the arrays of boundaries (levels), one for each active channel. Bin ranges are defined by consecutive boundary pairings: lower sample value boundaries are inclusive and upper sample value boundaries are exclusive.
num_row_pixels – [in] The number of multi-channel pixels per row in the region of interest
num_rows – [in] The number of rows in the region of interest
row_stride_bytes – [in] The number of bytes between starts of consecutive rows in the region of interest
stream – [in]
[optional] CUDA stream to launch kernels within. Default is stream0.
-
template<int NUM_CHANNELS, int NUM_ACTIVE_CHANNELS, typename SampleIteratorT, typename CounterT, typename LevelT, typename OffsetT>
static inline cudaError_t MultiHistogramRange( - void *d_temp_storage,
- size_t &temp_storage_bytes,
- SampleIteratorT d_samples,
- CounterT *d_histogram[NUM_ACTIVE_CHANNELS],
- const int num_levels[NUM_ACTIVE_CHANNELS],
- const LevelT *const d_levels[NUM_ACTIVE_CHANNELS],
- OffsetT num_row_pixels,
- OffsetT num_rows,
- size_t row_stride_bytes,
- cudaStream_t stream = 0,
Deprecate [Since 3.0].