cuda::device::warp_match_all#
Defined in <cuda/warp> header.
namespace cuda::device {
template <typename T>
[[nodiscard]] __device__ bool
warp_match_all(const T& data, lane_mask = lane_mask::all());
} // namespace cuda::device
The functionality provides a generalized and safe alternative to CUDA warp match all intrinsic __match_all_sync.
The function allows bitwise comparison of any data size, including raw arrays, pointers, and structs.
Note
The underlying CUDA intrinsic does not provide memory ordering.
Parameters
data: data to compare.lane_mask: mask of the active lanes.
Return value
trueif all lanes in thelane_maskhave the same value fordata.falseotherwise.
Constraints
Tshall be trivially copyable, see cuda::is_trivially_copyable.Tshall be bitwise comparable, see cuda::is_bitwise_comparable, except when__builtin_clear_paddingis supported. In the latter case,Tcan have padding bits.
Preconditions
The functionality is only supported on
SM >= 70.lane_maskmust be non-zero.
Undefined Behavior
lane_maskmust represent a subset of the active lanes.All non-exited lanes specified by
lane_maskmust execute the function with the samelane_maskvalue.
Performance considerations
The function calls the PTX instruction
match.sync\(ceil\left(\frac{sizeof(data)}{4}\right)\) times.The function is faster when called with a mask representing all active lanes in a warp (default value of the second parameter
lane_mask).The function uses
__ballot_syncwhenTisbool.
References
Example#
#include <cuda/std/array>
#include <cuda/std/cassert>
#include <cuda/warp>
struct MyStruct {
double x; // 8 bytes
int y; // 4 bytes
}; // 4 bytes of padding
__global__ void warp_match_kernel() {
assert(cuda::device::warp_match_all(2));
assert(cuda::device::warp_match_all(2, cuda::device::lane_mask::all()));
assert(cuda::device::warp_match_all(MyStruct{1.0, 3})); // compile error, except when __builtin_clear_padding is supported
assert(!cuda::device::warp_match_all(threadIdx.x));
}
int main() {
warp_match_kernel<<<1, 32>>>();
cudaDeviceSynchronize();
return 0;
}