Defined in cub/util_ptx.cuh
Compute a 32-bit mask of the threads in the warp whose label has the same LABEL_BITS least-significant bits as the calling thread's label.
label
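
To make the mask semantics concrete, the following is a minimal host-side reference model, not CUB's device implementation. It assumes a warp of 32 lanes and that bit i of the result corresponds to lane i. The names `match_any_reference`, `labels`, and `lane` are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side model of the mask described above.
// Assumption: 32 lanes per warp, and bit i of the result stands for lane i.
// This is a plain loop, not the warp-level code used on the device.
template <int LABEL_BITS>
std::uint32_t match_any_reference(const std::array<std::uint32_t, 32>& labels,
                                  int lane)
{
    static_assert(LABEL_BITS >= 1 && LABEL_BITS <= 32,
                  "LABEL_BITS must be in [1, 32]");

    // Keep only the LABEL_BITS least-significant bits of each label.
    // The 64-bit shift avoids undefined behavior when LABEL_BITS == 32.
    const std::uint32_t low_bits = static_cast<std::uint32_t>(
        (std::uint64_t{1} << LABEL_BITS) - 1);

    const std::uint32_t mine = labels[lane] & low_bits;

    std::uint32_t mask = 0;
    for (int i = 0; i < 32; ++i)
    {
        // Set bit i when lane i agrees with the calling lane on the low bits.
        if ((labels[i] & low_bits) == mine)
        {
            mask |= (std::uint32_t{1} << i);
        }
    }
    return mask;
}
```

In this model the calling lane always matches itself, so its own bit is always set. If lane i holds label i, for instance, comparing only the lowest bit groups the even lanes together and the odd lanes together, while comparing five bits leaves every lane in a group of its own.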