cuda::stream_ref: a wrapper around a cudaStream_t

CUDA stream-ordered allocations rely on cudaStream_t as a handle to the cuda stream.

However, as this is just an alias for a plain pointer type it carries with it a lot of common pitfals around implicit conversions from e.g nullptr or a literal 0.

These hard to spot bugs can be avoided through cuda::stream_ref, which is a simple wrapper around a cudaStream_t that prevents implicit conversions. It also provides the wait() and ready() member functions to facilitate waiting for a stream to finish and checking whether it is finished.

cudaStream_t stream;
cudaStreamCreate(&stream);
cuda::stream_ref ref{stream};

ref.wait();          // synchronizes the stream via cudaStreamSynchronize
assert(ref.ready()); // verifies that the stream has finished all operations via cudaStreamQuery
cudaStreamDestroy(stream);