cuda::experimental::stf::cuda_try

Overloads

cuda_try(status, loc=::cuda::std::source_location::current())

template<typename Status>
void cuda::experimental::stf::cuda_try(
Status status,
const ::cuda::std::source_location loc = ::cuda::std::source_location::current()
)

Throws a cuda_exception if the given status is an error code; a success status has no effect.

The typical usage is to wrap a CUDA API call, i.e. cuda_try(cudaFunc(args)), the same way cuda_safe_call would be used. For example, cuda_try(cudaStreamCreate(&stream)) performs the same call as cudaStreamCreate(&stream), but throws an exception if the call returns an error.

  cuda_try(CUDA_SUCCESS); // success: no effect
  int dev;
  cuda_try(cudaGetDevice(&dev)); // throws cuda_exception if cudaGetDevice fails
  try
  {
    cuda_try(CUDA_ERROR_INVALID_VALUE); // throws cuda_exception
  }
  catch (...)
  {
    // This point will be reached
    return;
  }
  EXPECT(false, "Should not get here.");
Template Parameters:

Status – CUDA error code type, such as cudaError_t, cublasStatus_t, or cusolverStatus_t

Parameters:
  • status – CUDA error code value, usually the result of a CUDA API call

  • loc – source location of the call; defaults to the call site via ::cuda::std::source_location::current()
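The runtime-status overload comes down to comparing the status against its success value. The following is a minimal sketch, not the library's implementation. It assumes that success is encoded as zero, which holds for cudaSuccess, CUDA_SUCCESS, CUBLAS_STATUS_SUCCESS and CUSOLVER_STATUS_SUCCESS. `check_status`, `status_error` and `mock_status` are hypothetical stand-ins for cuda_try, cuda_exception and a CUDA status enum. The source location parameter is omitted.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-in for cuda_exception: keeps the numeric status code.
class status_error : public std::runtime_error
{
public:
  explicit status_error(long long code)
      : std::runtime_error("status error " + std::to_string(code)), code_(code)
  {}
  long long code() const noexcept { return code_; }

private:
  long long code_;
};

// Hypothetical status enum modeled on cudaError_t / CUresult: zero means success.
enum mock_status : int
{
  mock_success             = 0,
  mock_error_invalid_value = 1,
};

// Sketch of the runtime-status overload: any non-zero status becomes an exception.
template <typename Status>
void check_status(Status status)
{
  static_assert(std::is_enum_v<Status> || std::is_integral_v<Status>,
                "status must be an enum or integral error code");
  if (status != Status{}) // Status{} is the zero (success) value
  {
    throw status_error(static_cast<long long>(status));
  }
}
```

Comparing against `Status{}` instead of a named constant lets one template serve every status enum. This works only because they share the zero-means-success convention; a status type that broke it would need its own check.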

cuda_try(ps)

template<auto fun, typename ...Ps>
auto cuda::experimental::stf::cuda_try(
Ps&&... ps
)

Calls a CUDA function with optional output-parameter inference and throws a cuda_exception on failure.

Calls fun and translates a non-zero CUDA status into a thrown cuda_exception. Three call shapes are supported, selected at compile time in the following order:

  1. Direct form. If fun(ps...) is invocable, the call is made directly and the status is checked. The return type is void.

  2. First-parameter output form. Otherwise, if fun's first parameter is a non-const pointer (an output pointer by CUDA convention) and fun(&result, ps...) is invocable, a temporary result of the pointee type is value-initialized, the call is made, and result is returned. This matches CUDA APIs like cudaStreamCreate(cudaStream_t*), cudaGraphAddEmptyNode(cudaGraphNode_t*, ...), and cudaDeviceCanAccessPeer(int*, ...).

  3. Last-parameter output form. Otherwise, if fun's last parameter is a non-const pointer and fun(ps..., &result) is invocable, the temporary is appended instead and returned. This matches CUDA APIs like cuStreamGetId(CUstream, unsigned long long*) and cuCtxGetId(CUcontext, unsigned long long*).

If none of the three forms apply, compilation fails with a static_assert that says no valid invocation form exists for the given function and arguments.
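The selection order can be sketched in standard C++17 with `if constexpr` and `std::is_invocable_v`. This is an illustration under stated assumptions, not the library's code:

- `call_checked`, `demo_status`, `check` and the three sample functions are hypothetical.
- `std::runtime_error` stands in for cuda_exception.
- The parameter traits only recognize plain, non-`noexcept` function pointers.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical status enum: zero means success, as with cudaError_t.
enum demo_status : int { demo_success = 0, demo_failure = 1 };

// Stand-in for the library's error check (the real one throws cuda_exception).
inline void check(demo_status s)
{
  if (s != demo_success) { throw std::runtime_error("call failed"); }
}

// First/last element of a parameter pack, or void when the pack is empty.
template <typename... Args> struct first_param { using type = void; };
template <typename A, typename... Rest> struct first_param<A, Rest...> { using type = A; };

template <typename... Args> struct last_param { using type = void; };
template <typename A> struct last_param<A> { using type = A; };
template <typename A, typename B, typename... Rest>
struct last_param<A, B, Rest...> : last_param<B, Rest...> {};

// Parameter traits for plain (non-noexcept) function pointers.
template <typename F> struct fn_sig;
template <typename R, typename... Args>
struct fn_sig<R (*)(Args...)>
{
  using first = typename first_param<Args...>::type;
  using last  = typename last_param<Args...>::type;
};

// "Output pointer" in the CUDA sense: a pointer to non-const data.
template <typename P>
inline constexpr bool is_output_ptr_v =
  std::is_pointer_v<P> && !std::is_const_v<std::remove_pointer_t<P>>;

template <typename T> struct dependent_false : std::false_type {};

template <auto fun, typename... Ps>
auto call_checked(Ps&&... ps)
{
  using F     = decltype(fun);
  using First = typename fn_sig<F>::first;
  using Last  = typename fn_sig<F>::last;

  if constexpr (std::is_invocable_v<F, Ps...>)
  {
    // 1. Direct form: the caller supplied every argument; returns void.
    check(fun(std::forward<Ps>(ps)...));
  }
  else if constexpr (is_output_ptr_v<First> && std::is_invocable_v<F, First, Ps...>)
  {
    // 2. First-parameter output form: value-initialize the pointee and return it.
    std::remove_pointer_t<First> result{};
    check(fun(&result, std::forward<Ps>(ps)...));
    return result;
  }
  else if constexpr (is_output_ptr_v<Last> && std::is_invocable_v<F, Ps..., Last>)
  {
    // 3. Last-parameter output form: same idea, with the output appended.
    std::remove_pointer_t<Last> result{};
    check(fun(std::forward<Ps>(ps)..., &result));
    return result;
  }
  else
  {
    static_assert(dependent_false<F>::value,
                  "no valid invocation form for the given function and arguments");
  }
}

// Hypothetical API functions following CUDA's output-pointer conventions.
demo_status get_answer(int* out) { *out = 42; return demo_success; }
demo_status scale(double x, int* out) { *out = static_cast<int>(x * 2); return demo_success; }
demo_status set_flag(int value) { return value >= 0 ? demo_success : demo_failure; }
```

Each form is considered only when the ones before it are not invocable. An argument list that already fits the direct form therefore never triggers output synthesis, and the `static_assert` in the last branch fires only when no form matches.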

Ambiguity rejection. When fun has non-const pointer parameters in both the first and last positions, the same user arguments can satisfy both the first- and last-parameter output forms with different effects (for example, cudaMemGetInfo(size_t* free, size_t* total) called with one user-supplied size_t*). In that case a static_assert rejects the call with the message:

"Ambiguous cuda_try: both first- and last-output forms apply; call the function explicitly to disambiguate."
The zero-argument call cuda_try<fun>() is exempt. It can only match a function with a single output pointer, which is both its first and its last parameter, so the synthesized call fun(&result) is the same under either interpretation.

  int dev = cuda_try<cudaGetDevice>(); // returns the device; throws cuda_exception on failure
  cuda_try(cudaGetDevice(&dev)); // equivalent to the line above
  // test_first_output_param and test_last_output_param are test helpers (not shown)
  EXPECT(cuda_try<test_first_output_param>() == 1);
  EXPECT(cuda_try<test_last_output_param>(2.0) == 2);
Examples
auto dev = cuda_try<cudaGetDevice>();                          // first-parameter output form
auto id  = cuda_try<cuStreamGetId>(some_cu_stream);            // last-parameter output form
cuda_try<cudaSetDevice>(0);                                    // direct form, returns void
cuda_try(cudaSetDevice(0));                                    // equivalent runtime-status overload
Limitations

  • Overloaded functions are not supported, because the name of an overload set does not designate a single function that the fun template parameter can bind to. CUDA’s templated wrappers in cuda_runtime.h (e.g. cudaMalloc, cudaMallocHost, cudaMallocManaged, cudaMallocAsync, cudaHostAlloc) are overloads and must be invoked using the runtime-status overload, e.g. cuda_try(cudaMalloc(&p, n)).

  • The synthesized output parameter must be a non-const pointer; in/out parameters expressed as pointer-to-existing-storage are not synthesized and must be passed explicitly.

  • In ambiguous cases (see above) the call must be written explicitly via the runtime-status overload.

Template Parameters:
  • fun – The CUDA function to invoke. Must not be an overloaded name (templated overloads in cuda_runtime.h such as cudaMalloc/cudaMallocHost/cudaMallocAsync therefore do not work).

  • Ps – Argument types deduced from ps.

Parameters:

ps – Arguments forwarded to fun.

Returns:

void if fun is called in the direct form (no synthesized output parameter); otherwise the value of the synthesized output parameter.