cuda::experimental::stf::cuda_try
Overloads
cuda_try(status, loc=::cuda::std::source_location::current())
template<typename Status>
void cuda::experimental::stf::cuda_try(
    Status status,
    const ::cuda::std::source_location loc = ::cuda::std::source_location::current())
Throws a cuda_exception if the given status is an error code.

The typical usage is to place a CUDA function call inside cuda_try, i.e. cuda_try(cudaFunc(args)) (the same way cuda_safe_call would be called). For example, cuda_try(cudaStreamCreate(&stream)) is equivalent to cudaStreamCreate(&stream), except that the former throws an exception in case of error.

    cuda_try(CUDA_SUCCESS); // no effect
    int dev;
    cuda_try(cudaGetDevice(&dev)); // throws only if cudaGetDevice reports an error
    try
    {
      cuda_try(CUDA_ERROR_INVALID_VALUE); // throws cuda_exception
    }
    catch (...)
    {
      // This point will be reached
      return;
    }
    EXPECT(false, "Should not get here.");
- Template Parameters:
Status – CUDA error code type, such as cudaError_t, cublasStatus_t, or cusolverStatus_t
- Parameters:
status – CUDA error code value, usually the result of a CUDA API call
loc – location of the call, defaulted
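The status-checking behaviour described above can be illustrated with a minimal, standalone sketch. It is not the library's implementation: mock_status, call_site, status_error, try_status and caught_code are all hypothetical names, the sketch assumes that a zero-initialized status means success, and it uses the GCC/Clang builtins __builtin_FILE and __builtin_LINE in place of ::cuda::std::source_location so that it compiles without CUDA headers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a CUDA status type (not cudaError_t itself).
enum class mock_status { success = 0, invalid_value = 1 };

// Hypothetical call-site record playing the role of a source_location.
// __builtin_FILE / __builtin_LINE are GCC/Clang builtins; used as default
// arguments they report the location of the caller.
struct call_site {
  const char* file;
  int line;
  static call_site current(const char* f = __builtin_FILE(), int l = __builtin_LINE()) {
    return {f, l};
  }
};

// Hypothetical exception type carrying the failing status code.
class status_error : public std::runtime_error {
public:
  status_error(int code, call_site loc)
      : std::runtime_error(std::string(loc.file) + ":" + std::to_string(loc.line)
                           + ": status " + std::to_string(code)),
        code_(code) {}
  int code() const noexcept { return code_; }

private:
  int code_;
};

// Does nothing on success; throws on any other status value.
// Assumption: a zero-initialized Status denotes success.
template <typename Status>
void try_status(Status status, const call_site loc = call_site::current()) {
  if (status != Status{}) {
    throw status_error(static_cast<int>(status), loc);
  }
}

// Returns the caught status code, or 0 when nothing was thrown.
inline int caught_code(mock_status s) {
  try {
    try_status(s);
  } catch (const status_error& e) {
    return e.code();
  }
  return 0;
}
```

Because the location is a defaulted parameter, callers write only try_status(call()) and the error message still points at the calling line.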
cuda_try(ps)
template<auto fun, typename ...Ps>
auto cuda::experimental::stf::cuda_try(Ps&&... ps)
Calls a CUDA function with optional output-parameter inference and throws a cuda_exception on failure.

Calls fun and translates a non-zero CUDA status into a thrown cuda_exception. Three call shapes are supported, selected at compile time in the following order:

- Direct form. If fun(ps...) is invocable, the call is made directly and the status is checked. The return type is void.
- First-parameter output form. Otherwise, if fun's first parameter is a non-const pointer (an output pointer by CUDA convention) and fun(&result, ps...) is invocable, a temporary result of the pointee type is value-initialized, the call is made, and result is returned. This matches CUDA APIs like cudaStreamCreate(cudaStream_t*), cudaGraphAddEmptyNode(cudaGraphNode_t*, ...), and cudaDeviceCanAccessPeer(int*, ...).
- Last-parameter output form. Otherwise, if fun's last parameter is a non-const pointer and fun(ps..., &result) is invocable, the temporary is appended instead and returned. This matches CUDA APIs like cuStreamGetId(CUstream, unsigned long long*) and cuCtxGetId(CUcontext, unsigned long long*).
If none of the three forms applies, compilation fails with a static_assert that says no valid invocation form exists for the given function and arguments.

Ambiguity rejection. When fun has non-const pointer parameters in both the first and last positions, the same user arguments can satisfy both the first- and last-parameter output forms with different effects (for example, cudaMemGetInfo(size_t* free, size_t* total) called with one user-supplied size_t*). In that case a static_assert rejects the call with the message:

    "Ambiguous cuda_try: both first- and last-output forms apply; call the function explicitly to disambiguate."

The single zero-argument case (cuda_try<fun>()) is exempt because the synthesized call fun(&result) is identical for both interpretations.

    int dev = cuda_try<cudaGetDevice>(); // continue execution if the call is successful
    cuda_try(cudaGetDevice(&dev)); // equivalent to the line above
    EXPECT(cuda_try<test_first_output_param>() == 1);
    EXPECT(cuda_try<test_last_output_param>(2.0) == 2);
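The compile-time selection between the direct, first-parameter and last-parameter forms, together with the two static_assert checks, can be sketched in plain C++17 roughly as follows. This is an illustrative reconstruction under stated assumptions, not the library's code: try_call, check, fn_traits, param, is_out_ptr and the mock functions get_device, get_id, set_device and always_fails are hypothetical, statuses are modelled as int with 0 meaning success, and only plain (non-noexcept) function pointers are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical status check: 0 is success, anything else throws.
inline void check(int status) {
  if (status != 0) throw std::runtime_error("call failed");
}

// Parameter list of a plain function pointer.
template <typename F> struct fn_traits;
template <typename R, typename... As>
struct fn_traits<R (*)(As...)> {
  using args = std::tuple<As...>;
  static constexpr std::size_t arity = sizeof...(As);
};

// Type of parameter I, or void when I is out of range (e.g. arity == 0).
template <typename F, std::size_t I, bool InRange = (I < fn_traits<F>::arity)>
struct param { using type = void; };
template <typename F, std::size_t I>
struct param<F, I, true> {
  using type = std::tuple_element_t<I, typename fn_traits<F>::args>;
};

// A non-const pointer is treated as an output parameter.
template <typename T>
inline constexpr bool is_out_ptr =
    std::is_pointer_v<T> && !std::is_const_v<std::remove_pointer_t<T>>;

template <auto fun, typename... Ps>
auto try_call(Ps&&... ps) {
  using F = decltype(fun);
  using first_v = std::remove_pointer_t<typename param<F, 0>::type>;
  using last_v = std::remove_pointer_t<typename param<F, fn_traits<F>::arity - 1>::type>;

  constexpr bool direct = std::is_invocable_v<F, Ps...>;
  constexpr bool first_out =
      is_out_ptr<typename param<F, 0>::type> && std::is_invocable_v<F, first_v*, Ps...>;
  constexpr bool last_out =
      is_out_ptr<typename param<F, fn_traits<F>::arity - 1>::type> &&
      std::is_invocable_v<F, Ps..., last_v*>;

  if constexpr (direct) {
    check(fun(std::forward<Ps>(ps)...));  // direct form, returns void
  } else {
    static_assert(first_out || last_out, "no valid invocation form");
    static_assert(!(first_out && last_out) || sizeof...(Ps) == 0,
                  "ambiguous: both first- and last-output forms apply");
    if constexpr (first_out) {
      first_v result{};  // value-initialized temporary, passed first
      check(fun(&result, std::forward<Ps>(ps)...));
      return result;
    } else {
      last_v result{};  // value-initialized temporary, passed last
      check(fun(std::forward<Ps>(ps)..., &result));
      return result;
    }
  }
}

// Hypothetical functions shaped like the CUDA APIs named in the text.
inline int get_device(int* dev) { *dev = 3; return 0; }
inline int get_id(int stream, unsigned long long* id) { *id = 100ull + stream; return 0; }
inline int set_device(int) { return 0; }
inline int always_fails(int* out) { *out = 0; return 1; }
```

With these mocks, try_call<get_device>() takes the first-parameter form, try_call<get_id>(7) the last-parameter form, and try_call<set_device>(0) the direct form. A function shaped like mock(size_t*, size_t*) called with one pointer argument would trip the second static_assert in this sketch.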
- Examples
    auto dev = cuda_try<cudaGetDevice>(); // first-parameter output form
    auto id = cuda_try<cuStreamGetId>(some_cu_stream); // last-parameter output form
    cuda_try<cudaSetDevice>(0); // direct form, returns void
    cuda_try(cudaSetDevice(0)); // equivalent runtime-status overload
- Limitations
Overloaded functions are not supported. CUDA's templated wrappers in cuda_runtime.h (e.g. cudaMalloc, cudaMallocHost, cudaMallocManaged, cudaMallocAsync, cudaHostAlloc) are overloads and must be invoked using the runtime-status overload, e.g. cuda_try(cudaMalloc(&p, n)).

The synthesized output parameter must be a non-const pointer; in/out parameters expressed as pointer-to-existing-storage are not synthesized and must be passed explicitly.

In ambiguous cases (see above) the call must be written explicitly via the runtime-status overload.
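The overload limitation follows from how C++ deduces an auto non-type template parameter: the placeholder type cannot be deduced from a name that denotes an overload set. The sketch below illustrates this with hypothetical names (mock_alloc, mock_set, accepts_fun, check_status, overload_demo); it only assumes that the function-template overload takes its function as template<auto fun>, as the signature above shows, and is not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical overload set: a plain function plus a templated wrapper,
// loosely modelled on the shape of an allocation API.
inline int mock_alloc(void** p, std::size_t) { *p = nullptr; return 0; }
template <typename T>
int mock_alloc(T** p, std::size_t) { *p = nullptr; return 0; }

// A non-overloaded function for comparison.
inline int mock_set(int) { return 0; }

// Same parameter shape as a template taking `auto fun`.
template <auto fun>
constexpr bool accepts_fun() { return true; }

// A single, non-overloaded function can be named as the template argument.
static_assert(accepts_fun<mock_set>(), "non-overloaded name is accepted");

// Hypothetical status check: 0 is success, anything else throws.
inline void check_status(int status) {
  if (status != 0) throw std::runtime_error("call failed");
}

inline bool overload_demo() {
  float dummy = 0.0f;
  float* p = &dummy;
  // accepts_fun<mock_alloc>();      // would not compile: `auto` cannot be
  //                                 // deduced from an overload set
  check_status(mock_alloc(&p, 16));  // ordinary call: overload resolution
                                     // selects mock_alloc<float>
  return p == nullptr;
}
```

Wrapping the already-resolved call expression, as in the runtime-status form, sidesteps the problem because overload resolution happens before the status reaches the checker.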
- Template Parameters:
fun – The CUDA function to invoke. Must not be an overloaded name (templated overloads in cuda_runtime.h such as cudaMalloc / cudaMallocHost / cudaMallocAsync therefore do not work).
Ps – Argument types deduced from ps.
- Parameters:
ps – Arguments forwarded to fun.
- Returns:
void if fun does not have a synthesized output parameter (see the call forms above); otherwise the value of the synthesized output parameter.