CUDA Utils

namespace trt_edgellm

Functions

template<typename T1, typename T2>
inline size_t divUp(const T1 &a, const T2 &n)

Divide and round up utility function.

Computes ceiling division: (a + n - 1) / n

Template Parameters:
  • T1 – Type of dividend

  • T2 – Type of divisor

Parameters:
  • a – Dividend

  • n – Divisor

Returns:

Ceiling of a/n
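The formula above can be illustrated with a minimal sketch. The name `divUpSketch` and the helper `gridSizeFor` are hypothetical and written only for illustration; the body is reconstructed from the documented expression `(a + n - 1) / n` and may not match the library's actual implementation. It assumes integer operands with `a >= 0`, `n > 0`, and no overflow in `a + n - 1`.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for trt_edgellm::divUp, reconstructed from the documented formula.
// Assumption: integer operands, a >= 0, n > 0, and a + n - 1 does not overflow.
template <typename T1, typename T2>
inline std::size_t divUpSketch(const T1& a, const T2& n)
{
    return static_cast<std::size_t>((a + n - 1) / n);
}

// Hypothetical usage: number of fixed-size blocks needed to cover numElements,
// as is common when sizing a CUDA launch grid.
inline std::size_t gridSizeFor(std::size_t numElements, unsigned blockSize)
{
    std::size_t const blocks = divUpSketch(numElements, blockSize);
    // The blocks must cover every element.
    assert(blocks * blockSize >= numElements);
    return blocks;
}
```

For example, covering 1000 elements with blocks of 256 needs `(1000 + 255) / 256 = 4` blocks, while 1025 elements need 5.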

inline int getSMVersion()

Get CUDA compute capability version.

Returns the compute capability as an integer (e.g., 89 for SM 8.9).

Returns:

Compute capability version (major * 10 + minor)
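The encoding of the return value can be sketched in plain C++ without touching the CUDA runtime. The helpers `encodeSMVersion` and `isAtLeastSM` below are hypothetical names introduced for illustration; how `getSMVersion()` actually obtains the major and minor numbers from the device is not described here and is not assumed.

```cpp
#include <cassert>

// Hypothetical helper: combines a (major, minor) compute capability pair
// using the documented formula major * 10 + minor.
inline int encodeSMVersion(int major, int minor)
{
    return major * 10 + minor;
}

// Hypothetical helper: checks an encoded version (such as the value returned by
// getSMVersion()) against a minimum capability. Assumes minor < 10, so that
// integer comparison of encoded values preserves capability ordering.
inline bool isAtLeastSM(int smVersion, int major, int minor)
{
    return smVersion >= encodeSMVersion(major, minor);
}

// Small self-check of the encoding.
inline void smVersionExamples()
{
    assert(encodeSMVersion(8, 9) == 89); // SM 8.9, as in the documented example
    assert(isAtLeastSM(89, 8, 0));       // SM 8.9 satisfies a minimum of SM 8.0
}
```

A caller might use a check like `isAtLeastSM(getSMVersion(), 8, 0)` to select a code path that depends on a minimum compute capability.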

inline cudaError_t instantiateCudaGraph(cudaGraphExec_t *exec, cudaGraph_t graph)

Instantiate a CUDA graph, handling differences between CUDA versions.

This function wraps cudaGraphInstantiate and abstracts away the API difference between CUDA versions before and after 12.0. For CUDA < 12.0, it uses the legacy signature with extra arguments; for CUDA >= 12.0, it uses the simplified signature.

Parameters:
  • exec – Pointer to the cudaGraphExec_t to be created.

  • graph – The cudaGraph_t to instantiate.

Returns:

cudaError_t indicating success or failure of the instantiation.