CUDA Utils
-
namespace trt_edgellm
Functions
-
template<typename T1, typename T2>
inline size_t divUp(
)
Divide and round up utility function.
Computes ceiling division: (a + n - 1) / n
- Template Parameters:
T1 – Type of dividend
T2 – Type of divisor
- Parameters:
a – Dividend
n – Divisor
- Returns:
Ceiling of a/n
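The ceiling-division formula above can be sketched in isolation. The block below is a minimal illustration, not the library's implementation: `divUpSketch` and `blocksNeeded` are hypothetical names, and passing the arguments by value is an assumption, since only the formula `(a + n - 1) / n` and the `size_t` return type come from the documentation.

```cpp
#include <cstddef>

// Hypothetical stand-in mirroring the documented formula (a + n - 1) / n.
// Assumes a >= 0 and n > 0; the by-value parameters are an assumption.
template <typename T1, typename T2>
inline std::size_t divUpSketch(T1 a, T2 n)
{
    return static_cast<std::size_t>((a + n - 1) / n);
}

// Typical use: how many fixed-size thread blocks are needed to cover
// numElements items when each block handles threadsPerBlock of them.
inline std::size_t blocksNeeded(std::size_t numElements, std::size_t threadsPerBlock)
{
    return divUpSketch(numElements, threadsPerBlock);
}
```

With this formulation, 1000 elements at 256 threads per block need 4 blocks, because the last, partially filled block is still counted. Note that `a + n - 1` can overflow when `a` is close to the maximum of its type.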
-
inline int getSMVersion()
Get CUDA compute capability version.
Returns the compute capability as an integer (e.g., 89 for SM 8.9).
- Returns:
Compute capability version (major * 10 + minor)
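The `major * 10 + minor` encoding can be illustrated without a GPU. The helpers below are hypothetical and are not part of `trt_edgellm`; they only show how a value such as the one returned by `getSMVersion()` is composed, split back into its parts, and compared against a threshold. The SM 8.0 cutoff is an arbitrary example.

```cpp
// Hypothetical helpers illustrating the documented encoding (major * 10 + minor).
constexpr int encodeSmVersion(int major, int minor)
{
    return major * 10 + minor;
}

// Recover the two components; valid as long as minor is a single digit.
constexpr int smMajor(int smVersion) { return smVersion / 10; }
constexpr int smMinor(int smVersion) { return smVersion % 10; }

// Example feature gate (illustrative threshold): true for SM 8.0 and newer.
constexpr bool isAtLeastSm80(int smVersion)
{
    return smVersion >= 80;
}

static_assert(encodeSmVersion(8, 9) == 89, "SM 8.9 encodes to 89");
```

Because the encoded value is a plain integer, architecture checks reduce to integer comparisons, for example `isAtLeastSm80(getSMVersion())` in code that has access to the real function.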
-
inline cudaError_t instantiateCudaGraph(
cudaGraphExec_t *exec,
cudaGraph_t graph
)
Instantiate a CUDA graph in a CUDA-version-independent way.
This function wraps cudaGraphInstantiate and hides the change in its signature introduced in CUDA 12.0. For CUDA < 12.0, it calls the legacy signature, which takes additional error-node and log-buffer arguments; for CUDA >= 12.0, it calls the simplified signature.
- Parameters:
exec – Pointer to the cudaGraphExec_t to be created.
graph – The cudaGraph_t to instantiate.
- Returns:
cudaError_t indicating success or failure of the instantiation.
-
template<typename T1, typename T2>