Devices#
cuda::device_ref#
cuda::device_ref is a lightweight, non-owning handle to a CUDA device ordinal.
It offers:
get(): native device ordinal
name(): device name
init(): initialize the device context
peers(): list peers for which peer access can be enabled
has_peer_access_to(device_ref): query if peer access can be enabled to the given device
attribute(attr) / attribute<::cudaDeviceAttr>(): attribute queries
Availability: CCCL 3.1.0 / CUDA 13.1
cuda::devices#
cuda::devices is a random-access view of all available CUDA devices in the form of cuda::device_ref objects. It provides indexing, size, and iteration, so it can be used
in range-based for loops.
Availability: CCCL 3.1.0 / CUDA 13.1
Example:
#include <cuda/devices>
#include <iostream>

void print_devices() {
  for (auto& dev : cuda::devices) {
    std::cout << "Device " << dev.get() << ": " << dev.name() << std::endl;
  }
}
Device attributes#
cuda::device_attributes provides strongly-typed attribute query objects usable with
device_ref::attribute. Selected examples:
compute_capability
multiprocessor_count
concurrent_managed_access
clock_rate
numa_id
Availability: CCCL 3.1.0 / CUDA 13.1
Example:
#include <cuda/devices>
int get_max_blocks_on_device(cuda::device_ref dev) {
  return cuda::device_attributes::multiprocessor_count(dev)
       * cuda::device_attributes::blocks_per_multiprocessor(dev);
}
cuda::arch_traits#
cuda::arch_traits provides per-architecture trait accessors that describe the limits and capabilities common to all devices of an architecture.
Unlike device_attributes, which are run-time queries of a single characteristic of a specific device, cuda::arch_traits provides a structure that is accessible at compile time and describes the characteristics shared by all devices of an architecture.
cuda::arch_traits<cuda::arch_id::sm_80>() (compile-time) or cuda::arch_traits_for(cuda::arch_id) / cuda::arch_traits_for(cuda::compute_capability) (run-time).
Returns a cuda::arch_traits_t with fields like max_threads_per_block, max_shared_memory_per_block, cluster_supported and other capability flags.
Traits for the current architecture can be accessed with cuda::device::current_arch_traits().
Availability: CCCL 3.1.0 / CUDA 13.1
Example:
#include <cuda/devices>
template <cuda::arch_id Arch>
__device__ void fn() {
  // constexpr is required so that the traits can be used in `if constexpr`
  constexpr auto traits = cuda::arch_traits<Arch>();
  if constexpr (traits.cluster_supported) {
    // cluster specific code
  }
  else {
    // non-cluster code
  }
}

__global__ void kernel() {
  fn<cuda::arch_id::sm_90>();
}
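The difference between compile-time and run-time trait access can be illustrated without a GPU by the following minimal sketch. All names prefixed with toy_ are hypothetical stand-ins invented for this illustration and are not the library types. Only two fields are modeled, and the values reflect that thread block clusters are available on sm_90 but not on sm_80.

```cpp
// Hypothetical stand-ins for illustration only; not the CCCL types.
enum class toy_arch_id { sm_80, sm_90 };

struct toy_arch_traits {
  int max_threads_per_block;
  bool cluster_supported;
};

// Run-time style accessor: the architecture is an ordinary function argument.
constexpr toy_arch_traits toy_arch_traits_for(toy_arch_id id) {
  return id == toy_arch_id::sm_90 ? toy_arch_traits{1024, true}
                                  : toy_arch_traits{1024, false};
}

// Compile-time style accessor: the architecture is a template parameter.
template <toy_arch_id Arch>
constexpr toy_arch_traits toy_traits() {
  return toy_arch_traits_for(Arch);
}

// Compile-time dispatch: only the selected branch is instantiated.
// Returns 1 for the cluster path and 0 for the non-cluster path.
template <toy_arch_id Arch>
constexpr int select_path() {
  constexpr auto traits = toy_traits<Arch>();  // must be constexpr for `if constexpr`
  if constexpr (traits.cluster_supported) {
    return 1;
  } else {
    return 0;
  }
}

// Run-time dispatch: both branches exist in the binary and the choice
// is made when the function executes.
inline int select_path_runtime(toy_arch_id id) {
  const toy_arch_traits traits = toy_arch_traits_for(id);
  return traits.cluster_supported ? 1 : 0;
}

static_assert(select_path<toy_arch_id::sm_90>() == 1, "sm_90 takes the cluster path");
static_assert(select_path<toy_arch_id::sm_80>() == 0, "sm_80 takes the non-cluster path");
```

The compile-time form suits code that is specialized per architecture, while the run-time form suits host code that only learns the architecture after querying a device.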