cub::MaxSmOccupancy
Defined in cub/util_device.cuh
template<typename KernelPtr>
inline cudaError_t cub::MaxSmOccupancy(int &max_sm_occupancy, KernelPtr kernel_ptr, int block_threads, int dynamic_smem_bytes = 0)
Computes the maximum SM occupancy, in thread blocks, for executing the given kernel function pointer kernel_ptr on the current device with block_threads threads per thread block.
- Snippet
The code snippet below illustrates the use of the MaxSmOccupancy function.
#include <cub/cub.cuh>   // or equivalently <cub/util_device.cuh>

template <typename T>
__global__ void ExampleKernel()
{
    // Allocate shared memory for BlockScan
    __shared__ volatile T buffer[4096];

    ...
}

...

// Determine SM occupancy for ExampleKernel specialized for unsigned char
int max_sm_occupancy;
cub::MaxSmOccupancy(max_sm_occupancy, ExampleKernel<unsigned char>, 64);

// max_sm_occupancy  <-- 4 on SM10
// max_sm_occupancy  <-- 8 on SM20
// max_sm_occupancy  <-- 12 on SM35
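The three values quoted for SM10, SM20, and SM35 can be reproduced with a simplified resource model, which may help explain why the result differs per architecture. The host-only C++ sketch below is not how cub::MaxSmOccupancy is implemented. It takes the smallest of three per-SM bounds (shared memory, resident blocks, resident threads) and ignores register usage and shared-memory allocation granularity, both of which the real occupancy calculation considers. The names SmLimits, EstimateSmOccupancy, kSm10, kSm20, and kSm35 are hypothetical, and the limits are the commonly published figures for compute capability 1.0, 2.0, and 3.5, taken here as assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for per-SM resource limits (not a CUB type).
struct SmLimits {
    int smem_bytes;   // shared memory available per SM, in bytes
    int max_blocks;   // maximum resident thread blocks per SM
    int max_threads;  // maximum resident threads per SM
};

// Simplified estimate: the smallest of the shared-memory, block-count,
// and thread-count bounds. Register pressure and allocation granularity
// are deliberately ignored, so real results may be lower.
inline int EstimateSmOccupancy(const SmLimits& sm,
                               int block_threads,
                               int static_smem_bytes,
                               int dynamic_smem_bytes = 0)
{
    assert(block_threads > 0);
    const int smem_per_block = static_smem_bytes + dynamic_smem_bytes;
    const int by_threads = sm.max_threads / block_threads;
    // A block that uses no shared memory is not limited by it.
    const int by_smem = (smem_per_block > 0)
                            ? sm.smem_bytes / smem_per_block
                            : sm.max_blocks;
    return std::min({by_smem, sm.max_blocks, by_threads});
}

// Assumed limits for the three architectures named in the snippet.
const SmLimits kSm10{16 * 1024, 8, 768};
const SmLimits kSm20{48 * 1024, 8, 1536};
const SmLimits kSm35{48 * 1024, 16, 2048};
```

With 64 threads per block and the 4096-byte buffer of ExampleKernel<unsigned char>, this model is bounded by shared memory on SM10 (16 KB / 4 KB = 4) and on SM35 (48 KB / 4 KB = 12), and by the 8-block limit on SM20, which matches the values in the snippet.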
- Parameters
max_sm_occupancy – [out] Maximum number of thread blocks that can reside on a single SM
kernel_ptr – [in] Kernel pointer for which to compute SM occupancy
block_threads – [in] Number of threads per thread block
dynamic_smem_bytes – [in] Dynamically allocated shared memory in bytes. Default is 0.
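Because max_sm_occupancy counts thread blocks per SM, callers usually combine it with other quantities before using it. The host-only sketch below shows two such conversions under stated assumptions: the helper names ResidentThreadsPerSm and DeviceResidentBlocks are hypothetical, and sm_count stands for the number of SMs on the device, which is assumed to have been queried separately (for example through the CUDA runtime's device attributes).

```cpp
#include <cassert>

// Hypothetical helper: threads resident on one SM when it holds
// max_sm_occupancy blocks of block_threads threads each.
inline int ResidentThreadsPerSm(int max_sm_occupancy, int block_threads)
{
    assert(max_sm_occupancy >= 0 && block_threads > 0);
    return max_sm_occupancy * block_threads;
}

// Hypothetical helper: number of blocks that can be resident on the whole
// device at once, assuming every SM reaches the same occupancy.
inline int DeviceResidentBlocks(int max_sm_occupancy, int sm_count)
{
    assert(max_sm_occupancy >= 0 && sm_count > 0);
    return max_sm_occupancy * sm_count;
}
```

For instance, an occupancy of 12 blocks with 64 threads per block corresponds to 768 resident threads per SM, and on a device assumed to have 15 SMs it corresponds to 180 concurrently resident blocks.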