cub::MaxSmOccupancy

Defined in cub/util_device.cuh

template<typename KernelPtr> inline cudaError_t cub::MaxSmOccupancy(int &max_sm_occupancy, KernelPtr kernel_ptr, int block_threads, int dynamic_smem_bytes = 0)

Computes the maximum SM occupancy, measured in resident thread blocks per SM, for executing the kernel function pointer kernel_ptr on the current device with block_threads threads per thread block.

Snippet

The code snippet below illustrates the use of the MaxSmOccupancy function.

#include <cub/cub.cuh>   // or equivalently <cub/util_device.cuh>

template <typename T>
__global__ void ExampleKernel()
{
    // Statically allocated shared memory: 4096 * sizeof(T) bytes per thread block
    __shared__ volatile T buffer[4096];

       ...
}

    ...

// Determine SM occupancy for ExampleKernel specialized for unsigned char
int max_sm_occupancy;
cub::MaxSmOccupancy(max_sm_occupancy, ExampleKernel<unsigned char>, 64);

// max_sm_occupancy  <-- 4 on SM10
// max_sm_occupancy  <-- 8 on SM20
// max_sm_occupancy  <-- 12 on SM35

Parameters

max_sm_occupancy – [out] Maximum number of thread blocks that can reside on a single SM
kernel_ptr – [in] Kernel pointer for which to compute SM occupancy
block_threads – [in] Number of threads per thread block
dynamic_smem_bytes – [in] Dynamically allocated shared memory per thread block, in bytes. Default is 0.