cub::MaxSmOccupancy

Defined in cub/util_device.cuh

template<typename KernelPtr>
inline cudaError_t cub::MaxSmOccupancy(int &max_sm_occupancy, KernelPtr kernel_ptr, int block_threads, int dynamic_smem_bytes = 0)

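Computes the maximum SM occupancy, measured in resident thread blocks per SM, for launching the kernel identified by the function pointer kernel_ptr on the current device with block_threads threads per thread block. Returns a cudaError_t indicating whether the query succeeded.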

Snippet

The code snippet below illustrates the use of the MaxSmOccupancy function.

#include <cub/cub.cuh>   // or equivalently <cub/util_device.cuh>

template <typename T>
__global__ void ExampleKernel()
{
    // Statically allocate a shared-memory buffer of 4096 elements of T
    __shared__ volatile T buffer[4096];

       ...
}

    ...

// Determine SM occupancy for ExampleKernel specialized for unsigned char
int max_sm_occupancy;
cub::MaxSmOccupancy(max_sm_occupancy, ExampleKernel<unsigned char>, 64);

// max_sm_occupancy  <-- 4 on SM10
// max_sm_occupancy  <-- 8 on SM20
// max_sm_occupancy  <-- 12 on SM35

Parameters
  • max_sm_occupancy [out] Maximum number of thread blocks that can reside on a single SM

  • kernel_ptr [in] Kernel pointer for which to compute SM occupancy

  • block_threads [in] Number of threads per thread block

  • dynamic_smem_bytes [in] Dynamically allocated shared memory per thread block, in bytes. Default is 0.
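The sketch below shows one way the parameters above might be combined in practice: it passes a non-zero dynamic_smem_bytes, checks the returned cudaError_t, and multiplies the per-SM result by the number of SMs to estimate how many blocks can be resident on the whole device. ScaleKernel and ResidentBlocks are hypothetical names introduced for illustration only; they are not part of CUB.

```cuda
#include <cub/util_device.cuh>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only. It uses dynamically sized
// shared memory, so its shared-memory footprint is fixed at launch time.
__global__ void ScaleKernel(float* data, int n)
{
    extern __shared__ float tile[];

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
    {
        // Each thread touches only its own slot, so no barrier is needed.
        tile[threadIdx.x] = data[idx];
        data[idx]         = 2.0f * tile[threadIdx.x];
    }
}

// Hypothetical helper: estimates how many blocks of ScaleKernel can be
// resident on the current device at once (per-SM occupancy times SM count).
cudaError_t ResidentBlocks(int& resident_blocks, int block_threads, int dynamic_smem_bytes)
{
    resident_blocks = 0;

    // Per-SM occupancy for this block size and dynamic shared-memory size.
    int max_sm_occupancy = 0;
    cudaError_t error =
        cub::MaxSmOccupancy(max_sm_occupancy, ScaleKernel, block_threads, dynamic_smem_bytes);
    if (error != cudaSuccess) return error;

    // Number of SMs on the current device.
    int device = 0;
    error = cudaGetDevice(&device);
    if (error != cudaSuccess) return error;

    int sm_count = 0;
    error = cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount, device);
    if (error != cudaSuccess) return error;

    resident_blocks = max_sm_occupancy * sm_count;
    return cudaSuccess;
}
```

A caller could, for example, invoke ResidentBlocks(blocks, 128, 128 * sizeof(float)) and use the result as an upper bound on the grid size when launching ScaleKernel<<<grid, 128, 128 * sizeof(float)>>>(data, n). The value passed as dynamic_smem_bytes should match the shared-memory size given in the launch configuration; otherwise the reported occupancy describes a different launch.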