cuda.core.experimental._module.KernelOccupancy

class cuda.core.experimental._module.KernelOccupancy(*args, **kwargs)

Methods

__init__()

available_dynamic_shared_memory_per_block(num_blocks_per_multiprocessor: int, block_size: int) → int

Dynamic shared memory available per block for given launch configuration.

The amount of dynamic shared memory available per block, in bytes, for the given kernel launch configuration.

Parameters:

num_blocks_per_multiprocessor (int) – Number of blocks to be concurrently executing on a multiprocessor.
block_size (int) – Block size parameter used to launch this kernel.

Returns:

Dynamic shared memory available per block for given launch configuration.

Return type:

int

max_active_blocks_per_multiprocessor(block_size: int, dynamic_shared_memory_size: int) → int

Occupancy of the kernel.

Returns the maximum number of active blocks per multiprocessor for this kernel.

Parameters:

block_size (int) – Block size parameter used to launch this kernel.
dynamic_shared_memory_size (int) – The amount of dynamic shared memory, in bytes, needed by each block. Use 0 if the block does not need dynamic shared memory.

Returns:

The maximum number of active blocks per multiprocessor.

Return type:

int

Note

The ratio of the product of the maximum number of active blocks per multiprocessor and the block size to the maximum number of threads per multiprocessor is known as the theoretical multiprocessor utilization (occupancy).

max_active_clusters(config: LaunchConfig, stream: Stream | None = None) → int

Maximum number of active clusters on the target device.

The maximum number of clusters that could concurrently execute on the target device.

Parameters:

config (LaunchConfig) – Kernel launch configuration.
stream (Stream, optional) – The stream on which this kernel is to be launched.

Returns:

The maximum number of clusters that could co-exist on the target device.

Return type:

int

max_potential_block_size(dynamic_shared_memory_needed: int | CUoccupancyB2DSize, block_size_limit: int) → MaxPotential

MaxPotentialBlockSizeOccupancyResult: Suggested launch configuration for reasonable occupancy.

Returns the minimum grid size needed to achieve the maximum occupancy and the maximum block size that can achieve the maximum occupancy.

Parameters:

dynamic_shared_memory_needed (Union[int, driver.CUoccupancyB2DSize]) – The amount of dynamic shared memory, in bytes, needed by each block. Use 0 if the block does not need shared memory. Use a C-callable represented by CUoccupancyB2DSize to encode an amount of dynamic shared memory that varies with the block size.
block_size_limit (int) – Known upper limit on the kernel block size. Use 0 to indicate the maximum block size permitted by the device / kernel instead.

Returns:

An object with min_grid_size and max_block_size attributes encoding the suggested launch configuration.

Return type:

MaxPotentialBlockSizeOccupancyResult

Note

Using a C-callable that needs to acquire the Python Global Interpreter Lock (GIL) may lead to deadlocks.

max_potential_cluster_size(config: LaunchConfig, stream: Stream | None = None) → int

Maximum potential cluster size.

The maximum potential cluster size for this kernel and the given launch configuration.

Parameters:

config (LaunchConfig) – Kernel launch configuration. Cluster dimensions in the configuration are ignored.
stream (Stream, optional) – The stream on which this kernel is to be launched.

Returns:

The maximum cluster size that can be launched for this kernel and launch configuration.

Return type:

int

Attributes