cuda.core.experimental._module.KernelOccupancy
- class cuda.core.experimental._module.KernelOccupancy(*args, **kwargs)
Methods
- __init__()
Dynamic shared memory available per block for a given launch configuration.
The amount of dynamic shared memory per block, in bytes, for a given kernel launch configuration.
- max_active_blocks_per_multiprocessor(block_size: int, dynamic_shared_memory_size: int) → int
Occupancy of the kernel.
Returns the maximum number of active blocks per multiprocessor for this kernel.
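- Parameters:
block_size (int) – Block size used to launch this kernel.
dynamic_shared_memory_size (int) – The amount of dynamic shared memory, in bytes, needed per block. Use 0 if the block does not need dynamic shared memory.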
- Returns:
The maximum number of active blocks per multiprocessor.
- Return type:
int
Note
Theoretical multiprocessor utilization (occupancy) is the maximum number of active blocks per multiprocessor multiplied by the block size, divided by the maximum number of threads per multiprocessor.
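The occupancy formula in the Note can be wrapped in a small helper. This is a sketch, not part of cuda.core: `theoretical_occupancy` is a hypothetical name, `occupancy` is any object exposing `max_active_blocks_per_multiprocessor(block_size, dynamic_shared_memory_size)` (such as a `KernelOccupancy` instance), and `max_threads_per_multiprocessor` is assumed to be read from the device's properties by the caller.

```python
def theoretical_occupancy(occupancy, block_size, dynamic_shared_memory_size,
                          max_threads_per_multiprocessor):
    """Fraction of a multiprocessor's thread slots a launch can keep busy.

    Hypothetical helper (not part of cuda.core). `occupancy` is assumed to be
    a KernelOccupancy-like object.
    """
    if block_size <= 0 or max_threads_per_multiprocessor <= 0:
        raise ValueError("block_size and max_threads_per_multiprocessor must be positive")
    # Ask how many blocks of this size can be resident on one multiprocessor.
    active_blocks = occupancy.max_active_blocks_per_multiprocessor(
        block_size, dynamic_shared_memory_size
    )
    # occupancy = (active blocks * threads per block) / max threads per multiprocessor
    return active_blocks * block_size / max_threads_per_multiprocessor
```

For example, if 4 blocks of 256 threads can be active on a multiprocessor that supports 2048 threads, the theoretical occupancy is 4 × 256 / 2048 = 0.5.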
- max_active_clusters(config: LaunchConfig, stream: Stream | None = None) → int
Maximum number of active clusters on the target device.
The maximum number of clusters that could concurrently execute on the target device.
- Parameters:
config (LaunchConfig) – Kernel launch configuration.
stream (Stream, optional) – The stream on which this kernel is to be launched.
- Returns:
The maximum number of clusters that could co-exist on the target device.
- Return type:
int
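One way to read the value returned by max_active_clusters: when a launch contains more clusters than can be resident at once, the remaining clusters wait for earlier ones to finish. The sketch below gives a rough estimate of the number of such "waves" for a 1-D grid. `estimate_cluster_waves` is a hypothetical helper (not part of cuda.core); it ignores other work sharing the device and assumes the grid size is a multiple of the cluster size, as CUDA requires for cluster launches.

```python
import math


def estimate_cluster_waves(grid_blocks, cluster_size, max_active_clusters):
    """Rough number of waves needed to run every cluster of a 1-D grid.

    Hypothetical helper; `max_active_clusters` is assumed to be the value
    returned by KernelOccupancy.max_active_clusters(config).
    """
    if cluster_size <= 0 or max_active_clusters <= 0:
        raise ValueError("cluster_size and max_active_clusters must be positive")
    if grid_blocks % cluster_size != 0:
        # CUDA requires the grid dimension to be a multiple of the cluster dimension.
        raise ValueError("grid_blocks must be a multiple of cluster_size")
    num_clusters = grid_blocks // cluster_size
    # Clusters beyond the resident limit run in later waves.
    return math.ceil(num_clusters / max_active_clusters)
```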
- max_potential_block_size(dynamic_shared_memory_needed: int | CUoccupancyB2DSize, block_size_limit: int) → MaxPotentialBlockSizeOccupancyResult
MaxPotentialBlockSizeOccupancyResult: Suggested launch configuration for reasonable occupancy.
Returns the minimum grid size needed to achieve the maximum occupancy and the maximum block size that can achieve the maximum occupancy.
- Parameters:
dynamic_shared_memory_needed (Union[int, driver.CUoccupancyB2DSize]) – The amount of dynamic shared memory, in bytes, needed per block. Use 0 if the block does not need shared memory. To express an amount of dynamic shared memory that varies with the block size, pass a C-callable represented by CUoccupancyB2DSize.
block_size_limit (int) – Known upper limit on the kernel block size. Use 0 to indicate the maximum block size permitted by the device / kernel instead.
- Returns:
An object with min_grid_size and max_block_size attributes encoding the suggested launch configuration.
- Return type:
MaxPotentialBlockSizeOccupancyResult
Note
Using a C-callable that requires the Python Global Interpreter Lock (GIL) may lead to deadlocks.
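The suggested configuration can be used to size a simple one-thread-per-element launch. In this sketch, `suggest_1d_launch` is a hypothetical helper and the one-thread-per-element mapping is an assumption about the kernel; only the `min_grid_size` and `max_block_size` attributes come from the result described above.

```python
def suggest_1d_launch(n_elements, result):
    """Turn a max_potential_block_size result into a 1-D launch shape.

    Hypothetical helper; `result` is any object with `min_grid_size` and
    `max_block_size` attributes. Returns (grid_size, block_size, reaches_min_grid).
    """
    block_size = result.max_block_size
    if n_elements <= 0 or block_size <= 0:
        raise ValueError("n_elements and max_block_size must be positive")
    # One thread per element: round the number of blocks up.
    grid_size = (n_elements + block_size - 1) // block_size
    # A grid smaller than min_grid_size cannot reach the maximum occupancy.
    reaches_min_grid = grid_size >= result.min_grid_size
    return grid_size, block_size, reaches_min_grid
```

When `reaches_min_grid` is False the problem is too small to fill the device at the suggested block size; the launch is still valid, just not at maximum occupancy.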
- max_potential_cluster_size(config: LaunchConfig, stream: Stream | None = None) → int
Maximum potential cluster size.
The maximum potential cluster size for this kernel and given launch configuration.
- Parameters:
config (LaunchConfig) – Kernel launch configuration. Cluster dimensions in the configuration are ignored.
stream (Stream, optional) – The stream on which this kernel is to be launched.
- Returns:
The maximum cluster size that can be launched for this kernel and launch configuration.
- Return type:
int
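Because CUDA requires each grid dimension to be a multiple of the corresponding cluster dimension, the value from max_potential_cluster_size is an upper bound rather than a size that can always be used as-is. The sketch below picks, for a 1-D grid, the largest cluster size within that bound that divides the grid evenly. `largest_valid_cluster_size` is a hypothetical helper, and preferring the largest size is an assumed policy, not a cuda.core recommendation.

```python
def largest_valid_cluster_size(max_cluster_size, grid_blocks):
    """Largest cluster size <= max_cluster_size that evenly divides grid_blocks.

    Hypothetical helper; `max_cluster_size` is assumed to be the value returned
    by KernelOccupancy.max_potential_cluster_size(config).
    """
    if max_cluster_size < 1 or grid_blocks < 1:
        raise ValueError("max_cluster_size and grid_blocks must be at least 1")
    # Try candidate sizes from largest to smallest.
    for size in range(min(max_cluster_size, grid_blocks), 0, -1):
        if grid_blocks % size == 0:
            return size
    return 1  # not reached: a size of 1 always divides grid_blocks
```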
Attributes