Script.cluster
Block cluster instructions for multi-CTA coordination on Hopper+ GPUs.
A cluster is a group of thread blocks (CTAs) that can directly access each other’s
shared memory and synchronize collectively. Clusters are configured at launch time via
self.attrs.cluster_blocks.
This instruction group provides:
- Synchronization: sync() is a cluster-wide barrier. All threads across all CTAs in the cluster must arrive before any can proceed.
- Introspection: blockIdx, clusterDim, and blockRank provide the current CTA's position and rank within the cluster.
- Cross-CTA addressing: map_shared_addr() translates a shared memory address from the current CTA's address space to another CTA's, enabling direct remote shared memory access (e.g., signaling a peer CTA's mbarrier).
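The interplay between remote addressing and the cluster barrier can be illustrated with a small host-side model. This is only a sketch in plain Python, not the Script.cluster API: each CTA is stood in for by one Python thread, and `smem`, `map_shared_addr_model`, `cta_main`, `run_cluster`, `CLUSTER_SIZE`, and `SMEM_WORDS` are all hypothetical names introduced for illustration. It shows each CTA writing into its neighbor's buffer through a translated address, then waiting at a cluster-wide barrier before reading its own buffer.

```python
import threading

CLUSTER_SIZE = 4   # hypothetical number of CTAs in one cluster
SMEM_WORDS = 8     # hypothetical shared-memory size per CTA, in words

# One private "shared memory" buffer per CTA, indexed by cluster rank.
smem = [[0] * SMEM_WORDS for _ in range(CLUSTER_SIZE)]

# Stand-in for the cluster-wide barrier: nobody passes until all have arrived.
cluster_barrier = threading.Barrier(CLUSTER_SIZE)


def map_shared_addr_model(offset, target_rank):
    """Model of address translation: same offset, but in the target CTA's buffer."""
    return smem[target_rank], offset


def cta_main(rank, results):
    # Write a value into the next CTA's shared memory at the same offset.
    peer = (rank + 1) % CLUSTER_SIZE
    remote_buf, remote_off = map_shared_addr_model(0, peer)
    remote_buf[remote_off] = 100 + rank

    # Cluster-wide barrier: once it releases, every CTA has done its remote write.
    cluster_barrier.wait()

    # Each CTA now reads, from its own buffer, the value its predecessor deposited.
    results[rank] = smem[rank][0]


def run_cluster():
    """Run one thread per modeled CTA and collect what each CTA read locally."""
    results = [None] * CLUSTER_SIZE
    threads = [threading.Thread(target=cta_main, args=(r, results))
               for r in range(CLUSTER_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Without the barrier, a CTA could read its buffer before the predecessor's remote write had happened; the barrier is what makes the read safe in this model. On real hardware the barrier involves every thread of every CTA, which this one-thread-per-CTA model deliberately simplifies.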
Instructions
| sync() | Synchronize all thread blocks in the current cluster. |
| map_shared_addr() | Map shared memory address(es) to the corresponding address(es) in another CTA's shared memory. |
Properties
| blockIdx | The block index within the cluster. |
| blockRank | The linear rank of the current block within the cluster. |
| clusterDim | The dimensions of the cluster. |
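How the three properties relate can be sketched with a small helper. This assumes the linear rank follows the x-fastest ordering that CUDA uses for cluster ranks, which this page does not state explicitly; `block_rank_model` is a hypothetical function written for illustration and is not part of this API.

```python
def block_rank_model(block_idx, cluster_dim):
    """Linearize a 3-D block index within a cluster, assuming x varies fastest."""
    x, y, z = block_idx
    dim_x, dim_y, dim_z = cluster_dim
    # Reject indices that fall outside the cluster's dimensions.
    if not (0 <= x < dim_x and 0 <= y < dim_y and 0 <= z < dim_z):
        raise ValueError("block index lies outside the cluster dimensions")
    return x + y * dim_x + z * dim_x * dim_y
```

Under that assumption, a 2x2x1 cluster assigns ranks 0 through 3, with block index (1, 1, 0) mapping to rank 3.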