Script.cluster

Block cluster instructions for multi-CTA coordination on Hopper+ GPUs.

A cluster is a group of thread blocks (CTAs) that can directly access each other’s shared memory and synchronize collectively. Clusters are configured at launch time via self.attrs.cluster_blocks.

This instruction group provides:

  • Synchronization: sync() is a cluster-wide barrier. Every thread in every CTA of the cluster must arrive before any thread can proceed.

  • Introspection: blockIdx and blockRank give the current CTA’s position and linear rank within the cluster, and clusterDim gives the cluster’s dimensions.

  • Cross-CTA addressing: map_shared_addr() translates a shared memory address from the current CTA’s address space to another CTA’s, enabling direct remote shared memory access (e.g., signaling a peer CTA’s mbarrier).

Instructions

sync()

Synchronize all thread blocks in the current cluster.
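
The barrier semantics can be illustrated with a small host-side model in plain Python. This is only a sketch and does not use the library's API: each CTA is represented by a single Python thread, threading.Barrier stands in for the cluster-wide barrier, and the names run_cluster and CLUSTER_BLOCKS are hypothetical.

```python
import threading

CLUSTER_BLOCKS = 4  # hypothetical cluster size used only by this model


def run_cluster(num_blocks):
    """Model a cluster-wide barrier with one Python thread per CTA."""
    barrier = threading.Barrier(num_blocks)
    lock = threading.Lock()
    events = []  # ordered log of ("arrive" | "proceed", rank)

    def cta(rank):
        with lock:
            events.append(("arrive", rank))
        barrier.wait()  # stands in for the cluster-wide sync
        with lock:
            events.append(("proceed", rank))

    threads = [threading.Thread(target=cta, args=(r,)) for r in range(num_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_cluster(CLUSTER_BLOCKS)
# Index of the first CTA that got past the barrier.
first_proceed = next(i for i, (kind, _) in enumerate(events) if kind == "proceed")
# Number of CTAs that had arrived before anyone proceeded.
arrived_before = sum(1 for kind, _ in events[:first_proceed] if kind == "arrive")
print(arrived_before == CLUSTER_BLOCKS)
```

In this model every "arrive" entry is logged before the first "proceed" entry, which is the ordering guarantee a cluster-wide barrier provides.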

map_shared_addr(addr, target_rank)

Map one or more shared memory addresses in the current CTA to the corresponding addresses in the shared memory of the CTA identified by target_rank.
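
The idea behind the mapping can be sketched with a toy host-side model in plain Python. It assumes that the mapped address refers to the same offset inside the target CTA's shared memory. ClusterModel, ClusterAddr, and SMEM_WORDS are hypothetical names for illustration and are not part of the library; only the name map_shared_addr and its two parameters come from the signature above.

```python
from dataclasses import dataclass

SMEM_WORDS = 8  # hypothetical shared memory size per CTA, in words


@dataclass(frozen=True)
class ClusterAddr:
    """A cluster-wide address: which CTA's shared memory, and the offset in it."""
    rank: int
    offset: int


class ClusterModel:
    """Toy model: one private shared-memory array per CTA in the cluster."""

    def __init__(self, num_blocks):
        self.smem = [[0] * SMEM_WORDS for _ in range(num_blocks)]

    def map_shared_addr(self, offset, target_rank):
        # Assumption: same offset, resolved in the target CTA's shared memory.
        if not 0 <= target_rank < len(self.smem):
            raise ValueError("target_rank is outside the cluster")
        return ClusterAddr(target_rank, offset)

    def store(self, addr, value):
        self.smem[addr.rank][addr.offset] = value

    def load(self, addr):
        return self.smem[addr.rank][addr.offset]


cluster = ClusterModel(num_blocks=2)
flag_offset = 3  # local offset of a flag that exists in every CTA
peer = cluster.map_shared_addr(flag_offset, target_rank=1)
cluster.store(peer, 1)  # CTA 0 writes the flag that lives in CTA 1
```

After the store, only the flag in CTA 1 is set and CTA 0's own copy at the same offset is untouched. This mirrors the peer-signaling use case mentioned in the overview.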

Properties

blockIdx

The block index within the cluster.

blockRank

The linear rank of the current block within the cluster.

clusterDim

The dimensions of the cluster.
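
The relationship between the three properties can be sketched in plain Python. This assumes the usual CUDA convention in which the x dimension varies fastest when a 3-D block index is linearized. The helper names block_rank and block_idx_from_rank are hypothetical and are not part of the library.

```python
def block_rank(block_idx, cluster_dim):
    """Linearize a 3-D block index within a cluster, x varying fastest (assumed)."""
    x, y, z = block_idx
    nx, ny, nz = cluster_dim
    if not (0 <= x < nx and 0 <= y < ny and 0 <= z < nz):
        raise ValueError("block index lies outside the cluster")
    return x + nx * (y + ny * z)


def block_idx_from_rank(rank, cluster_dim):
    """Inverse of block_rank: recover (x, y, z) from a linear rank."""
    nx, ny, nz = cluster_dim
    if not 0 <= rank < nx * ny * nz:
        raise ValueError("rank lies outside the cluster")
    return (rank % nx, (rank // nx) % ny, rank // (nx * ny))


cluster_dim = (2, 2, 1)  # a 2 x 2 x 1 cluster of four CTAs
ranks = [block_rank((x, y, 0), cluster_dim) for y in range(2) for x in range(2)]
print(ranks)
```

Under this convention the four CTAs of a 2 x 2 x 1 cluster receive ranks 0 through 3, and block_idx_from_rank maps each rank back to its block index.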