Script.clc

Cluster Launch Control (CLC) instructions for dynamic work scheduling on Blackwell GPUs.

CLC enables a running kernel to cancel clusters that have not yet started, effectively implementing dynamic grid scheduling and work-stealing patterns. A scheduler CTA can request cancellation of a pending cluster, and if successful, take over that cluster’s work.

The workflow is:

  1. try_cancel(): asynchronously request cancellation. An opaque 16-byte response is written to shared memory, and an mbarrier tracks its arrival.

  2. mbarrier.wait(): wait for the response to arrive.

  3. query_response(): decode the response to check whether cancellation succeeded and, if it did, retrieve the canceled cluster’s CTA coordinates.
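
The three steps can be modeled on the host with a small pure-Python simulation. This is only a sketch of the control flow, not the GPU API: `PendingGrid`, `Response`, and `steal_all` are hypothetical names invented for illustration, the mbarrier wait is omitted because the model is synchronous, and the real response is an opaque 16-byte value rather than a Python object.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple

Coord = Tuple[int, int, int]  # (x, y, z) CTA coordinates


@dataclass(frozen=True)
class Response:
    """Hypothetical stand-in for the opaque 16-byte response."""
    canceled: bool
    cta_id: Optional[Coord] = None


class PendingGrid:
    """Toy model of the clusters that have not started yet."""

    def __init__(self, pending: List[Coord]) -> None:
        self._pending: Deque[Coord] = deque(pending)

    def try_cancel(self) -> Response:
        # Step 1: the request succeeds only while a pending cluster remains.
        if self._pending:
            return Response(True, self._pending.popleft())
        return Response(False)


def query_response(resp: Response) -> Tuple[bool, Optional[Coord]]:
    # Step 3: decode the success flag and the canceled cluster's coordinates.
    return resp.canceled, resp.cta_id


def steal_all(grid: PendingGrid, own: Coord) -> List[Coord]:
    """Process our own tile, then keep taking over pending clusters."""
    done = [own]
    while True:
        resp = grid.try_cancel()
        # Step 2 (mbarrier.wait) would go here on the GPU; the toy model
        # returns the response synchronously, so there is nothing to wait for.
        ok, cta_id = query_response(resp)
        if not ok:
            break  # this sketch simply stops at the first failed request
        done.append(cta_id)
    return done


grid = PendingGrid([(1, 0, 0), (2, 0, 0), (3, 0, 0)])
print(steal_all(grid, own=(0, 0, 0)))
```

In this model the scheduler ends up processing its own coordinates plus every cluster it managed to cancel, which is the persistent, work-stealing loop the workflow describes.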

If cancellation succeeds, the scheduler can use the returned CTA coordinates to compute the work tile that the canceled cluster would have processed, and then execute that work itself. If it fails (the cluster has already started), the scheduler can retry or proceed with other work.
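
As an illustration of turning the returned CTA coordinates into a work tile, the sketch below assumes a linearization with x varying fastest over a known grid shape, and one output tile per CTA. The helper names, the grid shape, and the tile sizes are all hypothetical; a real kernel would use whatever mapping its own launch configuration defines.

```python
from typing import Tuple


def linear_cta_index(cta_id: Tuple[int, int, int],
                     grid_dim: Tuple[int, int, int]) -> int:
    """Flatten (x, y, z) CTA coordinates, with x varying fastest."""
    x, y, z = cta_id
    gx, gy, gz = grid_dim
    if not (0 <= x < gx and 0 <= y < gy and 0 <= z < gz):
        raise ValueError(f"CTA {cta_id} is outside grid {grid_dim}")
    return x + gx * (y + gy * z)


def tile_origin(cta_id: Tuple[int, int, int],
                tile_m: int, tile_n: int) -> Tuple[int, int]:
    """Assumed mapping: CTA x picks the row tile, CTA y picks the column tile."""
    x, y, _ = cta_id
    return x * tile_m, y * tile_n


print(linear_cta_index((3, 2, 0), (8, 4, 1)))  # 3 + 8 * 2 = 19
print(tile_origin((3, 2, 0), 128, 64))         # (384, 128)
```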

With multicast=True, the response is broadcast to all CTAs in the requesting cluster, so all CTAs can independently query the result.

Instructions

try_cancel(response, mbarrier, multicast)

Request cancellation of a cluster that has not yet been launched.

query_response(response)

Decode the response written by a try_cancel operation: whether cancellation succeeded and, if so, the canceled cluster’s CTA coordinates.