Script.clc.try_cancel
- Script.clc.try_cancel(response, mbarrier, multicast)
Request cancellation of a cluster that has not yet been launched.
This instruction asynchronously requests the cancellation of a cluster that has not started running yet. It writes an opaque 16-byte response to shared memory indicating whether the operation succeeded or failed. The completion of the asynchronous operation is tracked using the provided mbarrier.
On success, the response contains the CTA ID of the first CTA of the canceled cluster. No other successful response from other try_cancel operations from the same grid will contain that ID.
The response can be decoded using the query_response method to determine if the cancellation was successful and to retrieve the CTA ID of the first CTA in the canceled cluster.
Important: If the executing CTA has already observed the completion of a try_cancel instruction as failed, then issuing a subsequent try_cancel instruction results in undefined behavior.
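The contract described above (each canceled cluster is handed out at most once, and no further try_cancel is issued after a failure has been observed) can be sketched with a small host-side model. This is plain Python, not Tilus or PTX code: `Response`, `PendingClusters`, and `persistent_cta` are hypothetical names invented for illustration, and the model runs sequentially rather than asynchronously.

```python
# Toy host-side model of the try_cancel contract; not Tilus or PTX code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical stand-in for the decoded 16-byte response."""
    is_valid: bool
    first_cta_id: Optional[int] = None


class PendingClusters:
    """Hypothetical pool of clusters that have not been launched yet."""

    def __init__(self, first_cta_ids: List[int]) -> None:
        self._pending = list(first_cta_ids)

    def try_cancel(self) -> Response:
        # A successful response hands out each pending cluster at most once.
        if self._pending:
            return Response(True, self._pending.pop(0))
        return Response(False)


def persistent_cta(pool: PendingClusters, own_cta_id: int) -> List[int]:
    """Process the CTA's own work, then the work of every cluster it cancels."""
    processed = [own_cta_id]
    while True:
        response = pool.try_cancel()
        if not response.is_valid:
            # A failure was observed: never issue try_cancel again.
            break
        processed.append(response.first_cta_id)
    return processed


pool = PendingClusters([4, 8, 12])
work_a = persistent_cta(pool, own_cta_id=0)
work_b = persistent_cta(pool, own_cta_id=2)
print(work_a, work_b)
```

In this sketch the first caller drains the pool and the second caller sees an immediate failure, so the two work lists never share a CTA ID. In a real kernel the response would be read from shared memory after waiting on the mbarrier and decoded with query_response.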
- Parameters:
response (SharedTensor) – A naturally aligned 16-byte wide shared memory tensor where the request’s response will be written. Must be in .shared::cta state space.
mbarrier (Expr | RegisterTensor) – The mbarrier object used to track completion of the asynchronous operation. This instruction automatically performs an mbarrier arrive operation combined with an expect-tx operation on the mbarrier, increasing its tx-count by 16 bytes. When the asynchronous write to response completes, a complete-tx operation with completeCount equal to 16 bytes is performed on this mbarrier, decrementing the tx-count by 16 bytes. The mbarrier can transition to the next phase once both the tx-count and the pending arrivals reach zero.
multicast (Expr | bool) – If True, the response is asynchronously written using weak async-proxy writes to the corresponding local shared memory address of each CTA in the requesting cluster. In multicast mode, for each CTA in the cluster, an mbarrier arrive operation combined with an expect-tx operation (16 bytes) is performed on that CTA’s mbarrier. The completion of the writes to each CTA is signaled via a complete-tx operation to the mbarrier object on that CTA’s shared memory. When using multicast, at least 32 threads are required in the current thread group, and the behavior is undefined if any CTA in the cluster has exited. If False, a single mbarrier arrive with expect-tx operation is performed on the local mbarrier, and the response is written only to the local shared memory of the calling CTA.
- Return type:
None
Notes
Thread group: Can be executed by a thread group of any size (at least 32 threads if multicast=True).
Hardware: Requires compute capability 10.0+ (sm_100).
PTX: clusterlaunchcontrol.try_cancel
This instruction performs an mbarrier arrive operation combined with an expect-tx operation (16 bytes) before issuing the cancellation request. The mbarrier’s tx-count is increased by 16 bytes when the instruction is issued, and decreased by 16 bytes when the response write completes asynchronously.
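The tx-count bookkeeping described in these notes can be illustrated with a minimal toy model. It is plain Python rather than Tilus or PTX code; `ToyMbarrier` and its methods are hypothetical names, and the model assumes an mbarrier initialized with an expected arrival count of 1 and a cluster of 4 CTAs for the multicast case.

```python
# Toy model of mbarrier accounting around try_cancel; not Tilus or PTX code.
RESPONSE_BYTES = 16


class ToyMbarrier:
    """Hypothetical mbarrier with a pending-arrival counter and a tx-count."""

    def __init__(self, expected_arrivals: int) -> None:
        self.expected_arrivals = expected_arrivals
        self.pending_arrivals = expected_arrivals
        self.tx_count = 0
        self.phase = 0

    def arrive_expect_tx(self, nbytes: int) -> None:
        # expect-tx raises the tx-count, then the arrival is counted.
        self.tx_count += nbytes
        self.pending_arrivals -= 1
        self._maybe_advance()

    def complete_tx(self, nbytes: int) -> None:
        # Performed when the asynchronous response write has landed.
        self.tx_count -= nbytes
        self._maybe_advance()

    def _maybe_advance(self) -> None:
        # The phase flips only when arrivals and tx-count are both zero.
        if self.pending_arrivals == 0 and self.tx_count == 0:
            self.phase ^= 1
            self.pending_arrivals = self.expected_arrivals


# Non-multicast case: one local mbarrier.
bar = ToyMbarrier(expected_arrivals=1)
bar.arrive_expect_tx(RESPONSE_BYTES)  # done implicitly when try_cancel is issued
phase_while_in_flight = bar.phase
tx_while_in_flight = bar.tx_count
bar.complete_tx(RESPONSE_BYTES)       # the response write completes
phase_after_write = bar.phase

# Multicast case: one mbarrier per CTA in an assumed 4-CTA cluster.
cluster_bars = [ToyMbarrier(expected_arrivals=1) for _ in range(4)]
for cta_bar in cluster_bars:
    cta_bar.arrive_expect_tx(RESPONSE_BYTES)
for cta_bar in cluster_bars:
    cta_bar.complete_tx(RESPONSE_BYTES)
multicast_phases = [cta_bar.phase for cta_bar in cluster_bars]
```

While the response is in flight the toy barrier stays in its original phase with a tx-count of 16; it advances only after the matching complete-tx. That is why a wait on the mbarrier is enough to know the response is readable.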