cuda.core.graph.KernelNode
- class cuda.core.graph.KernelNode
A kernel launch node.
Properties
- grid (tuple of int)
Grid dimensions (gridDimX, gridDimY, gridDimZ).
- block (tuple of int)
Block dimensions (blockDimX, blockDimY, blockDimZ).
- shmem_size (int)
Dynamic shared memory size in bytes.
- kernel (Kernel)
The kernel object for this launch node.
- config (LaunchConfig)
A LaunchConfig reconstructed from this node’s parameters.
Methods
- __init__()
- alloc(self, size_t size, options=None) → AllocNode
Add a memory allocation node depending on this node.
- Parameters:
size (int) – Number of bytes to allocate.
options (GraphAllocOptions, optional) – Allocation options. If None, allocates on the current device.
- Returns:
A new AllocNode representing the allocation. Access the allocated device pointer via the dptr property.
- Return type:
AllocNode
- callback(self, fn, *, user_data=None) → HostCallbackNode
Add a host callback node depending on this node.
The callback runs on the host CPU when the graph reaches this node. Two modes are supported:
- Python callable: Pass any callable. The GIL is acquired automatically. The callable must take no arguments; use closures or functools.partial to bind state.
- ctypes function pointer: Pass a ctypes.CFUNCTYPE instance. The function receives a single void* argument (the user_data). The caller must keep the ctypes wrapper alive for the lifetime of the graph.
Warning
Callbacks must not call CUDA API functions. Doing so may deadlock or corrupt driver state.
- Parameters:
fn (callable or ctypes function pointer) – The callback function.
user_data (int or bytes-like, optional) – Only for ctypes function pointers. If int, it is passed as a raw pointer (the caller manages its lifetime). If bytes-like, the data is copied and its lifetime is tied to the graph.
- Returns:
A new HostCallbackNode representing the callback.
- Return type:
HostCallbackNode
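As an illustration of the two callable shapes described for callback, the following minimal sketch builds a zero-argument Python callable with functools.partial and a ctypes.CFUNCTYPE wrapper that takes a single void* argument. It runs without CUDA and adds nothing to a graph; the names log_step, on_done, events, and payload are hypothetical, and the callables are invoked directly only to show what each would receive.

```python
import ctypes
import functools

events = []

# Mode 1: a Python callable that takes no arguments.
# State is bound ahead of time with functools.partial.
def log_step(label, sink):
    sink.append(label)

py_callback = functools.partial(log_step, "stage-1 done", events)

# Mode 2: a ctypes function pointer receiving a single void* argument.
CallbackType = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

def on_done(user_data):
    # ctypes hands a void* argument to Python as an int address (None if NULL).
    value = ctypes.c_int.from_address(user_data).value
    events.append(f"payload={value}")

# Keep references to the wrapper and the payload for as long as the
# callback could still be invoked.
c_callback = CallbackType(on_done)
payload = ctypes.c_int(42)

# Direct invocation, standing in for what would happen at graph run time.
py_callback()
c_callback(ctypes.addressof(payload))
print(events)
```

With a node in hand, either object would be passed as fn. Per the parameter description, an int user_data such as ctypes.addressof(payload) is passed as a raw pointer, so the caller has to keep payload alive.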
- destroy(self)
Destroy this node and remove all its edges from the parent graph.
After this call, is_valid returns False and the node cannot be re-added to any graph. Safe to call on an already-destroyed node (no-op).
- embed(self, GraphDef child: GraphDef) → ChildGraphNode
Add a child graph node depending on this node.
Embeds a clone of the given graph definition as a sub-graph node. The child graph must not contain allocation, free, or conditional nodes.
- Parameters:
child (GraphDef) – The graph definition to embed (will be cloned).
- Returns:
A new ChildGraphNode representing the embedded sub-graph.
- Return type:
ChildGraphNode
- if_cond(self, Condition condition: Condition) → IfNode
Add an if-conditional node depending on this node.
The body graph executes only when the condition evaluates to a non-zero value at runtime.
- Parameters:
condition (Condition) – Condition from GraphDef.create_condition().
- Returns:
A new IfNode with one branch accessible via .then.
- Return type:
IfNode
- if_else(self, Condition condition: Condition) → IfElseNode
Add an if-else conditional node depending on this node.
The node has two body graphs: the first executes when the condition is non-zero, the second when it is zero.
- Parameters:
condition (Condition) – Condition from GraphDef.create_condition().
- Returns:
A new IfElseNode with branches accessible via .then and .else_.
- Return type:
IfElseNode
- join(self, *nodes: GraphNode) → EmptyNode
Create an empty node that depends on this node and all given nodes.
This is used to synchronize multiple branches of execution.
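To illustrate the fan-in pattern that join expresses, here is a toy model built on Python's standard graphlib module rather than on cuda.core. The node names are hypothetical; the only point is that a join node depending on several branches cannot be scheduled before any of them.

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on.
deps = {
    "branch_a": {"root"},
    "branch_b": {"root"},
    "join": {"branch_a", "branch_b"},  # empty node fanning in both branches
    "after_join": {"join"},
}

# Any valid ordering places "join" after both branches.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

The relative order of branch_a and branch_b is unspecified, which mirrors two independent branches of a graph.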
- launch(self, LaunchConfig config: LaunchConfig, Kernel kernel: Kernel, *args) → KernelNode
Add a kernel launch node depending on this node.
- Parameters:
config (LaunchConfig) – Launch configuration (grid, block, shared memory, etc.)
kernel (Kernel) – The kernel to launch.
*args – Kernel arguments.
- Returns:
A new KernelNode representing the kernel launch.
- Return type:
KernelNode
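Because launch returns a KernelNode, which itself offers launch, sequential launches can be chained. The helper below is a hypothetical sketch and not part of cuda.core: it accepts any node-like object and relies only on the launch(config, kernel, *args) call shape documented here. Creating the starting node, the LaunchConfig, and the Kernel objects is assumed to happen elsewhere.

```python
def chain_launches(start, config, kernels_with_args):
    """Append one launch per (kernel, args) pair, each depending on the previous.

    `start` is any object exposing launch(config, kernel, *args), such as a
    KernelNode. Returns the node produced by the last launch, or `start`
    itself if the sequence is empty.
    """
    node = start
    for kernel, args in kernels_with_args:
        # Each call returns a new node that depends on `node`.
        node = node.launch(config, kernel, *args)
    return node
```

Reusing one config for every kernel is a simplification made for brevity; a real pipeline would likely pair each kernel with its own launch configuration.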
- memcpy(self, int dst: int, int src: int, size_t size) → MemcpyNode
Add a memcpy node depending on this node.
Copies size bytes from src to dst. Memory types are auto-detected via the driver, so both device and pinned host pointers are supported.
- Parameters:
dst (int) – Destination pointer (device or pinned host).
src (int) – Source pointer (device or pinned host).
size (int) – Number of bytes to copy.
- Returns:
A new MemcpyNode representing the copy operation.
- Return type:
MemcpyNode
- memset(self, int dst: int, value, size_t width, size_t height=1, size_t pitch=0) → MemsetNode
Add a memset node depending on this node.
- Parameters:
dst (int) – Destination device pointer.
value (int or buffer-protocol object) – Fill value. int for 1-byte fill (range [0, 256)), or buffer-protocol object of 1, 2, or 4 bytes.
width (int) – Width of the row in elements.
height (int, optional) – Number of rows (default 1).
pitch (int, optional) – Pitch of destination in bytes (default 0, unused if height is 1).
- Returns:
A new MemsetNode representing the memset operation.
- Return type:
MemsetNode
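The rules for the memset value parameter can be made concrete with a small pure-Python helper. This is a hypothetical sketch of the documented constraints, not the library's own validation: an int must fall in [0, 256) and fills one byte per element, while a buffer-protocol object must be 1, 2, or 4 bytes long, and its length sets the element size.

```python
import struct

def normalize_fill_value(value):
    """Return (element_size, pattern_bytes) for a memset fill value."""
    if isinstance(value, int):
        if not 0 <= value < 256:
            raise ValueError("int fill value must be in [0, 256)")
        return 1, bytes([value])
    # Anything else must support the buffer protocol.
    pattern = bytes(memoryview(value))
    if len(pattern) not in (1, 2, 4):
        raise ValueError("buffer fill value must be 1, 2, or 4 bytes")
    return len(pattern), pattern

print(normalize_fill_value(0xFF))                       # 1-byte fill
print(normalize_fill_value(struct.pack("<H", 0xBEEF)))  # 2-byte pattern
print(normalize_fill_value(struct.pack("<f", 1.0)))     # 4-byte pattern
```

Since width is given in elements, a row of width elements spans width times the element size in bytes.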
- record_event(self, Event event: Event) → EventRecordNode
Add an event record node depending on this node.
- Parameters:
event (Event) – The event to record.
- Returns:
A new EventRecordNode representing the event record operation.
- Return type:
EventRecordNode
- switch(self, Condition condition: Condition, unsigned int count) → SwitchNode
Add a switch conditional node depending on this node.
The condition value selects which branch to execute. If the value is out of range, no branch executes.
- Parameters:
condition (Condition) – Condition from GraphDef.create_condition().
count (int) – Number of switch cases (branches).
- Returns:
A new SwitchNode with branches accessible via .branches.
- Return type:
SwitchNode
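The selection rule for a switch node can be modeled in plain Python. This sketch does not touch the GPU or cuda.core; select_branch is a hypothetical stand-in, and it assumes zero-based branch indexing, so a condition value in [0, count) picks the branch with that index and any other value picks none.

```python
def select_branch(value, count):
    """Return the index of the branch a switch would run, or None."""
    if 0 <= value < count:
        return value
    return None

branches = ["case_0", "case_1", "case_2"]
for value in (0, 2, 3):
    index = select_branch(value, len(branches))
    chosen = branches[index] if index is not None else "no branch"
    print(value, "->", chosen)
```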
- wait_event(self, Event event: Event) → EventWaitNode
Add an event wait node depending on this node.
- Parameters:
event (Event) – The event to wait for.
- Returns:
A new EventWaitNode representing the event wait operation.
- Return type:
EventWaitNode
- while_loop(self, Condition condition: Condition) → WhileNode
Add a while-loop conditional node depending on this node.
The body graph executes repeatedly while the condition evaluates to a non-zero value.
- Parameters:
condition (Condition) – Condition from GraphDef.create_condition().
- Returns:
A new WhileNode with body accessible via .body.
- Return type:
WhileNode
Attributes
- block
Block dimensions as a 3-tuple (blockDimX, blockDimY, blockDimZ).
- Type:
tuple
- config
A LaunchConfig reconstructed from this node’s grid, block, and shmem_size.
Note: cluster dimensions and cooperative_launch are not preserved by the CUDA driver’s kernel node params, so they are not included.
- Type:
LaunchConfig
- graph
The GraphDef this node belongs to.
- Type:
GraphDef
- grid
Grid dimensions as a 3-tuple (gridDimX, gridDimY, gridDimZ).
- Type:
tuple
- handle
The underlying driver CUgraphNode handle.
Returns None for the entry node.
- Type:
driver.CUgraphNode
- is_valid
Whether this node is valid (not destroyed).
Returns False after destroy() has been called.
- kernel
The Kernel object for this launch node.
- Type:
Kernel
- pred
A mutable set-like view of this node’s predecessors.
- shmem_size
Dynamic shared memory size in bytes.
- Type:
int
- succ
A mutable set-like view of this node’s successors.
- type
Return the CUDA graph node type.
- Returns:
The node type enum value, or None for the entry node.
- Return type:
CUgraphNodeType or None