warp.capture_begin

warp.capture_begin(
device=None,
stream=None,
force_module_load=None,
external=False,
apic=False,
)

Begin capture of a graph.

Captures all subsequent kernel launches and memory operations. On CUDA devices, operations are captured by the CUDA driver into a native graph. On CPU devices, there is no native graph equivalent; operations are always recorded into an APIC (API Capture) byte stream, which capture_launch() replays.

If apic=True on a CUDA device, APIC recording is performed alongside the native CUDA graph, and the result can be serialized to a .wrp file via capture_save(). On CPU, where recording is always on, the flag only controls whether capture_save() is allowed.
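A minimal sketch of the capture workflow on a CUDA device is shown below. It assumes that capture_end() returns the captured graph and that capture_launch() accepts it. The kernel add_one and the helper capture_and_replay are hypothetical names introduced only for illustration. Warp is imported inside the function purely so that the snippet can be loaded on a machine without Warp installed. The apic flag and capture_save() are not exercised here.

```python
def capture_and_replay(n=8, replays=3, device="cuda:0"):
    """Capture one kernel launch into a graph and replay it `replays` times.

    Illustrative sketch only: requires Warp and a CUDA device to actually run.
    """
    # Lazy import so that defining this function does not require Warp.
    import warp as wp

    @wp.kernel
    def add_one(x: wp.array(dtype=float)):
        i = wp.tid()
        x[i] = x[i] + 1.0

    # Allocate before the capture starts so the capture only contains the launch.
    x = wp.zeros(n, dtype=float, device=device)

    # Warm-up launch outside the capture, so the kernel is compiled and loaded
    # before capture begins (relevant for drivers that cannot load kernels
    # during graph capture).
    wp.launch(add_one, dim=n, inputs=[x], device=device)

    # Everything launched between capture_begin() and capture_end() is
    # recorded into the graph rather than executed immediately.
    wp.capture_begin(device=device)
    wp.launch(add_one, dim=n, inputs=[x], device=device)
    graph = wp.capture_end(device=device)

    # Each replay runs the recorded launch once.
    for _ in range(replays):
        wp.capture_launch(graph)

    return x.numpy()
```

In practice, a context manager such as wp.ScopedCapture may be preferable, since it ends the capture even if a launch raises an exception.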

Parameters:
  • device (Device | str | None) – The device to capture on (CUDA or CPU).

  • stream (Stream | None) – The CUDA stream to capture on (CUDA only).

  • force_module_load (bool | None) – Whether to force loading of all kernels before capture. In general, it is better to use load_module() to load kernels selectively. With CUDA drivers that support CUDA 12.3 or newer, setting this to True is not recommended, because those drivers can load kernels during graph capture. If this argument is None and the driver is older than CUDA 12.3, the behavior is inherited from warp.config.enable_graph_capture_module_load_by_default.

  • external (bool) – Whether the capture was already started externally (CUDA only).

  • apic (bool) – Whether to allow capture_save() on the captured graph. On CUDA this also enables APIC byte-stream recording during the capture; on CPU, recording happens regardless because it is the only replay mechanism.
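The rule described for force_module_load can be sketched as a small decision function. This is not Warp's implementation: resolve_force_module_load is a hypothetical helper, driver_version is assumed to be a (major, minor) tuple, and config_default stands in for warp.config.enable_graph_capture_module_load_by_default. The result for None on a 12.3 or newer driver (no forced load) is an assumption inferred from the text above, which does not state it explicitly.

```python
def resolve_force_module_load(force_module_load, driver_version, config_default):
    """Hypothetical helper mirroring the documented force_module_load rule.

    driver_version: (major, minor) CUDA version supported by the driver.
    config_default: stand-in for
        warp.config.enable_graph_capture_module_load_by_default.
    """
    if force_module_load is not None:
        # An explicit True or False from the caller always wins.
        return bool(force_module_load)

    if driver_version < (12, 3):
        # Older drivers: inherit the behavior from the config default.
        return bool(config_default)

    # Assumption: on 12.3+ drivers kernels can be loaded during capture,
    # so nothing is force-loaded when the argument is left as None.
    return False


# Example: argument left as None on an older driver follows the config.
print(resolve_force_module_load(None, (12, 2), True))
```

Tuple comparison keeps the version check simple: (12, 2) < (12, 3) is true, while (12, 3) and (13, 0) are not.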