warp.capture_begin

warp.capture_begin(
    device=None,
    stream=None,
    force_module_load=None,
    external=False,
)

Begin capture of a CUDA graph.

Captures all subsequent kernel launches and memory operations on CUDA devices. This can be used to record large numbers of kernels and replay them with low overhead.

If device is specified, the capture will begin on the CUDA stream currently associated with the device. If stream is specified, the capture will begin on the given stream. If both are omitted, the capture will begin on the current stream of the current device.

Parameters:
  • device (Device | str | None) – The CUDA device to capture on

  • stream (Stream | None) – The CUDA stream to capture on

  • force_module_load (bool | None) – Whether to force loading of all kernels before capture. In general, it is better to use load_module() to load kernels selectively. Setting this to True is not recommended with CUDA drivers that support CUDA 12.3 or newer, because those drivers can load kernels during graph capture. If this argument is None and the driver is older than CUDA 12.3, the behavior is inherited from wp.config.enable_graph_capture_module_load_by_default.

  • external (bool) – Whether the capture was already started externally