warp.capture_begin
warp.capture_begin(device=None, stream=None, force_module_load=None, external=False)
Begin capture of a CUDA graph.
Captures all subsequent kernel launches and memory operations on CUDA devices. This can be used to record large numbers of kernels and replay them with low overhead.
If device is specified, the capture will begin on the CUDA stream currently associated with the device. If stream is specified, the capture will begin on the given stream. If both are omitted, the capture will begin on the current stream of the current device.
- Parameters:
device (Device | str | None) – The CUDA device to capture on
stream (Stream | None) – The CUDA stream to capture on
force_module_load (bool | None) – Whether to force loading of all kernels before capture. In general, it is better to use load_module() to selectively load kernels. When running with CUDA drivers that support CUDA 12.3 or newer, setting this option to True is not recommended, because kernels can be loaded during graph capture on more recent drivers. If this argument is None, the behavior is inherited from wp.config.enable_graph_capture_module_load_by_default if the driver is older than CUDA 12.3.
external (bool) – Whether the capture was already started externally