cuda.core.utils.prefetch_batch

cuda.core.utils.prefetch_batch(stream: Stream | GraphBuilder, buffers: Sequence[Buffer], locations: Device | Host | Sequence[Device | Host]) → None

Prefetch a batch of managed-memory ranges to target locations.

The batched driver call requires CUDA 13+; CUDA 12 builds fall back to one prefetch per buffer (see Notes). For a single buffer, use ManagedBuffer.prefetch() instead.

Parameters:

stream (Stream | GraphBuilder) – Stream or graph builder on which the asynchronous prefetch is enqueued. Required, and always the first positional argument (the same convention as launch()).
buffers (Sequence[Buffer]) – Two or more managed allocations to operate on.
locations (Device | Host | Sequence[…]) – Target location(s). A single location is applied to every buffer; a sequence must contain exactly len(buffers) entries, one per buffer.
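The broadcasting rule for locations can be illustrated with a small, CUDA-free sketch. Everything here is hypothetical: Device and Host are minimal stand-ins rather than the cuda.core classes, normalize_locations is an invented helper rather than part of cuda.core, and raising ValueError for fewer than two buffers is an assumption based only on the "two or more" wording above.

```python
from collections.abc import Sequence


class Device:
    """Hypothetical stand-in for cuda.core's Device (illustration only)."""

    def __init__(self, device_id: int) -> None:
        self.device_id = device_id

    def __repr__(self) -> str:
        return f"Device({self.device_id})"


class Host:
    """Hypothetical stand-in for cuda.core's Host location (illustration only)."""

    def __repr__(self) -> str:
        return "Host()"


def normalize_locations(buffers, locations):
    """Hypothetical helper: expand `locations` to one target per buffer."""
    if len(buffers) < 2:
        # Assumption: single-buffer calls are rejected in favour of
        # ManagedBuffer.prefetch().
        raise ValueError("prefetch_batch needs two or more buffers")
    if isinstance(locations, (Device, Host)):
        # A single location is broadcast to every buffer.
        return [locations] * len(buffers)
    if isinstance(locations, Sequence):
        # A sequence must pair one location with each buffer.
        if len(locations) != len(buffers):
            raise ValueError(
                f"got {len(locations)} locations for {len(buffers)} buffers"
            )
        return list(locations)
    raise TypeError("locations must be a Device, a Host, or a sequence of them")
```

For example, normalize_locations(bufs, Device(0)) with three buffers yields the same Device three times, while passing a two-element list for three buffers raises ValueError.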

Notes

On CUDA 12 builds, this function falls back to a Python-level loop that calls cuMemPrefetchAsync once per buffer, because CUDA 12 has no batched driver entry point. CUDA 13 builds call cuMemPrefetchBatchAsync directly.
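The version-dependent dispatch described in the note can be sketched as follows. This is an illustration under stated assumptions, not the cuda.core implementation: prefetch_batch_sketch is a hypothetical name, and the two driver calls are injected as callables (batch_prefetch standing in for a wrapper around cuMemPrefetchBatchAsync, single_prefetch for one around cuMemPrefetchAsync) so the control flow can run without a GPU.

```python
from collections.abc import Callable, Sequence


def prefetch_batch_sketch(
    stream,
    buffers: Sequence,
    locations: Sequence,
    cuda_major: int,
    batch_prefetch: Callable,
    single_prefetch: Callable,
) -> None:
    """Hypothetical dispatch between the batched and per-buffer paths.

    `locations` is assumed to be already expanded to one entry per buffer.
    `batch_prefetch` and `single_prefetch` are stand-ins for driver wrappers.
    """
    if cuda_major >= 13:
        # CUDA 13: a single batched driver call covers every buffer.
        batch_prefetch(stream, list(buffers), list(locations))
    else:
        # CUDA 12: no batched entry point, so issue one prefetch per buffer,
        # all enqueued on the same stream.
        for buf, loc in zip(buffers, locations):
            single_prefetch(stream, buf, loc)
```

Injecting the driver calls keeps the sketch testable on any machine; a real implementation would instead select the path from the CUDA version the module was built against.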