cuda.core.utils.prefetch_batch

cuda.core.utils.prefetch_batch(
stream: Stream | GraphBuilder,
buffers: Sequence[Buffer],
locations: Device | Host | Sequence[Device | Host],
) → None

Prefetch a batch of managed-memory ranges to target locations.

The batched driver call requires CUDA 13+; see Notes for the behavior on CUDA 12 builds. For a single buffer, use ManagedBuffer.prefetch() instead.

Parameters:
  • stream (Stream | GraphBuilder) – Stream for the asynchronous prefetch. Required, and passed as the first positional argument (mirrors launch()).

  • buffers (Sequence[Buffer]) – Two or more managed allocations to operate on.

  • locations (Device | Host | Sequence[…]) – Target location(s). A single location applies to all buffers; a sequence must match len(buffers).
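The rule for locations (one value for every buffer, or one value per buffer) can be illustrated with a minimal pure-Python sketch. The helper name broadcast_locations is hypothetical and is not part of cuda.core; plain strings stand in for Buffer, Device, and Host objects so the sketch runs without a GPU. The exception type is also an assumption, since the source does not say what is raised on a length mismatch.

```python
from collections.abc import Sequence


def broadcast_locations(buffers, locations):
    """Hypothetical helper: pair each buffer with its target location.

    Strings are used as stand-ins for Buffer/Device/Host objects, so a
    str is treated as a single location rather than as a sequence.
    """
    is_sequence = isinstance(locations, Sequence) and not isinstance(
        locations, (str, bytes)
    )
    if is_sequence:
        # A sequence of locations must line up one-to-one with the buffers.
        if len(locations) != len(buffers):
            # ValueError is an assumption; the real error type is not documented here.
            raise ValueError(
                f"got {len(locations)} locations for {len(buffers)} buffers"
            )
        return list(zip(buffers, locations))
    # A single location applies to every buffer in the batch.
    return [(buf, locations) for buf in buffers]


# One location shared by all buffers.
shared = broadcast_locations(["buf_a", "buf_b"], "device0")

# One location per buffer.
per_buffer = broadcast_locations(["buf_a", "buf_b"], ["device0", "host"])
```

Here shared pairs both buffers with "device0", while per_buffer sends each buffer to its own target.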

Notes

On a CUDA 12 build, the function falls back to a Python-level loop that calls cuMemPrefetchAsync once per buffer, because CUDA 12 has no batched driver entry point. CUDA 13 builds call cuMemPrefetchBatchAsync directly.
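The difference between the two paths can be sketched as follows. This is an illustration of the dispatch pattern only, not the library's implementation: prefetch_dispatch, batched_call, and single_call are hypothetical names, and the two callables are stand-ins for the driver entry points rather than real bindings.

```python
def prefetch_dispatch(pairs, stream, single_call, batched_call=None):
    """Hypothetical dispatcher; returns the number of driver-style calls made.

    pairs        -- list of (buffer, location) tuples
    single_call  -- stand-in for a per-buffer prefetch (CUDA 12 style)
    batched_call -- stand-in for a batched prefetch (CUDA 13 style), or
                    None when no batched entry point is available
    """
    if batched_call is not None:
        # Batched path: one call covers every buffer.
        buffers = [buf for buf, _ in pairs]
        locations = [loc for _, loc in pairs]
        batched_call(buffers, locations, stream)
        return 1
    # Fallback path: one call per buffer, issued from a Python loop.
    for buf, loc in pairs:
        single_call(buf, loc, stream)
    return len(pairs)


# Record calls instead of touching a GPU.
log = []
pairs = [("buf_a", "device0"), ("buf_b", "host")]

n_fallback = prefetch_dispatch(
    pairs, "stream0", single_call=lambda b, l, s: log.append(("single", b, l, s))
)
n_batched = prefetch_dispatch(
    pairs,
    "stream0",
    single_call=lambda b, l, s: log.append(("single", b, l, s)),
    batched_call=lambda bs, ls, s: log.append(("batch", tuple(bs), tuple(ls), s)),
)
```

In this sketch the fallback issues one call per buffer, while the batched path issues a single call for the whole batch.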