cuda.core.utils.FileStreamProgramCache#
- class cuda.core.utils.FileStreamProgramCache()#
Persistent program cache backed by a directory of atomic files.
Designed for multi-process use: writes stage a temporary file and then os.replace() it into place, so concurrent readers never observe a partially written entry. Each entry on disk is the raw compiled binary (cubin / PTX / LTO-IR) with no header, framing, or pickle wrapper, so the files are directly consumable by external NVIDIA tools (cuobjdump, nvdisasm, cuda-gdb).
Eviction is by least-recently-read time: every successful read bumps the entry’s atime, and the size enforcer evicts the oldest atime first.
Note
Best-effort writes.
On Windows, os.replace raises PermissionError (winerror 32 / 33) when another process holds the target file open. This backend retries with bounded backoff (~185 ms) and, if the rename still fails, silently drops the cache write and returns as if it had succeeded; the next call simply misses the entry and recompiles. On POSIX, and for other PermissionError codes, the exception propagates.
Note
Atomic for readers, not crash-durable.
Each entry’s temp file is fsync-ed before os.replace, but the containing directory is not fsync-ed. A host crash between the write and the next directory commit may lose recently added entries; surviving entries remain consistent.
Note
Cross-version sharing.
The cache is safe to share across cuda.core patch releases: every key produced by make_program_cache_key() encodes the relevant backend/compiler/runtime fingerprints for its compilation path. NVRTC entries pin the NVRTC version; NVVM entries pin the libNVVM library and IR versions; PTX/linker entries pin the chosen linker backend and its version (and, when the cuLink/driver backend is selected, the driver version too), while nvJitLink-backed PTX entries are deliberately driver-version independent. Bumping _KEY_SCHEMA_VERSION (mixed into the digest by make_program_cache_key) produces new keys that do not collide with old entries: post-bump lookups miss the old on-disk paths, and the orphaned files are reaped on the next size-cap eviction pass. Entries are stored verbatim as the compiled binary, so cross-patch sharing only requires that the compiler-pinning surface above stays stable; no Python-pickle compatibility is involved.
- Parameters:
path – Directory that owns the cache. Created if missing. If omitted, the OS-conventional user cache directory is used: $XDG_CACHE_HOME/cuda-python/program-cache (Linux, defaulting to ~/.cache/cuda-python/program-cache) or %LOCALAPPDATA%\cuda-python\program-cache (Windows).
max_size_bytes – Optional soft cap on total on-disk size. Enforced opportunistically on writes; concurrent writers may briefly exceed it. Eviction is by least-recently-read time (oldest st_atime first).
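The staged-write protocol described above (temp file, fsync, then os.replace) can be sketched with the standard library alone. The helper below is illustrative, not the library's actual implementation; the cache layout and key naming are assumptions:

```python
import os
import tempfile


def atomic_write(cache_dir: str, key_hex: str, data: bytes) -> str:
    """Stage a temp file in the cache directory, fsync it, then
    os.replace() it into place so readers never see a partial entry."""
    os.makedirs(cache_dir, exist_ok=True)
    # Temp file lives in the same directory so the rename stays atomic
    # (os.replace cannot atomically cross filesystems).
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # entry bytes are durable before the rename...
        final_path = os.path.join(cache_dir, key_hex)
        os.replace(tmp_path, final_path)  # ...but the directory itself is not fsync-ed
        return final_path
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Because the rename is the last step, a concurrent reader either sees no entry or the complete binary, matching the "atomic for readers, not crash-durable" guarantee above.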
Methods
- close() → None#
Release backend resources.
The default implementation does nothing. Subclasses that hold long-lived state (open file handles, database connections, network sockets, …) should override this to release them.
Callers should use the context-manager form (with cache:) or call close() explicitly when finished, so code stays portable across backends that do hold resources.
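That contract can be illustrated with a minimal, hypothetical stand-in (ProgramCacheBase and LoggingCache below are not part of cuda.core); it shows why the with-form stays portable even when close() is a no-op:

```python
class ProgramCacheBase:
    """Hypothetical sketch of the backend contract: close() does nothing
    by default, and the context-manager form always calls it on exit."""

    def close(self) -> None:
        pass  # backends holding real resources override this

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False  # never swallow exceptions


class LoggingCache(ProgramCacheBase):
    """Stand-in for a backend with long-lived state."""

    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True  # stands in for releasing file handles etc.
```

Code written as `with cache: ...` then works unchanged whether the backend's close() is a no-op or releases real handles.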
- update(
- items: Mapping[bytes | str, bytes | bytearray | memoryview | ObjectCode] | Iterable[tuple[bytes | str, bytes | bytearray | memoryview | ObjectCode]],
- /,
- )#
Bulk __setitem__.
Accepts a mapping or an iterable of (key, value) pairs. Each write goes through __setitem__, so backend-specific value coercion (e.g. extracting bytes from an ObjectCode) and size-cap enforcement run on every entry. Not transactional: a failure mid-iteration leaves earlier writes committed.
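A minimal in-memory sketch of that dispatch (DictBackedCache is illustrative, not the real file-backed class) shows both accepted input shapes routed through __setitem__:

```python
from collections.abc import Mapping


class DictBackedCache:
    """Illustrative in-memory stand-in; the real backend writes files."""

    def __init__(self):
        self._store = {}

    def __setitem__(self, key, value):
        # Per-entry coercion and size-cap enforcement would run here.
        k = key.encode() if isinstance(key, str) else bytes(key)
        self._store[k] = bytes(value)

    def update(self, items, /):
        pairs = items.items() if isinstance(items, Mapping) else items
        # Not transactional: a failure here leaves earlier writes committed.
        for key, value in pairs:
            self[key] = value
```

Routing every pair through __setitem__ (rather than writing to the store directly) is what keeps coercion and cap enforcement uniform between single writes and bulk updates.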