cuda.core.utils.FileStreamProgramCache#

class cuda.core.utils.FileStreamProgramCache(
path: str | PathLike | None = None,
*,
max_size_bytes: int | None = None,
)#

Persistent program cache backed by a directory of atomically replaced files.

Designed for multi-process use: each write stages a temporary file and then moves it into place with os.replace(), so concurrent readers never observe a partially written entry. Each entry on disk is the raw compiled binary (cubin, PTX, or LTO-IR) with no header, framing, or pickle wrapper, so the files are directly consumable by external NVIDIA tools (cuobjdump, nvdisasm, cuda-gdb).
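A minimal sketch of that staged-write pattern, with an invented helper name (_atomic_write); the backend's actual internals may differ:

    import os
    import tempfile

    def _atomic_write(path: str, data: bytes) -> None:
        # Stage in the same directory so os.replace() is a same-filesystem
        # rename, which is atomic for concurrent readers.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # entry contents hit disk before publish
            os.replace(tmp_path, path)  # readers see the old file or the new one
        except BaseException:
            try:
                os.unlink(tmp_path)  # best effort: remove the orphaned staging file
            except OSError:
                pass
            raise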

Eviction is by least-recently-read time: every successful read bumps the entry’s atime, and the size enforcer removes entries with the oldest atime first.
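A sketch of what oldest-atime-first enforcement can look like; the scan below is illustrative, not this backend's actual code:

    import os

    def _enforce_size_cap(cache_dir: str, max_size_bytes: int) -> None:
        # Collect (atime, size, path) for every entry.
        entries = []
        for name in os.listdir(cache_dir):
            path = os.path.join(cache_dir, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # a concurrent writer or evictor removed it; skip
            entries.append((st.st_atime, st.st_size, path))
        entries.sort()  # oldest atime first, i.e. least recently read

        total = sum(size for _, size, _ in entries)
        for _, size, path in entries:
            if total <= max_size_bytes:
                break
            try:
                os.unlink(path)
                total -= size
            except OSError:
                pass  # best effort under concurrency

Since many filesystems are mounted with relatime or noatime, a read path that orders by atime presumably bumps the timestamp explicitly (e.g. via os.utime) rather than relying on the kernel to do so.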

Note

Best-effort writes.

On Windows, os.replace raises PermissionError (winerror 32 or 33) when another process holds the target file open. This backend retries with bounded backoff (about 185 ms in total) and, if the replace still fails, silently drops the cache write and returns as if it had succeeded. The next call will see no entry and recompile. On POSIX, and for other winerror codes, the PermissionError propagates.
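A sketch of such a retry loop; the exact backoff schedule is not documented, so the delays below (summing to roughly 185 ms) are a guess:

    import os
    import sys
    import time

    _SHARING_VIOLATIONS = (32, 33)  # winerror: sharing violation / lock violation

    def _replace_with_retry(tmp_path: str, dst_path: str) -> bool:
        delays = (0.005, 0.010, 0.020, 0.050, 0.100)  # illustrative schedule
        for delay in delays:
            try:
                os.replace(tmp_path, dst_path)
                return True
            except PermissionError as exc:
                on_windows = sys.platform == "win32"
                if not (on_windows and getattr(exc, "winerror", None) in _SHARING_VIOLATIONS):
                    raise  # POSIX, or a different winerror: propagate
                time.sleep(delay)
        # Still blocked: drop the write silently; the next lookup will
        # miss and recompile.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        return False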

Note

Atomic for readers, not crash-durable.

Each entry’s temp file is fsync-ed before os.replace, but the containing directory is not fsync-ed. A host crash between write and the next directory commit may lose recently added entries; surviving entries remain consistent.
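Callers that want stronger durability than the backend provides can commit the directory themselves after writing; a POSIX-only sketch (the cache does not do this itself):

    import os

    def fsync_directory(cache_dir: str) -> None:
        # POSIX only: fsync-ing the directory commits recent renames
        # (os.replace) so they survive a host crash. Windows has no
        # direct equivalent for directories.
        dir_fd = os.open(cache_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)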

Note

Cross-version sharing.

The cache is safe to share across cuda.core patch releases: every key produced by make_program_cache_key() encodes the relevant backend/compiler/runtime fingerprints for its compilation path:

  • NVRTC entries pin the NVRTC version.

  • NVVM entries pin the libNVVM library and IR versions.

  • PTX/linker entries pin the chosen linker backend and its version; when the cuLink/driver backend is selected, they also pin the driver version. nvJitLink-backed PTX entries are deliberately driver-version independent.

Bumping _KEY_SCHEMA_VERSION (mixed into the digest by make_program_cache_key) produces new keys that do not collide with old entries: post-bump lookups miss the old on-disk paths, and the orphaned files are reaped on the next size-cap eviction pass. Entries are stored verbatim as the compiled binary, so cross-patch sharing only requires that the compiler-pinning surface above stays stable; there is no Python-pickle compatibility involved.
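The digest below is only a sketch of the schema-versioning idea; the field names and hash choice are invented, not make_program_cache_key()'s real layout:

    import hashlib

    _KEY_SCHEMA_VERSION = 1  # bumping this invalidates every existing key

    def _sketch_cache_key(source: bytes, options: tuple[str, ...], fingerprint: str) -> str:
        # `fingerprint` stands in for whatever the real key encodes
        # (NVRTC / libNVVM / linker versions, and so on).
        h = hashlib.sha256()
        h.update(str(_KEY_SCHEMA_VERSION).encode())  # schema version mixed into the digest
        h.update(fingerprint.encode())
        for opt in options:
            h.update(opt.encode())
        h.update(source)
        return h.hexdigest()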

Parameters:
  • path – Directory that owns the cache. Created if missing. If omitted, the OS-conventional user cache directory is used: $XDG_CACHE_HOME/cuda-python/program-cache (Linux, defaulting to ~/.cache/cuda-python/program-cache) or %LOCALAPPDATA%\cuda-python\program-cache (Windows).

  • max_size_bytes – Optional soft cap on total on-disk size. Enforced opportunistically on writes; concurrent writers may briefly exceed it. Eviction is by least-recently-read time (oldest st_atime first).
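For example (the directory and the 512 MiB cap below are arbitrary choices):

    from cuda.core.utils import FileStreamProgramCache

    # Default location under the user cache directory, no size cap:
    cache = FileStreamProgramCache()

    # Explicit directory with a 512 MiB soft cap:
    capped = FileStreamProgramCache(
        "/var/tmp/my-program-cache", max_size_bytes=512 * 2**20
    )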

Methods

__init__(
path: str | PathLike | None = None,
*,
max_size_bytes: int | None = None,
) → None#
clear() → None#

Remove every entry from the cache.

close() → None#

Release backend resources.

The default implementation does nothing. Subclasses that hold long-lived state (open file handles, database connections, network sockets, …) should override this to release them.

Callers should use the context-manager form (with cache:) or call close() explicitly when finished, so code stays portable across backends that do hold resources.
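For example:

    from cuda.core.utils import FileStreamProgramCache

    with FileStreamProgramCache() as cache:
        cache[b"some-key"] = b"..."  # placeholder key and payload
    # close() ran on exit; it is a no-op for this backend, but keeps
    # the code portable to backends that do hold resources.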

get(
key: bytes | str,
default: bytes | None = None,
) → bytes | None#

Return self[key] or default if absent.
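A typical miss-and-fill pattern; compile_to_cubin is a placeholder for whatever produces the binary:

    def lookup_or_compile(cache, key: bytes, compile_to_cubin) -> bytes:
        binary = cache.get(key)
        if binary is None:
            binary = compile_to_cubin()
            cache[key] = binary  # best-effort write; see the notes above
        return binary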

update(
items: Mapping[bytes | str, bytes | bytearray | memoryview | ObjectCode] | Iterable[tuple[bytes | str, bytes | bytearray | memoryview | ObjectCode]],
/,
) → None#

Bulk __setitem__.

Accepts a mapping or an iterable of (key, value) pairs. Each write goes through __setitem__, so backend-specific value coercion (e.g. extracting bytes from an ObjectCode) and size-cap enforcement run on every entry. Not transactional: a failure mid-iteration leaves earlier writes committed.
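For example (keys and payloads are placeholders):

    from cuda.core.utils import FileStreamProgramCache

    with FileStreamProgramCache() as cache:
        # A mapping...
        cache.update({b"key-a": b"...", b"key-b": b"..."})
        # ...or an iterable of (key, value) pairs:
        cache.update([(b"key-c", b"..."), (b"key-d", bytearray(b"..."))])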