On-disk Kernel Caching
When the cache keyword argument of the @cuda.jit decorator is True, a file-based cache is enabled. This shortens compilation times when the function was already compiled in a previous invocation.

The cache is maintained in the __pycache__ subdirectory of the directory containing the source file; if the current user is not allowed to write to it, the cache implementation falls back to a platform-specific user-wide cache directory (such as $HOME/.cache/numba on Unix platforms).
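For illustration, a minimal sketch of a kernel compiled with caching enabled (the kernel body, a hypothetical element-wise increment, is illustrative only):

    from numba import cuda
    import numpy as np

    # cache=True stores the compiled kernel on disk, so later processes
    # that compile the same function and signature can skip compilation.
    @cuda.jit(cache=True)
    def increment_by_one(an_array):
        pos = cuda.grid(1)
        if pos < an_array.size:
            an_array[pos] += 1

    data = cuda.to_device(np.zeros(256, dtype=np.float32))
    # The first run of this script compiles the kernel and populates the
    # cache; subsequent runs load the compiled code from the cache instead.
    increment_by_one[4, 64](data)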
Compute capability considerations
Separate cache files are maintained for each compute capability. When a cached kernel is loaded, the version to load is determined by the compute capability of the device on which the kernel is first launched in the current run. Therefore, on systems with multiple GPUs of differing compute capabilities, the cached version of a kernel is used only for one compute capability, and launches on devices of other compute capabilities will trigger recompilation.
For example, if a system has two GPUs, one of compute capability 7.5 and one of compute capability 8.0, then:

- If a cached kernel is first launched on the CC 7.5 device, the cached version for CC 7.5 is used. If it is subsequently launched on the CC 8.0 device, a recompilation will occur.
- If in a subsequent run the cached kernel is first launched on the CC 8.0 device, the cached version for CC 8.0 is used. A subsequent launch on the CC 7.5 device will require a recompilation.
This limitation is not expected to present issues in most practical scenarios, as multi-GPU production systems tend to have identical GPUs within each node.
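Which cached variant applies on a given machine can be anticipated by inspecting each device's compute capability; a minimal sketch using Numba's device list:

    from numba import cuda

    # Iterate over the available GPUs; activating each device's context
    # lets us query the compute capability that determines which cached
    # kernel variant would be loaded on a first launch there.
    for gpu in cuda.gpus:
        with gpu:
            dev = cuda.get_current_device()
            print(f"Device {dev.id}: compute capability {dev.compute_capability}")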