Global Variables and Captured Values#
Numba CUDA kernels and device functions can reference global variables defined at module scope. This section describes how these values are captured and the implications for your code.
Capture as constants#
By default, global variables referenced in kernels are captured as constants at compilation time. This applies to scalars and host arrays (e.g. NumPy arrays).
The following example demonstrates this behavior. Both TAX_RATE and
PRICES are captured when the kernel is first compiled. Because they are
embedded as constants, modifications to these variables after compilation
have no effect—the second kernel call still uses the original values:
import numpy as np
from numba import cuda

TAX_RATE = 0.08
PRICES = np.array([10.0, 25.0, 5.0, 15.0, 30.0], dtype=np.float64)

@cuda.jit
def compute_totals(quantities, totals):
    i = cuda.grid(1)
    if i < totals.size:
        totals[i] = quantities[i] * PRICES[i] * (1 + TAX_RATE)

d_quantities = cuda.to_device(
    np.array([1, 2, 3, 4, 5], dtype=np.float64)
)
d_totals = cuda.device_array(5, dtype=np.float64)

# First kernel call - compiles and captures values
compute_totals[1, 32](d_quantities, d_totals)
print("Value of d_totals:", d_totals.copy_to_host())

# These modifications have no effect on subsequent kernel calls
TAX_RATE = 0.10  # noqa: F841
PRICES[:] = [20.0, 50.0, 10.0, 30.0, 60.0]

# Second kernel call still uses the original values
compute_totals[1, 32](d_quantities, d_totals)
print("Value of d_totals:", d_totals.copy_to_host())
Running the above code prints:
Value of d_totals: [ 10.8 54. 16.2 64.8 162. ]
Value of d_totals: [ 10.8 54. 16.2 64.8 162. ]
Note that both outputs are identical—the modifications to TAX_RATE and
PRICES after the first kernel call have no effect.
This behavior is useful for small amounts of truly constant data such as configuration values, lookup tables, or mathematical constants. For larger arrays, consider using device arrays instead.
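Because capture happens when a kernel is compiled, one way to pick up a changed value is to redefine the kernel so that a fresh dispatcher compiles against the current globals. The following is a minimal sketch of this pattern, not part of the original example; the names SCALE and scale_all are illustrative only:

import math

import numpy as np
from numba import cuda

SCALE = math.pi  # captured as a constant when the kernel compiles

@cuda.jit
def scale_all(values, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = values[i] * SCALE

d_values = cuda.to_device(np.ones(4, dtype=np.float64))
d_out = cuda.device_array(4, dtype=np.float64)

scale_all[1, 32](d_values, d_out)  # compiles and captures SCALE = pi
print(d_out.copy_to_host())

SCALE = 2.0  # has no effect on the already-compiled kernel

# Redefining the kernel creates a new dispatcher; its first call compiles
# afresh and captures the current value of SCALE.
@cuda.jit
def scale_all(values, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = values[i] * SCALE

scale_all[1, 32](d_values, d_out)
print(d_out.copy_to_host())  # now reflects SCALE = 2.0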
Device array capture#
Device arrays are an exception to the constant capture rule. When a kernel
references a global device array—any object implementing
__cuda_array_interface__, such as CuPy arrays or Numba device arrays—the
device pointer is captured rather than the data. No copy occurs, and
modifications to the array are visible to subsequent kernel calls.
The following example demonstrates this behavior. The global PRICES device
array is mutated after the first kernel call, and the second kernel call sees
the updated values:
import numpy as np
from numba import cuda

# Global device array - pointer is captured, not data
PRICES = cuda.to_device(
    np.array([10.0, 25.0, 5.0, 15.0, 30.0], dtype=np.float32)
)

@cuda.jit
def compute_totals(quantities, totals):
    i = cuda.grid(1)
    if i < totals.size:
        totals[i] = quantities[i] * PRICES[i]

d_quantities = cuda.to_device(
    np.array([1.0, 1.0, 1.0, 1.0, 1.0], dtype=np.float32)
)
d_totals = cuda.device_array(5, dtype=np.float32)

# First kernel call
compute_totals[1, 32](d_quantities, d_totals)
print(d_totals.copy_to_host())  # [10. 25. 5. 15. 30.]

# Mutate the device array in-place
PRICES.copy_to_device(
    np.array([20.0, 50.0, 10.0, 30.0, 60.0], dtype=np.float32)
)

# Second kernel call sees the updated values
compute_totals[1, 32](d_quantities, d_totals)
print(d_totals.copy_to_host())  # [20. 50. 10. 30. 60.]
Running the above code prints:
[10. 25. 5. 15. 30.]
[20. 50. 10. 30. 60.]
Note that the outputs are different—the mutation to PRICES after the first
kernel call is visible to the second call, unlike with host arrays.
This makes device arrays suitable for global state that needs to be updated between kernel calls without recompilation.
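For example, a kernel can capture a global device array of parameters that is refreshed before each launch. The following is a minimal sketch of that pattern, not part of the original example; WEIGHTS and apply_weights are illustrative names:

import numpy as np
from numba import cuda

# Global device array holding shared state; the kernel captures its pointer,
# so in-place updates between launches are visible without recompilation.
WEIGHTS = cuda.to_device(np.zeros(4, dtype=np.float32))

@cuda.jit
def apply_weights(x, out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] * WEIGHTS[i]

d_x = cuda.to_device(np.arange(1, 5, dtype=np.float32))
d_out = cuda.device_array(4, dtype=np.float32)

for step in range(3):
    # Refresh the shared state in place, then relaunch the same compiled kernel
    WEIGHTS.copy_to_device(np.full(4, float(step), dtype=np.float32))
    apply_weights[1, 32](d_x, d_out)
    print(step, d_out.copy_to_host())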
Note
Kernels and device functions that capture global device arrays cannot use
cache=True. Because the device pointer is embedded in the compiled code,
caching would serialize an invalid pointer. Attempting to cache such a kernel
will raise a PicklingError. See On-disk Kernel Caching for more information on
kernel caching.
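The following is a minimal sketch of the failure mode described above, assuming the error surfaces when the kernel is first compiled and Numba attempts to write the cache entry; cached_totals and the small PRICES array are illustrative only:

import pickle

import numpy as np
from numba import cuda

PRICES = cuda.to_device(np.array([10.0, 25.0], dtype=np.float32))

@cuda.jit(cache=True)  # caching requested for a kernel capturing a device array
def cached_totals(quantities, totals):
    i = cuda.grid(1)
    if i < totals.size:
        totals[i] = quantities[i] * PRICES[i]

d_q = cuda.to_device(np.ones(2, dtype=np.float32))
d_t = cuda.device_array(2, dtype=np.float32)

try:
    cached_totals[1, 32](d_q, d_t)  # compilation tries to write the cache entry
except pickle.PicklingError as exc:
    print("Caching failed:", exc)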