Global Variables and Captured Values

Numba CUDA kernels and device functions can reference global variables defined at module scope. This section describes how these values are captured and the implications for your code.

Capture as constants

By default, global variables referenced in kernels are captured as constants at compilation time. This applies to scalars and host arrays (e.g. NumPy arrays).

The following example demonstrates this behavior. Both TAX_RATE and PRICES are captured when the kernel is first compiled. Because they are embedded as constants, modifications to these variables after compilation have no effect; the second kernel call still uses the original values:

Demonstrating constant capture of global variables
import numpy as np
from numba import cuda

TAX_RATE = 0.08
PRICES = np.array([10.0, 25.0, 5.0, 15.0, 30.0], dtype=np.float64)

@cuda.jit
def compute_totals(quantities, totals):
    i = cuda.grid(1)
    if i < totals.size:
        totals[i] = quantities[i] * PRICES[i] * (1 + TAX_RATE)

d_quantities = cuda.to_device(
    np.array([1, 2, 3, 4, 5], dtype=np.float64)
)
d_totals = cuda.device_array(5, dtype=np.float64)

# First kernel call - compiles and captures values
compute_totals[1, 32](d_quantities, d_totals)
print("Value of d_totals:", d_totals.copy_to_host())

# These modifications have no effect on subsequent kernel calls
TAX_RATE = 0.10  # noqa: F841
PRICES[:] = [20.0, 50.0, 10.0, 30.0, 60.0]

# Second kernel call still uses the original values
compute_totals[1, 32](d_quantities, d_totals)
print("Value of d_totals:", d_totals.copy_to_host())

Running the above code prints:

Value of d_totals: [ 10.8  54.   16.2  64.8 162. ]
Value of d_totals: [ 10.8  54.   16.2  64.8 162. ]

Note that both outputs are identical: the modifications to TAX_RATE and PRICES after the first kernel call have no effect.

This behavior is useful for small amounts of truly constant data such as configuration values, lookup tables, or mathematical constants. For larger arrays, or for data that must change at runtime, consider using device arrays instead.

Device array capture

Device arrays are an exception to the constant capture rule. When a kernel references a global device array—any object implementing __cuda_array_interface__, such as CuPy arrays or Numba device arrays—the device pointer is captured rather than the data. No copy occurs, and modifications to the array are visible to subsequent kernel calls.

The following example demonstrates this behavior. The global PRICES device array is mutated after the first kernel call, and the second kernel call sees the updated values:

Demonstrating device array capture by pointer
import numpy as np
from numba import cuda

# Global device array - pointer is captured, not data
PRICES = cuda.to_device(
    np.array([10.0, 25.0, 5.0, 15.0, 30.0], dtype=np.float32)
)

@cuda.jit
def compute_totals(quantities, totals):
    i = cuda.grid(1)
    if i < totals.size:
        totals[i] = quantities[i] * PRICES[i]

d_quantities = cuda.to_device(
    np.array([1.0, 1.0, 1.0, 1.0, 1.0], dtype=np.float32)
)
d_totals = cuda.device_array(5, dtype=np.float32)

# First kernel call
compute_totals[1, 32](d_quantities, d_totals)
print(d_totals.copy_to_host())  # [10. 25.  5. 15. 30.]

# Mutate the device array in-place
PRICES.copy_to_device(
    np.array([20.0, 50.0, 10.0, 30.0, 60.0], dtype=np.float32)
)

# Second kernel call sees the updated values
compute_totals[1, 32](d_quantities, d_totals)
print(d_totals.copy_to_host())  # [20. 50. 10. 30. 60.]

Running the above code prints:

[10. 25.  5. 15. 30.]
[20. 50. 10. 30. 60.]

Note that the outputs differ: the mutation of PRICES after the first kernel call is visible to the second call, unlike with host arrays.

This makes device arrays suitable for global state that needs to be updated between kernel calls without recompilation.

Note

Kernels and device functions that capture global device arrays cannot use cache=True. Because the device pointer is embedded in the compiled code, caching would serialize a raw pointer that is invalid when the cached kernel is loaded in a later process. Attempting to cache such a kernel will raise a PicklingError. See On-disk Kernel Caching for more information on kernel caching.