Memory Management
- numba.cuda.to_device(obj, stream=0, copy=True, to=None)
Allocate and transfer a numpy ndarray or structured scalar to the device.
To copy a numpy array from host to device:
    ary = np.arange(10)
    d_ary = cuda.to_device(ary)
To enqueue the transfer to a stream:
    stream = cuda.stream()
    d_ary = cuda.to_device(ary, stream=stream)
The resulting d_ary is a DeviceNDArray.
To copy device->host:
hary = d_ary.copy_to_host()
To copy device->host to an existing array:
    ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
    d_ary.copy_to_host(ary)
To enqueue the transfer to a stream:
hary = d_ary.copy_to_host(stream=stream)
- numba.cuda.device_array(shape, dtype=np.float64, strides=None, order='C', stream=0)
Allocate an empty device ndarray. Similar to numpy.empty().
- numba.cuda.device_array_like(ary, stream=0)
Call device_array() with information from the array.
- numba.cuda.pinned_array(shape, dtype=np.float64, strides=None, order='C')
Allocate an ndarray with a buffer that is pinned (pagelocked). Similar to np.empty().
- numba.cuda.pinned_array_like(ary)
Call pinned_array() with the information from the array.
- numba.cuda.mapped_array(shape, dtype=np.float64, strides=None, order='C', stream=0, portable=False, wc=False)
Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to np.empty().
- Parameters:
portable – a boolean flag to allow the allocated device memory to be usable on multiple devices.
wc – a boolean flag to enable write-combined allocation, which is faster for the host to write and for the device to read, but slower for the host to read and slower for the device to write.
- numba.cuda.mapped_array_like(ary, stream=0, portable=False, wc=False)
Call mapped_array() with the information from the array.
- numba.cuda.managed_array(shape, dtype=np.float64, strides=None, order='C', stream=0, attach_global=True)
Allocate an np.ndarray with a buffer that is managed. Similar to np.empty().
Managed memory is supported on Linux / x86 and PowerPC, and is considered experimental on Windows and Linux / AArch64.
- Parameters:
attach_global – A flag indicating whether to attach globally. Global attachment implies that the memory is accessible from any stream on any device. If False, attachment is host, and memory is only accessible by devices with Compute Capability 6.0 and later.
- numba.cuda.pinned(*arylist)
A context manager for temporarily pinning a sequence of host ndarrays.
- numba.cuda.mapped(*arylist, **kws)
A context manager for temporarily mapping a sequence of host ndarrays.
Device Objects
- class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, gpu_data=None)
An on-GPU array type
- copy_to_device(ary, stream=0)
Copy ary to self.
If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
- copy_to_host(ary=None, stream=0)
Copy self to ary, or create a new Numpy ndarray if ary is None.
If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
Example:
    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    d_arr = cuda.to_device(arr)

    # my_kernel is assumed to be a kernel already compiled with @cuda.jit
    my_kernel[100, 100](d_arr)

    result_array = d_arr.copy_to_host()
- is_c_contiguous()
Return true if the array is C-contiguous.
- is_f_contiguous()
Return true if the array is Fortran-contiguous.
- ravel(order='C', stream=0)
Flattens a contiguous array without changing its contents, similar to numpy.ndarray.ravel(). If the array is not contiguous, raises an exception.
- reshape(*newshape, **kws)
Reshape the array without changing its contents, similarly to numpy.ndarray.reshape().
Example:
    d_arr = d_arr.reshape(20, 50, order='F')
- split(section, stream=0)
Split the array into equal partitions of the given section size. If the array cannot be divided equally, the last section will be smaller.
- class numba.cuda.cudadrv.devicearray.DeviceRecord(dtype, stream=0, gpu_data=None)
An on-GPU record type
- copy_to_device(ary, stream=0)
Copy ary to self.
If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
- copy_to_host(ary=None, stream=0)
Copy self to ary, or create a new Numpy ndarray if ary is None.
If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
Example:
    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    d_arr = cuda.to_device(arr)

    # my_kernel is assumed to be a kernel already compiled with @cuda.jit
    my_kernel[100, 100](d_arr)

    result_array = d_arr.copy_to_host()
- class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, gpu_data=None)
A host array that uses CUDA mapped memory.
- copy_to_device(ary, stream=0)
Copy ary to self.
If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
- copy_to_host(ary=None, stream=0)
Copy self to ary, or create a new Numpy ndarray if ary is None.
If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
Example:
    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    d_arr = cuda.to_device(arr)

    # my_kernel is assumed to be a kernel already compiled with @cuda.jit
    my_kernel[100, 100](d_arr)

    result_array = d_arr.copy_to_host()
- split(section, stream=0)
Split the array into equal partitions of the given section size. If the array cannot be divided equally, the last section will be smaller.