External Memory Management (EMM) Plugin interface
The CUDA Array Interface enables sharing of data between different Python libraries that access CUDA devices. However, each library manages its own memory distinctly from the others. For example:
By default, Numba allocates memory on CUDA devices by interacting with the CUDA driver API to call functions such as cuMemAlloc and cuMemFree, which is suitable for many use cases.
The RAPIDS libraries (cuDF, cuML, etc.) use the RAPIDS Memory Manager (RMM) for allocating device memory.
CuPy includes a memory pool implementation for both device and pinned memory.
When multiple CUDA-aware libraries are used together, it may be preferable for Numba to defer to another library for memory management. The EMM Plugin interface facilitates this, by enabling Numba to use another CUDA-aware library for all allocations and deallocations.
An EMM Plugin can be part of the external library whose memory management it exposes, or it can be implemented as a separate library.
Overview of External Memory Management
When an EMM Plugin is in use (see Setting the EMM Plugin), Numba will make memory allocations and deallocations through the Plugin. It will never directly call functions such as cuMemAlloc, cuMemFree, etc.
EMM Plugins always take responsibility for the management of device memory. However, not all CUDA-aware libraries also support managing host memory, so a facility for Numba to continue the management of host memory whilst ceding control of device memory to the EMM is provided (see The Host-Only CUDA Memory Manager).
Effects on Deallocation Strategies
Numba’s internal Deallocation Behavior is designed to increase efficiency by deferring deallocations until a significant quantity are pending. It also provides a mechanism for preventing deallocations entirely during critical sections, using the defer_cleanup() context manager.
When an EMM Plugin is in use, the deallocation strategy is implemented by the EMM, and Numba’s internal deallocation mechanism is not used. The EMM Plugin could implement:
A similar strategy to the Numba deallocation behaviour, or
Something more appropriate to the plugin - for example, deallocated memory might immediately be returned to a memory pool.
The defer_cleanup context manager may behave differently with an EMM Plugin - an EMM Plugin should be accompanied by documentation of the behaviour of the defer_cleanup context manager when it is in use. For example, a pool allocator could always immediately return memory to a pool even when the context manager is in use, but could choose not to free empty pools until defer_cleanup is not in use.
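The pool allocator behaviour described above can be sketched in plain Python. This is only a toy model under stated assumptions: ToyPool and all of its attributes are hypothetical names, it tracks sizes rather than device memory, and it makes no Numba or CUDA calls. It shows blocks going back to the pool immediately while the expensive release of the empty pool waits until defer_cleanup is no longer active.

```python
from contextlib import contextmanager


class ToyPool:
    """Hypothetical stand-in for a pool allocator; it tracks sizes only."""

    def __init__(self):
        self.free_blocks = []       # blocks returned to the pool for reuse
        self.pool_released = False  # True once the empty pool has been freed
        self._defer_depth = 0       # greater than 0 while defer_cleanup() is active
        self._live = 0              # number of outstanding allocations

    def allocate(self, size):
        self._live += 1
        self.pool_released = False
        return size

    def deallocate(self, block):
        # Returning a block to the pool is cheap, so it always happens at once.
        self.free_blocks.append(block)
        self._live -= 1
        self._maybe_release_pool()

    def _maybe_release_pool(self):
        # Freeing the empty pool is the expensive step, so it is skipped
        # while cleanup is deferred.
        if self._live == 0 and self._defer_depth == 0:
            self.free_blocks.clear()
            self.pool_released = True

    @contextmanager
    def defer_cleanup(self):
        self._defer_depth += 1
        try:
            yield
        finally:
            self._defer_depth -= 1
            self._maybe_release_pool()


pool = ToyPool()
block = pool.allocate(1024)
with pool.defer_cleanup():
    pool.deallocate(block)                 # block goes back to the pool
    released_inside = pool.pool_released   # the empty pool is kept for now
released_after = pool.pool_released        # the pool is released on exit
```

A real plugin would document which of its operations count as expensive cleanup, since that determines what its defer_cleanup actually postpones.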
Management of other objects
In addition to memory, Numba manages the allocation and deallocation of events, streams, and modules (a module is a compiled object, which is generated from @cuda.jit-ted functions). The management of events, streams, and modules is unchanged by the use of an EMM Plugin.
Asynchronous allocation and deallocation
The present EMM Plugin interface does not provide support for asynchronous allocation and deallocation. This may be added to a future version of the interface.
Implementing an EMM Plugin
An EMM Plugin is implemented by deriving from BaseCUDAMemoryManager. A summary of considerations for the implementation follows:
Numba instantiates one instance of the EMM Plugin class per context. The context that owns an EMM Plugin object is accessible through self.context, if required.
The EMM Plugin is transparent to any code that uses Numba - all its methods are invoked by Numba, and never need to be called by code that uses Numba.
The allocation methods memalloc, memhostalloc, and mempin should use the underlying library to allocate and/or pin device or host memory, and construct an instance of a memory pointer representing the memory to return back to Numba. These methods are always called when the current CUDA context is the context that owns the EMM Plugin instance.
The initialize method is called by Numba prior to the first use of the EMM Plugin object for a context. This method should do anything required to prepare the underlying library for allocations in the current context. This method may be called multiple times, and must not invalidate previous state when it is called.
The reset method is called when all allocations in the context are to be cleaned up. It may be called even prior to initialize, and an EMM Plugin implementation needs to guard against this.
To support inter-GPU communication, the get_ipc_handle method should provide an IpcHandle for a given MemoryPointer instance. This method is part of the EMM interface (rather than being handled within Numba) because the base address of the allocation is only known by the underlying library. Closing an IPC handle is handled internally within Numba.
It is optional to provide memory info from the get_memory_info method, which provides a count of the total and free memory on the device for the context. It is preferable to implement the method, but this may not be practical for all allocators. If memory info is not provided, this method should raise a RuntimeError.
The defer_cleanup method should return a context manager that ensures that expensive cleanup operations are avoided whilst it is active. The nuances of this will vary between plugins, so the plugin documentation should include an explanation of how deferring cleanup affects deallocations, and performance in general.
The interface_version property is used to ensure that the plugin version matches the interface provided by the version of Numba. At present, this should always be 1.
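Two of the considerations above are easy to get wrong: initialize may be called more than once, and reset may be called before initialize. The following plain-Python sketch shows one way to guard against both. SketchPluginState and its record method are hypothetical names used only for illustration; the class does not derive from BaseCUDAMemoryManager and makes no CUDA calls.

```python
class SketchPluginState:
    """Hypothetical stand-in for an EMM Plugin's bookkeeping; no CUDA calls."""

    def __init__(self):
        self.allocations = None  # created by initialize()

    def initialize(self):
        # This may be called more than once, so existing state is kept.
        if self.allocations is None:
            self.allocations = {}

    def record(self, address, size):
        # Illustrative helper: remember an allocation made by the library.
        self.allocations[address] = size

    def reset(self):
        # reset() can arrive before initialize(); tolerate missing state.
        if self.allocations is not None:
            self.allocations.clear()

    @property
    def interface_version(self):
        return 1


state = SketchPluginState()
state.reset()                   # safe even though initialize() has not run
state.initialize()
state.record(0x1000, 256)
state.initialize()              # second call keeps the recorded allocation
kept = dict(state.allocations)
state.reset()                   # all tracked allocations are dropped
```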
Full documentation for the base class follows:
- class numba.cuda.BaseCUDAMemoryManager(*args, **kwargs)
Abstract base class for External Memory Management (EMM) Plugins.
- abstract memalloc(size)
Allocate on-device memory in the current context.
- Parameters:
size (int) – Size of allocation in bytes
- Returns:
A memory pointer instance that owns the allocated memory
- Return type:
MemoryPointer
- abstract memhostalloc(size, mapped, portable, wc)
Allocate pinned host memory.
- Parameters:
size (int) – Size of the allocation in bytes
mapped (bool) – Whether the allocated memory should be mapped into the CUDA address space.
portable (bool) – Whether the memory will be considered pinned by all contexts, and not just the calling context.
wc (bool) – Whether to allocate the memory as write-combined.
- Returns:
A memory pointer instance that owns the allocated memory. The return type depends on whether the region was mapped into device memory.
- Return type:
MappedMemory or PinnedMemory
- abstract mempin(owner, pointer, size, mapped)
Pin a region of host memory that is already allocated.
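- Parameters:
owner – The object that owns the memory.
pointer – The address of the start of the region to pin.
size (int) – The size of the region in bytes.
mapped (bool) – Whether the region should also be mapped into device memory.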
- Returns:
A memory pointer instance that refers to the allocated memory.
- Return type:
MappedMemory or PinnedMemory
- abstract initialize()
Perform any initialization required for the EMM plugin instance to be ready to use.
- Returns:
None
- abstract get_ipc_handle(memory)
Return an IPC handle from a GPU allocation.
- Parameters:
memory (MemoryPointer) – Memory for which the IPC handle should be created.
- Returns:
IPC handle for the allocation
- Return type:
IpcHandle
- abstract get_memory_info()
Returns (free, total) memory in bytes in the context. May raise NotImplementedError, if returning such information is not practical (e.g. for a pool allocator).
- Returns:
Memory info
- Return type:
MemoryInfo
- abstract reset()
Clears up all memory allocated in this context.
- Returns:
None
- abstract defer_cleanup()
Returns a context manager that ensures the implementation of deferred cleanup whilst it is active.
- Returns:
Context manager
- abstract property interface_version
Returns an integer specifying the version of the EMM Plugin interface supported by the plugin implementation. Should always return 1 for implementations of this version of the specification.
The Host-Only CUDA Memory Manager
Some external memory managers will support management of on-device memory but not host memory. For implementing EMM Plugins using one of these memory managers, a partial implementation of a plugin that implements host-side allocation and pinning is provided. To use it, derive from HostOnlyCUDAMemoryManager instead of BaseCUDAMemoryManager. Guidelines for using this class are:
The host-only memory manager implements memhostalloc and mempin - the EMM Plugin should still implement memalloc.
If reset is overridden, it must also call super().reset() to allow the host allocations to be cleaned up.
If defer_cleanup is overridden, it must hold an active context manager from super().defer_cleanup() to ensure that host-side cleanup is also deferred.
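A minimal sketch of a plugin that follows these guidelines is given below. It is not a definitive implementation: build_emm_plugin, lib_alloc and lib_free are hypothetical names standing in for whatever the external library provides, the device address is assumed to be a plain integer, and the finalizer is assumed to be called with no arguments. GetIpcHandleMixin is mixed in on the assumption that the driver API can determine the base address of the library's allocations. Numba is imported inside the function so that the sketch can be loaded on a machine without Numba or a GPU.

```python
import ctypes
from contextlib import contextmanager


def build_emm_plugin(lib_alloc, lib_free):
    """Return an EMM Plugin class backed by two hypothetical callables.

    lib_alloc(size) returns a device address as an int;
    lib_free(address) releases it.
    """
    from numba import cuda  # imported lazily; needs Numba with CUDA support

    class SketchEMMPlugin(cuda.GetIpcHandleMixin,
                          cuda.HostOnlyCUDAMemoryManager):
        def initialize(self):
            # Nothing to prepare in this sketch.
            pass

        def memalloc(self, size):
            address = lib_alloc(size)
            # The finalizer tells the external library the buffer is unused.
            return cuda.MemoryPointer(
                self.context,
                ctypes.c_void_p(address),
                size,
                finalizer=lambda: lib_free(address),
            )

        def get_memory_info(self):
            # Memory info is optional; this sketch does not provide it.
            raise RuntimeError("memory info is not provided by this sketch")

        def reset(self):
            # Device-side cleanup for the library would go here, then the
            # base class cleans up the host allocations it tracks.
            super().reset()

        @contextmanager
        def defer_cleanup(self):
            # Hold the base-class context manager so that host-side cleanup
            # is deferred for as long as this one is active.
            with super().defer_cleanup():
                yield

        @property
        def interface_version(self):
            return 1

    return SketchEMMPlugin
```

The class returned by build_emm_plugin is what would be handed to Numba as the memory manager; Numba, not user code, creates the instances and calls the methods.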
Documentation for the methods of HostOnlyCUDAMemoryManager follows:
- class numba.cuda.HostOnlyCUDAMemoryManager(*args, **kwargs)
Base class for External Memory Management (EMM) Plugins that only implement on-device allocation. A subclass need not implement the memhostalloc and mempin methods.
This class also implements reset and defer_cleanup (see numba.cuda.BaseCUDAMemoryManager) for its own internal state management. If an EMM Plugin based on this class also implements these methods, then its implementations of these must also call the method from super() to give HostOnlyCUDAMemoryManager an opportunity to do the necessary work for the host allocations it is managing.
This class does not implement interface_version, as it will always be consistent with the version of Numba in which it is implemented. An EMM Plugin subclassing this class should implement interface_version instead.
- memhostalloc(size, mapped=False, portable=False, wc=False)
Implements the allocation of pinned host memory.
It is recommended that this method is not overridden by EMM Plugin implementations - a plugin that needs its own implementation should derive from BaseCUDAMemoryManager instead.
- mempin(owner, pointer, size, mapped=False)
Implements the pinning of host memory.
It is recommended that this method is not overridden by EMM Plugin implementations - a plugin that needs its own implementation should derive from BaseCUDAMemoryManager instead.
- reset()
Clears up all host memory (mapped and/or pinned) in the current context.
EMM Plugins that override this method must call
super().reset()
to ensure that host allocations are also cleaned up.
- defer_cleanup()
Returns a context manager that disables cleanup of mapped or pinned host memory in the current context whilst it is active.
EMM Plugins that override this method must obtain the context manager from this method before yielding to ensure that cleanup of host allocations is also deferred.
The IPC Handle Mixin
An implementation of the get_ipc_handle() function is provided in the GetIpcHandleMixin class. This uses the driver API to determine the base address of an allocation for opening an IPC handle. If this implementation is appropriate for an EMM plugin, it can be added by mixing in the GetIpcHandleMixin class:
- class numba.cuda.GetIpcHandleMixin
A class that provides a default implementation of get_ipc_handle().
- get_ipc_handle(memory)
Open an IPC memory handle by using cuMemGetAddressRange to determine the base pointer of the allocation. An IPC handle of type cu_ipc_mem_handle is constructed and initialized with cuIpcGetMemHandle. A numba.cuda.IpcHandle is returned, populated with the underlying ipc_mem_handle.
Classes and structures of returned objects
This section provides an overview of the classes and structures that need to be constructed by an EMM Plugin.
Memory Pointers
EMM Plugins should construct memory pointer instances that represent their allocations, for return to Numba. The appropriate memory pointer class to use in each method is:
MemoryPointer: returned from memalloc.
MappedMemory: returned from memhostalloc or mempin when the host memory is mapped into the device memory space.
PinnedMemory: returned from memhostalloc or mempin when the host memory is not mapped into the device memory space.
Memory pointers can take a finalizer, which is a function that is called when the buffer is no longer needed. Usually the finalizer will make a call to the memory management library (either internal to Numba, or external if allocated by an EMM Plugin) to inform it that the memory is no longer required, and that it could potentially be freed and/or unpinned. The memory manager may choose to defer actually cleaning up the memory to any later time after the finalizer runs - it is not required to free the buffer immediately.
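The relationship between a finalizer and deferred cleanup can be illustrated with a small plain-Python sketch. DeferredFreeList is a hypothetical stand-in for an external memory library, and the finalizer is assumed to be called with no arguments; no Numba objects are involved. The finalizer only reports that the buffer is no longer needed, and the library decides when the memory is actually released.

```python
class DeferredFreeList:
    """Hypothetical stand-in for an external memory library's free queue."""

    def __init__(self):
        self.pending = []  # addresses reported as no longer needed
        self.freed = []    # addresses actually released

    def make_finalizer(self, address):
        # The closure captures the address so that the finalizer itself
        # can be called without arguments.
        def finalizer():
            self.pending.append(address)
        return finalizer

    def flush(self):
        # The library decides when the real release happens.
        self.freed.extend(self.pending)
        self.pending.clear()


library = DeferredFreeList()
finalizer = library.make_finalizer(0x2000)
finalizer()                                   # buffer reported as unused
pending_before_flush = list(library.pending)  # not yet released
library.flush()                               # released at a later time
```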
Documentation for the memory pointer classes follows.
- class numba.cuda.MemoryPointer(context, pointer, size, owner=None, finalizer=None)
A memory pointer that owns a buffer, with an optional finalizer. Memory pointers provide reference counting, and instances are initialized with a reference count of 1.
The base MemoryPointer class does not use the reference count for managing the buffer lifetime. Instead, the buffer lifetime is tied to the memory pointer instance’s lifetime:
When the instance is deleted, the finalizer will be called.
When the reference count drops to 0, no action is taken.
Subclasses of MemoryPointer may modify these semantics, for example to tie the buffer lifetime to the reference count, so that the buffer is freed when there are no more references.
- Parameters:
context (Context) – The context in which the pointer was allocated.
pointer (ctypes.c_void_p) – The address of the buffer.
size (int) – The size of the allocation in bytes.
owner (NoneType) – The owner is sometimes set by the internals of this class, or used for Numba’s internal memory management. It should not be provided by an external user of the MemoryPointer class (e.g. from within an EMM Plugin); the default of None should always suffice.
finalizer (function) – A function that is called when the buffer is to be freed.
The AutoFreePointer class need not be used directly, but is documented here as it is subclassed by numba.cuda.MappedMemory:
- class numba.cuda.cudadrv.driver.AutoFreePointer(*args, **kwargs)
Modifies the ownership semantic of the MemoryPointer so that the instance lifetime is directly tied to the number of references.
When the reference count reaches zero, the finalizer is invoked.
Constructor arguments are the same as for
MemoryPointer
.
- class numba.cuda.MappedMemory(context, pointer, size, owner=None, finalizer=None)
A memory pointer that refers to a buffer on the host that is mapped into device memory.
- Parameters:
context (Context) – The context in which the pointer was mapped.
pointer (ctypes.c_void_p) – The address of the buffer.
size (int) – The size of the buffer in bytes.
owner (NoneType) – The owner is sometimes set by the internals of this class, or used for Numba’s internal memory management. It should not be provided by an external user of the MappedMemory class (e.g. from within an EMM Plugin); the default of None should always suffice.
finalizer (function) – A function that is called when the buffer is to be freed.
- class numba.cuda.PinnedMemory(context, pointer, size, owner=None, finalizer=None)
A pointer to a pinned buffer on the host.
- Parameters:
context (Context) – The context in which the pointer was mapped.
pointer (ctypes.c_void_p) – The address of the buffer.
size (int) – The size of the buffer in bytes.
owner – An object owning the buffer that has been pinned. For EMM plugin implementation, the default of None suffices for memory allocated in memhostalloc - for mempin, it should be the owner passed in to the mempin method.
finalizer (function) – A function that is called when the buffer is to be freed.
Memory Info
If an implementation of get_memory_info() is to provide a result, then it should return an instance of the MemoryInfo named tuple, which holds the (free, total) memory in bytes.
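As an illustration, a pool allocator that knows its own capacity and usage could derive the result as sketched below. The named tuple defined here is only a stand-in with the same (free, total) layout, so that the example runs without Numba; a real plugin would import MemoryInfo from numba.cuda instead. The function name and its parameters are hypothetical.

```python
from collections import namedtuple

# Stand-in with the (free, total) layout described above. A real plugin
# would use numba.cuda.MemoryInfo rather than defining its own type.
MemoryInfo = namedtuple("MemoryInfo", ["free", "total"])


def memory_info_from_pool(pool_capacity, bytes_in_use):
    """Build a (free, total) result from hypothetical pool statistics."""
    return MemoryInfo(free=pool_capacity - bytes_in_use, total=pool_capacity)


info = memory_info_from_pool(pool_capacity=1024, bytes_in_use=256)
```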
IPC
An instance of IpcHandle is required to be returned from an implementation of get_ipc_handle():
- class numba.cuda.IpcHandle(base, handle, size, source_info=None, offset=0)
CUDA IPC handle. Serialization of the CUDA IPC handle object is implemented here.
- Parameters:
base (MemoryPointer) – A reference to the original allocation to keep it alive
handle – The CUDA IPC handle, as a ctypes array of bytes.
size (int) – Size of the original allocation
source_info (dict) – The identity of the device on which the IPC handle was opened.
offset (int) – The offset into the underlying allocation of the memory referred to by this IPC handle.
Guidance for constructing an IPC handle in the context of implementing an EMM Plugin:
The memory parameter passed to the get_ipc_handle method of an EMM Plugin can be passed as the base parameter.
A suitable type for the handle can be constructed as ctypes.c_byte * 64. The data for handle must be populated using a method for obtaining a CUDA IPC handle appropriate to the underlying library.
size should match the size of the original allocation, which can be obtained with memory.size in get_ipc_handle.
An appropriate value for source_info can be created by calling self.context.device.get_device_identity().
If the underlying memory does not point to the base of an allocation returned by the CUDA driver or runtime API (e.g. if a pool allocator is in use) then the offset from the base must be provided.
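The handle and offset values from the guidance above can be prepared with the standard library alone, as in the following sketch. The helper names are hypothetical, and the bytes used here are placeholders rather than a real CUDA IPC handle; in a plugin they would come from the underlying library.

```python
import ctypes

# 64-byte array type suggested above for the handle argument.
IpcHandleBytes = ctypes.c_byte * 64


def handle_from_bytes(raw):
    """Copy 64 raw bytes obtained from the underlying library into a handle."""
    if len(raw) != ctypes.sizeof(IpcHandleBytes):
        raise ValueError("a CUDA IPC handle is expected to be 64 bytes")
    return IpcHandleBytes.from_buffer_copy(raw)


def offset_from_base(address, base_address):
    """Offset of a sub-allocation (e.g. from a pool) within its base allocation."""
    return address - base_address


handle = handle_from_bytes(bytes(range(64)))  # placeholder bytes only
offset = offset_from_base(0x70000100, 0x70000000)
```

Inside get_ipc_handle, these values would then be passed to IpcHandle together with memory, memory.size and the source_info obtained from the context's device.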
Setting the EMM Plugin
By default, Numba uses its internal memory management - if an EMM Plugin is to be used, it must be configured. There are two mechanisms for configuring the use of an EMM Plugin: an environment variable, and a function.
Environment variable
A module name can be provided in the environment variable, NUMBA_CUDA_MEMORY_MANAGER. If this environment variable is set, Numba will attempt to import the module, and use its _numba_memory_manager global variable as the memory manager class. This is primarily useful for running the Numba test suite with an EMM Plugin, e.g.:
$ NUMBA_CUDA_MEMORY_MANAGER=rmm python -m numba.runtests numba.cuda.tests
Function
The set_memory_manager() function can be used to set the memory manager at runtime. This should be called prior to the initialization of any contexts, as EMM Plugin instances are instantiated along with contexts.
- numba.cuda.set_memory_manager(mm_plugin)
Configure Numba to use an External Memory Management (EMM) Plugin. If the EMM Plugin version does not match one supported by this version of Numba, a RuntimeError will be raised.
- Parameters:
mm_plugin (BaseCUDAMemoryManager) – The class implementing the EMM Plugin.
- Returns:
None
Resetting the memory manager
It is recommended that the memory manager is set once prior to using any CUDA functionality, and left unchanged for the remainder of execution. It is possible to set the memory manager multiple times, noting the following:
At the time of their creation, contexts are bound to an instance of a memory manager for their lifetime.
Changing the memory manager will have no effect on existing contexts - only contexts created after the memory manager was updated will use instances of the new memory manager.
numba.cuda.close() can be used to destroy contexts after setting the memory manager so that they get re-created with the new memory manager.
This will invalidate any arrays, streams, events, and modules owned by the context.
Attempting to use invalid arrays, streams, or events will likely fail with an exception being raised due to a CUDA_ERROR_INVALID_CONTEXT or CUDA_ERROR_CONTEXT_IS_DESTROYED return code from a Driver API function.
Attempting to use an invalid module will result in similar, or in some cases a segmentation fault / access violation.
Note
The invalidation of modules means that all functions compiled with @cuda.jit prior to context destruction will need to be redefined, as the code underlying them will also have been unloaded from the GPU.
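The order of operations for changing the memory manager after contexts already exist can be sketched as follows. The function name is hypothetical, and Numba is imported inside the function so that the sketch can be loaded without Numba or a GPU; calling it requires both.

```python
def switch_memory_manager(new_plugin_class):
    """Install a new EMM Plugin class and drop existing contexts."""
    from numba import cuda  # imported lazily; needs Numba with CUDA support

    # Contexts created from now on will use the new memory manager.
    cuda.set_memory_manager(new_plugin_class)

    # Existing contexts stay bound to the old manager, so destroy them.
    # This invalidates arrays, streams, events, and modules they own, and
    # functions compiled with @cuda.jit before this point must be redefined.
    cuda.close()
```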