cub::CachingDeviceAllocator
Defined in cub/util_allocator.cuh
-
struct CachingDeviceAllocator
A simple caching allocator for device memory allocations.
- Overview
The allocator is thread-safe and stream-safe and is capable of managing cached device allocations on multiple devices. It behaves as follows:
Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Allocations are categorized and cached by bin size. A new allocation request of a given size will only consider cached allocations within the corresponding bin.
Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
Allocation requests below (bin_growth ^ min_bin) are rounded up to (bin_growth ^ min_bin).
Allocations above (bin_growth ^ max_bin) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
If the total storage of cached allocations on a given device would exceed max_cached_bytes, allocations for that device are simply freed when they are deallocated instead of being returned to their bin-cache.
For example, the default-constructed CachingDeviceAllocator is configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = 6MB - 1B
which delineates five bin-sizes: 512B, 4KB, 32KB, 256KB, and 2MB, and sets a maximum of 6,291,455 cached bytes per device.
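The bin rounding described in the overview can be sketched in plain C++ without any CUDA dependency. This is a minimal model written from the prose above, not the library's implementation: the `BinChoice` struct and `ChooseBin` function are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of CUB): result of the bin lookup.
struct BinChoice {
    bool cached_bin;    // false when the request exceeds bin_growth ^ max_bin
    unsigned int bin;   // bin exponent; meaningful only when cached_bin is true
    std::size_t bytes;  // number of bytes that would actually be allocated
};

// Hypothetical model of the rounding rules: requests are rounded up to the
// smallest bin size bin_growth ^ k with min_bin <= k <= max_bin. Requests
// larger than bin_growth ^ max_bin are not rounded and are never cached.
inline BinChoice ChooseBin(std::size_t request, unsigned int bin_growth,
                           unsigned int min_bin, unsigned int max_bin)
{
    assert(bin_growth > 1 && min_bin <= max_bin);

    // Start at the smallest bin, bin_growth ^ min_bin.
    std::size_t bin_bytes = 1;
    for (unsigned int i = 0; i < min_bin; ++i) {
        bin_bytes *= bin_growth;
    }

    // Grow geometrically until the request fits or the largest bin is reached.
    unsigned int bin = min_bin;
    while (bin_bytes < request && bin < max_bin) {
        bin_bytes *= bin_growth;
        ++bin;
    }

    if (bin_bytes < request) {
        return {false, 0, request};  // above the largest bin: exact size, uncached
    }
    return {true, bin, bin_bytes};
}
```

With the default configuration (8, 3, 7), a 513-byte request in this model lands in the 4KB bin, while a request one byte over 2MB falls outside every bin and keeps its exact size.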
Public Functions
-
inline CachingDeviceAllocator(unsigned int bin_growth, unsigned int min_bin = 1, unsigned int max_bin = INVALID_BIN, size_t max_cached_bytes = INVALID_SIZE, bool skip_cleanup = false)
Constructor.
- Parameters
bin_growth – Geometric growth factor for bin-sizes
min_bin – Minimum bin (default is bin_growth ^ 1)
max_bin – Maximum bin (default is no max bin)
max_cached_bytes – Maximum aggregate cached bytes per device (default is no limit)
skip_cleanup – Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)
debug – Whether or not to print (de)allocation events to stdout (default is no output)
-
inline CachingDeviceAllocator(unsigned int bin_growth, unsigned int min_bin, unsigned int max_bin, size_t max_cached_bytes, bool skip_cleanup, bool)
Constructor.
- Parameters
bin_growth – Geometric growth factor for bin-sizes
min_bin – Minimum bin (default is bin_growth ^ 1)
max_bin – Maximum bin (default is no max bin)
max_cached_bytes – Maximum aggregate cached bytes per device (default is no limit)
skip_cleanup – Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)
debug – Whether or not to print (de)allocation events to stdout (default is no output)
-
inline CachingDeviceAllocator(bool skip_cleanup = false, bool debug = false)
Default constructor.
Configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = ((bin_growth ^ max_bin) * 3) - 1 = 6,291,455 bytes
which delineates five bin-sizes: 512B, 4KB, 32KB, 256KB, and 2MB, and sets a maximum of 6,291,455 cached bytes per device.
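The default figures can be checked with a few lines of compile-time arithmetic. The sketch below is only a worked calculation of the formula above: `IntPow` and the `k...` constants are hypothetical names, not identifiers from CUB.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of CUB): integer power base ^ exp.
constexpr std::size_t IntPow(std::size_t base, unsigned int exp)
{
    return exp == 0 ? 1 : base * IntPow(base, exp - 1);
}

// Default configuration values as listed in the documentation.
constexpr std::size_t kBinGrowth = 8;
constexpr unsigned int kMinBin = 3;
constexpr unsigned int kMaxBin = 7;

// Smallest and largest bin sizes implied by those values.
constexpr std::size_t kSmallestBin = IntPow(kBinGrowth, kMinBin);
constexpr std::size_t kLargestBin = IntPow(kBinGrowth, kMaxBin);

// ((bin_growth ^ max_bin) * 3) - 1
constexpr std::size_t kDefaultMaxCachedBytes = kLargestBin * 3 - 1;

static_assert(kSmallestBin == 512, "smallest bin is 512B");
static_assert(kLargestBin == 2097152, "largest bin is 2MB");
static_assert(kDefaultMaxCachedBytes == 6291455, "limit is 6MB - 1B");
```

Because the limit is one byte short of three 2MB blocks, at most two largest-bin blocks fit under the default ceiling at the same time, assuming the limit is applied as the overview describes.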
-
inline cudaError_t SetMaxCachedBytes(size_t max_cached_bytes_)
Sets the limit on the number of bytes this allocator is allowed to cache per device.
Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().
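The interaction between the cached-byte ceiling, deallocation, and FreeAllCached() can be illustrated with a small bookkeeping model for a single device. This is a sketch under the assumption that the ceiling is checked when a block is returned, as the overview describes. The `CacheBudget` class and its `OnFree` method are hypothetical and do not touch device memory or the CUB API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not CUB code) of the cached-byte ceiling for one device.
class CacheBudget {
public:
    explicit CacheBudget(std::size_t max_cached_bytes)
        : max_cached_bytes_(max_cached_bytes), cached_bytes_(0) {}

    // Models SetMaxCachedBytes(): only the ceiling changes; bytes that are
    // already cached stay cached.
    void SetMaxCachedBytes(std::size_t max_cached_bytes)
    {
        max_cached_bytes_ = max_cached_bytes;
    }

    // Models returning a block of `bytes` to the allocator. Returns true if
    // the block would be kept in its bin-cache, false if caching it would
    // exceed the ceiling and it would simply be freed instead.
    bool OnFree(std::size_t bytes)
    {
        if (cached_bytes_ + bytes > max_cached_bytes_) {
            return false;
        }
        cached_bytes_ += bytes;
        return true;
    }

    // Models FreeAllCached(): releases everything held in reserve.
    void FreeAllCached() { cached_bytes_ = 0; }

    std::size_t cached_bytes() const { return cached_bytes_; }

private:
    std::size_t max_cached_bytes_;
    std::size_t cached_bytes_;
};
```

In this model, lowering the ceiling below the currently cached total leaves the cache untouched. Later frees are simply not cached until FreeAllCached() (or, in the real allocator, reuse of cached blocks) brings the total back under the limit.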
-
inline cudaError_t DeviceAllocate(int device, void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)
Provides a suitable allocation of device memory for the given size on the specified device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
- Parameters
device – [in] Device on which to place the allocation
d_ptr – [out] Reference to pointer to the allocation
bytes – [in] Minimum number of bytes for the allocation
active_stream – [in] The stream to be associated with this allocation
-
inline cudaError_t DeviceAllocate(void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)
Provides a suitable allocation of device memory for the given size on the current device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
- Parameters
d_ptr – [out] Reference to pointer to the allocation
bytes – [in] Minimum number of bytes for the allocation
active_stream – [in] The stream to be associated with this allocation
-
inline cudaError_t DeviceFree(int device, void *d_ptr)
Frees a live allocation of device memory on the specified device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
-
inline cudaError_t DeviceFree(void *d_ptr)
Frees a live allocation of device memory on the current device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
-
inline cudaError_t FreeAllCached()
Frees all cached device allocations on all devices.
-
inline virtual ~CachingDeviceAllocator()
Destructor.