cub::CachingDeviceAllocator

struct CachingDeviceAllocator

A simple caching allocator for device memory allocations.

Overview

The allocator is thread-safe and stream-safe and is capable of managing cached device allocations on multiple devices. It behaves as follows:

Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
Allocations are categorized and cached by bin size. A new allocation request of a given size will only consider cached allocations within the corresponding bin.
Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
Allocation requests below ( bin_growth ^ min_bin ) are rounded up to ( bin_growth ^ min_bin ).
Allocations above ( bin_growth ^ max_bin ) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
If returning an allocation to its bin-cache would push the total storage of cached allocations on a given device above max_cached_bytes, that allocation is simply freed when it is deallocated instead of being cached.
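
The binning rules above can be sketched in plain C++. This is a minimal model, not CUB's implementation: `BinChoice` and `choose_bin` are hypothetical names introduced for illustration, and `^` in the text is read as exponentiation. The sketch assumes `bin_growth >= 2` and ignores overflow for very large requests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch; not part of CUB.
struct BinChoice {
    bool cacheable;         // false when the request exceeds bin_growth ^ max_bin
    unsigned int bin;       // exponent of the smallest bin that fits the request
    std::size_t bin_bytes;  // number of bytes that would actually be allocated
};

// Finds the smallest power of bin_growth that is >= bytes, then applies
// the min_bin and max_bin rules described in the list above.
inline BinChoice choose_bin(std::size_t bytes, unsigned int bin_growth,
                            unsigned int min_bin, unsigned int max_bin) {
    assert(bin_growth >= 2);  // assumption of this sketch

    unsigned int bin = 0;
    std::size_t bin_bytes = 1;
    while (bin_bytes < bytes) {  // grow geometrically until the request fits
        bin_bytes *= bin_growth;
        ++bin;
    }

    if (bin > max_bin) {
        // Too large for any bin: not rounded up, and never returned to a cache.
        return {false, bin, bytes};
    }

    while (bin < min_bin) {  // small requests are rounded up to the minimum bin
        bin_bytes *= bin_growth;
        ++bin;
    }
    return {true, bin, bin_bytes};
}
```

With `bin_growth = 8`, `min_bin = 3`, and `max_bin = 7`, a 100-byte request lands in the 512-byte bin, while a request one byte larger than 2MB is treated as uncacheable and keeps its exact size.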

For example, the default-constructed CachingDeviceAllocator is configured with:

bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = 6MB - 1B

which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.
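
The arithmetic behind these defaults can be checked with a short C++ sketch. The helper names `int_pow`, `bin_sizes`, and `default_max_cached_bytes` are hypothetical and exist only for this illustration; the cap is computed as three times the largest bin size minus one byte, which matches the 6MB - 1B figure above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integer power: `^` in the text above denotes exponentiation, not XOR.
inline std::size_t int_pow(std::size_t base, unsigned int exp) {
    std::size_t result = 1;
    for (unsigned int i = 0; i < exp; ++i) {
        result *= base;
    }
    return result;
}

// Sizes in bytes of every bin from min_bin to max_bin, inclusive.
inline std::vector<std::size_t> bin_sizes(unsigned int bin_growth,
                                          unsigned int min_bin,
                                          unsigned int max_bin) {
    assert(min_bin <= max_bin);  // assumption of this sketch
    std::vector<std::size_t> sizes;
    for (unsigned int bin = min_bin; bin <= max_bin; ++bin) {
        sizes.push_back(int_pow(bin_growth, bin));
    }
    return sizes;
}

// Three times the largest bin, minus one byte.
inline std::size_t default_max_cached_bytes(unsigned int bin_growth,
                                            unsigned int max_bin) {
    return int_pow(bin_growth, max_bin) * 3 - 1;
}
```

Calling `bin_sizes(8, 3, 7)` yields 512, 4096, 32768, 262144, and 2097152 bytes, and `default_max_cached_bytes(8, 7)` yields 6291455.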

Public Functions

inline CachingDeviceAllocator(unsigned int bin_growth, unsigned int min_bin = 1, unsigned int max_bin = INVALID_BIN, size_t max_cached_bytes = INVALID_SIZE, bool skip_cleanup = false)

Constructor.

Parameters:

bin_growth – Geometric growth factor for bin-sizes
min_bin – Minimum bin (default is bin_growth ^ 1)
max_bin – Maximum bin (default is no max bin)
max_cached_bytes – Maximum aggregate cached bytes per device (default is no limit)
skip_cleanup – Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)
debug – Whether or not to print (de)allocation events to stdout (default is no output)

inline CachingDeviceAllocator(bool skip_cleanup = false, bool debug = false)

Default constructor.

Configured with:

bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = ( ( bin_growth ^ max_bin ) * 3 ) - 1 = 6,291,455 bytes

which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.

inline cudaError_t SetMaxCachedBytes(size_t max_cached_bytes_)

Sets the limit on the number of bytes this allocator is allowed to cache per device.

Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().
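
The interaction between the cache ceiling and deallocation can be modeled as follows. This is a simplified sketch of the documented behavior for a single device, not CUB's internal bookkeeping: `CacheBudget`, `set_max_cached_bytes`, and `on_free` are hypothetical names, and the sketch assumes a freed block is cached only when the cached total would stay at or below the ceiling.

```cpp
#include <cstddef>

// Hypothetical per-device bookkeeping for this sketch; not CUB's internals.
struct CacheBudget {
    std::size_t max_cached_bytes;  // ceiling on bytes held in the cache
    std::size_t cached_bytes;      // bytes currently held in the cache

    explicit CacheBudget(std::size_t max_bytes)
        : max_cached_bytes(max_bytes), cached_bytes(0) {}

    // Changing the ceiling only updates the limit; nothing already cached
    // is released as a side effect.
    void set_max_cached_bytes(std::size_t new_max) {
        max_cached_bytes = new_max;
    }

    // Called when the user frees a block of block_bytes.
    // Returns true if the block is kept in the cache, false if it would
    // be released back to the device instead.
    bool on_free(std::size_t block_bytes) {
        if (block_bytes <= max_cached_bytes &&
            cached_bytes <= max_cached_bytes - block_bytes) {
            cached_bytes += block_bytes;
            return true;
        }
        return false;
    }
};
```

In this model, lowering the ceiling below the current cached total leaves `cached_bytes` unchanged; it only causes subsequent frees to bypass the cache until cached blocks are reused or explicitly released.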

inline cudaError_t DeviceAllocate(int device, void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)

Provides a suitable allocation of device memory for the given size on the specified device.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Parameters:

device – [in] Device on which to place the allocation
d_ptr – [out] Reference to pointer to the allocation
bytes – [in] Minimum number of bytes for the allocation
active_stream – [in] The stream to be associated with this allocation
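
The stream-reuse rule can be expressed as a small predicate. This is a conceptual model only: `CachedBlock` and `reusable_by` are hypothetical names, streams are represented as plain integers, and completion of prior work is reduced to a boolean flag rather than queried from the CUDA runtime.

```cpp
// Hypothetical model of a freed, cached block; not a CUB type.
struct CachedBlock {
    int associated_stream;      // the active_stream given at allocation time
    bool prior_work_completed;  // true once all work submitted to that stream
                                // before the free has finished
};

// A cached block can serve a new request if the request comes from the same
// stream it was associated with, or if that stream's prior work is done.
inline bool reusable_by(const CachedBlock& block, int requesting_stream) {
    return block.associated_stream == requesting_stream ||
           block.prior_work_completed;
}
```

Reuse within the same stream is safe without waiting because work in a single stream executes in submission order, so any new use of the block is ordered after the work that preceded the free.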

inline cudaError_t DeviceAllocate(void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)

Provides a suitable allocation of device memory for the given size on the current device.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Parameters:

d_ptr – [out] Reference to pointer to the allocation
bytes – [in] Minimum number of bytes for the allocation
active_stream – [in] The stream to be associated with this allocation

inline cudaError_t DeviceFree(int device, void *d_ptr)

Frees a live allocation of device memory on the specified device, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

inline cudaError_t DeviceFree(void *d_ptr)

Frees a live allocation of device memory on the current device, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

inline cudaError_t FreeAllCached()

Frees all cached device allocations on all devices.

inline virtual ~CachingDeviceAllocator()

Destructor.

Public Static Attributes

static constexpr unsigned int INVALID_BIN = (unsigned int)-1

Out-of-bounds bin.

static constexpr size_t INVALID_SIZE = (size_t)-1

Invalid size.