cub::CachingDeviceAllocator

Defined in cub/util_allocator.cuh

struct CachingDeviceAllocator

A simple caching allocator for device memory allocations.

Overview

The allocator is thread-safe and stream-safe and is capable of managing cached device allocations on multiple devices. It behaves as follows:

  • Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

  • Allocations are categorized and cached by bin size. A new allocation request of a given size will only consider cached allocations within the corresponding bin.

  • Bin sizes grow geometrically according to the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.

  • Allocation requests below ( bin_growth ^ min_bin ) are rounded up to ( bin_growth ^ min_bin ).

  • Allocations above ( bin_growth ^ max_bin ) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.

  • If returning an allocation to its bin-cache would make the total storage of cached allocations on its device exceed max_cached_bytes, the allocation is simply freed when it is deallocated instead.
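The rounding rules above can be modeled in a few lines of standard C++. This is a sketch, not CUB's implementation: ChooseBin and BinChoice are hypothetical names, and the model assumes that a bin's size is exactly bin_growth raised to the bin index. It picks the smallest power of bin_growth that covers the request, raises small requests to the min_bin size, and marks requests above the max_bin size as uncached.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model of the documented bin rules; not CUB's source code.
struct BinChoice {
    bool cacheable;      // false: above the largest bin, freed on deallocation
    unsigned int bin;    // bin index (meaningful only when cacheable)
    std::size_t bytes;   // bytes actually reserved for the request
};

BinChoice ChooseBin(std::size_t request, unsigned int bin_growth,
                    unsigned int min_bin, unsigned int max_bin) {
    assert(bin_growth >= 2);
    // Find the smallest power of bin_growth that is >= request.
    unsigned int bin = 0;
    std::size_t bin_bytes = 1;
    while (bin_bytes < request) {
        if (bin_bytes > std::numeric_limits<std::size_t>::max() / bin_growth) {
            return {false, 0, request};  // next power would overflow size_t
        }
        bin_bytes *= bin_growth;
        ++bin;
    }
    if (bin > max_bin) {
        return {false, 0, request};  // oversized: allocated exactly, never cached
    }
    if (bin < min_bin) {
        // Round small requests up to bin_growth ^ min_bin.
        bin = min_bin;
        bin_bytes = 1;
        for (unsigned int i = 0; i < min_bin; ++i) bin_bytes *= bin_growth;
    }
    return {true, bin, bin_bytes};
}
```

With bin_growth = 8, min_bin = 3 and max_bin = 7, a 1-byte request lands in the 512B bin, a 513-byte request in the 4KB bin, and a 2,097,153-byte request is reported as uncached. Since requests are matched to cached blocks by bin, a free block in the 4KB bin is never handed out for a request that rounds to 512B.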

For example, the default-constructed CachingDeviceAllocator is configured with:

  • bin_growth = 8

  • min_bin = 3

  • max_bin = 7

  • max_cached_bytes = 6MB - 1B

which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.
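The figures above follow from plain integer arithmetic. The helpers below (IntPow, BinSizes and DefaultMaxCachedBytes are illustrative names, not CUB functions) enumerate the bin sizes from min_bin to max_bin and compute the default cap as three times the largest bin size minus one byte, which is the formula documented for the default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// base ^ exp by repeated multiplication (no overflow check; fine for the
// small exponents used here).
std::size_t IntPow(std::size_t base, unsigned int exp) {
    std::size_t result = 1;
    for (unsigned int i = 0; i < exp; ++i) result *= base;
    return result;
}

// Sizes of bins min_bin..max_bin inclusive; assumes a finite max_bin.
std::vector<std::size_t> BinSizes(unsigned int bin_growth,
                                  unsigned int min_bin, unsigned int max_bin) {
    assert(min_bin <= max_bin && max_bin < std::numeric_limits<unsigned int>::max());
    std::vector<std::size_t> sizes;
    for (unsigned int b = min_bin; b <= max_bin; ++b) {
        sizes.push_back(IntPow(bin_growth, b));
    }
    return sizes;
}

// Cap used by the default configuration: ((bin_growth ^ max_bin) * 3) - 1.
std::size_t DefaultMaxCachedBytes(unsigned int bin_growth, unsigned int max_bin) {
    return IntPow(bin_growth, max_bin) * 3 - 1;
}
```

For the default values, BinSizes(8, 3, 7) yields 512, 4096, 32768, 262144 and 2097152, and DefaultMaxCachedBytes(8, 7) yields 6,291,455, which is 6MB minus one byte.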

Public Functions

inline CachingDeviceAllocator(unsigned int bin_growth, unsigned int min_bin = 1, unsigned int max_bin = INVALID_BIN, size_t max_cached_bytes = INVALID_SIZE, bool skip_cleanup = false)

Constructor.

Parameters
  • bin_growth – Geometric growth factor for bin-sizes

  • min_bin – Minimum bin (default is bin_growth ^ 1)

  • max_bin – Maximum bin (default is no max bin)

  • max_cached_bytes – Maximum aggregate cached bytes per device (default is no limit)

  • skip_cleanup – Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)

  • debug – Not accepted by this overload; (de)allocation logging is requested through the final bool parameter of the six-argument overload

inline CachingDeviceAllocator(unsigned int bin_growth, unsigned int min_bin, unsigned int max_bin, size_t max_cached_bytes, bool skip_cleanup, bool)

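Constructor. All six arguments must be passed explicitly; the defaults noted below are those of the five-argument overload.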

Parameters
  • bin_growth – Geometric growth factor for bin-sizes

  • min_bin – Minimum bin (default is bin_growth ^ 1)

  • max_bin – Maximum bin (default is no max bin)

  • max_cached_bytes – Maximum aggregate cached bytes per device (default is no limit)

  • skip_cleanup – Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate)

  • debug – (the unnamed final bool parameter) Whether or not to print (de)allocation events to stdout

inline CachingDeviceAllocator(bool skip_cleanup = false, bool debug = false)

Default constructor.

Configured with:

  • bin_growth = 8

  • min_bin = 3

  • max_bin = 7

  • max_cached_bytes = ( ( bin_growth ^ max_bin ) * 3 ) - 1 = 6,291,455 bytes

which delineates five bin-sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.

inline cudaError_t SetMaxCachedBytes(size_t max_cached_bytes_)

Sets the limit on the number of bytes this allocator is allowed to cache per device.

Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. See FreeAllCached().
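A toy per-device budget makes these semantics concrete. CacheBudget and OnFree are hypothetical names, and the model assumes, as the overview describes, that the cap is checked when an allocation is freed: lowering the ceiling leaves already cached bytes in place, and only later frees are affected.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-device cache budget; not CUB's code.
struct CacheBudget {
    std::size_t max_cached_bytes;
    std::size_t cached_bytes;

    explicit CacheBudget(std::size_t max) : max_cached_bytes(max), cached_bytes(0) {}

    // Like SetMaxCachedBytes(): only the ceiling changes; nothing already
    // cached is released.
    void SetMaxCachedBytes(std::size_t new_max) { max_cached_bytes = new_max; }

    // Called when a binned block of `bytes` is freed. Returns true if the
    // block stays in the cache, false if it would be released instead.
    bool OnFree(std::size_t bytes) {
        // Avoids size_t overflow and handles a cap that was lowered below
        // the bytes already cached.
        if (cached_bytes > max_cached_bytes || bytes > max_cached_bytes - cached_bytes) {
            return false;
        }
        cached_bytes += bytes;
        return true;
    }
};
```

In this model, with the default 6,291,455-byte cap, two freed 2MB blocks are cached but a third no longer fits. If the cap is then lowered to 1MB, the 4,194,304 cached bytes stay where they are and a later 512B free is released rather than cached; only FreeAllCached() would empty the existing cache.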

inline cudaError_t DeviceAllocate(int device, void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)

Provides a suitable allocation of device memory for the given size on the specified device.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Parameters
  • device – [in] Device on which to place the allocation

  • d_ptr – [out] Reference to pointer to the allocation

  • bytes – [in] Minimum number of bytes for the allocation

  • active_stream – [in] The stream to be associated with this allocation
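The stream-reuse rule stated for DeviceAllocate() reduces to a single predicate. In this hypothetical sketch, streams are plain integer ids and the prior_work_done flag stands in for "all work submitted to the associated stream before the free has completed". How such a flag is obtained (for instance, by recording a CUDA event on that stream at free time and querying it later) is an assumption made only for illustration.

```cpp
#include <cassert>

// Hypothetical model of the reuse rule; not CUB's code. Streams are
// integer ids, and prior_work_done stands in for "all work submitted to the
// associated stream before the free has completed".
struct CachedBlock {
    int associated_stream;
    bool prior_work_done;
};

// Can a cached block be handed to a request made on requesting_stream?
bool CanReuse(const CachedBlock& block, int requesting_stream) {
    if (block.associated_stream == requesting_stream) {
        return true;  // same stream: reusable immediately
    }
    return block.prior_work_done;  // other streams wait for prior work
}
```

Same-stream reuse needs no waiting because work on a single CUDA stream executes in submission order, so the new owner's work cannot start before the previous owner's work on that stream has finished.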

inline cudaError_t DeviceAllocate(void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)

Provides a suitable allocation of device memory for the given size on the current device.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

Parameters
  • d_ptr – [out] Reference to pointer to the allocation

  • bytes – [in] Minimum number of bytes for the allocation

  • active_stream – [in] The stream to be associated with this allocation

inline cudaError_t DeviceFree(int device, void *d_ptr)

Frees a live allocation of device memory on the specified device, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

inline cudaError_t DeviceFree(void *d_ptr)

Frees a live allocation of device memory on the current device, returning it to the allocator.

Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.

inline cudaError_t FreeAllCached()

Frees all cached device allocations on all devices.

inline virtual ~CachingDeviceAllocator()

Destructor. Calls FreeAllCached() unless skip_cleanup was set to true at construction.

Public Static Attributes

static constexpr unsigned int INVALID_BIN = (unsigned int)-1

Out-of-bounds bin.

static constexpr size_t INVALID_SIZE = (size_t)-1

Invalid size.