Memory Pools#
Memory pools provide efficient, stream-ordered memory allocation using CUDA’s memory pool API. They support both synchronous and stream-ordered allocation/deallocation and can be configured with various memory spaces, properties and attributes.
Memory pool objects implement the cuda::memory_resource interface with allocate(stream, size, alignment) and deallocate(stream, ptr, size, alignment) member functions. They also provide synchronous variants with allocate_sync(size, alignment) and deallocate_sync(ptr, size, alignment) member functions. For all of them, the alignment argument is optional.
For the full memory resource model and property system, see Memory Resources (Extended API).
Pinned host memory pools are supported on CUDA 12.6 and later. Managed memory pools are supported on CUDA 13.0 and later and are not supported on Windows. When these requirements are not met, use cuda::mr::legacy_pinned_memory_resource and cuda::mr::legacy_managed_memory_resource instead.
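When no stream is at hand, the synchronous variants can be used instead. A minimal sketch using cuda::device_memory_pool (introduced below); the function name is illustrative:
#include <cuda/memory_resource>
#include <cuda/devices>
void use_pool_sync() {
  cuda::device_memory_pool pool{cuda::devices[0]};
  // Synchronous allocation: no stream argument; the alignment argument is optional
  void* ptr = pool.allocate_sync(1024);
  // Use memory...
  pool.deallocate_sync(ptr, 1024);
}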
cuda::device_memory_pool#
cuda::device_memory_pool allocates device memory using CUDA’s stream-ordered memory pool API (cudaMallocFromPoolAsync / cudaFreeAsync). When constructed, it creates and owns an underlying cudaMemPool_t with location type set to cudaMemLocationTypeDevice.
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/stream>
#include <cuda/devices>
void use_device_pool(cuda::stream_ref stream) {
  // Create a device memory pool
  cuda::device_memory_pool pool{cuda::devices[0]};
  // Allocate memory in stream order
  void* ptr = pool.allocate(stream, 1024, 16);
  // Use memory...
  // Deallocate in stream order
  pool.deallocate(stream, ptr, 1024, 16);
}
cuda::device_memory_pool_ref#
cuda::device_memory_pool_ref is a non-owning reference to a device memory pool. It does not own the underlying cudaMemPool_t, so the user must ensure the pool’s lifetime exceeds the reference’s lifetime.
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_pool_ref(cuda::stream_ref stream, cuda::device_memory_pool_ref pool_ref) {
  void* ptr = pool_ref.allocate(stream, 1024);
  // Use memory...
  pool_ref.deallocate(stream, ptr, 1024);
}
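An owning pool can be passed where the reference type is expected; the sketch below assumes cuda::device_memory_pool binds to device_memory_pool_ref the way the parameter above suggests, with the caller keeping the pool alive:
#include <cuda/memory_resource>
#include <cuda/stream>
#include <cuda/devices>
void call_use_pool_ref(cuda::stream_ref stream) {
  cuda::device_memory_pool pool{cuda::devices[0]};
  // The non-owning reference is valid only while `pool` is alive
  use_pool_ref(stream, pool); // assumed conversion to device_memory_pool_ref
}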
cuda::managed_memory_pool#
cuda::managed_memory_pool allocates managed (unified) memory using CUDA’s memory pool API. It creates and owns an underlying cudaMemPool_t with allocation type set to cudaMemAllocationTypeManaged. Managed memory is accessible from both host and device.
Availability: CCCL 3.2.0 / CUDA 13.2 (requires CTK 13.0+). Not supported on Windows.
Example:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_managed_pool(cuda::stream_ref stream) {
  cuda::managed_memory_pool pool{};
  // Allocate managed memory
  void* ptr = pool.allocate(stream, 1024);
  // Accessible from both host and device
  // Use memory...
  pool.deallocate(stream, ptr, 1024);
}
cuda::managed_memory_pool_ref#
cuda::managed_memory_pool_ref is a non-owning reference to a managed memory pool.
Availability: CCCL 3.2.0 / CUDA 13.2 (requires CTK 13.0+). Not supported on Windows.
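Usage mirrors cuda::device_memory_pool_ref; a sketch, assuming the same allocate/deallocate interface on the reference type:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_managed_pool_ref(cuda::stream_ref stream, cuda::managed_memory_pool_ref pool_ref) {
  void* ptr = pool_ref.allocate(stream, 1024);
  // Accessible from both host and device; the referenced pool must outlive pool_ref
  pool_ref.deallocate(stream, ptr, 1024);
}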
cuda::pinned_memory_pool#
cuda::pinned_memory_pool allocates pinned (page-locked) host memory using CUDA’s memory pool API. Pinned memory enables faster host-to-device transfers and can be accessed from all devices. The pool can be optionally created for a specific host NUMA node.
Availability: CCCL 3.2.0 / CUDA 13.2 (requires CTK 12.6+)
Example:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_pinned_pool(cuda::stream_ref stream) {
  // Create pinned memory pool
  cuda::pinned_memory_pool pool{};
  // Allocate pinned memory
  void* ptr = pool.allocate(stream, 1024);
  // Use for fast host-device transfers...
  pool.deallocate(stream, ptr, 1024);
}

// With NUMA node
void use_pinned_pool_numa(cuda::stream_ref stream, int numa_id) {
  cuda::pinned_memory_pool pool{numa_id};
  void* ptr = pool.allocate(stream, 1024);
  // Use memory...
  pool.deallocate(stream, ptr, 1024);
}
cuda::pinned_memory_pool_ref#
cuda::pinned_memory_pool_ref is a non-owning reference to a pinned memory pool.
Availability: CCCL 3.2.0 / CUDA 13.2 (requires CTK 12.6+)
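As with the other reference types, a sketch assuming the same allocate/deallocate interface:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_pinned_pool_ref(cuda::stream_ref stream, cuda::pinned_memory_pool_ref pool_ref) {
  void* ptr = pool_ref.allocate(stream, 1024);
  // Use for fast host-device transfers; the referenced pool must outlive pool_ref
  pool_ref.deallocate(stream, ptr, 1024);
}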
Default Memory Pools#
CUDA provides default memory pools for each memory type. These pools are managed by the CUDA runtime and can be accessed through helper functions. Default pools are useful when you don’t need custom pool configuration and want to use the system defaults.
cuda::device_default_memory_pool#
cuda::device_default_memory_pool(device_ref) returns a non-owning reference to the default device memory pool for the specified device. The default pool is created automatically by CUDA and is shared across all users of the device.
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/devices>
#include <cuda/stream>
void use_default_device_pool(cuda::stream_ref stream) {
  // Get the default device memory pool
  auto pool = cuda::device_default_memory_pool(cuda::devices[0]);
  // Allocate from the default pool
  void* ptr = pool.allocate(stream, 1024);
  // Use memory...
  // Deallocate back to the pool
  pool.deallocate(stream, ptr, 1024);
}
cuda::managed_default_memory_pool#
cuda::managed_default_memory_pool() returns a non-owning reference to the default managed (unified) memory pool.
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_default_managed_pool(cuda::stream_ref stream) {
  // Get the default managed memory pool
  auto pool = cuda::managed_default_memory_pool();
  // Allocate managed memory
  void* ptr = pool.allocate(stream, 1024);
  // Accessible from both host and device
  // Use memory...
  pool.deallocate(stream, ptr, 1024);
}
cuda::pinned_default_memory_pool#
cuda::pinned_default_memory_pool() returns a non-owning reference to the default pinned (page-locked) host memory pool.
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/stream>
void use_default_pinned_pool(cuda::stream_ref stream) {
  // Get the default pinned memory pool
  auto pool = cuda::pinned_default_memory_pool();
  // Allocate pinned memory
  void* ptr = pool.allocate(stream, 1024);
  // Use for fast host-device transfers...
  pool.deallocate(stream, ptr, 1024);
}
Notes on Default Pools#
- Default pools are created automatically by CUDA and shared across the application
- The pools are returned as non-owning references (*_pool_ref types)
- Default pools use CUDA’s default configuration and cannot be destroyed
- Multiple calls to the same getter function return references to the same pool
- Default pools are thread-safe and can be used concurrently from multiple threads
- Underlying CUDA default memory pools have a release threshold of 0 by default. The first access to a default pool through one of the getters above sets the release threshold to the maximum value, unless the user has previously modified it (see the sketch below).
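The threshold can be inspected with the attribute API from the Memory Pool Attributes section below; a sketch, assuming the attribute accessor is also available on the *_pool_ref types:
#include <cuda/memory_resource>
#include <cuda/devices>
void check_default_pool_threshold() {
  // First access through the getter raises the release threshold to the maximum value
  auto pool = cuda::device_default_memory_pool(cuda::devices[0]);
  auto threshold = pool.attribute(cuda::memory_pool_attributes::release_threshold);
  // `threshold` now reports the maximum value, unless previously modified
}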
Memory Pool Properties#
cuda::memory_pool_properties controls memory pool creation options:
- initial_pool_size - Initial size of the pool (default: 0)
- release_threshold - Threshold at which unused memory is released (default: no limit on the reserved memory)
- allocation_handle_type - Handle type for inter-process sharing (default: none)
- max_pool_size - Maximum size of the pool (default: no limit on the pool size)
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/devices>
void create_pool_with_properties() {
  cuda::memory_pool_properties props{};
  props.initial_pool_size = 1024 * 1024;       // 1 MB initial size
  props.release_threshold = 10 * 1024 * 1024;  // Release if over 10 MB
  cuda::device_memory_pool pool{cuda::devices[0], props};
}
Memory Pool Attributes#
cuda::memory_pool_attributes provides access to pool attributes for querying and configuration:
- release_threshold - Get/set the release threshold, which controls how much memory the pool can keep reserved, both used and unused
- reuse_follow_event_dependencies - Enable/disable reuse across streams with event dependencies
- reuse_allow_opportunistic - Enable/disable opportunistic reuse
- reuse_allow_internal_dependencies - Enable/disable reuse with internal dependencies
- reserved_mem_current - Query current reserved memory (read-only)
- used_mem_current - Query current used memory (read-only)
- reserved_mem_high - Get/set high watermark for reserved memory
- used_mem_high - Get/set high watermark for used memory
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/devices>
void configure_pool_attributes() {
  cuda::device_memory_pool pool{cuda::devices[0]};
  // Set release threshold
  pool.set_attribute(cuda::memory_pool_attributes::release_threshold, 5 * 1024 * 1024);
  // Enable opportunistic reuse
  pool.set_attribute(cuda::memory_pool_attributes::reuse_allow_opportunistic, true);
  // Query current usage
  auto reserved = pool.attribute(cuda::memory_pool_attributes::reserved_mem_current);
  auto used = pool.attribute(cuda::memory_pool_attributes::used_mem_current);
}
Pool Management#
Memory pools provide additional management functions:
- trim_to(min_bytes) - Release memory down to a minimum size
- enable_access_from(devices) / disable_access_from(devices) - Enable or disable access from specific devices (for peer access or access to host pinned memory)
- get() - Get the underlying cudaMemPool_t handle
- release() - Release ownership of the pool handle (see the sketch below)
Availability: CCCL 3.2.0 / CUDA 13.2
Example:
#include <cuda/memory_resource>
#include <cuda/devices>
void manage_pool() {
  cuda::pinned_memory_pool pool{};
  // Enable access from all devices
  pool.enable_access_from(cuda::devices);
  // Trim pool to 1 MB minimum
  pool.trim_to(1024 * 1024);
  // Get native handle
  cudaMemPool_t handle = pool.get();
}
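release() is useful when handing the pool off to code that manages the cudaMemPool_t directly. A sketch, assuming release() returns the underlying handle and leaves the wrapper without ownership (the exact return type is an assumption here):
#include <cuda/memory_resource>
#include <cuda/devices>
#include <cuda_runtime_api.h>
void transfer_pool_ownership() {
  cuda::device_memory_pool pool{cuda::devices[0]};
  // Assumed: release() hands back the handle and the destructor no longer destroys it
  cudaMemPool_t handle = pool.release();
  // ... pass `handle` to code that manages it directly ...
  cudaMemPoolDestroy(handle); // the caller is now responsible for destruction
}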