cuda::experimental::stf::data_place_extension#

class data_place_extension#

Base class for data_place extensions.

Custom data place types inherit from this class and override virtual methods to provide place-specific behavior. This enables extensibility without modifying the core data_place class.

Example usage for a custom place type:

class my_custom_extension : public data_place_extension {
public:
  exec_place get_affine_exec_place() const override { ... }
  int get_device_ordinal() const override { return my_device_id; }
  ::std::string to_string() const override { return "my_custom_place"; }
  size_t hash() const override { return ::std::hash<int>{}(my_device_id); }
  bool equals(const data_place_extension& other) const override { ... }

  // The pure virtual allocate()/deallocate() methods must also be overridden.
  void* allocate(::std::ptrdiff_t size, cudaStream_t stream) const override { ... }
  void deallocate(void* ptr, size_t size, cudaStream_t stream) const override { ... }

private:
  int my_device_id;
};

Subclassed by cuda::experimental::stf::green_ctx_data_place::extension

Public Functions

virtual ~data_place_extension() = default#
virtual exec_place get_affine_exec_place() const = 0#

Get the affine execution place for this data place.

Returns the exec_place that should be used for computation on data stored at this place. The exec_place may have its own virtual methods (e.g., activate/deactivate) for execution-specific behavior.

virtual int get_device_ordinal() const = 0#

Get the device ordinal for this place.

Returns the CUDA device ID associated with this place. For host-only places, this should return -1.

virtual ::std::string to_string() const = 0#

Get a string representation of this place.

Used for debugging and logging purposes.

virtual size_t hash() const = 0#

Compute a hash value for this place.

Used for storing data_place in hash-based containers.

virtual bool equals(const data_place_extension &other) const = 0#

Check equality with another extension.

Parameters:

other – The other extension to compare with

Returns:

true if the extensions represent the same place

inline virtual CUresult mem_create(
CUmemGenericAllocationHandle *handle,
size_t size
) const#

Create a physical memory allocation for this place (VMM API).

This method is used by localized arrays (composite_slice) to create physical memory segments that are then mapped into a contiguous virtual address space. Custom place types can override this method to provide specialized memory allocation behavior.

See also

allocate() for regular memory allocation

Note

Managed memory is not supported by the VMM API.

Parameters:
  • handle – Output parameter for the allocation handle

  • size – Size of the allocation in bytes

Returns:

CUresult indicating success or failure

virtual void *allocate(
::std::ptrdiff_t size,
cudaStream_t stream
) const = 0#

Allocate memory for this place (raw allocation).

This is the low-level allocation interface. For stream-ordered allocations (where allocation_is_stream_ordered() returns true), the allocation will be ordered with respect to other operations on the stream. For immediate allocations, the stream parameter is ignored.

Parameters:
  • size – Size of the allocation in bytes

  • stream – CUDA stream for stream-ordered allocations (ignored for immediate allocations)

Returns:

Pointer to allocated memory

virtual void deallocate(
void *ptr,
size_t size,
cudaStream_t stream
) const = 0#

Deallocate memory for this place (raw deallocation).

Parameters:
  • ptr – Pointer to memory to deallocate

  • size – Size of the allocation

  • stream – CUDA stream for stream-ordered deallocations (ignored for immediate deallocations)

inline virtual bool allocation_is_stream_ordered() const#

Returns true if allocation/deallocation is stream-ordered.

When this returns true, the allocation uses stream-ordered APIs like cudaMallocAsync, and allocators should use stream_async_op to synchronize prerequisites before allocation.

When this returns false, the allocation is immediate (like cudaMallocHost) and the stream parameter is ignored. Note that immediate deallocations (e.g., cudaFree) may or may not introduce implicit synchronization.

Default is true since most GPU-based extensions use cudaMallocAsync.