mdspan to DLPack#
This functionality provides conversions from cuda::host_mdspan, cuda::device_mdspan, and cuda::managed_mdspan to a non-owning DLPack DLTensor view.
Defined in the <cuda/mdspan> header.
Conversion functions#
namespace cuda {
template <typename T, typename Extents, typename Layout, typename Accessor>
[[nodiscard]] /*dlpack_tensor*/<Extents::rank()>
to_dlpack_tensor(const host_mdspan<T, Extents, Layout, Accessor>& mdspan);
template <typename T, typename Extents, typename Layout, typename Accessor>
[[nodiscard]] /*dlpack_tensor*/<Extents::rank()>
to_dlpack_tensor(const device_mdspan<T, Extents, Layout, Accessor>& mdspan);
template <typename T, typename Extents, typename Layout, typename Accessor>
[[nodiscard]] /*dlpack_tensor*/<Extents::rank()>
to_dlpack_tensor(const managed_mdspan<T, Extents, Layout, Accessor>& mdspan);
} // namespace cuda
Types#
/*dlpack_tensor*/ is an internal helper class that stores a DLTensor and owns the backing storage for its shape and strides pointers. The class does not use any heap allocation.
namespace cuda {
template <size_t Rank>
struct /*dlpack_tensor*/ {
// cuda::std::array<int64_t, Rank> shape;
// cuda::std::array<int64_t, Rank> strides;
DLTensor get() const & noexcept [[lifetimebound]];
DLTensor get() && = delete;
};
} // namespace cuda
Note
Lifetime
The DLTensor returned by get() must not outlive the /*dlpack_tensor*/ wrapper. If the wrapper is destroyed, the returned DLTensor::shape and DLTensor::strides pointers will dangle.
Note
Const-correctness
DLTensor::data points at mdspan.data_handle() (or is nullptr if mdspan.size() == 0). If T is const, the const qualifier is cast away because DLTensor::data is an unqualified void*.
Semantics#
The conversion produces a non-owning DLPack view of the mdspan data and metadata:
- DLTensor::ndim is mdspan.rank().
- For rank > 0, DLTensor::shape[i] is mdspan.extent(i).
- For rank > 0, DLTensor::strides[i] is mdspan.stride(i).
- DLTensor::byte_offset is always 0.
- DLTensor::device is:
  - {kDLCPU, 0} for cuda::host_mdspan
  - {kDLCUDA, /*device_id*/} for cuda::device_mdspan
  - {kDLCUDAManaged, 0} for cuda::managed_mdspan
Element types are mapped to DLDataType according to the DLPack conventions, including:
- bool.
- Signed and unsigned integers.
- IEEE-754 floating-point and extended-precision floating-point, including __half, __nv_bfloat16, __float128, FP8, FP6, FP4 when available.
- Complex: cuda::std::complex<__half>, cuda::std::complex<float>, and cuda::std::complex<double>.
- CUDA built-in vector types, such as int2, float4, etc.
- Vector types for extended floating-point, such as __half2, __nv_fp8x4_e4m3, etc.
Constraints#
The accessor's data_handle_type must be a pointer type.
Runtime errors#
If any extent(i) or stride(i) cannot be represented in int64_t, the conversion throws a std::invalid_argument exception.
Availability notes#
- This API is available only when the DLPack header is present, namely <dlpack/dlpack.h> is found in the include path.
- This API can be disabled by defining CCCL_DISABLE_DLPACK before including any library headers. In this case, <dlpack/dlpack.h> will not be included.
References#
DLPack C API documentation.
Example#
#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda/std/cassert>
#include <cuda/std/cstdint>
int main() {
using extents_t = cuda::std::extents<size_t, 2, 3>;
int data[6] = {0, 1, 2, 3, 4, 5};
cuda::host_mdspan<int, extents_t> md{data, extents_t{}};
auto dl = cuda::to_dlpack_tensor(md);
auto dltensor = dl.get();
// `dl` owns the shape/stride storage; `dltensor.data` is a non-owning pointer to `data`.
assert(dltensor.device.device_type == kDLCPU);
assert(dltensor.ndim == 2);
assert(dltensor.shape[0] == 2 && dltensor.shape[1] == 3);
assert(dltensor.strides[0] == 3 && dltensor.strides[1] == 1);
assert(dltensor.data == data);
}
Examples of invalid usage:
#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda/std/cstdint>
void show_invalid_usage1() {
using extents_t = cuda::std::extents<size_t, 2, 3>;
int data[6] = {0, 1, 2, 3, 4, 5};
cuda::host_mdspan<int, extents_t> md{data, extents_t{}};
// WRONG: calling get() on a temporary is deleted to prevent dangling references.
// const DLTensor& dltensor = cuda::to_dlpack_tensor(md).get(); // compile error
}
#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda/std/cstdint>
int64_t* show_invalid_usage2() {
using extents_t = cuda::std::extents<size_t, 2, 3>;
int data[6] = {0, 1, 2, 3, 4, 5};
cuda::host_mdspan<int, extents_t> md{data, extents_t{}};
auto dl = cuda::to_dlpack_tensor(md);
auto dltensor = dl.get();
return dltensor.shape; // WRONG: returns a dangling pointer
}