DLPack to mdspan#

This functionality provides conversions from a DLPack DLTensor to cuda::host_mdspan, cuda::device_mdspan, and cuda::managed_mdspan.

Defined in the <cuda/mdspan> header.

Conversion functions#

namespace cuda {

template <typename ElementType, size_t Rank, typename LayoutPolicy = cuda::std::layout_stride>
[[nodiscard]] cuda::host_mdspan<ElementType, cuda::std::dims<Rank, int64_t>, LayoutPolicy>
to_host_mdspan(const DLTensor& tensor);

template <typename ElementType, size_t Rank, typename LayoutPolicy = cuda::std::layout_stride>
[[nodiscard]] cuda::device_mdspan<ElementType, cuda::std::dims<Rank, int64_t>, LayoutPolicy>
to_device_mdspan(const DLTensor& tensor);

template <typename ElementType, size_t Rank, typename LayoutPolicy = cuda::std::layout_stride>
[[nodiscard]] cuda::managed_mdspan<ElementType, cuda::std::dims<Rank, int64_t>, LayoutPolicy>
to_managed_mdspan(const DLTensor& tensor);

} // namespace cuda

Template parameters#

  • ElementType: The element type of the resulting mdspan. Must match the DLTensor::dtype.

  • Rank: The number of dimensions. Must match DLTensor::ndim.

  • LayoutPolicy: The layout policy of the resulting mdspan. Defaults to cuda::std::layout_stride. The supported layouts, illustrated in the sketch after this list, are:

    • cuda::std::layout_right (C-contiguous, row-major)

    • cuda::std::layout_left (Fortran-contiguous, column-major)

    • cuda::std::layout_stride (general strided layout)
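
For instance, a tensor whose strides describe a compact row-major layout can be viewed through cuda::std::layout_right. The following is a minimal sketch, assuming a 3x4 C-contiguous float array on the host:

#include <dlpack/dlpack.h>
#include <cuda/mdspan>

int main() {
  float data[12]     = {};
  int64_t shape[2]   = {3, 4};
  int64_t strides[2] = {4, 1};  // compact row-major, as layout_right requires

  DLTensor tensor{};
  tensor.data    = data;
  tensor.device  = {kDLCPU, 0};
  tensor.ndim    = 2;
  tensor.dtype   = DLDataType{kDLFloat, 32, 1};
  tensor.shape   = shape;
  tensor.strides = strides;

  // layout_right encodes the strides in the type, so the resulting
  // mdspan stores only the extents
  auto md = cuda::to_host_mdspan<float, 2, cuda::std::layout_right>(tensor);
}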

Semantics#

The conversion produces a non-owning mdspan view of the DLTensor data:

  • The mdspan data pointer is computed as static_cast<char*>(tensor.data) + tensor.byte_offset (illustrated in the sketch after this list).

  • For rank > 0, mdspan.extent(i) is tensor.shape[i].

  • For layout_stride, mdspan.stride(i) is tensor.strides[i] (or computed assuming a compact row-major layout when strides is nullptr, which DLPack permits only before v1.2).

  • The device type is validated:

    • kDLCPU for to_host_mdspan

    • kDLCUDA for to_device_mdspan

    • kDLCUDAManaged for to_managed_mdspan
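
The effect of byte_offset is shown in the following minimal sketch, which assumes a one-dimensional int tensor whose view begins four elements into the allocation:

#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda/std/cassert>

int main() {
  int data[8]        = {0, 1, 2, 3, 4, 5, 6, 7};
  int64_t shape[1]   = {4};
  int64_t strides[1] = {1};

  DLTensor tensor{};
  tensor.data        = data;
  tensor.device      = {kDLCPU, 0};
  tensor.ndim        = 1;
  tensor.dtype       = DLDataType{kDLInt, 32, 1};
  tensor.shape       = shape;
  tensor.strides     = strides;
  tensor.byte_offset = 4 * sizeof(int);  // the view starts at data[4]

  auto md = cuda::to_host_mdspan<int, 1>(tensor);
  assert(md.data_handle() == data + 4);  // byte_offset is folded into the pointer
  assert(md(0) == 4 && md(3) == 7);
}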

Supported element types:

  • bool.

  • Signed and unsigned integers.

  • IEEE-754 floating-point and extended-precision floating-point types, including __half, __nv_bfloat16, __float128, and the FP8, FP6, and FP4 formats when available.

  • Complex: cuda::std::complex<__half>, cuda::std::complex<float>, and cuda::std::complex<double> (see the sketch after this list).

  • CUDA built-in vector types, such as int2, float4, etc.

  • Vector types for extended floating-point, such as __half2, __nv_fp8x4_e4m3, etc.
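
As an illustration of the complex support, the sketch below views a tensor of cuda::std::complex<float>; it assumes DLPack's encoding of this type, kDLComplex with bits = 64 (two 32-bit components):

#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda/std/complex>

int main() {
  cuda::std::complex<float> data[4] = {};
  int64_t shape[1]   = {4};
  int64_t strides[1] = {1};

  DLTensor tensor{};
  tensor.data    = data;
  tensor.device  = {kDLCPU, 0};
  tensor.ndim    = 1;
  tensor.dtype   = DLDataType{kDLComplex, 64, 1};  // complex64: 2 x 32-bit float
  tensor.shape   = shape;
  tensor.strides = strides;

  auto md = cuda::to_host_mdspan<cuda::std::complex<float>, 1>(tensor);
}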

Constraints#

  • LayoutPolicy must be one of cuda::std::layout_right, cuda::std::layout_left, or cuda::std::layout_stride.

  • For layout_right and layout_left, the DLTensor strides must be compatible with the layout.

Runtime errors#

The conversion throws std::invalid_argument in the following cases (an error-handling sketch follows the list):

  • DLTensor::ndim does not match the specified Rank.

  • DLTensor::dtype does not match ElementType.

  • DLTensor::data is nullptr.

  • DLTensor::shape is nullptr (for rank > 0).

  • Any DLTensor::shape[i] is negative.

  • DLTensor::strides is nullptr for DLPack v1.2 or later.

  • DLTensor::strides is nullptr for layout_left with rank > 1 (DLPack < v1.2).

  • DLTensor::strides[i] is not positive for layout_stride.

  • DLTensor::strides are not compatible with the requested layout_right or layout_left.

  • DLTensor::device.device_type does not match the target mdspan type.

  • Data pointer is not properly aligned for the element type.
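
Because these checks happen at run time, a conversion that may receive arbitrary tensors is typically wrapped in a try/catch. The following is a minimal sketch that provokes a rank mismatch; the exact error message is implementation-defined:

#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cstdio>
#include <stdexcept>

int main() {
  int data[6]        = {0, 1, 2, 3, 4, 5};
  int64_t shape[2]   = {2, 3};
  int64_t strides[2] = {3, 1};

  DLTensor tensor{};
  tensor.data    = data;
  tensor.device  = {kDLCPU, 0};
  tensor.ndim    = 2;
  tensor.dtype   = DLDataType{kDLInt, 32, 1};
  tensor.shape   = shape;
  tensor.strides = strides;

  try {
    auto md = cuda::to_host_mdspan<int, 3>(tensor);  // Rank 3 vs. ndim == 2
  } catch (const std::invalid_argument& e) {
    std::printf("conversion failed: %s\n", e.what());
  }
}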

Availability notes#

  • This API is available only when the DLPack header is present, that is, when <dlpack/dlpack.h> is found in the include path.

  • This API can be disabled by defining CCCL_DISABLE_DLPACK before including any library headers. In this case, <dlpack/dlpack.h> will not be included.
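
For example, a translation unit can opt out of the DLPack interop as follows:

// Must be defined before any CCCL header is included
#define CCCL_DISABLE_DLPACK
#include <cuda/mdspan>  // <dlpack/dlpack.h> is not pulled in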

Example#

#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda/std/cassert>
#include <cuda/std/cstdint>

int main() {
  int data[6] = {0, 1, 2, 3, 4, 5};

  // Create a DLTensor manually for demonstration
  int64_t shape[2]   = {2, 3};
  int64_t strides[2] = {3, 1};  // row-major strides

  DLTensor tensor{};
  tensor.data        = data;
  tensor.device      = {kDLCPU, 0};
  tensor.ndim        = 2;
  tensor.dtype       = DLDataType{kDLInt, 32, 1};
  tensor.shape       = shape;
  tensor.strides     = strides;
  tensor.byte_offset = 0;

  // Convert to host_mdspan
  auto md = cuda::to_host_mdspan<int, 2>(tensor);

  assert(md.rank() == 2);
  assert(md.extent(0) == 2 && md.extent(1) == 3);
  assert(md.stride(0) == 3 && md.stride(1) == 1);
  assert(md.data_handle() == data);
  assert(md(0, 0) == 0 && md(1, 2) == 5);
}
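
The same pattern applies to device memory. Below is a minimal sketch, assuming a CUDA-enabled build and an allocation on device 0; error checking of the CUDA runtime calls is omitted for brevity:

#include <dlpack/dlpack.h>
#include <cuda/mdspan>
#include <cuda_runtime_api.h>

int main() {
  int* d_data = nullptr;
  cudaMalloc(&d_data, 6 * sizeof(int));

  int64_t shape[2]   = {2, 3};
  int64_t strides[2] = {3, 1};

  DLTensor tensor{};
  tensor.data    = d_data;
  tensor.device  = {kDLCUDA, 0};  // device memory on GPU 0
  tensor.ndim    = 2;
  tensor.dtype   = DLDataType{kDLInt, 32, 1};
  tensor.shape   = shape;
  tensor.strides = strides;

  // The resulting device_mdspan is only dereferenceable from device code
  auto md = cuda::to_device_mdspan<int, 2>(tensor);

  cudaFree(d_data);
}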
