apply_idx#

Apply a custom function to one or more operators with full index access. The apply_idx operator allows users to define custom transformations using lambda functions or functors that receive both the current indices as a cuda::std::array and the operators themselves. This provides more flexibility than apply() by allowing access to any element of the input operators, not just the current element.

The function can be provided as either an inline lambda or a functor. Inline __device__ lambdas work in regular code (like main() functions) but NOT in Google Test fixtures due to CUDA’s restriction on extended lambdas in private/protected methods. For test code, use functors instead.

apply_idx() assumes the rank and size of the output are the same as those of the first input operator. Unlike apply(), the lambda receives the indices and operators directly, allowing for stencil operations, neighbor access, and other spatially-aware computations.

Use Cases#

apply_idx is particularly useful for:

  • Stencil operations (accessing neighboring elements)

  • Convolution-like operations with custom kernels

  • Operations that depend on element position/indices

  • Accessing non-local elements based on the current position

  • Implementing custom boundary conditions

Using cuda::std::apply#

A powerful pattern is to use cuda::std::apply inside your functor to convert the index array back into a parameter pack. Since lambdas with captures can be problematic in device code, we use a helper functor:

// Helper for unpacking indices
template<typename Op>
struct ApplyIndices2D {
  const Op& op;
  __host__ __device__ ApplyIndices2D(const Op& o) : op(o) {}
  __host__ __device__ auto operator()(index_t i, index_t j) const {
    return op(i, j);
  }
};

struct MyFunctor {
  template<typename Op>
  __host__ __device__ auto operator()(cuda::std::array<index_t, 2> idx, const Op& op) const {
    // Use cuda::std::apply to unpack the index array
    return cuda::std::apply(ApplyIndices2D<Op>(op), idx);  // Calls op(idx[0], idx[1])
  }
};

You can also modify the indices before unpacking to implement operations like transposition or neighbor access:

// Access with swapped indices (transpose-like access)
cuda::std::array<index_t, 2> swapped = {idx[1], idx[0]};
return cuda::std::apply(ApplyIndices2D<Op>(op), swapped);  // Calls op(idx[1], idx[0])

Note that an unqualified call may collide with std::apply or cuda::std::apply. For the MatX operator it is best to use the qualified matx::apply_idx form; for the standard library function used to unpack indices, use cuda::std::apply.

template<typename Func, typename ...Ops>
auto __MATX_INLINE__ matx::apply_idx(Func func, const Ops&... ops)#

Apply a custom lambda function or functor to one or more operators with index access.

The apply_idx operator allows applying a custom lambda function or functor to one or more input operators, where the lambda receives the current indices as a cuda::std::array along with the operators themselves. This allows the lambda to access elements at any position, not just the current element position.

The resulting operator has the same rank as the first input operator, and its size matches the size of the first input operator. The value type is deduced from the return type of the lambda function.

Example using an inline lambda (works in main(), not in test fixtures):

auto t_in = make_tensor<float>({10});
auto t_out = make_tensor<float>({10});

auto stencil = [] __device__ (auto idx, auto op) {
  auto i = idx[0];
  // Access current and neighboring elements
  if (i == 0 || i == op.Size(0) - 1) return op(i);
  return (op(i-1) + op(i) + op(i+1)) / 3.0f;
};
(t_out = apply_idx(stencil, t_in)).run();

Example using a functor (works everywhere including tests):

struct StencilFunctor {
  template<typename Op>
  __host__ __device__ auto operator()(cuda::std::array<index_t, 1> idx, const Op& op) const {
    auto i = idx[0];
    if (i == 0 || i == op.Size(0) - 1) return op(i);
    return (op(i-1) + op(i) + op(i+1)) / 3.0f;
  }
};
auto t_in = make_tensor<float>({10});
auto t_out = make_tensor<float>({10});
(t_out = apply_idx(StencilFunctor{}, t_in)).run();

Template Parameters:
  • Func – Lambda function or functor type

  • Ops – Input operator types (one or more operators)

Parameters:
  • func – Lambda function or functor to apply. Can be host, device, or both. The function signature should accept a cuda::std::array<index_t, RANK> followed by the input operators themselves (not their values). Note: Inline device lambdas work in regular code (e.g., main()) but NOT in Google Test fixtures due to private method restrictions. Use functors for tests. Requires the --extended-lambda compiler flag.

  • ops – Input operators (one or more)

Returns:

ApplyIdxOp operator that applies the function element-wise

Examples#

auto t_in = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in.Size(0); i++) {
  t_in(i) = static_cast<detail::value_promote_t<TestType>>(i + 1);
}

// Apply a functor that accesses the current element using indices
(t_out = matx::apply_idx(AccessCurrentFunctor<TestType>{}, t_in)).run(exec);

auto t_in = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in.Size(0); i++) {
  t_in(i) = static_cast<detail::value_promote_t<TestType>>(i);
}

// Apply a 3-point moving average stencil
(t_out = matx::apply_idx(Stencil3Functor<TestType>{}, t_in)).run(exec);

auto t_in1 = make_tensor<TestType>({10});
auto t_in2 = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in1.Size(0); i++) {
  t_in1(i) = static_cast<detail::value_promote_t<TestType>>(i + 1);
  t_in2(i) = static_cast<detail::value_promote_t<TestType>>(i * 2);
}

// Apply a functor that combines two operators with index weighting
(t_out = matx::apply_idx(CombineWithIndexFunctor<TestType>{}, t_in1, t_in2)).run(exec);

auto t_in = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in.Size(0); i++) {
  t_in(i) = static_cast<detail::value_promote_t<TestType>>(i + 1);
}

// Use cuda::std::apply to unpack indices
(t_out = matx::apply_idx(ApplyIndex1DFunctor<TestType>{}, t_in)).run(exec);