apply

Apply a custom function to one or more operators element-wise. The apply operator allows users to define custom transformations using lambda functions or functors that can be applied to tensor operations. The function can be annotated with __host__, __device__, or both to control where it executes.

apply() assumes the rank and size of the output are the same as those of the first input operator. For cases where this is not true, a custom operator should be used instead. In general, apply will perform better than a custom operator because of optimizations that most custom operators do not take advantage of. Running the black_scholes example shows the performance difference.

Note that you may see a naming collision with std::apply or cuda::std::apply. To avoid ambiguity, use the fully-qualified matx::apply form.

template<typename Func, typename ...Ops>
auto __MATX_INLINE__ matx::apply(Func func, const Ops&... ops)

Apply a custom lambda function or functor to one or more operators.

The apply operator allows applying a custom lambda function or functor to one or more input operators. The function is called for each element position, and receives the values from all input operators at that position.

The resulting operator has the same rank as the first input operator, and its size matches the size of the first input operator. The value type is deduced from the return type of the lambda function.

Example using a lambda:

auto t1 = make_tensor<float>({10, 10});
auto t2 = make_tensor<float>({10, 10});
auto result = make_tensor<float>({10, 10});

// Apply a custom function that adds and squares
auto my_func = [] __device__ (float a, float b) { return (a + b) * (a + b); };
(result = apply(my_func, t1, t2)).run();

Example using a functor:

struct SquareFunctor {
  template<typename T>
  __host__ __device__ auto operator()(T x) const { return x * x; }
};

auto t_in = make_tensor<float>({10});
auto t_out = make_tensor<float>({10});
(t_out = apply(SquareFunctor{}, t_in)).run();

Template Parameters:
  • Func – Lambda function or functor type

  • Ops – Input operator types (one or more operators)

Parameters:
  • func – Lambda function or functor to apply. Can be annotated __host__, __device__, or both. The function signature should accept the value_type of each input operator. Note: using __host__ __device__ lambdas requires the --extended-lambda compiler flag (nvcc). For complex scenarios, consider using functors instead of lambdas.

  • ops – Input operators (one or more)

Returns:

ApplyOp operator that applies the function element-wise

Examples

auto t_in = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in.Size(0); i++) {
  t_in(i) = static_cast<detail::value_promote_t<TestType>>(i);
}

// Apply a functor that squares each element
(t_out = matx::apply(SquareFunctor{}, t_in)).run(exec);

auto t_in1 = make_tensor<TestType>({10});
auto t_in2 = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in1.Size(0); i++) {
  t_in1(i) = static_cast<detail::value_promote_t<TestType>>(i);
  t_in2(i) = static_cast<detail::value_promote_t<TestType>>(i + 1);
}

// Apply a functor that adds two inputs
(t_out = matx::apply(AddFunctor{}, t_in1, t_in2)).run(exec);

auto t_in1 = make_tensor<TestType>({10});
auto t_in2 = make_tensor<TestType>({10});
auto t_in3 = make_tensor<TestType>({10});
auto t_out = make_tensor<TestType>({10});

for (index_t i = 0; i < t_in1.Size(0); i++) {
  t_in1(i) = static_cast<detail::value_promote_t<TestType>>(i);
  t_in2(i) = static_cast<detail::value_promote_t<TestType>>(i + 1);
  t_in3(i) = static_cast<detail::value_promote_t<TestType>>(i + 2);
}

// Apply a functor that combines three inputs: x + y * z
(t_out = matx::apply(CombineFunctor{}, t_in1, t_in2, t_in3)).run(exec);