einsum#

Added in version 0.2.3.

MatX provides an einsum function similar to the one found in NumPy. einsum offers a concise syntax for expressing many different operations in an optimized manner. A non-exhaustive list of einsum operations includes:

  • Tensor contractions

  • Matrix multiplies (GEMMs)

  • Inner products

  • Transposes

  • Reductions

  • Trace

While many of these operations are possible using other methods in MatX, einsum typically has a shorter syntax and is sometimes faster than the direct version of the operation.

Note

Using einsum() requires a minimum of cuTENSOR 1.7.0 and cuTensorNet 23.03.0.20. Both are downloaded automatically as part of the CMake configuration, but offline environments must provide these versions themselves.

At this time, MatX supports only a subset of the einsum operations available in the NumPy version; specifically, only tensor contractions, inner products, and GEMMs are supported and tested. MatX also does not support the broadcast ‘…’ notation and has no plans to add it. While broadcast notation is convenient for high-rank tensors, it doesn’t add any new functionality and isn’t compatible with every expression even in NumPy. We feel that listing out the dimensions makes the syntax clearer without giving up any features. Since einsum requires an output tensor parameter, only explicit mode is supported using the -> operator. This allows type and size checking on the output tensor at the cost of extra verbosity.

For tensor contractions, MatX uses cuTENSOR and cuTensorNet as the optimized backend libraries. Since neither of these libraries is included with CUDA, and not all users need einsum functionality, einsum is an opt-in feature when configuring MatX. To add support, add the following CMake line:

-DMATX_EN_CUTENSOR=ON

Both cuTENSOR and cuTensorNet can have their location specified using cutensor_DIR and cutensornet_DIR, respectively. If these are not specified, CMake will attempt to download both libraries from the internet. einsum is inside the cutensor namespace in MatX to indicate that it’s an optional feature.
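For example, an offline configuration might combine the enable flag with explicit library locations; the paths below are placeholders for wherever the libraries are installed locally:

-DMATX_EN_CUTENSOR=ON -Dcutensor_DIR=/path/to/cutensor -Dcutensornet_DIR=/path/to/cutensornet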

To perform a tensor contraction of two 3D tensors across a single dimension:

auto a = make_tensor<float>({3,4,5});
auto b = make_tensor<float>({4,3,2});
auto c = make_tensor<float>({5,2});
(c = cutensor::einsum("ijk,jil->kl", a, b)).run();

The letters in the subscripts argument are names given to each dimension. The letters are arbitrary, but dimensions being contracted must have matching letters. In this case, we’re contracting along the i and j dimensions of both tensors a and b, resulting in an output tensor with dimensions k x l. The tensor c must match the output dimensions k x l, which correspond to the third dimensions of a and b, respectively (5 x 2).

Like other features in MatX, einsum can take an arbitrary number of tensors with arbitrary ranks. Each tensor’s dimensions are separated by , in the subscript list, and the tensors themselves are listed at the end of the function call.
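As a sketch (not taken from the MatX tests), a chained contraction of three matrices could be written as follows. The shapes and variable names are illustrative only, and this assumes the variadic contraction path accepts a three-operand network:

auto x   = make_tensor<float>({8, 16});
auto y   = make_tensor<float>({16, 32});
auto z   = make_tensor<float>({32, 4});
auto out = make_tensor<float>({8, 4});

// Contract over the shared j and k dimensions in a single call,
// leaving an 8 x 4 result. Runs on the default CUDA stream.
(out = cutensor::einsum("ij,jk,kl->il", x, y, z)).run();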

The first time einsum runs a contraction with a particular signature, it can take a long time to complete. This is because cuTensorNet and cuTENSOR run optimization heuristics to speed up future contractions. The penalty is only paid on the first call with a given signature; subsequent calls only perform the contraction step.
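As an illustration (reusing the tensors from the contraction example above), only the first call below pays the planning cost; the calls inside the loop reuse the result of that planning and perform just the contraction:

// First call with this signature: planning + contraction
(c = cutensor::einsum("ijk,jil->kl", a, b)).run();

// Subsequent calls with the same shapes and subscripts: contraction only
for (int iter = 0; iter < 10; iter++) {
  (c = cutensor::einsum("ijk,jil->kl", a, b)).run();
}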

Note

einsum’s permute capability is significantly faster than the permute operator and should be preferred when possible.

Note

This function is not currently supported with host-based executors (CPU).

API#

template<typename ...InT>
__MATX_INLINE__ auto matx::cutensor::einsum(const std::string &subscripts, const InT&... ops)#

Evaluates the Einstein summation on the operands.

einsum() is a multi-purpose tool capable of performing various operations on tensors in a compact syntax. A non-exhaustive list of operations includes tensor contractions, GEMMs, dot products, and transposes. Because einsum is extremely powerful, not all of its features are supported or tested in MatX yet; currently only tensor contractions are tested. Other operations may work, but are untested.

MatX uses a syntax very similar to NumPy’s einsum syntax: https://numpy.org/doc/stable/reference/generated/numpy.einsum.html

Ellipses are not yet supported, but a variadic list of tensors for contraction is. The output operator ‘->’ is currently required in MatX and provides error checking on the output tensor size.

Template Parameters:

InT – Types of input operators

Parameters:
  • subscripts – String containing Einstein notation of operation to perform

  • ops – List of input operators

Examples#

Tensor Contractions#

auto a1 = make_tensor<TestType>({60});
auto b1 = make_tensor<TestType>({24});
auto c2 = make_tensor<TestType>({5,2});

(a1 = linspace<0>(a1.Shape(), (TestType)0, static_cast<TestType>(a1.Size(0) - 1))).run(exec);
(b1 = linspace<0>(b1.Shape(), (TestType)0, static_cast<TestType>(b1.Size(0) - 1))).run(exec);
auto a = a1.View({3,4,5});
auto b = b1.View({4,3,2});

// Perform a 3D tensor contraction
(c2 = cutensor::einsum("ijk,jil->kl", a, b)).run(exec);

Dot Product#

auto a1 = make_tensor<TestType>({60});
auto b1 = make_tensor<TestType>({60});
auto c0 = make_tensor<TestType>({});
(a1 = ones(a1.Shape()) * 2).run(exec);
(b1 = ones(b1.Shape()) * 2).run(exec); 

// Perform a dot product of a1 and b1 and store the result in c0
(c0 = cutensor::einsum("i,i->", a1, b1)).run(exec);

GEMM (Generalized Matrix Multiply)#

auto a2 = make_tensor<TestType>({10,20});
auto b2 = make_tensor<TestType>({20,10});
auto c2 = make_tensor<TestType>({10,10});    
auto c22 = make_tensor<TestType>({10,10});   
(a2 = ones()).run(exec);
(b2 = ones()).run(exec); 

// Perform a GEMM of a2 * b2. Compare results to traditional matmul call
(c2 = cutensor::einsum("mk,kn->mn", a2, b2)).run(exec);
(c22 = matmul(a2, b2)).run(exec);

auto a2 = make_tensor<TestType>({5,20});
auto b2 = make_tensor<TestType>({20,10});
auto c2 = make_tensor<TestType>({10,5});    
auto c22 = make_tensor<TestType>({5,10});   
(a2 = ones()).run(exec);
(b2 = ones()).run(exec); 

// Perform a GEMM of a2 * b2 and store the results transposed
(c2 = cutensor::einsum("mk,kn->nm", a2, b2)).run(exec);

Permute#

auto a = make_tensor<TestType>({5,20,4,3});
auto b = make_tensor<TestType>({20,3,4,5});
auto b2 = make_tensor<TestType>({20,3,4,5});
(a = ones()).run(exec);
(b = ones()).run(exec);

// Permute a 4D tensor. This gives the same output as Permute, but is much faster
(b = cutensor::einsum("ijkl->jlki", a)).run(exec);
(b2 = a.Permute({1,3,2,0})).run(exec);

Sum#

auto a = matx::make_tensor<TestType>({2, 3});
a.SetVals({
    {1, 2, 3},
    {4, 5, 6}
});  

auto b = matx::make_tensor<TestType>({3});
// Sum the columns of "a"
(b = matx::cutensor::einsum("ij->j", a)).run(exec);

Trace#

auto a2 = make_tensor<TestType>({10,10});
auto c0_0 = make_tensor<TestType>({});
auto c0_1 = make_tensor<TestType>({});
(a2 = ones()).run(exec);

// Compute the trace of a2. Compare results to the trace() operator
(c0_0 = cutensor::einsum("ii->", a2)).run(exec);
(c0_1 = trace(a2)).run(exec);