Tensor Type#

A tensor in MatX (tensor_t) is a memory-backed, reference-counted operator that contains metadata about its size, rank, and other properties. The memory can be of any kind accessible wherever the tensor is used, including device memory, managed memory, and host memory. MatX tensors are very similar to NumPy’s ndarray type in that common operations like slicing and cloning can be performed on them. Since MatX tensors are also operators, they are designed to be accepted as both inputs and outputs to almost all functions.

tensor_t uses a std::shared_ptr to reference-count how many owners share the tensor. This allows the tensor to be passed around on the host by value; when the last owner goes out of scope, the destructor is called, optionally freeing the tensor’s memory.

Tensors can be used on both the host and device. This allows custom operators and functions to use the same functionality, such as operator(), that is available on the host. Passing tensors to the device is preferred over raw pointers since tensors maintain their shape and strides to ensure correct accesses with no extra overhead. Since tensor_t contains types that are not available on the device (std::shared_ptr, for example), when a tensor is passed to the device it is upcast to its base class, tensor_impl_t. tensor_impl_t contains only types that are available on both the host and device, and provides the minimal set of functionality needed for device code.

For information on creating tensors, please see Creating Tensors or Quick Start for common usage.

template<typename T, int RANK, typename Storage = DefaultStorage<T>, typename Desc = DefaultDescriptor<RANK>>
class tensor_t : public matx::detail::tensor_impl_t<T, RANK, Desc>#

View of an underlying tensor data object

Tensor views do not modify the underlying data; they simply present a different way to look at the data. This includes where the data begins and ends, the stride, the rank, etc. Views are very lightweight, and any number of views can be generated from the same data object. Since views represent different ways of looking at the same data, it is the responsibility of the user to ensure that proper synchronization is done when using multiple views on the same data. Failure to do so can result in race conditions on the device or host.

Public Types

using type = T#

Data type of the tensor elements.

using value_type = T#

Data type of the tensor elements (standard-library-style alias).

using matxop = bool#

Indicate this is a MatX operator.

using matxoplvalue = bool#

Indicate this is a MatX operator that can be on the lhs of an equation.

using tensor_view = bool#

Indicate this is a MatX tensor view.

using tensor_t_type = bool#

This is a tensor_t (not a tensor_impl_t)

using storage_type = Storage#

Storage type trait.

using desc_type = Desc#

Descriptor type trait.

Public Functions

inline tensor_t()#

Construct a new 0-D tensor_t object.

__MATX_HOST__ inline tensor_t(tensor_t const &rhs) noexcept#

Copy constructor.

Parameters:

rhs – Object to copy from

__MATX_HOST__ inline tensor_t(tensor_t &&rhs) noexcept#

Move constructor.

Parameters:

rhs – Object to move from

__MATX_HOST__ inline void Shallow(const self_type &rhs) noexcept#

Perform a shallow copy of a tensor view

Alternative to operator=, which is reserved for lazy evaluation. This function performs a shallow copy of a tensor view where the data pointer points to the same location as the right-hand side’s data.

Parameters:

rhs – Tensor to copy from

template<typename S2 = Storage, typename D2 = Desc, std::enable_if_t<is_matx_storage_v<typename remove_cvref<S2>::type> && is_matx_descriptor_v<typename remove_cvref<D2>::type>, bool> = true>
inline tensor_t(S2 &&s, D2 &&desc)#

Construct a new tensor_t object from arbitrary storage and a descriptor.

Template Parameters:
  • S2 – Storage type

  • D2 – Descriptor type

Parameters:
  • s – Storage object

  • desc – Descriptor object

template<typename D2 = Desc>
inline tensor_t(Storage s, D2 &&desc, T *ldata)#

Construct a new tensor_t object. Used to copy an existing storage object for proper reference counting.

Parameters:
  • s – Storage object

  • desc – Descriptor object

  • ldata – Local data pointer

template<typename D2 = Desc, typename = typename std::enable_if_t<is_matx_descriptor_v<D2>>>
__MATX_INLINE__ inline tensor_t(D2 &&desc)#

Constructor for a rank-1 and above tensor.

Parameters:

desc – Tensor descriptor

__MATX_INLINE__ inline tensor_t(const std::initializer_list<detail::no_size_t>)#

Constructor for a rank-0 tensor.

NOTE: Use empty braces {} for the unused parameter.

__MATX_INLINE__ inline tensor_t(const typename Desc::shape_type (&shape)[RANK])#

Constructor for a rank-1 and above tensor.

Parameters:

shape – Tensor shape

__MATX_INLINE__ __MATX_HOST__ inline auto operator=(const self_type &op)#

Lazy assignment operator=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator=(const T2 &op)#

Lazy assignment operator=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator+=(const self_type &op)#

Lazy assignment operator+=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator+=(const T2 &op)#

Lazy assignment operator+=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator-=(const self_type &op)#

Lazy assignment operator-=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator-=(const T2 &op)#

Lazy assignment operator-=. Used to create a “set” object for deferred execution on a device

Template Parameters:

T2 – Type of operator

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator*=(const self_type &op)#

Lazy assignment operator*=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator*=(const T2 &op)#

Lazy assignment operator*=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator/=(const self_type &op)#

Lazy assignment operator/=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator/=(const T2 &op)#

Lazy assignment operator/=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator<<=(const self_type &op)#

Lazy assignment operator<<=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator<<=(const T2 &op)#

Lazy assignment operator<<=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator>>=(const self_type &op)#

Lazy assignment operator>>=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator>>=(const T2 &op)#

Lazy assignment operator>>=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator|=(const self_type &op)#

Lazy assignment operator|=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator|=(const T2 &op)#

Lazy assignment operator|=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator&=(const self_type &op)#

Lazy assignment operator&=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator&=(const T2 &op)#

Lazy assignment operator&=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator^=(const self_type &op)#

Lazy assignment operator^=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator^=(const T2 &op)#

Lazy assignment operator^=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

__MATX_INLINE__ __MATX_HOST__ inline auto operator%=(const self_type &op)#

Lazy assignment operator%=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Tensor view source

Returns:

set object containing the destination view and source object

template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator%=(const T2 &op)#

Lazy assignment operator%=. Used to create a “set” object for deferred execution on a device

Parameters:

op – Operator or scalar type to assign

Returns:

set object containing the destination view and source object

template<typename M = T, int R = RANK, typename Shape>
__MATX_INLINE__ inline auto View(Shape &&shape)#

Get a view of the tensor from the underlying data using a custom shape

Returns a view based on the shape passed in. Both the rank and the dimensions can be increased or decreased from the original data object as long as they fit within the bounds of the memory allocation. This function only allows a contiguous view of memory, regardless of the shape passed in. For example, if the original shape is {8, 2} and a view of {2, 1} is requested, the data in the new view would be the last two elements of the last dimension of the original data.

The function is similar to MATLAB and Python’s reshape(), except it does NOT make a copy of the data, whereas those languages may, depending on the context. It is up to the user to understand any existing views on the underlying data that may conflict with other views.

While this function is similar to Slice(), it does not allow slicing a particular start and end point as slicing does, and slicing also does not allow increasing the rank of a tensor as View(shape) does.

Note that the data type of the tensor can also change from the original data. This may be useful in situations where a union of data types could be used in different ways. For example, a complex<float> tensor could be reshaped into a float tensor with twice as many elements, and operations can be done on floats instead of complex types.

Template Parameters:
  • M – New type of tensor

  • R – New rank of tensor

Parameters:

shape – New shape of tensor

Returns:

A view of the data with the appropriate strides and dimensions set

template<typename ShapeIntType, int NRANK>
__MATX_INLINE__ inline auto View(const ShapeIntType (&shape)[NRANK])#

Get a view of the tensor from the underlying data using a custom shape

Returns a view based on the shape passed in. Both the rank and the dimensions can be increased or decreased from the original data object as long as they fit within the bounds of the memory allocation. This function only allows a contiguous view of memory, regardless of the shape passed in. For example, if the original shape is {8, 2} and a view of {2, 1} is requested, the data in the new view would be the last two elements of the last dimension of the original data.

The function is similar to MATLAB and Python’s reshape(), except it does NOT make a copy of the data, whereas those languages may, depending on the context. It is up to the user to understand any existing views on the underlying data that may conflict with other views.

While this function is similar to Slice(), it does not allow slicing a particular start and end point as slicing does, and slicing also does not allow increasing the rank of a tensor as View(shape) does.

Note that the data type of the tensor can also change from the original data. This may be useful in situations where a union of data types could be used in different ways. For example, a complex<float> tensor could be reshaped into a float tensor with twice as many elements, and operations can be done on floats instead of complex types.

Template Parameters:
  • ShapeIntType – Type of integer shape array

  • NRANK – New rank of tensor

Parameters:

shape – New shape of tensor

Returns:

A view of the data with the appropriate strides and dimensions set

__MATX_INLINE__ inline auto View()#

Make a copy of a tensor and maintain all refcounts.

Returns:

Copy of view

__MATX_INLINE__ inline void PrefetchDevice(cudaStream_t const stream) const noexcept#

Prefetch the data asynchronously from the host to the device.

All copies are done asynchronously in a stream. The copy is ordered with respect to other work in the same stream, but the exact time the transfer occurs is not guaranteed.

Parameters:

stream – The CUDA stream to prefetch within

__MATX_INLINE__ inline void PrefetchHost(cudaStream_t const stream) const noexcept#

Prefetch the data asynchronously from the device to the host.

All copies are done asynchronously in a stream. The copy is ordered with respect to other work in the same stream, but the exact time the transfer occurs is not guaranteed.

Parameters:

stream – The CUDA stream to prefetch within

template<typename U = T>
__MATX_INLINE__ inline auto RealView() const noexcept#

Create a view of only real-valued components of a complex array

Only available on complex data types.

Returns:

tensor view of only real-valued components

__MATX_INLINE__ inline auto GetStorage() noexcept#

Return the storage container from the tensor.

Returns:

storage container

template<typename U = T>
__MATX_INLINE__ inline auto ImagView() const noexcept#

Create a view of only imaginary-valued components of a complex array

Only available on complex data types.

Returns:

tensor view of only imaginary-valued components

__MATX_INLINE__ inline auto Permute(const cuda::std::array<int32_t, RANK> &dims) const#

Permute the dimensions of a tensor

Accepts any order of permutation. Number of dimensions must match RANK of tensor

Template Parameters:

M – Rank of tensor to permute. Should not be used directly

Parameters:

dims – Dimensions of tensor

Returns:

tensor view with dimensions permuted

__MATX_INLINE__ inline auto Permute(const int32_t (&dims)[RANK]) const#

Permute the dimensions of a tensor

Accepts any order of permutation. Number of dimensions must match RANK of tensor

Template Parameters:

M – Rank of tensor to permute. Should not be used directly

Parameters:

dims – Dimensions of tensor

Returns:

tensor view with dimensions permuted

__MATX_INLINE__ inline auto PermuteMatrix() const#

Permute the last two dimensions of a matrix

Utility function to permute the last two dimensions of a tensor. This is useful in the numerous operations that take a permuted matrix as input, but we don’t want to permute the inner dimensions of a larger tensor.

Returns:

tensor view with last two dims permuted

__MATX_HOST__ __MATX_INLINE__ inline T *Data() const noexcept#

Get the underlying local data pointer from the view

Returns:

Underlying data pointer of type T

template<typename ShapeType, std::enable_if_t<!std::is_pointer_v<typename remove_cvref<ShapeType>::type>, bool> = true>
__MATX_HOST__ __MATX_INLINE__ inline void Reset(T *const data, ShapeType &&shape) noexcept#

Set the underlying data pointer from the view

Decrements any reference-counted memory and potentially frees before resetting the data pointer. If refcnt is not nullptr, the count is incremented.

Template Parameters:

ShapeType – Shape type

Parameters:
  • data – Data pointer to set

  • shape – Shape of tensor

__MATX_HOST__ __MATX_INLINE__ inline void Reset(T *const data) noexcept#

Set the underlying data pointer from the view

Decrements any reference-counted memory and potentially frees before resetting the data pointer. If refcnt is not nullptr, the count is incremented.

Parameters:

data – Data pointer to set

__MATX_HOST__ __MATX_INLINE__ inline void Reset(T *const data, T *const ldata) noexcept#

Set the underlying data and local data pointer from the view

Decrements any reference-counted memory and potentially frees before resetting the data pointer. If refcnt is not nullptr, the count is incremented.

Parameters:
  • data – Allocated data pointer

  • ldata – Local data pointer offset into allocated

__MATX_INLINE__ __MATX_HOST__ inline Desc::stride_type Stride(uint32_t dim) const#

Get the stride of a single dimension of the tensor

Parameters:

dim – Desired dimension

Returns:

Stride (in elements) in dimension

__MATX_INLINE__ __MATX_HOST__ inline auto GetRefCount() const noexcept#

Get the reference count

Returns:

Reference count or 0 if not tracked

template<int N>
__MATX_INLINE__ inline auto OverlapView(const cuda::std::array<typename Desc::shape_type, N> &windows, const cuda::std::array<typename Desc::stride_type, N> &strides) const#

Create an overlapping tensor view

Creates an overlapping tensor view where an existing tensor can be repeated into a higher rank with overlapping elements. For example, the following 1D tensor [1 2 3 4 5] could be cloned into a 2D tensor with a window size of 2 and overlap of 1, resulting in:

  [1 2
   2 3
   3 4
   4 5]
Currently this only works on 1D tensors going to 2D, but may be expanded for higher dimensions in the future. Note that if the window size does not divide evenly into the existing column dimension, the view may chop off the end of the data to make the tensor rectangular.

Parameters:
  • windows – Window size (columns in output)

  • strides – Strides between data elements

Returns:

Overlapping view of data

template<int N>
__MATX_INLINE__ inline auto Clone(const cuda::std::array<index_t, N> &clones) const#

Clone a tensor into a higher-dimension tensor

Clone() allows a copy-less method to clone data into a higher dimension tensor. The underlying data does not grow or copy, but instead the indices of the higher-ranked tensor access the original data potentially multiple times. Clone is similar to MATLAB’s repmat() function where it’s desired to take a tensor of a lower dimension and apply an operation with it to a tensor in a higher dimension by broadcasting the values.

For example, in a rank=2 tensor that’s MxN, and a rank=1 tensor that’s 1xN, Clone() can take the rank=1 tensor and broadcast to an MxN rank=2 tensor, and operations such as the Hadamard product can be performed. In this example, the final operation will benefit heavily from device caching since the same 1xN rank=1 tensor will be accessed M times.

Parameters:

clones – List of sizes of each dimension to clone. Parameter length must match rank of tensor. A special sentinel value of matxKeepDim should be used when the dimension from the original tensor is to be kept.

Returns:

Cloned view representing the higher-dimension tensor

template<int M = RANK, std::enable_if_t<M == 0, bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(T const &val)#

Rank-0 initializer list setting

Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.

Parameters:

val – Value to set the rank-0 tensor to

template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 1) || (is_cuda_complex_v<T> && M == 0), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<T> &vals)#

Rank-1 non-complex or rank-0 complex initializer list setting

Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.

Parameters:

vals – 1D initializer list of values

template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 2) || (is_cuda_complex_v<T> && M == 1), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<T>> &vals)#

Rank-2 non-complex or rank-1 complex initializer list setting

Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.

Parameters:

vals – 2D/1D initializer list of values

template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 3) || (is_cuda_complex_v<T> && M == 2), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<const std::initializer_list<T>>> vals)#

Rank-3 non-complex or rank-2 complex initializer list setting

Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.

Parameters:

vals – 3D/2D initializer list of values

template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 4) || (is_cuda_complex_v<T> && M == 3), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<const std::initializer_list<const std::initializer_list<T>>>> &vals)#

Rank-4 non-complex or rank-3 complex initializer list setting

Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.

Parameters:

vals – 4D/3D initializer list of values

template<int M = RANK, std::enable_if_t<is_cuda_complex_v<T> && M == 4, bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<const std::initializer_list<const std::initializer_list<const std::initializer_list<T>>>>> &vals)#

Rank-4 complex initializer list setting

Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.

Parameters:

vals – 4D initializer list of values

template<int N = RANK, typename StrideType>
__MATX_INLINE__ inline auto Slice(const cuda::std::array<typename Desc::shape_type, RANK> &firsts, const cuda::std::array<typename Desc::shape_type, RANK> &ends, StrideType strides) const#

Slice a tensor either within the same dimension or to a lower dimension

Slice() allows a copy-less method to extract a subset of data from one or more dimensions of a tensor. This includes completely dropping an unwanted dimension, or simply taking a piece of a wanted dimension. Slice() is very similar to indexing operations in both Python and MATLAB.

NOTE: Users should not call Slice() directly anymore. Use the slice() operator instead.

Parameters:
  • firsts – List of starting indices into each dimension. Indexing is 0-based

  • ends – List of ending indices into each dimension. Indexing is 0-based. Two special sentinel values can be used: 1) matxEnd indicates the end of that particular dimension without specifying the size, similar to “end” in MATLAB or leaving off the end in Python (“a[1:]”). 2) matxDropDim slices (drops) a dimension entirely, resulting in a tensor with a smaller rank than the original

  • strides – List of strides for each dimension. A special sentinel value of matxKeepStride is used to keep the existing stride of the dimension

Returns:

Sliced view of tensor

template<int N = RANK>
__MATX_INLINE__ inline auto Slice(const cuda::std::array<typename Desc::shape_type, RANK> &firsts, const cuda::std::array<typename Desc::shape_type, RANK> &ends) const#

Slice a tensor either within the same dimension or to a lower dimension

Slice() allows a copy-less method to extract a subset of data from one or more dimensions of a tensor. This includes completely dropping an unwanted dimension, or simply taking a piece of a wanted dimension. Slice() is very similar to indexing operations in both Python and MATLAB.

Parameters:
  • firsts – List of starting indices into each dimension. Indexing is 0-based

  • ends – List of ending indices into each dimension. Indexing is 0-based. Two special sentinel values can be used: 1) matxEnd indicates the end of that particular dimension without specifying the size, similar to “end” in MATLAB or leaving off the end in Python (“a[1:]”). 2) matxDropDim slices (drops) a dimension entirely, resulting in a tensor with a smaller rank than the original

Returns:

Sliced view of tensor

inline DLManagedTensor *GetDLPackTensor() const#

Get a DLPack v0.8 structure representing the tensor.

DLPack is a commonly-used tensor memory layout format for moving tensors between libraries. This function returns a DLPack structure based on a tensor_t. The caller is responsible for freeing the memory by calling ->deleter(self).

Note: This function will increment the reference count of the tensor. It is expected that once a tensor is converted to DLPack someone will eventually call deleter(). If that does not happen a memory leak will occur.

Returns:

Pointer to a new DLManagedTensor. The caller must call the deleter function when finished.

Friends

inline friend void swap(self_type &lhs, self_type &rhs) noexcept#

Swaps two tensors

Swaps members of two tensors, including pointers, shapes, and descriptors

Parameters:
  • lhs – Left argument

  • rhs – Right argument