Tensor Type#
A tensor in MatX (tensor_t)
is a memory-backed, reference-counted operator that contains metadata about its
size, rank, and other properties. The type of memory can be anything that is accessible to where the tensor is
being used, including device memory, managed memory, and host memory. MatX tensors are very similar to NumPy’s
ndarray type in that common operations like slicing and cloning can be performed on them. Since MatX tensors
are also operators they are designed to be accepted as both inputs and outputs to almost all functions.
tensor_t
uses a std::shared_ptr
to count the number of owners of the tensor. This
allows the tensor to be passed around on the host by value, and when the last owner goes out of scope the
destructor is called, optionally freeing the tensor’s memory.
Tensors can be used on both the host and device. This allows custom operators and functions to utilize the same
functionality, such as operator()
that’s available on the host. Passing tensors to the device is preferred
over raw pointers since tensors maintain their shape and strides to ensure correct accesses with no extra overhead.
Since tensor_t
contains types that are not available on the device (std::shared_ptr
for example),
when a tensor is passed to the device it is upcast to its base class, tensor_impl_t
. tensor_impl_t
contains only types that are available on both the host and device, and provides a minimal set of functionality
needed for device code.
For information on creating tensors, please see Creating Tensors or Quick Start for common usage.
-
template<typename T, int RANK, typename Storage = DefaultStorage<T>, typename Desc = DefaultDescriptor<RANK>>
class tensor_t : public matx::detail::tensor_impl_t<T, RANK, Desc>#
View of an underlying tensor data object
Tensor views do not modify the underlying data; they simply present a different way to look at the data. This includes where the data begins and ends, the stride, the rank, etc. Views are very lightweight, and any number of views can be generated from the same data object. Since views represent different ways of looking at the same data, it is the responsibility of the user to ensure that proper synchronization is done when using multiple views on the same data. Failure to do so can result in race conditions on the device or host.
Public Types
-
using matxop = bool#
Indicate this is a MatX operator.
-
using matxoplvalue = bool#
Indicate this is a MatX operator that can be on the lhs of an equation.
-
using tensor_view = bool#
Indicate this is a MatX tensor view.
Public Functions
-
inline tensor_t()#
Construct a new 0-D tensor_t object.
-
__MATX_HOST__ inline tensor_t(tensor_t const &rhs) noexcept#
Copy constructor.
- Parameters:
rhs – Object to copy from
-
__MATX_HOST__ inline tensor_t(tensor_t &&rhs) noexcept#
Move constructor.
- Parameters:
rhs – Object to move from
-
__MATX_HOST__ inline void Shallow(const self_type &rhs) noexcept#
Perform a shallow copy of a tensor view.
An alternative to operator=, which is reserved for lazy evaluation. This function performs a shallow copy of a tensor view where the data pointer points to the same location as the right-hand side’s data.
- Parameters:
rhs – Tensor to copy from
-
template<typename S2 = Storage, typename D2 = Desc, std::enable_if_t<is_matx_storage_v<typename remove_cvref<S2>::type> && is_matx_descriptor_v<typename remove_cvref<D2>::type>, bool> = true>
inline tensor_t(S2 &&s, D2 &&desc)#
Construct a new tensor_t object from an arbitrary shape and descriptor.
- Template Parameters:
S2 – Shape type
D2 – Descriptor type
- Parameters:
s – Shape object
desc – Descriptor object
-
template<typename D2 = Desc>
inline tensor_t(Storage s, D2 &&desc, T *ldata)#
Construct a new tensor_t object. Used to copy an existing storage object for proper reference counting.
- Parameters:
s – Storage object
desc – Descriptor object
ldata – Local data pointer
-
template<typename D2 = Desc, typename = typename std::enable_if_t<is_matx_descriptor_v<D2>>>
__MATX_INLINE__ inline tensor_t(D2 &&desc)#
Constructor for a rank-1 and above tensor.
- Parameters:
desc – Tensor descriptor
-
__MATX_INLINE__ inline tensor_t(const std::initializer_list<detail::no_size_t>)#
Constructor for a rank-0 tensor.
NOTE: Use empty braces {} for the unused parameter.
-
__MATX_INLINE__ inline tensor_t(const typename Desc::shape_type (&shape)[RANK])#
Constructor for a rank-1 and above tensor.
- Parameters:
shape – Tensor shape
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator=(const self_type &op)#
Lazy assignment operator=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator=(const T2 &op)#
Lazy assignment operator=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator+=(const self_type &op)#
Lazy assignment operator+=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator+=(const T2 &op)#
Lazy assignment operator+=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator-=(const self_type &op)#
Lazy assignment operator-=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator-=(const T2 &op)#
Lazy assignment operator-=. Used to create a “set” object for deferred execution on a device
- Template Parameters:
T2 – Type of operator
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator*=(const self_type &op)#
Lazy assignment operator*=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator*=(const T2 &op)#
Lazy assignment operator*=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator/=(const self_type &op)#
Lazy assignment operator/=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator/=(const T2 &op)#
Lazy assignment operator/=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator<<=(const self_type &op)#
Lazy assignment operator<<=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator<<=(const T2 &op)#
Lazy assignment operator<<=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator>>=(const self_type &op)#
Lazy assignment operator>>=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator>>=(const T2 &op)#
Lazy assignment operator>>=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator|=(const self_type &op)#
Lazy assignment operator|=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator|=(const T2 &op)#
Lazy assignment operator|=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator&=(const self_type &op)#
Lazy assignment operator&=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator&=(const T2 &op)#
Lazy assignment operator&=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator^=(const self_type &op)#
Lazy assignment operator^=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator^=(const T2 &op)#
Lazy assignment operator^=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
__MATX_INLINE__ __MATX_HOST__ inline auto operator%=(const self_type &op)#
Lazy assignment operator%=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Tensor view source
- Returns:
set object containing the destination view and source object
-
template<typename T2>
__MATX_INLINE__ __MATX_HOST__ inline auto operator%=(const T2 &op)#
Lazy assignment operator%=. Used to create a “set” object for deferred execution on a device
- Parameters:
op – Operator or scalar type to assign
- Returns:
set object containing the destination view and source object
-
template<typename M = T, int R = RANK, typename Shape>
__MATX_INLINE__ inline auto View(Shape &&shape)#
Get a view of the tensor from the underlying data using a custom shape
Returns a view based on the shape passed in. Both the rank and the dimensions can be increased or decreased from the original data object as long as they fit within the bounds of the memory allocation. This function only allows a contiguous view of memory, regardless of the shape passed in. For example, if the original shape is {8, 2} and a view of {2, 1} is requested, the data in the new view would be the last two elements of the last dimension of the original data.
The function is similar to MATLAB and Python’s reshape(), except it does NOT make a copy of the data, whereas those languages may, depending on the context. It is up to the user to understand any existing views on the underlying data that may conflict with other views.
While this function is similar to Slice(), it does not allow slicing a particular start and end point as slicing does, and slicing also does not allow increasing the rank of a tensor as View(shape) does.
Note that the data type of the tensor can also change from the original. This may be useful in situations where a union of data types could be used in different ways. For example, a complex<float> tensor could be reshaped into a float tensor with twice as many elements, and operations can be done on floats instead of complex types.
- Template Parameters:
M – New type of tensor
R – New rank of tensor
- Parameters:
shape – New shape of tensor
- Returns:
A view of the data with the appropriate strides and dimensions set
-
template<typename ShapeIntType, int NRANK>
__MATX_INLINE__ inline auto View(const ShapeIntType (&shape)[NRANK])#
Get a view of the tensor from the underlying data using a custom shape
Returns a view based on the shape passed in. Both the rank and the dimensions can be increased or decreased from the original data object as long as they fit within the bounds of the memory allocation. This function only allows a contiguous view of memory, regardless of the shape passed in. For example, if the original shape is {8, 2} and a view of {2, 1} is requested, the data in the new view would be the last two elements of the last dimension of the original data.
The function is similar to MATLAB and Python’s reshape(), except it does NOT make a copy of the data, whereas those languages may, depending on the context. It is up to the user to understand any existing views on the underlying data that may conflict with other views.
While this function is similar to Slice(), it does not allow slicing a particular start and end point as slicing does, and slicing also does not allow increasing the rank of a tensor as View(shape) does.
Note that the data type of the tensor can also change from the original. This may be useful in situations where a union of data types could be used in different ways. For example, a complex<float> tensor could be reshaped into a float tensor with twice as many elements, and operations can be done on floats instead of complex types.
- Template Parameters:
ShapeIntType – Type of integer shape array
NRANK – New rank of tensor
- Parameters:
shape – New shape of tensor
- Returns:
A view of the data with the appropriate strides and dimensions set
-
__MATX_INLINE__ inline auto View()#
Make a copy of a tensor and maintain all refcounts.
- Returns:
Copy of view
-
__MATX_INLINE__ inline void PrefetchDevice(cudaStream_t const stream) const noexcept#
Prefetch the data asynchronously from the host to the device.
All copies are done asynchronously in a stream. The copy is ordered with respect to other work in the same stream, but the time at which the transfer occurs is not guaranteed.
- Parameters:
stream – The CUDA stream to prefetch within
-
__MATX_INLINE__ inline void PrefetchHost(cudaStream_t const stream) const noexcept#
Prefetch the data asynchronously from the device to the host.
All copies are done asynchronously in a stream. The copy is ordered with respect to other work in the same stream, but the time at which the transfer occurs is not guaranteed.
- Parameters:
stream – The CUDA stream to prefetch within
-
template<typename U = T>
__MATX_INLINE__ inline auto RealView() const noexcept#
Create a view of only the real-valued components of a complex array
Only available on complex data types.
- Returns:
tensor view of only real-valued components
-
__MATX_INLINE__ inline auto GetStorage() noexcept#
Return the storage container from the tensor.
- Returns:
storage container
-
template<typename U = T>
__MATX_INLINE__ inline auto ImagView() const noexcept#
Create a view of only the imaginary-valued components of a complex array
Only available on complex data types.
- Returns:
tensor view of only imaginary-valued components
-
__MATX_INLINE__ inline auto Permute(const cuda::std::array<int32_t, RANK> &dims) const#
Permute the dimensions of a tensor
Accepts any order of permutation. Number of dimensions must match RANK of tensor
- Template Parameters:
M – Rank of tensor to permute. Should not be used directly
- Parameters:
dims – Dimensions of tensor
- Returns:
tensor view with the dimensions permuted
-
__MATX_INLINE__ inline auto Permute(const int32_t (&dims)[RANK]) const#
Permute the dimensions of a tensor
Accepts any order of permutation. Number of dimensions must match RANK of tensor
- Template Parameters:
M – Rank of tensor to permute. Should not be used directly
- Parameters:
dims – Dimensions of tensor
- Returns:
tensor view with the dimensions permuted
-
__MATX_INLINE__ inline auto PermuteMatrix() const#
Permute the last two dimensions of a matrix
Utility function to permute the last two dimensions of a tensor. This is useful in the numerous operations that take a permuted matrix as input, but we don’t want to permute the inner dimensions of a larger tensor.
- Returns:
tensor view with last two dims permuted
-
__MATX_HOST__ __MATX_INLINE__ inline T *Data() const noexcept#
Get the underlying local data pointer from the view
- Returns:
Underlying data pointer of type T
-
template<typename ShapeType, std::enable_if_t<!std::is_pointer_v<typename remove_cvref<ShapeType>::type>, bool> = true>
__MATX_HOST__ __MATX_INLINE__ inline void Reset(T *const data, ShapeType &&shape) noexcept#
Set the underlying data pointer from the view
Decrements any reference-counted memory and potentially frees before resetting the data pointer. If refcnt is not nullptr, the count is incremented.
- Template Parameters:
ShapeType – Shape type
- Parameters:
data – Data pointer to set
shape – Shape of tensor
-
__MATX_HOST__ __MATX_INLINE__ inline void Reset(T *const data) noexcept#
Set the underlying data pointer from the view
Decrements any reference-counted memory and potentially frees before resetting the data pointer. If refcnt is not nullptr, the count is incremented.
- Parameters:
data – Data pointer to set
-
__MATX_HOST__ __MATX_INLINE__ inline void Reset(T *const data, T *const ldata) noexcept#
Set the underlying data and local data pointer from the view
Decrements any reference-counted memory and potentially frees before resetting the data pointer. If refcnt is not nullptr, the count is incremented.
- Parameters:
data – Allocated data pointer
ldata – Local data pointer offset into allocated
-
__MATX_INLINE__ __MATX_HOST__ inline Desc::stride_type Stride(uint32_t dim) const#
Get the stride of a single dimension of the tensor
- Parameters:
dim – Desired dimension
- Returns:
Stride (in elements) in dimension
-
__MATX_INLINE__ __MATX_HOST__ inline auto GetRefCount() const noexcept#
Get the reference count
- Returns:
Reference count or 0 if not tracked
-
template<int N>
__MATX_INLINE__ inline auto OverlapView(const cuda::std::array<typename Desc::shape_type, N> &windows, const cuda::std::array<typename Desc::stride_type, N> &strides) const#
Create an overlapping tensor view
Creates an overlapping tensor view where an existing tensor can be repeated into a higher rank with overlapping elements. For example, the following 1D tensor [1 2 3 4 5] could be cloned into a 2D tensor with a window size of 2 and overlap of 1, resulting in:
[1 2
 2 3
 3 4
 4 5]
Currently this only works on 1D tensors going to 2D, but it may be expanded to higher dimensions in the future. Note that if the window size does not divide evenly into the existing column dimension, the view may chop off the end of the data to make the tensor rectangular.
- Parameters:
windows – Window size (columns in output)
strides – Strides between data elements
- Returns:
Overlapping view of data
-
template<int N>
__MATX_INLINE__ inline auto Clone(const cuda::std::array<index_t, N> &clones) const#
Clone a tensor into a higher-dimension tensor
Clone() allows a copy-less method to clone data into a higher dimension tensor. The underlying data does not grow or copy, but instead the indices of the higher-ranked tensor access the original data potentially multiple times. Clone is similar to MATLAB’s repmat() function where it’s desired to take a tensor of a lower dimension and apply an operation with it to a tensor in a higher dimension by broadcasting the values.
For example, in a rank=2 tensor that’s MxN, and a rank=1 tensor that’s 1xN, Clone() can take the rank=1 tensor and broadcast to an MxN rank=2 tensor, and operations such as the Hadamard product can be performed. In this example, the final operation will benefit heavily from device caching since the same 1xN rank=1 tensor will be accessed M times.
- Parameters:
clones – List of sizes of each dimension to clone. Parameter length must match rank of tensor. A special sentinel value of matxKeepDim should be used when the dimension from the original tensor is to be kept.
- Returns:
Cloned view representing the higher-dimension tensor
-
template<int M = RANK, std::enable_if_t<M == 0, bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(T const &val)#
Rank-0 initializer list setting
Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.
- Parameters:
val – Value to set the rank-0 tensor to
-
template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 1) || (is_cuda_complex_v<T> && M == 0), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<T> &vals)#
Rank-1 non-complex or rank-0 complex initializer list setting
Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.
- Parameters:
vals – 1D initializer list of values
-
template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 2) || (is_cuda_complex_v<T> && M == 1), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<T>> &vals)#
Rank-2 non-complex or rank-1 complex initializer list setting
Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.
- Parameters:
vals – 1D/2D initializer list of values
-
template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 3) || (is_cuda_complex_v<T> && M == 2), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<const std::initializer_list<T>>> vals)#
Rank-3 non-complex or rank-2 complex initializer list setting
Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.
- Parameters:
vals – 3D/2D initializer list of values
-
template<int M = RANK, std::enable_if_t<(!is_cuda_complex_v<T> && M == 4) || (is_cuda_complex_v<T> && M == 3), bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<const std::initializer_list<const std::initializer_list<T>>>> &vals)#
Rank-4 non-complex or rank-3 complex initializer list setting
Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.
- Parameters:
vals – 3D/4D initializer list of values
-
template<int M = RANK, std::enable_if_t<is_cuda_complex_v<T> && M == 4, bool> = true>
__MATX_INLINE__ __MATX_HOST__ inline void SetVals(const std::initializer_list<const std::initializer_list<const std::initializer_list<const std::initializer_list<const std::initializer_list<T>>>>> &vals)#
Rank-4 complex initializer list setting
Note that for performance reasons only CUDA managed pointers are supported with SetVals at the moment.
- Parameters:
vals – 4D initializer list of values
-
template<int N = RANK, typename StrideType>
__MATX_INLINE__ inline auto Slice(const cuda::std::array<typename Desc::shape_type, RANK> &firsts, const cuda::std::array<typename Desc::shape_type, RANK> &ends, StrideType strides) const#
Slice a tensor either within the same dimension or to a lower dimension
Slice() allows a copy-less method to extract a subset of data from one or more dimensions of a tensor. This includes completely dropping an unwanted dimension, or simply taking a piece of a wanted dimension. Slice() is very similar to indexing operations in both Python and MATLAB.
NOTE Users should not call Slice() directly anymore. Use the slice() operator instead.
- Parameters:
firsts – List of starting index into each dimension. Indexing is 0-based
ends – List of ending index into each dimension. Indexing is 0-based. Two special sentinel values can be used: 1) matxEnd indicates the end of that particular dimension without specifying the size, similar to “end” in MATLAB or leaving off the end in Python (“a[1:]”). 2) matxDropDim slices (drops) a dimension entirely, resulting in a tensor with a smaller rank than the original
strides – List of strides for each dimension. A special sentinel value of matxKeepStride is used to keep the existing stride of the dimension
- Returns:
Sliced view of tensor
-
template<int N = RANK>
__MATX_INLINE__ inline auto Slice(const cuda::std::array<typename Desc::shape_type, RANK> &firsts, const cuda::std::array<typename Desc::shape_type, RANK> &ends) const#
Slice a tensor either within the same dimension or to a lower dimension
Slice() allows a copy-less method to extract a subset of data from one or more dimensions of a tensor. This includes completely dropping an unwanted dimension, or simply taking a piece of a wanted dimension. Slice() is very similar to indexing operations in both Python and MATLAB.
- Parameters:
firsts – List of starting index into each dimension. Indexing is 0-based
ends – List of ending index into each dimension. Indexing is 0-based. Two special sentinel values can be used: 1) matxEnd indicates the end of that particular dimension without specifying the size, similar to “end” in MATLAB or leaving off the end in Python (“a[1:]”). 2) matxDropDim slices (drops) a dimension entirely, resulting in a tensor with a smaller rank than the original
- Returns:
Sliced view of tensor
-
inline DLManagedTensor *GetDLPackTensor() const#
Get a DLPack v0.8 structure representing the tensor.
DLPack is a commonly-used tensor memory layout format for moving tensors between libraries. This function returns a DLPack structure based on a tensor_t. The caller is responsible for freeing the memory by calling ->deleter(self).
Note: This function will increment the reference count of the tensor. It is expected that once a tensor is converted to DLPack someone will eventually call deleter(). If that does not happen a memory leak will occur.
- Returns:
Pointer to a new DLManagedTensor. The caller must call the deleter function when finished.
Friends
-
inline friend void swap(self_type &lhs, self_type &rhs) noexcept#
Swaps two tensors
Swaps members of two tensors, including pointers, shapes, and descriptors
- Parameters:
lhs – Left argument
rhs – Right argument