cuda.core.experimental.utils.StridedLayout#

class cuda.core.experimental.utils.StridedLayout(tuple shape: tuple[int], tuple strides: tuple[int] | None, int itemsize: int, bool divide_strides: bool = False)#

A class describing the layout of a multi-dimensional tensor with a shape, strides and itemsize.

Parameters:
  • shape (tuple) – A tuple of non-negative integers.

  • strides (tuple, optional) – If provided, must be a tuple of integers of the same length as shape. Otherwise, the layout is implicitly C-contiguous and its strides attribute will be None.

  • itemsize (int) – The number of bytes per single element (dtype size). Must be a power of two.

  • divide_strides (bool, optional) – If True, the provided strides will be divided by the itemsize.

See also dense().

itemsize#

The number of bytes per single element (dtype size). Must be a power of two.

Type:

int

slice_offset#

The offset (as a number of elements, not bytes) of the element at index (0,) * ndim. See also slice_offset_in_bytes.

Type:

int

is_contiguous_any#

True iff the layout is contiguous in some axis order, i.e. there exists a permutation of axes such that the permuted layout is C-contiguous.

In a contiguous layout, the strides are non-negative and the mapping of elements to the memory offset range [min_offset, max_offset] is 1-to-1.

# dense defaults to C-contiguous
layout = StridedLayout.dense((5, 3, 7), 1)
assert layout.is_contiguous_c and not layout.is_contiguous_f
assert layout.is_contiguous_any

# reversing the order of axes gives F-contiguous layout
permuted = layout.permuted((2, 1, 0))
assert not permuted.is_contiguous_c and permuted.is_contiguous_f
assert permuted.is_contiguous_any

# neither C- nor F-order but still contiguous
permuted = layout.permuted((2, 0, 1))
assert not permuted.is_contiguous_c and not permuted.is_contiguous_f
assert permuted.is_contiguous_any

# slicing the right-most extent creates gaps in the
# offset_bounds range that are not reachable by any
# element of the sliced layout
sliced = layout[:, :, :-1]
assert not sliced.is_contiguous_c and not sliced.is_contiguous_f
assert not sliced.is_contiguous_any
Type:

bool

is_contiguous_c#

True iff the layout is contiguous in C-order, i.e. the rightmost stride is 1 and each subsequent stride to the left is the product of the extent and the stride to the right.

layout = StridedLayout.dense((2, 5, 3), 1, "C")
assert layout == StridedLayout((2, 5, 3), (15, 3, 1), 1)
assert layout.is_contiguous_c

See also is_contiguous_any.

Type:

bool

is_contiguous_f#

True iff the layout is contiguous in F-order, i.e. the leftmost stride is 1 and each subsequent stride to the right is the product of the stride and extent to the left.

layout = StridedLayout.dense((2, 5, 3), 1, "F")
assert layout == StridedLayout((2, 5, 3), (1, 2, 10), 1)
assert layout.is_contiguous_f

See also is_contiguous_any.

Type:

bool

is_dense#

True iff the layout is dense, i.e. contiguous (is_contiguous_any is True) with no slice offset (slice_offset_in_bytes is 0).

In a dense layout, elements are mapped 1-to-1 to the [0, volume - 1] memory offset range.
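
For illustration, a short sketch based on the definitions above (not taken from the library's examples):

# a C-contiguous layout with no slice offset is dense
layout = StridedLayout.dense((5, 3, 7), 1)
assert layout.is_dense
# slicing introduces a non-zero slice offset,
# so the result is still contiguous but not dense
assert not layout[1:].is_dense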

Type:

bool

is_unique#

If True, each element of a tensor with this layout is mapped to a unique memory offset.

All contiguous layouts are unique and so are layouts that can be created by permuting, slicing, flattening, squeezing, repacking, or reshaping a contiguous layout. Conversely, broadcast layouts (layouts with a 0 stride for some extent greater than 1) are not unique.

For layouts resulting from manual stride manipulations (such as with numpy.lib.stride_tricks), the check may conservatively report False, as an exact uniqueness check can be expensive.
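
A quick sketch of the broadcast case (following the semantics described above):

layout = StridedLayout((3, 1), (1, 1), 1)
assert layout.is_unique
# broadcasting sets the stride of the expanded extent to 0,
# so many indices map to the same memory offset
broadcast = layout.broadcast_to((3, 5))
assert not broadcast.is_unique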

Type:

bool

max_offset#

See offset_bounds for details.

Type:

int

min_offset#

See offset_bounds for details.

Type:

int

ndim#

The number of dimensions (length of the shape tuple).

Type:

int

offset_bounds#

The memory offset range [min_offset, max_offset] (in element counts, not bytes) that elements of a tensor with this layout are mapped to.

If the layout is empty (i.e. volume == 0), the returned tuple is (0, -1). Otherwise, min_offset <= max_offset and all elements of the tensor with this layout are mapped within the [min_offset, max_offset] range.

# Possible implementation of offset_bounds
# (assumes the layout was created with explicit strides)
def offset_bounds(layout: StridedLayout):
    if layout.volume == 0:
        return 0, -1
    ndim = layout.ndim
    shape = layout.shape
    strides = layout.strides
    # axes with negative strides contribute their last index
    # to the minimum offset and their first index to the maximum
    idx_min = [shape[i] - 1 if strides[i] < 0 else 0 for i in range(ndim)]
    idx_max = [shape[i] - 1 if strides[i] > 0 else 0 for i in range(ndim)]
    min_offset = sum(strides[i] * idx_min[i] for i in range(ndim)) + layout.slice_offset
    max_offset = sum(strides[i] * idx_max[i] for i in range(ndim)) + layout.slice_offset
    return min_offset, max_offset
Type:

tuple[int, int]

shape#

Shape of the tensor.

Type:

tuple[int]

slice_offset_in_bytes#

The memory offset (as a number of bytes) of the element at index (0,) * ndim. Equal to itemsize * slice_offset.

Note

The only way for the index (0,) * ndim to be mapped to a non-zero offset is slicing with the sliced() method (or the [] operator).

Type:

int

stride_order#

A permutation of tuple(range(ndim)) describing the relative order of the strides: the first axis in the tuple has the largest stride and the last one the smallest.

# C-contiguous layout
assert StridedLayout.dense((5, 3, 7), 1).stride_order == (0, 1, 2)
# F-contiguous layout
assert StridedLayout.dense((5, 3, 7), 1, stride_order="F").stride_order == (2, 1, 0)
# Permuted layout
assert StridedLayout.dense((5, 3, 7), 1, stride_order=(2, 0, 1)).stride_order == (2, 0, 1)
Type:

tuple[int]

strides#

Strides of the tensor (in counts, not bytes). If the StridedLayout was created with strides=None, the returned value is None and the layout is implicitly C-contiguous.

Type:

tuple[int] | None

strides_in_bytes#

Strides of the tensor (in bytes). None if the layout was created with strides=None.
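
For example (an illustrative sketch; each element stride is multiplied by itemsize):

layout = StridedLayout((5, 3), (3, 1), 4)
assert layout.strides_in_bytes == (12, 4)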

Type:

tuple[int] | None

volume#

The number of elements in the tensor, i.e. the product of the shape tuple.

Type:

int

Methods

__init__(*args, **kwargs)#
broadcast_to(self: StridedLayout, tuple shape: tuple[int]) StridedLayout#

Returns a layout with the new shape, if the old shape can be broadcast to the new one.

The shapes are compatible if:
  • the new shape has the same or greater number of dimensions

  • starting from the right, each extent in the old shape must be 1 or equal to the corresponding extent in the new shape.

Strides of the added or modified extents are set to 0, the remaining ones are unchanged. If the shapes are not compatible, a ValueError is raised.
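
A short sketch of the rules above (illustrative, not from the library's examples):

layout = StridedLayout((3, 1), (1, 1), 4)
b = layout.broadcast_to((2, 3, 5))
# the added extent (2) and the expanded extent (5) get stride 0;
# the stride of the matching extent (3) is unchanged
assert b.shape == (2, 3, 5)
assert b.strides == (0, 1, 0)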

classmethod dense(cls, tuple shape: tuple[int], int itemsize: int, stride_order: str | tuple[int] = 'C') StridedLayout#

Creates a new StridedLayout instance with dense strides.

Parameters:
  • shape (tuple) – A tuple of non-negative integers.

  • itemsize (int) – The number of bytes per single element of the tensor.

  • stride_order (str or tuple, optional) –

    The order of the strides:
    • 'C' (default) - the strides are computed in C-order (increasing from the right to the left)

    • 'F' - the strides are computed in F-order (increasing from the left to the right)

    • A tuple - it must be a permutation of tuple(range(len(shape))). The last element of the tuple is the axis with stride 1.

    See also stride_order.

assert StridedLayout.dense((5, 3, 7), 1, "C") == StridedLayout((5, 3, 7), (21, 7, 1), 1)
assert StridedLayout.dense((5, 3, 7), 1, "F") == StridedLayout((5, 3, 7), (1, 5, 15), 1)
assert StridedLayout.dense((5, 3, 7), 1, (2, 0, 1)) == StridedLayout((5, 3, 7), (3, 1, 15), 1)

classmethod dense_like(
    cls,
    StridedLayout other: StridedLayout,
    stride_order: str | tuple[int] = 'K',
) StridedLayout#

Creates a StridedLayout with the same shape and itemsize as the other layout, but with contiguous strides in the specified order and no slice offset.

See also is_dense.

Parameters:
  • other (StridedLayout) – The StridedLayout to copy the shape and itemsize from.

  • stride_order (str or tuple, optional) –

    The order of the strides:
    • 'K' (default) - keeps the order of the strides as in the other layout.

    • 'C' - the strides are computed in C-order (increasing from the right to the left)

    • 'F' - the strides are computed in F-order (increasing from the left to the right)

    • A tuple - it must be a permutation of tuple(range(len(shape))). The last element of the tuple is the axis with stride 1.

    See also stride_order.

layout = StridedLayout.dense((5, 3, 7), 1).permuted((2, 0, 1))
assert layout == StridedLayout((7, 5, 3), (1, 21, 7), 1)

# dense_like with the default "K" stride_order
# keeps the same order of strides as in the original layout
assert StridedLayout.dense_like(layout) == layout
# "C", "F" recompute the strides accordingly
assert StridedLayout.dense_like(layout, "C") == StridedLayout((7, 5, 3), (15, 3, 1), 1)
assert StridedLayout.dense_like(layout, "F") == StridedLayout((7, 5, 3), (1, 7, 35), 1)

flattened(
    self: StridedLayout,
    int start_axis: int = 0,
    int end_axis: int = -1,
    int mask: int | None = None,
) StridedLayout#

Merges consecutive extents into a single extent (equal to the product of merged extents) if the corresponding strides can be replaced with a single stride (assuming indices are iterated in C-order, i.e. the rightmost axis is incremented first).

# the two extents can be merged into a single extent
# because layout.strides[0] == layout.strides[1] * layout.shape[1]
layout = StridedLayout((3, 2), (2, 1), 1)
assert layout.flattened() == StridedLayout((6,), (1,), 1)

# the two extents cannot be merged into a single extent
# because layout.strides[0] != layout.strides[1] * layout.shape[1]
layout = StridedLayout((3, 2), (1, 3), 1)
assert layout.flattened() == layout

If start_axis and end_axis are provided, only the axes in the inclusive range [start_axis, end_axis] are considered for flattening.

Alternatively, a mask specifying which axes to consider can be provided. A mask of mergeable extents can be obtained using the flattened_axis_mask() method. Masks for layouts with the same number of dimensions can be combined using the bitwise AND operator (&).

layout = StridedLayout.dense((4, 5, 3), 4)
layout2 = StridedLayout((4, 5, 3), (1, 12, 4), 4)
# Even though the two layouts have the same shape initially,
# their shapes differ after flattening.
assert layout.flattened() == StridedLayout((60,), (1,), 4)
assert layout2.flattened() == StridedLayout((4, 15), (1, 4), 4)
# With the mask, only extents that are mergeable in both layouts are flattened
# and the resulting shape is the same for both layouts.
mask = layout.flattened_axis_mask() & layout2.flattened_axis_mask()
assert layout.flattened(mask=mask) == StridedLayout((4, 15), (15, 1), 4)
assert layout2.flattened(mask=mask) == StridedLayout((4, 15), (1, 4), 4)

flattened_axis_mask(self: StridedLayout) axes_mask_t#

A mask describing which axes of this layout are mergeable using the flattened() method.

max_compatible_itemsize(
    self: StridedLayout,
    int max_itemsize: int = 16,
    uintptr_t data_ptr: uintptr_t = 0,
    int axis: int = -1,
) int#

Returns the maximum itemsize (but no greater than max_itemsize) that can be used with the repacked() method for the current layout.
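
For example (a sketch consistent with the repacked() example below):

layout = StridedLayout.dense((5, 4), 4)
# the 4 consecutive 4-byte elements in the last axis
# can be repacked into a single 16-byte element
assert layout.max_compatible_itemsize() == 16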

permuted(self: StridedLayout, tuple axis_order: tuple[int]) StridedLayout#

Returns a new layout where the shape and strides tuples are permuted according to the specified permutation of axes.
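
For example (mirroring the dense_like() example above):

layout = StridedLayout((5, 3, 7), (21, 7, 1), 1)
assert layout.permuted((2, 0, 1)) == StridedLayout((7, 5, 3), (1, 21, 7), 1)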

repacked(
    self: StridedLayout,
    int itemsize: int,
    uintptr_t data_ptr: uintptr_t = 0,
    int axis: int = -1,
    bool keep_dim: bool = True,
) StridedLayout#

Converts the layout to match the specified itemsize (referred to as new_itemsize below). If new_itemsize < itemsize, each element of the tensor is unpacked into multiple elements, i.e. the extent at axis increases by the factor itemsize // new_itemsize. If new_itemsize > itemsize, consecutive elements of the tensor are packed into a single element, i.e. the extent at axis decreases by the factor new_itemsize // itemsize. In either case, the total size in bytes (volume * itemsize) remains the same.

The conversion is subject to the following constraints:
  • The old and new itemsizes must be powers of two.

  • The extent at axis must be a positive integer.

  • The stride at axis must be 1.

Moreover, if new_itemsize > itemsize:
  • The extent at axis must be divisible by new_itemsize // itemsize.

  • All other strides must be divisible by new_itemsize // itemsize.

  • The slice_offset must be divisible by new_itemsize // itemsize.

  • If data_ptr is provided, it must be aligned to the new itemsize.

The maximum itemsize that satisfies all the constraints can be obtained using the max_compatible_itemsize() method.

If keep_dim is False and the extent at axis would be reduced to 1, it is omitted from the returned layout.

# Repacking the layout with itemsize = 4 bytes as 2, 8, and 16 sized layouts.
layout = StridedLayout.dense((5, 4), 4)
assert layout.repacked(2) == StridedLayout.dense((5, 8), 2)
assert layout.repacked(8) == StridedLayout.dense((5, 2), 8)
assert layout.repacked(16) == StridedLayout.dense((5, 1), 16)
assert layout.repacked(16, keep_dim=False) == StridedLayout.dense((5,), 16)
# Viewing a (5, 6) float32 array as a (5, 3) complex64 array.
import numpy
from cuda.core.experimental.utils import StridedMemoryView

a = numpy.ones((5, 6), dtype=numpy.float32)
float_view = StridedMemoryView(a, -1)
layout = float_view.layout
assert layout.shape == (5, 6)
assert layout.itemsize == 4
complex_view = float_view.view(layout.repacked(8), numpy.complex64)
assert complex_view.layout.shape == (5, 3)
assert complex_view.layout.itemsize == 8
b = numpy.from_dlpack(complex_view)
assert b.shape == (5, 3)

required_size_in_bytes(self: StridedLayout) int#

The memory allocation size (in bytes) needed so that all elements of a tensor with this layout can be mapped within the allocated memory range.

The function raises an error if min_offset < 0. Otherwise, the returned value is equal to (max_offset + 1) * itemsize.

Hint

For dense layouts, the function always succeeds and (max_offset + 1) * itemsize equals volume * itemsize.

# Allocating memory on a device to copy a host tensor
import numpy
import cuda.core.experimental as ccx
from cuda.core.experimental.utils import StridedMemoryView

def device_tensor_like(a: numpy.ndarray, device: ccx.Device) -> StridedMemoryView:
    a_view = StridedMemoryView(a, -1)
    # get the original layout of ``a`` and convert it to a dense layout
    # to avoid overallocating memory (e.g. if ``a`` was sliced)
    layout = a_view.layout.to_dense()
    # get the required size in bytes to fit the tensor
    required_size = layout.required_size_in_bytes()
    # allocate the memory on the device
    device.set_current()
    mem = device.allocate(required_size)
    # create a view on the newly allocated device memory
    b_view = StridedMemoryView.from_buffer(mem, layout, a_view.dtype)
    return b_view

reshaped(self: StridedLayout, tuple shape: tuple[int]) StridedLayout#

Returns a layout with the new shape, if the new shape is compatible with the current layout.

The new shape is compatible if:
  • the new and old shapes have the same volume

  • the old strides can be split or flattened to match the new shape, assuming indices are iterated in C-order

A single extent in the shape tuple can be set to -1 to indicate it should be inferred from the old volume and the other extents.

layout = StridedLayout.dense((5, 3, 4), 1)
assert layout.reshaped((20, 3)) == StridedLayout.dense((20, 3), 1)
assert layout.reshaped((4, -1)) == StridedLayout.dense((4, 15), 1)
assert layout.permuted((2, 0, 1)).reshaped((4, 15)) == StridedLayout((4, 15), (1, 4), 1)
# layout.permuted((2, 0, 1)).reshaped((20, 3)) -> error

sliced(
    self: StridedLayout,
    slices: int | slice | tuple[int | slice],
) StridedLayout#

Returns a sliced layout. The slices parameter can be a single integer, a single slice object, or a tuple of integers and/or slices.

Hint

For convenience, instead of calling this method directly, please rely on the __getitem__() operator (i.e. bracket syntax), e.g.: layout[:, start:end:step].

Note

Slicing is purely a layout transformation and does not involve any data access.
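
A short sketch of slicing (illustrative; with unit-step slices the strides stay unchanged):

layout = StridedLayout.dense((5, 3, 7), 1)
sliced = layout[:, 1:, :-1]
assert sliced.shape == (5, 2, 6)
# the element at index (0, 0, 0) of the sliced layout is the
# element at index (0, 1, 0) of the original layout
assert sliced.slice_offset == 7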

squeezed(self: StridedLayout) StridedLayout#

Returns a new layout where all the singleton dimensions (extents equal to 1) are removed. Additionally, if the layout volume is 0, the returned layout will be reduced to a 1-dim layout with shape (0,) and strides (0,).
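
For example (an illustrative sketch):

layout = StridedLayout((5, 1, 3), (3, 3, 1), 1)
assert layout.squeezed() == StridedLayout((5, 3), (3, 1), 1)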

to_dense(
    self: StridedLayout,
    stride_order='K',
) StridedLayout#

Returns a layout with the same shape and itemsize, but with dense strides in the specified order and no slice offset.

See dense_like() method documentation for details.
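
For example (a sketch following the dense_like() semantics; the default "K" keeps the stride order):

# slicing the right-most extent makes the layout non-contiguous
layout = StridedLayout.dense((5, 3, 7), 1)[:, :, :-1]
assert not layout.is_dense
# to_dense recomputes contiguous strides for the sliced shape
assert layout.to_dense() == StridedLayout.dense((5, 3, 6), 1)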

unsqueezed(
    self: StridedLayout,
    axis: int | tuple[int],
) StridedLayout#

Returns a new layout where the specified axis or axes are added as singleton extents. The axis can be either a single integer in range [0, ndim] or a tuple of unique integers in range [0, ndim + len(axis) - 1].
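
For example (a sketch, asserting shapes only):

layout = StridedLayout.dense((5, 3), 1)
assert layout.unsqueezed(0).shape == (1, 5, 3)
assert layout.unsqueezed((0, 3)).shape == (1, 5, 3, 1)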