cuda::experimental::fill_bytes#

Overloads#

`fill_bytes(__mdspan, __byte_value, __stream)`#

template<typename _Tp, typename _Extents, typename _Layout, typename _Accessor, typename _ByteT> inline void cuda::experimental::fill_bytes( ::cuda::device_mdspan<_Tp, _Extents, _Layout, _Accessor> __mdspan, const _ByteT __byte_value, const ::cuda::stream_ref __stream )

Asynchronously fills a device mdspan with a 1-, 2-, or 4-byte pattern.

Asynchronous mdspan byte fill#

fill_bytes asynchronously fills the mapped elements of a device mdspan with a repeated byte pattern on the given CUDA stream. The pattern is the object representation of a 1-, 2-, or 4-byte value. This is a byte operation: it does not assign __byte_value as an object of the destination element type. For strided layouts, only bytes belonging to mapped destination elements are filled; padding bytes outside the mapping are left unchanged.
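The difference between a byte-pattern fill and an element assignment can be shown with a small host-side model. The sketch below is not the library implementation: `model_fill_bytes` and `example_strided_fill` are hypothetical names, the fill runs synchronously on the host, and a one-dimensional stride stands in for a strided layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side model, not the library implementation.
// Fills `count` elements of type T, spaced `stride` elements apart, by
// repeating the object representation of `pattern` across each element.
template <typename T, typename Pattern>
void model_fill_bytes(T* data, std::size_t count, std::size_t stride, Pattern pattern)
{
  static_assert(sizeof(T) % sizeof(Pattern) == 0,
                "element size must be a multiple of the pattern size");
  assert(stride >= 1);
  for (std::size_t i = 0; i < count; ++i)
  {
    unsigned char* element = reinterpret_cast<unsigned char*>(data + i * stride);
    // Repeat the pattern bytes until the whole element is covered.
    for (std::size_t offset = 0; offset < sizeof(T); offset += sizeof(Pattern))
    {
      std::memcpy(element + offset, &pattern, sizeof(Pattern));
    }
  }
}

// Three mapped 4-byte elements at indices 0, 2, and 4; indices 1, 3, and 5
// play the role of padding outside the mapping.
inline std::vector<std::uint32_t> example_strided_fill()
{
  std::vector<std::uint32_t> buffer(6, 0xFFFFFFFFu);
  model_fill_bytes(buffer.data(), 3, 2, std::uint16_t{0xABCD});
  return buffer;
}
```

In this model each mapped element ends up holding 0xABCDABCD (the 2-byte pattern repeated twice), not 0x0000ABCD as an assignment would produce, and the padding elements keep their original value 0xFFFFFFFF.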

The operation is enqueued on __stream and may complete after fill_bytes returns. Synchronize the stream, or otherwise order dependent work on the same stream, before observing the filled data.

Destination element and fill value types must be trivially copyable.
The fill value type must have unique object representations and size 1, 2, or 4.
The destination element type must not be const.
The destination element size must be a multiple of the fill value size.
The destination element alignment must be at least the fill value size.
Layout policies must be one of the predefined cuda::std layout policies (layout_right, layout_left, layout_stride) or cuda::layout_stride_relaxed.
Accessor policies must be convertible to cuda::std::default_accessor.
The destination must not have an interleaved stride order.
Zero-size mdspans are no-ops and do not require a non-null data handle.
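The type-related requirements above can be mirrored with standard type traits. The following is a minimal sketch for checking a candidate type pair in user code; `fill_types_ok_v` is a hypothetical name, and the library's own checks may be expressed differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical trait mirroring the type constraints listed above.
// It does not cover the layout, accessor, or stride-order requirements.
template <typename Element, typename Fill>
inline constexpr bool fill_types_ok_v =
  std::is_trivially_copyable_v<Element>                          // destination is trivially copyable
  && std::is_trivially_copyable_v<Fill>                          // fill value is trivially copyable
  && std::has_unique_object_representations_v<Fill>              // no padding bits in the pattern
  && (sizeof(Fill) == 1 || sizeof(Fill) == 2 || sizeof(Fill) == 4)
  && !std::is_const_v<Element>                                   // destination must be writable
  && (sizeof(Element) % sizeof(Fill) == 0)                       // pattern tiles the element exactly
  && (alignof(Element) >= sizeof(Fill));                         // element alignment fits the pattern

static_assert(fill_types_ok_v<float, std::uint16_t>, "2-byte pattern into 4-byte elements");
static_assert(!fill_types_ok_v<const float, std::uint8_t>, "const destination is rejected");
static_assert(!fill_types_ok_v<std::uint16_t, std::uint32_t>, "pattern larger than the element");
```

Under this model a fill value of type `int` is rejected for a destination of 1-byte elements, because the element size is not a multiple of the pattern size.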

Integer literals keep their usual type. For example, 0 is an int and requests a 4-byte pattern fill; use cuda::std::uint8_t{0} or cuda::std::byte{0} for a 1-byte pattern fill. The implementation is optimized to fill the largest possible contiguous memory regions.
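The effect of the literal's type on the pattern width can be checked at compile time. This sketch uses the standard `std::` types in place of their `cuda::std::` counterparts, and `pattern_width` is a hypothetical helper that models the requested width as the size of the deduced fill type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: models the pattern width a fill value would request
// as the size of its deduced type.
template <typename Fill>
constexpr std::size_t pattern_width(Fill) noexcept
{
  return sizeof(Fill);
}

static_assert(pattern_width(std::uint8_t{0}) == 1, "explicit 1-byte pattern");
static_assert(pattern_width(std::byte{0}) == 1, "explicit 1-byte pattern");
static_assert(pattern_width(std::uint16_t{0}) == 2, "explicit 2-byte pattern");
// A bare 0 deduces int, which is 4 bytes wide on common platforms.
static_assert(pattern_width(0) == sizeof(int), "literal 0 has type int");
```

Spelling out the fill type at the call site avoids requesting a wider pattern than intended.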

The function validates the public preconditions, then dispatches asynchronous memset operations over the mapped destination elements.
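How a destination might be split into contiguous regions can be illustrated for the simplest padded case. The sketch below is an assumption-laden model, not the library's dispatch logic: `fill_plan` and `plan_2d_fill` are hypothetical names, and only a two-dimensional row-major destination with unit column stride is considered.

```cpp
#include <cstddef>

// Hypothetical description of the fill work for one destination.
struct fill_plan
{
  std::size_t regions;             // number of contiguous regions to fill
  std::size_t elements_per_region; // elements covered by each region
};

// Hypothetical model for a row-major destination with `rows` x `cols`
// elements whose rows start `pitch` elements apart (pitch >= cols).
constexpr fill_plan plan_2d_fill(std::size_t rows, std::size_t cols, std::size_t pitch)
{
  if (rows == 0 || cols == 0)
  {
    return {0, 0}; // zero-size destination: nothing to fill
  }
  if (pitch == cols || rows == 1)
  {
    return {1, rows * cols}; // no padding between rows: one contiguous region
  }
  return {rows, cols}; // padded rows: one contiguous region per row
}

static_assert(plan_2d_fill(4, 8, 8).regions == 1, "dense rows merge into one region");
static_assert(plan_2d_fill(4, 8, 10).regions == 4, "padded rows are filled separately");
```

In this model the padding elements between rows are never part of a region, which matches the statement that bytes outside the mapping are left unchanged.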

Parameters:

__mdspan – [out] Destination device mdspan
__byte_value – [in] Value pattern to fill into the destination
__stream – [in] CUDA stream for the asynchronous fill

Throws:

std::invalid_argument – if __stream is the null stream, or if a non-empty destination has a null data handle, is insufficiently aligned, or has interleaved stride order.

`fill_bytes(__pb, __dst, __value)`#

template<typename _DstTy, ::cuda::std::enable_if_t<::cuda::__spannable<::cuda::transformed_device_argument_t<_DstTy>>, int> = 0> inline graph_node_ref cuda::experimental::fill_bytes( path_builder &__pb, _DstTy &&__dst, ::cuda::std::uint8_t __value )

Adds a memset node to a CUDA graph path that bytewise-fills the destination.

This overload is selected when the destination (after applying launch_transform) is a contiguous range convertible to cuda::std::span. The element type must be trivially copyable and non-const. The pointer captured in the node must remain valid until the graph executes.

A companion overload, which differs only in the arguments it accepts, is selected when the destination (after applying launch_transform) is a cuda::std::mdspan. The mdspan must be exhaustive. The element type must be trivially copyable and non-const. The pointer captured in the node must remain valid until the graph executes.
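An exhaustive mapping is one in which every offset of the underlying memory range is reached by some index, so the destination has no gaps. The sketch below checks this for a strided mapping on the host; `is_exhaustive` here is a hypothetical free function, and it assumes a unique mapping (no two indices share an offset) with nonzero extents.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of the library.
// Assumes the mapping is unique and every extent is nonzero.
// For such a mapping, the largest reachable offset plus one equals the
// element count exactly when the mapping leaves no gaps.
template <std::size_t Rank>
constexpr bool is_exhaustive(const std::array<std::size_t, Rank>& extents,
                             const std::array<std::size_t, Rank>& strides)
{
  std::size_t element_count = 1;
  std::size_t last_offset   = 0;
  for (std::size_t r = 0; r < Rank; ++r)
  {
    element_count *= extents[r];
    last_offset += (extents[r] - 1) * strides[r];
  }
  return last_offset + 1 == element_count;
}

// Dense 4 x 8 row-major mapping: exhaustive.
static_assert(is_exhaustive<2>({4, 8}, {8, 1}), "dense rows");
// Rows padded to 10 elements: offsets 8 and 9 of each row are never mapped.
static_assert(!is_exhaustive<2>({4, 8}, {10, 1}), "padded rows");
```

A padded mapping like the second one would presumably need to be filled through a different path, since a single bytewise fill of the whole range would also overwrite the padding.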

Parameters:

__pb – Path builder to insert the node into.
__dst – Destination memory to fill.
__value – Byte value to write to every byte of the destination.

Throws:

cuda::std::cuda_error – if node creation fails.

Returns:

A graph_node_ref for the newly added memset node.
