cuda::experimental::fill_bytes#
Overloads#
fill_bytes(__mdspan, __byte_value, __stream)#
template<typename _Tp, typename _Extents, typename _Layout, typename _Accessor, typename _ByteT>
inline void cuda::experimental::fill_bytes(
    ::cuda::device_mdspan<_Tp, _Extents, _Layout, _Accessor> __mdspan,
    const _ByteT __byte_value,
    const ::cuda::stream_ref __stream)
Asynchronously fills a device mdspan with a 1-, 2-, or 4-byte pattern.
Asynchronous mdspan byte fill#
fill_bytes asynchronously fills the mapped elements of a device mdspan with a repeated byte pattern on the given CUDA stream. The pattern is the object representation of a 1-, 2-, or 4-byte value. This is a byte operation: it does not assign __byte_value as an object of the destination element type. For strided layouts, only bytes belonging to mapped destination elements are filled; padding bytes outside the mapping are left unchanged.
The operation is enqueued on __stream and may complete after fill_bytes returns. Synchronize the stream, or otherwise order dependent work on the same stream, before observing the filled data.
Destination element and fill value types must be trivially copyable.
The fill value type must have unique object representations and size 1, 2, or 4.
The destination element type must not be const.
The destination element size must be a multiple of the fill value size.
The destination element alignment must be at least the fill value size.
Layout policies must be one of the predefined cuda::std layout policies (layout_right, layout_left, layout_stride) or cuda::layout_stride_relaxed.
Accessor policies must be convertible to cuda::std::default_accessor.
The destination must not have an interleaved stride order.
Zero-size mdspans are no-ops and do not require a non-null data handle.
Integer literals use their usual type. For example, 0 is an int and requests a 4-byte pattern fill; use cuda::std::uint8_t{0} or cuda::std::byte{0} for a byte pattern fill.
The implementation is optimized to maximize the contiguous memory regions to fill.
fill_bytes validates the public preconditions, then dispatches asynchronous memset operations over the mapped destination elements.
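The byte-pattern semantics and type requirements described above can be illustrated with a small host-only model in standard C++17. This is a sketch, not the library's implementation: model_fill_bytes, pattern_width, and demo_strided_fill are hypothetical names, the fill runs synchronously on the host, and a plain element stride stands in for a strided mdspan layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical host-side model; NOT the cuda::experimental implementation.
// Repeats the object representation of `pattern` across each mapped element.
// `stride` is the distance, in elements, between consecutive mapped elements.
template <typename T, typename ByteT>
void model_fill_bytes(T* data, std::size_t count, std::size_t stride, ByteT pattern)
{
  // Compile-time requirements mirrored from the documented list.
  static_assert(std::is_trivially_copyable_v<T> && std::is_trivially_copyable_v<ByteT>);
  static_assert(std::has_unique_object_representations_v<ByteT>);
  static_assert(sizeof(ByteT) == 1 || sizeof(ByteT) == 2 || sizeof(ByteT) == 4);
  static_assert(!std::is_const_v<T>);
  static_assert(sizeof(T) % sizeof(ByteT) == 0);
  static_assert(alignof(T) >= sizeof(ByteT));

  for (std::size_t i = 0; i < count; ++i)
  {
    auto* bytes = reinterpret_cast<unsigned char*>(data + i * stride);
    // Copy the pattern bytes back to back until the element is covered.
    for (std::size_t offset = 0; offset < sizeof(T); offset += sizeof(ByteT))
    {
      std::memcpy(bytes + offset, &pattern, sizeof(ByteT));
    }
  }
}

// Width in bytes of the pattern that an argument of this type would request.
template <typename ByteT>
constexpr std::size_t pattern_width(ByteT)
{
  return sizeof(ByteT);
}

// Fills every other element of an 8-element buffer (stride 2). The elements in
// between play the role of padding outside the mapping and stay untouched.
inline std::vector<std::uint32_t> demo_strided_fill()
{
  std::vector<std::uint32_t> buffer(8, 0x11111111u);
  model_fill_bytes(buffer.data(), 4, 2, std::uint16_t{0xABCD});
  return buffer;
}
```

In this model a 2-byte pattern written into 4-byte elements appears twice per element, and pattern_width(0) reports sizeof(int) while pattern_width(std::uint8_t{0}) reports 1, which is the integer-literal pitfall noted above.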
- Parameters:
__mdspan – [out] Destination device mdspan
__byte_value – [in] Value pattern to fill into the destination
__stream – [in] CUDA stream for the asynchronous fill
- Throws:
std::invalid_argument – if __stream is the null stream, or if a non-empty destination has a null data handle, is insufficiently aligned, or has interleaved stride order.
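The runtime checks on the data handle can be sketched as a host-only model. The function names below are hypothetical, and the alignment rule used here (the data handle must be a multiple of the pattern size) is an assumption about what "insufficiently aligned" means; the null-stream and interleaved-stride checks are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical host-side model of the documented runtime checks.
// Assumption: "insufficiently aligned" means the data handle is not a
// multiple of the pattern size.
inline void model_validate_destination(const void* data, std::size_t element_count,
                                       std::size_t pattern_size)
{
  assert(pattern_size == 1 || pattern_size == 2 || pattern_size == 4);
  if (element_count == 0)
  {
    return; // zero-size destinations are no-ops; a null handle is acceptable
  }
  if (data == nullptr)
  {
    throw std::invalid_argument("non-empty destination has a null data handle");
  }
  if (reinterpret_cast<std::uintptr_t>(data) % pattern_size != 0)
  {
    throw std::invalid_argument("destination is insufficiently aligned");
  }
}

// Returns true when validation rejects the arguments with std::invalid_argument.
inline bool model_rejects(const void* data, std::size_t element_count, std::size_t pattern_size)
{
  try
  {
    model_validate_destination(data, element_count, pattern_size);
  }
  catch (const std::invalid_argument&)
  {
    return true;
  }
  return false;
}
```

Checking the element count first is what lets an empty destination pass with a null handle, matching the zero-size rule in the description.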
fill_bytes(__pb, __dst, __value)#
template<typename _DstTy, ::cuda::std::enable_if_t<::cuda::__spannable<::cuda::transformed_device_argument_t<_DstTy>>, int> = 0>
inline graph_node_ref cuda::experimental::fill_bytes(
    path_builder &__pb,
    _DstTy &&__dst,
    ::cuda::std::uint8_t __value)
Adds a memset node to a CUDA graph path that bytewise-fills the destination.
This function is overloaded; the overloads differ only in the destination argument they accept.
Span overload: selected when the destination (after applying launch_transform) is a contiguous range convertible to cuda::std::span.
mdspan overload: selected when the destination (after applying launch_transform) is a cuda::std::mdspan. The mdspan must be exhaustive.
For both overloads, the element type must be trivially copyable and non-const, and the pointer captured in the node must remain valid until the graph executes.
- Parameters:
__pb – Path builder to insert the node into.
__dst – Destination memory to fill.
__value – Byte value to write to every byte of the destination.
- Throws:
cuda::std::cuda_error – if node creation fails.
- Returns:
A graph_node_ref for the newly added memset node.
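Because __value is written to every byte of the destination, the resulting element values are generally not equal to __value itself. The host-only sketch below illustrates this with std::memset standing in for the graph memset node; demo_bytewise_fill is a hypothetical helper, and no CUDA graph is created.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side illustration only: std::memset stands in for the graph memset node.
// Every byte of every 4-byte element receives the same value.
inline std::array<std::uint32_t, 4> demo_bytewise_fill(std::uint8_t value)
{
  std::array<std::uint32_t, 4> destination{};
  const std::size_t byte_count = destination.size() * sizeof(std::uint32_t);
  std::memset(destination.data(), value, byte_count);
  return destination;
}
```

Filling with the byte 0x01 therefore leaves each 32-bit element holding 0x01010101 rather than 1; only byte values such as 0x00 or 0xFF give elements whose bytes are all the same as in the usual zero or all-ones encoding.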