cuDecomp Fortran API

These are all the types and functions available in the cuDecomp Fortran API.

Types

Internal types

cudecompHandle

type cudecompHandle: A cuDecomp internal handle structure.

cudecompGridDesc

type cudecompGridDesc: A cuDecomp internal grid descriptor structure.

Grid Descriptor Configuration

cudecompGridDescConfig

type cudecompGridDescConfig

A data structure defining configuration options for grid descriptor creation.

Type fields:

% gdims (3) [integer] :: dimensions of global data grid
% gdims_dist (3) [integer] :: dimensions of global data grid to use for distribution
% pdims (2) [integer] :: dimensions of process grid
% transpose_comm_backend [cudecompTransposeCommBackend] :: communication backend to use for transpose communication (default: CUDECOMP_TRANSPOSE_COMM_MPI_P2P)
% transpose_axis_contiguous (3) [logical] :: flag (by axis) indicating if memory should be contiguous along pencil axis (default: [false, false, false])
% transpose_mem_order (3, 3) [integer] :: user-specified memory ordering by axis, overrides transpose_axis_contiguous setting; second index specifies axis, first index specifies memory order (default: unset)
% halo_comm_backend [cudecompHaloCommBackend] :: communication backend to use for halo communication (default: CUDECOMP_HALO_COMM_MPI)
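As a minimal sketch of how these fields might be populated, the program below starts from the library defaults and then overrides a few entries. The module name `cudecomp` is an assumption about how the Fortran bindings are exposed, and the grid and process dimensions are illustrative values only.

```fortran
program grid_config_sketch
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none

  type(cudecompGridDescConfig) :: config
  integer :: istat

  ! Start from the library defaults so that unset fields hold valid values.
  istat = cudecompGridDescConfigSetDefaults(config)
  if (istat /= CUDECOMP_RESULT_SUCCESS) error stop "cudecompGridDescConfigSetDefaults failed"

  config%gdims = [128, 128, 128]  ! illustrative global grid
  config%pdims = [2, 2]           ! illustrative 2 x 2 process grid (4 ranks)

  ! Request memory that is contiguous along the pencil axis for X, Y and Z pencils.
  config%transpose_axis_contiguous = [.true., .true., .true.]

  ! These two assignments restate the documented defaults explicitly.
  config%transpose_comm_backend = CUDECOMP_TRANSPOSE_COMM_MPI_P2P
  config%halo_comm_backend = CUDECOMP_HALO_COMM_MPI

  print *, "gdims =", config%gdims, " pdims =", config%pdims
end program grid_config_sketch
```

The populated structure is what cudecompGridDescCreate expects as its config argument.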

cudecompGridDescAutotuneOptions

type cudecompGridDescAutotuneOptions

A data structure defining autotuning options for grid descriptor creation.

Type fields:

% n_warmup_trials [integer] :: number of warmup trials to run for each tested configuration during autotuning
% n_trials [integer] :: number of timed trials to run for each tested configuration during autotuning
% grid_mode [cudecompAutotuneGridMode] :: which communication (transpose/halo) to use to autotune process grid (default: CUDECOMP_AUTOTUNE_GRID_TRANSPOSE)
% dtype [cudecompDataType] :: datatype to use during autotuning (default: CUDECOMP_DOUBLE)
% allow_uneven_distributions [logical] :: flag to control whether autotuning allows process grids that result in uneven distributions of elements across processes (default: true)
% disable_nccl_backends [logical] :: flag to disable NCCL backend options during autotuning (default: false)
% disable_nvshmem_backends [logical] :: flag to disable NVSHMEM backend options during autotuning (default: false)
% skip_threshold [real(c_double)] :: threshold used to skip testing slow configurations; skip configuration if skip_threshold * t > t_best, where t is the duration of the first timed trial for the configuration and t_best is the average trial time of the current best configuration (default: 0.0)
% autotune_transpose_backend [logical] :: flag to enable transpose backend autotuning (default: false)
% transpose_use_inplace_buffers (4) [logical] :: flag to control whether transpose autotuning uses in-place or out-of-place buffers by operation, considering the following order: X-to-Y, Y-to-Z, Z-to-Y, Y-to-X (default: [false, false, false, false])
% transpose_op_weights (4) [real(c_double)] :: multiplicative weight to apply to trial time contribution by transpose operation in the following order: X-to-Y, Y-to-Z, Z-to-Y, Y-to-X (default: [1.0, 1.0, 1.0, 1.0])
% transpose_input_halo_extents (3, 4) [integer] :: input_halo_extents argument to use during autotuning by transpose operation; second index specifies operation in the following order: X-to-Y, Y-to-Z, Z-to-Y, Y-to-X, first index specifies halo_extent argument (default: all zeros, no halos)
% transpose_output_halo_extents (3, 4) [integer] :: output_halo_extents argument to use during autotuning by transpose operation; second index specifies operation in the following order: X-to-Y, Y-to-Z, Z-to-Y, Y-to-X, first index specifies halo_extent argument (default: all zeros, no halos)
% transpose_input_padding (3, 4) [integer] :: input_padding argument to use during autotuning by transpose operation; second index specifies operation in the following order: X-to-Y, Y-to-Z, Z-to-Y, Y-to-X, first index specifies padding argument (default: all zeros, no padding)
% transpose_output_padding (3, 4) [integer] :: output_padding argument to use during autotuning by transpose operation; second index specifies operation in the following order: X-to-Y, Y-to-Z, Z-to-Y, Y-to-X, first index specifies padding argument (default: all zeros, no padding)
% autotune_halo_backend [logical] :: flag to enable halo backend autotuning (default: false)
% halo_extents (3) [integer] :: extents for halo autotuning (default: [0, 0, 0])
% halo_periods (3) [logical] :: periodicity for halo autotuning (default: [false, false, false])
% halo_axis [integer] :: which axis pencils to use for halo autotuning (default: 1, X-pencils)
% halo_padding (3) [integer] :: padding argument for halo autotuning (default: [0, 0, 0])
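The skip rule described for skip_threshold can be illustrated with plain Fortran, independent of the library. This is only a sketch of the documented inequality, not the library's internal code; the timing values and the threshold of 0.5 are hypothetical.

```fortran
program skip_threshold_sketch
  use, intrinsic :: iso_c_binding, only: c_double
  implicit none

  real(c_double), parameter :: skip_threshold = 0.5_c_double  ! illustrative setting
  real(c_double), parameter :: t_best = 10.0_c_double         ! hypothetical best average trial time
  ! Hypothetical first-trial times of three candidate configurations.
  real(c_double), parameter :: t_first(3) = [12.0_c_double, 20.0_c_double, 25.0_c_double]
  integer :: i

  do i = 1, size(t_first)
    print '(a, f5.1, a, l1)', "t = ", t_first(i), "  skip = ", should_skip(t_first(i))
  end do

contains

  ! Documented rule: skip a configuration if skip_threshold * t > t_best.
  logical function should_skip(t)
    real(c_double), intent(in) :: t
    should_skip = skip_threshold * t > t_best
  end function should_skip

end program skip_threshold_sketch
```

With the default value of 0.0 the left-hand side is always zero, so the inequality cannot hold for a positive t_best and no configuration is skipped.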

Pencil Information

cudecompPencilInfo

type cudecompPencilInfo

A data structure containing geometry information about a pencil data buffer.

Type fields:

% shape (3) [integer] :: pencil shape (in local order, including halo and padding elements)
% lo (3) [integer] :: lower bound coordinates (in local order, excluding halo and padding elements)
% hi (3) [integer] :: upper bound coordinates (in local order, excluding halo and padding elements)
% order (3) [integer] :: data layout order (e.g. 3,2,1 means memory is ordered Z,Y,X)
% halo_extents (3) [integer] :: halo extents by dimension (in global order)
% padding (3) [integer] :: padding by dimension (in global order)
% size [int64] :: number of elements in pencil (including halo and padding elements)

Communication Backends

cudecompTransposeCommBackend

See documentation for equivalent C enumerator, cudecompTransposeCommBackend_t.

cudecompHaloCommBackend

See documentation for equivalent C enumerator, cudecompHaloCommBackend_t.

Additional Enumerators

cudecompDataType

See documentation for equivalent C enumerator, cudecompDataType_t.

cudecompAutotuneGridMode

See documentation for equivalent C enumerator, cudecompAutotuneGridMode_t.

cudecompResult

See documentation for equivalent C enumerator, cudecompResult_t.

Functions

Library Initialization/Finalization

cudecompInit

function cudecompInit(handle, mpi_comm)

Initializes the cuDecomp library from an existing MPI communicator.

Parameters:

handle [cudecompHandle,out] :: An uninitialized cuDecomp library handle.
mpi_comm [MPI_Comm,in] :: MPI communicator containing ranks to use with cuDecomp.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompFinalize

function cudecompFinalize(handle)

Finalizes the cuDecomp library and frees associated resources.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
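A minimal sketch of the library lifecycle follows, assuming the bindings are exposed through a module named `cudecomp` and that the legacy `mpi` module is used, so the communicator is passed as MPI_COMM_WORLD. Error handling is reduced to aborting on the first failure.

```fortran
program init_finalize_sketch
  use mpi
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none

  type(cudecompHandle) :: handle
  integer :: istat, ierr

  call MPI_Init(ierr)

  ! Initialize cuDecomp on all ranks of the communicator.
  istat = cudecompInit(handle, MPI_COMM_WORLD)
  if (istat /= CUDECOMP_RESULT_SUCCESS) then
    print *, "cudecompInit failed with code", istat
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  ! ... create grid descriptors and perform communication here ...

  ! Release library resources before shutting down MPI.
  istat = cudecompFinalize(handle)
  if (istat /= CUDECOMP_RESULT_SUCCESS) then
    print *, "cudecompFinalize failed with code", istat
    call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  end if

  call MPI_Finalize(ierr)
end program init_finalize_sketch
```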

Grid Descriptor Management

cudecompGridDescCreate

function cudecompGridDescCreate(handle, grid_desc, config[, options])

Creates a cuDecomp grid descriptor for use with cuDecomp functions.

This function creates a grid descriptor that cuDecomp requires for most library operations that perform communication or query decomposition information. This grid descriptor contains information about how the global data grid is distributed and other internal resources to facilitate communication.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,out] :: An uninitialized cuDecomp grid descriptor.
config [cudecompGridDescConfig,inout] :: A populated cuDecomp grid descriptor configuration structure. This structure defines the required attributes of the decomposition. On successful exit, fields in this structure may be updated to reflect autotuning results.
options [cudecompGridDescAutotuneOptions,in,optional] :: A populated cuDecomp grid descriptor autotune options structure. This options structure is used to control the behavior of the process grid and communication backend autotuning.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
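The following sketch shows one way the optional options argument might be used to request transpose backend autotuning. The module name `cudecomp` and the helper name `create_autotuned` are assumptions for illustration, and the trial counts are arbitrary. The caller is assumed to pass an initialized handle and a populated config.

```fortran
module grid_desc_create_sketch
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none
contains

  ! Hypothetical helper: creates a grid descriptor with transpose backend autotuning enabled.
  function create_autotuned(handle, grid_desc, config) result(istat)
    type(cudecompHandle), intent(in) :: handle
    type(cudecompGridDesc), intent(out) :: grid_desc
    type(cudecompGridDescConfig), intent(inout) :: config
    integer :: istat
    type(cudecompGridDescAutotuneOptions) :: options

    istat = cudecompGridDescAutotuneOptionsSetDefaults(options)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    options%autotune_transpose_backend = .true.
    options%dtype = CUDECOMP_DOUBLE
    options%n_warmup_trials = 3  ! illustrative values
    options%n_trials = 5

    istat = cudecompGridDescCreate(handle, grid_desc, config, options)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    ! config may have been updated to reflect the autotuning result.
    print *, "transpose backend in use: ", &
      cudecompTransposeCommBackendToString(config%transpose_comm_backend)
  end function create_autotuned

end module grid_desc_create_sketch
```

A descriptor created this way would later be released with cudecompGridDescDestroy(handle, grid_desc).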

cudecompGridDescDestroy

function cudecompGridDescDestroy(handle, grid_desc)

Destroys a cuDecomp grid descriptor and frees associated resources.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompGridDescConfigSetDefaults

function cudecompGridDescConfigSetDefaults(config)

Initializes a cuDecomp grid descriptor configuration structure with default values.

This function initializes entries in a cuDecomp grid descriptor configuration structure to default values.

Parameters:

config [cudecompGridDescConfig,out] :: A cuDecomp grid descriptor configuration structure.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompGridDescAutotuneOptionsSetDefaults

function cudecompGridDescAutotuneOptionsSetDefaults(options)

Initializes a cuDecomp grid descriptor autotune options structure with default values.

This function initializes entries in a cuDecomp grid descriptor autotune options structure to default values.

Parameters:

options [cudecompGridDescAutotuneOptions,out] :: A cuDecomp grid descriptor autotune options structure.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

Workspace Management

cudecompGetTransposeWorkspaceSize

function cudecompGetTransposeWorkspaceSize(handle, grid_desc, workspace_size)

Queries the required transpose workspace size, in elements, for a provided grid descriptor.

This function queries the required workspace size, in elements, for transposition communication using a provided grid descriptor. This workspace is required to facilitate local transposition/packing/unpacking operations, or for use as a staging buffer.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
workspace_size [int64,out] :: the required workspace size.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompGetHaloWorkspaceSize

function cudecompGetHaloWorkspaceSize(handle, grid_desc, axis, halo_extents, workspace_size)

Queries the required halo workspace size, in elements, for a provided grid descriptor.

This function queries the required workspace size, in elements, for halo communication using a provided grid descriptor. This workspace is required to facilitate local packing/unpacking operations, or for use as a staging buffer.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
axis [integer,in] :: The domain axis the desired pencil is aligned with.
halo_extents (3) [integer,in] :: An array of three integers to define halo region extents of the pencil, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side).
workspace_size [int64,out] :: the required workspace size.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompGetDataTypeSize

function cudecompGetDataTypeSize(dtype, dtype_size)

Function to get size (in bytes) of a cuDecomp data type.

Parameters:

dtype [cudecompDataType,in] :: A cudecompDataType value.
dtype_size [int64,out] :: the data type size in bytes.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompMalloc

function cudecompMalloc(handle, grid_desc, buffer, buffer_size)

Allocation function for cuDecomp workspaces.

This function should be used to allocate cuDecomp workspaces. It will select an appropriate allocator based on the communication backend information found in the provided grid descriptor. At the current time, only NVSHMEM-enabled backends require a special allocation (using nvshmem_malloc). This function is collective and should be called on all workers to avoid deadlocks. Additionally, any memory allocated using this function is invalidated if the provided grid descriptor is destroyed, so care should be taken to free memory allocated using this function before the provided grid descriptor is destroyed.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
buffer (*) [T,out] :: A Fortran pointer to device memory of type T, where T is one of real(real32), real(real64), complex(real32), complex(real64).
buffer_size [int64,in] :: size of requested allocation, in number of elements of type T.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompFree

function cudecompFree(handle, grid_desc, buffer)

Deallocation function for cuDecomp workspaces.

This function should be used to deallocate memory allocated with cudecompMalloc. It will select an appropriate deallocation function based on the communication backend information found in the provided grid descriptor. At the current time, only NVSHMEM-enabled backends require a special deallocation (using nvshmem_free). This function is collective and should be called on all workers to avoid deadlocks.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
buffer (*) [T,out] :: A Fortran pointer to device memory of type T, where T is one of real(real32), real(real64), complex(real32), complex(real64), pointing to memory allocated with cudecompMalloc.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
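The workspace functions above are typically used together: query the size, allocate, use, then free before the grid descriptor is destroyed. The sketch below assumes a CUDA Fortran compiler (for the `device` attribute), a module named `cudecomp`, and a real(real64) workspace. The helper name `with_transpose_workspace` is hypothetical.

```fortran
module workspace_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none
contains

  ! Hypothetical helper: allocates and releases a transpose workspace.
  ! Collective: every rank must call it.
  function with_transpose_workspace(handle, grid_desc) result(istat)
    type(cudecompHandle), intent(in) :: handle
    type(cudecompGridDesc), intent(in) :: grid_desc
    integer :: istat
    integer(int64) :: workspace_size
    real(real64), pointer, device, contiguous :: work_d(:)

    ! Required workspace size, in elements.
    istat = cudecompGetTransposeWorkspaceSize(handle, grid_desc, workspace_size)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    ! cudecompMalloc selects the allocator that matches the configured backend.
    istat = cudecompMalloc(handle, grid_desc, work_d, workspace_size)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    ! ... pass work_d as the work argument of the transpose functions ...

    ! Free the workspace while grid_desc is still valid.
    istat = cudecompFree(handle, grid_desc, work_d)
  end function with_transpose_workspace

end module workspace_sketch
```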

Helper Functions

cudecompGetPencilInfo

function cudecompGetPencilInfo(handle, grid_desc, pencil_info, axis[, halo_extents, padding])

Collects geometry information about assigned pencils, by domain axis.

This function queries information about the pencil assigned to the calling worker for the given axis. This information is collected in a cuDecomp pencil information structure, which can be used to access and manipulate data within the user-allocated memory buffer.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
pencil_info [cudecompPencilInfo,out] :: A cuDecomp pencil information structure.
axis [integer,in] :: The domain axis the desired pencil is aligned with.
halo_extents (3) [integer,in,optional] :: An array of three integers to define halo region extents of the pencil, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side).
padding (3) [integer,in,optional] :: An array of three integers to define padding of the pencil, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
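As a brief sketch, the pencil geometry for the calling rank could be queried and printed as follows. The module name `cudecomp` and the helper name `report_x_pencil` are assumptions. The halo extents of one element per side are illustrative and passed positionally.

```fortran
module pencil_info_sketch
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none
contains

  ! Hypothetical helper: prints the geometry of the X-axis pencil owned by the calling rank.
  function report_x_pencil(handle, grid_desc) result(istat)
    type(cudecompHandle), intent(in) :: handle
    type(cudecompGridDesc), intent(in) :: grid_desc
    integer :: istat
    type(cudecompPencilInfo) :: pinfo

    ! axis = 1 selects X-pencils; request one halo element per side on every axis.
    istat = cudecompGetPencilInfo(handle, grid_desc, pinfo, 1, [1, 1, 1])
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    print *, "shape (incl. halos/padding):", pinfo%shape
    print *, "lo (excl. halos/padding):   ", pinfo%lo
    print *, "hi (excl. halos/padding):   ", pinfo%hi
    print *, "memory order:               ", pinfo%order
    print *, "total elements:             ", pinfo%size
  end function report_x_pencil

end module pencil_info_sketch
```

The size field gives the number of elements a user buffer needs in order to hold this pencil, including halo and padding elements.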

cudecompTransposeCommBackendToString

function cudecompTransposeCommBackendToString(comm_backend)

Function to get string name of transpose communication backend.

Parameters:

comm_backend [cudecompTransposeCommBackend,in] :: A cudecompTransposeCommBackend value.

Return:

res [character(:)] :: A string representation of the transpose communication backend. Will return string “ERROR” if invalid backend value is provided.

cudecompHaloCommBackendToString

function cudecompHaloCommBackendToString(comm_backend)

Function to get string name of halo communication backend.

Parameters:

comm_backend [cudecompHaloCommBackend,in] :: A cudecompHaloCommBackend value.

Return:

res [character(:)] :: A string representation of the halo communication backend. Will return string “ERROR” if invalid backend value is provided.

cudecompGetGridDescConfig

function cudecompGetGridDescConfig(handle, grid_desc, config)

Queries the configuration used to create a grid descriptor.

This function retrieves the configuration associated with the provided grid descriptor and writes it to the supplied configuration structure.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
config [cudecompGridDescConfig,out] :: A cuDecomp grid descriptor configuration structure.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompGetShiftedRank

function cudecompGetShiftedRank(handle, grid_desc, axis, dim, displacement, periodic, shifted_rank)

Function to retrieve the global rank of neighboring processes.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
axis [integer,in] :: The domain axis the pencil is aligned with.
dim [integer,in] :: Which pencil dimension (global indexed) to retrieve neighboring rank
displacement [integer,in] :: Displacement of neighboring rank to retrieve. For example, 1 will retrieve the +1-th neighbor rank along dim, while -1 will retrieve the -1-th neighbor rank.
periodic [logical,in] :: A boolean flag to indicate whether dim should be treated periodically
shifted_rank [integer,out] :: The global rank of the requested neighbor. For non-periodic cases, a value of -1 will be written if the displacement results in a position outside the global domain.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
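A short sketch of a neighbor lookup follows, retrieving the lower and upper neighbors of the calling rank along one dimension of its X-axis pencil without periodicity. The module name `cudecomp` and the helper name `x_pencil_neighbors` are assumptions for illustration.

```fortran
module shifted_rank_sketch
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none
contains

  ! Hypothetical helper: finds the -1 and +1 neighbors along pencil_dim for X-pencils.
  function x_pencil_neighbors(handle, grid_desc, pencil_dim, lower, upper) result(istat)
    type(cudecompHandle), intent(in) :: handle
    type(cudecompGridDesc), intent(in) :: grid_desc
    integer, intent(in) :: pencil_dim  ! global-indexed dimension (1, 2 or 3)
    integer, intent(out) :: lower, upper
    integer :: istat

    lower = -1
    upper = -1

    ! axis = 1 (X-pencils), displacement = -1, non-periodic.
    istat = cudecompGetShiftedRank(handle, grid_desc, 1, pencil_dim, -1, .false., lower)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    ! Same query with displacement = +1.
    istat = cudecompGetShiftedRank(handle, grid_desc, 1, pencil_dim, 1, .false., upper)
  end function x_pencil_neighbors

end module shifted_rank_sketch
```

Because the lookup is non-periodic, a returned value of -1 indicates that the calling rank sits on the corresponding domain boundary.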

Transposition Functions

cudecompTransposeXToY

function cudecompTransposeXToY(handle, grid_desc, input, output, work, dtype[, input_halo_extents, output_halo_extents, input_padding, output_padding, stream])

Function to transpose data from X-axis aligned pencils to Y-axis aligned pencils.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,in] :: Device array containing input X-axis aligned pencil data.
output (*) [T,out] :: Device array to write output Y-axis aligned pencil data. If input and output are the same, operation is performed in-place.
work (*) [T,in] :: Device array to use for transpose workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
input_halo_extents (3) [integer,in,optional] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side). If not provided, input data is assumed to have no halos.
output_halo_extents (3) [integer,in,optional] :: Similar to input_halo_extents but for the output data. If not provided, output data is assumed to have no halos.
input_padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
output_padding (3) [integer,in,optional] :: Similar to input_padding, but for the output data.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompTransposeYToZ

function cudecompTransposeYToZ(handle, grid_desc, input, output, work, dtype[, input_halo_extents, output_halo_extents, input_padding, output_padding, stream])

Function to transpose data from Y-axis aligned pencils to Z-axis aligned pencils.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,in] :: Device array containing input Y-axis aligned pencil data.
output (*) [T,out] :: Device array to write output Z-axis aligned pencil data. If input and output are the same, operation is performed in-place.
work (*) [T,in] :: Device array to use for transpose workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
input_halo_extents (3) [integer,in,optional] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side). If not provided, input data is assumed to have no halos.
output_halo_extents (3) [integer,in,optional] :: Similar to input_halo_extents but for the output data. If not provided, output data is assumed to have no halos.
input_padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
output_padding (3) [integer,in,optional] :: Similar to input_padding, but for the output data.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompTransposeZToY

function cudecompTransposeZToY(handle, grid_desc, input, output, work, dtype[, input_halo_extents, output_halo_extents, input_padding, output_padding, stream])

Function to transpose data from Z-axis aligned pencils to Y-axis aligned pencils.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,in] :: Device array containing input Z-axis aligned pencil data.
output (*) [T,out] :: Device array to write output Y-axis aligned pencil data. If input and output are the same, operation is performed in-place.
work (*) [T,in] :: Device array to use for transpose workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
input_halo_extents (3) [integer,in,optional] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side). If not provided, input data is assumed to have no halos.
output_halo_extents (3) [integer,in,optional] :: Similar to input_halo_extents but for the output data. If not provided, output data is assumed to have no halos.
input_padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
output_padding (3) [integer,in,optional] :: Similar to input_padding, but for the output data.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompTransposeYToX

function cudecompTransposeYToX(handle, grid_desc, input, output, work, dtype[, input_halo_extents, output_halo_extents, input_padding, output_padding, stream])

Function to transpose data from Y-axis aligned pencils to X-axis aligned pencils.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,in] :: Device array containing input Y-axis aligned pencil data.
output (*) [T,out] :: Device array to write output X-axis aligned pencil data. If input and output are the same, operation is performed in-place.
work (*) [T,in] :: Device array to use for transpose workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
input_halo_extents (3) [integer,in,optional] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side). If not provided, input data is assumed to have no halos.
output_halo_extents (3) [integer,in,optional] :: Similar to input_halo_extents but for the output data. If not provided, output data is assumed to have no halos.
input_padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
output_padding (3) [integer,in,optional] :: Similar to input_padding, but for the output data.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
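The four transpose functions can be chained into a full forward and backward cycle. The sketch below performs each step in-place on the default stream, without halos or padding. It assumes a CUDA Fortran compiler, a module named `cudecomp`, and that the caller has sized data_d to hold the largest of the X, Y and Z pencils and work_d according to cudecompGetTransposeWorkspaceSize. The helper name `transpose_roundtrip` is hypothetical.

```fortran
module transpose_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none
contains

  ! Hypothetical helper: X -> Y -> Z -> Y -> X, all in-place.
  function transpose_roundtrip(handle, grid_desc, data_d, work_d) result(istat)
    type(cudecompHandle), intent(in) :: handle
    type(cudecompGridDesc), intent(in) :: grid_desc
    real(real64), device, contiguous, intent(inout) :: data_d(:)  ! pencil data
    real(real64), device, contiguous, intent(inout) :: work_d(:)  ! transpose workspace
    integer :: istat

    ! Passing data_d as both input and output requests an in-place operation.
    istat = cudecompTransposeXToY(handle, grid_desc, data_d, data_d, work_d, CUDECOMP_DOUBLE)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    istat = cudecompTransposeYToZ(handle, grid_desc, data_d, data_d, work_d, CUDECOMP_DOUBLE)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    istat = cudecompTransposeZToY(handle, grid_desc, data_d, data_d, work_d, CUDECOMP_DOUBLE)
    if (istat /= CUDECOMP_RESULT_SUCCESS) return

    istat = cudecompTransposeYToX(handle, grid_desc, data_d, data_d, work_d, CUDECOMP_DOUBLE)
  end function transpose_roundtrip

end module transpose_sketch
```

In a real application, computation on the Y- and Z-aligned data would sit between the transposes. The dtype argument (here CUDECOMP_DOUBLE) determines how the data is accessed, so it should agree with the declared type of the buffers.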

Halo Exchange Functions

cudecompUpdateHalosX

function cudecompUpdateHalosX(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim[, padding, stream])

Function to perform halo communication of X-axis aligned pencil data.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,inout] :: Device array containing input X-axis aligned pencil data. On successful completion, this buffer will contain the input X-axis aligned pencil data with the specified halo regions updated.
work (*) [T,in] :: Device array to use for halo workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
halo_extents (3) [integer,in] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side).
halo_periods (3) [logical,in] :: An array of three boolean values to define halo periodicity of the input data, in global order. If the i-th entry in this array is true, the domain is treated periodically along the i-th global domain axis.
dim [integer,in] :: Which pencil dimension (global indexed) to perform the halo update.
padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompUpdateHalosY

function cudecompUpdateHalosY(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim[, padding, stream])

Function to perform halo communication of Y-axis aligned pencil data.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,inout] :: Device array containing input Y-axis aligned pencil data. On successful completion, this buffer will contain the input Y-axis aligned pencil data with the specified halo regions updated.
work (*) [T,in] :: Device array to use for halo workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
halo_extents (3) [integer,in] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side).
halo_periods (3) [logical,in] :: An array of three boolean values to define halo periodicity of the input data, in global order. If the i-th entry in this array is true, the domain is treated periodically along the i-th global domain axis.
dim [integer,in] :: Which pencil dimension (global indexed) to perform the halo update.
padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.

cudecompUpdateHalosZ

function cudecompUpdateHalosZ(handle, grid_desc, input, work, dtype, halo_extents, halo_periods, dim[, padding, stream])

Function to perform halo communication of Z-axis aligned pencil data.

For this operation, T can be one of real(real32), real(real64), complex(real32), complex(real64). The data access for this operation is controlled via dtype, irrespective of T.

Parameters:

handle [cudecompHandle,in] :: The initialized cuDecomp library handle
grid_desc [cudecompGridDesc,in] :: A cuDecomp grid descriptor.
input (*) [T,inout] :: Device array containing input Z-axis aligned pencil data. On successful completion, this buffer will contain the input Z-axis aligned pencil data with the specified halo regions updated.
work (*) [T,in] :: Device array to use for halo workspace.
dtype [cudecompDataType,in] :: The cudecompDataType to use for the operation.
halo_extents (3) [integer,in] :: An array of three integers to define halo region extents of the input data, in global order. The i-th entry in this array should contain the number of halo elements (per direction) expected along the i-th global domain axis. Symmetric halos are assumed (e.g. a value of one in halo_extents means there are 2 halo elements, one element on each side).
halo_periods (3) [logical,in] :: An array of three boolean values to define halo periodicity of the input data, in global order. If the i-th entry in this array is true, the domain is treated periodically along the i-th global domain axis.
dim [integer,in] :: Which pencil dimension (global indexed) to perform the halo update.
padding (3) [integer,in,optional] :: An array of three integers to define padding of the input data, in global order. The i-th entry in this array should contain the number of elements to treat as padding in the i-th global domain axis.
stream [integer(cuda_stream_kind),in,optional] :: CUDA stream to enqueue GPU operations into. If not provided, operations are enqueued in the default stream.

Return:

res [cudecompResult] :: CUDECOMP_RESULT_SUCCESS on success or error code on failure.
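Since each call updates halos along a single pencil dimension, a complete halo exchange loops over all three dimensions. The sketch below does this for X-axis aligned pencils with one halo element per side and full periodicity. It assumes a CUDA Fortran compiler, a module named `cudecomp`, that data_d was laid out using pencil information queried with the same halo extents, and that work_d was sized via cudecompGetHaloWorkspaceSize. The helper name `update_all_halos_x` is hypothetical.

```fortran
module halo_sketch
  use, intrinsic :: iso_fortran_env, only: real64
  use cudecomp  ! assumed module name for the cuDecomp Fortran bindings
  implicit none
contains

  ! Hypothetical helper: updates halos of an X-pencil along every global dimension.
  function update_all_halos_x(handle, grid_desc, data_d, work_d) result(istat)
    type(cudecompHandle), intent(in) :: handle
    type(cudecompGridDesc), intent(in) :: grid_desc
    real(real64), device, contiguous, intent(inout) :: data_d(:)  ! X-pencil data with halos
    real(real64), device, contiguous, intent(inout) :: work_d(:)  ! halo workspace
    integer :: istat

    integer, parameter :: halo_extents(3) = [1, 1, 1]                 ! illustrative: one element per side
    logical, parameter :: halo_periods(3) = [.true., .true., .true.]  ! illustrative: fully periodic
    integer :: idim

    istat = CUDECOMP_RESULT_SUCCESS
    do idim = 1, 3
      ! Each call updates the halo regions along one global dimension.
      istat = cudecompUpdateHalosX(handle, grid_desc, data_d, work_d, CUDECOMP_DOUBLE, &
                                   halo_extents, halo_periods, idim)
      if (istat /= CUDECOMP_RESULT_SUCCESS) return
    end do
  end function update_all_halos_x

end module halo_sketch
```

The same pattern would apply to Y- and Z-axis aligned pencils by calling cudecompUpdateHalosY or cudecompUpdateHalosZ instead.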