transpose.h
Functions handling transposes.
Functions
void nvte_cast_transpose(const NVTETensor input, NVTETensor cast_output, NVTETensor transposed_output, cudaStream_t stream)
Cast and transpose the input.
This function casts the input and produces 2 results:
- cast_output is the result of the cast.
- transposed_output is the transposed result of the cast.
Parameters
- input – [in] Input tensor of shape [N, H]. 
- cast_output – [inout] Result of the cast. Shape: [N, H]. 
- transposed_output – [inout] Result of the cast and transpose. Shape: [H, N]. 
- stream – [in] CUDA stream used for the operation. 
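The relationship between the two outputs can be illustrated with a small CPU reference. This is only a sketch of the semantics, not the library's CUDA kernel: the name `cast_transpose_ref` is hypothetical, tensors are modelled as row-major `std::vector`s, and `static_cast` stands in for the real cast (which for FP8 types may also involve scaling).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not the library's implementation.
// Reads a row-major [n, h] input and fills both outputs:
//   cast_output       [n, h]: element-wise cast of the input
//   transposed_output [h, n]: the same cast values, transposed
template <typename Out, typename In>
void cast_transpose_ref(const std::vector<In>& input, std::size_t n, std::size_t h,
                        std::vector<Out>& cast_output,
                        std::vector<Out>& transposed_output) {
  assert(input.size() == n * h);
  cast_output.resize(n * h);
  transposed_output.resize(h * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < h; ++j) {
      // Assumption: static_cast stands in for the real (possibly scaled) cast.
      const Out v = static_cast<Out>(input[i * h + j]);
      cast_output[i * h + j] = v;        // element (i, j) of the [n, h] output
      transposed_output[j * n + i] = v;  // element (j, i) of the [h, n] output
    }
  }
}
```

Both outputs hold the same cast values; only the memory layout differs.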
 
 
void nvte_transpose(const NVTETensor input, NVTETensor transposed_output, cudaStream_t stream)
Transpose the input.
Parameters
- input – [in] Input tensor of shape [N, H]. 
- transposed_output – [out] Result of the transpose. Shape: [H, N]. 
- stream – [in] CUDA stream used for the operation. 
 
 
void nvte_cast_transpose_dbias(const NVTETensor input, NVTETensor cast_output, NVTETensor transposed_output, NVTETensor dbias, NVTETensor workspace, cudaStream_t stream)
Cast and transpose the input. Additionally, reduce the input along the first dimension.
This function casts the input and produces 3 results:
- cast_output is the result of the cast.
- transposed_output is the transposed result of the cast.
- dbias is the result of the reduction of the input along the first dimension.
If workspace is an empty tensor, the function does not perform the operation. Instead, it sets the shape and type of the workspace tensor to the required values.
Parameters
- input – [in] Input tensor of shape [N, H]. 
- cast_output – [inout] Result of the cast. Shape: [N, H]. 
- transposed_output – [inout] Result of the cast and transpose. Shape: [H, N]. 
- dbias – [out] Result of the reduction of the input along the first dimension. Shape: [H]. 
- workspace – [out] Workspace tensor. 
- stream – [in] CUDA stream used for the operation. 
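As an illustration of the dbias result, the sketch below reduces a row-major [N, H] input along its first dimension on the CPU. It assumes the reduction is a sum (as is usual for a bias gradient); the function name `reduce_first_dim_ref` is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: sums a row-major [n, h] input over its first
// dimension, yielding one value per column (shape [h]).
// Assumption: the reduction is a plain sum.
std::vector<float> reduce_first_dim_ref(const std::vector<float>& input,
                                        std::size_t n, std::size_t h) {
  assert(input.size() == n * h);
  std::vector<float> dbias(h, 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < h; ++j) {
      dbias[j] += input[i * h + j];  // accumulate column j across all rows
    }
  }
  return dbias;
}
```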
 
 
void nvte_fp8_transpose_dbias(const NVTETensor input, NVTETensor transposed_output, NVTETensor dbias, NVTETensor workspace, cudaStream_t stream)
Transpose the FP8 input. Additionally, reduce the input along the first dimension.
This function takes FP8 input and produces 2 results:
- transposed_output is the transposed result of the input.
- dbias is the result of the reduction of the input along the first dimension.
If workspace is an empty tensor, the function does not perform the operation. Instead, it sets the shape and type of the workspace tensor to the required values.
Parameters
- input – [in] Input tensor of shape [N, H]. 
- transposed_output – [inout] Result of the transpose. Shape: [H, N]. 
- dbias – [out] Result of the reduction of the input along the first dimension. Shape: [H]. 
- workspace – [out] Workspace tensor. 
- stream – [in] CUDA stream used for the operation. 
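The workspace convention shared by the dbias functions implies a two-call pattern: a first call with an empty workspace to learn the required size, then a second call with an allocated workspace to do the work. The mock below models that pattern in plain C++. It makes no calls into the library; `Workspace` and `column_sum_with_workspace` are hypothetical names, and a single element count stands in for the real shape and type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a workspace tensor.
struct Workspace {
  std::size_t size = 0;       // required element count, filled in by a query call
  std::vector<float> buffer;  // storage; empty until the caller allocates it
};

// Mock operation following the query-then-run convention.
// Returns false if it only reported the workspace size, true if it ran.
bool column_sum_with_workspace(const std::vector<float>& input, std::size_t n,
                               std::size_t h, std::vector<float>& dbias,
                               Workspace& workspace) {
  assert(input.size() == n * h);
  if (workspace.buffer.empty()) {
    // Query call: report the requirement and do no work.
    workspace.size = h;
    return false;
  }
  assert(workspace.buffer.size() >= h);
  // Real call: use the workspace as scratch space for the accumulation.
  for (std::size_t j = 0; j < h; ++j) workspace.buffer[j] = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < h; ++j) {
      workspace.buffer[j] += input[i * h + j];
    }
  }
  dbias.assign(workspace.buffer.begin(), workspace.buffer.begin() + h);
  return true;
}
```

A caller would invoke the function once with a default-constructed `Workspace`, resize `buffer` to `size`, and then invoke it again with the same arguments.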
 
 
void nvte_cast_transpose_dbias_dgelu(const NVTETensor input, const NVTETensor gelu_input, NVTETensor cast_output, NVTETensor transposed_output, NVTETensor dbias, NVTETensor workspace, cudaStream_t stream)
Compute the backward of the GELU operation on the input, then cast and transpose the result. Additionally, reduce the result of the GELU backward along the first dimension.
This function produces 3 results:
- cast_output is equal to cast(dGELU(input))
- transposed_output is equal to transpose(cast(dGELU(input)))
- dbias is equal to reduce(dGELU(input), axis=0)
If workspace is an empty tensor, the function does not perform the operation. Instead, it sets the shape and type of the workspace tensor to the required values.
Parameters
- input – [in] Input tensor of shape [N, H]. 
- gelu_input – [in] Tensor used as input to the forward of GELU operation. Shape: [N, H]. 
- cast_output – [inout] Result of the cast. Shape: [N, H]. 
- transposed_output – [inout] Result of the cast and transpose. Shape: [H, N]. 
- dbias – [out] Result of the reduction of dGELU(input) along the first dimension. Shape: [H]. 
- workspace – [out] Workspace tensor. 
- stream – [in] CUDA stream used for the operation. 
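The three results can be sketched on the CPU as follows. Several things here are assumptions rather than statements about the library: dGELU(input) is read as the incoming gradient `input` multiplied by the GELU derivative evaluated at `gelu_input`, the exact erf-based GELU is used (the source does not say which variant the kernel implements), the reduction is a sum, and the cast step is omitted so everything stays in `float`. The names `gelu_grad` and `dbias_dgelu_ref` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Derivative of the exact (erf-based) GELU:
//   d/dx [x * Phi(x)] = Phi(x) + x * phi(x)
// Assumption: the exact variant, not the tanh approximation.
inline float gelu_grad(float x) {
  const float cdf = 0.5f * (1.0f + std::erf(x / std::sqrt(2.0f)));
  const float pdf = std::exp(-0.5f * x * x) / std::sqrt(2.0f * 3.14159265358979f);
  return cdf + x * pdf;
}

// Hypothetical CPU reference (cast omitted, all values kept in float):
//   out   [n, h] = grad * gelu'(gelu_input)
//   out_t [h, n] = transpose(out)
//   dbias [h]    = sum of out over the first dimension
void dbias_dgelu_ref(const std::vector<float>& grad,
                     const std::vector<float>& gelu_input,
                     std::size_t n, std::size_t h,
                     std::vector<float>& out, std::vector<float>& out_t,
                     std::vector<float>& dbias) {
  assert(grad.size() == n * h && gelu_input.size() == n * h);
  out.assign(n * h, 0.0f);
  out_t.assign(h * n, 0.0f);
  dbias.assign(h, 0.0f);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < h; ++j) {
      const float v = grad[i * h + j] * gelu_grad(gelu_input[i * h + j]);
      out[i * h + j] = v;
      out_t[j * n + i] = v;
      dbias[j] += v;
    }
  }
}
```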
 
 
void nvte_multi_cast_transpose(size_t num_tensors, const NVTETensor *input_list, NVTETensor *cast_output_list, NVTETensor *transposed_output_list, cudaStream_t stream)
Cast and transpose multiple tensors.
This function casts each input tensor and produces 2 results:
- cast_output is the result of the cast.
- transposed_output is the transposed result of the cast.
Parameters
- num_tensors – [in] Number of tensors. 
- input_list – [in] List of 2D input tensors. 
- cast_output_list – [inout] List of cast tensors. Dimensions match the tensors in input_list. 
- transposed_output_list – [inout] List of cast and transposed tensors. Dimensions are the transpose of the tensors in input_list. 
- stream – [in] CUDA stream used for the operation. 
 
 
void nvte_dgeglu_cast_transpose(const NVTETensor input, const NVTETensor geglu_input, NVTETensor cast_output, NVTETensor transposed_output, cudaStream_t stream)
Compute the backward of the GeGLU operation (dgeglu) on the input, then cast and transpose the dgeglu output.
This function produces 2 results:
- cast_output is the result of the cast.
- transposed_output is the transposed result of the cast.
Parameters
- input – [in] Input tensor of shape [N, H]. 
- geglu_input – [in] Tensor used as input to the forward of GeGLU operation. Shape: [N, H * 2]. 
- cast_output – [inout] Result of the cast. Shape: [N, H * 2]. 
- transposed_output – [inout] Result of the cast and transpose. Shape: [H * 2, N]. 
- stream – [in] CUDA stream used for the operation.
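The shapes above ([N, H] gradient in, [N, H * 2] result out) can be illustrated with a CPU sketch of a GeGLU backward pass. This rests on assumptions the source does not confirm: the forward pass is taken to be gelu(a) * b, where a is the first H columns of geglu_input and b the last H columns; the exact erf-based GELU is used; and the cast and transpose steps are omitted. The names `gelu`, `gelu_grad` and `dgeglu_ref` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact (erf-based) GELU and its derivative. Assumption: exact variant.
inline float gelu(float x) {
  return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
inline float gelu_grad(float x) {
  const float cdf = 0.5f * (1.0f + std::erf(x / std::sqrt(2.0f)));
  const float pdf = std::exp(-0.5f * x * x) / std::sqrt(2.0f * 3.14159265358979f);
  return cdf + x * pdf;
}

// Hypothetical CPU reference. Assumed forward: y = gelu(a) * b with
// a = geglu_input[:, :h] and b = geglu_input[:, h:].
// Returns a row-major [n, 2h] gradient: [d_a | d_b].
std::vector<float> dgeglu_ref(const std::vector<float>& grad,
                              const std::vector<float>& geglu_input,
                              std::size_t n, std::size_t h) {
  assert(grad.size() == n * h);
  assert(geglu_input.size() == n * 2 * h);
  std::vector<float> out(n * 2 * h);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < h; ++j) {
      const float a = geglu_input[i * 2 * h + j];
      const float b = geglu_input[i * 2 * h + h + j];
      const float g = grad[i * h + j];
      out[i * 2 * h + j] = g * gelu_grad(a) * b;  // gradient w.r.t. a
      out[i * 2 * h + h + j] = g * gelu(a);       // gradient w.r.t. b
    }
  }
  return out;
}
```

Under these assumptions, casting the returned [N, H * 2] buffer would correspond to cast_output, and its transpose to transposed_output.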