transformer_engine.h
Base classes and functions of Transformer Engine API.
Typedefs
-
typedef void *NVTETensor
TE Tensor type.
NVTETensor is a contiguous tensor type storing a pointer to data of a given shape and type. It does not own the memory it points to.
-
typedef void *NVTEQuantizationConfig
Configuration for tensor quantization.
-
typedef void *NVTEGroupedTensor
TE Grouped Tensor type.
NVTEGroupedTensor is a collection of tensors with potentially different shapes but the same dtype and scaling mode. It does not own the memory it points to.
Enums
-
enum NVTEDType
TE datatype.
Values:
-
enumerator kNVTEByte
Byte
-
enumerator kNVTEInt16
16-bit integer
-
enumerator kNVTEInt32
32-bit integer
-
enumerator kNVTEInt64
64-bit integer
-
enumerator kNVTEFloat32
32-bit float
-
enumerator kNVTEFloat16
16-bit float (E5M10)
-
enumerator kNVTEBFloat16
16-bit bfloat (E8M7)
-
enumerator kNVTEFloat8E4M3
8-bit float (E4M3)
-
enumerator kNVTEFloat8E5M2
8-bit float (E5M2)
-
enumerator kNVTEFloat8E8M0
8-bit float (E8M0)
-
enumerator kNVTEFloat4E2M1
4-bit float (E2M1)
-
enumerator kNVTENumTypes
Number of supported types
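kNVTEFloat4E2M1 packs elements smaller than a byte, so per-element sizes are easiest to reason about in bits (compare nvte_tensor_element_size_bits). The sketch below is a standalone illustration and is not part of the TE API. DTypeSketch is a hypothetical mirror of the enumerators above, and their numeric values are not reproduced. The sketch also assumes that sub-byte elements are packed densely and that the total is rounded up to whole bytes.

```cpp
#include <cstddef>

// Hypothetical standalone mirror of the NVTEDType enumerators (illustration only;
// the real enum and its numeric values live in transformer_engine.h).
enum class DTypeSketch {
  Byte, Int16, Int32, Int64, Float32, Float16, BFloat16,
  Float8E4M3, Float8E5M2, Float8E8M0, Float4E2M1
};

// Bits per element, read off the type names documented above.
constexpr std::size_t bits_per_element(DTypeSketch t) {
  switch (t) {
    case DTypeSketch::Float4E2M1:
      return 4;
    case DTypeSketch::Byte:
    case DTypeSketch::Float8E4M3:
    case DTypeSketch::Float8E5M2:
    case DTypeSketch::Float8E8M0:
      return 8;
    case DTypeSketch::Int16:
    case DTypeSketch::Float16:
    case DTypeSketch::BFloat16:
      return 16;
    case DTypeSketch::Int32:
    case DTypeSketch::Float32:
      return 32;
    case DTypeSketch::Int64:
      return 64;
  }
  return 0;  // not reached for valid enumerators
}

// Bytes needed for numel densely packed elements, rounded up to a whole byte
// (assumption: sub-byte elements are packed with no padding between them).
constexpr std::size_t packed_bytes(DTypeSketch t, std::size_t numel) {
  return (numel * bits_per_element(t) + 7) / 8;
}
```

For example, 10 E2M1 values occupy 5 bytes under this assumption, while 3 BFloat16 values occupy 6.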
-
enum NVTETensorParam
Indicates the kind of the tensor parameter to set/get.
Values:
-
enumerator kNVTERowwiseData
Data usable in rowwise manner
-
enumerator kNVTEColumnwiseData
Data usable in columnwise manner
-
enumerator kNVTEScale
Scale tensor
-
enumerator kNVTEAmax
Amax tensor
-
enumerator kNVTERowwiseScaleInv
Scale inverse tensor for decoding Rowwise Data
-
enumerator kNVTEColumnwiseScaleInv
Scale inverse tensor for decoding Columnwise Data
-
enumerator kNVTEColumnwiseAmax
Columnwise Amax tensor
-
enumerator kNVTEWithGEMMSwizzledScales
Whether scaling factors are in the format expected by GEMM
-
enumerator kNVTERowScaledNVFP4
Whether an NVFP4 tensor uses row scaling instead of tensor scaling.
Column-wise data is not supported with row scaling.
Row scaling affects the interpretation of the amax tensor. With tensor scaling, the amax tensor is a single FP32 value that must be computed prior to quantization. With row scaling, the amax tensor size is the number of tensor rows (flattened to 2D), and its values are populated during quantization.
-
enumerator kNVTENumTensorParams
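The amax sizing rule described for kNVTERowScaledNVFP4 can be sketched as follows. The helper is hypothetical and is not a TE function. It returns 1 for tensor scaling and the flattened row count for row scaling. It assumes that flattening to 2D keeps the last dimension as columns and collapses every leading dimension into rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of FP32 amax entries for an NVFP4 tensor.
// Assumption: flattening to 2D keeps the last dimension as columns and
// multiplies all leading dimensions together to get the row count.
std::size_t nvfp4_amax_numel_sketch(const std::vector<std::size_t>& shape, bool row_scaled) {
  if (!row_scaled) {
    return 1;  // tensor scaling: a single FP32 amax, computed before quantization
  }
  std::size_t rows = 1;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) {
    rows *= shape[i];
  }
  return rows;  // row scaling: one amax per row, filled in during quantization
}
```

Under this assumption, a row-scaled tensor of shape {4, 8, 16} needs 32 amax entries.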
-
enum NVTEScalingMode
Tensor data format.
Values:
-
enumerator NVTE_DELAYED_TENSOR_SCALING
Either an unquantized tensor or an FP8 tensor with per-tensor scaling
Not necessarily used for delayed tensor scaling. The unintuitive name reflects legacy usage.
-
enumerator NVTE_MXFP8_1D_SCALING
One scale per block of 32 consecutive elements in either the rowwise or columnwise direction
-
enumerator NVTE_BLOCK_SCALING_1D
Tensor is split into NxN quantization tiles or 1xN quantization tiles, which each yield a scale. The block_scaling_dim property of the quantizer selects the granularity.
-
enumerator NVTE_BLOCK_SCALING_2D
-
enumerator NVTE_NVFP4_1D_SCALING
One scale per block of 16 consecutive elements in either the rowwise or columnwise direction
-
enumerator NVTE_INVALID_SCALING
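The block sizes above translate into scale-factor counts. For a 2D tensor of rows x cols, one scale per block of consecutive elements gives the counts computed below. This is an illustrative sketch and not TE code. It assumes that a partial trailing block still receives its own scale. It also ignores any alignment padding that the real scale_inv buffer may carry; nvte_tensor_scale_inv_shape reports the actual shape.

```cpp
#include <cstddef>

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Rowwise: blocks of `block` consecutive elements run along each row.
constexpr std::size_t rowwise_block_scales(std::size_t rows, std::size_t cols, std::size_t block) {
  return rows * ceil_div(cols, block);
}

// Columnwise: blocks of `block` consecutive elements run down each column.
constexpr std::size_t columnwise_block_scales(std::size_t rows, std::size_t cols, std::size_t block) {
  return ceil_div(rows, block) * cols;
}

constexpr std::size_t kMxfp8Block = 32;  // NVTE_MXFP8_1D_SCALING
constexpr std::size_t kNvfp4Block = 16;  // NVTE_NVFP4_1D_SCALING
```

For a 128 x 256 MXFP8 tensor, either direction gives 1024 scales: 128 x 8 rowwise and 4 x 256 columnwise.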
-
enum NVTEQuantizationConfigAttribute
Type of option for tensor quantization.
Values:
-
enumerator kNVTEQuantizationConfigForcePow2Scales
Whether to force power of 2 scales
-
enumerator kNVTEQuantizationConfigAmaxEpsilon
Small value to add to amax for numerical stability
-
enumerator kNVTEQuantizationConfigNoopTensor
Noop tensor (containing a scalar). If the scalar's value is 1, the quantization kernel exits early. This is a tensor because the flag must be on the GPU to enable a conditional early exit, even when captured in a static CUDA graph.
-
enumerator kNVTEQuantizationConfigFloat8BlockScaleTensorFormat
Warning
Deprecated
-
enumerator kNVTEQuantizationConfigRNGState
RNG state (NVTETensor with 2 elements: seed and offset)
-
enumerator kNVTEQuantizationConfigNVFP42DQuantization
Whether to use 2D block scaling for NVFP4
-
enumerator kNVTEQuantizationConfigStochasticRounding
Whether to enable stochastic rounding
-
enumerator kNVTEQuantizationConfigUseFastMath
Whether to enable fast math operations with reduced accuracy.
Optimizations are kernel-specific and they may be applied inconsistently between kernels.
-
enumerator kNVTEQuantizationConfigNumAttributes
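The following is a sketch of how kNVTEQuantizationConfigAmaxEpsilon and kNVTEQuantizationConfigForcePow2Scales might interact for a per-tensor FP8 scale. It is an illustration under stated assumptions, not TE's kernel. It takes scale = fp8_max / (amax + epsilon), following the "add to amax" wording above. It also rounds a forced power-of-2 scale down, so that amax * scale does not exceed fp8_max. The value 448 used in the example is the largest finite E4M3 value.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative per-tensor scale computation (not TE's kernel).
// Assumptions: scale = fp8_max / (amax + epsilon); a forced power-of-2 scale is
// rounded down so that amax * scale stays within fp8_max.
float compute_scale_sketch(float amax, float fp8_max, float epsilon, bool force_pow2) {
  const float denom = amax + epsilon;
  if (denom <= 0.0f) {
    // No finite scale exists; a positive epsilon keeps us out of this branch.
    return std::numeric_limits<float>::infinity();
  }
  float scale = fp8_max / denom;
  if (force_pow2 && std::isfinite(scale)) {
    scale = std::exp2(std::floor(std::log2(scale)));  // round down to a power of 2
  }
  return scale;
}
```

With amax = 1 and E4M3 (448), the plain scale is 448 and the power-of-2 scale is 256. With amax = 0, only a positive epsilon yields a finite scale.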
-
enum NVTEGroupedTensorParam
Indicates the kind of the grouped tensor parameter to set/get.
Values:
-
enumerator kNVTEGroupedRowwiseData
Data usable in rowwise manner
-
enumerator kNVTEGroupedColumnwiseData
Data usable in columnwise manner
-
enumerator kNVTEGroupedScale
Scale tensor
-
enumerator kNVTEGroupedAmax
Amax tensor
-
enumerator kNVTEGroupedRowwiseScaleInv
Scale inverse tensor for decoding Rowwise Data
-
enumerator kNVTEGroupedColumnwiseScaleInv
Scale inverse tensor for decoding Columnwise Data
-
enumerator kNVTEGroupedColumnwiseAmax
Columnwise Amax tensor
-
enumerator kNVTEGroupedFirstDims
First dimension sizes (device pointer to int64_t array)
-
enumerator kNVTEGroupedLastDims
Last dimension sizes (device pointer to int64_t array)
-
enumerator kNVTEGroupedTensorOffsets
Tensor offsets for contiguous layout (device pointer to int64_t array)
-
enumerator kNVTEGroupedWithGEMMSwizzledScales
Whether scaling factors are in the format expected by GEMM
-
enumerator kNVTENumGroupedTensorParams
Functions
-
NVTETensor nvte_create_tensor(NVTEScalingMode scaling_mode)
Create a new TE tensor.
Before use, its parameters need to be set. TE tensors are just wrappers on top of raw data and do not own memory.
- Parameters:
scaling_mode – [in] Scaling mode of the tensor.
- Returns:
A new TE tensor.
-
void nvte_destroy_tensor(NVTETensor tensor)
Destroy a TE tensor.
Since the TE tensor does not own memory, the underlying data is not freed during this operation.
- Parameters:
tensor – [in] Tensor to be destroyed.
-
void *nvte_tensor_data(const NVTETensor tensor)
Get a raw pointer to the tensor’s rowwise data.
- Parameters:
tensor – [in] Tensor.
- Returns:
A raw pointer to tensor’s rowwise data.
-
void *nvte_tensor_columnwise_data(const NVTETensor tensor)
Get a raw pointer to the tensor’s columnwise data.
- Parameters:
tensor – [in] Tensor.
- Returns:
A raw pointer to tensor’s columnwise data.
-
NVTEShape nvte_make_shape(const size_t *data, size_t ndim)
Construct a shape from an array of dimension sizes.
- Parameters:
data – [in] Pointer to start of shape array. If NULL, the shape will be filled with zeros.
ndim – [in] Number of dimensions (must be <= 14).
- Returns:
A shape. The shape holds its own copy of the dimension sizes.
-
NVTEShape nvte_tensor_shape(const NVTETensor tensor)
Get a tensor’s data shape.
- Parameters:
tensor – [in] Tensor.
- Returns:
A shape of the input tensor.
-
NVTEShape nvte_tensor_columnwise_shape(const NVTETensor tensor)
Get a tensor’s columnwise data shape.
- Parameters:
tensor – [in] Tensor.
- Returns:
Columnwise data shape of the input tensor.
-
size_t nvte_tensor_ndims(const NVTETensor tensor)
Get a tensor’s number of dimensions.
- Parameters:
tensor – [in] Tensor.
- Returns:
Number of tensor dimensions.
-
size_t nvte_tensor_size(const NVTETensor tensor, const size_t dim)
Get the size of a specific tensor dimension.
- Parameters:
tensor – [in] Tensor.
dim – [in] Dimension index.
- Returns:
Size of the tensor at the specified dimension.
-
size_t nvte_tensor_size_bytes(const NVTETensor tensor)
Get the byte size for the tensor.
- Parameters:
tensor – [in] Tensor.
- Returns:
Byte size of the tensor.
-
size_t nvte_tensor_numel(const NVTETensor tensor)
Get a tensor’s total number of elements.
- Parameters:
tensor – [in] Tensor.
- Returns:
Number of elements in the tensor.
-
size_t nvte_tensor_element_size(const NVTETensor tensor)
Get the byte size for the tensor’s data type.
- Parameters:
tensor – [in] Tensor.
- Returns:
Byte size of the tensor’s data type.
-
size_t nvte_tensor_element_size_bits(const NVTETensor tensor)
Get the bit size for the tensor’s data type.
- Parameters:
tensor – [in] Tensor.
- Returns:
Bit size of the tensor’s data type.
-
NVTEDType nvte_tensor_type(const NVTETensor tensor)
Get a tensor’s data type.
- Parameters:
tensor – [in] Tensor.
- Returns:
A data type of the input tensor.
-
float *nvte_tensor_amax(const NVTETensor tensor)
Get a pointer to the tensor’s amax data.
- Parameters:
tensor – [in] Tensor.
- Returns:
A pointer to tensor’s amax data.
-
float *nvte_tensor_scale(const NVTETensor tensor)
Get a pointer to the tensor’s scale data.
- Parameters:
tensor – [in] Tensor.
- Returns:
A pointer to tensor’s scale data.
-
float *nvte_tensor_scale_inv(const NVTETensor tensor)
Get a pointer to the tensor’s inverse of scale data.
- Parameters:
tensor – [in] Tensor.
- Returns:
A pointer to tensor’s inverse of scale data.
-
NVTEShape nvte_tensor_scale_inv_shape(const NVTETensor tensor)
Get a tensor’s scale_inv shape.
- Parameters:
tensor – [in] Tensor.
- Returns:
A scale_inv shape of the input tensor.
-
void nvte_zero_tensor(const NVTETensor tensor, cudaStream_t stream)
Reset tensor value to zero.
- Parameters:
tensor – [in] Tensor.
stream – [in] CUDA stream to use for the operation.
-
void nvte_set_tensor_param(NVTETensor *tensor, NVTETensorParam param_name, const NVTEBasicTensor *param)
Set a parameter of the tensor.
Warning
Deprecated in favor of nvte_set_tensor_param_v2.
- Parameters:
tensor – [inout] Tensor.
param_name – [in] The parameter to be set.
param – [in] The value to be set.
-
NVTEBasicTensor nvte_get_tensor_param(const NVTETensor tensor, NVTETensorParam param_name)
Get a value of the parameter of the tensor.
Warning
Deprecated in favor of nvte_get_tensor_param_v2.
- Parameters:
tensor – [in] Tensor.
param_name – [in] The parameter to be queried.
- Returns:
Value of the parameter as an NVTEBasicTensor.
-
void nvte_set_tensor_param_v2(NVTETensor tensor, NVTETensorParam param, const void *buf, size_t size_in_bytes)
Set a tensor parameter.
- Parameters:
tensor – [inout] Tensor.
param – [in] Tensor parameter type.
buf – [in] Memory address to read parameter value.
size_in_bytes – [in] Size of buf.
-
void nvte_get_tensor_param_v2(const NVTETensor tensor, NVTETensorParam param, void *buf, size_t size_in_bytes, size_t *size_written)
Query a tensor parameter.
- Parameters:
tensor – [in] Tensor.
param – [in] Tensor parameter type.
buf – [out] Memory address to write parameter value. Ignored if NULL.
size_in_bytes – [in] Size of buf.
size_written – [out] Number of bytes that have been written to buf. If buf is NULL, then the number of bytes that would have been written.
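The buf / size_in_bytes / size_written contract above supports a two-call pattern: first query the size with buf = NULL, then allocate a buffer and fetch the value. The sketch below illustrates that pattern with get_param_sketch, a hypothetical stand-in that follows the documented contract. It is not a TE function, the stored bytes are placeholders, and treating an undersized buffer as an error is an assumption. nvte_get_quantization_config_attribute and nvte_get_grouped_tensor_param document the same contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in following the documented buf / size_in_bytes / size_written
// contract. NOT a Transformer Engine function; the stored value is a placeholder.
void get_param_sketch(void* buf, std::size_t size_in_bytes, std::size_t* size_written) {
  static const unsigned char stored_value[3] = {1, 2, 3};
  const std::size_t needed = sizeof(stored_value);
  if (size_written != nullptr) {
    *size_written = needed;  // reported even when buf is NULL
  }
  if (buf == nullptr) {
    return;  // size query only
  }
  assert(size_in_bytes >= needed && "buffer too small");  // assumed error handling
  std::memcpy(buf, stored_value, needed);
}

// Caller pattern: ask for the size, allocate, then fetch.
std::vector<unsigned char> fetch_param_sketch() {
  std::size_t needed = 0;
  get_param_sketch(nullptr, 0, &needed);
  std::vector<unsigned char> out(needed);
  std::size_t written = 0;
  get_param_sketch(out.data(), out.size(), &written);
  out.resize(written);
  return out;
}
```

For fixed-size parameters, the size query can be skipped and a buffer of known size passed directly.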
-
NVTEScalingMode nvte_tensor_scaling_mode(const NVTETensor tensor)
Get the scaling mode of this tensor.
- Parameters:
tensor – [in] Tensor.
- Returns:
Scaling mode of the tensor, which determines the granularity of its scaling.
-
void nvte_tensor_pack_create(NVTETensorPack *pack)
Create tensors in NVTETensorPack.
-
void nvte_tensor_pack_destroy(NVTETensorPack *pack)
Destroy tensors in NVTETensorPack.
-
NVTEQuantizationConfig nvte_create_quantization_config()
Create a new quantization config.
- Returns:
A new quantization config.
-
void nvte_get_quantization_config_attribute(NVTEQuantizationConfig config, NVTEQuantizationConfigAttribute attr, void *buf, size_t size_in_bytes, size_t *size_written)
Query an option in quantization config.
- Parameters:
config – [in] Quantization config.
attr – [in] Option type.
buf – [out] Memory address to write option value. Ignored if NULL.
size_in_bytes – [in] Size of buf.
size_written – [out] Number of bytes that have been written to buf. If buf is NULL, then the number of bytes that would have been written.
-
void nvte_set_quantization_config_attribute(NVTEQuantizationConfig config, NVTEQuantizationConfigAttribute attr, const void *buf, size_t size_in_bytes)
Set an option in quantization config.
- Parameters:
config – [inout] Quantization config.
attr – [in] Option type.
buf – [in] Memory address to read option value.
size_in_bytes – [in] Size of buf.
-
void nvte_destroy_quantization_config(NVTEQuantizationConfig config)
Destroy a quantization config.
- Parameters:
config – [in] Config to be destroyed.
-
int nvte_is_non_tn_fp8_gemm_supported()
Check whether non-TN FP8 GEMM is supported.
- Returns:
A flag indicating whether non-TN FP8 GEMM is supported.
-
void nvte_memset(void *ptr, int value, size_t size_in_bytes, cudaStream_t stream)
Performs a memset on size_in_bytes bytes of memory starting at the given pointer.
This function calls a fill kernel for small sizes and calls cudaMemsetAsync for larger sizes.
- Parameters:
ptr – [in] Pointer to the memory to be set.
value – [in] Value to set the memory to.
size_in_bytes – [in] Size of the memory in bytes.
stream – [in] CUDA stream to use for the operation.
-
void nvte_splits_to_offsets(const int64_t *first_dims, int64_t *output, size_t num_tensors, int64_t logical_last_dim, cudaStream_t stream)
Compute scaled prefix-sum offsets for grouped tensors.
Computes output[0] = 0 and output[i + 1] = sum_{j=0..i}(first_dims[j] * logical_last_dim) for i in [0, num_tensors - 1].
- Parameters:
first_dims – [in] Pointer to device int64 array of size num_tensors.
output – [out] Pointer to device int64 array of size num_tensors + 1.
num_tensors – [in] Number of entries in first_dims.
logical_last_dim – [in] Scale factor applied to each first_dims entry.
stream – [in] CUDA stream to use for the operation.
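A host-side reference for the formula above can help when checking results. This is an illustrative sketch only. splits_to_offsets_ref is a hypothetical name, and it works on std::vector on the CPU, whereas nvte_splits_to_offsets reads and writes device arrays on a CUDA stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: output[0] = 0 and
// output[i + 1] = output[i] + first_dims[i] * logical_last_dim.
std::vector<std::int64_t> splits_to_offsets_ref(const std::vector<std::int64_t>& first_dims,
                                                std::int64_t logical_last_dim) {
  std::vector<std::int64_t> output(first_dims.size() + 1, 0);
  for (std::size_t i = 0; i < first_dims.size(); ++i) {
    assert(first_dims[i] >= 0 && "dimension sizes cannot be negative");
    output[i + 1] = output[i] + first_dims[i] * logical_last_dim;
  }
  return output;
}
```

For first_dims = {2, 3, 1} and logical_last_dim = 4, it yields {0, 8, 20, 24}. These read as element offsets of three row-blocks stored back to back. Using such offsets for kNVTEGroupedTensorOffsets seems a plausible fit, but the description above does not state it.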
-
NVTEGroupedTensor nvte_create_grouped_tensor(NVTEScalingMode scaling_mode, size_t num_tensors, NVTEShape logical_shape)
Create a new TE grouped tensor.
Before use, its parameters need to be set. TE grouped tensors are just wrappers on top of raw data and do not own memory.
- Parameters:
scaling_mode – [in] Scaling mode of the grouped tensor.
num_tensors – [in] Number of tensors in the group (must be > 0).
logical_shape – [in] Logical 2D shape of the grouped data.
- Returns:
A new TE grouped tensor.
-
void nvte_destroy_grouped_tensor(NVTEGroupedTensor tensor)
Destroy a TE grouped tensor.
Since the TE grouped tensor does not own memory, the underlying data is not freed during this operation.
- Parameters:
tensor – [in] Grouped tensor to be destroyed.
-
void nvte_set_grouped_tensor_param(NVTEGroupedTensor tensor, NVTEGroupedTensorParam param, const void *buf, size_t size_in_bytes)
Set a grouped tensor parameter.
- Parameters:
tensor – [inout] Grouped tensor.
param – [in] Grouped tensor parameter type.
buf – [in] Memory address to read parameter value.
size_in_bytes – [in] Size of buf.
-
void nvte_get_grouped_tensor_param(const NVTEGroupedTensor tensor, NVTEGroupedTensorParam param, void *buf, size_t size_in_bytes, size_t *size_written)
Query a grouped tensor parameter.
- Parameters:
tensor – [in] Grouped tensor.
param – [in] Grouped tensor parameter type.
buf – [out] Memory address to write parameter value. Ignored if NULL.
size_in_bytes – [in] Size of buf.
size_written – [out] Number of bytes that have been written to buf. If buf is NULL, then the number of bytes that would have been written.
-
size_t nvte_grouped_tensor_num_tensors(const NVTEGroupedTensor tensor)
Get the number of tensors in a grouped tensor.
- Parameters:
tensor – [in] Grouped tensor.
- Returns:
Number of tensors in the group.
-
NVTEDType nvte_grouped_tensor_type(const NVTEGroupedTensor tensor)
Get a grouped tensor’s data type.
- Parameters:
tensor – [in] Grouped tensor.
- Returns:
A data type of the grouped tensor.
-
NVTEScalingMode nvte_grouped_tensor_scaling_mode(const NVTEGroupedTensor tensor)
Get a scaling mode of the grouped tensor.
- Parameters:
tensor – [in] Grouped tensor.
- Returns:
Scaling mode of the grouped tensor.
-
NVTEShape nvte_get_grouped_tensor_logical_shape(const NVTEGroupedTensor tensor)
Get the logical shape of a grouped tensor.
- Parameters:
tensor – [in] Grouped tensor.
- Returns:
Logical 2D shape.
-
struct NVTEShape
- #include <transformer_engine.h>
Shape of the tensor.
-
struct NVTEBasicTensor
- #include <transformer_engine.h>
A basic tensor type used to populate parameters of NVTETensor. It does not own the memory it points to.
-
struct NVTETensorPack
- #include <transformer_engine.h>
Pack of tensors, generally used for auxiliary outputs.
Public Members
-
NVTETensor tensors[MAX_SIZE]
Wrappers of tensors. They do not hold the associated memory.
-
size_t size = 0
Actual number of tensors in the pack, 0 <= size <= MAX_SIZE.
Public Static Attributes
-
static const int MAX_SIZE = 10
Max number of tensors in the pack. Assumed <= 10.
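The fixed-capacity layout of NVTETensorPack (a MAX_SIZE array plus a size count) can be sketched as follows. TensorPackSketch and pack_push_sketch are hypothetical illustrations and are not part of the TE API. In the sketch, void* stands in for NVTETensor. It only maintains the 0 <= size <= MAX_SIZE invariant and never owns tensor memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the NVTETensor handle type (an opaque pointer).
using TensorHandleSketch = void*;

// Hypothetical mirror of NVTETensorPack's layout; it never owns tensor memory.
struct TensorPackSketch {
  static constexpr int MAX_SIZE = 10;         // same bound as NVTETensorPack::MAX_SIZE
  TensorHandleSketch tensors[MAX_SIZE] = {};  // non-owning handles
  std::size_t size = 0;                       // valid entries: 0 <= size <= MAX_SIZE
};

// Append a handle, keeping the size invariant; returns false if the pack is full.
bool pack_push_sketch(TensorPackSketch& pack, TensorHandleSketch t) {
  if (pack.size >= static_cast<std::size_t>(TensorPackSketch::MAX_SIZE)) {
    return false;
  }
  pack.tensors[pack.size++] = t;
  assert(pack.size <= static_cast<std::size_t>(TensorPackSketch::MAX_SIZE));
  return true;
}
```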
-
namespace transformer_engine
Namespace containing C++ API of Transformer Engine.
Enums
Functions
-
inline bool is_fp8_dtype(const DType t)
Check if TE datatype is FP8.
- Parameters:
t – [in] TE datatype of interest.
- Returns:
true if t is an FP8 datatype.
-
struct GroupedTensorWrapper
- #include <transformer_engine.h>
C++ wrapper for the NVTEGroupedTensor class.
Public Functions
-
inline GroupedTensorWrapper(const size_t num_tensors, const NVTEShape &logical_shape, const NVTEScalingMode scaling_mode = NVTE_DELAYED_TENSOR_SCALING)
Constructs new GroupedTensorWrapper.
Create a new TE grouped tensor with a given logical shape. TE grouped tensors are just wrappers on top of raw data and do not own memory.
- Parameters:
num_tensors – [in] Number of tensors in the group (must be > 0).
logical_shape – [in] Logical 2D shape of the grouped data.
scaling_mode – [in] Tensor data format.
-
inline GroupedTensorWrapper(const size_t num_tensors, const std::vector<size_t> &logical_shape, const NVTEScalingMode scaling_mode = NVTE_DELAYED_TENSOR_SCALING)
Constructs new GroupedTensorWrapper.
Create a new TE grouped tensor with a given logical shape.
- Parameters:
num_tensors – [in] Number of tensors in the group (must be > 0).
logical_shape – [in] Logical 2D shape of the grouped data.
scaling_mode – [in] Tensor data format.
-
inline ~GroupedTensorWrapper()
GroupedTensorWrapper destructor.
-
GroupedTensorWrapper &operator=(const GroupedTensorWrapper &other) = delete
-
GroupedTensorWrapper(const GroupedTensorWrapper &other) = delete
-
inline GroupedTensorWrapper(GroupedTensorWrapper &&other)
Constructs new GroupedTensorWrapper from existing GroupedTensorWrapper.
-
inline GroupedTensorWrapper &operator=(GroupedTensorWrapper &&other)
Assign the data from existing GroupedTensorWrapper.
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_parameter(const NVTEGroupedTensorParam param, void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_rowwise_data(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_columnwise_data(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_scale(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_amax(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_rowwise_scale_inv(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_columnwise_scale_inv(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_columnwise_amax(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_first_dims(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_last_dims(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline GroupedTensorWrapper &set_tensor_offsets(void *dptr, DType type, const ShapeType &shape) noexcept
-
inline void set_with_gemm_swizzled_scales(bool with_gemm_swizzled_scales)
-
inline NVTEBasicTensor get_parameter(const NVTEGroupedTensorParam param) const noexcept
-
inline NVTEBasicTensor get_rowwise_data() const noexcept
-
inline NVTEBasicTensor get_columnwise_data() const noexcept
-
inline NVTEBasicTensor get_scale() const noexcept
-
inline NVTEBasicTensor get_amax() const noexcept
-
inline NVTEBasicTensor get_rowwise_scale_inv() const noexcept
-
inline NVTEBasicTensor get_columnwise_scale_inv() const noexcept
-
inline NVTEBasicTensor get_columnwise_amax() const noexcept
-
inline NVTEBasicTensor get_first_dims() const noexcept
-
inline NVTEBasicTensor get_last_dims() const noexcept
-
inline NVTEBasicTensor get_tensor_offsets() const noexcept
-
inline bool get_with_gemm_swizzled_scales() const
-
inline NVTEGroupedTensor data() const noexcept
Get an underlying NVTEGroupedTensor.
- Returns:
NVTEGroupedTensor held by this GroupedTensorWrapper.
-
inline size_t num_tensors() const noexcept
Get the number of tensors in this GroupedTensorWrapper.
-
inline DType dtype() const noexcept
Get the data type of this GroupedTensorWrapper.
-
inline NVTEScalingMode scaling_mode() const noexcept
Get a scaling mode of the grouped tensor.
-
inline const NVTEShape logical_shape() const noexcept
Get the logical shape of this GroupedTensorWrapper.
Public Static Attributes
-
static constexpr size_t defaultData = 1
-
static constexpr NVTEShape defaultShape = {{defaultData, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 1}
Private Functions
Private Members
-
NVTEGroupedTensor tensor_ = nullptr
Wrapped NVTEGroupedTensor.
-
struct QuantizationConfigWrapper
- #include <transformer_engine.h>
C++ wrapper for NVTEQuantizationConfig.
Public Functions
-
inline QuantizationConfigWrapper()
-
QuantizationConfigWrapper(const QuantizationConfigWrapper&) = delete
-
QuantizationConfigWrapper &operator=(const QuantizationConfigWrapper&) = delete
-
inline QuantizationConfigWrapper(QuantizationConfigWrapper &&other)
Move constructor.
-
inline QuantizationConfigWrapper &operator=(QuantizationConfigWrapper &&other)
Move-assignment operator.
-
inline ~QuantizationConfigWrapper()
-
inline operator NVTEQuantizationConfig() const noexcept
Get the underlying NVTEQuantizationConfig.
- Returns:
NVTEQuantizationConfig held by this QuantizationConfigWrapper.
-
inline void set_force_pow_2_scales(bool force_pow_2_scales)
Set whether to force power of 2 scales.
-
inline void set_amax_epsilon(float amax_epsilon)
Set the small value added to amax for numerical stability.
-
inline void set_noop_tensor(NVTETensor noop_tensor)
Set noop tensor pointer.
-
inline void set_float8_block_scale_tensor_format(Float8BlockScaleTensorFormat format)
Warning
Deprecated
-
inline void set_rng_state(NVTETensor rng_state)
Set the RNG state used for stochastic rounding.
-
inline void set_nvfp4_2d_quantization(bool nvfp4_2d_quantization)
Set whether to use 2D block scaling for NVFP4.
-
inline void set_stochastic_rounding(bool stochastic_rounding)
Set whether to use stochastic rounding.
-
inline void set_use_fast_math(bool use_fast_math)
Set whether to enable fast math operations.
Private Members
-
NVTEQuantizationConfig config_ = nullptr
Wrapped NVTEQuantizationConfig.
-
struct TensorWrapper
- #include <transformer_engine.h>
C++ wrapper for the NVTETensor class.
Public Functions
-
inline TensorWrapper(void *dptr, const NVTEShape &shape, const DType dtype, float *amax_dptr = nullptr, float *scale_dptr = nullptr, float *scale_inv_dptr = nullptr, NVTEShape scale_inv_shape = defaultShape, const NVTEScalingMode scaling_mode = NVTE_DELAYED_TENSOR_SCALING)
Constructs new TensorWrapper.
Create a new TE tensor with a given shape, datatype and data. TE tensors are just wrappers on top of raw data and do not own memory.
- Parameters:
dptr – [in] Pointer to the tensor data.
shape – [in] Shape of the tensor.
dtype – [in] Data type of the tensor.
amax_dptr – [in] Pointer to the amax value.
scale_dptr – [in] Pointer to the scale value.
scale_inv_dptr – [in] Pointer to the inverse of scale value.
scale_inv_shape – [in] Shape of scale_inv.
scaling_mode – [in] Tensor data format.
-
inline TensorWrapper(void *dptr, const std::vector<size_t> &shape, const DType dtype, float *amax_dptr = nullptr, float *scale_dptr = nullptr, float *scale_inv_dptr = nullptr, const std::vector<size_t> &scale_inv_shape = {1}, const NVTEScalingMode scaling_mode = NVTE_DELAYED_TENSOR_SCALING)
Constructs new TensorWrapper.
Create a new TE tensor with a given shape, datatype and data. TE tensors are just wrappers on top of raw data and do not own memory.
- Parameters:
dptr – [in] Pointer to the tensor data.
shape – [in] Shape of the tensor.
dtype – [in] Data type of the tensor.
amax_dptr – [in] Pointer to the amax value.
scale_dptr – [in] Pointer to the scale value.
scale_inv_dptr – [in] Pointer to the inverse of scale value.
scale_inv_shape – [in] Shape of scale_inv.
scaling_mode – [in] Tensor data format.
-
inline explicit TensorWrapper(const NVTEScalingMode scaling_mode = NVTE_DELAYED_TENSOR_SCALING)
Constructs new empty TensorWrapper.
Create a new empty TE tensor which holds nothing.
-
inline ~TensorWrapper()
TensorWrapper destructor.
-
TensorWrapper &operator=(const TensorWrapper &other) = delete
-
TensorWrapper(const TensorWrapper &other) = delete
-
inline TensorWrapper(TensorWrapper &&other)
Constructs new TensorWrapper from existing TensorWrapper.
Pass an existing TE tensor to a new TensorWrapper.
- Parameters:
other – [inout] The source of the data.
-
inline TensorWrapper &operator=(TensorWrapper &&other)
Assign the data from existing TensorWrapper.
Transfer ownership of an existing TE tensor handle to this TensorWrapper.
- Parameters:
other – [inout] The source of the data.
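The deleted copy operations and the move constructor / move assignment above match the usual move-only handle-wrapper idiom. Below is a minimal sketch of that idiom, assuming TensorWrapper behaves similarly. create_handle_sketch and destroy_handle_sketch are hypothetical stand-ins for nvte_create_tensor / nvte_destroy_tensor, and a counter tracks live handles. As with TE tensors, the wrapper owns only the handle, not any tensor data.

```cpp
#include <cassert>
#include <utility>

// Hypothetical C-style handle API (stand-ins, not Transformer Engine functions),
// with a counter so the number of live handles can be checked.
static int g_live_handles = 0;
using HandleSketch = int*;
HandleSketch create_handle_sketch() { ++g_live_handles; return new int(0); }
void destroy_handle_sketch(HandleSketch h) {
  if (h != nullptr) { --g_live_handles; delete h; }
}

// Move-only wrapper: copying is deleted; moving transfers the handle and
// leaves the source empty, so each handle is destroyed exactly once.
class WrapperSketch {
 public:
  WrapperSketch() : handle_(create_handle_sketch()) {}
  ~WrapperSketch() { destroy_handle_sketch(handle_); }
  WrapperSketch(const WrapperSketch&) = delete;
  WrapperSketch& operator=(const WrapperSketch&) = delete;
  WrapperSketch(WrapperSketch&& other) noexcept : handle_(other.handle_) {
    other.handle_ = nullptr;
  }
  WrapperSketch& operator=(WrapperSketch&& other) noexcept {
    if (this != &other) {
      destroy_handle_sketch(handle_);  // release the handle we currently hold
      handle_ = other.handle_;
      other.handle_ = nullptr;
    }
    return *this;
  }
  HandleSketch get() const noexcept { return handle_; }

 private:
  HandleSketch handle_ = nullptr;
};
```

Nulling out the source is what makes the destructor of a moved-from wrapper a harmless no-op.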
-
template<typename ShapeType>
inline TensorWrapper &set_parameter(const NVTETensorParam param, void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_rowwise_data(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_columnwise_data(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_scale(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_amax(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_rowwise_scale_inv(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_columnwise_scale_inv(void *dptr, DType type, const ShapeType &shape) noexcept
-
template<typename ShapeType>
inline TensorWrapper &set_columnwise_amax(void *dptr, DType type, const ShapeType &shape) noexcept
-
inline void set_with_gemm_swizzled_scales(bool with_gemm_swizzled_scales)
-
inline void set_row_scaled_nvfp4(bool row_scaled_nvfp4)
-
inline NVTEBasicTensor get_parameter(const NVTETensorParam param) const noexcept
-
inline NVTEBasicTensor get_rowwise_data() const noexcept
-
inline NVTEBasicTensor get_columnwise_data() const noexcept
-
inline NVTEBasicTensor get_scale() const noexcept
-
inline NVTEBasicTensor get_amax() const noexcept
-
inline NVTEBasicTensor get_rowwise_scale_inv() const noexcept
-
inline NVTEBasicTensor get_columnwise_scale_inv() const noexcept
-
inline NVTEBasicTensor get_columnwise_amax() const noexcept
-
inline bool get_with_gemm_swizzled_scales() const
-
inline bool get_row_scaled_nvfp4() const
-
inline NVTETensor data() const noexcept
Get an underlying NVTETensor.
- Returns:
NVTETensor held by this TensorWrapper.
-
inline const NVTEShape shape() const noexcept
Get the shape of this TensorWrapper.
- Returns:
Shape of this TensorWrapper.
-
inline const NVTEShape columnwise_shape() const noexcept
Get the columnwise data shape of this TensorWrapper.
- Returns:
Columnwise data shape of this TensorWrapper.
-
inline size_t size(const size_t dim) const
Get the size of this TensorWrapper in the given dimension.
- Parameters:
dim – [in] Dimension index.
- Returns:
Size of this TensorWrapper in given dimension.
-
inline size_t ndim() const noexcept
Get the number of dimensions for this TensorWrapper.
- Returns:
Number of dimensions for this TensorWrapper.
-
inline size_t numel() const noexcept
Get the number of allocated elements in the tensor. This will return 0 for tensors with nullptr data even if the TensorWrapper has a non-zero shape.
- Returns:
Number of elements in the tensor.
-
inline size_t element_size() const noexcept
Get the tensor’s element size in bytes.
- Returns:
Element size in bytes.
-
inline size_t element_size_bits() const noexcept
Get the tensor’s element size in bits.
- Returns:
Element size in bits.
-
inline size_t bytes() const noexcept
Get the tensor’s allocated size in bytes. This will return 0 for tensors with nullptr data even if the TensorWrapper has a non-zero shape and valid dtype.
- Returns:
Total tensor size in bytes.
-
inline DType dtype() const noexcept
Get the data type of this TensorWrapper.
- Returns:
Data type of this TensorWrapper.
-
inline void *dptr() const noexcept
Get a raw pointer to the tensor’s data.
- Returns:
A raw pointer to tensor’s data.
-
inline void *columnwise_dptr() const noexcept
Get a raw pointer to the tensor’s columnwise data.
- Returns:
A raw pointer to tensor’s columnwise data.
-
inline float *amax() const noexcept
Get a pointer to the tensor’s amax data.
- Returns:
A pointer to tensor’s amax data.
-
inline float *scale() const noexcept
Get a pointer to the tensor’s scale data.
- Returns:
A pointer to tensor’s scale data.
-
inline float *scale_inv() const noexcept
Get a pointer to the tensor’s inverse of scale data.
- Returns:
A pointer to tensor’s inverse of scale data.
-
inline const NVTEShape scale_inv_shape() const noexcept
Get the scale_inv_shape of this TensorWrapper.
- Returns:
scale_inv_shape of this TensorWrapper.
-
inline NVTEScalingMode scaling_mode() const noexcept
Get a scaling mode of the tensor.
- Returns:
Scaling mode of the tensor.
-
inline void zero_(cudaStream_t stream)
Public Static Attributes
-
static constexpr size_t defaultData = 1
-
static constexpr NVTEShape defaultShape = {{defaultData, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 1}
Private Functions
Private Members
-
NVTETensor tensor_ = nullptr
Wrapped NVTETensor.