LayerNorm¶
- class tripy.LayerNorm(normalized_shape: int | Tuple[int], dtype: dtype = float32, eps: float = 1e-05)[source]¶
Bases:
Module
Applies layer normalization over the input tensor:
\(\text{LayerNorm}(x) = \Large \frac{x - \bar{x}}{ \sqrt{\sigma^2 + \epsilon}} \normalsize * \gamma + \beta\)
where \(\bar{x}\) is the mean and \(\sigma^2\) is the variance.
The mean and variance are calculated over the last \(D\) dimensions, where \(D\) is the number of dimensions in \(\text{normalized_shape}\).
- Parameters:
normalized_shape (int | Tuple[int]) – The size of the feature dimension(s) of the input over which normalization is performed. If a single integer is provided, it is treated as a 1-dimensional shape.
dtype (dtype) – The data type to use for the weight and bias parameters.
eps (float) – \(\epsilon\) value to prevent division by zero.
Example
layer_norm = tp.LayerNorm(3)

input = tp.iota((2, 3), dim=1)
output = layer_norm(input)
>>> layer_norm.state_dict()
{
    weight: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
    bias: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
}

>>> input
tensor(
    [[0.0000, 1.0000, 2.0000],
     [0.0000, 1.0000, 2.0000]],
    dtype=float32, loc=gpu:0, shape=(2, 3))

>>> output
tensor(
    [[0.0000, 1.0000, 4.4495],
     [0.0000, 1.0000, 4.4495]],
    dtype=float32, loc=gpu:0, shape=(2, 3))
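The output above can be checked against the formula directly. The following is a minimal pure-Python sketch of the same computation (not tripy's actual implementation); `layer_norm_ref` is a hypothetical helper name, and the weight and bias values mirror the iota-like parameters shown in the example:

```python
import math

def layer_norm_ref(row, weight, bias, eps=1e-5):
    # Reference sketch of LayerNorm(x) = (x - mean) / sqrt(var + eps) * gamma + beta,
    # normalizing over the last dimension.
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)  # biased variance
    return [(v - mean) / math.sqrt(var + eps) * w + b
            for v, w, b in zip(row, weight, bias)]

# One row of the example input, with weight = bias = [0, 1, 2]:
out = layer_norm_ref([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
# out ≈ [0.0, 1.0, 4.4495]
```

Each row normalizes to roughly [-1.2247, 0.0, 1.2247] before the per-element scale and shift, which reproduces the `[0.0, 1.0, 4.4495]` rows shown above.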
- normalized_shape: Tuple[int]¶
The shape of the trailing dimensions of the input over which normalization is performed.
- eps: float¶
A value added to the denominator to prevent division by zero.