LayerNorm

class tripy.LayerNorm(normalized_shape: int | Tuple[int], dtype: dtype = float32, eps: float = 1e-05)[source]

Bases: Module

Applies layer normalization over the input tensor:

\(\text{LayerNorm}(x) = \Large \frac{x - \bar{x}}{ \sqrt{\sigma^2 + \epsilon}} \normalsize * \gamma + \beta\)

where \(\bar{x}\) is the mean and \(\sigma^2\) is the variance.

The mean and variance are calculated over the last \(D\) dimensions, where \(D\) is the number of dimensions in \(\text{normalized_shape}\).
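The formula above can be sketched in plain NumPy (illustrative only; this is not tripy's implementation, and the function name `layer_norm` is hypothetical):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Mean and variance are computed over the last (feature) axis,
    # matching the formula: (x - mean) / sqrt(var + eps) * gamma + beta
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta
```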

Parameters:
  • normalized_shape (Tuple[int]) – The size of the feature dimension of the input over which normalization is performed. If a single integer is provided, it is unsqueezed to a 1-dimensional shape.

  • dtype (dtype) – The data type to use for the weight and bias parameters.

  • eps (float) – \(\epsilon\) value to prevent division by zero.

Example
layer_norm = tp.LayerNorm(3)

input = tp.iota((2, 3), dim=1)
output = layer_norm(input)
>>> layer_norm.state_dict()
{
    weight: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
    bias: tensor([0.0000, 1.0000, 2.0000], dtype=float32, loc=gpu:0, shape=(3,)),
}
>>> input
tensor(
    [[0.0000, 1.0000, 2.0000],
     [0.0000, 1.0000, 2.0000]], 
    dtype=float32, loc=gpu:0, shape=(2, 3))
>>> output
tensor(
    [[0.0000, 1.0000, 4.4495],
     [0.0000, 1.0000, 4.4495]], 
    dtype=float32, loc=gpu:0, shape=(2, 3))
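As a sanity check, the first row of the output above can be reproduced by hand from the `state_dict` values, independently of tripy (a plain-NumPy sketch):

```python
import numpy as np

# Recompute one row of the example output by hand.
# weight and bias are taken from the state_dict shown above.
x = np.array([0.0, 1.0, 2.0])
weight = np.array([0.0, 1.0, 2.0])
bias = np.array([0.0, 1.0, 2.0])

out = (x - x.mean()) / np.sqrt(x.var() + 1e-5) * weight + bias
# out is approximately [0.0, 1.0, 4.4495], matching the tensor above
```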
dtype: dtype

The data type used to perform the operation.

normalized_shape: Tuple[int]

The shape of the trailing dimensions of the input over which normalization is performed.

weight: Parameter

The \(\gamma\) parameter of shape \(\text{normalized_shape}\).

bias: Parameter

The \(\beta\) parameter of shape \(\text{normalized_shape}\).

eps: float

A value added to the denominator to prevent division by zero.

__call__(x: Tensor) → Tensor[source]
Parameters:
  • x (Tensor) – The input tensor.

Returns:
  A tensor of the same shape as the input.

Return type:
  Tensor