apex.normalization.fused_layer_norm

class apex.normalization.FusedLayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)

Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization.
Currently only runs on cuda() tensors.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and standard deviation are calculated separately over the last certain number of dimensions, which have to be of the shape specified by normalized_shape. \(\gamma\) and \(\beta\) are learnable affine transform parameters of normalized_shape if elementwise_affine is True.

Note

Unlike Batch Normalization and Instance Normalization, which apply scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.

This layer uses statistics computed from input data in both training and evaluation modes.
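The formula above can be reproduced with elementary tensor operations. The following is a minimal sketch, not taken from the apex docs, that compares a hand-computed result with the module's output; it assumes apex with its CUDA extension is installed, a GPU is available, and that the affine parameters are exposed as weight and bias, as in torch.nn.LayerNorm.

>>> import torch
>>> from apex.normalization import FusedLayerNorm
>>> eps = 1e-5
>>> x = torch.randn(20, 5, 10, 10, device="cuda")
>>> m = FusedLayerNorm([10, 10], eps=eps).cuda()
>>> # E[x] and Var[x] over the dimensions given by normalized_shape
>>> mean = x.mean(dim=(-2, -1), keepdim=True)
>>> var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)
>>> y_manual = (x - mean) / torch.sqrt(var + eps) * m.weight + m.bias
>>> torch.allclose(y_manual, m(x), atol=1e-4)  # expected: True, up to numerical tolerance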
- Parameters

  - normalized_shape (int or list or torch.Size) – input shape from an expected input of size

    \[[* \times \text{normalized}\_\text{shape}[0] \times \text{normalized}\_\text{shape}[1] \times \ldots \times \text{normalized}\_\text{shape}[-1]]\]

    If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

  - eps – a value added to the denominator for numerical stability. Default: 1e-5

  - elementwise_affine – a boolean value that, when set to True, gives this module learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases); see the sketch after this list. Default: True.
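As a brief illustration of the elementwise_affine behaviour described above, the sketch below inspects the parameters the module registers. It is not part of the documented API; it assumes apex (with its CUDA extension) is installed and that the affine parameters are exposed as weight and bias, as in torch.nn.LayerNorm.

>>> import torch
>>> from apex.normalization import FusedLayerNorm
>>> m = FusedLayerNorm([10, 10])              # elementwise_affine=True by default
>>> m.weight.shape                            # torch.Size([10, 10])
>>> torch.all(m.weight == 1).item()           # True -- weights start at ones
>>> torch.all(m.bias == 0).item()             # True -- biases start at zeros
>>> m_plain = FusedLayerNorm([10, 10], elementwise_affine=False)
>>> list(m_plain.parameters())                # [] -- no learnable parameters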
- Shape:
Input: \((N, *)\)
Output: \((N, *)\) (same shape as input)
Examples:
>>> input = torch.randn(20, 5, 10, 10)
>>> # With Learnable Parameters
>>> m = apex.normalization.FusedLayerNorm(input.size()[1:])
>>> # Without Learnable Parameters
>>> m = apex.normalization.FusedLayerNorm(input.size()[1:], elementwise_affine=False)
>>> # Normalize over last two dimensions
>>> m = apex.normalization.FusedLayerNorm([10, 10])
>>> # Normalize over last dimension of size 10
>>> m = apex.normalization.FusedLayerNorm(10)
>>> # Activating the module
>>> output = m(input)
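Since the module currently only runs on cuda() tensors (see the note near the top), a practical variant of the example above, sketched here rather than copied from the original docs, moves both the module and the input to the GPU before the call:

>>> # Sketch: place module and input on the GPU before activating the module
>>> input = torch.randn(20, 5, 10, 10).cuda()
>>> m = apex.normalization.FusedLayerNorm(input.size()[1:]).cuda()
>>> output = m(input)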
extra_repr()

Set the extra representation of the module.

To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.
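As a sketch of what such a reimplementation might look like, the hypothetical module below (not part of apex) overrides extra_repr so that its configuration shows up when the module is printed:

>>> import torch
>>> class MyScale(torch.nn.Module):
...     """Hypothetical module illustrating a custom extra_repr."""
...     def __init__(self, factor=2.0):
...         super().__init__()
...         self.factor = factor
...     def forward(self, x):
...         return x * self.factor
...     def extra_repr(self):
...         # The returned string appears inside the parentheses of repr(module).
...         return f"factor={self.factor}"
...
>>> print(MyScale())
MyScale(factor=2.0)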
forward(input)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
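A short sketch (not from the original docs) of why calling the module instance is preferred over calling forward directly: a registered forward hook fires for m(input) but is silently skipped by m.forward(input). It assumes apex is installed and a GPU is available.

>>> import torch
>>> from apex.normalization import FusedLayerNorm
>>> m = FusedLayerNorm(10).cuda()
>>> handle = m.register_forward_hook(lambda module, inputs, output: print("forward hook fired"))
>>> x = torch.randn(4, 10, device="cuda")
>>> _ = m(x)          # prints "forward hook fired" -- hooks are run
forward hook fired
>>> _ = m.forward(x)  # nothing printed -- hooks are silently ignored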