Gated Recurrent Unit Cell (GRUCell)
API
- class warp_nn.modules.layers.GRUCell(input_size: int, hidden_size: int, *, bias: bool = True)
Bases: Module
Apply a Gated Recurrent Unit (GRU) cell.
\[\text{GRUCell}(x, h) = h'\]
where
\[\begin{aligned}
r &= \sigma(W_{ir} \, x + b_{ir} + W_{hr} \, h + b_{hr}) \\
z &= \sigma(W_{iz} \, x + b_{iz} + W_{hz} \, h + b_{hz}) \\
n &= \tanh(W_{in} \, x + b_{in} + r \odot (W_{hn} \, h + b_{hn})) \\
h' &= (1 - z) \odot n + z \odot h
\end{aligned}\]
Here \(\sigma\) is the sigmoid function and \(\odot\) is the element-wise product.
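The four equations above can be traced step by step in plain Python. The following is a minimal reference sketch, not the warp_nn implementation: the function names (`gru_cell_reference`, `matvec`, `add`, `sigmoid`) and the per-gate dictionary layout are hypothetical choices made for readability, and the sketch handles a single unbatched input vector.

```python
import math


def sigmoid(v):
    """Element-wise logistic sigmoid of a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(W, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(items) for items in zip(*vectors)]


def gru_cell_reference(x, h, W_i, W_h, b_i, b_h):
    """Compute h' from one input vector x and one hidden vector h.

    W_i, W_h, b_i and b_h are dicts keyed by gate name "r", "z", "n"
    (a hypothetical layout, not how the library stores its parameters).
    """
    r = sigmoid(add(matvec(W_i["r"], x), b_i["r"], matvec(W_h["r"], h), b_h["r"]))
    z = sigmoid(add(matvec(W_i["z"], x), b_i["z"], matvec(W_h["z"], h), b_h["z"]))
    # The reset gate r scales the whole hidden contribution, bias included.
    hidden_n = [ri * a for ri, a in zip(r, add(matvec(W_h["n"], h), b_h["n"]))]
    n = [math.tanh(a) for a in add(matvec(W_i["n"], x), b_i["n"], hidden_n)]
    # The update gate z blends the candidate state n with the previous state h.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
```

As a sanity check, if every weight and bias is zero then \(r = z = 0.5\) and \(n = 0\), so the function returns \(h / 2\). A large positive update-gate bias pushes \(z\) toward 1, in which case the cell returns a value close to the previous hidden state \(h\).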
Learnable parameters:
| Symbol | Name | Shape | Description |
|---|---|---|---|
| \(W_{ir}, W_{iz}, W_{in}\) | weight_ih | (3 * hidden_size, input_size) | Input-to-hidden weights |
| \(W_{hr}, W_{hz}, W_{hn}\) | weight_hh | (3 * hidden_size, hidden_size) | Hidden-to-hidden weights |
| \(b_{ir}, b_{iz}, b_{in}\) | bias_ih | (3 * hidden_size, 1) | Input-to-hidden bias. Only if bias is true |
| \(b_{hr}, b_{hz}, b_{hn}\) | bias_hh | (3 * hidden_size, 1) | Hidden-to-hidden bias. Only if bias is true |

The parameters are initialized from the uniform distribution \(\mathcal{U}(-k, k)\), where \(k = \frac{1}{\sqrt{\text{hidden\_size}}}\).
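To make the shapes and the initialization rule concrete, here is a small sketch that builds parameters with the documented shapes and samples each entry from the uniform distribution on \([-k, k]\). It uses only the standard library and nested lists. The helper name `init_gru_parameters` and its `seed` argument are hypothetical, and the library's own initializer may differ in details such as the array type or random number generator.

```python
import math
import random


def init_gru_parameters(input_size, hidden_size, bias=True, seed=None):
    """Return a dict of nested lists with the documented parameter shapes."""
    rng = random.Random(seed)
    k = 1.0 / math.sqrt(hidden_size)  # bound of the uniform distribution

    def uniform(rows, cols):
        # rows x cols matrix with entries drawn uniformly from [-k, k]
        return [[rng.uniform(-k, k) for _ in range(cols)] for _ in range(rows)]

    params = {
        "weight_ih": uniform(3 * hidden_size, input_size),
        "weight_hh": uniform(3 * hidden_size, hidden_size),
    }
    if bias:
        # Bias terms exist only when bias is true.
        params["bias_ih"] = uniform(3 * hidden_size, 1)
        params["bias_hh"] = uniform(3 * hidden_size, 1)
    return params
```

For example, with `hidden_size = 16` the bound is \(k = 1/\sqrt{16} = 0.25\), so every sampled value lies in \([-0.25, 0.25]\). The factor 3 in the leading dimension reflects the three gates (r, z, n) stacked into one matrix.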
- Parameters:
input_size – The number of input features.
hidden_size – The number of hidden features.
bias – Whether to include the bias terms bias_ih and bias_hh. Defaults to True.
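The constructor arguments fully determine how many learnable scalars the cell holds. The sketch below derives that count from the documented parameter shapes. The function name `gru_cell_parameter_count` is hypothetical and is not part of warp_nn.

```python
def gru_cell_parameter_count(input_size, hidden_size, bias=True):
    """Number of learnable scalars implied by the documented shapes."""
    # weight_ih: (3 * hidden_size, input_size); weight_hh: (3 * hidden_size, hidden_size)
    weights = 3 * hidden_size * input_size + 3 * hidden_size * hidden_size
    # bias_ih and bias_hh: (3 * hidden_size, 1) each, present only if bias is true
    biases = 2 * 3 * hidden_size if bias else 0
    return weights + biases
```

With `input_size=10` and `hidden_size=20`, the weights contribute 600 + 1200 = 1800 scalars and the two bias vectors add 120, for a total of 1920. With `bias=False` the count is 1800.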