Gated Recurrent Unit Cell (GRUCell)

API

class warp_nn.modules.layers.GRUCell(input_size: int, hidden_size: int, *, bias: bool = True)

Bases: Module

Apply a Gated Recurrent Unit (GRU) cell.

\[\text{GRUCell}(x, h) = h'\]

where

\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} \, x + b_{ir} + W_{hr} \, h + b_{hr}) \\ z = \sigma(W_{iz} \, x + b_{iz} + W_{hz} \, h + b_{hz}) \\ n = \tanh(W_{in} \, x + b_{in} + r \odot (W_{hn} \, h + b_{hn})) \\ h' = (1 - z) \odot n + z \odot h \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function and \(\odot\) is the element-wise product.
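The equations above can be sketched in plain NumPy as follows. This is an illustrative stand-in, not the library's implementation: the function name `gru_step`, the dict `p`, and its per-gate keys are hypothetical, and the biases are stored as 1-D arrays so they broadcast over the batch dimension.

```python
import numpy as np


def sigmoid(a):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h, p):
    """Compute h' from x (batch, input_size) and h (batch, hidden_size).

    `p` is a hypothetical dict of per-gate arrays: each W_i* has shape
    (hidden_size, input_size), each W_h* has shape (hidden_size, hidden_size),
    and each bias has shape (hidden_size,).
    """
    # Reset gate r and update gate z.
    r = sigmoid(x @ p["W_ir"].T + p["b_ir"] + h @ p["W_hr"].T + p["b_hr"])
    z = sigmoid(x @ p["W_iz"].T + p["b_iz"] + h @ p["W_hz"].T + p["b_hz"])
    # Candidate state n; the reset gate scales only the hidden-side term.
    n = np.tanh(x @ p["W_in"].T + p["b_in"] + r * (h @ p["W_hn"].T + p["b_hn"]))
    # Blend the candidate state with the incoming hidden state.
    return (1.0 - z) * n + z * h


rng = np.random.default_rng(0)
batch_size, input_size, hidden_size = 4, 5, 3

# Random per-gate parameters, for demonstration only.
params = {}
for gate in "rzn":
    params[f"W_i{gate}"] = rng.normal(size=(hidden_size, input_size))
    params[f"W_h{gate}"] = rng.normal(size=(hidden_size, hidden_size))
    params[f"b_i{gate}"] = rng.normal(size=hidden_size)
    params[f"b_h{gate}"] = rng.normal(size=hidden_size)

x = rng.normal(size=(batch_size, input_size))
h = rng.normal(size=(batch_size, hidden_size))
h_next = gru_step(x, h, params)
print(h_next.shape)  # (4, 3)
```

A quick sanity check on the formulas: if every weight and bias is zero, then \(r = z = 0.5\) and \(n = 0\), so \(h' = 0.5 \, h\).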


Learnable parameters:

Symbol                       Name       Shape                           Description
\(W_{ir}, W_{iz}, W_{in}\)   weight_ih  (3 * hidden_size, input_size)   Input-to-hidden weights
\(W_{hr}, W_{hz}, W_{hn}\)   weight_hh  (3 * hidden_size, hidden_size)  Hidden-to-hidden weights
\(b_{ir}, b_{iz}, b_{in}\)   bias_ih    (3 * hidden_size, 1)            Input-to-hidden bias. Only if bias is true
\(b_{hr}, b_{hz}, b_{hn}\)   bias_hh    (3 * hidden_size, 1)            Hidden-to-hidden bias. Only if bias is true

The parameters are initialized by sampling from the uniform distribution \(\mathcal{U}(-k, k)\), where \(k = \frac{1}{\sqrt{\text{hidden\_size}}}\).
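As a rough illustration of this initialization scheme, the sketch below samples arrays with the documented names and shapes from \((-k, k)\). The helper `init_gru_params` and its `seed` argument are hypothetical, and NumPy's random generator stands in for whatever sampler the library actually uses.

```python
import numpy as np


def init_gru_params(input_size, hidden_size, bias=True, seed=0):
    """Sample GRU cell parameters uniformly with k = 1 / sqrt(hidden_size).

    Hypothetical helper: names and shapes follow the parameter table,
    but the sampling itself is a NumPy stand-in.
    """
    rng = np.random.default_rng(seed)
    k = 1.0 / np.sqrt(hidden_size)
    shapes = {
        "weight_ih": (3 * hidden_size, input_size),
        "weight_hh": (3 * hidden_size, hidden_size),
    }
    if bias:
        # Bias arrays exist only when bias is enabled.
        shapes["bias_ih"] = (3 * hidden_size, 1)
        shapes["bias_hh"] = (3 * hidden_size, 1)
    return {name: rng.uniform(-k, k, size=shape) for name, shape in shapes.items()}


params = init_gru_params(input_size=8, hidden_size=16)
bound = 1.0 / np.sqrt(16)  # k = 0.25 for hidden_size = 16
for name, value in params.items():
    # Every sampled entry lies within the bound.
    print(name, value.shape, bool(np.abs(value).max() <= bound))
```

Note that the bound depends only on hidden_size, so the input-to-hidden and hidden-to-hidden parameters share the same range.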


Parameters:
  • input_size – The number of input features.

  • hidden_size – The number of hidden features.

  • bias – Whether to include a bias term.

__call__(input: array, hidden: array) -> array

Forward pass of the module.

Parameters:
  • input – The input array, with shape (batch_size, input_size).

  • hidden – The hidden state array from the previous step (or the initial state for the first step), with shape (batch_size, hidden_size).

Returns:

The next hidden state array, with shape (batch_size, hidden_size).
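Because the cell advances the hidden state by a single step, processing a sequence means calling it once per time step and feeding each result back in. The sketch below shows that loop with a NumPy stand-in, `NumpyGRUCell`, which is hypothetical and only mimics the documented call signature and parameter shapes. It also assumes the three gate blocks are stacked in the order r, z, n inside weight_ih and weight_hh, which the reference above does not state.

```python
import numpy as np


def sigmoid(a):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))


class NumpyGRUCell:
    """Hypothetical stand-in with the documented shapes; not the library class."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        k = 1.0 / np.sqrt(hidden_size)
        self.weight_ih = rng.uniform(-k, k, (3 * hidden_size, input_size))
        self.weight_hh = rng.uniform(-k, k, (3 * hidden_size, hidden_size))
        self.bias_ih = rng.uniform(-k, k, (3 * hidden_size, 1))
        self.bias_hh = rng.uniform(-k, k, (3 * hidden_size, 1))

    def __call__(self, input, hidden):
        # Project once, then split into per-gate blocks (assumed order: r, z, n).
        gi = input @ self.weight_ih.T + self.bias_ih.T    # (batch, 3 * hidden)
        gh = hidden @ self.weight_hh.T + self.bias_hh.T   # (batch, 3 * hidden)
        i_r, i_z, i_n = np.split(gi, 3, axis=1)
        h_r, h_z, h_n = np.split(gh, 3, axis=1)
        r = sigmoid(i_r + h_r)
        z = sigmoid(i_z + h_z)
        n = np.tanh(i_n + r * h_n)
        return (1.0 - z) * n + z * hidden


batch_size, seq_len, input_size, hidden_size = 2, 5, 4, 3
cell = NumpyGRUCell(input_size, hidden_size)

# Time-major sequence: one (batch_size, input_size) slice per step.
xs = np.random.default_rng(1).normal(size=(seq_len, batch_size, input_size))
hidden = np.zeros((batch_size, hidden_size))  # zero initial state

outputs = []
for x_t in xs:
    hidden = cell(x_t, hidden)  # the returned state feeds the next step
    outputs.append(hidden)
outputs = np.stack(outputs)
print(outputs.shape)  # (5, 2, 3)
```

Starting from a zero state is a common convention rather than something this reference prescribes; any array of shape (batch_size, hidden_size) can be passed as the first hidden state.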