rnns

attention_wrapper

A powerful dynamic attention wrapper object.

Modified by blisc to add support for LocationSensitiveAttention and to change the AttentionWrapper class so that it can output the cell_output and the attention context concatenated together.

New classes:
LocationSensitiveAttention, LocationLayer
New functions:
_bahdanau_score_with_location
class parts.rnns.attention_wrapper.AttentionMechanism[source]

Bases: object

alignments_size
state_size
class parts.rnns.attention_wrapper.AttentionWrapper(cell, attention_mechanism, attention_layer_size=None, alignment_history=False, cell_input_fn=None, output_attention=True, initial_cell_state=None, name=None)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

Wraps another RNNCell with attention.

__init__(cell, attention_mechanism, attention_layer_size=None, alignment_history=False, cell_input_fn=None, output_attention=True, initial_cell_state=None, name=None)[source]

Construct the AttentionWrapper.

NOTE If you are using the BeamSearchDecoder with a cell wrapped in AttentionWrapper, then you must ensure that:

  • The encoder output has been tiled to beam_width via tf.contrib.seq2seq.tile_batch (NOT tf.tile).
  • The batch_size argument passed to the zero_state method of this wrapper is equal to true_batch_size * beam_width.
  • The initial state created with zero_state above contains a cell_state value containing properly tiled final state from the encoder.

An example:

```
tiled_encoder_outputs = tf.contrib.seq2seq.tile_batch(
    encoder_outputs, multiplier=beam_width)
tiled_encoder_final_state = tf.contrib.seq2seq.tile_batch(
    encoder_final_state, multiplier=beam_width)
tiled_sequence_length = tf.contrib.seq2seq.tile_batch(
    sequence_length, multiplier=beam_width)
attention_mechanism = MyFavoriteAttentionMechanism(
    num_units=attention_depth,
    memory=tiled_encoder_outputs,
    memory_sequence_length=tiled_sequence_length)
attention_cell = AttentionWrapper(cell, attention_mechanism, …)
decoder_initial_state = attention_cell.zero_state(
    dtype, batch_size=true_batch_size * beam_width)
decoder_initial_state = decoder_initial_state.clone(
    cell_state=tiled_encoder_final_state)
```

Parameters:
  • cell – An instance of RNNCell.
  • attention_mechanism – A list of AttentionMechanism instances or a single instance.
  • attention_layer_size – A list of Python integers or a single Python integer, the depth of the attention (output) layer(s). If None (default), use the context as attention at each time step. Otherwise, feed the context and cell output into the attention layer to generate attention at each time step. If attention_mechanism is a list, attention_layer_size must be a list of the same length.
  • alignment_history – Python boolean, whether to store alignment history from all time steps in the final output state (currently stored as a time major TensorArray on which you must call stack()).
  • cell_input_fn – (optional) A callable. The default is: lambda inputs, attention: array_ops.concat([inputs, attention], -1).
  • output_attention – bool or “both”. If True (default), the output at each time step is the attention value. This is the behavior of Luong-style attention mechanisms. If False, the output at each time step is the output of cell. This is the behavior of Bahdanau-style attention mechanisms. If “both”, the attention value and cell output are concatenated together and set as the output. In all cases, the attention tensor is propagated to the next time step via the state and is used there. This flag only controls whether the attention mechanism is propagated up to the next cell in an RNN stack or to the top RNN output.
  • initial_cell_state – The initial state value to use for the cell when the user calls zero_state(). Note that if this value is provided now, and the user uses a batch_size argument of zero_state which does not match the batch size of initial_cell_state, proper behavior is not guaranteed.
  • name – Name to use when creating ops.
Raises:
  • TypeError – attention_layer_size is not None and (attention_mechanism is a list but attention_layer_size is not; or vice versa).
  • ValueError – if attention_layer_size is not None, attention_mechanism is a list, and its length does not match that of attention_layer_size.
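
For illustration, a minimal construction sketch is shown below. It is hedged: the module is assumed to be importable under the parts.rnns path used on this page, and the encoder tensors and layer sizes are placeholders chosen for the example.

```
import tensorflow as tf
from parts.rnns.attention_wrapper import AttentionWrapper, BahdanauAttention

# Placeholder encoder outputs: [batch_size, max_time, encoder_dim].
encoder_outputs = tf.placeholder(tf.float32, [None, None, 256])
source_lengths = tf.placeholder(tf.int32, [None])

attention_mechanism = BahdanauAttention(
    num_units=128,
    memory=encoder_outputs,
    memory_sequence_length=source_lengths)

decoder_cell = tf.nn.rnn_cell.LSTMCell(512)

# output_attention="both" concatenates the cell output and the attention
# context, matching the modification described at the top of this page.
attention_cell = AttentionWrapper(
    cell=decoder_cell,
    attention_mechanism=attention_mechanism,
    alignment_history=True,
    output_attention="both")
```
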
_item_or_tuple(seq)[source]

Returns seq as tuple or the singular element.

Which is returned is determined by how the AttentionMechanism(s) were passed to the constructor.

Parameters:seq – A non-empty sequence of items or generator.
Returns:Either the values in the sequence as a tuple if AttentionMechanism(s) were passed to the constructor as a sequence or the singular element.
call(inputs, state)[source]

Perform a step of attention-wrapped RNN.

  • Step 1: Mix the inputs and previous step’s attention output via cell_input_fn.
  • Step 2: Call the wrapped cell with this input and its previous state.
  • Step 3: Score the cell’s output with attention_mechanism.
  • Step 4: Calculate the alignments by passing the score through the normalizer.
  • Step 5: Calculate the context vector as the inner product between the alignments and the attention_mechanism’s values (memory).
  • Step 6: Calculate the attention output by concatenating the cell output and context through the attention layer (a linear layer with attention_layer_size outputs).
Parameters:
  • inputs – (Possibly nested tuple of) Tensor, the input at this time step.
  • state – An instance of AttentionWrapperState containing tensors from the previous time step.
Returns:

A tuple (attention_or_cell_output, next_state), where:

  • attention_or_cell_output – the attention or the cell output, depending on output_attention.
  • next_state – an instance of AttentionWrapperState containing the state calculated at this time step.

Raises:

TypeError – If state is not an instance of AttentionWrapperState.

output_size

Integer or TensorShape – size of outputs produced by this cell.

state_size

The state_size property of AttentionWrapper.

Returns:An AttentionWrapperState tuple containing shapes used by this object.
zero_state(batch_size, dtype)[source]

Return an initial (zero) state tuple for this AttentionWrapper.

NOTE Please see the initializer documentation for details of how to call zero_state if using an AttentionWrapper with a BeamSearchDecoder.

Parameters:
  • batch_size – 0-D integer tensor: the batch size.
  • dtype – The internal state data type.
Returns:

An AttentionWrapperState tuple containing zeroed out tensors and, possibly, empty TensorArray objects.

Raises:

ValueError – (or, possibly at runtime, InvalidArgument), if batch_size does not match the output size of the encoder passed to the wrapper object at initialization time.

class parts.rnns.attention_wrapper.AttentionWrapperState[source]

Bases: parts.rnns.attention_wrapper.AttentionWrapperState

namedtuple storing the state of an AttentionWrapper.

Contains:

  • cell_state: The state of the wrapped RNNCell at the previous time step.
  • attention: The attention emitted at the previous time step.
  • time: int32 scalar containing the current time step.
  • alignments: A single or tuple of `Tensor`(s) containing the alignments
    emitted at the previous time step for each attention mechanism.
  • alignment_history: (if enabled) a single or tuple of `TensorArray`(s)
    containing alignment matrices from all time steps for each attention mechanism. Call stack() on each to convert to a Tensor.
  • attention_state: A single or tuple of nested objects
    containing attention mechanism state for each attention mechanism. The objects may contain Tensors or TensorArrays.
clone(**kwargs)[source]

Clone this object, overriding components provided by kwargs.

The new state fields’ shape must match original state fields’ shape. This will be validated, and original fields’ shape will be propagated to new fields.

Example:

```
initial_state = attention_wrapper.zero_state(dtype=..., batch_size=...)
initial_state = initial_state.clone(cell_state=encoder_state)
```

Parameters:**kwargs – Any properties of the state object to replace in the returned AttentionWrapperState.
Returns:A new AttentionWrapperState whose properties are the same as this one, except any overridden properties as provided in kwargs.
class parts.rnns.attention_wrapper.LuongAttention(num_units, memory, memory_sequence_length=None, scale=False, probability_fn=None, score_mask_value=None, dtype=None, name='LuongAttention')[source]

Bases: parts.rnns.attention_wrapper._BaseAttentionMechanism

Implements Luong-style (multiplicative) attention scoring.

This attention has two forms. The first is standard Luong attention, as described in:

Minh-Thang Luong, Hieu Pham, Christopher D. Manning. “Effective Approaches to Attention-based Neural Machine Translation.” EMNLP 2015. https://arxiv.org/abs/1508.04025

The second is the scaled form inspired partly by the normalized form of Bahdanau attention.

To enable the second form, construct the object with parameter scale=True.

__init__(num_units, memory, memory_sequence_length=None, scale=False, probability_fn=None, score_mask_value=None, dtype=None, name='LuongAttention')[source]

Construct the LuongAttention mechanism.

Parameters:
  • num_units – The depth of the attention mechanism.
  • memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • memory_sequence_length – (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • scale – Python boolean. Whether to scale the energy term.
  • probability_fn – (optional) A callable. Converts the score to probabilities. The default is tf.nn.softmax. Other options include tf.contrib.seq2seq.hardmax and tf.contrib.sparsemax.sparsemax. Its signature should be: probabilities = probability_fn(score).
  • score_mask_value – (optional) The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • dtype – The data type for the memory layer of the attention mechanism.
  • name – Name to use when creating ops.
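
A short construction sketch (hedged: the memory tensors are placeholders, and the module path follows this page's naming) showing both the standard and the scaled form:

```
import tensorflow as tf
from parts.rnns.attention_wrapper import LuongAttention

encoder_outputs = tf.placeholder(tf.float32, [None, None, 256])  # [B, T, D]
source_lengths = tf.placeholder(tf.int32, [None])

# Standard multiplicative (Luong) attention.
luong = LuongAttention(
    num_units=256,
    memory=encoder_outputs,
    memory_sequence_length=source_lengths)

# Scaled form, enabled with scale=True.
luong_scaled = LuongAttention(
    num_units=256,
    memory=encoder_outputs,
    memory_sequence_length=source_lengths,
    scale=True)
```
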
class parts.rnns.attention_wrapper.BahdanauAttention(num_units, memory, memory_sequence_length=None, normalize=False, probability_fn=None, score_mask_value=None, dtype=None, name='BahdanauAttention')[source]

Bases: parts.rnns.attention_wrapper._BaseAttentionMechanism

Implements Bahdanau-style (additive) attention.

This attention has two forms. The first is Bahdanau attention, as described in:

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate.” ICLR 2015. https://arxiv.org/abs/1409.0473

The second is the normalized form. This form is inspired by the weight normalization article:

Tim Salimans, Diederik P. Kingma. “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” https://arxiv.org/abs/1602.07868

To enable the second form, construct the object with parameter normalize=True.

__init__(num_units, memory, memory_sequence_length=None, normalize=False, probability_fn=None, score_mask_value=None, dtype=None, name='BahdanauAttention')[source]

Construct the Attention mechanism.

Parameters:
  • num_units – The depth of the query mechanism.
  • memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • memory_sequence_length (optional) – Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • normalize – Python boolean. Whether to normalize the energy term.
  • probability_fn – (optional) A callable. Converts the score to probabilities. The default is tf.nn.softmax. Other options include tf.contrib.seq2seq.hardmax and tf.contrib.sparsemax.sparsemax. Its signature should be: probabilities = probability_fn(score).
  • score_mask_value – (optional): The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • dtype – The data type for the query and memory layers of the attention mechanism.
  • name – Name to use when creating ops.
parts.rnns.attention_wrapper.hardmax(logits, name=None)[source]

Returns batched one-hot vectors.

The depth index containing the 1 is that of the maximum logit value.

Parameters:
  • logits – A batch tensor of logit values.
  • name – Name to use when creating ops.
Returns:

A batched one-hot tensor.
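A small sketch of the expected behaviour (assuming the module path used on this page):

```
import tensorflow as tf
from parts.rnns.attention_wrapper import hardmax

logits = tf.constant([[1.0, 3.0, 2.0],
                      [0.5, 0.1, 0.4]])
one_hot = hardmax(logits)

with tf.Session() as sess:
    print(sess.run(one_hot))
    # [[0. 1. 0.]
    #  [1. 0. 0.]]
```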

parts.rnns.attention_wrapper.safe_cumprod(x, *args, **kwargs)[source]

Computes cumprod of x in logspace using cumsum to avoid underflow.

The cumprod function and its gradient can result in numerical instabilities when its argument has very small and/or zero values. As long as the argument is all positive, we can instead compute the cumulative product as exp(cumsum(log(x))). This function can be called identically to tf.cumprod.

Parameters:
  • x – Tensor to take the cumulative product of.
  • *args – Passed on to cumsum; these are identical to those in cumprod.
  • **kwargs – Passed on to cumsum; these are identical to those in cumprod.
Returns:

Cumulative product of x.
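A brief sketch comparing it with tf.cumprod; in exact arithmetic both give the same result, but the log-space form is the one that avoids underflow for long products of small values:

```
import tensorflow as tf
from parts.rnns.attention_wrapper import safe_cumprod

x = tf.constant([0.9, 0.5, 1e-7, 0.3])

direct = tf.cumprod(x)    # straightforward cumulative product
stable = safe_cumprod(x)  # computed as exp(cumsum(log(x)))

with tf.Session() as sess:
    print(sess.run([direct, stable]))
```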

parts.rnns.attention_wrapper.monotonic_attention(p_choose_i, previous_attention, mode)[source]

Compute monotonic attention distribution from choosing probabilities.

Monotonic attention implies that the input sequence is processed in an explicitly left-to-right manner when generating the output sequence. In addition, once an input sequence element is attended to at a given output timestep, elements occurring before it cannot be attended to at subsequent output timesteps. This function generates attention distributions according to these assumptions. For more information, see “Online and Linear-Time Attention by Enforcing Monotonic Alignments”.

Parameters:
  • p_choose_i – Probability of choosing input sequence/memory element i. Should be of shape (batch_size, input_sequence_length), and should all be in the range [0, 1].
  • previous_attention – The attention distribution from the previous output timestep. Should be of shape (batch_size, input_sequence_length). For the first output timestep, previous_attention[n] should be [1, 0, 0, …, 0] for all n in [0, … batch_size - 1].
  • mode

    How to compute the attention distribution. Must be one of ‘recursive’, ‘parallel’, or ‘hard’.

    • ’recursive’ uses tf.scan to recursively compute the distribution. This is slowest but is exact, general, and does not suffer from numerical instabilities.
    • ’parallel’ uses parallelized cumulative-sum and cumulative-product operations to compute a closed-form solution to the recurrence relation defining the attention distribution. This makes it more efficient than ‘recursive’, but it requires numerical checks which make the distribution non-exact. This can be a problem in particular when input_sequence_length is long and/or p_choose_i has entries very close to 0 or 1.
    • ’hard’ requires that the probabilities in p_choose_i are all either 0 or 1, and subsequently uses a more efficient and exact solution.
Returns:

A tensor of shape (batch_size, input_sequence_length) representing the attention distributions for each sequence in the batch.

Raises:

ValueError – mode is not one of ‘recursive’, ‘parallel’, ‘hard’.
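A usage sketch (the probabilities here are random placeholders; the one-hot previous_attention follows the first-timestep convention described above):

```
import tensorflow as tf
from parts.rnns.attention_wrapper import monotonic_attention

batch_size, input_len = 2, 5

# Probabilities of choosing each memory element, all in [0, 1].
p_choose_i = tf.random_uniform([batch_size, input_len])

# For the first output timestep, previous_attention[n] must be
# [1, 0, 0, ..., 0] for every batch entry n.
previous_attention = tf.one_hot(
    tf.zeros([batch_size], dtype=tf.int32), depth=input_len)

attention = monotonic_attention(p_choose_i, previous_attention,
                                mode='parallel')
# attention has shape (batch_size, input_len) and would be fed back in as
# previous_attention at the next output timestep.
```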

class parts.rnns.attention_wrapper.BahdanauMonotonicAttention(num_units, memory, memory_sequence_length=None, normalize=False, score_mask_value=None, sigmoid_noise=0.0, sigmoid_noise_seed=None, score_bias_init=0.0, mode='parallel', dtype=None, name='BahdanauMonotonicAttention')[source]

Bases: parts.rnns.attention_wrapper._BaseMonotonicAttentionMechanism

Monotonic attention mechanism with Bahdanau-style energy function.

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory it can’t attend to any prior points at subsequent output timesteps. It achieves this by using the _monotonic_probability_fn instead of softmax to construct its attention distributions. Since the attention scores are passed through a sigmoid, a learnable scalar bias parameter is applied after the score function and before the sigmoid. Otherwise, it is equivalent to BahdanauAttention. This approach is proposed in

Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, “Online and Linear-Time Attention by Enforcing Monotonic Alignments.” ICML 2017. https://arxiv.org/abs/1704.00784

__init__(num_units, memory, memory_sequence_length=None, normalize=False, score_mask_value=None, sigmoid_noise=0.0, sigmoid_noise_seed=None, score_bias_init=0.0, mode='parallel', dtype=None, name='BahdanauMonotonicAttention')[source]

Construct the Attention mechanism.

Parameters:
  • num_units – The depth of the query mechanism.
  • memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • memory_sequence_length (optional) – Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • normalize – Python boolean. Whether to normalize the energy term.
  • score_mask_value – (optional): The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • sigmoid_noise – Standard deviation of pre-sigmoid noise. See the docstring for _monotonic_probability_fn for more information.
  • sigmoid_noise_seed – (optional) Random seed for pre-sigmoid noise.
  • score_bias_init – Initial value for score bias scalar. It’s recommended to initialize this to a negative value when the length of the memory is large.
  • mode – How to compute the attention distribution. Must be one of ‘recursive’, ‘parallel’, or ‘hard’. See the docstring for tf.contrib.seq2seq.monotonic_attention for more information.
  • dtype – The data type for the query and memory layers of the attention mechanism.
  • name – Name to use when creating ops.
class parts.rnns.attention_wrapper.LuongMonotonicAttention(num_units, memory, memory_sequence_length=None, scale=False, score_mask_value=None, sigmoid_noise=0.0, sigmoid_noise_seed=None, score_bias_init=0.0, mode='parallel', dtype=None, name='LuongMonotonicAttention')[source]

Bases: parts.rnns.attention_wrapper._BaseMonotonicAttentionMechanism

Monotonic attention mechanism with Luong-style energy function.

This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory it can’t attend to any prior points at subsequent output timesteps. It achieves this by using the _monotonic_probability_fn instead of softmax to construct its attention distributions. Otherwise, it is equivalent to LuongAttention. This approach is proposed in

Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, “Online and Linear-Time Attention by Enforcing Monotonic Alignments.” ICML 2017. https://arxiv.org/abs/1704.00784

__init__(num_units, memory, memory_sequence_length=None, scale=False, score_mask_value=None, sigmoid_noise=0.0, sigmoid_noise_seed=None, score_bias_init=0.0, mode='parallel', dtype=None, name='LuongMonotonicAttention')[source]

Construct the Attention mechanism.

Parameters:
  • num_units – The depth of the query mechanism.
  • memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • memory_sequence_length (optional) – Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • scale – Python boolean. Whether to scale the energy term.
  • score_mask_value – (optional): The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • sigmoid_noise – Standard deviation of pre-sigmoid noise. See the docstring for _monotonic_probability_fn for more information.
  • sigmoid_noise_seed – (optional) Random seed for pre-sigmoid noise.
  • score_bias_init – Initial value for score bias scalar. It’s recommended to initialize this to a negative value when the length of the memory is large.
  • mode – How to compute the attention distribution. Must be one of ‘recursive’, ‘parallel’, or ‘hard’. See the docstring for tf.contrib.seq2seq.monotonic_attention for more information.
  • dtype – The data type for the query and memory layers of the attention mechanism.
  • name – Name to use when creating ops.
class parts.rnns.attention_wrapper.LocationSensitiveAttention(num_units, memory, query_dim=None, memory_sequence_length=None, probability_fn=None, score_mask_value=None, dtype=None, use_bias=False, use_coverage=True, location_attn_type='chorowski', location_attention_params=None, name='LocationSensitiveAttention')[source]

Bases: parts.rnns.attention_wrapper._BaseAttentionMechanism

Implements a Bahdanau-style (additive) scoring function with cumulative location information.

The implementation is described in:

Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, KyungHyun Cho, Yoshua Bengio “Attention-Based Models for Speech Recognition” https://arxiv.org/abs/1506.07503

Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions” https://arxiv.org/abs/1712.05884

__init__(num_units, memory, query_dim=None, memory_sequence_length=None, probability_fn=None, score_mask_value=None, dtype=None, use_bias=False, use_coverage=True, location_attn_type='chorowski', location_attention_params=None, name='LocationSensitiveAttention')[source]

Construct the Attention mechanism.

Parameters:
  • num_units – The depth of the query mechanism.
  • memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • memory_sequence_length (optional) – Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
  • normalize – Python boolean. Whether to normalize the energy term.
  • probability_fn – (optional) A callable. Converts the score to probabilities. The default is tf.nn.softmax. Other options include tf.contrib.seq2seq.hardmax and tf.contrib.sparsemax.sparsemax. Its signature should be: probabilities = probability_fn(score).
  • score_mask_value – (optional): The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • dtype – The data type for the query and memory layers of the attention mechanism.
  • use_bias (bool) – Whether to use a bias when computing alignments.
  • location_attn_type (String) – Accepts [“chorowski”, “zhaopeng”].
  • location_attention_params (dict) – Params required for location attention.
  • name – Name to use when creating ops.
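A hedged construction sketch; the encoder tensors are placeholders, and query_dim is assumed to match the decoder cell size (e.g. a Tacotron-2-style decoder):

```
import tensorflow as tf
from parts.rnns.attention_wrapper import LocationSensitiveAttention

encoder_outputs = tf.placeholder(tf.float32, [None, None, 512])  # [B, T, D]
source_lengths = tf.placeholder(tf.int32, [None])

location_attention = LocationSensitiveAttention(
    num_units=128,
    memory=encoder_outputs,
    query_dim=1024,                    # assumed decoder cell size
    memory_sequence_length=source_lengths,
    use_coverage=True,
    location_attn_type='chorowski')
```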

flstm

Module for constructing RNN Cells.

class parts.rnns.flstm.FLSTMCell(num_units, fact_size, initializer=None, num_proj=None, forget_bias=1.0, activation=<function tanh>, reuse=None)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

Group LSTM cell (G-LSTM). The implementation is based on:

O. Kuchaiev and B. Ginsburg “Factorization Tricks for LSTM Networks”, ICLR 2017 workshop.

__init__(num_units, fact_size, initializer=None, num_proj=None, forget_bias=1.0, activation=<function tanh>, reuse=None)[source]

Initialize the parameters of the G-LSTM cell.

Parameters:
  • num_units – int, The number of units in the G-LSTM cell.
  • initializer – (optional) The initializer to use for the weight and projection matrices.
  • num_proj – (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
  • forget_bias – Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
  • activation – Activation function of the inner states.
  • reuse – (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
Raises:

ValueError – If num_units or num_proj is not divisible by number_of_groups.

call(inputs, state)[source]
output_size

Integer or TensorShape – size of outputs produced by this cell.

state_size

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

glstm

Module for constructing RNN Cells.

class parts.rnns.glstm.GLSTMCell(num_units, initializer=None, num_proj=None, number_of_groups=1, forget_bias=1.0, activation=<function tanh>, reuse=None)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

Group LSTM cell (G-LSTM). The implementation is based on:

O. Kuchaiev and B. Ginsburg “Factorization Tricks for LSTM Networks”, ICLR 2017 workshop.

__init__(num_units, initializer=None, num_proj=None, number_of_groups=1, forget_bias=1.0, activation=<function tanh>, reuse=None)[source]

Initialize the parameters of the G-LSTM cell.

Parameters:
  • num_units – int, The number of units in the G-LSTM cell.
  • initializer – (optional) The initializer to use for the weight and projection matrices.
  • num_proj – (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed.
  • number_of_groups – (optional) int, number of groups to use. If number_of_groups is 1, then it should be equivalent to an LSTM cell.
  • forget_bias – Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
  • activation – Activation function of the inner states.
  • reuse – (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
Raises:

ValueError – If num_units or num_proj is not divisible by number_of_groups.
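A usage sketch illustrating the divisibility constraint; the sizes are placeholders:

```
import tensorflow as tf
from parts.rnns.glstm import GLSTMCell

# num_units and num_proj must both be divisible by number_of_groups,
# otherwise __init__ raises ValueError.
cell = GLSTMCell(num_units=1024, num_proj=512, number_of_groups=4)

inputs = tf.placeholder(tf.float32, [None, None, 512])  # [B, T, D]
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```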

_get_input_for_group(inputs, group_id, group_size)[source]

Slices inputs into groups to prepare for processing by the cell’s groups.

Parameters:
  • inputs – cell input or its previous state, a Tensor, 2D, [batch x num_units].
  • group_id – group id, a Scalar, for which to prepare input
  • group_size – size of the group
Returns:

subset of inputs corresponding to group “group_id”, a Tensor, 2D, [batch x num_units/number_of_groups]

call(inputs, state)[source]

Run one step of G-LSTM.

Parameters:
  • inputs – input Tensor, 2D, [batch x num_units].
  • state – this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.

Returns:
  • A 2-D, [batch x output_dim], Tensor representing the output of the G-LSTM after reading inputs when the previous state was state. Here output_dim is num_proj if num_proj was set, num_units otherwise.
  • An LSTMStateTuple representing the new state of the G-LSTM cell after reading inputs when the previous state was state.
Return type: A tuple containing the two elements above.
Raises: ValueError – If the input size cannot be inferred from inputs via static shape inference.
output_size

Integer or TensorShape – size of outputs produced by this cell.

state_size

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

gnmt

GNMT attention sequence-to-sequence model with dynamic RNN support.

class parts.rnns.gnmt.GNMTAttentionMultiCell(attention_cell, cells, use_new_attention=False)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.MultiRNNCell

A MultiCell with GNMT attention style.

__init__(attention_cell, cells, use_new_attention=False)[source]

Creates a GNMTAttentionMultiCell.

Parameters:
  • attention_cell – An instance of AttentionWrapper.
  • cells – A list of RNNCell wrapped with AttentionInputWrapper.
  • use_new_attention – Whether to use the attention generated from current step bottom layer’s output. Default is False.
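
A construction sketch (hedged: plain LSTM cells are used for the upper layers here for brevity, and the attention cell is the AttentionWrapper documented earlier on this page):

```
import tensorflow as tf
from parts.rnns.attention_wrapper import AttentionWrapper, BahdanauAttention
from parts.rnns.gnmt import GNMTAttentionMultiCell

encoder_outputs = tf.placeholder(tf.float32, [None, None, 512])
source_lengths = tf.placeholder(tf.int32, [None])

attention_mechanism = BahdanauAttention(
    num_units=512, memory=encoder_outputs,
    memory_sequence_length=source_lengths)

# Bottom layer computes attention; output_attention=False keeps its output
# as the plain cell output, GNMT-style.
attention_cell = AttentionWrapper(
    tf.nn.rnn_cell.LSTMCell(512), attention_mechanism,
    output_attention=False)

# Upper decoder layers that consume the bottom layer's attention.
upper_cells = [tf.nn.rnn_cell.LSTMCell(512) for _ in range(3)]

gnmt_cell = GNMTAttentionMultiCell(
    attention_cell=attention_cell,
    cells=upper_cells,
    use_new_attention=True)
```
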
parts.rnns.gnmt.gnmt_residual_fn(inputs, outputs)[source]

Residual function that handles inputs and outputs with different inner dimensions.

Parameters:
  • inputs – cell inputs; these are the actual inputs concatenated with the attention vector.
  • outputs – cell outputs
Returns:

outputs + actual inputs
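A shape-level sketch of the intended behaviour, assuming the inputs are the actual decoder inputs with the attention vector appended (the depths 512 and 128 are placeholders):

```
import tensorflow as tf
from parts.rnns.gnmt import gnmt_residual_fn

# Decoder inputs of depth 512 concatenated with a 128-dim attention vector.
inputs = tf.placeholder(tf.float32, [None, 640])
# Cell outputs of depth 512.
outputs = tf.placeholder(tf.float32, [None, 512])

# Only the part of the inputs matching the output depth takes part in the
# residual sum, so the result keeps the output depth: [None, 512].
residual = gnmt_residual_fn(inputs, outputs)
```

In a GNMT-style decoder this would typically be supplied as the custom residual function for the upper layers.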

rnn_beam_search_decoder

A decoder that performs beam search.

class parts.rnns.rnn_beam_search_decoder.BeamSearchDecoderOutput[source]

Bases: parts.rnns.rnn_beam_search_decoder.BeamSearchDecoderOutput

class parts.rnns.rnn_beam_search_decoder.BeamSearchDecoderState[source]

Bases: parts.rnns.rnn_beam_search_decoder.BeamSearchDecoderState

class parts.rnns.rnn_beam_search_decoder.BeamSearchDecoder(cell, embedding, start_tokens, end_token, initial_state, beam_width, output_layer=None, length_penalty_weight=0.0, positional_embedding=None)[source]

Bases: tensorflow.contrib.seq2seq.python.ops.decoder.Decoder

BeamSearch sampling decoder.

NOTE If you are using the BeamSearchDecoder with a cell wrapped in AttentionWrapper, then you must ensure that:

  • The encoder output has been tiled to beam_width via tf.contrib.seq2seq.tile_batch (NOT tf.tile).
  • The batch_size argument passed to the zero_state method of this wrapper is equal to true_batch_size * beam_width.
  • The initial state created with zero_state above contains a cell_state value containing properly tiled final state from the encoder.

An example:

```
tiled_encoder_outputs = tf.contrib.seq2seq.tile_batch(
    encoder_outputs, multiplier=beam_width)
tiled_encoder_final_state = tf.contrib.seq2seq.tile_batch(
    encoder_final_state, multiplier=beam_width)
tiled_sequence_length = tf.contrib.seq2seq.tile_batch(
    sequence_length, multiplier=beam_width)
attention_mechanism = MyFavoriteAttentionMechanism(
    num_units=attention_depth,
    memory=tiled_encoder_outputs,
    memory_sequence_length=tiled_sequence_length)
attention_cell = AttentionWrapper(cell, attention_mechanism, …)
decoder_initial_state = attention_cell.zero_state(
    dtype, batch_size=true_batch_size * beam_width)
decoder_initial_state = decoder_initial_state.clone(
    cell_state=tiled_encoder_final_state)
```

__init__(cell, embedding, start_tokens, end_token, initial_state, beam_width, output_layer=None, length_penalty_weight=0.0, positional_embedding=None)[source]

Initialize the BeamSearchDecoder.

Parameters:
  • cell – An RNNCell instance.
  • embedding – A callable that takes a vector tensor of ids (argmax ids), or the params argument for embedding_lookup.
  • start_tokens – int32 vector shaped [batch_size], the start tokens.
  • end_token – int32 scalar, the token that marks the end of decoding.
  • initial_state – A (possibly nested tuple of…) tensors and TensorArrays.
  • beam_width – Python integer, the number of beams.
  • output_layer – (Optional) An instance of tf.layers.Layer, i.e., tf.layers.Dense. Optional layer to apply to the RNN output prior to storing the result or sampling.
  • length_penalty_weight – Float weight to penalize length. Disabled with 0.0.
  • positional_embedding – (optional) A callable used to compute decoder positional embeddings, or None (default), in which case positional embedding is disabled.
Raises:
  • TypeError – if cell is not an instance of RNNCell, or output_layer is not an instance of tf.layers.Layer.
  • ValueError – If start_tokens is not a vector or end_token is not a scalar.
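
A minimal decoding sketch (hedged: a plain LSTM cell is used to keep the example short; with an attention-wrapped cell the memory and state must first be tiled as in the note above, and the token ids and sizes are placeholders):

```
import tensorflow as tf
from parts.rnns.rnn_beam_search_decoder import BeamSearchDecoder

batch_size, beam_width, vocab_size = 8, 4, 32000
embedding = tf.get_variable('embedding', [vocab_size, 512])
output_layer = tf.layers.Dense(vocab_size, use_bias=False)

cell = tf.nn.rnn_cell.LSTMCell(512)
initial_state = cell.zero_state(batch_size * beam_width, tf.float32)

decoder = BeamSearchDecoder(
    cell=cell,
    embedding=embedding,
    start_tokens=tf.fill([batch_size], 1),   # hypothetical <s> id
    end_token=2,                             # hypothetical </s> id
    initial_state=initial_state,
    beam_width=beam_width,
    output_layer=output_layer,
    length_penalty_weight=0.0)

outputs, final_state, lengths = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=100)
predicted_ids = outputs.predicted_ids  # [batch_size, T, beam_width]
```
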
_maybe_merge_batch_beams(t, s)[source]

Maybe merges the tensor from a batch of beams into a batch by beams.

More exactly, if t is a tensor of dimension [batch_size, beam_width] + s, then we reshape it to [batch_size * beam_width] + s. Otherwise t is returned unchanged.

Parameters:
  • t – Tensor of dimension [batch_size, beam_width] + s.
  • s – Tensor, Python int, or TensorShape.
Returns:

A reshaped version of t with shape [batch_size * beam_width] + s.

Raises:
  • TypeError – If t is an instance of TensorArray.
  • ValueError – If the rank of t is not statically known.
_maybe_split_batch_beams(t, s)[source]

Maybe splits the tensor from a batch by beams into a batch of beams.

We do this so that we can use nest and not run into problems with shapes.

Parameters:
  • t – Tensor, either scalar or shaped [batch_size * beam_width] + s.
  • s – Tensor, Python int, or TensorShape.
Returns:

If t is a matrix or higher order tensor, then the return value is t reshaped to [batch_size, beam_width] + s. Otherwise t is returned unchanged.

Raises:
  • TypeError – If t is an instance of TensorArray.
  • ValueError – If the rank of t is not statically known.
_merge_batch_beams(t, s=None)[source]

Merges the tensor from a batch of beams into a batch by beams.

More exactly, t is a tensor of dimension [batch_size, beam_width, s]. We reshape this into [batch_size*beam_width, s]

Parameters:
  • t – Tensor of dimension [batch_size, beam_width, s]
  • s – (Possibly known) depth shape.
Returns:

A reshaped version of t with dimension [batch_size * beam_width, s].

_split_batch_beams(t, s=None)[source]

Splits the tensor from a batch by beams into a batch of beams.

More exactly, t is a tensor of dimension [batch_size*beam_width, s]. We reshape this into [batch_size, beam_width, s]

Parameters:
  • t – Tensor of dimension [batch_size*beam_width, s].
  • s – (Possibly known) depth shape.
Returns:

A reshaped version of t with dimension [batch_size, beam_width, s].

Raises:

ValueError – If, after reshaping, the new tensor is not shaped [batch_size, beam_width, s] (assuming batch_size and beam_width are known statically).

batch_size

The batch size of input values.

finalize(outputs, final_state, sequence_lengths)[source]

Finalize and return the predicted_ids.

Parameters:
  • outputs – An instance of BeamSearchDecoderOutput.
  • final_state – An instance of BeamSearchDecoderState. Passed through to the output.
  • sequence_lengths – An int64 tensor shaped [batch_size, beam_width]. The sequence lengths determined for each beam during decode. NOTE These are ignored; the updated sequence lengths are stored in final_state.lengths.
Returns:

  • outputs – An instance of FinalBeamSearchDecoderOutput where the predicted_ids are the result of calling _gather_tree.
  • final_state – The same input instance of BeamSearchDecoderState.

initialize(name=None)[source]

Initialize the decoder.

Parameters:name – Name scope for any created operations.
Returns:(finished, start_inputs, initial_state).
output_dtype

A (possibly nested tuple of…) dtype[s].

output_size

A (possibly nested tuple of…) integer[s] or TensorShape object[s].

step(time, inputs, state, name=None)[source]

Perform a decoding step.

Parameters:
  • time – scalar int32 tensor.
  • inputs – A (structure of) input tensors.
  • state – A (structure of) state tensors and TensorArrays.
  • name – Name scope for any created operations.
Returns:

(outputs, next_state, next_inputs, finished).

tracks_own_finished

The BeamSearchDecoder shuffles its beams and their finished state.

For this reason, it conflicts with the dynamic_decode function’s tracking of finished states. Setting this property to true avoids early stopping of decoding due to mismanagement of the finished state in dynamic_decode.

Returns:True.
class parts.rnns.rnn_beam_search_decoder.FinalBeamSearchDecoderOutput[source]

Bases: parts.rnns.rnn_beam_search_decoder.FinalBeamDecoderOutput

Final outputs returned by the beam search after all decoding is finished.

Parameters:
  • predicted_ids – The final prediction. A tensor of shape [batch_size, T, beam_width] (or [T, batch_size, beam_width] if output_time_major is True). Beams are ordered from best to worst.
  • beam_search_decoder_output – An instance of BeamSearchDecoderOutput that describes the state of the beam search.
parts.rnns.rnn_beam_search_decoder.tile_batch(t, multiplier, name=None)[source]

Tile the batch dimension of a (possibly nested structure of) tensor(s) t.

For each tensor t in a (possibly nested structure) of tensors, this function takes a tensor t shaped [batch_size, s0, s1, …] composed of minibatch entries t[0], …, t[batch_size - 1] and tiles it to have a shape [batch_size * multiplier, s0, s1, …] composed of minibatch entries t[0], t[0], …, t[1], t[1], … where each minibatch entry is repeated multiplier times.

Parameters:
  • t – Tensor shaped [batch_size, …].
  • multiplier – Python int.
  • name – Name scope for any created operations.
Returns:

A (possibly nested structure of) Tensor shaped [batch_size * multiplier, …].

Raises:
  • ValueError – if tensor(s) t do not have a statically known rank or the rank is < 1.
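A shape-level sketch (the encoder tensor is a placeholder):

```
import tensorflow as tf
from parts.rnns.rnn_beam_search_decoder import tile_batch

beam_width = 4
encoder_outputs = tf.placeholder(tf.float32, [None, None, 256])  # [B, T, D]

# Each batch entry is repeated beam_width times along the batch axis:
# [b0, b0, b0, b0, b1, b1, b1, b1, ...].
tiled = tile_batch(encoder_outputs, multiplier=beam_width)
# tiled has shape [B * beam_width, T, D].
```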

slstm

Implements https://arxiv.org/abs/1709.02755.

Copied from the LSTM implementation and made functionally correct with minimal code changes.

class parts.rnns.slstm.BasicSLSTMCell(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

Basic SLSTM recurrent network cell.

The implementation is based on: https://arxiv.org/abs/1709.02755.

__init__(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None)[source]

Initialize the basic SLSTM cell.

Parameters:
  • num_units – int, The number of units in the SLSTM cell.
  • forget_bias – float, The bias added to forget gates (see above). Must be set to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
  • state_is_tuple – If True, accepted and returned states are 2-tuples of the c_state and m_state. If False, they are concatenated along the column axis. The latter behavior will soon be deprecated.
  • activation – Activation function of the inner states. Default: tanh.
  • reuse – (optional) Python boolean describing whether to reuse variables in an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
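
A minimal usage sketch (the sizes are placeholders):

```
import tensorflow as tf
from parts.rnns.slstm import BasicSLSTMCell

cell = BasicSLSTMCell(num_units=256)

inputs = tf.placeholder(tf.float32, [None, None, 128])  # [B, T, D]
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```
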
call(inputs, state)[source]

Run one step of the SLSTM cell.

Parameters:
  • inputs – 2-D tensor with shape [batch_size x input_size].
  • state – An LSTMStateTuple of state tensors, each shaped [batch_size x self.state_size], if state_is_tuple has been set to True. Otherwise, a Tensor shaped [batch_size x 2 * self.state_size].
Returns:

A pair containing the new hidden state and the new state (either an LSTMStateTuple or a concatenated state, depending on state_is_tuple).

output_size

Integer or TensorShape – size of outputs produced by this cell.

state_size

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

parts.rnns.slstm._linear(args, output_size, bias, bias_initializer=None, kernel_initializer=None)[source]

Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

Parameters:
  • args – a 2D Tensor or a list of 2D, batch x n, Tensors.
  • output_size – int, second dimension of W[i].
  • bias – boolean, whether to add a bias term or not.
  • bias_initializer – starting value to initialize the bias (default is all zeros).
  • kernel_initializer – starting value to initialize the weight.
Returns:

A 2D Tensor with shape [batch x output_size] equal to sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

Raises:

ValueError – if some of the arguments have an unspecified or wrong shape.

utils

parts.rnns.utils.single_cell(cell_class, cell_params, dp_input_keep_prob=1.0, dp_output_keep_prob=1.0, recurrent_keep_prob=1.0, input_weight_keep_prob=1.0, recurrent_weight_keep_prob=1.0, weight_variational=False, dropout_seed=None, zoneout_prob=0.0, training=True, residual_connections=False, awd_initializer=False, variational_recurrent=False, dtype=None)[source]

Creates an instance of the RNN cell. Such a cell describes one step of one layer and can include a residual connection and/or dropout.

Parameters:
  • cell_class – Tensorflow RNN cell class
  • cell_params (dict) – cell parameters
  • dp_input_keep_prob (float) – (default: 1.0) input dropout keep probability.
  • dp_output_keep_prob (float) – (default: 1.0) output dropout keep probability.
  • zoneout_prob (float) – zoneout probability. Applying both zoneout and dropout is currently not supported.
  • residual_connections (bool) – whether to add a residual connection.
Returns:

TF RNN instance
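A hypothetical configuration sketch; it assumes cell_params is passed to cell_class as keyword arguments, so the 'num_units' key is illustrative:

```
import tensorflow as tf
from parts.rnns.utils import single_cell

# One LSTM layer with output dropout and a residual connection.
cell = single_cell(
    cell_class=tf.nn.rnn_cell.LSTMCell,
    cell_params={'num_units': 512},  # assumed to be kwargs for cell_class
    dp_output_keep_prob=0.8,
    residual_connections=True,
    training=True)
```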

zoneout

class parts.rnns.zoneout.ZoneoutWrapper(cell, zoneout_prob, is_training=True, seed=None)[source]

Bases: tensorflow.python.ops.rnn_cell_impl.RNNCell

Operator adding zoneout to all states (states + cells) of the given cell. Code taken from https://github.com/teganmaharaj/zoneout, applying zoneout as described in https://arxiv.org/pdf/1606.01305.pdf.
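
A usage sketch (hedged: zoneout_prob is assumed to accept a single float here; some zoneout implementations take separate probabilities for the cell and hidden states):

```
import tensorflow as tf
from parts.rnns.zoneout import ZoneoutWrapper

base_cell = tf.nn.rnn_cell.LSTMCell(512)

# During training, each state unit is kept from the previous time step
# with probability zoneout_prob instead of being updated.
cell = ZoneoutWrapper(base_cell, zoneout_prob=0.1, is_training=True)

inputs = tf.placeholder(tf.float32, [None, None, 256])
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```
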

output_size

Integer or TensorShape – size of outputs produced by this cell.

state_size

size(s) of state(s) used by this cell.

It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.