decoders

This package contains various decoders. A decoder typically takes a representation and produces data.

decoder

class decoders.decoder.Decoder(params, model, name='decoder', mode='train')[source]

Bases: object

Abstract class from which all decoders must inherit.

__init__(params, model, name='decoder', mode='train')[source]

Decoder constructor. Note that decoder constructors should not modify the TensorFlow graph: all graph construction should happen in the self._decode() method.

Parameters:
  • params (dict) – parameters describing the decoder. All supported parameters are listed in get_required_params(), get_optional_params() functions.
  • model (instance of a class derived from Model) – parent model that created this decoder. Could be None if no model access is required for the use case.
  • name (str) – name for decoder variable scope.
  • mode (str) – mode decoder is going to be run in. Could be “train”, “eval” or “infer”.

Config parameters:

  • initializer — any valid TensorFlow initializer. If no initializer is provided, the model initializer will be used.
  • initializer_params (dict) — dictionary that will be passed to the initializer __init__ method.
  • regularizer — any valid TensorFlow regularizer. If no regularizer is provided, the model regularizer will be used.
  • regularizer_params (dict) — dictionary that will be passed to the regularizer __init__ method.
  • dtype — model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details see the mixed precision training section in the docs. If no dtype is provided, the model dtype will be used.
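
For illustration, a configuration using only these common parameters might look like the sketch below (values are illustrative; a TensorFlow 1.x-style API is assumed, matching the rest of these docs):

import tensorflow as tf

# Illustrative base decoder configuration (not a recommended setting):
decoder_params = {
    "initializer": tf.glorot_uniform_initializer,
    "initializer_params": {"seed": 0},
    "regularizer": tf.contrib.layers.l2_regularizer,
    "regularizer_params": {"scale": 1e-4},
    "dtype": tf.float32,
}
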
_cast_types(input_dict)[source]

This function automatically casts all inputs to the decoder dtype.

Parameters:input_dict (dict) – dictionary passed to self._decode() method.
Returns:same as input_dict, but with all Tensors cast to decoder dtype.
Return type:dict
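
Conceptually, the cast behaves like the following simplified sketch (not the actual implementation, which may also handle nested structures):

import tensorflow as tf

def cast_types_sketch(input_dict, dtype):
  # Cast every floating-point Tensor to the decoder dtype;
  # pass every other value through unchanged.
  casted = {}
  for key, value in input_dict.items():
    if isinstance(value, tf.Tensor) and value.dtype.is_floating:
      casted[key] = tf.cast(value, dtype)
    else:
      casted[key] = value
  return casted
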
_decode(input_dict)[source]

This is the main function, which should construct the decoder graph. Typically, the decoder will take a hidden representation from the encoder as input and produce an output sequence.

Parameters:input_dict (dict) –

dictionary containing decoder inputs. If the decoder is used with the models.encoder_decoder class, input_dict will have the following content:

{
  "encoder_output": dictionary returned from encoder.encode() method
  "target_tensors": data_layer.input_tensors['target_tensors']
}
Returns:dictionary of decoder outputs. Typically this will be just:
{
  "logits": logits that will be passed to Loss
  "outputs": list with actual decoded outputs, e.g. characters
             instead of logits
}
Return type:dict
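
To make this contract concrete, here is a minimal hypothetical subclass. Only the input/output dictionary structure is taken from the docs above; the projection layer itself is an illustrative choice:

import tensorflow as tf
from decoders.decoder import Decoder

class ProjectionDecoder(Decoder):
  """Hypothetical decoder: a single linear projection of encoder outputs."""

  @staticmethod
  def get_required_params():
    # Values describe the expected parameter types.
    return dict(Decoder.get_required_params(), output_dim=int)

  def _decode(self, input_dict):
    inputs = input_dict['encoder_output']['outputs']
    logits = tf.layers.dense(inputs, self.params['output_dim'],
                             name='projection')
    return {'logits': logits, 'outputs': [logits]}
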
decode(input_dict)[source]

Wrapper around the self._decode() method. Here name, initializer and dtype are set in the variable scope, and then the self._decode() method is called.

Parameters:input_dict (dict) – see self._decode() docs.
Returns:see self._decode() docs.
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
mode

Mode decoder is run in.

name

Decoder name.

params

Parameters used to construct the decoder (dictionary)

fc_decoders

This module defines various fully-connected decoders (consisting of one fully connected layer).

These classes are usually used for models that are not really sequence-to-sequence and thus have to be artificially split into an encoder and a decoder, for example by cutting at the last fully-connected layer.

class decoders.fc_decoders.FullyConnectedCTCDecoder(params, model, name='fully_connected_ctc_decoder', mode='train')[source]

Bases: decoders.fc_decoders.FullyConnectedTimeDecoder

Fully connected time decoder that provides CTC-based text generation (either with or without a language model). If the language model is not used, tf.nn.ctc_greedy_decoder will be used as the text generation method.

__init__(params, model, name='fully_connected_ctc_decoder', mode='train')[source]

Fully connected CTC decoder constructor.

See parent class for arguments description.

Config parameters:

  • use_language_model (bool) — whether to use a language model for output text generation. If False, the other config parameters are not used.
  • decoder_library_path (string) — path to the CTC decoder with language model library.
  • lm_path (string) — path to the language model file.
  • trie_path (string) — path to the prefix trie file.
  • alphabet_config_path (string) — path to the alphabet file.
  • beam_width (int) — beam width for beam search.
  • alpha (float) — weight that is assigned to the language model probabilities.
  • beta (float) — weight that is assigned to the word count.
  • trie_weight (float) — weight for prefix-tree-vocabulary-based character-level rescoring.
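
As an illustration, a language-model-enabled configuration could look like the sketch below (all paths and numeric values are placeholders, not recommended settings):

decoder_params = {
    "use_language_model": True,
    "decoder_library_path": "path/to/libctc_decoder_with_kenlm.so",
    "lm_path": "path/to/lm.binary",
    "trie_path": "path/to/trie.binary",
    "alphabet_config_path": "path/to/alphabet.txt",
    "beam_width": 512,
    "alpha": 2.0,
    "beta": 1.0,
    "trie_weight": 0.1,
}
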
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.fc_decoders.FullyConnectedDecoder(params, model, name='fully_connected_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Simple decoder consisting of one fully-connected layer.

__init__(params, model, name='fully_connected_decoder', mode='train')[source]

Fully connected decoder constructor.

See parent class for arguments description.

Config parameters:

  • output_dim (int) — output dimension.
_decode(input_dict)[source]

This method performs a linear transformation of the input.

Parameters:input_dict (dict) –

input dictionary that has to contain the following fields:

input_dict = {
  'encoder_output': {
    'outputs': output of encoder (shape=[batch_size, num_features])
  }
}
Returns:dictionary with the following tensors:
{
  'logits': logits with the shape=[batch_size, output_dim]
  'outputs': [logits] (same as logits but wrapped in list)
}
Return type:dict
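
A usage sketch (the encoder output here is a stand-in placeholder, and model=None is used per the base class docs on the assumption that no model access is needed):

import tensorflow as tf
from decoders.fc_decoders import FullyConnectedDecoder

decoder = FullyConnectedDecoder(params={"output_dim": 10}, model=None,
                                mode="train")
# Stand-in for a real encoder output of shape [batch_size, num_features]:
encoder_outputs = tf.placeholder(tf.float32, shape=[None, 512])
output_dict = decoder.decode(
    input_dict={"encoder_output": {"outputs": encoder_outputs}})
logits = output_dict["logits"]  # shape=[batch_size, 10]
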
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.fc_decoders.FullyConnectedSCDecoder(params, model, name='fully_connected_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Fully connected decoder for speech commands.

__init__(params, model, name='fully_connected_decoder', mode='train')[source]

Fully connected decoder constructor.

See parent class for arguments description.

Config parameters:

  • output_dim (int) — output dimension.
_decode(input_dict)[source]

This method performs a linear transformation of the input.

Parameters:input_dict (dict) –

input dictionary that has to contain the following fields:

input_dict = {
  'encoder_output': {
    'outputs': output of encoder (shape=[batch_size, num_features])
  }
}
Returns:dictionary with the following tensors:
{
  'logits': logits with the shape=[batch_size, output_dim]
  'outputs': [logits] (same as logits but wrapped in list)
}
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.fc_decoders.FullyConnectedTimeDecoder(params, model, name='fully_connected_time_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Fully connected decoder that operates on inputs with time dimension. That is, input shape should be [batch size, time length, num features].

__init__(params, model, name='fully_connected_time_decoder', mode='train')[source]

Fully connected time decoder constructor.

See parent class for arguments description.

Config parameters:

  • tgt_vocab_size (int) — target vocabulary size, i.e. number of output features.
  • logits_to_outputs_func — function that maps produced logits to decoder outputs, i.e. actual text sequences.
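
For example, a greedy CTC mapping could be supplied as logits_to_outputs_func like the sketch below (note that, per _decode() below, logits are time-major; the vocabulary size is illustrative):

import tensorflow as tf

def greedy_ctc_outputs(logits, input_dict):
  # logits: [time, batch_size, tgt_vocab_size], as produced by _decode().
  seq_lengths = tf.cast(input_dict['encoder_output']['src_length'], tf.int32)
  decoded, _ = tf.nn.ctc_greedy_decoder(logits, seq_lengths)
  return [tf.sparse_tensor_to_dense(decoded[0])]

decoder_params = {
    "tgt_vocab_size": 29,  # e.g. English characters + blank (illustrative)
    "logits_to_outputs_func": greedy_ctc_outputs,
}
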
_decode(input_dict)[source]

Creates TensorFlow graph for fully connected time decoder.

Parameters:input_dict (dict) –

input dictionary that has to contain the following fields:

input_dict = {
  'encoder_output': {
    "outputs": tensor with shape [batch_size, time length, hidden dim]
    "src_length": tensor with shape [batch_size]
  }
}
Returns:dictionary with the following tensors:
{
  'logits': logits with the shape=[time length, batch_size, tgt_vocab_size]
  'outputs': logits_to_outputs_func(logits, input_dict)
}
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict

rnn_decoders

RNN-based decoders.

class decoders.rnn_decoders.BeamSearchRNNDecoderWithAttention(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Bases: decoders.rnn_decoders.RNNDecoderWithAttention

Beam search version of the RNN-based decoder with attention. Can only be used during inference (mode="infer").

__init__(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Initializes beam search decoder.

Parameters:params (dict) – dictionary with decoder parameters

Config parameters:

  • batch_size — batch size
  • GO_SYMBOL — GO symbol id, must be the same as used in the data layer
  • END_SYMBOL — END symbol id, must be the same as used in the data layer
  • tgt_vocab_size — target vocabulary size
  • tgt_emb_size — embedding size to use
  • decoder_cell_units — number of units in the RNN
  • decoder_cell_type — RNN type: lstm, gru, glstm, etc.
  • decoder_dp_input_keep_prob — dropout input keep probability
  • decoder_dp_output_keep_prob — dropout output keep probability
  • decoder_use_skip_connections — whether to use residual connections
  • attention_type — bahdanau, luong, gnmt, gnmt_v2
  • bahdanau_normalize — (optional)
  • luong_scale — (optional)
  • mode — train or infer

… add any cell-specific parameters here as well
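
Since beam search is inference-only, the decoder must be constructed with mode="infer"; a sketch with illustrative parameter values:

from decoders.rnn_decoders import BeamSearchRNNDecoderWithAttention

decoder = BeamSearchRNNDecoderWithAttention(
    params={
        "batch_size": 32,
        "GO_SYMBOL": 1,   # must match the data layer
        "END_SYMBOL": 2,  # must match the data layer
        "tgt_vocab_size": 32000,
        "tgt_emb_size": 512,
        "decoder_cell_units": 512,
        "decoder_cell_type": "lstm",
        "attention_type": "luong",
    },
    model=None,    # stand-in; normally the parent model instance
    mode="infer",  # beam search cannot run in "train" or "eval"
)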

_decode(input_dict)[source]

Decodes representation into data.

Parameters:input_dict (dict) – Python dictionary with inputs to decoder. Must define:
  • src_inputs — decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • src_lengths — decoder input lengths Tensor of shape [batch_size]

Does not need tgt_inputs and tgt_lengths.

Returns:a Python dictionary with:
  • final_outputs — tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • final_state — tensor with decoder final state
  • final_sequence_lengths — tensor of shape [batch_size, time] or [time, batch_size]
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.rnn_decoders.RNNDecoderWithAttention(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Bases: decoders.decoder.Decoder

Typical RNN decoder with attention mechanism.

__init__(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Initializes RNN decoder with embedding.

See parent class for arguments description.

Config parameters:

  • batch_size (int) — batch size.
  • GO_SYMBOL (int) — GO symbol id, must be the same as used in data layer.
  • END_SYMBOL (int) — END symbol id, must be the same as used in data layer.
  • tgt_emb_size (int) — embedding size to use.
  • core_cell_params (dict) — parameters for the RNN cell class.
  • core_cell (string) — RNN cell class.
  • decoder_dp_input_keep_prob (float) — dropout input keep probability.
  • decoder_dp_output_keep_prob (float) — dropout output keep probability.
  • decoder_use_skip_connections (bool) — whether to use residual connections.
  • attention_type (string) — bahdanau, luong, gnmt or gnmt_v2.
  • bahdanau_normalize (bool, optional) — whether to use normalization in bahdanau attention.
  • luong_scale (bool, optional) — whether to use scale in luong attention.
  • … add any cell-specific parameters here as well.
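
A configuration sketch using these parameters (all values are illustrative):

decoder_params = {
    "batch_size": 32,
    "GO_SYMBOL": 1,
    "END_SYMBOL": 2,
    "tgt_emb_size": 512,
    "core_cell": "LSTMCell",  # illustrative RNN cell class name
    "core_cell_params": {"num_units": 512},
    "decoder_dp_input_keep_prob": 0.8,
    "decoder_dp_output_keep_prob": 1.0,
    "decoder_use_skip_connections": False,
    "attention_type": "gnmt_v2",
}
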
_build_attention(encoder_outputs, encoder_sequence_length)[source]

Builds the attention part of the graph. Currently supports “bahdanau” and “luong”.

_decode(input_dict)[source]

Decodes representation into data.

Parameters:input_dict (dict) – Python dictionary with inputs to decoder. Must define:

  • src_inputs — decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • src_lengths — decoder input lengths Tensor of shape [batch_size]
  • tgt_inputs — only during training. Labels Tensor of shape [batch_size, time] or [time, batch_size]
  • tgt_lengths — only during training. Labels lengths Tensor of shape [batch_size]
Returns:a Python dictionary with:
  • final_outputs — tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • final_state — tensor with decoder final state
  • final_sequence_lengths — tensor of shape [batch_size, time] or [time, batch_size]
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict

transformer_decoder

class decoders.transformer_decoder.TransformerDecoder(params, model, name='transformer_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

_get_symbols_to_logits_fn(max_decode_length)[source]

Returns a decoding function that calculates logits of the next tokens.

decode_pass(targets, encoder_outputs, inputs_attention_bias)[source]

Generate logits for each value in the target sequence.

Parameters:
  • targets – target values for the output sequence. int tensor with shape [batch_size, target_length]
  • encoder_outputs – continuous representation of input sequence. float tensor with shape [batch_size, input_length, hidden_size]
  • inputs_attention_bias – float tensor with shape [batch_size, 1, 1, input_length]
Returns:

float32 tensor with shape [batch_size, target_length, vocab_size]

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
predict(encoder_outputs, encoder_decoder_attention_bias)[source]

Return predicted sequence.

convs2s_decoder

class decoders.convs2s_decoder.ConvS2SDecoder(params, model, name='convs2s_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

_get_symbols_to_logits_fn()[source]

Returns a decoding function that calculates logits of the next tokens.

decode_pass(targets, encoder_outputs, encoder_outputs_b, inputs_attention_bias)[source]

Generate logits for each value in the target sequence.

Parameters:
  • targets – target values for the output sequence. int tensor with shape [batch_size, target_length]
  • encoder_outputs – continuous representation of input sequence. float tensor with shape [batch_size, input_length, hidden_size]
  • encoder_outputs_b – continuous representation of input sequence which includes the source embeddings. float tensor with shape [batch_size, input_length, hidden_size]
  • inputs_attention_bias – float tensor with shape [batch_size, 1, input_length]
Returns:

float32 tensor with shape [batch_size, target_length, vocab_size]

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
predict(encoder_outputs, encoder_outputs_b, inputs_attention_bias)[source]

Return predicted sequence.

tacotron2_decoder

Tacotron2 decoder

class decoders.tacotron2_decoder.Prenet(num_units, num_layers, activation_fn=None, dtype=None)[source]

Bases: object

Fully connected prenet used in the decoder

__init__(num_units, num_layers, activation_fn=None, dtype=None)[source]

Prenet initializer

Parameters:
  • num_units (int) – number of units in the fully connected layer
  • num_layers (int) – number of fully connected layers
  • activation_fn (callable) – any valid activation function
  • dtype (dtype) – the data format for this layer
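
A construction sketch using the decoder defaults described below (two fully connected layers of 256 units each):

import tensorflow as tf
from decoders.tacotron2_decoder import Prenet

prenet = Prenet(num_units=256, num_layers=2,
                activation_fn=tf.nn.relu, dtype=tf.float32)
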
add_regularization(regularizer)[source]

Adds regularization to all prenet kernels

output_size
class decoders.tacotron2_decoder.Tacotron2Decoder(params, model, name='tacotron_2_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Tacotron 2 Decoder

__init__(params, model, name='tacotron_2_decoder', mode='train')[source]

Tacotron-2 like decoder constructor. A lot of the optional configurations are currently for testing, and not all configurations are supported; use of the default config is recommended.

See parent class for arguments description.

Config parameters:

  • attention_layer_size (int) — size of attention layer.

  • attention_type (string) — determines which attention mechanism to use; should be one of ‘bahdanau’, ‘location’, or None. Use of ‘location’-sensitive attention is strongly recommended.

  • bahdanau_normalize (bool) — Whether to enable weight norm on the attention parameters. Defaults to False.

  • decoder_cell_units (int) — dimension of decoder RNN cells.

  • decoder_layers (int) — number of decoder RNN layers to use.

  • decoder_cell_type (string) — could be “lstm”, “gru”, “glstm”, or “slstm”. Currently, only ‘lstm’ has been tested. Defaults to ‘lstm’.

  • time_major (bool) — whether to output as time major or batch major. Default is False for batch major.

  • use_swap_memory (bool) — default is False.

  • enable_prenet (bool) — whether to use the fully-connected prenet in the decoder. Defaults to True.

  • prenet_layers (int) — number of fully-connected layers to use. Defaults to 2.

  • prenet_units (int) — number of units in each layer. Defaults to 256.

  • prenet_activation (callable) — activation function to use for the prenet layers. Defaults to relu.

  • enable_postnet (bool) — whether to use the convolutional postnet in the decoder. Defaults to True.

  • postnet_conv_layers (list) — list with the description of convolutional layers. Must be passed if the postnet is enabled. For example:

    "postnet_conv_layers": [
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 80, "padding": "SAME",
        "activation_fn": None
      }
    ]
    
  • postnet_bn_momentum (float) — momentum for batch norm. Defaults to 0.1.

  • postnet_bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-5.

  • postnet_data_format (string) — could be either “channels_first” or “channels_last”. Defaults to “channels_last”.

  • postnet_keep_dropout_prob (float) — keep probability for dropout in the postnet conv layers. Defaults to 0.5.

  • mask_decoder_sequence (bool) — Defaults to True.

  • attention_bias (bool) — whether to use a bias term when calculating the attention. Only works for “location” attention. Defaults to False.

  • zoneout_prob (float) — zoneout probability for rnn layers. Defaults to 0.

  • dropout_prob (float) — dropout probability for rnn layers. Defaults to 0.1.

  • parallel_iterations (int) — Number of parallel_iterations for tf.while loop inside dynamic_decode. Defaults to 32.
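
A partial configuration sketch built from these parameters (values are illustrative; postnet_conv_layers would be filled in as in the example above):

decoder_params = {
    "attention_type": "location",  # strongly recommended above
    "attention_layer_size": 128,
    "decoder_cell_units": 1024,
    "decoder_layers": 2,
    "decoder_cell_type": "lstm",
    "enable_prenet": True,
    "prenet_layers": 2,
    "prenet_units": 256,
    "enable_postnet": True,
    "zoneout_prob": 0.1,
}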

_build_attention(encoder_outputs, encoder_sequence_length, attention_bias)[source]

Builds the attention part of the graph. Currently supports “bahdanau” and “location”.

_decode(input_dict)[source]

Decodes representation into data

Parameters:input_dict (dict) –

Python dictionary with inputs to decoder. Must define:
  • src_inputs — decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • src_lengths — decoder input lengths Tensor of shape [batch_size]
  • tgt_inputs — only during training. Labels Tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]
  • stop_token_inputs — only during training. Labels Tensor of shape [batch_size, time, 1] or [time, batch_size, 1]
  • tgt_lengths — only during training. Labels lengths Tensor of shape [batch_size]
Returns:A python dictionary containing:
  • outputs - array containing:
    • decoder_output - tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]. Spectrogram representation learned by the decoder rnn
    • spectrogram_prediction - tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]. Spectrogram containing the residual corrections from the postnet if enabled
    • alignments - tensor of shape [batch_size, time, memory_size] or [time, batch_size, memory_size]. The alignments learned by the attention layer
    • stop_token_prediction - tensor of shape [batch_size, time, 1] or [time, batch_size, 1]. The stop token predictions
    • final_sequence_lengths - tensor of shape [batch_size]
  • stop_token_predictions - tensor of shape [batch_size, time, 1] or [time, batch_size, 1]. The stop token predictions for use inside the loss function.
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict

centaur_decoder

class decoders.centaur_decoder.CentaurDecoder(params, model, name='centaur_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Centaur decoder that consists of attention blocks followed by convolutional layers.

__init__(params, model, name='centaur_decoder', mode='train')[source]

Centaur decoder constructor.

See parent class for arguments description.

Config parameters:

  • prenet_layers (int) — number of fully-connected layers to use.

  • prenet_hidden_size (int) — number of units in each pre-net layer.

  • hidden_size (int) — dimensionality of hidden embeddings.

  • conv_layers (list) — list with the description of convolutional layers. For example:

    "conv_layers": [
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      }
    ]
    
  • mag_conv_layers (list) — list with the description of convolutional layers to reconstruct magnitude.

  • attention_dropout (float) — dropout rate for attention layers.

  • layer_postprocess_dropout (float) — dropout rate for transformer block sublayers.

  • prenet_activation_fn (callable) — activation function to use for the prenet layers. Defaults to relu.

  • prenet_dropout (float) — dropout rate for the pre-net. Defaults to 0.5.

  • prenet_use_inference_dropout (bool) — whether to use dropout during inference. Defaults to False.

  • cnn_dropout_prob (float) — dropout probability for cnn layers. Defaults to 0.5.

  • bn_momentum (float) — momentum for batch norm. Defaults to 0.95.

  • bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-8.

  • reduction_factor (int) — number of frames to predict at a time. Defaults to 1.

  • attention_layers (int) — number of attention blocks. Defaults to 4.

  • self_attention_conv_params (dict) — description of convolutional layer inside attention blocks. Defaults to None.

  • attention_heads (int) — number of attention heads. Defaults to 1.

  • attention_cnn_dropout_prob (float) — dropout rate for convolutional layers inside attention blocks. Defaults to 0.5.

  • window_size (int) — size of the attention window for forcing monotonic attention during inference. Defaults to None.

  • back_step_size (int) — number of steps attention is allowed to go back during inference. Defaults to 0.

  • force_layers (list) — indices of layers where forcing of monotonic attention should be enabled. Defaults to all layers.
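
A partial configuration sketch (values are illustrative; conv_layers and mag_conv_layers would be filled in as in the example above):

decoder_params = {
    "prenet_layers": 2,
    "prenet_hidden_size": 256,
    "hidden_size": 256,
    "attention_layers": 4,
    "attention_heads": 1,
    "reduction_factor": 2,  # predict two frames per decoder step
    "window_size": 4,       # force monotonic attention during inference
}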

static _convert_outputs(outputs, reduction_factor, batch_size)[source]

Convert output of the decoder to appropriate format.

static _expand(values, reduction_factor)[source]

Expand the given input by reduction_factor.

_inference_cond(state)[source]

Check if it’s time to stop inference.

_inference_initial_state(encoder_outputs, encoder_decoder_attention_bias)[source]

Create initial state for inference.

_inference_step(state)[source]

Make one inference step.

static _positional_encoding(x, dtype)[source]

Add positional encoding to the given input.

static _shrink(values, last_dim, reduction_factor)[source]

Shrink the given input by reduction_factor.
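
The shrink/expand pair can be pictured as reshapes that fold reduction_factor consecutive frames into the feature dimension and unfold them back; a conceptual sketch, not the actual implementation:

import tensorflow as tf

def shrink_sketch(values, last_dim, reduction_factor):
  # [batch, time, last_dim] ->
  # [batch, time // reduction_factor, last_dim * reduction_factor]
  batch_size = tf.shape(values)[0]
  return tf.reshape(values, [batch_size, -1, last_dim * reduction_factor])

def expand_sketch(values, reduction_factor):
  # [batch, time, dim] ->
  # [batch, time * reduction_factor, dim // reduction_factor]
  batch_size = tf.shape(values)[0]
  new_dim = values.get_shape().as_list()[-1] // reduction_factor
  return tf.reshape(values, [batch_size, -1, new_dim])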

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict