decoders

This package contains various decoders. A decoder typically takes a representation and produces data.

decoder

class decoders.decoder.Decoder(params, model, name='decoder', mode='train')[source]

Bases: object

Abstract class from which all decoders must inherit.

__init__(params, model, name='decoder', mode='train')[source]

Decoder constructor. Note that decoder constructors should not modify the TensorFlow graph: all graph construction should happen in the self._decode() method.

Parameters:
  • params (dict) – parameters describing the decoder. All supported parameters are listed in get_required_params(), get_optional_params() functions.
  • model (instance of a class derived from Model) – parent model that created this decoder. Could be None if no model access is required for the use case.
  • name (str) – name for decoder variable scope.
  • mode (str) – mode decoder is going to be run in. Could be “train”, “eval” or “infer”.

Config parameters:

  • initializer — any valid TensorFlow initializer. If no initializer is provided, the model initializer will be used.
  • initializer_params (dict) — dictionary that will be passed to the initializer __init__ method.
  • regularizer — any valid TensorFlow regularizer. If no regularizer is provided, the model regularizer will be used.
  • regularizer_params (dict) — dictionary that will be passed to the regularizer __init__ method.
  • dtype — model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details see the mixed precision training section in the docs. If no dtype is provided, the model dtype will be used.
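
For illustration, a configuration using only these common parameters might look like the sketch below (values are illustrative; a TensorFlow 1.x-style API is assumed, matching the rest of these docs):

import tensorflow as tf

# Illustrative base decoder configuration (not a recommended setting):
decoder_params = {
    "initializer": tf.glorot_uniform_initializer,
    "initializer_params": {"seed": 0},
    "regularizer": tf.contrib.layers.l2_regularizer,
    "regularizer_params": {"scale": 1e-4},
    "dtype": tf.float32,
}
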
_cast_types(input_dict)[source]

This function automatically casts all inputs to the decoder dtype.

Parameters:input_dict (dict) – dictionary passed to self._decode() method.
Returns:same as input_dict, but with all Tensors cast to decoder dtype.
Return type:dict
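
Conceptually, the cast behaves like the following simplified sketch (not the actual implementation, which may also handle nested structures):

import tensorflow as tf

def cast_types_sketch(input_dict, dtype):
  # Cast every floating-point Tensor to the decoder dtype;
  # pass every other value through unchanged.
  casted = {}
  for key, value in input_dict.items():
    if isinstance(value, tf.Tensor) and value.dtype.is_floating:
      casted[key] = tf.cast(value, dtype)
    else:
      casted[key] = value
  return casted
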
_decode(input_dict)[source]

This is the main function, which should construct the decoder graph. Typically, the decoder will take a hidden representation from the encoder as input and produce an output sequence.

Parameters:input_dict (dict) –

dictionary containing decoder inputs. If the decoder is used with the models.encoder_decoder class, input_dict will have the following content:

{
  "encoder_output": dictionary returned from encoder.encode() method
  "target_tensors": data_layer.input_tensors['target_tensors']
}
Returns:dictionary of decoder outputs. Typically this will be just:
{
  "logits": logits that will be passed to Loss
  "outputs": list with actual decoded outputs, e.g. characters
             instead of logits
}
Return type:dict
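
To make this contract concrete, here is a minimal hypothetical subclass. Only the input/output dictionary structure is taken from the docs above; the projection layer itself is an illustrative choice:

import tensorflow as tf
from decoders.decoder import Decoder

class ProjectionDecoder(Decoder):
  """Hypothetical decoder: a single linear projection of encoder outputs."""

  @staticmethod
  def get_required_params():
    # Values describe the expected parameter types.
    return dict(Decoder.get_required_params(), output_dim=int)

  def _decode(self, input_dict):
    inputs = input_dict['encoder_output']['outputs']
    logits = tf.layers.dense(inputs, self.params['output_dim'],
                             name='projection')
    return {'logits': logits, 'outputs': [logits]}
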
decode(input_dict)[source]

Wrapper around the self._decode() method. Here name, initializer and dtype are set in the variable scope, and then the self._decode() method is called.

Parameters:input_dict (dict) – see self._decode() docs.
Returns:see self._decode() docs.
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
mode

Mode decoder is run in.

name

Decoder name.

params

Parameters used to construct the decoder (dictionary)

fc_decoders

This module defines various fully-connected decoders (consisting of one fully connected layer).

These classes are usually used for models that are not really sequence-to-sequence and thus have to be artificially split into an encoder and a decoder, for example by cutting at the last fully-connected layer.

class decoders.fc_decoders.FullyConnectedCTCDecoder(params, model, name='fully_connected_ctc_decoder', mode='train')[source]

Bases: decoders.fc_decoders.FullyConnectedTimeDecoder

Fully connected time decoder that provides CTC-based text generation (either with or without a language model). If the language model is not used, tf.nn.ctc_greedy_decoder will be used as the text generation method.

__init__(params, model, name='fully_connected_ctc_decoder', mode='train')[source]

Fully connected CTC decoder constructor.

See parent class for arguments description.

Config parameters:

  • use_language_model (bool) — whether to use a language model for output text generation. If False, the other config parameters are not used.
  • decoder_library_path (string) — path to the CTC decoder with language model library.
  • lm_path (string) — path to the language model file.
  • trie_path (string) — path to the prefix trie file.
  • alphabet_config_path (string) — path to the alphabet file.
  • beam_width (int) — beam width for beam search.
  • alpha (float) — weight that is assigned to the language model probabilities.
  • beta (float) — weight that is assigned to the word count.
  • trie_weight (float) — weight for prefix-tree-vocabulary-based character-level rescoring.
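
As an illustration, a language-model-enabled configuration could look like the sketch below (all paths and numeric values are placeholders, not recommended settings):

decoder_params = {
    "use_language_model": True,
    "decoder_library_path": "path/to/libctc_decoder_with_kenlm.so",
    "lm_path": "path/to/lm.binary",
    "trie_path": "path/to/trie.binary",
    "alphabet_config_path": "path/to/alphabet.txt",
    "beam_width": 512,
    "alpha": 2.0,
    "beta": 1.0,
    "trie_weight": 0.1,
}
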
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.fc_decoders.FullyConnectedDecoder(params, model, name='fully_connected_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Simple decoder consisting of one fully-connected layer.

__init__(params, model, name='fully_connected_decoder', mode='train')[source]

Fully connected decoder constructor.

See parent class for arguments description.

Config parameters:

  • output_dim (int) — output dimension.
_decode(input_dict)[source]

This method performs a linear transformation of the input.

Parameters:input_dict (dict) –

input dictionary that has to contain the following fields:

input_dict = {
  'encoder_output': {
    'outputs': output of encoder (shape=[batch_size, num_features])
  }
}
Returns:dictionary with the following tensors:
{
  'logits': logits with the shape=[batch_size, output_dim]
  'outputs': [logits] (same as logits but wrapped in list)
}
Return type:dict
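
A usage sketch (the encoder output here is a stand-in placeholder, and model=None is used per the base class docs on the assumption that no model access is needed):

import tensorflow as tf
from decoders.fc_decoders import FullyConnectedDecoder

decoder = FullyConnectedDecoder(params={"output_dim": 10}, model=None,
                                mode="train")
# Stand-in for a real encoder output of shape [batch_size, num_features]:
encoder_outputs = tf.placeholder(tf.float32, shape=[None, 512])
output_dict = decoder.decode(
    input_dict={"encoder_output": {"outputs": encoder_outputs}})
logits = output_dict["logits"]  # shape=[batch_size, 10]
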
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.fc_decoders.FullyConnectedSCDecoder(params, model, name='fully_connected_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Fully connected decoder for speech commands.

__init__(params, model, name='fully_connected_decoder', mode='train')[source]

Fully connected decoder constructor.

See parent class for arguments description.

Config parameters:

  • output_dim (int) — output dimension.
_decode(input_dict)[source]

This method performs a linear transformation of the input.

Parameters:input_dict (dict) –

input dictionary that has to contain the following fields:

input_dict = {
  'encoder_output': {
    'outputs': output of encoder (shape=[batch_size, num_features])
  }
}
Returns:dictionary with the following tensors:
{
  'logits': logits with the shape=[batch_size, output_dim]
  'outputs': [logits] (same as logits but wrapped in list)
}
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.fc_decoders.FullyConnectedTimeDecoder(params, model, name='fully_connected_time_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Fully connected decoder that operates on inputs with time dimension. That is, input shape should be [batch size, time length, num features].

__init__(params, model, name='fully_connected_time_decoder', mode='train')[source]

Fully connected time decoder constructor.

See parent class for arguments description.

Config parameters:

  • tgt_vocab_size (int) — target vocabulary size, i.e. number of output features.
  • logits_to_outputs_func — function that maps produced logits to decoder outputs, i.e. actual text sequences.
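
For example, a greedy CTC mapping could be supplied as logits_to_outputs_func like the sketch below (note that, per _decode() below, logits are time-major; the vocabulary size is illustrative):

import tensorflow as tf

def greedy_ctc_outputs(logits, input_dict):
  # logits: [time, batch_size, tgt_vocab_size], as produced by _decode().
  seq_lengths = tf.cast(input_dict['encoder_output']['src_length'], tf.int32)
  decoded, _ = tf.nn.ctc_greedy_decoder(logits, seq_lengths)
  return [tf.sparse_tensor_to_dense(decoded[0])]

decoder_params = {
    "tgt_vocab_size": 29,  # e.g. English characters + blank (illustrative)
    "logits_to_outputs_func": greedy_ctc_outputs,
}
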
_decode(input_dict)[source]

Creates TensorFlow graph for fully connected time decoder.

Parameters:input_dict (dict) –

input dictionary that has to contain the following fields:

input_dict = {
  'encoder_output': {
    "outputs": tensor with shape [batch_size, time length, hidden dim]
    "src_length": tensor with shape [batch_size]
  }
}
Returns:dictionary with the following tensors:
{
  'logits': logits with the shape=[time length, batch_size, tgt_vocab_size]
  'outputs': logits_to_outputs_func(logits, input_dict)
}
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict

rnn_decoders

RNN-based decoders.

class decoders.rnn_decoders.BeamSearchRNNDecoderWithAttention(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Bases: decoders.rnn_decoders.RNNDecoderWithAttention

Beam search version of the RNN-based decoder with attention. Can only be used during inference (mode="infer").

__init__(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Initializes beam search decoder.

Parameters:params (dict) – dictionary with decoder parameters

Config parameters:

  • batch_size — batch size
  • GO_SYMBOL — GO symbol id, must be the same as used in the data layer
  • END_SYMBOL — END symbol id, must be the same as used in the data layer
  • tgt_vocab_size — target vocabulary size
  • tgt_emb_size — embedding size to use
  • decoder_cell_units — number of units in the RNN
  • decoder_cell_type — RNN type: lstm, gru, glstm, etc.
  • decoder_dp_input_keep_prob — dropout input keep probability
  • decoder_dp_output_keep_prob — dropout output keep probability
  • decoder_use_skip_connections — whether to use residual connections
  • attention_type — bahdanau, luong, gnmt, gnmt_v2
  • bahdanau_normalize — (optional)
  • luong_scale — (optional)
  • mode — train or infer

… add any cell-specific parameters here as well
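
Since beam search is inference-only, the decoder must be constructed with mode="infer"; a sketch with illustrative parameter values:

from decoders.rnn_decoders import BeamSearchRNNDecoderWithAttention

decoder = BeamSearchRNNDecoderWithAttention(
    params={
        "batch_size": 32,
        "GO_SYMBOL": 1,   # must match the data layer
        "END_SYMBOL": 2,  # must match the data layer
        "tgt_vocab_size": 32000,
        "tgt_emb_size": 512,
        "decoder_cell_units": 512,
        "decoder_cell_type": "lstm",
        "attention_type": "luong",
    },
    model=None,    # stand-in; normally the parent model instance
    mode="infer",  # beam search cannot run in "train" or "eval"
)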

_decode(input_dict)[source]

Decodes representation into data.

Parameters:input_dict (dict) – Python dictionary with inputs to decoder. Must define:
  • src_inputs — decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • src_lengths — decoder input lengths Tensor of shape [batch_size]

Does not need tgt_inputs and tgt_lengths.

Returns:a Python dictionary with:
  • final_outputs — tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • final_state — tensor with decoder final state
  • final_sequence_lengths — tensor of shape [batch_size, time] or [time, batch_size]
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
class decoders.rnn_decoders.RNNDecoderWithAttention(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Bases: decoders.decoder.Decoder

Typical RNN decoder with attention mechanism.

__init__(params, model, name='rnn_decoder_with_attention', mode='train')[source]

Initializes RNN decoder with embedding.

See parent class for arguments description.

Config parameters:

  • batch_size (int) — batch size.
  • GO_SYMBOL (int) — GO symbol id, must be the same as used in data layer.
  • END_SYMBOL (int) — END symbol id, must be the same as used in data layer.
  • tgt_emb_size (int) — embedding size to use.
  • core_cell_params (dict) — parameters for the RNN cell class.
  • core_cell (string) — RNN cell class.
  • decoder_dp_input_keep_prob (float) — dropout input keep probability.
  • decoder_dp_output_keep_prob (float) — dropout output keep probability.
  • decoder_use_skip_connections (bool) — whether to use residual connections.
  • attention_type (string) — bahdanau, luong, gnmt or gnmt_v2.
  • bahdanau_normalize (bool, optional) — whether to use normalization in bahdanau attention.
  • luong_scale (bool, optional) — whether to use scale in luong attention.
  • … add any cell-specific parameters here as well.
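
A configuration sketch using these parameters (all values are illustrative):

decoder_params = {
    "batch_size": 32,
    "GO_SYMBOL": 1,
    "END_SYMBOL": 2,
    "tgt_emb_size": 512,
    "core_cell": "LSTMCell",  # illustrative RNN cell class name
    "core_cell_params": {"num_units": 512},
    "decoder_dp_input_keep_prob": 0.8,
    "decoder_dp_output_keep_prob": 1.0,
    "decoder_use_skip_connections": False,
    "attention_type": "gnmt_v2",
}
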
_build_attention(encoder_outputs, encoder_sequence_length)[source]

Builds the attention part of the graph. Currently supports “bahdanau” and “luong”.

_decode(input_dict)[source]

Decodes representation into data.

Parameters:input_dict (dict) – Python dictionary with inputs to decoder. Must define:

  • src_inputs — decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • src_lengths — decoder input lengths Tensor of shape [batch_size]
  • tgt_inputs — only during training. Labels Tensor of shape [batch_size, time] or [time, batch_size]
  • tgt_lengths — only during training. Labels lengths Tensor of shape [batch_size]
Returns:a Python dictionary with:
  • final_outputs — tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • final_state — tensor with decoder final state
  • final_sequence_lengths — tensor of shape [batch_size, time] or [time, batch_size]
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict

transformer_decoder

class decoders.transformer_decoder.TransformerDecoder(params, model, name='transformer_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

_get_symbols_to_logits_fn(max_decode_length)[source]

Returns a decoding function that calculates logits of the next tokens.

decode_pass(targets, encoder_outputs, inputs_attention_bias)[source]

Generate logits for each value in the target sequence.

Parameters:
  • targets – target values for the output sequence. int tensor with shape [batch_size, target_length]
  • encoder_outputs – continuous representation of input sequence. float tensor with shape [batch_size, input_length, hidden_size]
  • inputs_attention_bias – float tensor with shape [batch_size, 1, 1, input_length]
Returns:

float32 tensor with shape [batch_size, target_length, vocab_size]

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
predict(encoder_outputs, encoder_decoder_attention_bias)[source]

Return predicted sequence.

convs2s_decoder

class decoders.convs2s_decoder.ConvS2SDecoder(params, model, name='convs2s_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

_get_symbols_to_logits_fn()[source]

Returns a decoding function that calculates logits of the next tokens.

decode_pass(targets, encoder_outputs, encoder_outputs_b, inputs_attention_bias)[source]

Generate logits for each value in the target sequence.

Parameters:
  • targets – target values for the output sequence. int tensor with shape [batch_size, target_length]
  • encoder_outputs – continuous representation of input sequence. float tensor with shape [batch_size, input_length, hidden_size]
  • encoder_outputs_b – continuous representation of input sequence which includes the source embeddings. float tensor with shape [batch_size, input_length, hidden_size]
  • inputs_attention_bias – float tensor with shape [batch_size, 1, input_length]
Returns:

float32 tensor with shape [batch_size, target_length, vocab_size]

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict
predict(encoder_outputs, encoder_outputs_b, inputs_attention_bias)[source]

Return predicted sequence.

tacotron2_decoder

Tacotron2 decoder

class decoders.tacotron2_decoder.Prenet(num_units, num_layers, activation_fn=None, dtype=None)[source]

Bases: object

Fully connected prenet used in the decoder

__init__(num_units, num_layers, activation_fn=None, dtype=None)[source]

Prenet initializer

Parameters:
  • num_units (int) – number of units in the fully connected layer
  • num_layers (int) – number of fully connected layers
  • activation_fn (callable) – any valid activation function
  • dtype (dtype) – the data format for this layer
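
A construction sketch using the decoder defaults described below (two fully connected layers of 256 units each):

import tensorflow as tf
from decoders.tacotron2_decoder import Prenet

prenet = Prenet(num_units=256, num_layers=2,
                activation_fn=tf.nn.relu, dtype=tf.float32)
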
add_regularization(regularizer)[source]

Adds regularization to all prenet kernels

output_size
class decoders.tacotron2_decoder.Tacotron2Decoder(params, model, name='tacotron_2_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Tacotron 2 Decoder

__init__(params, model, name='tacotron_2_decoder', mode='train')[source]

Tacotron-2 like decoder constructor. A lot of the optional configurations are currently for testing, and not all configurations are supported; use of the default config is recommended.

See parent class for arguments description.

Config parameters:

  • attention_layer_size (int) — size of attention layer.

  • attention_type (string) — determines which attention mechanism to use; should be one of ‘bahdanau’, ‘location’, or None. Use of ‘location’-sensitive attention is strongly recommended.

  • bahdanau_normalize (bool) — Whether to enable weight norm on the attention parameters. Defaults to False.

  • decoder_cell_units (int) — dimension of decoder RNN cells.

  • decoder_layers (int) — number of decoder RNN layers to use.

  • decoder_cell_type (string) — could be “lstm”, “gru”, “glstm”, or “slstm”. Currently, only ‘lstm’ has been tested. Defaults to ‘lstm’.

  • time_major (bool) — whether to output as time major or batch major. Default is False for batch major.

  • use_swap_memory (bool) — default is False.

  • enable_prenet (bool) — whether to use the fully-connected prenet in the decoder. Defaults to True.

  • prenet_layers (int) — number of fully-connected layers to use. Defaults to 2.

  • prenet_units (int) — number of units in each layer. Defaults to 256.

  • prenet_activation (callable) — activation function to use for the prenet layers. Defaults to relu.

  • enable_postnet (bool) — whether to use the convolutional postnet in the decoder. Defaults to True.

  • postnet_conv_layers (list) — list with the description of convolutional layers. Must be passed if the postnet is enabled. For example:

    "postnet_conv_layers": [
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME",
        "activation_fn": tf.nn.tanh
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 80, "padding": "SAME",
        "activation_fn": None
      }
    ]
    
  • postnet_bn_momentum (float) — momentum for batch norm. Defaults to 0.1.

  • postnet_bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-5.

  • postnet_data_format (string) — could be either “channels_first” or “channels_last”. Defaults to “channels_last”.

  • postnet_keep_dropout_prob (float) — keep probability for dropout in the postnet conv layers. Defaults to 0.5.

  • mask_decoder_sequence (bool) — Defaults to True.

  • attention_bias (bool) — whether to use a bias term when calculating the attention. Only works for “location” attention. Defaults to False.

  • zoneout_prob (float) — zoneout probability for rnn layers. Defaults to 0.

  • dropout_prob (float) — dropout probability for rnn layers. Defaults to 0.1.

  • parallel_iterations (int) — Number of parallel_iterations for tf.while loop inside dynamic_decode. Defaults to 32.
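
A partial configuration sketch built from these parameters (values are illustrative; postnet_conv_layers would be filled in as in the example above):

decoder_params = {
    "attention_type": "location",  # strongly recommended above
    "attention_layer_size": 128,
    "decoder_cell_units": 1024,
    "decoder_layers": 2,
    "decoder_cell_type": "lstm",
    "enable_prenet": True,
    "prenet_layers": 2,
    "prenet_units": 256,
    "enable_postnet": True,
    "zoneout_prob": 0.1,
}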

_build_attention(encoder_outputs, encoder_sequence_length, attention_bias)[source]

Builds the attention part of the graph. Currently supports “bahdanau” and “location”.

_decode(input_dict)[source]

Decodes representation into data

Parameters:input_dict (dict) –

Python dictionary with inputs to decoder. Must define:
  • src_inputs — decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
  • src_lengths — decoder input lengths Tensor of shape [batch_size]
  • tgt_inputs — only during training. Labels Tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]
  • stop_token_inputs — only during training. Labels Tensor of shape [batch_size, time, 1] or [time, batch_size, 1]
  • tgt_lengths — only during training. Labels lengths Tensor of shape [batch_size]
Returns:A python dictionary containing:
  • outputs - array containing:
    • decoder_output - tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]. Spectrogram representation learned by the decoder rnn
    • spectrogram_prediction - tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]. Spectrogram containing the residual corrections from the postnet if enabled
    • alignments - tensor of shape [batch_size, time, memory_size] or [time, batch_size, memory_size]. The alignments learned by the attention layer
    • stop_token_prediction - tensor of shape [batch_size, time, 1] or [time, batch_size, 1]. The stop token predictions
    • final_sequence_lengths - tensor of shape [batch_size]
  • stop_token_predictions - tensor of shape [batch_size, time, 1] or [time, batch_size, 1]. The stop token predictions for use inside the loss function.
Return type:dict
static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict

centaur_decoder

class decoders.centaur_decoder.CentaurDecoder(params, model, name='centaur_decoder', mode='train')[source]

Bases: decoders.decoder.Decoder

Centaur decoder that consists of attention blocks followed by convolutional layers.

__init__(params, model, name='centaur_decoder', mode='train')[source]

Centaur decoder constructor.

See parent class for arguments description.

Config parameters:

  • prenet_layers (int) — number of fully-connected layers to use.

  • prenet_hidden_size (int) — number of units in each pre-net layer.

  • hidden_size (int) — dimensionality of hidden embeddings.

  • conv_layers (list) — list with the description of convolutional layers. For example:

    "conv_layers": [
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "VALID", "is_causal": True
      }
    ]
    
  • mag_conv_layers (list) — list with the description of convolutional layers to reconstruct magnitude.

  • attention_dropout (float) — dropout rate for attention layers.

  • layer_postprocess_dropout (float) — dropout rate for transformer block sublayers.

  • prenet_activation_fn (callable) — activation function to use for the prenet layers. Defaults to relu.

  • prenet_dropout (float) — dropout rate for the pre-net. Defaults to 0.5.

  • prenet_use_inference_dropout (bool) — whether to use dropout during inference. Defaults to False.

  • cnn_dropout_prob (float) — dropout probability for cnn layers. Defaults to 0.5.

  • bn_momentum (float) — momentum for batch norm. Defaults to 0.95.

  • bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-8.

  • reduction_factor (int) — number of frames to predict at a time. Defaults to 1.

  • attention_layers (int) — number of attention blocks. Defaults to 4.

  • self_attention_conv_params (dict) — description of convolutional layer inside attention blocks. Defaults to None.

  • attention_heads (int) — number of attention heads. Defaults to 1.

  • attention_cnn_dropout_prob (float) — dropout rate for convolutional layers inside attention blocks. Defaults to 0.5.

  • window_size (int) — size of the attention window for forcing monotonic attention during inference. Defaults to None.

  • back_step_size (int) — number of steps attention is allowed to go back during inference. Defaults to 0.

  • force_layers (list) — indices of layers where forcing of monotonic attention should be enabled. Defaults to all layers.
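
A partial configuration sketch (values are illustrative; conv_layers and mag_conv_layers would be filled in as in the example above):

decoder_params = {
    "prenet_layers": 2,
    "prenet_hidden_size": 256,
    "hidden_size": 256,
    "attention_layers": 4,
    "attention_heads": 1,
    "reduction_factor": 2,  # predict two frames per decoder step
    "window_size": 4,       # force monotonic attention during inference
}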

static _convert_outputs(outputs, reduction_factor, batch_size)[source]

Convert output of the decoder to appropriate format.

static _expand(values, reduction_factor)[source]

Expand the given input by reduction_factor.

_inference_cond(state)[source]

Check if it’s time to stop inference.

_inference_initial_state(encoder_outputs, encoder_decoder_attention_bias)[source]

Create initial state for inference.

_inference_step(state)[source]

Make one inference step.

static _positional_encoding(x, dtype)[source]

Add positional encoding to the given input.

static _shrink(values, last_dim, reduction_factor)[source]

Shrink the given input by reduction_factor.
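
The shrink/expand pair can be pictured as reshapes that fold reduction_factor consecutive frames into the feature dimension and unfold them back; a conceptual sketch, not the actual implementation:

import tensorflow as tf

def shrink_sketch(values, last_dim, reduction_factor):
  # [batch, time, last_dim] ->
  # [batch, time // reduction_factor, last_dim * reduction_factor]
  batch_size = tf.shape(values)[0]
  return tf.reshape(values, [batch_size, -1, last_dim * reduction_factor])

def expand_sketch(values, reduction_factor):
  # [batch, time, dim] ->
  # [batch, time * reduction_factor, dim // reduction_factor]
  batch_size = tf.shape(values)[0]
  new_dim = values.get_shape().as_list()[-1] // reduction_factor
  return tf.reshape(values, [batch_size, -1, new_dim])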

static get_optional_params()[source]

Static method with description of optional parameters.

Returns:Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type:dict
static get_required_params()[source]

Static method with description of required parameters.

Returns:Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type:dict