decoders¶
This package contains various decoders. A Decoder typically takes a representation and produces data.
decoder¶
class decoders.decoder.Decoder(params, model, name='decoder', mode='train')[source]
Bases: object
Abstract class from which all decoders must inherit.
__init__(params, model, name='decoder', mode='train')[source]
Decoder constructor. Note that decoder constructors should not modify the TensorFlow graph; all graph construction should happen in the self._decode() method.

Parameters:
- params (dict) – parameters describing the decoder. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
- model (instance of a class derived from Model) – parent model that created this decoder. Could be None if no model access is required for the use case.
- name (str) – name for the decoder variable scope.
- mode (str) – mode the decoder is going to be run in. Could be "train", "eval" or "infer".

Config parameters:
- initializer — any valid TensorFlow initializer. If no initializer is provided, the model initializer will be used.
- initializer_params (dict) — dictionary that will be passed to the initializer's __init__ method.
- regularizer — any valid TensorFlow regularizer. If no regularizer is provided, the model regularizer will be used.
- regularizer_params (dict) — dictionary that will be passed to the regularizer's __init__ method.
- dtype — model dtype. Could be either tf.float16, tf.float32 or "mixed". For details see the mixed precision training section in the docs. If no dtype is provided, the model dtype will be used.
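For illustration, a hypothetical params dict using these common config parameters might look like the following sketch (the specific initializer, regularizer and scale values are assumptions, not library defaults):

    import tensorflow as tf

    decoder_params = {
        "initializer": tf.glorot_uniform_initializer,     # any valid TF initializer
        "initializer_params": {},                          # kwargs for the initializer's __init__
        "regularizer": tf.contrib.layers.l2_regularizer,   # any valid TF regularizer
        "regularizer_params": {"scale": 1e-4},             # kwargs for the regularizer's __init__
        "dtype": tf.float32,                               # tf.float16, tf.float32 or "mixed"
    }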
_cast_types(input_dict)[source]
This function performs automatic cast of all inputs to decoder dtype.

Parameters: input_dict (dict) – dictionary passed to the self._decode() method.
Returns: same as input_dict, but with all Tensors cast to decoder dtype.
Return type: dict
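The behavior can be pictured with a small sketch (a simplified illustration, not the library's actual implementation) that casts every floating-point Tensor in the dictionary to the decoder dtype and leaves everything else untouched:

    import tensorflow as tf

    def cast_to_dtype(input_dict, dtype):
        # Recursively cast floating-point Tensors; keep integer tensors and
        # non-Tensor values (lengths, ids, nested configs) as they are.
        casted = {}
        for key, value in input_dict.items():
            if isinstance(value, dict):
                casted[key] = cast_to_dtype(value, dtype)
            elif isinstance(value, tf.Tensor) and value.dtype.is_floating:
                casted[key] = tf.cast(value, dtype)
            else:
                casted[key] = value
        return casted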
_decode(input_dict)[source]
This is the main function which should construct the decoder graph. Typically, the decoder will take the hidden representation from the encoder as an input and produce some output sequence as an output.

Parameters: input_dict (dict) – dictionary containing decoder inputs. If the decoder is used with the models.encoder_decoder class, input_dict will have the following content:

    {
      "encoder_output": dictionary returned from encoder.encode() method,
      "target_tensors": data_layer.input_tensors['target_tensors'],
    }

Returns: dictionary of decoder outputs. Typically this will be just:

    {
      "logits": logits that will be passed to Loss,
      "outputs": list with actual decoded outputs, e.g. characters instead of logits,
    }

Return type: dict
decode(input_dict)[source]
Wrapper around the self._decode() method. Here name, initializer and dtype are set in the variable scope and then the self._decode() method is called.

Parameters: input_dict (dict) – see self._decode() docs.
Returns: see self._decode() docs.
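As a rough sketch of the intended usage pattern (a hypothetical subclass, assuming the module path documented above and that Decoder.get_required_params() returns a name-to-type dictionary), only _decode is overridden and callers always go through decode(); this is essentially what the fully connected decoders below do:

    import tensorflow as tf
    from decoders.decoder import Decoder  # module path as documented above

    class ProjectionDecoder(Decoder):
        """Hypothetical decoder: a single linear projection of the encoder output."""

        @staticmethod
        def get_required_params():
            return dict(Decoder.get_required_params(), output_dim=int)

        def _decode(self, input_dict):
            # All graph construction happens here, never in __init__.
            inputs = input_dict['encoder_output']['outputs']
            logits = tf.layers.dense(inputs, self.params['output_dim'], name='projection')
            return {'logits': logits, 'outputs': [logits]}

    # decoder = ProjectionDecoder(params, model)   # construction builds no graph
    # outputs = decoder.decode(input_dict)         # sets up scope/dtype, then calls _decode()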
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict

static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict

mode
Mode the decoder is run in.

name
Decoder name.

params
Parameters used to construct the decoder (dictionary).
fc_decoders¶
This module defines various fully-connected decoders (consisting of one fully connected layer).
These classes are usually used for models that are not really sequence-to-sequence and thus have to be artificially split into an encoder and a decoder by cutting the model, for example, at the last fully connected layer.
class decoders.fc_decoders.FullyConnectedCTCDecoder(params, model, name='fully_connected_ctc_decoder', mode='train')[source]
Bases: decoders.fc_decoders.FullyConnectedTimeDecoder
Fully connected time decoder that provides CTC-based text generation (either with or without a language model). If the language model is not used, tf.nn.ctc_greedy_decoder will be used as the text generation method.

__init__(params, model, name='fully_connected_ctc_decoder', mode='train')[source]
Fully connected CTC decoder constructor.
See parent class for arguments description.
Config parameters:
- use_language_model (bool) — whether to use language model for output text generation. If False, other config parameters are not used.
- decoder_library_path (string) — path to the ctc decoder with language model library.
- lm_path (string) — path to the language model file.
- trie_path (string) — path to the prefix trie file.
- alphabet_config_path (string) — path to the alphabet file.
- beam_width (int) — beam width for beam search.
- alpha (float) — weight that is assigned to language model probabilities.
- beta (float) — weight that is assigned to the word count.
- trie_weight (float) — weight for prefix tree vocabulary based character level rescoring.
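A hypothetical config for the language-model branch might look like the following sketch (all paths and weights are placeholders, not shipped defaults):

    decoder_params = {
        "use_language_model": True,
        "decoder_library_path": "ctc_decoder_with_lm/libctc_decoder_with_kenlm.so",  # placeholder path
        "lm_path": "language_model/lm.binary",                                       # placeholder path
        "trie_path": "language_model/trie.binary",                                   # placeholder path
        "alphabet_config_path": "data/alphabet.txt",                                 # placeholder path
        "beam_width": 512,
        "alpha": 2.0,
        "beta": 1.0,
        "trie_weight": 0.1,
    }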
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict

static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
class decoders.fc_decoders.FullyConnectedDecoder(params, model, name='fully_connected_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
Simple decoder consisting of one fully connected layer.

__init__(params, model, name='fully_connected_decoder', mode='train')[source]
Fully connected decoder constructor.
See parent class for arguments description.

Config parameters:
- output_dim (int) — output dimension.

_decode(input_dict)[source]
This method performs a linear transformation of the input.

Parameters: input_dict (dict) – input dictionary that has to contain the following fields:

    input_dict = {
      'encoder_output': {
        'outputs': output of encoder (shape=[batch_size, num_features])
      }
    }

Returns: dictionary with the following tensors:

    {
      'logits': logits with shape=[batch_size, output_dim],
      'outputs': [logits] (same as logits but wrapped in a list)
    }

Return type: dict
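A rough usage sketch (assuming, per the base-class docs, that no model access is needed so model can be None, and that dtype is supplied explicitly; sizes are arbitrary):

    import tensorflow as tf
    from decoders.fc_decoders import FullyConnectedDecoder  # module path as documented

    params = {"output_dim": 10, "dtype": tf.float32}   # hypothetical 10-class head
    decoder = FullyConnectedDecoder(params, model=None)

    encoder_output = {"outputs": tf.zeros([8, 512])}   # batch_size=8, num_features=512
    result = decoder.decode({"encoder_output": encoder_output})
    print(result["logits"].shape)                      # (8, 10)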
static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
class decoders.fc_decoders.FullyConnectedSCDecoder(params, model, name='fully_connected_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
Fully connected decoder for speech commands.

__init__(params, model, name='fully_connected_decoder', mode='train')[source]
Fully connected decoder constructor.
See parent class for arguments description.

Config parameters:
- output_dim (int) — output dimension.

_decode(input_dict)[source]
This method performs a linear transformation of the input.

Parameters: input_dict (dict) – input dictionary that has to contain the following fields:

    input_dict = {
      'encoder_output': {
        'outputs': output of encoder (shape=[batch_size, num_features])
      }
    }

Returns: dictionary with the following tensors:

    {
      'logits': logits with shape=[batch_size, output_dim],
      'outputs': [logits] (same as logits but wrapped in a list)
    }

Return type: dict
static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
class decoders.fc_decoders.FullyConnectedTimeDecoder(params, model, name='fully_connected_time_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
Fully connected decoder that operates on inputs with a time dimension. That is, the input shape should be [batch size, time length, num features].

__init__(params, model, name='fully_connected_time_decoder', mode='train')[source]
Fully connected time decoder constructor.
See parent class for arguments description.

Config parameters:
- tgt_vocab_size (int) — target vocabulary size, i.e. number of output features.
- logits_to_outputs_func — function that maps produced logits to decoder outputs, i.e. actual text sequences.

_decode(input_dict)[source]
Creates the TensorFlow graph for the fully connected time decoder.

Parameters: input_dict (dict) – input dictionary that has to contain the following fields:

    input_dict = {
      'encoder_output': {
        "outputs": tensor with shape [batch_size, time length, hidden dim],
        "src_length": tensor with shape [batch_size]
      }
    }

Returns: dictionary with the following tensors:

    {
      'logits': logits with shape=[time length, batch_size, tgt_vocab_size],
      'outputs': logits_to_outputs_func(logits, input_dict)
    }

Return type: dict
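Because the logits are produced time-major ([time length, batch_size, tgt_vocab_size]), a logits_to_outputs_func can feed them directly into CTC decoding. A minimal sketch (assuming the input_dict layout shown above; the vocabulary size is arbitrary):

    import tensorflow as tf

    def ctc_greedy_outputs(logits, input_dict):
        # logits: [time, batch_size, tgt_vocab_size]; lengths come from the encoder output.
        seq_lengths = tf.cast(input_dict['encoder_output']['src_length'], tf.int32)
        decoded, _ = tf.nn.ctc_greedy_decoder(logits, seq_lengths)
        return decoded  # list containing one SparseTensor of decoded label ids

    decoder_params = {
        "tgt_vocab_size": 29,                         # e.g. characters plus the CTC blank
        "logits_to_outputs_func": ctc_greedy_outputs,
    }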
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict

static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
rnn_decoders¶
RNN-based decoders.
class decoders.rnn_decoders.BeamSearchRNNDecoderWithAttention(params, model, name='rnn_decoder_with_attention', mode='train')[source]
Bases: decoders.rnn_decoders.RNNDecoderWithAttention
Beam search version of the RNN-based decoder with attention. Can be used only during inference (mode=infer).

__init__(params, model, name='rnn_decoder_with_attention', mode='train')[source]
Initializes the beam search decoder.

Parameters: params (dict) – dictionary with decoder parameters.

Config parameters:
- batch_size — batch size
- GO_SYMBOL — GO symbol id, must be the same as used in data layer
- END_SYMBOL — END symbol id, must be the same as used in data layer
- tgt_vocab_size — vocabulary size of target
- tgt_emb_size — embedding to use
- decoder_cell_units — number of units in RNN
- decoder_cell_type — RNN type: lstm, gru, glstm, etc.
- decoder_dp_input_keep_prob — dropout input keep probability
- decoder_dp_output_keep_prob — dropout output keep probability
- decoder_use_skip_connections — use residual connections or not
- attention_type — bahdanau, luong, gnmt, gnmt_v2
- bahdanau_normalize — (optional)
- luong_scale — (optional)
- mode — train or infer
… add any cell-specific parameters here as well
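An illustrative config sketch for this decoder (all values are placeholders; symbol ids and sizes must match the data layer, and this decoder is only valid with mode="infer"):

    decoder_params = {
        "batch_size": 32,
        "GO_SYMBOL": 1,
        "END_SYMBOL": 2,
        "tgt_vocab_size": 32000,
        "tgt_emb_size": 512,
        "decoder_cell_units": 512,
        "decoder_cell_type": "lstm",
        "decoder_dp_input_keep_prob": 1.0,
        "decoder_dp_output_keep_prob": 1.0,
        "decoder_use_skip_connections": False,
        "attention_type": "gnmt_v2",
        # plus any cell-specific parameters, as noted above
    }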
_decode(input_dict)[source]
Decodes representation into data.

Parameters: input_dict (dict) – Python dictionary with inputs to decoder. Must define:
- src_inputs - decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
- src_lengths - decoder input lengths Tensor of shape [batch_size]
Does not need tgt_inputs and tgt_lengths.

Returns: a Python dictionary with:
- final_outputs - tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
- final_state - tensor with decoder final state
- final_sequence_lengths - tensor of shape [batch_size, time] or [time, batch_size]

Return type: dict
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
class decoders.rnn_decoders.RNNDecoderWithAttention(params, model, name='rnn_decoder_with_attention', mode='train')[source]
Bases: decoders.decoder.Decoder
Typical RNN decoder with attention mechanism.

__init__(params, model, name='rnn_decoder_with_attention', mode='train')[source]
Initializes RNN decoder with embedding.
See parent class for arguments description.

Config parameters:
- batch_size (int) — batch size.
- GO_SYMBOL (int) — GO symbol id, must be the same as used in data layer.
- END_SYMBOL (int) — END symbol id, must be the same as used in data layer.
- tgt_emb_size (int) — embedding size to use.
- core_cell_params (dict) - parameters for RNN class
- core_cell (string) - RNN class.
- decoder_dp_input_keep_prob (float) - dropout input keep probability.
- decoder_dp_output_keep_prob (float) - dropout output keep probability.
- decoder_use_skip_connections (bool) - use residual connections or not.
- attention_type (string) - bahdanau, luong, gnmt or gnmt_v2.
- bahdanau_normalize (bool, optional) - whether to use normalization in bahdanau attention.
- luong_scale (bool, optional) - whether to use scale in luong attention.
- … add any cell-specific parameters here as well.
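An illustrative config sketch (placeholder values; symbol ids and sizes must match the data layer, and the RNN cell itself is configured via core_cell / core_cell_params as described above):

    decoder_params = {
        "batch_size": 32,
        "GO_SYMBOL": 1,
        "END_SYMBOL": 2,
        "tgt_emb_size": 512,
        "decoder_dp_input_keep_prob": 0.8,
        "decoder_dp_output_keep_prob": 1.0,
        "decoder_use_skip_connections": False,
        "attention_type": "gnmt_v2",
        "bahdanau_normalize": False,
        # "core_cell": ...,           # RNN class, see the description above
        # "core_cell_params": {...},  # parameters for that class, e.g. number of units
    }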
_build_attention(encoder_outputs, encoder_sequence_length)[source]
Builds the attention part of the graph. Currently supports "bahdanau" and "luong".
_decode(input_dict)[source]
Decodes representation into data.

Parameters: input_dict (dict) – Python dictionary with inputs to decoder. Must define:
- src_inputs - decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
- src_lengths - decoder input lengths Tensor of shape [batch_size]
- tgt_inputs - only during training: labels Tensor of shape [batch_size, time] or [time, batch_size]
- tgt_lengths - only during training: labels lengths Tensor of shape [batch_size]

Returns: a Python dictionary with:
- final_outputs - tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
- final_state - tensor with decoder final state
- final_sequence_lengths - tensor of shape [batch_size, time] or [time, batch_size]

Return type: dict
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict

static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
transformer_decoder¶
class decoders.transformer_decoder.TransformerDecoder(params, model, name='transformer_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
_get_symbols_to_logits_fn(max_decode_length)[source]
Returns a decoding function that calculates logits of the next tokens.
decode_pass(targets, encoder_outputs, inputs_attention_bias)[source]
Generate logits for each value in the target sequence.

Parameters:
- targets – target values for the output sequence; int tensor with shape [batch_size, target_length]
- encoder_outputs – continuous representation of the input sequence; float tensor with shape [batch_size, input_length, hidden_size]
- inputs_attention_bias – float tensor with shape [batch_size, 1, 1, input_length]

Returns: float32 tensor with shape [batch_size, target_length, vocab_size]
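The shape bookkeeping can be summarized in a short sketch (the concrete sizes are arbitrary examples; only the shape structure comes from the docstring above):

    batch_size, target_length, input_length = 2, 7, 11
    hidden_size, vocab_size = 512, 32000

    targets_shape = (batch_size, target_length)                      # int token ids
    encoder_outputs_shape = (batch_size, input_length, hidden_size)  # float encoder states
    inputs_attention_bias_shape = (batch_size, 1, 1, input_length)   # float padding bias
    logits_shape = (batch_size, target_length, vocab_size)           # float32 return value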
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
convs2s_decoder¶
class decoders.convs2s_decoder.ConvS2SDecoder(params, model, name='convs2s_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
_get_symbols_to_logits_fn()[source]
Returns a decoding function that calculates logits of the next tokens.
decode_pass(targets, encoder_outputs, encoder_outputs_b, inputs_attention_bias)[source]
Generate logits for each value in the target sequence.

Parameters:
- targets – target values for the output sequence; int tensor with shape [batch_size, target_length]
- encoder_outputs – continuous representation of the input sequence; float tensor with shape [batch_size, input_length, hidden_size]
- encoder_outputs_b – continuous representation of the input sequence which includes the source embeddings; float tensor with shape [batch_size, input_length, hidden_size]
- inputs_attention_bias – float tensor with shape [batch_size, 1, input_length]

Returns: float32 tensor with shape [batch_size, target_length, vocab_size]
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
tacotron2_decoder¶
Tacotron2 decoder
class decoders.tacotron2_decoder.Prenet(num_units, num_layers, activation_fn=None, dtype=None)[source]
Bases: object
Fully connected prenet used in the decoder.
__init__(num_units, num_layers, activation_fn=None, dtype=None)[source]
Prenet initializer.

Parameters:
- num_units (int) – number of units in the fully connected layer
- num_layers (int) – number of fully connected layers
- activation_fn (callable) – any valid activation function
- dtype (dtype) – the data format for this layer

output_size
class decoders.tacotron2_decoder.Tacotron2Decoder(params, model, name='tacotron_2_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
Tacotron 2 Decoder.
__init__(params, model, name='tacotron_2_decoder', mode='train')[source]
Tacotron-2 like decoder constructor. A lot of optional configurations are currently for testing. Not all configurations are supported. Use of the default config is recommended.
See parent class for arguments description.

Config parameters:
- attention_layer_size (int) — size of attention layer.
- attention_type (string) — determines which attention mechanism to use; should be one of 'bahdanau', 'location', or None. Use of 'location'-sensitive attention is strongly recommended.
- bahdanau_normalize (bool) — whether to enable weight norm on the attention parameters. Defaults to False.
- decoder_cell_units (int) — dimension of decoder RNN cells.
- decoder_layers (int) — number of decoder RNN layers to use.
- decoder_cell_type (callable) — could be "lstm", "gru", "glstm", or "slstm". Currently, only 'lstm' has been tested. Defaults to 'lstm'.
- time_major (bool) — whether to output as time major or batch major. Default is False for batch major.
- use_swap_memory (bool) — default is False.
- enable_prenet (bool) — whether to use the fully-connected prenet in the decoder. Defaults to True.
- prenet_layers (int) — number of fully-connected layers to use. Defaults to 2.
- prenet_units (int) — number of units in each layer. Defaults to 256.
- prenet_activation (callable) — activation function to use for the prenet layers. Defaults to relu.
- enable_postnet (bool) — whether to use the convolutional postnet in the decoder. Defaults to True.
- postnet_conv_layers (list) — list with the description of convolutional layers. Must be passed if postnet is enabled. For example:

    "postnet_conv_layers": [
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME", "activation_fn": tf.nn.tanh},
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME", "activation_fn": tf.nn.tanh},
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME", "activation_fn": tf.nn.tanh},
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME", "activation_fn": tf.nn.tanh},
      {"kernel_size": [5], "stride": [1], "num_channels": 80, "padding": "SAME", "activation_fn": None}
    ]

- postnet_bn_momentum (float) — momentum for batch norm. Defaults to 0.1.
- postnet_bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-5.
- postnet_data_format (string) — could be either "channels_first" or "channels_last". Defaults to "channels_last".
- postnet_keep_dropout_prob (float) — keep probability for dropout in the postnet conv layers. Defaults to 0.5.
- mask_decoder_sequence (bool) — defaults to True.
- attention_bias (bool) — whether to use a bias term when calculating the attention. Only works for "location" attention. Defaults to False.
- zoneout_prob (float) — zoneout probability for RNN layers. Defaults to 0.
- dropout_prob (float) — dropout probability for RNN layers. Defaults to 0.1.
- parallel_iterations (int) — number of parallel_iterations for the tf.while loop inside dynamic_decode. Defaults to 32.
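A hypothetical config sketch using a subset of these parameters (values are illustrative, not tuned settings; postnet_conv_layers would be supplied as in the example above):

    decoder_params = {
        "attention_type": "location",      # location-sensitive attention, as recommended
        "attention_layer_size": 128,
        "attention_bias": True,
        "decoder_cell_units": 1024,
        "decoder_layers": 2,
        "enable_prenet": True,
        "prenet_layers": 2,
        "prenet_units": 256,
        "enable_postnet": True,
        "postnet_keep_dropout_prob": 0.5,
        "zoneout_prob": 0.0,
        "dropout_prob": 0.1,
        # "postnet_conv_layers": [...],    # required when enable_postnet is True
    }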
_build_attention(encoder_outputs, encoder_sequence_length, attention_bias)[source]
Builds the attention part of the graph. Currently supports "bahdanau" and "location".
_decode(input_dict)[source]
Decodes representation into data.

Parameters: input_dict (dict) – Python dictionary with inputs to decoder. Must define:
- src_inputs - decoder input Tensor of shape [batch_size, time, dim] or [time, batch_size, dim]
- src_lengths - decoder input lengths Tensor of shape [batch_size]
- tgt_inputs - only during training: labels Tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]
- stop_token_inputs - only during training: labels Tensor of shape [batch_size, time, 1] or [time, batch_size, 1]
- tgt_lengths - only during training: labels lengths Tensor of shape [batch_size]

Returns: a Python dictionary containing:
- outputs - array containing:
  - decoder_output - tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]. Spectrogram representation learned by the decoder RNN.
  - spectrogram_prediction - tensor of shape [batch_size, time, num_features] or [time, batch_size, num_features]. Spectrogram containing the residual corrections from the postnet if enabled.
  - alignments - tensor of shape [batch_size, time, memory_size] or [time, batch_size, memory_size]. The alignments learned by the attention layer.
  - stop_token_prediction - tensor of shape [batch_size, time, 1] or [time, batch_size, 1]. The stop token predictions.
  - final_sequence_lengths - tensor of shape [batch_size].
- stop_token_predictions - tensor of shape [batch_size, time, 1] or [time, batch_size, 1]. The stop token predictions for use inside the loss function.

Return type: dict
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict

static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
centaur_decoder¶
class decoders.centaur_decoder.CentaurDecoder(params, model, name='centaur_decoder', mode='train')[source]
Bases: decoders.decoder.Decoder
Centaur decoder that consists of attention blocks followed by convolutional layers.
__init__(params, model, name='centaur_decoder', mode='train')[source]
Centaur decoder constructor.
See parent class for arguments description.

Config parameters:
- prenet_layers (int) — number of fully-connected layers to use.
- prenet_hidden_size (int) — number of units in each pre-net layer.
- hidden_size (int) — dimensionality of hidden embeddings.
- conv_layers (list) — list with the description of convolutional layers. For example:

    "conv_layers": [
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "VALID", "is_causal": True},
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "VALID", "is_causal": True},
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "VALID", "is_causal": True},
      {"kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "VALID", "is_causal": True}
    ]

- mag_conv_layers (list) — list with the description of convolutional layers to reconstruct magnitude.
- attention_dropout (float) — dropout rate for attention layers.
- layer_postprocess_dropout (float) — dropout rate for transformer block sublayers.
- prenet_activation_fn (callable) — activation function to use for the prenet layers. Defaults to relu.
- prenet_dropout (float) — dropout rate for the pre-net. Defaults to 0.5.
- prenet_use_inference_dropout (bool) — whether to use dropout during the inference. Defaults to False.
- cnn_dropout_prob (float) — dropout probability for cnn layers. Defaults to 0.5.
- bn_momentum (float) — momentum for batch norm. Defaults to 0.95.
- bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-8.
- reduction_factor (int) — number of frames to predict at a time. Defaults to 1.
- attention_layers (int) — number of attention blocks. Defaults to 4.
- self_attention_conv_params (dict) — description of the convolutional layer inside attention blocks. Defaults to None.
- attention_heads (int) — number of attention heads. Defaults to 1.
- attention_cnn_dropout_prob (float) — dropout rate for convolutional layers inside attention blocks. Defaults to 0.5.
- window_size (int) — size of attention window for forcing monotonic attention during the inference. Defaults to None.
- back_step_size (int) — number of steps the attention is allowed to go back during the inference. Defaults to 0.
- force_layers (list) — indices of layers where forcing of monotonic attention should be enabled. Defaults to all layers.
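A hypothetical config sketch covering a subset of these parameters (values are illustrative only; the required conv_layers / mag_conv_layers lists are elided, see the example above):

    decoder_params = {
        "prenet_layers": 2,
        "prenet_hidden_size": 256,
        "hidden_size": 256,
        "attention_layers": 4,
        "attention_heads": 1,
        "attention_dropout": 0.1,
        "layer_postprocess_dropout": 0.1,
        "cnn_dropout_prob": 0.5,
        "reduction_factor": 4,
        "window_size": 4,          # forces monotonic attention within a window at inference
        "back_step_size": 0,
        # "conv_layers": [...],    # as in the example above
        # "mag_conv_layers": [...],
    }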
static _convert_outputs(outputs, reduction_factor, batch_size)[source]
Convert output of the decoder to the appropriate format.

_inference_initial_state(encoder_outputs, encoder_decoder_attention_bias)[source]
Create initial state for inference.

static _shrink(values, last_dim, reduction_factor)[source]
Shrink the given input by reduction_factor.
static get_optional_params()[source]
Static method with description of optional parameters.

Returns: dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict

static get_required_params()[source]
Static method with description of required parameters.

Returns: dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict