encoders¶
This package contains various encoders. An encoder typically takes data and produces a representation of it.
encoder¶
-
class
encoders.encoder.
Encoder
(params, model, name='encoder', mode='train')[source]¶ Bases:
object
Abstract class from which all encoders must inherit.
-
__init__
(params, model, name='encoder', mode='train')[source]¶ Encoder constructor. Note that encoder constructors should not modify the TensorFlow graph; all graph construction should happen in the
self._encode()
method.
Parameters:
- params (dict) – parameters describing the encoder. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
- model (instance of a class derived from Model) – parent model that created this encoder. Could be None if no model access is required for the use case.
- name (str) – name for encoder variable scope.
- mode (str) – mode encoder is going to be run in. Could be “train”, “eval” or “infer”.
Config parameters:
- initializer – any valid TensorFlow initializer. If no initializer is provided, the model initializer will be used.
- initializer_params (dict) – dictionary that will be passed to the initializer __init__ method.
- regularizer – any valid TensorFlow regularizer. If no regularizer is provided, the model regularizer will be used.
- regularizer_params (dict) – dictionary that will be passed to the regularizer __init__ method.
- dtype – model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details see the mixed precision training section in the docs. If no dtype is provided, the model dtype will be used.
-
_cast_types
(input_dict)[source]¶ This function performs automatic cast of all inputs to encoder dtype.
Parameters: input_dict (dict) – dictionary passed to the self._encode() method.
Returns: same as input_dict, but with all Tensors cast to encoder dtype.
Return type: dict
-
_encode
(input_dict)[source]¶ This is the main function which should construct encoder graph. Typically, encoder will take raw input sequence as an input and produce some hidden representation as an output.
Parameters: input_dict (dict) – dictionary containing encoder inputs. If the encoder is used with the models.encoder_decoder class, input_dict will have the following content:
    {
      "source_tensors": data_layer.input_tensors['source_tensors']
    }
Returns: dictionary of encoder outputs. Return all necessary outputs. Typically this will be just:
    {
      "outputs": outputs,
      "state": state,
    }
Return type: dict
-
encode
(input_dict)[source]¶ Wrapper around
self._encode()
method. Here name, initializer and dtype are set in the variable scope and then the self._encode() method is called.
Parameters: input_dict (dict) – see self._encode() docs.
Returns: see self._encode() docs.
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
mode
¶ Mode encoder is run in.
-
name
¶ Encoder name.
-
params
¶ Parameters used to construct the encoder (dictionary).
-
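The contract described above (parameter validation in the constructor, graph construction in _encode(), and encode() as a thin wrapper) can be sketched without TensorFlow. This is only an illustration under stated assumptions: EncoderSketch and MeanEncoder are hypothetical stand-ins, not library classes, the "scale" parameter is invented for the example, and the variable-scope handling of the real wrapper is omitted.

```python
import copy


class EncoderSketch:
    """Hypothetical stand-in for the abstract Encoder; not the library class."""

    @staticmethod
    def get_required_params():
        return {}

    @staticmethod
    def get_optional_params():
        return {"dtype": str}

    def __init__(self, params, model, name="encoder", mode="train"):
        # Only validate and store the config here; nothing is "built" yet.
        required = self.get_required_params()
        allowed = {**required, **self.get_optional_params()}
        missing = [key for key in required if key not in params]
        unknown = [key for key in params if key not in allowed]
        if missing or unknown:
            raise ValueError(f"missing={missing}, unknown={unknown}")
        self._params = copy.deepcopy(params)
        self._model = model
        self._name = name
        self._mode = mode

    def encode(self, input_dict):
        # The real wrapper also sets name, initializer and dtype in a
        # variable scope; that part is left out of this sketch.
        return self._encode(input_dict)

    def _encode(self, input_dict):
        raise NotImplementedError


class MeanEncoder(EncoderSketch):
    """Toy encoder: reduces each padded sequence to a scaled mean."""

    @staticmethod
    def get_required_params():
        return {"scale": float}  # hypothetical parameter for illustration

    def _encode(self, input_dict):
        sequences, lengths = input_dict["source_tensors"]
        scale = self._params["scale"]
        outputs = [scale * sum(seq[:n]) / n for seq, n in zip(sequences, lengths)]
        return {"outputs": outputs, "src_length": lengths}


encoder = MeanEncoder({"scale": 2.0}, model=None)
result = encoder.encode(
    {"source_tensors": [[[1.0, 3.0, 0.0], [4.0, 0.0, 0.0]], [2, 1]]}
)
print(result)  # {'outputs': [4.0, 8.0], 'src_length': [2, 1]}
```

Keeping all work inside _encode() means a subclass only has to describe its parameters and its computation; the wrapper decides when and in which scope that computation runs.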
ds2_encoder¶
-
class
encoders.ds2_encoder.
DeepSpeech2Encoder
(params, model, name='ds2_encoder', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
DeepSpeech-2 like encoder.
-
__init__
(params, model, name='ds2_encoder', mode='train')[source]¶ DeepSpeech-2 like encoder constructor.
See parent class for arguments description.
Config parameters:
dropout_keep_prob (float) – keep probability for dropout.
conv_layers (list) – list with the description of convolutional layers. For example:
    "conv_layers": [
      {
        "kernel_size": [11, 41], "stride": [2, 2],
        "num_channels": 32, "padding": "SAME",
      },
      {
        "kernel_size": [11, 21], "stride": [1, 2],
        "num_channels": 64, "padding": "SAME",
      },
      {
        "kernel_size": [11, 21], "stride": [1, 2],
        "num_channels": 96, "padding": "SAME",
      },
    ]
activation_fn — activation function to use.
num_rnn_layers — number of RNN layers to use.
rnn_type (string) — could be “lstm”, “gru”, “cudnn_gru”, “cudnn_lstm” or “layernorm_lstm”.
rnn_unidirectional (bool) — whether to use uni-directional or bi-directional RNNs.
rnn_cell_dim (int) — dimension of RNN cells.
row_conv (bool) — whether to use a “row” (“in plane”) convolutional layer after RNNs.
row_conv_width (int) — width parameter for “row” convolutional layer.
n_hidden (int) — number of hidden units for the last fully connected layer.
data_format (string) — could be either “channels_first”, “channels_last”, “BCTF”, “BTFC”, “BCFT”, “BFTC”.
Defaults to “channels_last”.
bn_momentum (float) — momentum for batch norm. Defaults to 0.99.
bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-3.
-
_encode
(input_dict)[source]¶ Creates TensorFlow graph for DeepSpeech-2 like encoder.
Parameters: input_dict (dict) – input dictionary that has to contain the following fields:
    input_dict = {
      "source_tensors": [
        src_sequence (shape=[batch_size, sequence length, num features]),
        src_length (shape=[batch_size])
      ]
    }
Returns: dictionary with the following tensors:
    {
      'outputs': hidden state, shape=[batch_size, sequence length, n_hidden]
      'src_length': tensor, shape=[batch_size]
    }
Return type: dict
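Strided convolutions shorten the sequence, so the "sequence length" of the output generally differs from that of the input. The sketch below estimates the size after the example conv_layers, using the standard TensorFlow output-size rules for "SAME" and "VALID" padding. It assumes that the first entry of kernel_size and stride refers to the time axis and the second to the feature axis; the helper names are hypothetical and not part of the package.

```python
import math

conv_layers = [
    {"kernel_size": [11, 41], "stride": [2, 2], "num_channels": 32, "padding": "SAME"},
    {"kernel_size": [11, 21], "stride": [1, 2], "num_channels": 64, "padding": "SAME"},
    {"kernel_size": [11, 21], "stride": [1, 2], "num_channels": 96, "padding": "SAME"},
]


def conv_output_size(size, kernel, stride, padding):
    """Output length of one convolution along a single axis."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError(f"unsupported padding: {padding}")


def output_shape(time_steps, num_features, layers):
    """Apply every layer in turn to the (time, feature) sizes."""
    for layer in layers:
        time_steps = conv_output_size(
            time_steps, layer["kernel_size"][0], layer["stride"][0], layer["padding"]
        )
        num_features = conv_output_size(
            num_features, layer["kernel_size"][1], layer["stride"][1], layer["padding"]
        )
    return time_steps, num_features


print(output_shape(200, 160, conv_layers))  # (100, 20)
```

With this configuration only the first layer strides over time, so 200 input frames become 100, while the feature axis shrinks by a factor of 8.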
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
tdnn_encoder¶
-
class
encoders.tdnn_encoder.
TDNNEncoder
(params, model, name='w2l_encoder', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
General time delay neural network (TDNN) encoder. Fully convolutional model.
-
__init__
(params, model, name='w2l_encoder', mode='train')[source]¶ TDNN encoder constructor.
See parent class for arguments description.
Config parameters:
dropout_keep_prob (float) — keep probability for dropout.
convnet_layers (list) – list with the description of convolutional layers. For example:
    "convnet_layers": [
      {
        "type": "conv1d", "repeat": 5,
        "kernel_size": [7], "stride": [1],
        "num_channels": 250, "padding": "SAME"
      },
      {
        "type": "conv1d", "repeat": 3,
        "kernel_size": [11], "stride": [1],
        "num_channels": 500, "padding": "SAME"
      },
      {
        "type": "conv1d", "repeat": 1,
        "kernel_size": [32], "stride": [1],
        "num_channels": 1000, "padding": "SAME"
      },
      {
        "type": "conv1d", "repeat": 1,
        "kernel_size": [1], "stride": [1],
        "num_channels": 1000, "padding": "SAME"
      },
    ]
activation_fn — activation function to use.
data_format (string) — could be either “channels_first” or “channels_last”. Defaults to “channels_last”.
normalization — normalization to use. Accepts [None, ‘batch_norm’]. Use None if you don’t want to use normalization. Defaults to ‘batch_norm’.
bn_momentum (float) — momentum for batch norm. Defaults to 0.90.
bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-3.
drop_block_prob (float) — probability of dropping encoder blocks. Defaults to 0.0 which corresponds to training without dropping blocks.
drop_block_index (int) – index of the block to drop on inference. Defaults to -1 which corresponds to keeping all blocks.
use_conv_mask (bool) – whether to apply a sequence mask prior to convolution operations. Defaults to False for backwards compatibility. Setting it to True is recommended.
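To get a feel for how much temporal context such a stack covers, the receptive field of the example convnet_layers can be computed by hand. The sketch below assumes that "repeat" stacks the same undilated convolution that many times; this is an interpretation of the config, not something the text above states, and receptive_field is a hypothetical helper.

```python
convnet_layers = [
    {"type": "conv1d", "repeat": 5, "kernel_size": [7], "stride": [1]},
    {"type": "conv1d", "repeat": 3, "kernel_size": [11], "stride": [1]},
    {"type": "conv1d", "repeat": 1, "kernel_size": [32], "stride": [1]},
    {"type": "conv1d", "repeat": 1, "kernel_size": [1], "stride": [1]},
]


def receptive_field(layers):
    """Number of input frames that influence one output frame."""
    field, jump = 1, 1  # jump = spacing of adjacent outputs, in input frames
    for layer in layers:
        kernel, stride = layer["kernel_size"][0], layer["stride"][0]
        for _ in range(layer["repeat"]):
            field += (kernel - 1) * jump
            jump *= stride
    return field


print(receptive_field(convnet_layers))  # 92
```

Here every stride is 1, so the result is simply 1 + 5*6 + 3*10 + 31 + 0 = 92 input frames per output frame.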
-
_encode
(input_dict)[source]¶ Creates TensorFlow graph for Wav2Letter like encoder.
Parameters: input_dict (dict) – input dictionary that has to contain the following fields:
input_dict = { "source_tensors": [ src_sequence (shape=[batch_size, sequence length, num features]), src_length (shape=[batch_size]) ] }
Returns: dictionary with the following tensors: { 'outputs': hidden state, shape=[batch_size, sequence length, n_hidden] 'src_length': tensor, shape=[batch_size] }
Return type: dict
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
rnn_encoders¶
RNN-based encoders
-
class
encoders.rnn_encoders.
BidirectionalRNNEncoderWithEmbedding
(params, model, name='bidir_rnn_encoder_with_emb', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
Bi-directional RNN-based encoder with embeddings. Can support various RNN cell types.
-
__init__
(params, model, name='bidir_rnn_encoder_with_emb', mode='train')[source]¶ Initializes bi-directional encoder with embeddings.
Parameters: params (dict) – dictionary with encoder parameters Must define:
- src_vocab_size - data vocabulary size
- src_emb_size - size of embedding to use
- encoder_cell_units - number of units in RNN cell
- encoder_cell_type - cell type: lstm, gru, etc.
- encoder_layers - number of layers
- encoder_dp_input_keep_prob - dropout keep probability for RNN cell inputs
- encoder_dp_output_keep_prob - dropout keep probability for RNN cell outputs
- encoder_use_skip_connections - true/false
- time_major (optional)
- use_swap_memory (optional)
- mode - train or infer
… add any cell-specific parameters here as well
Returns: encoder_params
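As an illustration, a params dictionary covering the keys listed above might look like the following. The key names are copied from the list; every value is an arbitrary placeholder chosen for the example, not a recommended or default setting, and the small completeness check is a hypothetical helper rather than library behaviour. "mode" is left out here on the assumption that it is supplied through the constructor argument.

```python
# Keys from the list above that are not marked optional.
REQUIRED_KEYS = [
    "src_vocab_size",
    "src_emb_size",
    "encoder_cell_units",
    "encoder_cell_type",
    "encoder_layers",
    "encoder_dp_input_keep_prob",
    "encoder_dp_output_keep_prob",
    "encoder_use_skip_connections",
]

# Placeholder values for illustration only.
encoder_params = {
    "src_vocab_size": 32000,
    "src_emb_size": 512,
    "encoder_cell_units": 512,
    "encoder_cell_type": "lstm",
    "encoder_layers": 2,
    "encoder_dp_input_keep_prob": 0.8,
    "encoder_dp_output_keep_prob": 1.0,
    "encoder_use_skip_connections": False,
    "time_major": False,       # optional
    "use_swap_memory": False,  # optional
}

missing = [key for key in REQUIRED_KEYS if key not in encoder_params]
print(missing)  # []
```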
-
_encode
(input_dict)[source]¶ Encodes data into representation.
Parameters: input_dict – a Python dictionary. Must define:
- src_inputs - a Tensor of shape [batch_size, time] or [time, batch_size] (depending on time_major param)
- src_lengths - a Tensor of shape [batch_size]
Returns: a Python dictionary with:
- encoder_outputs - a Tensor of shape [batch_size, time, representation_dim] or [time, batch_size, representation_dim]
- encoder_state - a Tensor of shape [batch_size, dim]
- src_lengths - (copy ref from input) a Tensor of shape [batch_size]
Return type: dict
-
enc_emb_w
¶
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
src_emb_size
¶
-
src_vocab_size
¶
-
-
class
encoders.rnn_encoders.
GNMTLikeEncoderWithEmbedding
(params, model, name='gnmt_encoder_with_emb', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
Encoder similar to the one used in GNMT model: https://arxiv.org/abs/1609.08144. Must have at least 2 layers.
-
__init__
(params, model, name='gnmt_encoder_with_emb', mode='train')[source]¶ Encodes data into representation.
Parameters: params (dict) – a Python dictionary. Must define:
- src_inputs - a Tensor of shape [batch_size, time] or [time, batch_size] (depending on time_major param)
- src_lengths - a Tensor of shape [batch_size]
Returns: a Python dictionary with:
- encoder_outputs - a Tensor of shape [batch_size, time, representation_dim] or [time, batch_size, representation_dim]
- encoder_state - a Tensor of shape [batch_size, dim]
- src_lengths - (copy ref from input) a Tensor of shape [batch_size]
Return type: dict
-
enc_emb_w
¶
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
src_emb_size
¶
-
src_vocab_size
¶
-
-
class
encoders.rnn_encoders.
GNMTLikeEncoderWithEmbedding_cuDNN
(params, model, name='gnmt_encoder_with_emb_cudnn', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
Encoder similar to the one used in GNMT model: https://arxiv.org/abs/1609.08144. Must have at least 2 layers. Uses cuDNN RNN blocks for efficiency.
-
__init__
(params, model, name='gnmt_encoder_with_emb_cudnn', mode='train')[source]¶ Encodes data into representation
Parameters: params (dict) – a Python dictionary. Must define:
- src_inputs - a Tensor of shape [batch_size, time] or [time, batch_size] (depending on time_major param)
- src_lengths - a Tensor of shape [batch_size]
Returns: a Python dictionary with:
- encoder_outputs - a Tensor of shape [batch_size, time, representation_dim] or [time, batch_size, representation_dim]
- encoder_state - a Tensor of shape [batch_size, dim]
- src_lengths - (copy ref from input) a Tensor of shape [batch_size]
Return type: dict
-
enc_emb_w
¶
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
src_emb_size
¶
-
src_vocab_size
¶
-
-
class
encoders.rnn_encoders.
UnidirectionalRNNEncoderWithEmbedding
(params, model, name='unidir_rnn_encoder_with_emb', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
Uni-directional RNN encoder with embeddings. Can support various RNN cell types.
-
__init__
(params, model, name='unidir_rnn_encoder_with_emb', mode='train')[source]¶ Initializes uni-directional encoder with embeddings.
Parameters: params (dict) – dictionary with encoder parameters Must define:
- src_vocab_size - data vocabulary size
- src_emb_size - size of embedding to use
- encoder_cell_units - number of units in RNN cell
- encoder_cell_type - cell type: lstm, gru, etc.
- encoder_layers - number of layers
- encoder_dp_input_keep_prob - dropout keep probability for RNN cell inputs
- encoder_dp_output_keep_prob - dropout keep probability for RNN cell outputs
- encoder_use_skip_connections - true/false
- time_major (optional)
- use_swap_memory (optional)
- mode - train or infer
… add any cell-specific parameters here as well
-
_encode
(input_dict)[source]¶ Encodes data into representation.
Parameters: input_dict – a Python dictionary. Must define:
- src_inputs - a Tensor of shape [batch_size, time] or [time, batch_size] (depending on time_major param)
- src_lengths - a Tensor of shape [batch_size]
Returns: a Python dictionary with:
- encoder_outputs - a Tensor of shape [batch_size, time, representation_dim] or [time, batch_size, representation_dim]
- encoder_state - a Tensor of shape [batch_size, dim]
- src_lengths - (copy ref from input) a Tensor of shape [batch_size]
Return type: dict
-
enc_emb_w
¶
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
src_emb_size
¶
-
src_vocab_size
¶
-
transformer_encoder¶
-
class
encoders.transformer_encoder.
TransformerEncoder
(params, model, name='transformer_encoder', mode='train')[source]¶ Bases:
open_seq2seq.encoders.encoder.Encoder
Transformer model encoder
convs2s_encoder¶
Conv-based encoder
-
class
encoders.convs2s_encoder.
ConvS2SEncoder
(params, model, name='convs2s_encoder_with_emb', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
Fully convolutional Encoder of ConvS2S
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
src_emb_size
¶
-
src_vocab_size
¶
-
resnet_encoder¶
-
class
encoders.resnet_encoder.
ResNetEncoder
(params, model, name='resnet_encoder', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
resnet_blocks¶
Contains definitions for Residual Networks.
Residual networks (‘v1’ ResNets) were originally proposed in: [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition. arXiv:1512.03385
The full preactivation ‘v2’ ResNet variant was introduced by: [2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Identity Mappings in Deep Residual Networks. arXiv: 1603.05027
The key difference of the full preactivation ‘v2’ variant compared to the ‘v1’ variant in [1] is the use of batch normalization before every weight layer rather than after.
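The ordering difference can be made concrete with a toy example. The sketch below replaces real layers with scalar stand-ins (a "convolution" is just a multiplication by a weight, batch normalization works on a plain list), so it only illustrates the order of operations in a two-convolution block and is not the package's implementation. All function names are hypothetical.

```python
import math


def conv(xs, weight):
    """Stand-in for a convolution: scale every value by one weight."""
    return [weight * x for x in xs]


def batch_norm(xs, epsilon=1e-5):
    """Normalize a list to zero mean and (almost) unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + epsilon) for x in xs]


def relu(xs):
    return [max(0.0, x) for x in xs]


def block_v1(xs, w1, w2):
    """v1: conv -> BN -> ReLU, shortcut added before the final ReLU."""
    shortcut = xs
    out = relu(batch_norm(conv(xs, w1)))
    out = batch_norm(conv(out, w2))
    return relu([o + s for o, s in zip(out, shortcut)])


def block_v2(xs, w1, w2):
    """v2 (pre-activation): BN -> ReLU -> conv, shortcut added last."""
    shortcut = xs
    out = conv(relu(batch_norm(xs)), w1)
    out = conv(relu(batch_norm(out)), w2)
    return [o + s for o, s in zip(out, shortcut)]


inputs = [-2.0, 0.0, 2.0]
print(block_v1(inputs, 1.0, 1.0))  # never negative: ends with a ReLU
print(block_v2(inputs, 1.0, 1.0))  # identity path is left untouched
```

In the v2 block nothing is applied after the addition, so the shortcut carries the input through unchanged; in the v1 block the final ReLU also acts on the shortcut path.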
-
encoders.resnet_blocks.
batch_norm
(inputs, training, data_format, regularizer, momentum, epsilon)[source]¶ Performs a batch normalization using a standard set of parameters.
-
encoders.resnet_blocks.
block_layer
(inputs, filters, bottleneck, block_fn, blocks, strides, training, name, data_format, regularizer, bn_regularizer, bn_momentum, bn_epsilon)[source]¶ Creates one layer of blocks for the ResNet model.
Parameters: - inputs – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- filters – The number of filters for the first convolution of the layer.
- bottleneck – Whether the block created is a bottleneck block.
- block_fn – The block to use within the model, either building_block or bottleneck_block.
- blocks – The number of blocks contained in the layer.
- strides – The stride to use for the first convolution of the layer. If greater than 1, this layer will ultimately downsample the input.
- training – Either True or False, whether we are currently training the model. Needed for batch norm.
- name – A string name for the tensor output of the block layer.
- data_format – The input format (‘channels_last’ or ‘channels_first’).
Returns: The output tensor of the block layer.
-
encoders.resnet_blocks.
bottleneck_block_v1
(inputs, filters, training, projection_shortcut, strides, data_format, regularizer, bn_regularizer, bn_momentum, bn_epsilon)[source]¶ A single block for ResNet v1, with a bottleneck.
Similar to _building_block_v1(), except using the “bottleneck” blocks described in:
- Convolution then batch normalization then ReLU as described by:
- Deep Residual Learning for Image Recognition https://arxiv.org/pdf/1512.03385.pdf by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
Parameters: - inputs – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- filters – The number of filters for the convolutions.
- training – A Boolean for whether the model is in training or inference mode. Needed for batch normalization.
- projection_shortcut – The function to use for projection shortcuts (typically a 1x1 convolution when downsampling the input).
- strides – The block’s stride. If greater than 1, this block will ultimately downsample the input.
- data_format – The input format (‘channels_last’ or ‘channels_first’).
Returns: The output tensor of the block; shape should match inputs.
-
encoders.resnet_blocks.
bottleneck_block_v2
(inputs, filters, training, projection_shortcut, strides, data_format, regularizer, bn_regularizer, bn_momentum, bn_epsilon)[source]¶ A single block for ResNet v2, with a bottleneck.
Similar to _building_block_v2(), except using the “bottleneck” blocks described in:
- Convolution then batch normalization then ReLU as described by:
- Deep Residual Learning for Image Recognition https://arxiv.org/pdf/1512.03385.pdf by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
- Adapted to the ordering conventions of:
- Batch normalization then ReLU then convolution as described by:
- Identity Mappings in Deep Residual Networks https://arxiv.org/pdf/1603.05027.pdf by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.
Parameters: - inputs – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- filters – The number of filters for the convolutions.
- training – A Boolean for whether the model is in training or inference mode. Needed for batch normalization.
- projection_shortcut – The function to use for projection shortcuts (typically a 1x1 convolution when downsampling the input).
- strides – The block’s stride. If greater than 1, this block will ultimately downsample the input.
- data_format – The input format (‘channels_last’ or ‘channels_first’).
Returns: The output tensor of the block; shape should match inputs.
-
encoders.resnet_blocks.
building_block_v1
(inputs, filters, training, projection_shortcut, strides, data_format, regularizer, bn_regularizer, bn_momentum, bn_epsilon)[source]¶ A single block for ResNet v1, without a bottleneck.
- Convolution then batch normalization then ReLU as described by:
- Deep Residual Learning for Image Recognition https://arxiv.org/pdf/1512.03385.pdf by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Dec 2015.
Parameters: - inputs – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- filters – The number of filters for the convolutions.
- training – A Boolean for whether the model is in training or inference mode. Needed for batch normalization.
- projection_shortcut – The function to use for projection shortcuts (typically a 1x1 convolution when downsampling the input).
- strides – The block’s stride. If greater than 1, this block will ultimately downsample the input.
- data_format – The input format (‘channels_last’ or ‘channels_first’).
Returns: The output tensor of the block; shape should match inputs.
-
encoders.resnet_blocks.
building_block_v2
(inputs, filters, training, projection_shortcut, strides, data_format, regularizer, bn_regularizer, bn_momentum, bn_epsilon)[source]¶ A single block for ResNet v2, without a bottleneck.
- Batch normalization then ReLU then convolution as described by:
- Identity Mappings in Deep Residual Networks https://arxiv.org/pdf/1603.05027.pdf by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Jul 2016.
Parameters: - inputs – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- filters – The number of filters for the convolutions.
- training – A Boolean for whether the model is in training or inference mode. Needed for batch normalization.
- projection_shortcut – The function to use for projection shortcuts (typically a 1x1 convolution when downsampling the input).
- strides – The block’s stride. If greater than 1, this block will ultimately downsample the input.
- data_format – The input format (‘channels_last’ or ‘channels_first’).
Returns: The output tensor of the block; shape should match inputs.
-
encoders.resnet_blocks.
conv2d_fixed_padding
(inputs, filters, kernel_size, strides, data_format, regularizer)[source]¶ Strided 2-D convolution with explicit padding.
-
encoders.resnet_blocks.
fixed_padding
(inputs, kernel_size, data_format)[source]¶ Pads the input along the spatial dimensions independently of input size.
Parameters: - inputs – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
- kernel_size – The kernel to be used in the conv2d or max_pool2d operation. Should be a positive integer.
- data_format – The input format (‘channels_last’ or ‘channels_first’).
Returns: A tensor with the same format as the input with the data either intact (if kernel_size == 1) or padded (if kernel_size > 1).
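A minimal sketch of size-independent padding on a plain 2-D list follows. The text above only states that the data is left intact for kernel_size == 1 and padded otherwise; the split of kernel_size - 1 into a leading half (rounded down) and a trailing remainder is an assumption borrowed from common ResNet reference code, and both helper names are hypothetical.

```python
def pad_amounts(kernel_size):
    """Padding before and after, depending only on the kernel size."""
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end


def fixed_padding_2d(image, kernel_size):
    """Zero-pad a height x width list of numbers on both spatial axes."""
    pad_beg, pad_end = pad_amounts(kernel_size)
    width = len(image[0]) + pad_beg + pad_end
    rows = [[0] * pad_beg + list(row) + [0] * pad_end for row in image]
    top = [[0] * width for _ in range(pad_beg)]
    bottom = [[0] * width for _ in range(pad_end)]
    return top + rows + bottom


print(fixed_padding_2d([[1, 2], [3, 4]], 1))  # [[1, 2], [3, 4]]
print(fixed_padding_2d([[1, 2], [3, 4]], 3))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```

Because the amounts depend only on kernel_size, the same padding is applied whatever the input height and width are.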
cnn_encoder¶
This module contains classes and functions to build “general” convolutional neural networks from the description of arbitrary “layers”.
-
class
encoders.cnn_encoder.
CNNEncoder
(params, model, name='cnn_encoder', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
General CNN encoder that can be used to construct various different models.
-
__init__
(params, model, name='cnn_encoder', mode='train')[source]¶ CNN Encoder constructor.
See parent class for arguments description.
Config parameters:
cnn_layers (list) – list with the description of “convolutional” layers. For example:
    "cnn_layers": [
      (tf.layers.conv2d, {
        'filters': 64, 'kernel_size': (11, 11),
        'strides': (4, 4), 'padding': 'VALID',
        'activation': tf.nn.relu,
      }),
      (tf.layers.max_pooling2d, {
        'pool_size': (3, 3), 'strides': (2, 2),
      }),
      (tf.layers.conv2d, {
        'filters': 192, 'kernel_size': (5, 5),
        'strides': (1, 1), 'padding': 'SAME',
      }),
      (tf.layers.batch_normalization, {'momentum': 0.9, 'epsilon': 0.0001}),
      (tf.nn.relu, {}),
    ]
Note that you don’t need to provide “regularizer”, “training”, “data_format” and “axis” parameters since they will be automatically added. “axis” will be derived from “data_format” and will be 1 if data_format == "channels_first" else 3.
fc_layers (list) – list with the description of “fully-connected” layers. The only difference from convolutional layers is that the input will be automatically reshaped to 2D (batch size x num features). For example:
    'fc_layers': [
      (tf.layers.dense, {'units': 4096, 'activation': tf.nn.relu}),
      (tf.layers.dropout, {'rate': 0.5}),
      (tf.layers.dense, {'units': 4096, 'activation': tf.nn.relu}),
      (tf.layers.dropout, {'rate': 0.5}),
    ],
Note that you don’t need to provide “regularizer”, “training”, “data_format” and “axis” parameters since they will be automatically added. “axis” will be derived from “data_format” and will be 1 if data_format == "channels_first" else 3.
data_format (string) – could be either “channels_first” or “channels_last”. Defaults to “channels_first”.
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included into the params parameter of the class __init__() method.
Return type: dict
-
-
encoders.cnn_encoder.
build_layer
(inputs, layer, layer_params, data_format, regularizer, training, verbose=True)[source]¶ This function builds a layer from the layer function and its parameters.
It will automatically add a regularizer parameter to layer_params if the layer supports regularization. To check this, it will look for the “regularizer”, “kernel_regularizer” and “gamma_regularizer” names, in this order, in the layer call signature. If one of these parameters is supported, it will pass the regularizer object as the value for that parameter. Using the same signature check, it will try to add the “data_format” and “training” parameters. Finally, it will try to set the “axis” parameter with axis = 1 if data_format == 'channels_first' else 3. This is required for automatically building batch normalization layers.
Parameters: - inputs – input Tensor that will be passed to the layer. Note that the layer has to accept input as the first parameter.
- layer – layer function or class with a __call__ method defined.
- layer_params (dict) – parameters passed to the layer.
- data_format (string) – data format (“channels_first” or “channels_last”) that the function will try to pass as an additional argument.
- regularizer – regularizer instance that the function will try to pass as an additional argument.
- training (bool) – whether the layer is built in training mode. The function will try to pass it as an additional argument.
- verbose (bool) – whether to print information about built layers.
Returns: Tensor with layer output.
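The signature-checking idea can be sketched with the standard library's inspect module. This is a simplified illustration, not the package's code: build_layer_sketch, toy_conv and toy_batch_norm are hypothetical, the toy layers just return tuples, and stopping at the first supported regularizer name is an assumption about how ties are resolved.

```python
import inspect


def build_layer_sketch(inputs, layer, layer_params, data_format, regularizer, training):
    params = dict(layer_params)  # leave the caller's dict untouched
    accepted = inspect.signature(layer).parameters

    # Assumption: the first supported regularizer argument wins.
    for reg_name in ("regularizer", "kernel_regularizer", "gamma_regularizer"):
        if reg_name in accepted and reg_name not in params:
            params[reg_name] = regularizer
            break

    # Add the remaining arguments only if the layer's signature accepts them.
    extras = {
        "data_format": data_format,
        "training": training,
        "axis": 1 if data_format == "channels_first" else 3,
    }
    for key, value in extras.items():
        if key in accepted and key not in params:
            params[key] = value

    return layer(inputs, **params)


def toy_conv(inputs, filters, kernel_regularizer=None, data_format="channels_last"):
    return ("conv", inputs, filters, kernel_regularizer, data_format)


def toy_batch_norm(inputs, axis=-1, training=False, gamma_regularizer=None):
    return ("bn", inputs, axis, training, gamma_regularizer)


conv_out = build_layer_sketch("x", toy_conv, {"filters": 64}, "channels_first", "l2", True)
bn_out = build_layer_sketch("x", toy_batch_norm, {}, "channels_first", "l2", True)
print(conv_out)  # ('conv', 'x', 64, 'l2', 'channels_first')
print(bn_out)    # ('bn', 'x', 1, True, 'l2')
```

Checking the signature first is what lets one config list mix layers with different argument sets: each layer only receives the extra arguments it actually declares.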
tacotron2_encoder¶
-
class
encoders.tacotron2_encoder.
Tacotron2Encoder
(params, model, name='tacotron2_encoder', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
Tacotron-2 like encoder.
Consists of an embedding layer followed by a convolutional layer followed by a recurrent layer.
-
__init__
(params, model, name='tacotron2_encoder', mode='train')[source]¶ Tacotron-2 like encoder constructor.
See parent class for arguments description.
Config parameters:
cnn_dropout_prob (float) – dropout probability for cnn layers.
rnn_dropout_prob (float) – dropout probability for rnn layers.
src_emb_size (int) — dimensionality of character embedding.
conv_layers (list) – list with the description of convolutional layers. For example:
    "conv_layers": [
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME"
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME"
      },
      {
        "kernel_size": [5], "stride": [1],
        "num_channels": 512, "padding": "SAME"
      }
    ]
activation_fn (callable) — activation function to use for conv layers.
num_rnn_layers — number of RNN layers to use.
rnn_cell_dim (int) — dimension of RNN cells.
rnn_type (callable) – any valid RNN cell class. The suggested class is lstm.
rnn_unidirectional (bool) — whether to use uni-directional or bi-directional RNNs.
zoneout_prob (float) — zoneout probability. Defaults to 0.
use_cudnn_rnn (bool) – needs to be enabled if rnn_type is a cuDNN class.
data_format (string) — could be either “channels_first” or “channels_last”. Defaults to “channels_last”.
bn_momentum (float) — momentum for batch norm. Defaults to 0.1.
bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-5.
style_embedding_enable (bool) — Whether to enable GST. Defaults to False.
style_embedding_params (dict) — Parameters for GST layer. See _embed_style documentation.
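For readers unfamiliar with zoneout, the sketch below shows the general technique on plain Python lists: during training each hidden unit keeps its previous value with probability zoneout_prob, and at inference the expected value of that update is used. It illustrates the idea only and is not necessarily how this encoder applies zoneout_prob; the zoneout function is hypothetical.

```python
import random


def zoneout(prev_state, new_state, zoneout_prob, training, rng=random):
    """Mix previous and new hidden states element by element."""
    if not 0.0 <= zoneout_prob <= 1.0:
        raise ValueError("zoneout_prob must be in [0, 1]")
    if training:
        # Each unit keeps its previous value with probability zoneout_prob.
        return [
            prev if rng.random() < zoneout_prob else new
            for prev, new in zip(prev_state, new_state)
        ]
    # Inference: deterministic expectation of the training-time update.
    return [
        zoneout_prob * prev + (1.0 - zoneout_prob) * new
        for prev, new in zip(prev_state, new_state)
    ]


print(zoneout([1.0, 2.0], [3.0, 4.0], 0.0, training=True))   # [3.0, 4.0]
print(zoneout([1.0, 2.0], [3.0, 4.0], 0.5, training=False))  # [2.0, 3.0]
```

With zoneout_prob set to 0, the default stated above, the update reduces to an ordinary RNN step.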
-
_embed_style
(style_spec, style_len)[source]¶ Code that implements the reference encoder as described in “Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron”, and “Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis”
Config parameters:
- conv_layers (list) — See the conv_layers parameter for the Tacotron-2 model.
- num_rnn_layers (int) — Number of rnn layers in the reference encoder
- rnn_cell_dim (int) — Size of rnn layer
- rnn_unidirectional (bool) — Uni- or bi-directional rnn.
- rnn_type — Must be a valid tf rnn cell class
- emb_size (int) — Size of gst
- attention_layer_size (int) — Size of linear layers in attention
- num_tokens (int) — Number of tokens for gst
- num_heads (int) — Number of attention heads
-
_encode
(input_dict)[source]¶ Creates TensorFlow graph for Tacotron-2 like encoder.
Parameters: input_dict (dict) – dictionary with inputs. Must define:
source_tensors - array containing [
- source_sequence: tensor of shape [batch_size, sequence length]
- src_length: tensor of shape [batch_size]
]
Returns: A python dictionary containing: - outputs - tensor containing the encoded text to be passed to the attention layer
- src_length - the length of the encoded text
Return type: dict
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included into the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included in the params parameter of the class __init__() method.
Return type: dict
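One way the dictionaries returned by get_required_params() and get_optional_params() could be used is to check a config before constructing an encoder. The sketch below assumes each dictionary maps a parameter name to its expected Python type; that mapping, the validate_params helper, and the example dictionaries are all assumptions made for illustration and do not describe the library's own checking code.

```python
def validate_params(params, required, optional):
    """Check params against name -> type dictionaries (hypothetical helper)."""
    allowed = dict(optional)
    allowed.update(required)

    missing = sorted(set(required) - set(params))
    if missing:
        raise ValueError("missing required parameters: %s" % missing)

    unknown = sorted(set(params) - set(allowed))
    if unknown:
        raise ValueError("unknown parameters: %s" % unknown)

    for name, value in params.items():
        expected = allowed[name]
        # Only enforce entries that are plain Python types.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError("%s should be of type %s" % (name, expected.__name__))
    return True


# Made-up stand-ins for what the two static methods might return.
example_required = {"src_vocab_size": int, "embedding_size": int}
example_optional = {"bn_momentum": float, "data_format": str}

validate_params(
    {"src_vocab_size": 94, "embedding_size": 256, "bn_momentum": 0.1},
    example_required,
    example_optional,
)
```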
-
wavenet_encoder¶
-
class
encoders.wavenet_encoder.
WavenetEncoder
(params, model, name='wavenet_encoder', mode='train')[source]¶ Bases:
encoders.encoder.Encoder
WaveNet like encoder.
Consists of several blocks of dilated causal convolutions.
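To get a feel for how many past samples such a stack can see, the receptive field can be estimated with a short calculation. The sketch assumes the dilation doubles with every layer inside a block (1, 2, 4, ...) and resets at the start of each block, as in the original WaveNet paper; whether this encoder uses exactly that schedule is an assumption, and receptive_field is a hypothetical helper.

```python
def receptive_field(kernel_size, blocks, layers_per_block):
    """Receptive field, in samples, of stacked dilated causal convolutions.

    Assumes dilations 1, 2, 4, ..., 2**(layers_per_block - 1), repeated
    once per block.
    """
    dilations = [2 ** i for i in range(layers_per_block)] * blocks
    # Each layer extends the receptive field by (kernel_size - 1) * dilation.
    return 1 + (kernel_size - 1) * sum(dilations)


# One block of 10 layers with kernel size 2 sees 1024 samples.
one_block = receptive_field(kernel_size=2, blocks=1, layers_per_block=10)

# Three such blocks see 3070 samples.
three_blocks = receptive_field(kernel_size=2, blocks=3, layers_per_block=10)
```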
-
__init__
(params, model, name='wavenet_encoder', mode='train')[source]¶ WaveNet like encoder constructor.
Config parameters:
- layer_type (str) — type of convolutional layer, currently only supports “conv1d”
- kernel_size (int) — size of kernel
- strides (int) — size of stride
- padding (str) — padding, can be “SAME” or “VALID”
- blocks (int) — number of dilation cycles
- layers_per_block (int) — number of dilated convolutional layers in each block
- filters (int) — number of output channels
- quantization_channels (int) — depth of mu-law quantized input
- data_format (string) — could be either “channels_first” or “channels_last”. Defaults to “channels_last”.
- bn_momentum (float) — momentum for batch norm. Defaults to 0.1.
- bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-5.
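The quantization_channels parameter refers to mu-law quantized audio. As a minimal sketch of what that quantization usually means, the function below compands a sample in [-1, 1] with the standard mu-law formula and maps it to an integer class in [0, quantization_channels - 1]. It is a generic illustration, not the library's preprocessing code, and mu_law_encode is a hypothetical name.

```python
import math


def mu_law_encode(sample, quantization_channels=256):
    """Map a float sample in [-1, 1] to an integer class via mu-law companding."""
    mu = quantization_channels - 1
    sample = max(-1.0, min(1.0, sample))  # clip out-of-range input
    # Standard mu-law companding: sign(x) * ln(1 + mu * |x|) / ln(1 + mu).
    compressed = math.copysign(
        math.log1p(mu * abs(sample)) / math.log1p(mu), sample
    )
    # Rescale from [-1, 1] to [0, mu] and round to the nearest class index.
    return int((compressed + 1) / 2 * mu + 0.5)


silence = mu_law_encode(0.0)    # middle class
loudest = mu_law_encode(1.0)    # highest class
quietest = mu_law_encode(-1.0)  # lowest class
```

A one-hot encoding of the resulting class index would then have depth quantization_channels.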
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included in the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included in the params parameter of the class __init__() method.
Return type: dict
-
-
encoders.wavenet_encoder.
causal_conv_bn_actv
(layer_type, name, inputs, filters, kernel_size, activation_fn, strides, padding, regularizer, training, data_format, bn_momentum, bn_epsilon, dilation=1)[source]¶ Defines a single dilated causal convolutional layer with batch norm.
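The core idea of a dilated causal convolution can be sketched in plain Python for a single channel: the input is padded on the left only, so the output at step t never depends on later samples. This is a conceptual sketch under that assumption; it omits batch norm, the activation, multiple channels and strides, and causal_conv1d is a hypothetical helper rather than the function documented above.

```python
def causal_conv1d(inputs, kernel, dilation=1):
    """Single-channel dilated causal convolution over a list of numbers."""
    # Left-pad so the last kernel tap lines up with the current sample.
    pad = (len(kernel) - 1) * dilation
    padded = [0] * pad + list(inputs)
    return [
        sum(w * padded[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(inputs))
    ]


signal = [1, 2, 3, 4]

# Dilation 1: each output is the current sample plus the previous one.
dense = causal_conv1d(signal, kernel=[1, 1], dilation=1)

# Dilation 2: each output is the current sample plus the one two steps back.
dilated = causal_conv1d(signal, kernel=[1, 1], dilation=2)
```

Because only left padding is used, the output has the same length as the input and changing a later sample leaves earlier outputs untouched.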
-
encoders.wavenet_encoder.
conv_1x1
(layer_type, name, inputs, filters, strides, regularizer, training, data_format)[source]¶ Defines a single 1x1 convolution for convenience.
-
encoders.wavenet_encoder.
wavenet_conv_block
(layer_type, name, inputs, condition_filter, condition_gate, filters, kernel_size, strides, padding, regularizer, training, data_format, bn_momentum, bn_epsilon, layers_per_block)[source]¶ Defines a single WaveNet block using the architecture specified in the original paper, including skip and residual connections.
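The gated activation with skip and residual connections from the original WaveNet paper can be sketched for a single scalar time step as follows. The sketch assumes the conditioning terms are added to the filter and gate pre-activations, as in the paper's conditional formulation, and it leaves out the 1x1 convolutions applied before the residual and skip outputs. All names are hypothetical and the code is not taken from the library.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def gated_residual_step(x, filter_out, gate_out,
                        condition_filter=0.0, condition_gate=0.0):
    """One time step of a WaveNet-style gated unit (scalar sketch).

    filter_out and gate_out stand for the outputs of the two dilated causal
    convolutions applied to x; the condition_* terms are added before the
    nonlinearities.
    """
    z = math.tanh(filter_out + condition_filter) * sigmoid(gate_out + condition_gate)
    residual = x + z  # passed on to the next layer
    skip = z          # accumulated across layers for the output
    return residual, skip


# With a zero filter pre-activation the gated output is zero,
# so the residual path passes the input through unchanged.
residual, skip = gated_residual_step(x=0.5, filter_out=0.0, gate_out=3.0)
```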
centaur_encoder¶
-
class
encoders.centaur_encoder.
CentaurEncoder
(params, model, name='centaur_encoder', mode='train')[source]¶ Bases:
open_seq2seq.encoders.encoder.Encoder
Centaur encoder that consists of convolutional layers.
-
__init__
(params, model, name='centaur_encoder', mode='train')[source]¶ Centaur encoder constructor.
See parent class for arguments description.
Config parameters:
src_vocab_size (int) — number of symbols in alphabet.
embedding_size (int) — dimensionality of character embedding.
output_size (int) — dimensionality of output embedding.
conv_layers (list) — list with the description of convolutional layers. For example:
"conv_layers": [
  { "kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME" },
  { "kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME" },
  { "kernel_size": [5], "stride": [1], "num_channels": 512, "padding": "SAME" }
]
bn_momentum (float) — momentum for batch norm. Defaults to 0.95.
bn_epsilon (float) — epsilon for batch norm. Defaults to 1e-8.
cnn_dropout_prob (float) — dropout probability for CNN layers. Defaults to 0.5.
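Putting the parameters above together, a params dictionary for this encoder might look like the following sketch. The conv_layers entries and the batch norm and dropout values are the example and defaults stated above; src_vocab_size, embedding_size and output_size are placeholder values picked for illustration only.

```python
# One convolutional layer description, as in the conv_layers example above.
conv_layer = {
    "kernel_size": [5],
    "stride": [1],
    "num_channels": 512,
    "padding": "SAME",
}

centaur_encoder_params = {
    "src_vocab_size": 94,   # placeholder, depends on the alphabet in use
    "embedding_size": 256,  # placeholder
    "output_size": 256,     # placeholder
    # Three identical layers; each entry is an independent copy.
    "conv_layers": [dict(conv_layer) for _ in range(3)],
    "bn_momentum": 0.95,      # documented default
    "bn_epsilon": 1e-8,       # documented default
    "cnn_dropout_prob": 0.5,  # documented default
}
```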
-
static
get_optional_params
()[source]¶ Static method with description of optional parameters.
Returns: Dictionary containing all the parameters that can be included in the params parameter of the class __init__() method.
Return type: dict
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included in the params parameter of the class __init__() method.
Return type: dict
-