models¶

All base models available in OpenSeq2Seq.

model¶

class models.model.Model(params, mode='train', hvd=None)[source]¶

Bases: object

Abstract class that any model should inherit from. It automatically enables multi-GPU (or Horovod) computation, has mixed precision support, logs training summaries, etc.

__init__(params, mode='train', hvd=None)[source]¶

Model constructor. The TensorFlow graph should not be created here, but rather in the self.compile() method.

Parameters:

params (dict) – parameters describing the model. All supported parameters are listed in get_required_params(), get_optional_params() functions.
mode (string, optional) – “train”, “eval” or “infer”. If mode is “train” all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only model and loss will be built. If mode is “infer”, only model will be built.
hvd (optional) – if Horovod is used, this should be horovod.tensorflow module. If Horovod is not used, it should be None.

Config parameters:

random_seed (int) — random seed to use.
use_horovod (bool) — whether to use Horovod for distributed execution.
num_gpus (int) — number of GPUs to use. This parameter cannot be used if gpu_ids is specified. When use_horovod is True this parameter is ignored.
gpu_ids (list of ints) — GPU ids to use. This parameter cannot be used if num_gpus is specified. When use_horovod is True this parameter is ignored.
batch_size_per_gpu (int) — batch size to use for each GPU.
eval_batch_size_per_gpu (int) — batch size to use for each GPU during inference. This is for when training and inference have different computation and memory requirements, such as when training uses sampled softmax and inference uses full softmax. If not specified, it’s set to batch_size_per_gpu.
restore_best_checkpoint (bool) — if set to True, when doing evaluation and inference, the model will load the best checkpoint instead of the latest checkpoint. The best checkpoint is selected based on evaluation results, so it’s only available when the model is trained under train_eval mode. Defaults to False.
load_model (str) — points to the location of the pretrained model for transfer learning. If specified, during training, the system will look into the checkpoint in this folder and restore all variables whose names and shapes match a variable in the new model.
num_epochs (int) — number of epochs to run training for. This parameter cannot be used if max_steps is specified.
max_steps (int) — number of steps to run training for. This parameter cannot be used if num_epochs is specified.
save_summaries_steps (int or None) — how often to save summaries. Setting it to None disables summaries saving.
print_loss_steps (int or None) — how often to print loss during training. Setting it to None disables loss printing.
print_samples_steps (int or None) — how often to print training samples (input sequences, correct answers and model predictions). Setting it to None disables samples printing.
print_bench_info_steps (int or None) — how often to print training benchmarking information (average number of objects processed per step). Setting it to None disables intermediate benchmarking printing, but the average information across the whole training will always be printed after the last iteration.
save_checkpoint_steps (int or None) — how often to save model checkpoints. Setting it to None disables checkpoint saving.
num_checkpoints (int) — number of last checkpoints to keep.
eval_steps (int) — how often to run evaluation during training. This parameter is only checked if --mode argument of run.py is “train_eval”. If no evaluation is needed you should use “train” mode.
logdir (string) — path to the log directory where all checkpoints and summaries will be saved.
data_layer (any class derived from DataLayer) — data layer class to use.
data_layer_params (dict) — dictionary with data layer configuration. For complete list of possible parameters see the corresponding class docs.
optimizer (string or TensorFlow optimizer class) — optimizer to use for training. Could be either “Adam”, “Adagrad”, “Ftrl”, “Momentum”, “RMSProp”, “SGD” or any valid TensorFlow optimizer class.
optimizer_params (dict) — dictionary that will be passed to optimizer __init__ method.
initializer — any valid TensorFlow initializer.
initializer_params (dict) — dictionary that will be passed to initializer __init__ method.
freeze_variables_regex (str or None) — if zero or more characters at the beginning of the name of a trainable variable match this pattern, then this variable will be frozen during training. Setting it to None disables freezing of variables.
regularizer — any valid TensorFlow regularizer.
regularizer_params (dict) — dictionary that will be passed to regularizer __init__ method.
dtype — model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details see mixed precision training section in docs.
lr_policy — any valid learning rate policy function. For examples, see optimizers.lr_policies module.
lr_policy_params (dict) — dictionary containing lr_policy parameters.
max_grad_norm (float) — maximum value of gradient norm. Clipping will be performed if some gradients exceed this value (this is checked for each variable independently).
loss_scaling — could be float or string. If float, static loss scaling is applied. If string, the corresponding automatic loss scaling algorithm is used. Must be one of ‘Backoff’ or ‘LogMax’ (case insensitive). Only used when dtype=”mixed”. For details see mixed precision training section in docs.
loss_scaling_params (dict) — dictionary containing loss scaling parameters.
summaries (list) — which summaries to log. Could contain “learning_rate”, “gradients”, “gradient_norm”, “global_gradient_norm”, “variables”, “variable_norm”, “loss_scale”.
iter_size (int) — use this parameter to emulate large batches. The gradients will be accumulated for iter_size number of steps before applying update.
larc_params — dictionary with parameters for LARC (or LARS) optimization algorithms. Can contain the following parameters:
- larc_mode — Could be either “scale” (LARS) or “clip” (LARC). Note that it works in addition to any other optimization algorithm since we treat it as adaptive gradient clipping and learning rate adjustment.
- larc_eta (float) — LARC or LARS scaling parameter.
- min_update (float) — minimal value of the LARC (LARS) update.
- epsilon (float) — small number added to gradient norm in denominator for numerical stability.
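
As an illustration of how these parameters fit together, here is a minimal sketch of a config dictionary. The values are placeholders, the dictionary is not complete (for example, data_layer and data_layer_params are left out), and the `check_exclusive` helper is hypothetical: it is not part of OpenSeq2Seq and only mirrors the documented rule that num_gpus/gpu_ids and num_epochs/max_steps cannot be combined.

```python
# Hypothetical base configuration; the keys come from the list above,
# the values are illustrative placeholders only.
base_params = {
    "random_seed": 0,
    "use_horovod": False,
    "num_gpus": 2,                 # do not combine with "gpu_ids"
    "batch_size_per_gpu": 32,
    "num_epochs": 10,              # do not combine with "max_steps"
    "save_summaries_steps": 100,
    "print_loss_steps": 10,
    "print_samples_steps": None,   # None disables sample printing
    "save_checkpoint_steps": 1000,
    "eval_steps": 500,             # only checked in "train_eval" mode
    "logdir": "experiments/demo",
    "optimizer": "Adam",
    "optimizer_params": {},
    "dtype": "mixed",
    "loss_scaling": "Backoff",     # only used when dtype is "mixed"
    "max_grad_norm": 1.0,
    "iter_size": 1,
}

# Pairs that the documentation above describes as mutually exclusive.
EXCLUSIVE_PAIRS = [("num_gpus", "gpu_ids"), ("num_epochs", "max_steps")]


def check_exclusive(params):
    """Return the documented mutually exclusive pairs that are both set."""
    return [pair for pair in EXCLUSIVE_PAIRS
            if pair[0] in params and pair[1] in params]


conflicts = check_exclusive(base_params)  # empty list for the config above
```

A check of this kind can catch a conflicting config before any graph is built.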

_build_forward_pass_graph(input_tensors, gpu_id=0)[source]¶

Abstract method. Should create the graph of the forward pass of the model.

Parameters:

input_tensors – input_tensors defined by the data_layer class.
gpu_id (int, optional) – id of the GPU where the current copy of the model is constructed. For Horovod this is always zero.

Returns:

tuple containing loss tensor and list of outputs tensors.

Loss tensor will be automatically provided to the optimizer and corresponding train_op will be created.

Output tensors are stored in the _outputs attribute and can be accessed by calling get_output_tensors() function. For example, this happens inside utils.hooks.RunEvaluationHook to fetch output values for evaluation.

Both loss and outputs can be None when corresponding part of the graph is not built.

Return type:

tuple

_get_num_objects_per_step(worker_id=0)[source]¶

Define this method if you need benchmarking functionality. For example, for translation models, this method should return number of tokens in current batch, for image recognition model should return number of images in current batch.

Parameters:	worker_id (int) – id of the worker to get data layer from (not used for Horovod).
Returns:	tf.Tensor with number of objects in batch.

build_trt_forward_pass_graph(input_tensors, gpu_id=0, checkpoint=None)[source]¶: Wrapper around _build_forward_pass_graph which converts graph using TF-TRT

clip_last_batch(last_batch, true_size)[source]¶

This method performs last batch clipping. Used in cases when dataset is not divisible by the batch size and model does not support dynamic batch sizes. In those cases, the last batch will contain some data from the “next epoch” and this method can be used to remove that data. This method works for both dense and sparse tensors. In most cases you will not need to overwrite this method.

Parameters:	last_batch (list) – list with elements that could be either `np.array` or `tf.SparseTensorValue` containing data for last batch. The assumption is that the first axis of all data tensors will correspond to the current batch size.
	true_size (int) – true size that the last batch should be cut to.
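
The idea behind last batch clipping can be sketched for the dense case as follows. This is a simplified illustration, not the library's implementation: the function name `clip_dense_batch` is hypothetical, and sparse values (which the real method also handles) are not covered here.

```python
import numpy as np


def clip_dense_batch(last_batch, true_size):
    """Keep only the first `true_size` entries along axis 0 of each array."""
    return [np.asarray(data)[:true_size] for data in last_batch]


# A batch of 4 examples in which only the first 3 belong to this epoch.
features = np.arange(12).reshape(4, 3)
lengths = np.array([3, 3, 2, 1])
clipped = clip_dense_batch([features, lengths], true_size=3)
```

After clipping, every array in the list has a first axis of length true_size.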

compile(force_var_reuse=False, checkpoint=None)[source]¶: TensorFlow graph is built here.

evaluate(input_values, output_values)[source]¶

This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors (using evaluation model). The self.evaluate() method is called on each batch of data and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overwriting this function can be a useful way to add it.

Parameters:	input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
	output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type:	list

finalize_evaluation(results_per_batch, training_step=None)[source]¶

This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors (using evaluation model). The self.evaluate() method is called on each batch of data and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overwriting these methods can be a useful way to add it.

Parameters:	results_per_batch (list) – aggregation of values returned from all calls to `self.evaluate()` method (number of calls will be equal to number of evaluation batches).
	training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict
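
The evaluate() / finalize_evaluation() contract described above can be sketched with a toy class. This is an assumption-laden illustration rather than a real model: `ToyAccuracyModel` does not inherit from Model, the layout of input_values and output_values is invented for the example, and plain Python lists stand in for the arrays that sess.run would return.

```python
class ToyAccuracyModel:
    """Stand-in that mimics only the evaluate/finalize_evaluation contract."""

    def evaluate(self, input_values, output_values):
        # Assumed layout: targets live in the inputs, predictions in outputs[0].
        targets = input_values["targets"]
        predictions = output_values[0]
        correct = sum(int(p == t) for p, t in zip(predictions, targets))
        # Per-batch values that finalize_evaluation will aggregate.
        return [correct, len(targets)]

    def finalize_evaluation(self, results_per_batch, training_step=None):
        correct = sum(result[0] for result in results_per_batch)
        total = sum(result[1] for result in results_per_batch)
        # Dictionary of values to be logged (may be empty in general).
        return {"eval_accuracy": correct / total if total else 0.0}


model = ToyAccuracyModel()
batches = [
    ({"targets": [1, 0, 1]}, [[1, 1, 1]]),
    ({"targets": [0, 0]}, [[0, 0]]),
]
results = [model.evaluate(inputs, outputs) for inputs, outputs in batches]
metrics = model.finalize_evaluation(results, training_step=100)
```

Returning raw counts from each batch, rather than a per-batch accuracy, keeps the final average correct when batches differ in size.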

finalize_inference(results_per_batch, output_file)[source]¶

This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pair to the output_file.

Parameters:	results_per_batch (list) – aggregation of values returned from all calls to `self.infer()` method (number of calls will be equal to number of inference batches).
	output_file (str) – name of the output file that inference results should be saved to.

get_data_layer(worker_id=0)[source]¶

Returns model data layer. When using Horovod, worker_id parameter is ignored. When using tower-based multi-GPU approach, worker_id can be used to select data layer for corresponding tower/GPU.

Parameters:	worker_id (int) – id of the worker to get data layer from (not used for Horovod).
Returns:	model data layer.

get_num_objects_per_step(worker_id=0)[source]¶

static get_optional_params()[source]¶

Static method with description of optional parameters.

Returns:	Dictionary containing all the parameters that can be included into the `params` parameter of the class `__init__()` method.
Return type:	dict

get_output_tensors(worker_id=0)[source]¶

Returns output tensors generated by _build_forward_pass_graph(). When using Horovod, worker_id parameter is ignored. When using tower-based multi-GPU approach, worker_id can be used to select tensors for corresponding tower/GPU.

Parameters:	worker_id (int) – id of the worker to get tensors from (not used for Horovod).
Returns:	output tensors.

static get_required_params()[source]¶

Static method with description of required parameters.

Returns:	Dictionary containing all the parameters that have to be included into the `params` parameter of the class `__init__()` method.
Return type:	dict

get_tf_dtype()[source]¶: Returns actual TensorFlow dtype that will be used as variables dtype.

hvd¶: horovod.tensorflow module

infer(input_values, output_values)[source]¶

This method is analogous to self.evaluate(), but used in conjunction with self.finalize_inference() to perform inference.

Parameters:	input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
	output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in `self.finalize_inference()` method).
Return type:	list
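
The way infer() and finalize_inference() cooperate can be sketched in the same spirit. Everything here is illustrative: `ToyInferenceModel` does not inherit from Model, the per-example strings and the tab-separated output format are assumptions, and real models receive evaluated tensors rather than Python lists.

```python
import os
import tempfile


class ToyInferenceModel:
    """Stand-in that mimics only the infer/finalize_inference contract."""

    def infer(self, input_values, output_values):
        # Assumed layout: one source id and one predicted string per example.
        return list(zip(input_values, output_values))

    def finalize_inference(self, results_per_batch, output_file):
        # Write one input-output pair per line.
        with open(output_file, "w", encoding="utf-8") as fout:
            for batch in results_per_batch:
                for source, prediction in batch:
                    fout.write(f"{source}\t{prediction}\n")


model = ToyInferenceModel()
results = [
    model.infer(["a.wav", "b.wav"], ["hello", "world"]),
    model.infer(["c.wav"], ["again"]),
]
output_path = os.path.join(tempfile.mkdtemp(), "predictions.tsv")
model.finalize_inference(results, output_path)

with open(output_path, encoding="utf-8") as fin:
    written_lines = fin.read().splitlines()
```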

last_step¶: Number of steps the training should be run for.

maybe_print_logs(input_values, output_values, training_step)[source]¶

This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every print_samples_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overwriting this method can be a useful way to add it.

Parameters:	input_values – evaluation of `self.get_data_layer(0).input_tensors`, that is, input tensors for one batch on the first GPU.
	output_values – evaluation of `self.get_output_tensors(0)`, that is, output tensors for one batch on the first GPU.
	training_step (int) – Current training step.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

mode¶: Mode the model is executed in (“train”, “eval” or “infer”).

num_gpus¶: Number of GPUs the model will be run on. For Horovod this is always 1 and actual number of GPUs is controlled by Open-MPI parameters.

on_horovod¶: Whether the model is run on Horovod or not.

params¶: Parameters used to construct the model (dictionary).

steps_in_epoch¶: Number of steps in epoch. This parameter is only populated if num_epochs was specified in the config (otherwise it is None). It is used in training hooks to correctly print epoch number.
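
As a small illustration of how a training hook could use steps_in_epoch to print the epoch number, consider the sketch below. The helper name `epoch_of` is hypothetical, and counting both steps and epochs from zero is an assumption made for the example.

```python
def epoch_of(step, steps_in_epoch):
    """Zero-based epoch index for a zero-based step, or None if unknown."""
    if steps_in_epoch is None:
        # steps_in_epoch is None when num_epochs was not given in the config.
        return None
    return step // steps_in_epoch


first_epoch = epoch_of(99, steps_in_epoch=100)
second_epoch = epoch_of(100, steps_in_epoch=100)
unknown_epoch = epoch_of(100, steps_in_epoch=None)
```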

encoder_decoder¶

class models.encoder_decoder.EncoderDecoderModel(params, mode='train', hvd=None)[source]¶

Bases: open_seq2seq.models.model.Model

Standard encoder-decoder class with one encoder and one decoder. “encoder-decoder-loss” models should inherit from this class.

__init__(params, mode='train', hvd=None)[source]¶

Encoder-decoder model constructor. Note that TensorFlow graph should not be created here. All graph creation logic is happening inside self._build_forward_pass_graph() method.

Parameters:

params (dict) – parameters describing the model. All supported parameters are listed in get_required_params(), get_optional_params() functions.
mode (string, optional) – “train”, “eval” or “infer”. If mode is “train” all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only model and loss will be built. If mode is “infer”, only model will be built.
hvd (optional) – if Horovod is used, this should be horovod.tensorflow module. If Horovod is not used, it should be None.

Config parameters:

encoder (any class derived from Encoder) — encoder class to use.
encoder_params (dict) — dictionary with encoder configuration. For complete list of possible parameters see the corresponding class docs.
decoder (any class derived from Decoder) — decoder class to use.
decoder_params (dict) — dictionary with decoder configuration. For complete list of possible parameters see the corresponding class docs.
loss (any class derived from Loss) — loss class to use.
loss_params (dict) — dictionary with loss configuration. For complete list of possible parameters see the corresponding class docs.

_build_forward_pass_graph(input_tensors, gpu_id=0)[source]¶

TensorFlow graph for encoder-decoder-loss model is created here. This function connects encoder, decoder and loss together. As an input for encoder it will specify source tensors (as returned from the data layer). As an input for decoder it will specify target tensors as well as all output returned from encoder. For loss it will also specify target tensors and all output returned from decoder. Note that loss will only be built for mode == “train” or “eval”.

Parameters:	input_tensors (dict) – `input_tensors` dictionary that has to contain `source_tensors` key with the list of all source tensors, and `target_tensors` with the list of all target tensors. Note that `target_tensors` only need to be provided if mode is “train” or “eval”.
	gpu_id (int, optional) – id of the GPU where the current copy of the model is constructed. For Horovod this is always zero.
Returns:	tuple containing loss tensor as returned from `loss.compute_loss()` and list of outputs tensors, which is taken from `decoder.decode()['outputs']`. When `mode == 'infer'`, loss will be None.
Return type:	tuple
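
The data flow described above can be sketched with stub components. This is a simplified illustration and not the library's code: plain Python lists stand in for tensors, the stub classes are invented for the example, and the `encode` method name and the `encoder_output` / `decoder_output` dictionary keys are assumptions (only `decoder.decode()['outputs']` and `loss.compute_loss()` are named in this documentation).

```python
class StubEncoder:
    def encode(self, input_dict):
        source = input_dict["source_tensors"][0]
        return {"outputs": [x * 2 for x in source]}


class StubDecoder:
    def decode(self, input_dict):
        encoded = input_dict["encoder_output"]["outputs"]
        return {"outputs": [[x + 1 for x in encoded]]}


class StubLoss:
    def compute_loss(self, input_dict):
        predictions = input_dict["decoder_output"]["outputs"][0]
        targets = input_dict["target_tensors"][0]
        squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
        return sum(squared) / len(targets)


def build_forward_pass(input_tensors, mode, encoder, decoder, loss):
    """Connect encoder, decoder and loss; loss is skipped in "infer" mode."""
    encoder_output = encoder.encode(
        {"source_tensors": input_tensors["source_tensors"]})
    decoder_input = {"encoder_output": encoder_output}
    if "target_tensors" in input_tensors:
        decoder_input["target_tensors"] = input_tensors["target_tensors"]
    decoder_output = decoder.decode(decoder_input)
    outputs = decoder_output.get("outputs")
    if mode in ("train", "eval"):
        loss_value = loss.compute_loss({
            "decoder_output": decoder_output,
            "target_tensors": input_tensors["target_tensors"],
        })
    else:
        loss_value = None
    return loss_value, outputs


parts = (StubEncoder(), StubDecoder(), StubLoss())
train_inputs = {"source_tensors": [[1, 2, 3]], "target_tensors": [[3, 5, 8]]}
train_loss, train_outputs = build_forward_pass(train_inputs, "train", *parts)
infer_loss, infer_outputs = build_forward_pass(
    {"source_tensors": [[1, 2, 3]]}, "infer", *parts)
```

In "infer" mode no target tensors are needed and the returned loss is None, matching the return value documented above.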

_create_decoder()[source]¶

This function should return the decoder instance. Overwrite this function if additional parameters need to be specified for decoder, besides those provided in the config.

Returns:	instance of a class derived from `decoders.decoder.Decoder`.

_create_encoder()[source]¶

This function should return the encoder instance. Overwrite this function if additional parameters need to be specified for encoder, besides those provided in the config.

Returns:	instance of a class derived from `encoders.encoder.Encoder`.

_create_loss()[source]¶

This function should return the loss instance. Overwrite this function if additional parameters need to be specified for loss, besides those provided in the config.

Returns:	instance of a class derived from `losses.loss.Loss`.

decoder¶: Model decoder.

encoder¶: Model encoder.

static get_optional_params()[source]¶

Static method with description of optional parameters.

Returns:	Dictionary containing all the parameters that can be included into the `params` parameter of the class `__init__()` method.
Return type:	dict

static get_required_params()[source]¶

Static method with description of required parameters.

Returns:	Dictionary containing all the parameters that have to be included into the `params` parameter of the class `__init__()` method.
Return type:	dict

loss_computator¶: Model loss computator.

speech2text¶

class models.speech2text.Speech2Text(params, mode='train', hvd=None)[source]¶

Bases: models.encoder_decoder.EncoderDecoderModel

_build_forward_pass_graph(input_tensors, gpu_id=0)[source]¶

TensorFlow graph for speech2text model is created here. This function connects encoder, decoder and loss together. As an input for encoder it will specify source tensors (as returned from the data layer). As an input for decoder it will specify target tensors as well as all output returned from encoder. For loss it will also specify target tensors and all output returned from decoder. Note that loss will only be built for mode == “train” or “eval”.

Parameters:	input_tensors (dict) – `input_tensors` dictionary that has to contain `source_tensors` key with the list of all source tensors, and `target_tensors` with the list of all target tensors. Note that `target_tensors` only need to be provided if mode is “train” or “eval”.
	gpu_id (int, optional) – id of the GPU where the current copy of the model is constructed. For Horovod this is always zero.
Returns:	tuple containing loss tensor as returned from `loss.compute_loss()` and list of outputs tensors, which is taken from `decoder.decode()['outputs']`. When `mode == 'infer'`, loss will be None.
Return type:	tuple

_get_num_objects_per_step(worker_id=0)[source]¶: Returns number of audio frames in current batch.

evaluate(input_values, output_values)[source]¶

This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors (using evaluation model). The self.evaluate() method is called on each batch of data and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overwriting this function can be a useful way to add it.

Parameters:	input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
	output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type:	list

finalize_evaluation(results_per_batch, training_step=None)[source]¶

This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors (using evaluation model). The self.evaluate() method is called on each batch of data and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overwriting these methods can be a useful way to add it.

Parameters:	results_per_batch (list) – aggregation of values returned from all calls to `self.evaluate()` method (number of calls will be equal to number of evaluation batches).
	training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

finalize_inference(results_per_batch, output_file)[source]¶

This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pair to the output_file.

Parameters:	results_per_batch (list) – aggregation of values returned from all calls to `self.infer()` method (number of calls will be equal to number of inference batches).
	output_file (str) – name of the output file that inference results should be saved to.

infer(input_values, output_values)[source]¶

This method is analogous to self.evaluate(), but used in conjunction with self.finalize_inference() to perform inference.

Parameters:	input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
	output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in `self.finalize_inference()` method).
Return type:	list

maybe_print_logs(input_values, output_values, training_step)[source]¶

This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every print_samples_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overwriting this method can be a useful way to add it.

Parameters:	input_values – evaluation of `self.get_data_layer(0).input_tensors`, that is, input tensors for one batch on the first GPU.
	output_values – evaluation of `self.get_output_tensors(0)`, that is, output tensors for one batch on the first GPU.
	training_step (int) – Current training step.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

models.speech2text.dense_tensor_to_chars(tensor, idx2char, startindex, endindex)[source]¶

models.speech2text.levenshtein(a, b)[source]¶: Calculates the Levenshtein distance between a and b. The code was copied from: http://hetland.org/coding/python/levenshtein.py
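
For readers unfamiliar with the metric, the following is an independent sketch of the Levenshtein distance and of a word error rate built on top of it. It is not the code referenced above; the names `edit_distance` and `word_error_rate` are hypothetical, and normalizing by the number of reference words is the usual convention, assumed here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


char_distance = edit_distance("kitten", "sitting")
wer = word_error_rate("the cat sat", "the cat sat down")
```

Because the function only compares items for equality, the same code works on characters and on word lists.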

models.speech2text.plot_attention(alignments, pred_text, encoder_len, training_step)[source]¶

models.speech2text.sparse_tensor_to_chars(tensor, idx2char)[source]¶

models.speech2text.sparse_tensor_to_chars_bpe(tensor)[source]¶

text2text¶

class models.text2text.Text2Text(params, mode='train', hvd=None)[source]¶

Bases: models.encoder_decoder.EncoderDecoderModel

An example class implementing classical text-to-text model.

_get_num_objects_per_step(worker_id=0)[source]¶: Returns number of source tokens + number of target tokens in batch.

evaluate(input_values, output_values)[source]¶

This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors (using evaluation model). The self.evaluate() method is called on each batch of data and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overwriting this function can be a useful way to add it.

Parameters:	input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
	output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type:	list

finalize_evaluation(results_per_batch, training_step=None)[source]¶

This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors (using evaluation model). The self.evaluate() method is called on each batch of data and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overwriting these methods can be a useful way to add it.

Parameters:	results_per_batch (list) – aggregation of values returned from all calls to `self.evaluate()` method (number of calls will be equal to number of evaluation batches).
	training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

finalize_inference(results_per_batch, output_file)[source]¶

This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pair to the output_file.

Parameters:	results_per_batch (list) – aggregation of values returned from all calls to `self.infer()` method (number of calls will be equal to number of inference batches).
	output_file (str) – name of the output file that inference results should be saved to.

infer(input_values, output_values)[source]¶

This method is analogous to self.evaluate(), but used in conjunction with self.finalize_inference() to perform inference.

Parameters:	input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
	output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in `self.finalize_inference()` method).
Return type:	list

maybe_print_logs(input_values, output_values, training_step)[source]¶

This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every print_samples_steps (config parameter) iterations and input/output values will be populated automatically by calling sess.run on corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overwriting this method can be a useful way to add it.

Parameters:	input_values – evaluation of `self.get_data_layer(0).input_tensors`, that is, input tensors for one batch on the first GPU.
	output_values – evaluation of `self.get_output_tensors(0)`, that is, output tensors for one batch on the first GPU.
	training_step (int) – Current training step.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict
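
A minimal sketch of an overridden maybe_print_logs() is shown below. `ToyModel` is a hypothetical stand-in, and the layout of `input_values` as a (sources, targets) pair and the `sample_mismatch_rate` key are assumptions for illustration; real models define their own tensor layout.

```python
class ToyModel:
    """Hypothetical stand-in for a Model subclass; not OpenSeq2Seq code."""

    def maybe_print_logs(self, input_values, output_values, training_step):
        # Assumed layout: input_values is (sources, targets) and
        # output_values is the list of predictions for the same batch.
        sources, targets = input_values
        print("step {}: input={!r} target={!r} prediction={!r}".format(
            training_step, sources[0], targets[0], output_values[0]))
        # Scalars returned here are the values meant for TensorBoard.
        mismatches = sum(
            1 for target, pred in zip(targets, output_values)
            if target != pred)
        return {"sample_mismatch_rate": mismatches / len(targets)}


logs = ToyModel().maybe_print_logs(
    (["w", "x", "y", "z"], ["a", "b", "c", "d"]),
    ["a", "q", "c", "d"],
    training_step=100,
)
```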

models.text2text.calculate_bleu(preds, targets)[source]¶

Function to calculate BLEU score.

Parameters:

preds – list of lists
targets – list of lists
Returns:	BLEU score
Return type:	float32
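
For orientation, the sketch below computes an unsmoothed corpus-level BLEU score over token lists, with one reference per prediction. It is an assumption-laden illustration of the metric, not the implementation behind calculate_bleu(), whose tokenization and smoothing may differ; `ngram_counts` and `toy_corpus_bleu` are hypothetical names.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def toy_corpus_bleu(preds, targets, max_n=4):
    """Unsmoothed corpus BLEU with one reference per prediction."""
    matches = [0] * max_n
    totals = [0] * max_n
    pred_len = target_len = 0
    for pred, target in zip(preds, targets):
        pred_len += len(pred)
        target_len += len(target)
        for n in range(1, max_n + 1):
            pred_ngrams = ngram_counts(pred, n)
            target_ngrams = ngram_counts(target, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum((pred_ngrams & target_ngrams).values())
            totals[n - 1] += sum(pred_ngrams.values())
    if pred_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(
        math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes predictions shorter than the references.
    if pred_len > target_len:
        brevity = 1.0
    else:
        brevity = math.exp(1 - target_len / pred_len)
    return brevity * math.exp(log_precision)


reference = [["the", "cat", "sat", "on", "a", "mat"]]
prediction = [["the", "cat", "sat", "on", "the", "mat"]]
score = toy_corpus_bleu(prediction, reference)
```

In this example the clipped n-gram precisions are 5/6, 3/5, 2/4 and 1/3, so the score is the fourth root of their product, 1/12.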

models.text2text.transform_for_bleu(row, vocab, ignore_special=False, delim=' ', bpe_used=False)[source]¶

text2speech¶

class models.text2speech.Text2Speech(params, mode='train', hvd=None)[source]¶

Bases: models.encoder_decoder.EncoderDecoderModel

Text-to-speech model.

evaluate(input_values, output_values)[source]¶

This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods are called every eval_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results are collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes, but overriding it is a convenient way to add evaluation functionality.

Parameters:

input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type:	list

finalize_evaluation(results_per_batch, training_step=None, samples_count=1)[source]¶

This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods are called every eval_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results are collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes, but overriding them is a convenient way to add evaluation functionality.

Parameters:

results_per_batch (list) – aggregation of values returned from all calls to the `self.evaluate()` method (the number of calls equals the number of evaluation batches).
training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict
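
The evaluate()/finalize_evaluation() contract can be sketched with the word-error-rate example mentioned above. This is a hedged illustration, not OpenSeq2Seq code: `ToyModel`, `edit_distance`, the `eval_wer` key, and the assumption that `input_values` holds reference transcripts while `output_values` holds decoded hypotheses are all hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


class ToyModel:
    """Hypothetical stand-in for a Model subclass; not OpenSeq2Seq code."""

    def evaluate(self, input_values, output_values):
        # Assumed layout: reference transcripts in input_values and
        # decoded hypotheses in output_values, both as strings.
        errors = words = 0
        for reference, hypothesis in zip(input_values, output_values):
            ref_tokens = reference.split()
            errors += edit_distance(ref_tokens, hypothesis.split())
            words += len(ref_tokens)
        return [errors, words]

    def finalize_evaluation(self, results_per_batch, training_step=None):
        # Sum raw counts so batches of different sizes are weighted fairly.
        total_errors = sum(batch[0] for batch in results_per_batch)
        total_words = sum(batch[1] for batch in results_per_batch)
        return {"eval_wer": total_errors / max(total_words, 1)}


model = ToyModel()
results = [
    model.evaluate(["the cat sat", "hello world"],
                   ["the cat sat", "hello word"]),
    model.evaluate(["good morning"], ["good"]),
]
metrics = model.finalize_evaluation(results, training_step=1000)
```

Returning raw counts from evaluate() rather than a per-batch rate keeps the final metric exact even when the last batch is smaller than the others.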

finalize_inference(results_per_batch, output_file)[source]¶

This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method logs the corresponding input-output pairs to output_file.

Parameters:

results_per_batch (list) – aggregation of values returned from all calls to the `self.infer()` method (the number of calls equals the number of inference batches).
output_file (str) – name of the output file that inference results should be saved to.

get_alignments(attention_mask)[source]¶

Get attention alignment plots.

Parameters:	attention_mask – attention alignment.
Returns:	Specs and titles to plot.

static get_required_params()[source]¶

Static method with description of required parameters.

Returns:	Dictionary containing all the parameters that have to be included in the `params` argument of the class `__init__()` method.
Return type:	dict
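
One way a derived class can extend the required parameters of its parent is sketched below. The classes are hypothetical stand-ins, and the example assumes the returned dictionary maps parameter names to expected Python types; the parameter names are used for illustration only.

```python
class ToyBaseModel:
    """Hypothetical base class; not OpenSeq2Seq code."""

    @staticmethod
    def get_required_params():
        return {"batch_size_per_gpu": int}

    def __init__(self, params):
        # Reject configs with missing keys or wrongly typed values.
        for name, expected_type in self.get_required_params().items():
            if name not in params:
                raise ValueError("missing required parameter: " + name)
            if not isinstance(params[name], expected_type):
                raise TypeError("wrong type for parameter: " + name)
        self.params = params


class ToyText2Speech(ToyBaseModel):
    """Hypothetical derived class that adds one required parameter."""

    @staticmethod
    def get_required_params():
        # Extend, rather than replace, the parent's requirements.
        return dict(
            ToyBaseModel.get_required_params(), save_to_tensorboard=bool)


model = ToyText2Speech(
    {"batch_size_per_gpu": 32, "save_to_tensorboard": True})
```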

infer(input_values, output_values)[source]¶

This method is analogous to self.evaluate(), but used in conjunction with self.finalize_inference() to perform inference.

Parameters:

input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in `self.finalize_inference()` method).
Return type:	list

maybe_print_logs(input_values, output_values, training_step)[source]¶

This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. It is called every print_samples_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes, but overriding it is a convenient way to add extra printing functionality.

Parameters:

input_values – evaluation of `self.get_data_layer(0).input_tensors`, that is, input tensors for one batch on the first GPU.
output_values – evaluation of `self.get_output_tensors(0)`, that is, output tensors for one batch on the first GPU.
training_step (int) – Current training step.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

print_logs(mode, specs, titles, stop_token_pred, stop_target, audio_length, step, predicted_final_spec, predicted_mag_spec=None)[source]¶

Save audio files and plots.

Parameters:

mode – “train” or “eval”.
specs – spectrograms to plot.
titles – spectrogram titles.
stop_token_pred – stop token prediction.
stop_target – stop target.
audio_length – length of the audio.
step – current step.
predicted_final_spec – predicted mel spectrogram.
predicted_mag_spec – predicted magnitude spectrogram.
Returns:	Dictionary to log.

models.text2speech.griffin_lim(magnitudes, n_iters=50, n_fft=1024)[source]¶

Griffin-Lim algorithm to convert magnitude spectrograms to audio signals.

models.text2speech.plot_spectrograms(specs, titles, stop_token_pred, audio_length, logdir, train_step, stop_token_target=None, number=0, append=False, save_to_tensorboard=False)[source]¶

Helper function to create an image to be logged to disk or a tf.Summary to be logged to TensorBoard.

Parameters:

specs (array) – array of images to show
titles (array) – array of titles. Must match the length of the specs array
stop_token_pred (np.array) – np.array of size [time, 1] containing the stop token predictions from the model.
audio_length (int) – length of the predicted spectrogram
logdir (str) – directory to save the image file to if save_to_tensorboard is disabled.
train_step (int) – current training step
stop_token_target (np.array) – np.array of size [time, 1] containing the stop token target.
number (int) – Current sample number (used if evaluating more than 1 sample from a batch)
append (str) – Optional string to append to the file name, e.g. train, eval, infer
save_to_tensorboard (bool) – If False, the created image is saved to the logdir as a png file. If True, the function returns a tf.Summary object containing the image and will be logged to the current tensorboard file.

Returns:

tf.Summary or None

models.text2speech.save_audio(magnitudes, logdir, step, sampling_rate, n_fft=1024, mode='train', number=0, save_format='tensorboard', power=1.5, gl_iters=50, verbose=True, max_normalization=False)[source]¶

Helper function to create a wav file to be logged to disk or a tf.Summary to be logged to tensorboard.

Parameters:

magnitudes (np.array) – np.array of size [time, n_fft/2 + 1] containing the energy spectrogram.
logdir (str) – directory to save the wav file to if save_format is “disk”.
step (int) – current training step
n_fft (int) – number of filters for fft and ifft.
sampling_rate (int) – sampling rate in Hz of the audio to be saved.
number (int) – Current sample number (used if evaluating more than 1 sample from a batch)
mode (str) – Optional string to append to the file name, e.g. train, eval, infer
save_format – save_audio can return the np.array containing the generated sound, log the wav file to disk, or return a TensorBoard summary object. These behaviors are selected by passing save_format as “np.array”, “disk”, or “tensorboard”, respectively.

Returns:

tf.Summary or None
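
The “disk” behavior amounts to writing the generated waveform as a wav file. A minimal standard-library sketch of that step follows; it is not the code used by save_audio(). `write_wav`, the file name, the 16-bit mono format, and the sine tone used in place of real vocoder output are all assumptions for illustration.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sampling_rate):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = struct.pack(
        "<{}h".format(len(clipped)),
        *(int(round(s * 32767)) for s in clipped))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # two bytes per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(frames)


sampling_rate = 22050
# Stand-in signal: 0.1 s of a 440 Hz tone instead of vocoder output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sampling_rate)
        for n in range(sampling_rate // 10)]
wav_path = os.path.join(tempfile.mkdtemp(), "toy_sample.wav")
write_wav(wav_path, tone, sampling_rate)
```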

text2speech_centaur¶

class models.text2speech_centaur.Text2SpeechCentaur(params, mode='train', hvd=None)[source]¶

Bases: models.text2speech.Text2Speech

Text-to-speech model for Centaur.

get_alignments(attention_mask)[source]¶

Get attention alignment plots.

Parameters:	attention_mask – attention alignment.
Returns:	Specs and titles to plot.

text2speech_tacotron¶

class models.text2speech_tacotron.Text2SpeechTacotron(params, mode='train', hvd=None)[source]¶

Bases: models.text2speech.Text2Speech

Text-to-speech model for Tacotron.

get_alignments(attention_mask)[source]¶

Get attention alignment plots.

Parameters:	attention_mask – attention alignment.
Returns:	Specs and titles to plot.

text2speech_wavenet¶

class models.text2speech_wavenet.Text2SpeechWavenet(params, mode='train', hvd=None)[source]¶

Bases: models.encoder_decoder.EncoderDecoderModel

evaluate(input_values, output_values)[source]¶

This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods are called every eval_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results are collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes, but overriding it is a convenient way to add evaluation functionality.

Parameters:

input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type:	list

finalize_evaluation(results_per_batch, training_step=None)[source]¶

This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods are called every eval_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results are collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes, but overriding them is a convenient way to add evaluation functionality.

Parameters:

results_per_batch (list) – aggregation of values returned from all calls to the `self.evaluate()` method (the number of calls equals the number of evaluation batches).
training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

finalize_inference(results_per_batch, output_file)[source]¶

This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method logs the corresponding input-output pairs to output_file.

Parameters:

results_per_batch (list) – aggregation of values returned from all calls to the `self.infer()` method (the number of calls equals the number of inference batches).
output_file (str) – name of the output file that inference results should be saved to.

static get_required_params()[source]¶

Static method with description of required parameters.

Returns:	Dictionary containing all the parameters that have to be included in the `params` argument of the class `__init__()` method.
Return type:	dict

infer(input_values, output_values)[source]¶

This method is analogous to self.evaluate(), but used in conjunction with self.finalize_inference() to perform inference.

Parameters:

input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in `self.finalize_inference()` method).
Return type:	list

maybe_print_logs(input_values, output_values, training_step)[source]¶

This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. It is called every print_samples_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes, but overriding it is a convenient way to add extra printing functionality.

Parameters:

input_values – evaluation of `self.get_data_layer(0).input_tensors`, that is, input tensors for one batch on the first GPU.
output_values – evaluation of `self.get_output_tensors(0)`, that is, output tensors for one batch on the first GPU.
training_step (int) – Current training step.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict

models.text2speech_wavenet.save_audio(signal, logdir, step, sampling_rate, mode)[source]¶

image2label¶

class models.image2label.Image2Label(params, mode='train', hvd=None)[source]¶

Bases: models.encoder_decoder.EncoderDecoderModel

_get_num_objects_per_step(worker_id=0)[source]¶

Returns the number of images in the current batch, i.e. the batch size.

evaluate(input_values, output_values)[source]¶

This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods are called every eval_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results are collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes, but overriding it is a convenient way to add evaluation functionality.

Parameters:

input_values – evaluation of `self.get_data_layer().input_tensors` concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
output_values – evaluation of `self.get_output_tensors()` concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns:	all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type:	list

finalize_evaluation(results_per_batch, training_step=None)[source]¶

This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute BLEU score. Look at the corresponding derived classes for examples of this. These methods are called every eval_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results are collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes, but overriding them is a convenient way to add evaluation functionality.

Parameters:

results_per_batch (list) – aggregation of values returned from all calls to the `self.evaluate()` method (the number of calls equals the number of evaluation batches).
training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict
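
For an image classifier, the same two-step contract might be used to compute top-1 accuracy, as in the sketch below. `ToyImage2Label` is a hypothetical stand-in rather than the Image2Label implementation; the layout of `input_values` (integer labels) and `output_values` (per-class score lists) and the `eval_top1_accuracy` key are assumptions.

```python
class ToyImage2Label:
    """Hypothetical stand-in for a classifier model; not OpenSeq2Seq code."""

    def evaluate(self, input_values, output_values):
        # Assumed layout: integer labels in input_values and per-class
        # score lists in output_values.
        correct = 0
        for label, scores in zip(input_values, output_values):
            predicted = max(range(len(scores)), key=scores.__getitem__)
            correct += int(predicted == label)
        return [correct, len(input_values)]

    def finalize_evaluation(self, results_per_batch, training_step=None):
        # Aggregate counts across batches before dividing.
        correct = sum(batch[0] for batch in results_per_batch)
        total = sum(batch[1] for batch in results_per_batch)
        return {"eval_top1_accuracy": correct / max(total, 1)}


classifier = ToyImage2Label()
batch_results = [
    classifier.evaluate([0, 2], [[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]]),
    classifier.evaluate([1], [[0.2, 0.6, 0.2]]),
]
accuracy_logs = classifier.finalize_evaluation(batch_results)
```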

maybe_print_logs(input_values, output_values, training_step)[source]¶

This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. It is called every print_samples_steps (config parameter) iterations, and input/output values are populated automatically by calling sess.run on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes, but overriding it is a convenient way to add extra printing functionality.

Parameters:

input_values – evaluation of `self.get_data_layer(0).input_tensors`, that is, input tensors for one batch on the first GPU.
output_values – evaluation of `self.get_output_tensors(0)`, that is, output tensors for one batch on the first GPU.
training_step (int) – Current training step.
Returns:	dictionary with values that need to be logged to TensorBoard (can be empty).
Return type:	dict