models
All base models available in OpenSeq2Seq.
model
class models.model.Model(params, mode='train', hvd=None)
Bases: object
Abstract class that any model should inherit from. It automatically enables multi-GPU (or Horovod) computation, has mixed precision support, logs training summaries, etc.
__init__(params, mode='train', hvd=None)
Model constructor. The TensorFlow graph should not be created here, but rather in the self.compile() method.
Parameters:
- params (dict) – parameters describing the model. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
- mode (string, optional) – “train”, “eval” or “infer”. If mode is “train”, all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only the model and loss will be built. If mode is “infer”, only the model will be built.
- hvd (optional) – if Horovod is used, this should be the horovod.tensorflow module. If Horovod is not used, it should be None.
Config parameters:
- random_seed (int) – random seed to use.
- use_horovod (bool) – whether to use Horovod for distributed execution.
- num_gpus (int) – number of GPUs to use. This parameter cannot be used if gpu_ids is specified. When use_horovod is True, this parameter is ignored.
- gpu_ids (list of ints) – GPU ids to use. This parameter cannot be used if num_gpus is specified. When use_horovod is True, this parameter is ignored.
- batch_size_per_gpu (int) – batch size to use for each GPU.
- eval_batch_size_per_gpu (int) – batch size to use for each GPU during inference. This is for when training and inference have different computation and memory requirements, such as when training uses sampled softmax and inference uses full softmax. If not specified, it is set to batch_size_per_gpu.
- restore_best_checkpoint (bool) – if set to True, the model will load the best checkpoint instead of the latest checkpoint when doing evaluation and inference. The best checkpoint is chosen based on evaluation results, so it is only available when the model is trained under train_eval mode. Defaults to False.
- load_model (str) – location of the pretrained model for transfer learning. If specified, during training the system will look into the checkpoint in this folder and restore all variables whose names and shapes match a variable in the new model.
- num_epochs (int) – number of epochs to run training for. This parameter cannot be used if max_steps is specified.
- max_steps (int) – number of steps to run training for. This parameter cannot be used if num_epochs is specified.
- save_summaries_steps (int or None) – how often to save summaries. Setting it to None disables summaries saving.
- print_loss_steps (int or None) – how often to print loss during training. Setting it to None disables loss printing.
- print_samples_steps (int or None) – how often to print training samples (input sequences, correct answers and model predictions). Setting it to None disables samples printing.
- print_bench_info_steps (int or None) – how often to print training benchmarking information (average number of objects processed per step). Setting it to None disables intermediate benchmarking printing, but the average information across the whole training will always be printed after the last iteration.
- save_checkpoint_steps (int or None) – how often to save model checkpoints. Setting it to None disables checkpoint saving.
- num_checkpoints (int) – number of last checkpoints to keep.
- eval_steps (int) – how often to run evaluation during training. This parameter is only checked if the --mode argument of run.py is “train_eval”. If no evaluation is needed, you should use “train” mode.
- logdir (string) – path to the log directory where all checkpoints and summaries will be saved.
- data_layer (any class derived from DataLayer) – data layer class to use.
- data_layer_params (dict) – dictionary with data layer configuration. For the complete list of possible parameters, see the corresponding class docs.
- optimizer (string or TensorFlow optimizer class) – optimizer to use for training. Could be either “Adam”, “Adagrad”, “Ftrl”, “Momentum”, “RMSProp”, “SGD” or any valid TensorFlow optimizer class.
- optimizer_params (dict) – dictionary that will be passed to the optimizer __init__ method.
- initializer – any valid TensorFlow initializer.
- initializer_params (dict) – dictionary that will be passed to the initializer __init__ method.
- freeze_variables_regex (str or None) – if zero or more characters at the beginning of the name of a trainable variable match this pattern, then this variable will be frozen during training. Setting it to None disables freezing of variables.
- regularizer – any valid TensorFlow regularizer.
- regularizer_params (dict) – dictionary that will be passed to the regularizer __init__ method.
- dtype – model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details, see the mixed precision training section in the docs.
- lr_policy – any valid learning rate policy function. For examples, see the optimizers.lr_policies module.
- lr_policy_params (dict) – dictionary containing lr_policy parameters.
- max_grad_norm (float) – maximum value of gradient norm. Clipping will be performed if some gradients exceed this value (this is checked for each variable independently).
- loss_scaling – could be float or string. If float, static loss scaling is applied. If string, the corresponding automatic loss scaling algorithm is used. Must be one of ‘Backoff’ or ‘LogMax’ (case insensitive). Only used when dtype=“mixed”. For details, see the mixed precision training section in the docs.
- loss_scaling_params (dict) – dictionary containing loss scaling parameters.
- summaries (list) – which summaries to log. Could contain “learning_rate”, “gradients”, “gradient_norm”, “global_gradient_norm”, “variables”, “variable_norm”, “loss_scale”.
- iter_size (int) – use this parameter to emulate large batches. The gradients will be accumulated for iter_size steps before applying an update.
- larc_params – dictionary with parameters for the LARC (or LARS) optimization algorithms. Can contain the following parameters:
  - larc_mode – could be either “scale” (LARS) or “clip” (LARC). Note that it works in addition to any other optimization algorithm, since we treat it as adaptive gradient clipping and learning rate adjustment.
  - larc_eta (float) – LARC or LARS scaling parameter.
  - min_update (float) – minimal value of the LARC (LARS) update.
  - epsilon (float) – small number added to the gradient norm in the denominator for numerical stability.
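As an illustration of how these config parameters fit together, here is a minimal sketch of a partial config dictionary. All values are arbitrary placeholders rather than recommended defaults, required entries such as data_layer are left out, and the check_exclusive helper is hypothetical (it is not part of OpenSeq2Seq). It only mirrors the documented rule that num_gpus cannot be combined with gpu_ids, nor num_epochs with max_steps.

```python
# Illustrative values only; this is not a complete or recommended config.
base_params = {
    "random_seed": 0,
    "use_horovod": False,
    "num_gpus": 2,                 # do not also set "gpu_ids"
    "batch_size_per_gpu": 32,
    "num_epochs": 10,              # do not also set "max_steps"
    "save_summaries_steps": 100,
    "print_loss_steps": 10,
    "print_samples_steps": None,   # None disables sample printing
    "save_checkpoint_steps": 1000,
    "eval_steps": 500,             # only checked in "train_eval" mode
    "logdir": "experiments/example_run",
    "optimizer": "Adam",
    "optimizer_params": {},
    "dtype": "mixed",
    "loss_scaling": "Backoff",     # only used when dtype is "mixed"
    "iter_size": 1,
}

# Hypothetical helper: pairs of keys the documentation says cannot be combined.
MUTUALLY_EXCLUSIVE = [("num_gpus", "gpu_ids"), ("num_epochs", "max_steps")]


def check_exclusive(params):
    """Return every mutually exclusive pair for which both keys are present."""
    return [pair for pair in MUTUALLY_EXCLUSIVE
            if pair[0] in params and pair[1] in params]


conflicts = check_exclusive(base_params)
```

An empty conflicts list means the dictionary respects both exclusivity rules; the library performs its own validation of the params it receives.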
_build_forward_pass_graph(input_tensors, gpu_id=0)
Abstract method. Should create the graph of the forward pass of the model.
Parameters:
- input_tensors – input_tensors defined by the data_layer class.
- gpu_id (int, optional) – id of the GPU where the current copy of the model is constructed. For Horovod this is always zero.
Returns: tuple containing the loss tensor and a list of output tensors.
The loss tensor will be automatically provided to the optimizer and the corresponding train_op will be created.
The output tensors are stored in the _outputs attribute and can be accessed by calling the get_output_tensors() function. For example, this happens inside utils.hooks.RunEvaluationHook to fetch output values for evaluation.
Both loss and outputs can be None when the corresponding part of the graph is not built.
Return type: tuple
_get_num_objects_per_step(worker_id=0)
Define this method if you need benchmarking functionality. For example, for translation models this method should return the number of tokens in the current batch; for image recognition models it should return the number of images in the current batch.
Parameters: worker_id (int) – id of the worker to get data layer from (not used for Horovod).
Returns: tf.Tensor with the number of objects in the batch.
build_trt_forward_pass_graph(input_tensors, gpu_id=0, checkpoint=None)
Wrapper around _build_forward_pass_graph that converts the graph using TF-TRT.
clip_last_batch(last_batch, true_size)
This method performs last batch clipping. It is used in cases when the dataset is not divisible by the batch size and the model does not support dynamic batch sizes. In those cases, the last batch will contain some data from the “next epoch”, and this method can be used to remove that data. This method works for both dense and sparse tensors. In most cases you will not need to override this method.
Parameters:
- last_batch (list) – list with elements that could be either np.array or tf.SparseTensorValue containing data for the last batch. The assumption is that the first axis of all data tensors will correspond to the current batch size.
- true_size (int) – true size that the last batch should be cut to.
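The dense case of last batch clipping can be sketched as follows. This is only an illustration under simplifying assumptions, not the library's implementation: clip_dense_batch is a hypothetical name, plain Python lists stand in for np.array elements (both are sliced along the first axis in the same way), and sparse values are not handled.

```python
def clip_dense_batch(last_batch, true_size):
    """Keep only the first true_size entries along the first axis of each element."""
    return [element[:true_size] for element in last_batch]


# A batch of 4 examples in which only the first 3 belong to the current epoch.
inputs = [[1, 2], [3, 4], [5, 6], [7, 8]]
lengths = [2, 2, 2, 2]
clipped = clip_dense_batch([inputs, lengths], true_size=3)
```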
evaluate(input_values, output_values)
This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute the BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overriding this function can be a useful way to add it.
Parameters:
- input_values – evaluation of self.get_data_layer().input_tensors concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
- output_values – evaluation of self.get_output_tensors() concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for evaluation finalization (e.g. accuracy on the current batch, which will then be averaged in the finalization method).
Return type: list
finalize_evaluation(results_per_batch, training_step=None)
This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute the BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overriding these methods can be a useful way to add it.
Parameters:
- results_per_batch (list) – aggregation of values returned from all calls to the self.evaluate() method (the number of calls will be equal to the number of evaluation batches).
- training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
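The interplay between evaluate() and finalize_evaluation() can be sketched with plain Python functions. This is a minimal illustration, not code from any OpenSeq2Seq model: the real methods live on the model class, the layout of input_values is determined by the data layer (the "targets" key here is an assumption), and the "eval_accuracy" name is made up for the example.

```python
def evaluate(input_values, output_values):
    """Per-batch step: return [number of correct predictions, batch size]."""
    targets = input_values["targets"]
    correct = sum(1 for t, p in zip(targets, output_values) if t == p)
    return [correct, len(targets)]


def finalize_evaluation(results_per_batch, training_step=None):
    """Combine the per-batch results into a dictionary of values to log."""
    total_correct = sum(result[0] for result in results_per_batch)
    total_seen = sum(result[1] for result in results_per_batch)
    accuracy = total_correct / total_seen if total_seen else 0.0
    return {"eval_accuracy": accuracy}


# Two evaluation batches given as (input_values, output_values) pairs.
batches = [
    ({"targets": [1, 0, 1]}, [1, 1, 1]),
    ({"targets": [0, 0]}, [0, 0]),
]
results = [evaluate(inputs, outputs) for inputs, outputs in batches]
metrics = finalize_evaluation(results, training_step=100)
```

Returning raw counts from each batch, rather than a per-batch accuracy, keeps the final average correct when batches differ in size.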
finalize_inference(results_per_batch, output_file)
This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pairs to the output_file.
Parameters:
- results_per_batch (list) – aggregation of values returned from all calls to the self.infer() method (the number of calls will be equal to the number of inference batches).
- output_file (str) – name of the output file that inference results should be saved to.
get_data_layer(worker_id=0)
Returns the model data layer. When using Horovod, the worker_id parameter is ignored. When using the tower-based multi-GPU approach, worker_id can be used to select the data layer for the corresponding tower/GPU.
Parameters: worker_id (int) – id of the worker to get data layer from (not used for Horovod).
Returns: model data layer.
static get_optional_params()
Static method with description of optional parameters.
Returns: dictionary containing all the parameters that can be included in the params parameter of the class __init__() method.
Return type: dict
get_output_tensors(worker_id=0)
Returns output tensors generated by _build_forward_pass_graph(). When using Horovod, the worker_id parameter is ignored. When using the tower-based multi-GPU approach, worker_id can be used to select tensors for the corresponding tower/GPU.
Parameters: worker_id (int) – id of the worker to get tensors from (not used for Horovod).
Returns: output tensors.
static get_required_params()
Static method with description of required parameters.
Returns: dictionary containing all the parameters that have to be included in the params parameter of the class __init__() method.
Return type: dict
hvd
horovod.tensorflow module
infer(input_values, output_values)
This method is analogous to self.evaluate(), but is used in conjunction with self.finalize_inference() to perform inference.
Parameters:
- input_values – evaluation of self.get_data_layer().input_tensors concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
- output_values – evaluation of self.get_output_tensors() concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for inference finalization (e.g. this method can return the final generated sequences for each batch, which will then be saved to file in the self.finalize_inference() method).
Return type: list
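A minimal sketch of the infer() and finalize_inference() pairing is shown below. It is written as standalone functions under stated assumptions: inputs and predictions are already decoded strings, and the tab-separated output format and file name are choices made for this example, not something the library prescribes.

```python
import os
import tempfile


def infer(input_values, output_values):
    """Per-batch step: pair every input with its prediction."""
    return list(zip(input_values, output_values))


def finalize_inference(results_per_batch, output_file):
    """Write one 'input<TAB>prediction' line per example to output_file."""
    with open(output_file, "w", encoding="utf-8") as f:
        for batch in results_per_batch:
            for source, prediction in batch:
                f.write(f"{source}\t{prediction}\n")


# Two inference batches with already decoded strings.
results = [infer(["a b", "c"], ["A B", "C"]), infer(["d"], ["D"])]
output_path = os.path.join(tempfile.mkdtemp(), "predictions.tsv")
finalize_inference(results, output_path)

with open(output_path, encoding="utf-8") as f:
    written = f.read().splitlines()
```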
last_step
Number of steps the training should be run for.
maybe_print_logs(input_values, output_values, training_step)
This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every print_samples_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overriding this method can be a useful way to add it.
Parameters:
- input_values – evaluation of self.get_data_layer(0).input_tensors, that is, input tensors for one batch on the first GPU.
- output_values – evaluation of self.get_output_tensors(0), that is, output tensors for one batch on the first GPU.
- training_step (int) – current training step.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
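What an override of maybe_print_logs() might do can be sketched as a standalone function. The layout of input_values (a list holding sources and then targets) and the "sample_exact_match" key are assumptions made for this example; a real model would first decode the tensors it receives.

```python
def maybe_print_logs(input_values, output_values, training_step):
    """Print the first sample of the batch and return values to log."""
    sources, targets = input_values[0], input_values[1]
    print(f"step {training_step}")
    print(f"  input:      {sources[0]}")
    print(f"  target:     {targets[0]}")
    print(f"  prediction: {output_values[0]}")
    # The returned dictionary may also be empty if there is nothing to log.
    match = 1.0 if output_values[0] == targets[0] else 0.0
    return {"sample_exact_match": match}


logged = maybe_print_logs(
    [["hello world"], ["bonjour monde"]],
    ["bonjour monde"],
    training_step=50,
)
```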
mode
Mode the model is executed in (“train”, “eval” or “infer”).
num_gpus
Number of GPUs the model will be run on. For Horovod this is always 1, and the actual number of GPUs is controlled by Open-MPI parameters.
on_horovod
Whether the model is run on Horovod or not.
params
Parameters used to construct the model (dictionary).
steps_in_epoch
Number of steps in an epoch. This parameter is only populated if num_epochs was specified in the config (otherwise it is None). It is used in training hooks to correctly print the epoch number.
encoder_decoder
class models.encoder_decoder.EncoderDecoderModel(params, mode='train', hvd=None)
Bases: open_seq2seq.models.model.Model
Standard encoder-decoder class with one encoder and one decoder. “encoder-decoder-loss” models should inherit from this class.
__init__(params, mode='train', hvd=None)
Encoder-decoder model constructor. Note that the TensorFlow graph should not be created here. All graph creation logic happens inside the self._build_forward_pass_graph() method.
Parameters:
- params (dict) – parameters describing the model. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
- mode (string, optional) – “train”, “eval” or “infer”. If mode is “train”, all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only the model and loss will be built. If mode is “infer”, only the model will be built.
- hvd (optional) – if Horovod is used, this should be the horovod.tensorflow module. If Horovod is not used, it should be None.
Config parameters:
- encoder (any class derived from Encoder) – encoder class to use.
- encoder_params (dict) – dictionary with encoder configuration. For the complete list of possible parameters, see the corresponding class docs.
- decoder (any class derived from Decoder) – decoder class to use.
- decoder_params (dict) – dictionary with decoder configuration. For the complete list of possible parameters, see the corresponding class docs.
- loss (any class derived from Loss) – loss class to use.
- loss_params (dict) – dictionary with loss configuration. For the complete list of possible parameters, see the corresponding class docs.
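The shape of these config entries can be sketched as below. ExampleEncoder, ExampleDecoder and ExampleLoss are placeholder classes invented for this illustration (real configs would reference classes derived from Encoder, Decoder and Loss), and the num_layers option is hypothetical.

```python
# Placeholder classes standing in for real Encoder, Decoder and Loss subclasses.
class ExampleEncoder:
    pass


class ExampleDecoder:
    pass


class ExampleLoss:
    pass


encoder_decoder_params = {
    "encoder": ExampleEncoder,
    "encoder_params": {"num_layers": 2},   # hypothetical option
    "decoder": ExampleDecoder,
    "decoder_params": {"num_layers": 2},   # hypothetical option
    "loss": ExampleLoss,
    "loss_params": {},
}

# Each component class is paired with its own "<name>_params" dictionary.
components = [key for key in encoder_decoder_params if not key.endswith("_params")]
```

Note that the config holds the classes themselves, not instances; the model constructs the components from the class and its params dictionary.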
_build_forward_pass_graph(input_tensors, gpu_id=0)
The TensorFlow graph for the encoder-decoder-loss model is created here. This function connects the encoder, decoder and loss together. As input for the encoder, it will specify source tensors (as returned from the data layer). As input for the decoder, it will specify target tensors as well as all output returned from the encoder. For the loss, it will also specify target tensors and all output returned from the decoder. Note that the loss will only be built for mode == “train” or “eval”.
Parameters:
- input_tensors (dict) – input_tensors dictionary that has to contain the source_tensors key with the list of all source tensors, and target_tensors with the list of all target tensors. Note that target_tensors only needs to be provided if mode is “train” or “eval”.
- gpu_id (int, optional) – id of the GPU where the current copy of the model is constructed. For Horovod this is always zero.
Returns: tuple containing the loss tensor as returned from loss.compute_loss() and a list of output tensors, which is taken from decoder.decode()['outputs']. When mode == 'infer', loss will be None.
Return type: tuple
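The wiring described above can be sketched without TensorFlow. This is an illustration of the data flow only: build_forward_pass and the toy components are hypothetical stand-ins operating on plain Python numbers, whereas the real method works on tensors and calls the encoder, decoder and loss objects of the model.

```python
def build_forward_pass(encoder, decoder, loss_fn, input_tensors, mode):
    """Connect encoder, decoder and loss; return (loss, outputs)."""
    source = input_tensors["source_tensors"]
    target = input_tensors.get("target_tensors")  # absent in "infer" mode

    encoder_output = encoder(source)
    decoder_output = decoder(encoder_output, target)  # dict with an "outputs" key

    loss = None
    if mode in ("train", "eval"):  # the loss is only built in these modes
        loss = loss_fn(decoder_output, target)
    return loss, decoder_output["outputs"]


# Toy components so that the sketch can be executed.
def toy_encoder(source):
    return [sum(sequence) for sequence in source]


def toy_decoder(encoder_output, target):
    return {"outputs": [value * 2 for value in encoder_output]}


def toy_loss(decoder_output, target):
    return sum(abs(o - t) for o, t in zip(decoder_output["outputs"], target))


train_inputs = {"source_tensors": [[1, 2], [3]], "target_tensors": [6, 5]}
train_loss, train_outputs = build_forward_pass(
    toy_encoder, toy_decoder, toy_loss, train_inputs, "train")

infer_inputs = {"source_tensors": [[1, 2], [3]]}
infer_loss, infer_outputs = build_forward_pass(
    toy_encoder, toy_decoder, toy_loss, infer_inputs, "infer")
```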
_create_decoder()
This function should return a decoder instance. Override this function if additional parameters need to be specified for the decoder, besides those provided in the config.
Returns: instance of a class derived from decoders.decoder.Decoder.
_create_encoder()
This function should return an encoder instance. Override this function if additional parameters need to be specified for the encoder, besides those provided in the config.
Returns: instance of a class derived from encoders.encoder.Encoder.
_create_loss()
This function should return a loss instance. Override this function if additional parameters need to be specified for the loss, besides those provided in the config.
Returns: instance of a class derived from losses.loss.Loss.
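The override pattern for these factory methods can be sketched as follows. BaseModel and ToyDecoder are simplified stand-ins written for this example, not the OpenSeq2Seq classes, and the vocab_size parameter is hypothetical; the point is only that a subclass can add values to the decoder configuration that the user did not put there.

```python
class ToyDecoder:
    """Stand-in decoder that simply stores its configuration."""

    def __init__(self, params):
        self.params = params


class BaseModel:
    """Simplified stand-in for a model that builds its decoder from the config."""

    def __init__(self, params):
        self.params = params
        self.decoder = self._create_decoder()

    def _create_decoder(self):
        return self.params["decoder"](self.params["decoder_params"])


class ModelWithVocab(BaseModel):
    def _create_decoder(self):
        # Copy the config so that the user's dictionary is not modified.
        decoder_params = dict(self.params["decoder_params"])
        # Hypothetical extra parameter that is not part of the decoder config.
        decoder_params["vocab_size"] = self.params["vocab_size"]
        return self.params["decoder"](decoder_params)


config = {
    "decoder": ToyDecoder,
    "decoder_params": {"num_layers": 2},
    "vocab_size": 100,
}
model = ModelWithVocab(config)
```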
decoder
Model decoder.
encoder
Model encoder.
static get_optional_params()
Static method with description of optional parameters.
Returns: dictionary containing all the parameters that can be included in the params parameter of the class __init__() method.
Return type: dict
static get_required_params()
Static method with description of required parameters.
Returns: dictionary containing all the parameters that have to be included in the params parameter of the class __init__() method.
Return type: dict
loss_computator
Model loss computator.
speech2text
class models.speech2text.Speech2Text(params, mode='train', hvd=None)
Bases: models.encoder_decoder.EncoderDecoderModel
_build_forward_pass_graph(input_tensors, gpu_id=0)
The TensorFlow graph for the speech2text model is created here. This function connects the encoder, decoder and loss together. As input for the encoder, it will specify source tensors (as returned from the data layer). As input for the decoder, it will specify target tensors as well as all output returned from the encoder. For the loss, it will also specify target tensors and all output returned from the decoder. Note that the loss will only be built for mode == “train” or “eval”.
Parameters:
- input_tensors (dict) – input_tensors dictionary that has to contain the source_tensors key with the list of all source tensors, and target_tensors with the list of all target tensors. Note that target_tensors only needs to be provided if mode is “train” or “eval”.
- gpu_id (int, optional) – id of the GPU where the current copy of the model is constructed. For Horovod this is always zero.
Returns: tuple containing the loss tensor as returned from loss.compute_loss() and a list of output tensors, which is taken from decoder.decode()['outputs']. When mode == 'infer', loss will be None.
Return type: tuple
evaluate(input_values, output_values)
This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute the BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overriding this function can be a useful way to add it.
Parameters:
- input_values – evaluation of self.get_data_layer().input_tensors concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
- output_values – evaluation of self.get_output_tensors() concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for evaluation finalization (e.g. accuracy on the current batch, which will then be averaged in the finalization method).
Return type: list
finalize_evaluation(results_per_batch, training_step=None)
This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute the BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overriding these methods can be a useful way to add it.
Parameters:
- results_per_batch (list) – aggregation of values returned from all calls to the self.evaluate() method (the number of calls will be equal to the number of evaluation batches).
- training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
finalize_inference(results_per_batch, output_file)
This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pairs to the output_file.
Parameters:
- results_per_batch (list) – aggregation of values returned from all calls to the self.infer() method (the number of calls will be equal to the number of inference batches).
- output_file (str) – name of the output file that inference results should be saved to.
infer(input_values, output_values)
This method is analogous to self.evaluate(), but is used in conjunction with self.finalize_inference() to perform inference.
Parameters:
- input_values – evaluation of self.get_data_layer().input_tensors concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
- output_values – evaluation of self.get_output_tensors() concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for inference finalization (e.g. this method can return the final generated sequences for each batch, which will then be saved to file in the self.finalize_inference() method).
Return type: list
maybe_print_logs(input_values, output_values, training_step)
This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every print_samples_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overriding this method can be a useful way to add it.
Parameters:
- input_values – evaluation of self.get_data_layer(0).input_tensors, that is, input tensors for one batch on the first GPU.
- output_values – evaluation of self.get_output_tensors(0), that is, output tensors for one batch on the first GPU.
- training_step (int) – current training step.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
models.speech2text.levenshtein(a, b)
Calculates the Levenshtein distance between a and b. The code was copied from: http://hetland.org/coding/python/levenshtein.py
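For reference, the Levenshtein distance can be computed with the standard two-row dynamic programming scheme. The sketch below is an independent illustration, not the code copied from the URL above. The word_error_rate helper is a hypothetical addition showing how such a distance over word lists could yield the word-error-rate mentioned for speech-to-text models; it assumes a non-empty reference.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (unit-cost insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Hypothetical helper: word-level distance divided by the reference length."""
    reference_words = reference.split()
    return levenshtein(reference_words, hypothesis.split()) / len(reference_words)


distance = levenshtein("kitten", "sitting")
wer = word_error_rate("the cat sat", "the cat sat down")
```

Because the function only compares items for equality, it works on strings (character-level distance) and on lists of words alike.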
text2text
class models.text2text.Text2Text(params, mode='train', hvd=None)
Bases: models.encoder_decoder.EncoderDecoderModel
An example class implementing a classical text-to-text model.
_get_num_objects_per_step(worker_id=0)
Returns the number of source tokens plus the number of target tokens in the batch.
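The quantity being reported can be illustrated in plain Python. This is a sketch under the assumption that per-example sequence lengths are available for both source and target; the actual method returns a tf.Tensor built from the data layer's tensors, and num_objects_per_step is a hypothetical name.

```python
def num_objects_per_step(source_lengths, target_lengths):
    """Number of source tokens plus number of target tokens in one batch."""
    return sum(source_lengths) + sum(target_lengths)


# A batch of three sentence pairs, described by their token counts.
tokens = num_objects_per_step([5, 7, 3], [6, 8, 4])
```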
evaluate(input_values, output_values)
This method can be used in conjunction with self.finalize_evaluation() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute the BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that this function is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overriding this function can be a useful way to add it.
Parameters:
- input_values – evaluation of self.get_data_layer().input_tensors concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
- output_values – evaluation of self.get_output_tensors() concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for evaluation finalization (e.g. accuracy on the current batch, which will then be averaged in the finalization method).
Return type: list
finalize_evaluation(results_per_batch, training_step=None)
This method can be used in conjunction with self.evaluate() to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate word-error-rate on the validation data. For text-to-text models, these methods can compute the BLEU score. Look at the corresponding derived classes for examples of this. These methods will be called every eval_steps (config parameter) iterations, and input/output values will be populated automatically by calling sess.run on the corresponding tensors (using the evaluation model). The self.evaluate() method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation() for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overriding these methods can be a useful way to add it.
Parameters:
- results_per_batch (list) – aggregation of values returned from all calls to the self.evaluate() method (the number of calls will be equal to the number of evaluation batches).
- training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
finalize_inference(results_per_batch, output_file)
This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pairs to the output_file.
Parameters:
- results_per_batch (list) – aggregation of values returned from all calls to the self.infer() method (the number of calls will be equal to the number of inference batches).
- output_file (str) – name of the output file that inference results should be saved to.
infer(input_values, output_values)
This method is analogous to self.evaluate(), but is used in conjunction with self.finalize_inference() to perform inference.
Parameters:
- input_values – evaluation of self.get_data_layer().input_tensors concatenated across all workers. That is, input tensors for one batch combined from all GPUs.
- output_values – evaluation of self.get_output_tensors() concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for inference finalization (e.g. this method can return the final generated sequences for each batch, which will then be saved to file in the self.finalize_inference() method).
Return type: list
-
maybe_print_logs
(input_values, output_values, training_step)[source]¶ This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every
print_samples_steps
(config parameter) iterations and input/output values will be populated automatically by callingsess.run
on corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overwriting this method can be a useful way to add it.Parameters: - input_values – evaluation of
self.get_data_layer(0).input_tensors
, that is, input tensors for one batch on the first GPU. - output_values – evaluation of
self.get_output_tensors(0)
, that is, output tensors for one batch on the first GPU. - training_step (int) – Current training step.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
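As a rough illustration of the contract above, an override might print one sample and return a scalar for TensorBoard. This is a sketch only: ToyModel is a hypothetical stand-in, and the assumed layout (targets in input_values[1], predictions in output_values[0]) is not taken from the library.

```python
class ToyModel:
    """Hypothetical stand-in for a model; not an OpenSeq2Seq class."""

    def maybe_print_logs(self, input_values, output_values, training_step):
        # Assumed layout: first-GPU targets in input_values[1] and
        # first-GPU predictions in output_values[0].
        targets = input_values[1]
        predictions = output_values[0]
        print(f"*** step {training_step} ***")
        print(f"sample target:     {targets[0]}")
        print(f"sample prediction: {predictions[0]}")
        correct = sum(t == p for t, p in zip(targets, predictions))
        # Values in the returned dict are the ones meant for TensorBoard.
        return {"Train batch accuracy": correct / len(targets)}


logs = ToyModel().maybe_print_logs(
    input_values=(["img0", "img1", "img2", "img3"], [3, 1, 4, 1]),
    output_values=([3, 1, 4, 2],),
    training_step=100,
)
```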
-
text2speech¶
-
class
models.text2speech.
Text2Speech
(params, mode='train', hvd=None)[source]¶ Bases:
models.encoder_decoder.EncoderDecoderModel
Text-to-speech model.
-
evaluate
(input_values, output_values)[source]¶ This method can be used in conjunction with
self.finalize_evaluation()
to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate the word error rate on the validation data. For text-to-text models, they can compute the BLEU score. See the corresponding derived classes for examples. These methods will be called every eval_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors (using the evaluation model). The self.evaluate()
method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation()
for finalization. Note that this method is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overriding this method can be a useful way to add it.
Parameters: - input_values – evaluation of
self.get_data_layer().input_tensors
concatenated across all workers. That is, input tensors for one batch combined from all GPUs. - output_values – evaluation of
self.get_output_tensors()
concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type: list
-
finalize_evaluation
(results_per_batch, training_step=None, samples_count=1)[source]¶ This method can be used in conjunction with
self.evaluate()
to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate the word error rate on the validation data. For text-to-text models, they can compute the BLEU score. See the corresponding derived classes for examples. These methods will be called every eval_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors (using the evaluation model). The self.evaluate()
method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation()
for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overriding these methods can be a useful way to add it.
Parameters: - results_per_batch (list) – aggregation of values returned from all calls
to
self.evaluate()
method (number of calls will be equal to number of evaluation batches). - training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
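The evaluate() / finalize_evaluation() split can be sketched as below. This is an illustrative toy rather than the Text2Speech implementation: ToyModel, the mean-absolute-error metric, and the assumed layout of input_values and output_values are all hypothetical.

```python
class ToyModel:
    """Hypothetical stand-in for a model; not an OpenSeq2Seq class."""

    def evaluate(self, input_values, output_values):
        # Assumed layout: targets in input_values[1], predictions in
        # output_values[0].
        targets = input_values[1]
        predictions = output_values[0]
        abs_error = sum(abs(t - p) for t, p in zip(targets, predictions))
        # Return the sum and the count (not the mean) so that batches of
        # different sizes are weighted correctly during finalization.
        return [abs_error, len(targets)]

    def finalize_evaluation(self, results_per_batch, training_step=None):
        total_error = sum(result[0] for result in results_per_batch)
        total_count = sum(result[1] for result in results_per_batch)
        mean_abs_error = total_error / total_count
        print(f"step {training_step}: validation MAE = {mean_abs_error:.4f}")
        # The returned dict is what would be logged to TensorBoard.
        return {"Eval mean absolute error": mean_abs_error}


model = ToyModel()
batches = [
    ((None, [1.0, 2.0]), ([1.5, 2.0],)),
    ((None, [4.0]), ([3.0],)),
]
results_per_batch = [model.evaluate(inp, out) for inp, out in batches]
metrics = model.finalize_evaluation(results_per_batch, training_step=500)
```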
-
finalize_inference
(results_per_batch, output_file)[source]¶ This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pairs to the output_file.
Parameters: - results_per_batch (list) – aggregation of values returned from all calls to
self.infer()
method (the number of calls will be equal to the number of inference batches). - output_file (str) – name of the output file that inference results should be saved to.
-
get_alignments
(attention_mask)[source]¶ Get attention alignment plots.
Parameters: attention_mask – attention alignment.
Returns: Specs and titles to plot.
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included in the params
parameter of the class __init__()
method.
Return type: dict
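One way such a dictionary could be built and used is sketched below. Everything here is an assumption made for illustration: the mapping from parameter name to expected type, the BaseModel and ToyText2Speech classes, the find_param_problems helper, and the choice of save_to_tensorboard as a required parameter are hypothetical and not taken from the library.

```python
class BaseModel:
    """Hypothetical base class; not the OpenSeq2Seq Model class."""

    @staticmethod
    def get_required_params():
        # Assumed convention: parameter name -> expected Python type.
        return {"batch_size_per_gpu": int, "use_horovod": bool}


class ToyText2Speech(BaseModel):
    @staticmethod
    def get_required_params():
        # Extend the parent's requirements rather than replacing them.
        return dict(BaseModel.get_required_params(), save_to_tensorboard=bool)


def find_param_problems(params, required):
    """Return a list of human-readable problems with a params dict."""
    problems = []
    for name, expected_type in required.items():
        if name not in params:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected_type):
            problems.append(f"{name} should be of type {expected_type.__name__}")
    return problems


required = ToyText2Speech.get_required_params()
good_params = {
    "batch_size_per_gpu": 32,
    "use_horovod": False,
    "save_to_tensorboard": True,
}
incomplete_params = {"batch_size_per_gpu": 32, "use_horovod": False}
good_problems = find_param_problems(good_params, required)
incomplete_problems = find_param_problems(incomplete_params, required)
```

Because the method is static, the requirements can be inspected before any model object (and hence any graph) is constructed.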
-
infer
(input_values, output_values)[source]¶ This method is analogous to
self.evaluate()
, but is used in conjunction with self.finalize_inference()
to perform inference.
Parameters: - input_values – evaluation of
self.get_data_layer().input_tensors
concatenated across all workers. That is, input tensors for one batch combined from all GPUs. - output_values – evaluation of
self.get_output_tensors()
concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in
self.finalize_inference()
method).
Return type: list
-
maybe_print_logs
(input_values, output_values, training_step)[source]¶ This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every
print_samples_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overriding this method can be a useful way to add it.
Parameters: - input_values – evaluation of
self.get_data_layer(0).input_tensors
, that is, input tensors for one batch on the first GPU. - output_values – evaluation of
self.get_output_tensors(0)
, that is, output tensors for one batch on the first GPU. - training_step (int) – Current training step.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
-
print_logs
(mode, specs, titles, stop_token_pred, stop_target, audio_length, step, predicted_final_spec, predicted_mag_spec=None)[source]¶ Save audio files and plots.
Parameters: - mode – “train” or “eval”.
- specs – spectrograms to plot.
- titles – spectrogram titles.
- stop_token_pred – stop token prediction.
- stop_target – stop target.
- audio_length – length of the audio.
- step – current step.
- predicted_final_spec – predicted mel spectrogram.
- predicted_mag_spec – predicted magnitude spectrogram.
Returns: Dictionary to log.
-
-
models.text2speech.
griffin_lim
(magnitudes, n_iters=50, n_fft=1024)[source]¶ Griffin-Lim algorithm to convert magnitude spectrograms to audio signals
-
models.text2speech.
plot_spectrograms
(specs, titles, stop_token_pred, audio_length, logdir, train_step, stop_token_target=None, number=0, append=False, save_to_tensorboard=False)[source]¶ Helper function to create an image to be logged to disk or a tf.Summary to be logged to tensorboard.
Parameters: - specs (array) – array of images to show
- titles (array) – array of titles. Must match the length of the specs array
- stop_token_pred (np.array) – np.array of size [time, 1] containing the stop token predictions from the model.
- audio_length (int) – length of the predicted spectrogram
- logdir (str) – directory in which to save the image file if save_to_tensorboard is disabled.
- train_step (int) – current training step
- stop_token_target (np.array) – np.array of size [time, 1] containing the stop token target.
- number (int) – Current sample number (used if evaluating more than 1 sample from a batch)
- append (str) – Optional string to append to file name, e.g. train, eval, infer
- save_to_tensorboard (bool) – If False, the created image is saved to the logdir as a png file. If True, the function returns a tf.Summary object containing the image and will be logged to the current tensorboard file.
Returns: tf.Summary or None
-
models.text2speech.
save_audio
(magnitudes, logdir, step, sampling_rate, n_fft=1024, mode='train', number=0, save_format='tensorboard', power=1.5, gl_iters=50, verbose=True, max_normalization=False)[source]¶ Helper function to create a wav file to be logged to disk or a tf.Summary to be logged to tensorboard.
Parameters: - magnitudes (np.array) – np.array of size [time, n_fft/2 + 1] containing the energy spectrogram.
- logdir (str) – directory in which to save the audio file when it is written to disk.
- step (int) – current training step
- n_fft (int) – number of filters for fft and ifft.
- sampling_rate (int) – sampling rate in Hz of the audio to be saved.
- number (int) – Current sample number (used if evaluating more than 1 sample from a batch)
- mode (str) – Optional string to append to file name, e.g. train, eval, infer
- save_format – save_audio can either return the np.array containing the generated sound, return a tensorboard summary object, or log the wav file to the disk. Each method can be enabled by passing save_format as “np.array”, “tensorboard”, or “disk” respectively.
Returns: tf.Summary or None
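To give a feel for what the “disk” option involves, here is a minimal sketch that writes a generated signal to a wav file with Python's standard wave module. It is not how save_audio is implemented: write_wav, sample_file_name, the file-naming scheme, and the 16-bit mono format are assumptions chosen for the example, and the sine tone merely stands in for audio reconstructed from a spectrogram.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(samples, path, sampling_rate):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(frames)


def sample_file_name(mode, step, number):
    # Hypothetical naming scheme combining the step, sample number and mode.
    return f"sample_step{step}_{number}_{mode}.wav"


sampling_rate = 22050
# 0.1 seconds of a 440 Hz tone as placeholder audio.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / sampling_rate)
    for n in range(sampling_rate // 10)
]
wav_path = os.path.join(tempfile.mkdtemp(), sample_file_name("eval", 1000, 0))
write_wav(tone, wav_path, sampling_rate)

# Read the header back to confirm what was written.
with wave.open(wav_path, "rb") as wav_file:
    n_frames = wav_file.getnframes()
    frame_rate = wav_file.getframerate()
```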
text2speech_centaur¶
-
class
models.text2speech_centaur.
Text2SpeechCentaur
(params, mode='train', hvd=None)[source]¶ Bases:
models.text2speech.Text2Speech
Text-to-speech model for Centaur.
text2speech_tacotron¶
-
class
models.text2speech_tacotron.
Text2SpeechTacotron
(params, mode='train', hvd=None)[source]¶ Bases:
models.text2speech.Text2Speech
Text-to-speech model for Tacotron.
text2speech_wavenet¶
-
class
models.text2speech_wavenet.
Text2SpeechWavenet
(params, mode='train', hvd=None)[source]¶ Bases:
models.encoder_decoder.EncoderDecoderModel
-
evaluate
(input_values, output_values)[source]¶ This method can be used in conjunction with
self.finalize_evaluation()
to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate the word error rate on the validation data. For text-to-text models, they can compute the BLEU score. See the corresponding derived classes for examples. These methods will be called every eval_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors (using the evaluation model). The self.evaluate()
method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation()
for finalization. Note that this method is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overriding this method can be a useful way to add it.
Parameters: - input_values – evaluation of
self.get_data_layer().input_tensors
concatenated across all workers. That is, input tensors for one batch combined from all GPUs. - output_values – evaluation of
self.get_output_tensors()
concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type: list
-
finalize_evaluation
(results_per_batch, training_step=None)[source]¶ This method can be used in conjunction with
self.evaluate()
to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate the word error rate on the validation data. For text-to-text models, they can compute the BLEU score. See the corresponding derived classes for examples. These methods will be called every eval_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors (using the evaluation model). The self.evaluate()
method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation()
for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overriding these methods can be a useful way to add it.
Parameters: - results_per_batch (list) – aggregation of values returned from all calls
to
self.evaluate()
method (number of calls will be equal to number of evaluation batches). - training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
-
finalize_inference
(results_per_batch, output_file)[source]¶ This method should be implemented if the model supports inference mode. For example, for speech-to-text and text-to-text models, this method will log the corresponding input-output pairs to the output_file.
Parameters: - results_per_batch (list) – aggregation of values returned from all calls to
self.infer()
method (the number of calls will be equal to the number of inference batches). - output_file (str) – name of the output file that inference results should be saved to.
-
static
get_required_params
()[source]¶ Static method with description of required parameters.
Returns: Dictionary containing all the parameters that have to be included in the params
parameter of the class __init__()
method.
Return type: dict
-
infer
(input_values, output_values)[source]¶ This method is analogous to
self.evaluate()
, but is used in conjunction with self.finalize_inference()
to perform inference.
Parameters: - input_values – evaluation of
self.get_data_layer().input_tensors
concatenated across all workers. That is, input tensors for one batch combined from all GPUs. - output_values – evaluation of
self.get_output_tensors()
concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for inference finalization (e.g. this method can return final generated sequences for each batch which will then be saved to file in
self.finalize_inference()
method).
Return type: list
-
maybe_print_logs
(input_values, output_values, training_step)[source]¶ This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every
print_samples_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overriding this method can be a useful way to add it.
Parameters: - input_values – evaluation of
self.get_data_layer(0).input_tensors
, that is, input tensors for one batch on the first GPU. - output_values – evaluation of
self.get_output_tensors(0)
, that is, output tensors for one batch on the first GPU. - training_step (int) – Current training step.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
-
image2label¶
-
class
models.image2label.
Image2Label
(params, mode='train', hvd=None)[source]¶ Bases:
models.encoder_decoder.EncoderDecoderModel
-
_get_num_objects_per_step
(worker_id=0)[source]¶ Returns number of images in current batch, i.e. batch size.
-
evaluate
(input_values, output_values)[source]¶ This method can be used in conjunction with
self.finalize_evaluation()
to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate the word error rate on the validation data. For text-to-text models, they can compute the BLEU score. See the corresponding derived classes for examples. These methods will be called every eval_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors (using the evaluation model). The self.evaluate()
method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation()
for finalization. Note that this method is not abstract and does not have to be implemented in derived classes. But if evaluation functionality is required, overriding this method can be a useful way to add it.
Parameters: - input_values – evaluation of
self.get_data_layer().input_tensors
concatenated across all workers. That is, input tensors for one batch combined from all GPUs. - output_values – evaluation of
self.get_output_tensors()
concatenated across all workers. That is, output tensors for one batch combined from all GPUs.
Returns: all necessary values for evaluation finalization (e.g. accuracy on current batch, which will then be averaged in finalization method).
Return type: list
-
finalize_evaluation
(results_per_batch, training_step=None)[source]¶ This method can be used in conjunction with
self.evaluate()
to calculate evaluation metrics. For example, for speech-to-text models these methods can calculate the word error rate on the validation data. For text-to-text models, they can compute the BLEU score. See the corresponding derived classes for examples. These methods will be called every eval_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors (using the evaluation model). The self.evaluate()
method is called on each batch of data, and its results will be collected and provided to self.finalize_evaluation()
for finalization. Note that these methods are not abstract and do not have to be implemented in derived classes. But if evaluation functionality is required, overriding these methods can be a useful way to add it.
Parameters: - results_per_batch (list) – aggregation of values returned from all calls
to
self.evaluate()
method (number of calls will be equal to number of evaluation batches). - training_step (int) – current training step. Will only be passed if mode is “train_eval”.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
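For an image classifier, the per-batch values would plausibly be counts of correct predictions, summed and turned into accuracies at the end. The sketch below illustrates that idea with top-k accuracy. It is not the Image2Label implementation: count_top_k, summarize, the metric names, and the use of plain score lists in place of output tensors are all assumptions.

```python
def count_top_k(labels, scores, k):
    """Count samples whose true label is among the k highest scores."""
    hits = 0
    for label, sample_scores in zip(labels, scores):
        ranked = sorted(
            range(len(sample_scores)),
            key=lambda class_id: sample_scores[class_id],
            reverse=True,
        )
        hits += int(label in ranked[:k])
    return hits


def evaluate_batch(labels, scores):
    # Per-batch result: [number of images, top-1 hits, top-2 hits].
    return [len(labels), count_top_k(labels, scores, 1), count_top_k(labels, scores, 2)]


def summarize(results_per_batch):
    # Sum the counts over all batches before dividing, so that a smaller
    # final batch does not skew the accuracy.
    total = sum(result[0] for result in results_per_batch)
    top1 = sum(result[1] for result in results_per_batch)
    top2 = sum(result[2] for result in results_per_batch)
    return {"Eval top-1 accuracy": top1 / total, "Eval top-2 accuracy": top2 / total}


results_per_batch = [
    evaluate_batch([0, 2], [[0.1, 0.7, 0.2], [0.2, 0.3, 0.5]]),
    evaluate_batch([1], [[0.4, 0.35, 0.25]]),
]
metrics = summarize(results_per_batch)
```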
-
maybe_print_logs
(input_values, output_values, training_step)[source]¶ This method can be used to print logs that help to visualize training. For example, you can print sample input sequences and their corresponding predictions. This method will be called every
print_samples_steps
(config parameter) iterations, and input/output values will be populated automatically by calling sess.run
on the corresponding tensors. Note that this method is not abstract and does not have to be implemented in derived classes. But if additional printing functionality is required, overriding this method can be a useful way to add it.
Parameters: - input_values – evaluation of
self.get_data_layer(0).input_tensors
, that is, input tensors for one batch on the first GPU. - output_values – evaluation of
self.get_output_tensors(0)
, that is, output tensors for one batch on the first GPU. - training_step (int) – Current training step.
Returns: dictionary with values that need to be logged to TensorBoard (can be empty).
Return type: dict
-