Using Existing Models

In this tutorial we describe everything you can do with OpenSeq2Seq without writing any new code. We cover the following topics: how to run one of the implemented models (for training, evaluation or inference), which parameters can be specified in the config file or on the command line, and what kinds of output OpenSeq2Seq generates for you.

How to run models

All models are run through a single main script. Since it is a fairly simple Python script, you can probably understand how to use it by running it with --help, which displays all available command line parameters with a short description of each. If that does not give enough detail, continue reading this section. Otherwise, you can safely skip to the next section, which describes config parameters.

The script has two main parameters that will be used most often: --config_file and --mode. The first is a required parameter giving the path to the Python configuration file (described in the next section). The --mode parameter can be one of “train”, “eval”, “train_eval” or “infer”. It does what it says: runs the model in the corresponding mode (with “train_eval” executing training with periodic evaluation). The other parameters of the script are the following:

  • --continue_learning — specify this when you want to continue learning from an existing checkpoint. This parameter is only checked when --mode is “train” or “train_eval”.
  • --infer_output_file — this specifies the path to the output of inference. This parameter is only checked when --mode is “infer”.
  • --no_dir_check — this parameter disables log directory checking. By default, the script checks that the log directory (specified in the Python config) is valid. Specifically, it checks that the directory exists when --mode equals “eval” or “infer” (or when --continue_learning is specified for training). If training is performed but --continue_learning is not specified, the script checks that the log directory is empty or does not exist, and otherwise exits with an exception. Finally, whenever necessary it checks that the log directory contains a valid TensorFlow checkpoint of the saved model.
  • --benchmark — specifying this parameter will automatically prepare config for time benchmarking: disable all logging and evaluation. This parameter is only useful for training benchmarking, since in other cases no config preparation is needed. Moreover, specifying it will force the model to run in the “train” mode.
  • --bench_steps — number of steps to run the model for benchmarking. For now this can only be used in conjunction with the --benchmark parameter and thus only works for training benchmarking.
  • --bench_start — first step to start counting time for benchmarking. This parameter works in all modes whether or not --benchmark parameter was specified.
  • --debug_port — this enables TensorFlow debugging. To use it, first run, e.g., tensorboard --logdir=. --debugger_port=6067 and, while tensorboard is running, execute the script with the --debug_port=6067 argument. After that, tensorboard should show a debugging tab.
  • --enable_logs — specifying this parameter enables additional convenient log information to be saved. Namely, the script will save all output (both stdout and stderr), the exact configuration file, git information (git commit hash and git diff) and the exact command line parameters used to start the script. It automatically appends the current time stamp to all log files so that subsequent runs do not overwrite any information. One important thing to note is that specifying this parameter forces the script to save all TensorFlow logs (tensorboard events, checkpoints, etc.) in the logs subfolder. Thus, if you want to restore a model that was saved with --enable_logs specified, you will need to either specify it again or move the model checkpoints from the logs directory into the base logdir folder (which is a config parameter).
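
The log directory rules described for --no_dir_check can be summarized in a few lines of Python. This is only a sketch of the described behaviour, not the script's actual code: the function names check_logdir and has_checkpoint are hypothetical, and the checkpoint test is simplified to looking for the "checkpoint" index file that TensorFlow writes next to saved checkpoints.

```python
import os
import tempfile


def has_checkpoint(logdir):
    # Simplifying assumption: treat the presence of TensorFlow's
    # "checkpoint" index file as proof that a saved model exists.
    return os.path.isfile(os.path.join(logdir, "checkpoint"))


def check_logdir(logdir, mode, continue_learning=False):
    """Raise ValueError if logdir is not in the state the mode expects."""
    needs_existing = mode in ("eval", "infer") or continue_learning
    if needs_existing:
        # eval, infer and continued training all restore a saved model.
        if not os.path.isdir(logdir):
            raise ValueError("log directory does not exist: " + logdir)
        if not has_checkpoint(logdir):
            raise ValueError("no checkpoint found in: " + logdir)
    elif mode in ("train", "train_eval"):
        # Fresh training: the directory must be empty or absent.
        if os.path.isdir(logdir) and os.listdir(logdir):
            raise ValueError("log directory is not empty: " + logdir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        check_logdir(tmp, "train")  # an empty directory is fine
        try:
            check_logdir(tmp, "infer")  # nothing to restore yet
        except ValueError as err:
            print("rejected:", err)
```

Passing --no_dir_check corresponds to skipping a check of this kind entirely.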

Config parameters

The experiment parameters are completely defined in one Python configuration file. This file must define a base_params dictionary and a base_model class. base_model should be any class derived from Model. Currently it can be Speech2Text, Text2Text or Image2Label. Note that this parameter is not a string but an actual Python class, so you will need to add the corresponding imports to the configuration file. In addition to base_params and base_model, you can define train_params, eval_params and infer_params dictionaries that will overwrite the corresponding parts of base_params when the corresponding mode is used. For examples of configuration files, look in the example_configs directory. The complete list of possible configuration parameters is spread across several places in the documentation. A good place to look first is the Model.__init__() method (config parameters section), which defines most of the first-level parameters:

Model.__init__(params, mode='train', hvd=None)

Model constructor. The TensorFlow graph should not be created here, but rather in the self.compile() method.

  • params (dict) – parameters describing the model. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
  • mode (string, optional) – “train”, “eval” or “infer”. If mode is “train” all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only model and loss will be built. If mode is “infer”, only model will be built.
  • hvd (optional) – if Horovod is used, this should be horovod.tensorflow module. If Horovod is not used, it should be None.

Config parameters:

  • random_seed (int) — random seed to use.
  • use_horovod (bool) — whether to use Horovod for distributed execution.
  • num_gpus (int) — number of GPUs to use. This parameter cannot be used if gpu_ids is specified. When use_horovod is True this parameter is ignored.
  • gpu_ids (list of ints) — GPU ids to use. This parameter cannot be used if num_gpus is specified. When use_horovod is True this parameter is ignored.
  • batch_size_per_gpu (int) — batch size to use for each GPU.
  • eval_batch_size_per_gpu (int) — batch size to use for each GPU during inference. This is for when training and inference have different computation and memory requirements, such as when training uses sampled softmax and inference uses full softmax. If not specified, it’s set to batch_size_per_gpu.
  • restore_best_checkpoint (bool) — if set to True, the model will load the best checkpoint instead of the latest checkpoint when doing evaluation and inference. The best checkpoint is chosen based on evaluation results, so it is only available when the model is trained in train_eval mode. Defaults to False.
  • load_model (str) — points to the location of the pretrained model for transfer learning. If specified, during training, the system will look into the checkpoint in this folder and restore all variables whose names and shapes match a variable in the new model.
  • num_epochs (int) — number of epochs to run training for. This parameter cannot be used if max_steps is specified.
  • max_steps (int) — number of steps to run training for. This parameter cannot be used if num_epochs is specified.
  • save_summaries_steps (int or None) — how often to save summaries. Setting it to None disables summaries saving.
  • print_loss_steps (int or None) — how often to print loss during training. Setting it to None disables loss printing.
  • print_samples_steps (int or None) — how often to print training samples (input sequences, correct answers and model predictions). Setting it to None disables samples printing.
  • print_bench_info_steps (int or None) — how often to print training benchmarking information (average number of objects processed per step). Setting it to None disables intermediate benchmarking printing, but the average information across the whole training will always be printed after the last iteration.
  • save_checkpoint_steps (int or None) — how often to save model checkpoints. Setting it to None disables checkpoint saving.
  • num_checkpoints (int) — number of last checkpoints to keep.
  • eval_steps (int) — how often to run evaluation during training. This parameter is only checked if the --mode argument of the script is “train_eval”. If no evaluation is needed you should use “train” mode.
  • logdir (string) — path to the log directory where all checkpoints and summaries will be saved.
  • data_layer (any class derived from DataLayer) — data layer class to use.
  • data_layer_params (dict) — dictionary with data layer configuration. For complete list of possible parameters see the corresponding class docs.
  • optimizer (string or TensorFlow optimizer class) — optimizer to use for training. Could be either “Adam”, “Adagrad”, “Ftrl”, “Momentum”, “RMSProp”, “SGD” or any valid TensorFlow optimizer class.
  • optimizer_params (dict) — dictionary that will be passed to optimizer __init__ method.
  • initializer — any valid TensorFlow initializer.
  • initializer_params (dict) — dictionary that will be passed to initializer __init__ method.
  • freeze_variables_regex (str or None) — if zero or more characters at the beginning of the name of a trainable variable match this pattern, then this variable will be frozen during training. Setting it to None disables freezing of variables.
  • regularizer — any valid TensorFlow regularizer.
  • regularizer_params (dict) — dictionary that will be passed to regularizer __init__ method.
  • dtype — model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details see mixed precision training section in docs.
  • lr_policy — any valid learning rate policy function. For examples, see optimizers.lr_policies module.
  • lr_policy_params (dict) — dictionary containing lr_policy parameters.
  • max_grad_norm (float) — maximum value of gradient norm. Clipping will be performed if some gradients exceed this value (this is checked for each variable independently).
  • loss_scaling — could be float or string. If float, static loss scaling is applied. If string, the corresponding automatic loss scaling algorithm is used. Must be one of ‘Backoff’ or ‘LogMax’ (case insensitive). Only used when dtype=“mixed”. For details see mixed precision training section in docs.
  • loss_scaling_params (dict) — dictionary containing loss scaling parameters.
  • summaries (list) — which summaries to log. Could contain “learning_rate”, “gradients”, “gradient_norm”, “global_gradient_norm”, “variables”, “variable_norm”, “loss_scale”.
  • iter_size (int) — use this parameter to emulate large batches. The gradients will be accumulated for iter_size number of steps before applying update.
  • larc_params — dictionary with parameters for LARC (or LARS) optimization algorithms. Can contain the following parameters:
    • larc_mode — Could be either “scale” (LARS) or “clip” (LARC). Note that it works in addition to any other optimization algorithm since we treat it as adaptive gradient clipping and learning rate adjustment.
    • larc_eta (float) — LARC or LARS scaling parameter.
    • min_update (float) — minimal value of the LARC (LARS) update.
    • epsilon (float) — small number added to gradient norm in denominator for numerical stability.
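
To make the structure concrete, here is a minimal sketch of a configuration file that uses a handful of the parameters listed above. All values are illustrative, PlaceholderModel is a hypothetical stand-in for an imported Model subclass, and params_for_mode is a hypothetical helper showing one plausible way the mode-specific dictionaries could be applied (a shallow, top-level override is assumed here; the real merge may handle nested dictionaries differently).

```python
import copy


class PlaceholderModel:
    """Hypothetical stand-in; a real config imports a Model subclass."""


base_model = PlaceholderModel

base_params = {
    "random_seed": 0,
    "use_horovod": False,
    "num_gpus": 1,                 # must not be combined with gpu_ids
    "batch_size_per_gpu": 32,
    "max_steps": 1000,             # must not be combined with num_epochs
    "save_checkpoint_steps": 500,
    "print_loss_steps": 10,
    "logdir": "experiment_logs",   # illustrative path
    "optimizer": "Adam",
    "optimizer_params": {},
    "dtype": "mixed",
    "loss_scaling": "Backoff",
}

# Mode-specific dictionaries overwrite parts of base_params.
train_params = {}
eval_params = {"batch_size_per_gpu": 16}
infer_params = {"batch_size_per_gpu": 1}

# Parameter pairs that the list above describes as mutually exclusive.
EXCLUSIVE_PAIRS = [("num_gpus", "gpu_ids"), ("num_epochs", "max_steps")]


def params_for_mode(mode):
    """Return base_params with the overrides for the given mode applied."""
    overrides = {
        "train": train_params,
        "eval": eval_params,
        "infer": infer_params,
    }[mode]
    params = copy.deepcopy(base_params)
    params.update(overrides)  # assumption: shallow, top-level override
    for first, second in EXCLUSIVE_PAIRS:
        if first in params and second in params:
            raise ValueError(
                "%s and %s cannot both be specified" % (first, second))
    return params
```

In a real configuration file only base_model, base_params and the optional mode dictionaries would be present; the helper is included just to show how the pieces relate.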

Note that some of the parameters are themselves config dictionaries for the corresponding classes. To see the list of their configuration options, proceed to the corresponding class docs. For example, to see all supported data layer parameters, look into the docs for data.data_layer.DataLayer. Sometimes derived classes define additional parameters of their own; in that case you should look into both the parent class and its child. For example, look into models.encoder_decoder.EncoderDecoderModel, which defines parameters specific to models that can be expressed as encoder-decoder-loss blocks. You can also have a look at encoders.encoder.Encoder (which defines some parameters shared across all encoders) and encoders.ds2_encoder.DeepSpeech2Encoder (which additionally defines a set of DeepSpeech-2 specific parameters).


For convenience, all string or numerical config parameters can be overwritten by command line arguments. To overwrite parameters of nested dictionaries, separate the dictionary name and the parameter name with “/”. For example, try specifying the --logdir argument or --lr_policy_params/learning_rate when running the script.
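
The “/” convention can be illustrated with a short sketch. This is not the script's own parsing code: apply_override is a hypothetical helper that takes one "--name=value" argument, walks into nested dictionaries along the “/”-separated path, and converts the value to the type of the parameter it replaces.

```python
def apply_override(params, argument):
    """Apply one "--name=value" or "--dict/name=value" argument in place."""
    key_path, raw_value = argument.lstrip("-").split("=", 1)
    *parents, leaf = key_path.split("/")
    target = params
    for name in parents:
        target = target[name]  # KeyError if the nested dictionary is missing
    current = target[leaf]     # this sketch only overrides existing keys
    if isinstance(current, bool) or not isinstance(current, (str, int, float)):
        raise TypeError("only string or numerical parameters are supported")
    target[leaf] = type(current)(raw_value)  # keep the original type


params = {
    "logdir": "experiment_logs",
    "lr_policy_params": {"learning_rate": 0.001},
}
apply_override(params, "--logdir=other_logs")
apply_override(params, "--lr_policy_params/learning_rate=0.01")
print(params)
```

Restricting overrides to strings and numbers mirrors the limitation stated above: classes, lists and nested dictionaries still have to be changed in the configuration file itself.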