Using Existing Models
In this tutorial we will describe everything you can do with OpenSeq2Seq without writing any new code. We will cover the following topics: how to run one of the implemented models (for training, evaluation or inference), what parameters can be specified in the config file or on the command line, and what kinds of output OpenSeq2Seq generates for you.
How to run models
The main script to run all models is run.py. Since it is a fairly simple
Python script, you can probably understand how to use it by running
run.py --help, which will display all available command line parameters with
a short description of each. If that does not contain enough details, continue
reading this section. Otherwise, you can safely skip to the next section, which
describes config parameters.
There are 2 main parameters of run.py that will be used most often. The first
one is a required parameter with the path to the Python configuration file
(described in the next section). The second one, --mode, can be one of
“train”, “eval”, “train_eval” or “infer”. It does what it says: it runs the
model in the corresponding mode (with “train_eval” executing training with
periodic evaluation).
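If you launch experiments from Python (for example from a sweep script), it can help to build the run.py argument list programmatically. The sketch below assumes the configuration-file flag is spelled --config_file (its name is not shown above, so confirm it with run.py --help); the config path is hypothetical.

```python
import shlex

# Values accepted by run.py's --mode parameter, as listed above.
VALID_MODES = ("train", "eval", "train_eval", "infer")


def build_run_command(config_path, mode, extra_flags=()):
    """Return the argv list for one run.py invocation.

    Assumption: the configuration-file flag is --config_file; check
    `python run.py --help` for the exact spelling in your checkout.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return ["python", "run.py", f"--config_file={config_path}", f"--mode={mode}", *extra_flags]


if __name__ == "__main__":
    # Hypothetical config path, used only for illustration.
    argv = build_run_command("my_configs/my_model.py", "train_eval")
    print(shlex.join(argv))
    # prints: python run.py --config_file=my_configs/my_model.py --mode=train_eval
    # To actually launch it: import subprocess; subprocess.run(argv, check=True)
```

Additional flags such as --continue_learning or --enable_logs can be passed through extra_flags unchanged.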
The other parameters of the run.py script are the following:
- --continue_learning — specify this when you want to continue learning from an existing checkpoint. This parameter is only checked when --mode is “train” or “train_eval”.
- --infer_output_file — this specifies the path to the output of the inference. This parameter is only checked when --mode is “infer”.
- --no_dir_check — this parameter disables log directory checking. By default, run.py checks that the log directory (specified in the Python config) is valid. Specifically, it checks that the directory exists when --mode equals “eval” or “infer” (or when --continue_learning is specified for training). If training is performed but --continue_learning is not specified, the script checks that the log directory is empty or does not exist, and finishes with an exception otherwise. Finally, whenever necessary, it checks that the log directory contains a valid TensorFlow checkpoint of the saved model.
- --benchmark — specifying this parameter will automatically prepare the config for time benchmarking by disabling all logging and evaluation. This parameter is only useful for training benchmarking, since no config preparation is needed in other cases. Moreover, specifying it forces the model to run in the “train” mode.
- --bench_steps — number of steps to run the model for benchmarking. For now this can only be used in conjunction with the --benchmark parameter, and thus only works for training benchmarking.
- --bench_start — the step at which timing starts for benchmarking. This parameter works in all modes, whether or not the --benchmark parameter was specified.
- --debug_port — this enables TensorFlow debugging. To use it, first run, e.g., tensorboard --logdir=. --debugger_port=6067 and, while tensorboard is running, execute run.py with the --debug_port=6067 attribute. After that, tensorboard should have a debugging tab.
- --enable_logs — specifying this parameter will save additional convenient log information. Namely, the script will save all output (both stdout and stderr), the exact configuration file, git information (git commit hash and git diff) and the exact command line parameters used to start the script. The current time stamp is automatically appended to all log files so that subsequent runs do not overwrite any information. One important thing to note is that specifying this parameter forces the script to save all TensorFlow logs (tensorboard events, checkpoints, etc.) in the logs subfolder. Thus, if you want to restore a model that was saved with enable_logs specified, you will need to either specify it again or move the model checkpoints from the logs directory into the base logdir folder (which is a config parameter).
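The log-directory rules that --no_dir_check disables, together with the logs subfolder introduced by --enable_logs, can be summarized in a small function. This is a sketch of the behaviour described above, not run.py's code; treating TensorFlow's `checkpoint` index file as evidence of a saved model is an assumption of the sketch.

```python
import os
import tempfile


def check_logdir(logdir, mode, continue_learning=False, enable_logs=False):
    """Re-state the log-directory rules described for run.py (sketch only).

    Returns the directory where checkpoints are expected.
    """
    # With --enable_logs, TensorFlow outputs (checkpoints, events) go to logdir/logs.
    ckpt_dir = os.path.join(logdir, "logs") if enable_logs else logdir
    # Evaluation, inference and continued training all restore an existing model.
    restoring = mode in ("eval", "infer") or continue_learning
    if restoring:
        if not os.path.isdir(ckpt_dir):
            raise FileNotFoundError(f"{ckpt_dir} does not exist")
        # Assumption: TensorFlow's "checkpoint" index file marks a saved model.
        if not os.path.isfile(os.path.join(ckpt_dir, "checkpoint")):
            raise FileNotFoundError(f"no TensorFlow checkpoint in {ckpt_dir}")
    elif os.path.isdir(logdir) and os.listdir(logdir):
        # Fresh training refuses to write into a non-empty directory.
        raise FileExistsError(f"{logdir} is not empty; use --continue_learning or a new logdir")
    return ckpt_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp = os.path.join(tmp, "experiment")  # does not exist yet
        print(check_logdir(exp, "train") == exp)  # True: fresh training is allowed
```

The sketch also shows why a model saved with enable_logs is not found without it: the checkpoint lookup happens in logdir/logs rather than in logdir.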
The experiment parameters are completely defined in one Python configuration
file. This file must define the
base_params dictionary and the base_model variable.
base_model should be any class derived from
Model. Currently it can be
Note that this parameter is not a string, but an actual Python class, so you
will need to add the corresponding imports to the configuration file. In addition
to base_params and base_model, you can define mode-specific dictionaries such as
infer_params that will
overwrite the corresponding parts of
base_params when the corresponding mode
is used. For examples of configuration files look in the
directory. The complete list of all possible configuration parameters is
defined in the documentation in various places. A good place to look first is
(config parameters section), which defines most of the first level parameters:
__init__(params, mode='train', hvd=None)
Model constructor. The TensorFlow graph should not be created here, but rather in the
- params (dict) – parameters describing the model.
All supported parameters are listed in
- mode (string, optional) – “train”, “eval” or “infer”. If mode is “train” all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only model and loss will be built. If mode is “infer”, only model will be built.
- hvd (optional) – if Horovod is used, this should be the horovod.tensorflow module. If Horovod is not used, it should be None.
- random_seed (int) — random seed to use.
- use_horovod (bool) — whether to use Horovod for distributed execution.
- num_gpus (int) — number of GPUs to use. This parameter cannot be used if gpu_ids is specified. When use_horovod is True this parameter is ignored.
- gpu_ids (list of ints) — GPU ids to use. This parameter cannot be used if num_gpus is specified. When use_horovod is True this parameter is ignored.
- batch_size_per_gpu (int) — batch size to use for each GPU.
- eval_batch_size_per_gpu (int) — batch size to use for each GPU during
inference. This is for when training and inference have different computation
and memory requirements, such as when training uses sampled softmax and
inference uses full softmax. If not specified, it’s set
- restore_best_checkpoint (bool) — if set to True, when doing evaluation and inference, the model will load the best checkpoint instead of the latest checkpoint. The best checkpoint is selected based on evaluation results, so it's only available when the model is trained in train_eval mode. Defaults to False.
- load_model (str) — points to the location of the pretrained model for transfer learning. If specified, during training, the system will look into the checkpoint in this folder and restore all variables whose names and shapes match a variable in the new model.
- num_epochs (int) — number of epochs to run training for. This parameter cannot be used if max_steps is specified.
- max_steps (int) — number of steps to run training for. This parameter cannot be used if num_epochs is specified.
- save_summaries_steps (int or None) — how often to save summaries. Setting it to None disables summaries saving.
- print_loss_steps (int or None) — how often to print loss during training. Setting it to None disables loss printing.
- print_samples_steps (int or None) — how often to print training samples (input sequences, correct answers and model predictions). Setting it to None disables samples printing.
- print_bench_info_steps (int or None) — how often to print training benchmarking information (average number of objects processed per step). Setting it to None disables intermediate benchmarking printing, but the average information across the whole training will always be printed after the last iteration.
- save_checkpoint_steps (int or None) — how often to save model checkpoints. Setting it to None disables checkpoint saving.
- num_checkpoints (int) — number of last checkpoints to keep.
- eval_steps (int) — how often to run evaluation during training. This parameter is only checked if the --mode of run.py is “train_eval”. If no evaluation is needed, you should use the “train” mode.
- logdir (string) — path to the log directory where all checkpoints and summaries will be saved.
- data_layer (any class derived from DataLayer) — data layer class to use.
- data_layer_params (dict) — dictionary with data layer configuration. For complete list of possible parameters see the corresponding class docs.
- optimizer (string or TensorFlow optimizer class) — optimizer to use for training. Could be either “Adam”, “Adagrad”, “Ftrl”, “Momentum”, “RMSProp”, “SGD” or any valid TensorFlow optimizer class.
- optimizer_params (dict) — dictionary that will be passed to
- initializer — any valid TensorFlow initializer.
- initializer_params (dict) — dictionary that will be passed to
- freeze_variables_regex (str or None) — if zero or more characters at the beginning of the name of a trainable variable match this pattern, then this variable will be frozen during training. Setting it to None disables freezing of variables.
- regularizer — any valid TensorFlow regularizer.
- regularizer_params (dict) — dictionary that will be passed to
- dtype — model dtype. Could be either tf.float32 or “mixed”. For details see the mixed precision training section in the docs.
- lr_policy — any valid learning rate policy function. For example,
- lr_policy_params (dict) — dictionary containing lr_policy parameters.
- max_grad_norm (float) — maximum value of gradient norm. Clipping will be performed if some gradients exceed this value (this is checked for each variable independently).
- loss_scaling — could be a float or a string. If float, static loss scaling is applied. If string, the corresponding automatic loss scaling algorithm is used; it must be one of “Backoff” or “LogMax” (case insensitive). Only used when dtype=“mixed”. For details see the mixed precision training section in the docs.
- loss_scaling_params (dict) — dictionary containing loss scaling parameters.
- summaries (list) — which summaries to log. Could contain “learning_rate”, “gradients”, “gradient_norm”, “global_gradient_norm”, “variables”, “variable_norm”, “loss_scale”.
- iter_size (int) — use this parameter to emulate large batches. The gradients will be accumulated for iter_size steps before the update is applied.
- larc_params — dictionary with parameters for LARC (or LARS) optimization algorithms. Can contain the following parameters:
  - larc_mode — Could be either “scale” (LARS) or “clip” (LARC). Note that it works in addition to any other optimization algorithm since we treat it as adaptive gradient clipping and learning rate adjustment.
  - larc_eta (float) — LARC or LARS scaling parameter.
  - min_update (float) — minimal value of the LARC (LARS) update.
  - epsilon (float) — small number added to gradient norm in denominator for numerical stability.
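Several of the parameters above interact: num_gpus and gpu_ids are mutually exclusive (and ignored under Horovod), num_epochs and max_steps are mutually exclusive, and iter_size multiplies the batch seen by each update. The sketch below checks those pairs and computes how many samples contribute to one weight update, assuming synchronous data parallelism (each GPU or Horovod worker processes batch_size_per_gpu samples per step). The num_workers argument and the single-GPU default are assumptions of this sketch, not OpenSeq2Seq behaviour.

```python
def effective_batch_size(params, num_workers=None):
    """Samples contributing to one weight update (illustrative sketch).

    Assumes synchronous data parallelism: each GPU (or Horovod worker)
    processes batch_size_per_gpu samples per step, and gradients are
    accumulated over iter_size steps before the update is applied.
    """
    use_horovod = params.get("use_horovod", False)
    if not use_horovod and "num_gpus" in params and "gpu_ids" in params:
        raise ValueError("num_gpus and gpu_ids cannot be used together")
    if "num_epochs" in params and "max_steps" in params:
        raise ValueError("num_epochs and max_steps cannot be used together")

    if use_horovod:
        # num_gpus/gpu_ids are ignored; the worker count comes from the launcher.
        if num_workers is None:
            raise ValueError("pass num_workers (e.g. hvd.size()) when use_horovod is True")
        devices = num_workers
    elif "gpu_ids" in params:
        devices = len(params["gpu_ids"])
    else:
        devices = params.get("num_gpus", 1)  # assumption: one GPU if unspecified
    return params["batch_size_per_gpu"] * devices * params.get("iter_size", 1)


if __name__ == "__main__":
    cfg = {"batch_size_per_gpu": 32, "num_gpus": 4, "iter_size": 2, "num_epochs": 10}
    print(effective_batch_size(cfg))  # 32 * 4 * 2 = 256
```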
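The wording of freeze_variables_regex ("zero or more characters at the beginning of the name") matches Python's re.match semantics: the pattern only has to match a prefix of the variable name. Assuming that reading, the sketch below shows which names a few patterns would select; the variable names are made up for illustration.

```python
import re


def frozen_variables(var_names, freeze_variables_regex):
    """Return the variable names the freezing rule above would select (sketch).

    re.match anchors the pattern at the start of the name, so only a
    prefix of the name has to match.
    """
    if freeze_variables_regex is None:  # None disables freezing
        return []
    pattern = re.compile(freeze_variables_regex)
    return [name for name in var_names if pattern.match(name)]


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    names = [
        "ForwardPass/encoder/conv1/kernel",
        "ForwardPass/encoder/conv1/bias",
        "ForwardPass/decoder/fc/kernel",
    ]
    print(frozen_variables(names, "ForwardPass/encoder"))  # both encoder variables
    print(frozen_variables(names, "encoder"))              # [] (anchored at the start)
    print(frozen_variables(names, ".*bias"))               # only the bias variable
```

Because the match is anchored, a pattern such as "encoder" selects nothing unless the name starts with it; prefix it with ".*" to match anywhere in the name.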
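max_grad_norm is applied per variable, so each gradient is compared with the limit on its own. In TensorFlow terms this resembles calling tf.clip_by_norm on every gradient separately rather than tf.clip_by_global_norm on all of them together. The dependency-free sketch below (gradients as plain lists) illustrates the arithmetic only and is not OpenSeq2Seq's code.

```python
import math


def clip_per_variable(grads, max_grad_norm):
    """Clip each variable's gradient to L2 norm <= max_grad_norm, independently.

    grads maps a variable name to its flattened gradient values.
    """
    clipped = {}
    for name, g in grads.items():
        norm = math.sqrt(sum(x * x for x in g))
        # Only gradients whose own norm exceeds the limit are rescaled.
        scale = max_grad_norm / norm if norm > max_grad_norm else 1.0
        clipped[name] = [x * scale for x in g]
    return clipped


if __name__ == "__main__":
    grads = {"w": [3.0, 4.0], "b": [0.3, 0.4]}  # norms 5.0 and 0.5
    print(clip_per_variable(grads, max_grad_norm=1.0))
    # "w" is rescaled to norm 1.0 (about [0.6, 0.8]); "b" is left unchanged.
```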
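With dtype="mixed", loss scaling multiplies the loss by a scale before backpropagation and divides the gradients by the same scale before the update; a float value for loss_scaling keeps that scale fixed. The sketch below shows a generic backoff scheme of the kind the "Backoff" name suggests: shrink the scale and skip the step when gradients overflow, and grow it again after a window of clean steps. The class and all of its parameter names are hypothetical; OpenSeq2Seq's actual algorithm and its loss_scaling_params keys may differ.

```python
import math


class BackoffLossScaler:
    """Generic dynamic loss scaling with backoff (hypothetical names)."""

    def __init__(self, scale=2.0 ** 15, step_factor=2.0, step_window=2000,
                 scale_min=1.0, scale_max=2.0 ** 24):
        self.scale = scale
        self.step_factor = step_factor
        self.step_window = step_window
        self.scale_min = scale_min
        self.scale_max = scale_max
        self.good_steps = 0

    def unscale(self, scaled_grads):
        # Gradients of (loss * scale) are divided by scale before the update.
        return [g / self.scale for g in scaled_grads]

    def update(self, scaled_grads):
        """Adjust the scale; return True if this step's update should be applied."""
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow: back off and skip the step.
            self.scale = max(self.scale / self.step_factor, self.scale_min)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.step_window:
            # A full window without overflow: try a larger scale.
            self.scale = min(self.scale * self.step_factor, self.scale_max)
            self.good_steps = 0
        return True


if __name__ == "__main__":
    scaler = BackoffLossScaler(scale=8.0, step_window=2)
    for grads in ([float("inf")], [1.0], [1.0]):
        applied = scaler.update(grads)
        print(applied, scaler.scale)
    # prints: False 4.0, then True 4.0, then True 8.0
```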
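The larc_params entries map onto a per-variable factor that rescales each gradient before the regular optimizer step. The sketch below uses one common LARC/LARS formulation: “scale” multiplies the gradient by the trust ratio larc_eta·‖w‖/‖g‖, while “clip” caps the effective learning rate at that value without ever increasing it. The exact formula OpenSeq2Seq implements may differ, and the min_update and epsilon defaults here are illustrative assumptions.

```python
import math


def _l2(values):
    return math.sqrt(sum(v * v for v in values))


def larc_multiplier(w, g, lr, larc_eta, larc_mode="clip", min_update=1e-7, epsilon=1e-7):
    """Factor applied to one variable's gradient before the optimizer step.

    Sketch of a common LARC/LARS formulation; the min_update and epsilon
    defaults are illustrative, not necessarily OpenSeq2Seq's.
    """
    w_norm, g_norm = _l2(w), _l2(g)
    if larc_mode == "clip":
        # LARC: cap the effective rate lr * factor at about larc_eta * ||w|| / ||g||,
        # but never amplify the global learning rate (factor <= 1).
        factor = larc_eta * w_norm / (lr * (g_norm + epsilon))
        return min(max(factor, min_update), 1.0)
    if larc_mode == "scale":
        # LARS: always rescale the gradient by the local trust ratio.
        return max(larc_eta * w_norm / (g_norm + epsilon), min_update)
    raise ValueError(f"larc_mode must be 'clip' or 'scale', got {larc_mode!r}")


if __name__ == "__main__":
    w, g = [3.0, 4.0], [0.6, 0.8]  # ||w|| = 5, ||g|| = 1
    print(larc_multiplier(w, g, lr=0.1, larc_eta=0.002, larc_mode="clip"))   # about 0.1
    print(larc_multiplier(w, g, lr=0.1, larc_eta=0.002, larc_mode="scale"))  # about 0.01
    print(larc_multiplier(w, g, lr=0.1, larc_eta=1.0, larc_mode="clip"))     # 1.0 (no change)
```

In “clip” mode the result is never above 1, which is why the text describes LARC as adaptive gradient clipping layered on top of the chosen optimizer.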
Note that some of the parameters are also config dictionaries for the corresponding
classes. To see the list of their configuration options, proceed to the
corresponding class docs. For example, to see all supported data layer parameters,
look into the docs for
data.data_layer.DataLayer. Derived classes sometimes define additional
parameters of their own; in that case you should look into both the parent
class and its child. For example, look into
models.encoder_decoder.EncoderDecoderModel, which defines parameters
specific to models that can be expressed as encoder-decoder-loss blocks.
You can also have a look at
encoders.encoder.Encoder (which defines some parameters shared across
all encoders) and at the DeepSpeech-2 encoder class (which
additionally defines a set of DeepSpeech-2 specific parameters).
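Putting these rules together, the following is a minimal sketch of the shape of a configuration file, plus one way a mode-specific dictionary could be merged into base_params. Assumptions: PlaceholderModel stands in for a real imported Model subclass, eval_params is named by analogy with infer_params, the data layer keys are illustrative only, and the merge is assumed to be recursive so that nested keys that are not overridden survive. Check run.py for the exact merging behaviour.

```python
import copy


# Hypothetical stand-in: a real config imports a Model subclass instead.
class PlaceholderModel:
    pass


base_model = PlaceholderModel  # a class object, not a string

base_params = {
    "random_seed": 0,
    "num_gpus": 1,
    "batch_size_per_gpu": 32,
    "num_epochs": 10,
    "logdir": "experiments/my_model",  # hypothetical path
    "optimizer": "Adam",
    "lr_policy_params": {"learning_rate": 1e-3},
    # Nested config dictionary; these keys are illustrative only.
    "data_layer_params": {"shuffle": True, "max_length": 200},
}

# Assumed name for the evaluation overrides (by analogy with infer_params).
eval_params = {
    "batch_size_per_gpu": 64,
    "data_layer_params": {"shuffle": False},
}


def merge_mode_params(base, overrides):
    """Return a copy of base with overrides applied recursively (sketch)."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_mode_params(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    eval_config = merge_mode_params(base_params, eval_params)
    print(eval_config["batch_size_per_gpu"])  # 64
    print(eval_config["data_layer_params"])   # {'shuffle': False, 'max_length': 200}
```

A recursive merge lets a mode dictionary change a single nested option (here the data layer's shuffle flag) without repeating the rest of the nested dictionary; with a shallow merge, max_length would be lost.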
For convenience all string or numerical config parameters can be overwritten
by command line arguments. To overwrite parameters of the nested
dictionaries, separate the dictionary and parameter name with “/”.
For example, try to specify the
--logdir argument or
--lr_policy_params/learning_rate in your