Using Existing Models
In this tutorial we will describe everything you can do with OpenSeq2Seq without writing any new code. We will cover the following topics: how to run one of the implemented models (for training, evaluation or inference), which parameters can be specified in the config file or on the command line, and what kinds of output OpenSeq2Seq generates for you.
How to run models
The main script to run all models is run.py. Since it is a fairly simple Python script, you can probably understand how to use it by running run.py --help, which will display all available command line parameters and their short descriptions. If that does not contain enough details, continue reading this section. Otherwise, you can safely skip to the next section, which describes config parameters.
There are two main parameters of run.py that will be used most often: --config_file and --mode. The first one is a required parameter with the path to the Python configuration file (described in the next section). The --mode parameter can be one of “train”, “eval”, “train_eval” or “infer”. This will do what it says: run the model in the corresponding mode (with “train_eval” executing training with periodic evaluation).
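For example, training a model defined in one of the bundled configuration files could look like the following invocation (the config path here is illustrative; pick any file from the example_configs directory):

    python run.py --config_file=example_configs/speech2text/ds2_small_1gpu.py --mode=train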
The other parameters of the run.py script are the following:
- --continue_learning — specify this when you want to continue learning from an existing checkpoint. This parameter is only checked when --mode is “train” or “train_eval”.
- --infer_output_file — specifies the path to the output of the inference. This parameter is only checked when --mode is “infer”.
- --no_dir_check — disables log directory checking. By default, run.py will check that the log directory (specified in the Python config) is valid. Specifically, it will check that the directory exists when --mode equals “eval” or “infer” (or when --continue_learning is specified for training). If training is performed, but --continue_learning is not specified, the script will check that the log directory is empty or does not exist, otherwise finishing with an exception. Finally, whenever necessary, it will check that the log directory contains a valid TensorFlow checkpoint of the saved model.
- --benchmark — specifying this parameter will automatically prepare the config for time benchmarking: all logging and evaluation is disabled. This parameter is only useful for benchmarking training, since in other cases no config preparation is needed. Moreover, specifying it will force the model to run in “train” mode.
- --bench_steps — number of steps to run the model for benchmarking. For now this can only be used in conjunction with the --benchmark parameter and thus only works when benchmarking training.
- --bench_start — first step to start counting time for benchmarking. This parameter works in all modes, whether or not the --benchmark parameter was specified.
- --debug_port — enables TensorFlow debugging. To use it, first run, e.g., tensorboard --logdir=. --debugger_port=6067 and, while TensorBoard is running, execute run.py with the --debug_port=6067 argument. After that, TensorBoard should have a debugging tab.
- --enable_logs — specifying this parameter will enable additional convenient log information to be saved. Namely, the script will save all output (both stdout and stderr), the exact configuration file, git information (git commit hash and git diff) and the exact command line parameters used to start the script. The current time stamp is automatically appended to all log file names, so that subsequent runs do not overwrite any information. One important thing to note is that specifying this parameter will force the script to save all TensorFlow logs (TensorBoard events, checkpoints, etc.) in the logs subfolder. Thus, if you want to restore a model that was saved with --enable_logs specified, you will need to either specify it again or move the model checkpoints from the logs directory into the base logdir folder (which is a config parameter).
An example that combines several of these flags is shown below.
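For instance, to resume training from an existing checkpoint with the extended logging enabled (the config path is again illustrative):

    python run.py --config_file=example_configs/speech2text/ds2_small_1gpu.py --mode=train_eval --continue_learning --enable_logs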
Config parameters
The experiment parameters are completely defined in one Python configuration file. This file must define a base_params dictionary and a base_model class. base_model should be any class derived from Model. Currently it can be Speech2Text, Text2Text or Image2Label. Note that this parameter is not a string, but an actual Python class, so you will need to add the corresponding imports in the configuration file. In addition to base_params and base_model you can define train_params, eval_params and infer_params dictionaries that will overwrite the corresponding parts of base_params when the corresponding mode is used. For examples of configuration files, look in the example_configs directory.
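As a rough illustration, a minimal configuration file might look like the sketch below. The import paths and all parameter values are assumptions made for illustration only; consult the files in example_configs for complete, working configurations.

    # A minimal, illustrative configuration sketch. The import paths and all
    # parameter values below are assumptions -- see the files in
    # example_configs for complete, working configurations.
    from open_seq2seq.models import Speech2Text          # assumed import path
    from open_seq2seq.data import Speech2TextDataLayer   # assumed import path

    base_model = Speech2Text  # an actual Python class, not a string

    base_params = {
        "random_seed": 0,
        "num_gpus": 1,
        "batch_size_per_gpu": 32,
        "num_epochs": 50,
        "logdir": "experiments/my_model",
        "optimizer": "Adam",
        "optimizer_params": {},
        "data_layer": Speech2TextDataLayer,
        "data_layer_params": {
            # data-layer-specific options (e.g. dataset paths) go here
        },
    }

    # Mode-specific dictionaries overwrite the corresponding parts of
    # base_params when that mode is used.
    train_params = {
        "data_layer_params": {
            "shuffle": True,   # illustrative data layer option
        },
    }
    eval_params = {
        "data_layer_params": {
            "shuffle": False,
        },
    }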
The complete list of all possible configuration parameters is defined in the documentation in various places. A good place to look first is the Model.__init__() method (config parameters section), which defines most of the first-level parameters:
Model.__init__(params, mode='train', hvd=None)
Model constructor. The TensorFlow graph should not be created here, but rather in the self.compile() method.
Parameters:
- params (dict) — parameters describing the model. All supported parameters are listed in the get_required_params() and get_optional_params() functions.
- mode (string, optional) — “train”, “eval” or “infer”. If mode is “train” all parts of the graph will be built (model, loss, optimizer). If mode is “eval”, only model and loss will be built. If mode is “infer”, only model will be built.
- hvd (optional) — if Horovod is used, this should be the horovod.tensorflow module. If Horovod is not used, it should be None.
Config parameters:
- random_seed (int) — random seed to use.
- use_horovod (bool) — whether to use Horovod for distributed execution.
- num_gpus (int) — number of GPUs to use. This parameter cannot be used if gpu_ids is specified. When use_horovod is True this parameter is ignored.
- gpu_ids (list of ints) — GPU ids to use. This parameter cannot be used if num_gpus is specified. When use_horovod is True this parameter is ignored.
- batch_size_per_gpu (int) — batch size to use for each GPU.
- eval_batch_size_per_gpu (int) — batch size to use for each GPU during inference. This is for when training and inference have different computation and memory requirements, such as when training uses sampled softmax and inference uses full softmax. If not specified, it’s set to batch_size_per_gpu.
- restore_best_checkpoint (bool) — if set to True, when doing evaluation and inference, the model will load the best checkpoint instead of the latest checkpoint. The best checkpoint is determined by evaluation results, so it’s only available when the model is trained in “train_eval” mode. Defaults to False.
- load_model (str) — points to the location of the pretrained model for transfer learning. If specified, during training the system will look into the checkpoint in this folder and restore all variables whose names and shapes match a variable in the new model.
- num_epochs (int) — number of epochs to run training for. This parameter cannot be used if max_steps is specified.
- max_steps (int) — number of steps to run training for. This parameter cannot be used if num_epochs is specified.
- save_summaries_steps (int or None) — how often to save summaries. Setting it to None disables summaries saving.
- print_loss_steps (int or None) — how often to print loss during training. Setting it to None disables loss printing.
- print_samples_steps (int or None) — how often to print training samples (input sequences, correct answers and model predictions). Setting it to None disables samples printing.
- print_bench_info_steps (int or None) — how often to print training benchmarking information (average number of objects processed per step). Setting it to None disables intermediate benchmarking printing, but the average information across the whole training will always be printed after the last iteration.
- save_checkpoint_steps (int or None) — how often to save model checkpoints. Setting it to None disables checkpoint saving.
- num_checkpoints (int) — number of last checkpoints to keep.
- eval_steps (int) — how often to run evaluation during training. This parameter is only checked if the --mode argument of run.py is “train_eval”. If no evaluation is needed you should use “train” mode.
- logdir (string) — path to the log directory where all checkpoints and summaries will be saved.
- data_layer (any class derived from DataLayer) — data layer class to use.
- data_layer_params (dict) — dictionary with data layer configuration. For a complete list of possible parameters see the corresponding class docs.
- optimizer (string or TensorFlow optimizer class) — optimizer to use for training. Could be either “Adam”, “Adagrad”, “Ftrl”, “Momentum”, “RMSProp”, “SGD” or any valid TensorFlow optimizer class.
- optimizer_params (dict) — dictionary that will be passed to the optimizer’s __init__ method (see the illustrative fragment after this list).
- initializer — any valid TensorFlow initializer.
- initializer_params (dict) — dictionary that will be passed to the initializer’s __init__ method.
- freeze_variables_regex (str or None) — if zero or more characters at the beginning of the name of a trainable variable match this pattern, then this variable will be frozen during training. Setting it to None disables freezing of variables.
- regularizer — any valid TensorFlow regularizer.
- regularizer_params (dict) — dictionary that will be passed to the regularizer’s __init__ method.
- dtype — model dtype. Could be either tf.float16, tf.float32 or “mixed”. For details see the mixed precision training section in the docs.
- lr_policy — any valid learning rate policy function. For examples, see the optimizers.lr_policies module.
- lr_policy_params (dict) — dictionary containing lr_policy parameters.
- max_grad_norm (float) — maximum value of gradient norm. Clipping will be performed if some gradients exceed this value (this is checked for each variable independently).
- loss_scaling — could be a float or a string. If float, static loss scaling is applied. If string, the corresponding automatic loss scaling algorithm is used; must be one of “Backoff” or “LogMax” (case insensitive). Only used when dtype=“mixed”. For details see the mixed precision training section in the docs.
- loss_scaling_params (dict) — dictionary containing loss scaling parameters.
- summaries (list) — which summaries to log. Could contain “learning_rate”, “gradients”, “gradient_norm”, “global_gradient_norm”, “variables”, “variable_norm”, “loss_scale”.
- iter_size (int) — use this parameter to emulate large batches. The gradients will be accumulated for iter_size steps before applying an update.
- larc_params — dictionary with parameters for the LARC (or LARS) optimization algorithms. Can contain the following parameters:
  - larc_mode — could be either “scale” (LARS) or “clip” (LARC). Note that it works in addition to any other optimization algorithm since we treat it as adaptive gradient clipping and learning rate adjustment.
  - larc_eta (float) — LARC or LARS scaling parameter.
  - min_update (float) — minimal value of the LARC (LARS) update.
  - epsilon (float) — small number added to the gradient norm in the denominator for numerical stability.
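To make the nested dictionaries above concrete, the optimizer and learning rate policy entries could be configured as in the following fragment of base_params. The poly_decay import path and all values are assumptions for illustration; see the optimizers.lr_policies module for what is actually available.

    # Illustrative fragment of base_params; poly_decay and all values here
    # are assumptions -- see optimizers.lr_policies for available policies.
    from open_seq2seq.optimizers.lr_policies import poly_decay  # assumed path

    base_params = {
        # ... other parameters ...
        "optimizer": "Momentum",
        "optimizer_params": {"momentum": 0.9},  # passed to the optimizer's __init__
        "lr_policy": poly_decay,                # any valid lr policy function
        "lr_policy_params": {"learning_rate": 0.001, "power": 2.0},
        "max_grad_norm": 1.0,   # clip gradients whose norm exceeds this value
        "iter_size": 2,         # accumulate gradients for 2 steps per update
    }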
Note that some of the parameters are themselves config dictionaries for corresponding classes. To see the list of their configuration options, you should proceed to the corresponding class docs. For example, to see all supported data layer parameters, look into the docs for data.data_layer.DataLayer. Sometimes derived classes might define additional parameters; in that case you should be looking into both the parent class and its child. For example, look into models.encoder_decoder.EncoderDecoderModel, which defines parameters specific to models that can be expressed as encoder-decoder-loss blocks. You can also have a look at encoders.encoder.Encoder (which defines some parameters shared across all encoders) and encoders.ds2_encoder.DeepSpeech2Encoder (which additionally defines a set of DeepSpeech-2 specific parameters). A sketch of how such nested blocks might fit together is shown below.
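The fragment below sketches how an encoder-decoder model could be wired up in a config. The key names (“encoder”, “encoder_params”, etc.) and the imported classes are assumptions made for illustration; check the EncoderDecoderModel docs and the encoder/decoder class docs for the actual parameter names.

    # Illustrative fragment only -- the key names and classes below are
    # assumptions; consult the EncoderDecoderModel docs for actual names.
    from open_seq2seq.encoders import DeepSpeech2Encoder         # assumed path
    from open_seq2seq.decoders import FullyConnectedCTCDecoder   # assumed path

    base_params = {
        # ... other parameters ...
        "encoder": DeepSpeech2Encoder,
        "encoder_params": {
            # parameters shared across all encoders (encoders.encoder.Encoder)
            # plus DeepSpeech-2 specific ones would go here
        },
        "decoder": FullyConnectedCTCDecoder,
        "decoder_params": {},
    }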
Note
For convenience, all string or numerical config parameters can be overwritten by command line arguments. To overwrite parameters of the nested dictionaries, separate the dictionary and parameter name with “/”. For example, try to specify the --logdir argument or --lr_policy_params/learning_rate in your run.py execution.
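For instance, an invocation that overrides both a top-level and a nested parameter might look like this (the config path and values are illustrative):

    python run.py --config_file=example_configs/speech2text/ds2_small_1gpu.py --mode=train \
        --logdir=new_logdir --lr_policy_params/learning_rate=0.0001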