Multi-GPU and Distributed Training

OpenSeq2Seq supports two modes for parallel training: a simple multi-tower approach using native Distributed TensorFlow, and a Horovod-based approach.

Standard TensorFlow distributed training

For multi-GPU training with the native Distributed TensorFlow approach, you need to set use_horovod: False and num_gpus in the configuration file. To start training, run:

python run.py --config_file=... --mode=train_eval
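The two relevant parameters live in the Python configuration file. A minimal sketch (the base_params dictionary name and parameter spellings follow the pattern used in OpenSeq2Seq example configs; all model-specific parameters are elided):

```python
# Fragment of an OpenSeq2Seq-style Python config file -- a sketch,
# not a complete config; model, optimizer, and data layer
# parameters are omitted.
base_params = {
    "use_horovod": False,  # native Distributed TensorFlow mode
    "num_gpus": 4,         # number of GPUs for the multi-tower approach
    # ... remaining parameters go here ...
}
```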


Horovod-based distributed training

To use Horovod you need to set use_horovod: True in the config and launch training with mpiexec:

mpiexec -np <num_gpus> python run.py --config_file=... --mode=train_eval --use_horovod=True --enable_logs

You can use Horovod both for multi-GPU and for multi-node training.
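For a multi-node run, the total process count and the host list are passed to mpiexec directly. A sketch, following standard Open MPI / Horovod conventions (the hostnames and slot counts below are illustrative, not part of OpenSeq2Seq itself):

```shell
# Launch 8 training processes across two hypothetical 4-GPU nodes:
# -np gives the total process count, -H lists host:slots pairs.
mpiexec -np 8 -H node1:4,node2:4 \
    python run.py --config_file=... --mode=train_eval --use_horovod=True --enable_logs
```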


The num_gpus parameter is ignored when use_horovod is set to True. In that case, the number of GPUs (and nodes) to use is specified on the command line via mpiexec arguments such as -np.