Installation Instructions¶
Pre-built docker container¶
The recommended way to install OpenSeq2Seq is to use the NVIDIA TensorFlow Docker container.
Install CUDA 10 from https://developer.nvidia.com/cuda-downloads
Install Docker (see https://docs.docker.com/install/linux/docker-ce/ubuntu/#prerequisites).
Use a version compatible with nvidia-docker, e.g.:
sudo apt-get install docker-ce=5:18.09.1~3-0~ubuntu-xenial
Verify the installation:
sudo docker container run hello-world
Add yourself to the docker group:
sudo usermod -a -G docker $USER
Log out and log back in afterwards so that the group change takes effect.
Install nvidia-docker2 (see documentation):
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
Pull the latest NVIDIA TensorFlow container from NVIDIA GPU Cloud (see https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html):
docker pull nvcr.io/nvidia/tensorflow:19.05-py3
Run the container:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm nvcr.io/nvidia/tensorflow:19.05-py3
Pull OpenSeq2Seq from GitHub inside the container:
git clone https://github.com/NVIDIA/OpenSeq2Seq
General installation¶
If you are feeling adventurous, then feel free to try these instructions.
OpenSeq2Seq supports Python >= 3.5. We recommend using the Anaconda Python distribution.
Note
Currently, TensorFlow 1.x doesn’t support Python 3.7. Please make sure that your Anaconda environment includes a Python version that is compatible with TensorFlow. For example, you can download Anaconda with Python 3.6 for Linux:
wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
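Before installing TensorFlow, it can help to confirm that the active interpreter falls in the range described in the note above. The following is a minimal sketch, assuming the acceptable range is Python 3.5 or 3.6 (the lower bound and the excluded 3.7 are taken from the text; other TensorFlow builds may differ). The helper name `is_supported_python` is hypothetical and not part of OpenSeq2Seq.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True for Python 3.5 or 3.6.

    Assumption: the bounds mirror the note above (>= 3.5, and 3.7 not
    supported by TensorFlow 1.x). This helper is illustrative only.
    """
    major_minor = tuple(version_info[:2])
    return (3, 5) <= major_minor < (3, 7)


# Report on the interpreter that is running this snippet.
status = "supported" if is_supported_python() else "not supported"
print("Python %d.%d is %s" % (sys.version_info[0], sys.version_info[1], status))
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings character by character.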
Clone OpenSeq2Seq and install Python requirements:
git clone https://github.com/NVIDIA/OpenSeq2Seq
cd OpenSeq2Seq
pip install -r requirements.txt
If you would like to get higher speech recognition accuracy with a custom CTC beam search decoder, you have to build TensorFlow from source as described in Installation for speech recognition. Otherwise you can just install TensorFlow using pip:
pip install tensorflow-gpu
Installation of OpenSeq2Seq for speech recognition¶
CTC-based speech recognition models can use the following decoders to get a transcription out of a model’s state:
- greedy decoder, the fastest, but might yield spelling errors (can be enabled with "use_language_model": False)
- beam search decoder with language model (LM) rescoring, the most accurate, but the slowest
You can find more information about these decoders in the Decoders section.
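As a rough illustration of how the decoder choice is expressed, the sketch below shows two hypothetical parameter dictionaries. Only the "use_language_model" key comes from the text above. The variable names, the helper `decoder_kind`, and the assumption that a value of True selects beam search with LM rescoring are illustrative and should be checked against the example configs.

```python
# Hypothetical config fragments. Only the "use_language_model" key is taken
# from the text above; everything else here is an illustrative assumption.
greedy_decoder_params = {"use_language_model": False}
beam_search_decoder_params = {"use_language_model": True}


def decoder_kind(decoder_params):
    """Name the decoding strategy selected by a decoder parameter dict.

    Assumption: a missing key is treated as False (greedy decoding).
    """
    if decoder_params.get("use_language_model", False):
        return "beam search with LM rescoring"
    return "greedy"


print(decoder_kind(greedy_decoder_params))
print(decoder_kind(beam_search_decoder_params))
```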
The CTC beam search decoder with language model rescoring is an optional component and can be used for speech recognition inference only.
There are two implementations of the CTC beam search decoder with LM rescoring in OpenSeq2Seq:
- Baidu CTC decoder (recommended). It can be installed with the scripts/install_decoders.sh command. To test the installation, please run python scripts/ctc_decoders_test.py.
- Custom native TF op (rather deprecated). See installation instructions below.
How to build a custom native TF op for CTC decoder with language model (optional)¶
First of all, make sure that you have installed CUDA >= 10.0, cuDNN >= 7.4, and NCCL >= 2.3.
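The minimum versions above can be checked programmatically once the installed versions are known. The following is a minimal sketch: the minimums come from the text, while the helper names (`parse_version`, `meets_minimum`, `unmet_requirements`) and the sample version numbers are hypothetical. How to obtain the installed versions is left out, since it depends on the system.

```python
import re

# Minimum versions quoted in the text above.
MINIMUM_VERSIONS = {"CUDA": "10.0", "cuDNN": "7.4", "NCCL": "2.3"}


def parse_version(text):
    """Turn a dotted version such as '7.4.2' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed_versions):
    """Return names of components that are missing or older than required."""
    return [
        name
        for name, minimum in MINIMUM_VERSIONS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    ]


# Illustrative input only; these version numbers are made up for the example.
print(unmet_requirements({"CUDA": "10.1", "cuDNN": "7.3.1", "NCCL": "2.4.8"}))
```

Parsing into integer tuples matters here: compared as plain strings, "9.2" would rank above "10.0".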
Install boost:
sudo apt-get install libboost-all-dev
Build KenLM (assuming you are in the OpenSeq2Seq folder):
sudo apt-get install cmake
./scripts/install_kenlm.sh
It will install KenLM in the OpenSeq2Seq directory. If you installed KenLM in a different location, you will need to set the corresponding symlink:
cd OpenSeq2Seq/ctc_decoder_with_lm
ln -s <kenlm location> kenlm
cd ..
Download and build the latest stable 1.x TensorFlow (make sure that you have Bazel >= 0.15):
git clone https://github.com/tensorflow/tensorflow -b r1.13.1
cd tensorflow
./configure
ln -s <OpenSeq2Seq location>/ctc_decoder_with_lm ./tensorflow/core/user_ops/
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --copt=-O3 --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/<your tensorflow build>.whl
Alternatively, follow the latest TensorFlow installation instructions, and then run the following commands to build the custom CTC decoder (assuming you are in the tensorflow directory):
ln -s <OpenSeq2Seq location>/ctc_decoder_with_lm ./tensorflow/core/user_ops/
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --copt=-O3 //tensorflow/core/user_ops/ctc_decoder_with_lm:libctc_decoder_with_kenlm.so //tensorflow/core/user_ops/ctc_decoder_with_lm:generate_trie
cp bazel-bin/tensorflow/core/user_ops/ctc_decoder_with_lm/*.so tensorflow/core/user_ops/ctc_decoder_with_lm/
cp bazel-bin/tensorflow/core/user_ops/ctc_decoder_with_lm/generate_trie tensorflow/core/user_ops/ctc_decoder_with_lm/
Please add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to bazel build ... if you are using GCC 5 or later.
Validate the TensorFlow installation:
python -c "import tensorflow as tf; print(tf.__version__)"
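The one-line check above can be extended slightly to confirm that the installed build belongs to the 1.x series that this document targets. This is a sketch under stated assumptions: the helper names are hypothetical, and the only TensorFlow attribute used is `tf.__version__`, as in the command above.

```python
import importlib


def installed_tensorflow_version():
    """Return tf.__version__ as a string, or None if TensorFlow is missing."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None
    return tf.__version__


def is_tensorflow_1x(version):
    """True when the version string has major version 1, e.g. '1.13.1'."""
    return version is not None and version.split(".")[0] == "1"
```

Calling `is_tensorflow_1x(installed_tensorflow_version())` in the target environment reports whether a 1.x build is importable; a missing installation is reported as False.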
How to download a language model for a CTC decoder (optional)¶
In order to achieve the best accuracy, you should download the language model from OpenSLR using the download_lm.sh script (this might take some time):
./scripts/download_lm.sh
After that you should be able to run the toy speech example with the CTC beam search decoder enabled:
python run.py --config_file=example_configs/speech2text/ds2_toy_config.py --mode=train_eval
Horovod installation¶
For multi-GPU and distributed training we recommend installing Horovod.
After TensorFlow and all other requirements are installed, install mpi4py:
pip install mpi4py
and then follow these steps to install Horovod.
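After both packages are installed, a quick lookup can confirm that Python is able to locate them. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and it only checks that the modules can be found, not that MPI or Horovod work at runtime.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# "mpi4py" and "horovod" are the import names of the packages named above.
missing = missing_modules(["mpi4py", "horovod"])
print("Missing modules:", missing if missing else "none")
```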
Running tests¶
To check that everything is installed correctly, we recommend running the unit tests:
bash scripts/run_all_tests.sh
It might take up to 30 minutes. You should see a lot of output, but no errors at the end.
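Because the run is long and produces a lot of output, it may be convenient to wrap it in a small script that reports the outcome. The following is a minimal sketch, assuming the test script exits with a non-zero status when something fails (the text above does not state this, so the output should still be inspected). The helper name `run_and_check` is hypothetical.

```python
import subprocess
import sys


def run_and_check(command):
    """Run a command, let its output stream through, and report success.

    Assumption: success is signalled by a zero exit status.
    """
    completed = subprocess.run(command)
    return completed.returncode == 0


# Harmless demonstration using the current Python interpreter.
print(run_and_check([sys.executable, "-c", "print('ok')"]))
```

From the OpenSeq2Seq folder, the same helper could be called as `run_and_check(["bash", "scripts/run_all_tests.sh"])`.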
Training¶
To train without Horovod:
python run.py --config_file=... --mode=train_eval --enable_logs
When training with Horovod, use the following command (don’t forget to substitute a valid config_file path and the number of GPUs):
mpiexec --allow-run-as-root -np <num_gpus> python run.py --config_file=... --mode=train_eval --use_horovod=True --enable_logs
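The two invocations above differ only in the mpiexec launcher and the --use_horovod flag, which the sketch below makes explicit by assembling either command from the same inputs. The function name `build_train_command` and the GPU count of 4 are hypothetical. All flags are copied from the commands above, and the config path is the toy example mentioned earlier in this document.

```python
def build_train_command(config_file, num_gpus=None):
    """Build the argument list for a train_eval run.

    With num_gpus=None the plain command is returned; otherwise the
    Horovod variant launched through mpiexec is returned.
    """
    run_args = [
        "python", "run.py",
        "--config_file=" + config_file,
        "--mode=train_eval",
    ]
    if num_gpus is None:
        return run_args + ["--enable_logs"]
    if num_gpus < 1:
        raise ValueError("num_gpus must be a positive integer")
    launcher = ["mpiexec", "--allow-run-as-root", "-np", str(num_gpus)]
    return launcher + run_args + ["--use_horovod=True", "--enable_logs"]


# Illustrative call; the GPU count is an assumption for the example.
command = build_train_command(
    "example_configs/speech2text/ds2_toy_config.py", num_gpus=4
)
print(" ".join(command))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting.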