Speech Recognition
First, make sure you have followed the Speech installation instructions.
After that, you should be able to run the toy speech example with no errors:
python run.py --config_file=example_configs/speech2text/ds2_toy_config.py --mode=train_eval
How to train the model on the LibriSpeech dataset
First, you need to download and preprocess the LibriSpeech dataset. Assuming you are in the base folder, run:
sudo apt-get -y install sox libsox-dev
mkdir -p data
python scripts/import_librivox.py data/librispeech
Note that this will take a lot of time, since it needs to download, extract, and convert around 55GB of audio files. The final dataset size will be around 224GB including the archives and original compressed audio files; you can delete those to reduce it to around 106GB.
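If you want to check that preprocessing finished correctly, one quick option is to inspect the generated manifest CSV files. The snippet below is a minimal sketch; the path data/librispeech/librivox-train-clean-100.csv is an assumption that combines the target directory from the import command above with the manifest name used later on this page, so adjust it if your layout differs.
# Minimal sanity check of a generated manifest CSV.
# The exact path below is an assumption -- adjust it to your setup.
import csv

with open("data/librispeech/librivox-train-clean-100.csv") as f:
    reader = csv.reader(f)
    header = next(reader)      # manifest column names
    first_row = next(reader)   # first audio entry

print(header)
print(first_row)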
Now everything should be set up to train the model:
python run.py --config_file=example_configs/speech2text/ds2_small_1gpu.py --mode=train_eval
If you want to run evaluation or inference with the trained model, replace --mode=train_eval with --mode=eval or --mode=infer.
For inference you will need to provide an additional --infer_output_file=<output file> argument.
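For example, a complete inference command could look like this (ds2_out.txt is just a placeholder for the output file name):
python run.py --config_file=example_configs/speech2text/ds2_small_1gpu.py --mode=infer --infer_output_file=ds2_out.txt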
How to build your own language model
Language models usually help the speech2text decoder correct misspellings in recognized utterances. FullyConnectedCTCDecoder uses N-gram KenLM-based models. To build a language model, use the build_lm.py script.
For example, run the following commands for LibriSpeech:
export LS_DIR=/data/speech/LibriSpeech/
python scripts/build_lm.py --n 5 $LS_DIR/librivox-train-clean-100.csv $LS_DIR/librivox-train-clean-360.csv $LS_DIR/librivox-train-other-500.csv
As a result, you will get two files: the binary language model librivox-train-clean-100.binary and its trie librivox-train-clean-100.trie.
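Once these files are built, you can point FullyConnectedCTCDecoder at them in your model configuration. The snippet below is only a sketch: the import path and the parameter names (use_language_model, lm_path, trie_path, beam_width, alpha, beta) are assumptions based on typical DeepSpeech2 example configs, so check an existing config such as example_configs/speech2text/ds2_small_1gpu.py for the exact keys your version expects.
# Sketch: wiring the KenLM files produced by build_lm.py into a config.
# The import path and all parameter names are assumptions -- verify them
# against an existing example config before use.
from open_seq2seq.decoders import FullyConnectedCTCDecoder

base_params = {
    # ... encoder, data layer and training settings go here ...
    "decoder": FullyConnectedCTCDecoder,
    "decoder_params": {
        "use_language_model": True,                    # enable KenLM rescoring
        "lm_path": "librivox-train-clean-100.binary",  # binary LM from build_lm.py
        "trie_path": "librivox-train-clean-100.trie",  # trie from build_lm.py
        "beam_width": 512,                             # beam search width
        "alpha": 2.0,                                  # language model weight
        "beta": 1.0,                                   # word insertion weight
    },
}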