Text-To-Speech
==================

How to train the model on the `LJSpeech`_ dataset
----------------------------------------------------------------------------------------

First, you need to download the dataset. The dataset consists of metadata.csv and a directory of wav files. metadata.csv lists all the wav filenames and their corresponding transcripts, delimited by the '|' character.

In order to train the model, a vocab file must be specified. The vocab file should contain all the characters present in the dataset plus a special end-of-sentence token '~', with one character per line. An example vocab file called "vocab_tts.txt" is included in the openseq2seq/test_utils folder.

Inside the configuration files, be sure to change ``vocab_file`` and ``dataset_location`` to point to the location of the vocab file and the directory containing the wav files. The example configuration files assume that the dataset is split into train, val, and test sets; you have to split metadata.csv into three separate csv files called train.csv, val.csv, and test.csv on your own (a sketch for doing this is included at the end of this section). You can train the model via::

    python run.py --config_file=example_configs/text2speech/tacotron_float.py --mode=train_eval

If you do not want to split the dataset and want to train the model using the entire dataset, change ``dataset_files`` inside ``train_params`` to point to metadata.csv and run::

    python run.py --config_file=example_configs/text2speech/tacotron_float.py --mode=train

If you want to run evaluation/inference with the trained model, replace ``--mode=train_eval`` with ``--mode=eval`` or ``--mode=infer``. For inference you will need to provide the additional ``--infer_output_file`` argument; however, this argument is ignored for text-to-speech, and the generated audio files will instead be logged to the logdir.
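
As noted above, the example configs expect train.csv, val.csv, and test.csv. A minimal sketch for producing such a split from metadata.csv is shown below; the 90/5/5 ratios and the fixed random seed are assumptions, not toolkit requirements, so adjust them as needed::

    import random

    # metadata.csv is '|'-delimited: one "filename|transcript|..." line per clip.
    with open("metadata.csv", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]

    random.seed(0)        # assumed seed, only for reproducibility
    random.shuffle(lines)

    # Assumed 90/5/5 split; change the ratios to taste.
    n_train = int(0.90 * len(lines))
    n_val = int(0.05 * len(lines))

    splits = {
        "train.csv": lines[:n_train],
        "val.csv": lines[n_train:n_train + n_val],
        "test.csv": lines[n_train + n_val:],
    }

    for name, subset in splits.items():
        with open(name, "w", encoding="utf-8") as out:
            out.write("\n".join(subset) + "\n")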
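
Similarly, if you would rather build a vocab file from your own transcripts instead of using the bundled vocab_tts.txt, a sketch along the following lines can work. The output filename my_vocab.txt is a placeholder, and you should compare the result against openseq2seq/test_utils/vocab_tts.txt to make sure whitespace and punctuation are represented the way the toolkit expects::

    # Collect every distinct character appearing in the transcript column,
    # then write one character per line followed by the '~' end-of-sentence token.
    chars = set()
    with open("metadata.csv", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("|")
            if len(fields) >= 2:
                chars.update(fields[1])   # transcript text

    with open("my_vocab.txt", "w", encoding="utf-8") as out:
        for ch in sorted(chars):
            out.write(ch + "\n")
        out.write("~\n")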