Audio books (Armenian)

This config can be used as an example of how to process audiobooks in the Armenian language and prepare a dataset in the NeMo format.

This config performs the following data processing steps:

Create the initial manifest by collecting all available files with the mp3 extension in the raw_data_dir folder.
Convert mp3 to wav format using the FFmpeg suite, downsampling to a 16000 Hz sample rate and merging all audio channels into a single mono track.
Compute the duration of each audio file in seconds and save it in the duration field.
Filter out broken files with a duration shorter than 0 seconds. You can change this threshold directly in the config file.
Predict the transcription using the openai/whisper-large-v3 ASR model and save the results in the pred_text field.
Drop every entry that contains non-Armenian characters.
Normalize some text examples with SubRegex.
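
The conversion and filtering steps above can be sketched in plain Python. This is an illustrative sketch only, not the processors the config actually runs: the helper names ffmpeg_command and keep_entry are hypothetical, and the set of characters treated as "Armenian" (the Unicode Armenian block plus whitespace and a few punctuation marks) is an assumption that may differ from the real config.

```python
import re

# Assumption: "Armenian characters" means the Unicode Armenian block
# (U+0530-U+058F) plus whitespace and a few common punctuation marks.
# The character set used by the real config may differ.
NON_ARMENIAN = re.compile(r"[^\u0530-\u058F\s.,:;!?\-«»]")


def ffmpeg_command(mp3_path: str, wav_path: str) -> list:
    """Build an FFmpeg call that resamples to 16 kHz and downmixes to mono."""
    # -ar sets the output sample rate, -ac the number of output channels,
    # -y overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", mp3_path, "-ar", "16000", "-ac", "1", wav_path]


def keep_entry(entry: dict, min_duration: float = 0.0) -> bool:
    """Return True if a manifest entry survives both filters."""
    # Drop broken files: duration shorter than the threshold (0 seconds here).
    if entry["duration"] < min_duration:
        return False
    # Drop entries whose predicted text has any character outside the allowed set.
    return NON_ARMENIAN.search(entry["pred_text"]) is None
```

The command list can be passed to subprocess.run to perform the actual conversion, and raising min_duration mirrors the effect of changing the duration threshold in the config.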

Required arguments:

workspace_dir: specify the workspace folder where all audio files will be stored.

Note that you can customize any part of this config either directly or from the command line.

Output format:

${workspace_dir}/final_manifest.json - final manifest with all the data.

The output manifest contains the following fields:

audio_filepath (str): relative path to the audio file.
text (str): transcription predicted by Whisper (Upper-case with punctuation).
duration (float): audio duration in seconds.
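
For a quick sanity check of the output, the manifest can be loaded with a few lines of Python. This sketch assumes the usual NeMo manifest layout of one JSON object per line; read_manifest and total_hours are hypothetical helper names, not part of any library.

```python
import json


def read_manifest(path: str) -> list:
    """Read a manifest file that stores one JSON object per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries


def total_hours(entries: list) -> float:
    """Sum the duration field (in seconds) and convert it to hours."""
    return sum(e["duration"] for e in entries) / 3600.0
```

Calling read_manifest on the final_manifest.json file in the workspace folder returns a list of dictionaries with the audio_filepath, text and duration fields described above.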