FLEURS

This config can be used to prepare the FLEURS dataset in the NeMo format. It produces a manifest for the dev split of the Armenian language. The config performs the following data processing steps:

  1. Downloads FLEURS data

  2. Calculates the length of wav files
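As an illustration of the second step, the duration of a WAV file can be derived from its header as the number of frames divided by the sample rate. The sketch below uses only Python's standard `wave` module; the function name `wav_duration_seconds` is hypothetical and this is not necessarily how the config's own processor computes the value.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds.

    Hypothetical helper for illustration only; it reads the header
    fields and does not decode the audio samples.
    """
    with wave.open(path, "rb") as wav_file:
        n_frames = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
    # One frame holds one sample per channel, so frames / rate = seconds.
    return n_frames / sample_rate
```

The standard `wave` module handles uncompressed PCM WAV files only, so other encodings would need a different reader.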

Required arguments:

  • workspace_dir: specify the workspace folder where all audio files will be stored.

Note that you can customize any part of this config either directly or from the command line.

Output format

This config generates the following output manifest file:

  • ${workspace_dir}/${final_manifest} - dev subset of the data.

The output manifest contains the following keys:

  • audio_filepath (str): relative path to the audio file.

  • text (str): transcription (lower-case without punctuation).

  • duration (float): audio duration in seconds.
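NeMo manifests are stored as JSON lines, with one JSON object per utterance. As a rough sketch of how the keys above might be consumed, the snippet below parses manifest lines and checks that each entry carries the three documented keys. The function name `parse_manifest` and the sample entries (paths, text, durations) are made up for illustration and do not come from the real dataset.

```python
import json

# Keys documented for this config's output manifest.
REQUIRED_KEYS = ("audio_filepath", "text", "duration")


def parse_manifest(lines):
    """Parse JSON-lines manifest text into a list of dicts.

    Hypothetical helper: raises ValueError if an entry lacks a documented key.
    """
    entries = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"line {line_number}: missing keys {missing}")
        entries.append(entry)
    return entries


# Made-up sample entries, shaped like the manifest described above.
sample_lines = [
    '{"audio_filepath": "audio/example_1.wav", "text": "example transcription one", "duration": 3.5}',
    '{"audio_filepath": "audio/example_2.wav", "text": "example transcription two", "duration": 2.5}',
]

entries = parse_manifest(sample_lines)
total_seconds = sum(entry["duration"] for entry in entries)
print(f"{len(entries)} utterances, {total_seconds:.1f} seconds in total")
```

To read an actual manifest, the open file object can be passed in place of `sample_lines`, since iterating over a text file yields its lines.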

Config link: dataset_configs/armenian/fleurs/config.yaml