Speech Data Processor

Speech Data Processor (SDP) is a toolkit to make it easy to:

Write code to process a new dataset, minimizing the amount of boilerplate code required.
Share the steps for processing a speech dataset.

SDP is hosted on GitHub at NVIDIA/NeMo-speech-data-processor. It is mainly used to prepare datasets for the NeMo toolkit.

SDP’s philosophy is to represent each processing operation as a ‘processor’ class. A processor takes a path to a NeMo-style data manifest as input (or a path to the raw data directory if you do not have a NeMo-style manifest to start with), applies some processing to it, and then saves the result as an output manifest file.
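To make the processor idea concrete, here is a minimal sketch in plain Python. It is not SDP's actual API: the class name `LowercaseText`, its constructor arguments, and the `process` method are hypothetical and chosen only for illustration. The sketch assumes the usual NeMo-style manifest layout, one JSON object per line with `audio_filepath`, `duration`, and `text` fields.

```python
import json
import os
import tempfile


class LowercaseText:
    """Hypothetical processor: lowercases the `text` field of every entry.

    Illustrative only; the real SDP base classes and method names may differ.
    """

    def __init__(self, input_manifest_file, output_manifest_file):
        self.input_manifest_file = input_manifest_file
        self.output_manifest_file = output_manifest_file

    def process(self):
        # Read the input manifest line by line, transform each entry,
        # and write the result to the output manifest.
        with open(self.input_manifest_file, encoding="utf-8") as fin, \
                open(self.output_manifest_file, "w", encoding="utf-8") as fout:
            for line in fin:
                if not line.strip():
                    continue  # skip blank lines
                entry = json.loads(line)
                entry["text"] = entry["text"].lower()
                fout.write(json.dumps(entry, ensure_ascii=False) + "\n")


def demo():
    """Build a one-line manifest, run the processor, return the output entries."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "manifest.json")
    dst = os.path.join(workdir, "manifest_lowercased.json")
    sample = {"audio_filepath": "a.wav", "duration": 1.5, "text": "Hello World"}
    with open(src, "w", encoding="utf-8") as f:
        f.write(json.dumps(sample) + "\n")
    LowercaseText(src, dst).process()
    with open(dst, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Calling `demo()` writes a temporary manifest, runs the hypothetical processor on it, and returns the entries of the output manifest. The point of the pattern is that every step has the same shape (manifest in, manifest out), so steps can be chained freely.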

You specify which processors you want to run using a YAML config file. Many common processing operations are provided, and it is easy to add your own.
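The following sketch shows, in plain Python, how a config-driven chain of processing steps might work in principle. It does not reproduce SDP's real config schema or runner. The `CONFIG` list stands in for what a parsed YAML file could look like, and the keys `name` and `params`, the `REGISTRY` dict, and both step functions are assumptions made for illustration.

```python
import json
import os
import tempfile


def drop_long_utterances(entries, max_duration):
    """Hypothetical step: keep only entries no longer than `max_duration` seconds."""
    return [e for e in entries if e["duration"] <= max_duration]


def lowercase_text(entries):
    """Hypothetical step: lowercase the transcript of every entry."""
    return [{**e, "text": e["text"].lower()} for e in entries]


# Maps the step names used in the config to their implementations (illustrative).
REGISTRY = {
    "drop_long_utterances": drop_long_utterances,
    "lowercase_text": lowercase_text,
}

# Stand-in for a parsed YAML config: an ordered list of steps with parameters.
CONFIG = [
    {"name": "drop_long_utterances", "params": {"max_duration": 20.0}},
    {"name": "lowercase_text", "params": {}},
]


def read_manifest(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_manifest(path, entries):
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def run_pipeline(config, input_manifest, workdir):
    """Run each configured step in order; each step's output manifest feeds the next."""
    current = input_manifest
    for i, step in enumerate(config):
        output = os.path.join(workdir, f"step_{i}_{step['name']}.json")
        entries = REGISTRY[step["name"]](read_manifest(current), **step["params"])
        write_manifest(output, entries)
        current = output
    return current


def demo():
    """Run the two-step pipeline on a small sample manifest and return the final entries."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "manifest.json")
    write_manifest(src, [
        {"audio_filepath": "a.wav", "duration": 3.0, "text": "Hello"},
        {"audio_filepath": "b.wav", "duration": 45.0, "text": "Too Long"},
        {"audio_filepath": "c.wav", "duration": 5.5, "text": "World"},
    ])
    return read_manifest(run_pipeline(CONFIG, src, workdir))
```

Here `demo()` drops the 45-second utterance and lowercases the remaining transcripts. Keeping the list of steps in data rather than in code is what makes a processing recipe easy to share: someone else can rerun the same steps from the config alone.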

Overview diagram of Speech Data Processor

To learn more about SDP, have a look at the following sections.

How to write config files?
How to add a new processor?
Supported datasets
API