Input – Passing Data to the Pipeline

Input Callables & Iterables

The input data for the pipeline is provided by an input callable or iterable, i.e. an object which implements the CallableBase or IterableBase interface. These classes are expected to provide the data and the format blueprint as follows:

Note

The flattened data sequence returned by the input callable/iterable is converted back into the structured format automatically by the pipeline using the provided blueprint, so that the user does not need to worry about the conversion and can assume that SampleDataGroup objects are used throughout the pipeline.
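To illustrate the round trip described in the note, here is a minimal sketch. It assumes a blueprint represented as a nested dict whose `None` leaves mark data fields, and plain dicts stand in for SampleDataGroup. The helpers `flatten` and `unflatten` are hypothetical and are not the package's API. The point is that a fixed traversal order of the blueprint is all that is needed to go from a structured sample to a flat sequence and back.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical blueprint format for this sketch: nested dicts whose leaves are None.
# Each None leaf marks one data field; dict nesting mirrors the sample structure.
Blueprint = Dict[str, Any]


def flatten(sample: Dict[str, Any], blueprint: Blueprint) -> Tuple[Any, ...]:
    """Flatten a nested sample into a tuple, visiting fields in blueprint order."""
    out: List[Any] = []
    for key, sub in blueprint.items():
        if isinstance(sub, dict):
            out.extend(flatten(sample[key], sub))  # recurse into a sub-group
        else:
            out.append(sample[key])  # leaf: one element of the flat sequence
    return tuple(out)


def unflatten(flat: Tuple[Any, ...], blueprint: Blueprint) -> Dict[str, Any]:
    """Rebuild the nested structure from a flat tuple using the same blueprint."""
    it: Iterator[Any] = iter(flat)
    missing = object()  # sentinel for "no more elements"

    def build(bp: Blueprint) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        for key, sub in bp.items():
            if isinstance(sub, dict):
                result[key] = build(sub)
            else:
                value = next(it, missing)
                if value is missing:
                    raise ValueError("flat sequence is shorter than the blueprint")
                result[key] = value
        return result

    structured = build(blueprint)
    if next(it, missing) is not missing:
        raise ValueError("flat sequence is longer than the blueprint")
    return structured


blueprint = {"image": None, "annotations": {"boxes": None, "labels": None}}
sample = {"image": "img0", "annotations": {"boxes": [[0, 0, 1, 1]], "labels": [3]}}
flat = flatten(sample, blueprint)
print(flat)  # ('img0', [[0, 0, 1, 1]], [3])
assert unflatten(flat, blueprint) == sample
```

Because both directions walk the blueprint in the same order, the flat sequence itself carries no structural information; everything needed to restore the structure lives in the blueprint.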

The blueprint provided by the input callable/iterable is used by the pipeline to derive the data format after each processing step and to check that each step is compatible with the format it receives (see Pipeline & Processing Steps).
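The format propagation can be pictured with the following sketch. It assumes a hypothetical `Step` class that declares which fields it reads and which it adds; the real processing steps of the package expose this information differently. Starting from the input blueprint, each step checks that its required fields are present and then derives its output format, so an incompatible step fails before any data is processed, and the last derived format is the pipeline's output format.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Blueprint = Dict[str, Any]  # nested dicts; None leaves mark data fields
Path = Tuple[str, ...]      # e.g. ("annotations", "boxes")


def has_path(bp: Blueprint, path: Path) -> bool:
    """Return True if the blueprint contains the (possibly nested) field."""
    node: Any = bp
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


@dataclass
class Step:
    """Hypothetical processing step: declares the fields it reads and adds."""

    name: str
    requires: List[Path] = field(default_factory=list)
    adds: List[Path] = field(default_factory=list)

    def output_format(self, bp: Blueprint) -> Blueprint:
        """Check the input format, then derive the output format of this step."""
        missing = [p for p in self.requires if not has_path(bp, p)]
        if missing:
            raise ValueError(f"{self.name}: input format lacks fields {missing}")
        out = copy.deepcopy(bp)  # do not modify the input format
        for path in self.adds:
            node = out
            for key in path[:-1]:
                node = node.setdefault(key, {})  # create sub-groups as needed
            node[path[-1]] = None
        return out


def propagate_format(input_bp: Blueprint, steps: List[Step]) -> Blueprint:
    """Derive the format after each step; fails at the first incompatible step."""
    bp = input_bp
    for step in steps:
        bp = step.output_format(bp)
    return bp  # format of the pipeline output


steps = [
    Step("decode", requires=[("image",)], adds=[("image_decoded",)]),
    Step("resize", requires=[("image_decoded",)]),
]
print(propagate_format({"image": None}, steps))  # {'image': None, 'image_decoded': None}
```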

Note

The data format for the output of the pipeline is in turn needed to auto-convert the output from the flat sequence back into the structured format, e.g. using the DALIStructuredOutputIterator (see Output – DALI Structured Output Iterator).

See also

Note that the package provides pre-defined input callables/iterables which cover a wide range of use cases, so there is often no need to implement a custom callable/iterable. These are:

The pre-defined input callables/iterables are designed to be agnostic of the actual dataset, which is interfaced through a DataProvider class (see section below). The input data provider is expected to provide the data in the structured format (i.e. as a SampleDataGroup object) one sample at a time, and the conversion to the flattened format (and batching of samples if needed) is performed internally by the input callable/iterable. In this way, the conversion is transparent to the user of the package when using the included input callables/iterables.
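The division of labor described above can be sketched as follows. The `Provider` protocol, its methods `get_sample` and `blueprint`, and the class `BatchedInputIterable` are hypothetical names chosen for this illustration and do not mirror the package's actual classes. The iterable never looks at dataset specifics: it only asks the provider for structured samples, flattens them, and groups them into batches (one list per flat field). Dropping an incomplete last batch is an assumption of this sketch.

```python
from typing import Any, Dict, Iterator, List, Protocol, Tuple

Blueprint = Dict[str, Any]  # nested dicts; None leaves mark data fields


class Provider(Protocol):
    """Hypothetical minimal data provider interface used by this sketch."""

    def __len__(self) -> int: ...
    def get_sample(self, index: int) -> Dict[str, Any]: ...
    def blueprint(self) -> Blueprint: ...


def flatten(sample: Dict[str, Any], bp: Blueprint) -> List[Any]:
    """Flatten a structured sample in blueprint order."""
    out: List[Any] = []
    for key, sub in bp.items():
        out.extend(flatten(sample[key], sub) if isinstance(sub, dict) else [sample[key]])
    return out


class BatchedInputIterable:
    """Dataset-agnostic iterable: picks indices, loads via the provider, flattens, batches."""

    def __init__(self, provider: Provider, batch_size: int):
        self.provider = provider
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[Tuple[List[Any], ...]]:
        bp = self.provider.blueprint()
        batch: List[List[Any]] = []
        for index in range(len(self.provider)):  # decide which samples to load
            batch.append(flatten(self.provider.get_sample(index), bp))
            if len(batch) == self.batch_size:
                # Transpose: one list per flat field, each with batch_size entries.
                yield tuple(list(values) for values in zip(*batch))
                batch = []
        # An incomplete last batch is dropped in this sketch.


class ToyProvider:
    """Dataset-specific part: knows how to load sample i and describe its format."""

    def __len__(self) -> int:
        return 5

    def get_sample(self, index: int) -> Dict[str, Any]:
        return {"x": index, "meta": {"id": f"s{index}"}}

    def blueprint(self) -> Blueprint:
        return {"x": None, "meta": {"id": None}}


for fields in BatchedInputIterable(ToyProvider(), batch_size=2):
    print(fields)
# ([0, 1], ['s0', 's1'])
# ([2, 3], ['s2', 's3'])
```

Swapping in a different dataset only requires another provider; `BatchedInputIterable` stays unchanged.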

Input Data Provider

The pre-defined input callables/iterables are generic, and do not assume a specific dataset. Dataset- (and use-case-) specific functionality is provided by a class implementing the DataProvider interface. Such data providers can be used by the input callable/iterable to read the actual data from the dataset. The task of data reading is split as follows:

  • Input callable/iterable:

    • Define which sample(s) to load

    • Use the data provider to load the actual data

    • Convert the data to the flat format which needs to be returned by the input callable/iterable

    • When outputting the data format (via used_sample_data_structure() of the respective input callable or iterable), this information is internally obtained from the data provider

    • Similarly, the length of the dataset is internally derived from the data provider (but may be modified, e.g. by sharding, dropping of samples, converting to number of batches, etc.)

  • Data provider:

This means that while the pre-defined input callables/iterables are a fixed and re-usable part of the package, the data provider is specific and needs to be implemented by the user of the package.
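As a rough picture of the user-implemented part, the sketch below defines a hypothetical abstract base class `DataProviderSketch` (not the package's actual DataProvider interface; the method names are assumptions). It also shows how the input side might derive its length from the provider, as mentioned in the list above, using a hypothetical `effective_length` helper that shards the dataset and converts the per-shard sample count to a number of batches. The specific policy of dropping samples that do not divide evenly across shards is an assumption of this sketch.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Dict


class DataProviderSketch(ABC):
    """Hypothetical stand-in for the DataProvider interface (dataset-specific part)."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of samples in the underlying dataset."""

    @abstractmethod
    def get_sample(self, index: int) -> Dict[str, Any]:
        """Load one sample in the structured format."""

    @abstractmethod
    def get_blueprint(self) -> Dict[str, Any]:
        """Describe the data format of a single sample."""


def effective_length(provider: DataProviderSketch, num_shards: int, batch_size: int,
                     drop_last: bool = True) -> int:
    """Length seen by the input side: samples per shard, converted to batches."""
    per_shard = len(provider) // num_shards  # leftover samples are dropped (assumption)
    if drop_last:
        return per_shard // batch_size  # only full batches
    return math.ceil(per_shard / batch_size)  # keep a final partial batch


class ToyProvider(DataProviderSketch):
    def __len__(self) -> int:
        return 103

    def get_sample(self, index: int) -> Dict[str, Any]:
        return {"value": index}

    def get_blueprint(self) -> Dict[str, Any]:
        return {"value": None}


print(effective_length(ToyProvider(), num_shards=4, batch_size=8))  # 3
```

Here 103 samples split over 4 shards give 25 samples per shard, i.e. 3 full batches of 8 (or 4 batches if the last partial batch is kept).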

See also

Note that there are data providers for the NuScenes dataset in the examples folder of the package (packages/dali_pipeline_framework/examples/pipeline_setup/additional_impl/data_loading). They can be used as a reference for implementing data providers for other datasets. They do not read all the data available in NuScenes (e.g. lidar point clouds), but rather focus on the data needed for the use cases at hand. Depending on your use case, it may be possible to use them as is, or at least as a starting point for a customized implementation.

Please also see the Data Loading for NuScenes page for more details on the design of the data loaders.

Also, note that some reuse between use cases is possible by implementing common functionality which can be shared by different data providers. This is also discussed in the Data Loading for NuScenes page.