Output – DALI Structured Output Iterator

In general, a DALI pipeline emits a flat sequence of tensors (or DALI tensor lists). In our framework, these are the results of calling get_data() on the SampleDataGroup object used in the pipeline.

For complex data formats, a flat list quickly becomes hard to manage. Therefore, we introduce the DALIStructuredOutputIterator class, which re-assembles the data into its original structure.
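The idea of re-assembly can be illustrated with a minimal, framework-independent sketch. It does not use DALI or SampleDataGroup; the helper names flatten() and reassemble(), and the string placeholders standing in for tensors, are hypothetical and only show how a flat sequence can be mapped back onto a known nested layout.

```python
from typing import Any, Iterator


def flatten(nested: dict) -> list:
    """Collect the leaf values of a nested dict in depth-first order."""
    flat = []
    for value in nested.values():
        if isinstance(value, dict):
            flat.extend(flatten(value))
        else:
            flat.append(value)
    return flat


def reassemble(template: dict, flat_values: Iterator[Any]) -> dict:
    """Rebuild the layout of `template`, consuming values in the same depth-first order."""
    result = {}
    for key, value in template.items():
        if isinstance(value, dict):
            result[key] = reassemble(value, flat_values)
        else:
            result[key] = next(flat_values)
    return result


# Placeholder strings stand in for tensors (assumption for illustration only).
sample = {
    "image": "img_tensor",
    "annotation": {"boxes": "box_tensor", "labels": "label_tensor"},
}

flat = flatten(sample)                      # what a pipeline would emit
restored = reassemble(sample, iter(flat))   # what the training loop wants to see
```

The flat list only carries the values, so the structure has to be known on the receiving side; in this sketch the template dictionary plays that role.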

The DALIStructuredOutputIterator is designed to be a drop-in replacement for a PyTorch DataLoader. Apart from the re-assembly of the data, this is achieved by:

  • Using the same interface as a PyTorch DataLoader (i.e. the iterator interface)

  • Optionally auto-converting the output to a nested dictionary (using to_dictionary() internally)

  • Optionally applying a user-defined post-processing function each time data is obtained (for lightweight steps that are not possible in the pipeline, e.g. converting certain fields to a type not directly supported by DALI)
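How these three points fit together can be sketched as follows. This is not the DALIStructuredOutputIterator implementation or its constructor signature; the class StructuredIteratorSketch, its parameters, and the function label_to_str() are hypothetical names chosen for illustration, and plain Python values stand in for tensors.

```python
from typing import Callable, Iterable, List, Optional


class StructuredIteratorSketch:
    """Hypothetical stand-in that mimics the iterator interface of a DataLoader."""

    def __init__(
        self,
        flat_batches: Iterable[list],
        field_names: List[str],
        post_process: Optional[Callable[[dict], dict]] = None,
    ):
        self._flat_batches = flat_batches
        self._field_names = field_names
        self._post_process = post_process

    def __iter__(self):
        for flat in self._flat_batches:
            # Re-attach the field names to the flat values (dictionary conversion).
            batch = dict(zip(self._field_names, flat))
            # Optional lightweight step, executed when the batch is requested.
            if self._post_process is not None:
                batch = self._post_process(batch)
            yield batch


def label_to_str(batch: dict) -> dict:
    """Example post-processing: convert a field to a type the pipeline cannot emit."""
    batch["label"] = str(batch["label"])
    return batch


loader = StructuredIteratorSketch(
    flat_batches=[[1.0, 3], [2.0, 4]],
    field_names=["image", "label"],
    post_process=label_to_str,
)

batches = [batch for batch in loader]  # same loop shape as with a DataLoader
```

Because the post-processing function is called inside the iteration, any cost it adds is paid by the consumer of the data, which is why it should stay small.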

Note

The user-defined post-processing in DALIStructuredOutputIterator runs in the training thread when data is requested; keep it lightweight and prefer doing work inside the DALI pipeline where possible.

Note

While the DALIStructuredOutputIterator class is designed to be a drop-in replacement for a PyTorch DataLoader, issues can arise if the training implementation contains checks of the form assert isinstance(iterator_object, DataLoader). Such checks may be located inside dependencies used by the training implementation, and therefore cannot easily be changed in a clean way. For these cases, the DALIStructuredOutputIterator provides a CreateAsDataLoaderObject() method, which creates an iterator object that masquerades as a PyTorch DataLoader object, so that these checks pass.
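One general way to make such an isinstance check pass is to derive a class from both the iterator and the checked type. The sketch below shows only this general Python technique and is not necessarily how CreateAsDataLoaderObject() works internally. To stay runnable without PyTorch, it defines a stand-in DataLoader class; PlainIterator and create_as_data_loader() are likewise hypothetical names.

```python
class DataLoader:
    """Stand-in for torch.utils.data.DataLoader (assumption, so no PyTorch is needed)."""

    def __init__(self, dataset):
        self.dataset = dataset


class PlainIterator:
    """Hypothetical iterator that is unrelated to DataLoader."""

    def __init__(self, batches):
        self._batches = batches

    def __iter__(self):
        return iter(self._batches)


def create_as_data_loader(batches):
    """Return an iterator object that also passes isinstance(..., DataLoader)."""

    # PlainIterator is listed first so that its methods take precedence.
    class _MaskedIterator(PlainIterator, DataLoader):
        def __init__(self, batches):
            # Only the iterator state is initialized; DataLoader.__init__ is skipped.
            PlainIterator.__init__(self, batches)

    return _MaskedIterator(batches)


plain = PlainIterator([{"x": 1}, {"x": 2}])
masked = create_as_data_loader([{"x": 1}, {"x": 2}])

plain_passes = isinstance(plain, DataLoader)
masked_passes = isinstance(masked, DataLoader)
masked_batches = list(masked)
```

In this sketch the masked object passes the type check but never runs the DataLoader initializer, so code that relies on DataLoader attributes (such as dataset here) would still fail; only the iteration behaviour is shared.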