General
Megatron-Energon is a data loader that works best with Megatron projects. However, you can use it in any PyTorch-based deep learning project.
What can it offer compared to other data loaders?
The most important features are:
Comes with a standardized WebDataset-based format on disk
Optimized for high-speed multi-rank training
Can easily mix and blend multiple datasets
Can save and restore its state
Handles various kinds of multi-modal data, even within a single training run
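To make "mix and blend multiple datasets" and "save and restore its state" more concrete, here is a conceptual sketch in plain Python. It is not Energon's API and not how Energon implements these features. The class BlendedIterator and its methods save_state / restore_state are hypothetical names chosen for illustration. The sketch assumes each dataset is a small in-memory list of samples and that blending means picking a source dataset at random with probability proportional to its weight.

```python
import random
from typing import Any, Dict, Iterator, List, Sequence


class BlendedIterator:
    """Hypothetical illustration only (NOT Energon's API).

    Draws samples from several datasets according to fixed weights and
    exposes a state that can be saved and later restored, so that the
    sample order continues exactly where it left off.
    """

    def __init__(self, datasets: Sequence[Sequence[str]],
                 weights: Sequence[float], seed: int = 0) -> None:
        self.datasets = datasets
        self.weights: List[float] = list(weights)
        self.rng = random.Random(seed)
        # Per-dataset read position; wraps around to emulate endless looping.
        self.positions: List[int] = [0] * len(datasets)

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        # Choose a source dataset with probability proportional to its weight.
        idx = self.rng.choices(range(len(self.datasets)), weights=self.weights)[0]
        ds = self.datasets[idx]
        sample = ds[self.positions[idx] % len(ds)]
        self.positions[idx] += 1
        return sample

    def save_state(self) -> Dict[str, Any]:
        # Everything needed to reproduce the remaining sample order.
        return {"rng": self.rng.getstate(), "positions": list(self.positions)}

    def restore_state(self, state: Dict[str, Any]) -> None:
        self.rng.setstate(state["rng"])
        self.positions = list(state["positions"])


if __name__ == "__main__":
    # Hypothetical toy datasets, e.g. image captions and VQA samples.
    captions = ["cap0", "cap1", "cap2"]
    vqa = ["vqa0", "vqa1"]
    it = BlendedIterator([captions, vqa], weights=[0.7, 0.3], seed=42)
    warmup = [next(it) for _ in range(5)]
    checkpoint = it.save_state()        # e.g. stored alongside a model checkpoint
    after_save = [next(it) for _ in range(5)]
    it.restore_state(checkpoint)        # e.g. after a job restart
    assert [next(it) for _ in range(5)] == after_save
```

The key design point the sketch shows is that a resumable loader must persist every piece of state that influences sample order (here the random generator and the per-dataset read positions). Energon's actual configuration of blending weights and its state-saving mechanism are described in its own documentation.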
Energon also comes with a command line tool that you can use to prepare your datasets.
Quickstart
You can get started in just a few steps: follow the instructions in the main README.md, then come back here for the details.