General

Megatron-Energon is a data loader designed to work best with Megatron projects, but you can use it in any PyTorch-based deep learning project.

What can it offer compared to other data loaders?

The most important features are:

  • Comes with a standardized WebDataset-based format on disk

  • Optimized for high-speed multi-rank training

  • Can easily mix and blend multiple datasets

  • Its state can be saved and restored

  • Handles various kinds of multi-modal data, even within a single training run
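
To make two of the features above more concrete (blending multiple datasets, and saving and restoring loader state), here is a minimal plain-Python sketch of the idea. It does not use Energon's API: the class `BlendedStream` and its methods `save_state` and `restore_state` are hypothetical names invented for this illustration, and the in-memory lists stand in for real datasets.

```python
import random
from typing import Any, Dict, Sequence


class BlendedStream:
    """Toy weighted blend of several in-memory datasets with resumable state.

    Hypothetical illustration only; this is NOT Energon's API.
    """

    def __init__(self, datasets: Sequence[Sequence[Any]],
                 weights: Sequence[float], seed: int = 0) -> None:
        if len(datasets) != len(weights):
            raise ValueError("need exactly one weight per dataset")
        self.datasets = datasets
        self.weights = list(weights)
        self.rng = random.Random(seed)
        # Next index to read from each dataset.
        self.positions = [0] * len(datasets)

    def __iter__(self) -> "BlendedStream":
        return self

    def __next__(self) -> Any:
        # Pick a source dataset with probability proportional to its weight.
        idx = self.rng.choices(range(len(self.datasets)), weights=self.weights)[0]
        data = self.datasets[idx]
        # Wrap around so every dataset repeats indefinitely.
        sample = data[self.positions[idx] % len(data)]
        self.positions[idx] += 1
        return sample

    def save_state(self) -> Dict[str, Any]:
        # Everything needed to continue the exact same sample sequence.
        return {"rng": self.rng.getstate(), "positions": list(self.positions)}

    def restore_state(self, state: Dict[str, Any]) -> None:
        self.rng.setstate(state["rng"])
        self.positions = list(state["positions"])


images = ["img0", "img1", "img2"]
texts = ["txt0", "txt1"]

stream = BlendedStream([images, texts], weights=[0.7, 0.3], seed=42)
consumed = [next(stream) for _ in range(5)]
checkpoint = stream.save_state()
expected = [next(stream) for _ in range(5)]

# A fresh stream, even with a different seed, continues identically
# once the checkpoint is restored.
resumed = BlendedStream([images, texts], weights=[0.7, 0.3], seed=0)
resumed.restore_state(checkpoint)
continued = [next(resumed) for _ in range(5)]
assert continued == expected
```

In this sketch the checkpoint holds only the random generator state and one read position per dataset. A real multi-rank loader has to track considerably more than that, which is the kind of bookkeeping the feature list refers to.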

Energon also comes with a command-line tool that you can use to prepare your datasets.

Quickstart

To get started in just a few steps, see the main README.md, then come back here for the details.