Megatron-Energon Documentation
This is the documentation of Megatron’s multi-modal data loader “Energon”.
We recommend getting started in the Introduction section, which explains what Energon is and how to install it.
Once installed, check out the Basic Usage section, starting with Quickstart, for some basic examples and tutorials. The rest of that section explains some of the underlying concepts.
For specific use cases and advanced usage, please read Advanced Usage.
At the end, you will also find documentation on how to interface with Energon programmatically and how to contribute to the code base.
Basic Usage
- Quickstart
- Data Preparation
- Important Considerations
- Steps to Create a Monolithic Dataset
- Steps to Create a Polylithic Dataset
- Steps to Create a JSONL Dataset
- Step 1: Creating a WebDataset
- Step 2: Preparing the Dataset
- Dataset Format on Disk (WebDataset)
- Dataset Format on Disk for JSONL Datasets
- Dataset Format on Disk for Filesystem Datasets
- Data Decoding
- Data Flow
- Task Encoder
- Metadataset
- Save and Restore
- Glossary
Advanced Usage
API
Internals