# OpenMathInstruct-2

Using our pipelines, we created the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset.
The models trained on this dataset achieve strong results on common mathematical benchmarks.
| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|---|---|---|---|---|---|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.1 | 68.0 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |
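The majority@256 rows report answer-level majority voting: 256 solutions are sampled per question, the final answer is extracted from each, and the most frequent answer is taken as the prediction. A minimal sketch of that voting step (the `majority_vote` helper and the sample data are illustrative, not part of our pipeline; answer extraction itself is benchmark-specific):

```python
from collections import Counter

def majority_vote(answers):
    """Pick the most common final answer among sampled solutions.

    `answers` holds the final-answer strings extracted from each sampled
    solution; None marks solutions where no answer could be extracted.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    answer, _ = counts.most_common(1)[0]
    return answer

# e.g. 256 sampled solutions reduced to their extracted final answers
samples = ["42"] * 150 + ["41"] * 80 + [None] * 26
print(majority_vote(samples))  # prints "42"
```

Because ties and extraction failures are common at smaller sample counts, real evaluation code typically also tracks how many samples agreed with the winning answer.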
## Paper
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
If you find our work useful, please consider citing us!
```bibtex
@inproceedings{toshniwal2024openmathinstruct2,
  title     = {{OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data}},
  author    = {Shubham Toshniwal and Wei Du and Ivan Moshkov and Branislav Kisacanin and Alexan Ayrapetyan and Igor Gitman},
  year      = {2025},
  booktitle = {ICLR},
}
```
## How to reproduce our results

Browse the sections below to see all commands needed to fully reproduce our results.

Please note that unless you have access to a large GPU cluster, some of the commands may take a very long time to complete!