OpenMathInstruct-2

Using our pipelines, we created the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset.
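As a minimal sketch of how you might inspect the data, the snippet below streams a few examples from Hugging Face. The dataset id `nvidia/OpenMathInstruct-2` and the `problem`/`generated_solution` field names are assumptions here; check the dataset card for the exact schema.

```python
# Minimal sketch: stream a few examples without downloading all 14M pairs.
# The dataset id and field names are assumptions -- verify them against
# the dataset card before relying on this.
from itertools import islice

from datasets import load_dataset

ds = load_dataset("nvidia/OpenMathInstruct-2", split="train", streaming=True)
for example in islice(ds, 3):
    print(example["problem"])
    print(example["generated_solution"])
    print("-" * 40)
```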

The models trained on this dataset achieve strong results on common mathematical benchmarks.

| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|:------|:-----:|:----:|:--------:|:---------:|:---------:|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.1 | 68.0 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |
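The `+ majority@256` rows report majority voting: 256 solutions are sampled per question, the final answer is extracted from each, and the most frequent answer is taken as the prediction. Below is a minimal sketch of just the voting step; the `majority_vote` helper and the example tallies are hypothetical, and answer extraction from raw generations is omitted.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among k sampled solutions."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical tallies for one question with k=256 samples.
sampled_answers = ["42"] * 150 + ["41"] * 80 + ["40"] * 26
assert len(sampled_answers) == 256
print(majority_vote(sampled_answers))  # -> 42
```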

Paper

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

If you find our work useful, please consider citing us!

@inproceedings{toshniwal2024openmathinstruct2,
  title   = {{OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data}},
  author  = {Shubham Toshniwal and Wei Du and Ivan Moshkov and Branislav Kisacanin and Alexan Ayrapetyan and Igor Gitman},
  year    = {2025},
  booktitle = {ICLR},
}

How to reproduce our results

Browse the sections below to see all commands needed to fully reproduce our results.

Please note that unless you have access to a large GPU cluster, some of the commands might take a very long time to complete!