OpenMathInstruct-2

Using our pipelines, we created the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset.
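As a minimal sketch of how you might inspect the data, the snippet below streams a few examples from Hugging Face. The dataset id `nvidia/OpenMathInstruct-2` and the `problem`/`generated_solution` field names are assumptions here; check the dataset card for the exact schema.

```python
# Minimal sketch: stream a few examples without downloading all 14M pairs.
# The dataset id and field names are assumptions -- verify them against
# the dataset card before relying on this.
from itertools import islice

from datasets import load_dataset

ds = load_dataset("nvidia/OpenMathInstruct-2", split="train", streaming=True)
for example in islice(ds, 3):
    print(example["problem"])
    print(example["generated_solution"])
    print("-" * 40)
```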

The models trained on this dataset achieve strong results on common mathematical benchmarks.

| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|:------|:-----:|:----:|:--------:|:---------:|:---------:|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.1 | 68.0 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |
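The `+ majority@256` rows report majority voting: 256 solutions are sampled per question, the final answer is extracted from each, and the most frequent answer is taken as the prediction. Below is a minimal sketch of just the voting step; the `majority_vote` helper and the example tallies are hypothetical, and answer extraction from raw generations is omitted.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among k sampled solutions."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical tallies for one question with k=256 samples.
sampled_answers = ["42"] * 150 + ["41"] * 80 + ["40"] * 26
assert len(sampled_answers) == 256
print(majority_vote(sampled_answers))  # -> 42
```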

Paper

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

If you find our work useful, please consider citing us!

@inproceedings{toshniwal2024openmathinstruct2,
  title   = {{OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data}},
  author  = {Shubham Toshniwal and Wei Du and Ivan Moshkov and Branislav Kisacanin and Alexan Ayrapetyan and Igor Gitman},
  year    = {2025},
  booktitle = {ICLR},
}

How to reproduce our results

Browse the sections below to see all commands needed to fully reproduce our results.

Please note that unless you have access to a large GPU cluster, some of the commands might take a very long time to complete!