# Model training

## Download data and convert to SFT format
The OpenReasoning dataset consists of five independent parts:
- Math CoT data
- Math TIR data
- Math GenSelect data
- Code CoT data
- Science CoT data
All datasets except GenSelect are now released. You can use the code snippets below to download them and prepare them for SFT. To build the final training dataset, concatenate all of the data together (see the sketch at the end of this section).
### Math CoT data
Math CoT data is released as part of the nvidia/Nemotron-Post-Training-Dataset-v1 dataset.
```python
from functools import partial

from datasets import load_dataset

from nemo_skills.prompt.utils import get_prompt


def apply_format(elem, prompt):
    # each record is a [user, assistant] message pair
    assert len(elem['messages']) == 2
    # fill the prompt template with the problem statement
    elem['input'] = prompt.fill({'problem': elem['messages'][0]['content']})
    # append the end-of-turn token expected by the chat template
    elem['output'] = elem['messages'][1]['content'] + prompt.config.template.assistant_end
    return elem


dataset = load_dataset("nvidia/Nemotron-Post-Training-Dataset-v1", split="math")
prompt = get_prompt('generic/math', 'qwen-instruct')
func = partial(apply_format, prompt=prompt)
dataset = dataset.map(func, num_proc=20)
dataset = dataset.remove_columns(['messages'])
dataset.to_json("open-reasoning-math-cot.jsonl")
```
### Math TIR data
We re-use the math TIR data from the nvidia/OpenMathReasoning dataset. While we included this data in training and our released models are capable of TIR inference, we found that the results are generally worse than with CoT. Fixing this would require re-generating the TIR data with newer models, which we have not done in the current release.

To get this data, follow the instructions for preparing the second-round SFT data in the OpenMathReasoning documentation.
### Math GenSelect data
Coming soon!
### Code CoT data
Code CoT data is released as part of the nvidia/Nemotron-Post-Training-Dataset-v1 dataset.
```python
import json
from functools import partial

from datasets import load_dataset

from nemo_skills.prompt.utils import get_prompt

# source datasets that hold the original problem statements
question_datasets = {
    "taco": load_dataset("BAAI/TACO"),
    "apps": load_dataset("codeparrot/apps"),
    "code_contests": load_dataset("deepmind/code_contests"),
    "open-r1/codeforces": load_dataset("open-r1/codeforces"),
}


def get_question(ds_name, split, index):
    """Reconstruct the problem statement from the original benchmark record."""
    benchmark = question_datasets[ds_name][split][int(index)]
    if ds_name == "code_contests":
        return benchmark["description"]
    elif ds_name in ["taco", "apps"]:
        return benchmark["question"]
    elif ds_name == "open-r1/codeforces":
        # codeforces statements are split across fields, so stitch them together
        question = benchmark["description"]
        if benchmark["input_format"]:
            question += "\n\nInput\n\n" + benchmark["input_format"]
        if benchmark["output_format"]:
            question += "\n\nOutput\n\n" + benchmark["output_format"]
        if benchmark["examples"]:
            question += "\n\nExamples"
            for example in benchmark["examples"]:
                if "input" in example:
                    question += "\n\nInput\n\n" + example["input"]
                if "output" in example:
                    question += "\n\nOutput\n\n" + example["output"]
        if benchmark["note"]:
            question += "\n\nNote\n\n" + benchmark["note"]
        return question
    else:
        raise RuntimeError(f"Unknown dataset: {ds_name}")


def apply_format(elem, prompt):
    # look up the original question using the metadata stored with each sample
    metadata = json.loads(elem['metadata'])
    question = get_question(metadata['dataset'], metadata['split'], int(metadata['index']))
    elem['input'] = prompt.fill({'question': question})
    # append the end-of-turn token expected by the chat template
    elem['output'] = elem['messages'][1]['content'] + prompt.config.template.assistant_end
    return elem


dataset = load_dataset("nvidia/Nemotron-Post-Training-Dataset-v1", split="code")
prompt = get_prompt('eval/livecodebench/python_codegen_reasoning', 'qwen-instruct')
func = partial(apply_format, prompt=prompt)
dataset = dataset.map(func, num_proc=20)
dataset = dataset.remove_columns(['messages'])
dataset.to_json("open-reasoning-code-cot.jsonl")
```
### Science CoT data

Science CoT data is released as the nvidia/OpenScienceReasoning-2 dataset.
```python
from functools import partial

from datasets import load_dataset

from nemo_skills.prompt.utils import get_prompt


def apply_format(elem, prompt):
    # wrap the existing question in the prompt template
    elem['input'] = prompt.fill({'question': elem['input']})
    # append the end-of-turn token expected by the chat template
    elem['output'] += prompt.config.template.assistant_end
    return elem


dataset = load_dataset("nvidia/OpenScienceReasoning-2", split="train")
prompt = get_prompt('generic/default', 'qwen-instruct')  # data already includes instruction
func = partial(apply_format, prompt=prompt)
dataset = dataset.map(func, num_proc=20)
dataset.to_json("open-reasoning-science-cot.jsonl")
```
## Train the models
We mostly use the same training commands as for the OpenMathReasoning models. The only differences are that we pack sequences to a length of 49152 and use slightly different hyperparameters, detailed in the table below (TP, PP, and CP denote tensor, pipeline, and context parallel sizes).
| Model             | lr   | min_lr | TP | PP | CP |
|-------------------|------|--------|----|----|----|
| Qwen2.5-Math-1.5B | 1e-4 | 1e-7   | 1  | 1  | 4  |
| Qwen2.5-Math-7B   | 1e-4 | 1e-7   | 4  | 1  | 4  |
| Qwen2.5-14B       | 1e-4 | 1e-7   | 8  | 1  | 4  |
| Qwen2.5-32B       | 1e-4 | 1e-7   | 8  | 2  | 4  |
All models are trained for 30000 steps with a single round of SFT, and we take the last checkpoint as the final model.
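For illustration, here is a rough sketch of what a training launch could look like through the NeMo-Skills pipeline API, using the Qwen2.5-32B row of the table above. All parameter names, paths, and overrides here are assumptions rather than the exact commands we used; follow the OpenMathReasoning documentation for the real invocation.

```python
# a minimal sketch, NOT the exact command we used; parameter names and
# hydra overrides below are assumptions, so consult the OpenMathReasoning
# documentation for the actual training invocation.
from nemo_skills.pipeline.cli import train, wrap_arguments

train(
    ctx=wrap_arguments(
        # hyperparameters from the table above (Qwen2.5-32B row)
        "++model.optim.lr=1e-4 "
        "++model.optim.sched.min_lr=1e-7 "
        "++model.tensor_model_parallel_size=8 "
        "++model.pipeline_model_parallel_size=2 "
        "++model.context_parallel_size=4 "
    ),
    cluster="slurm",                   # your cluster config name
    expname="open-reasoning-32b-sft",
    output_dir="/workspace/open-reasoning-32b",
    nemo_model="/models/Qwen2.5-32B",  # base model converted to NeMo format
    training_data="/data/open-reasoning-sft.jsonl",
    num_nodes=8,                       # adjust to your setup
    num_gpus=8,
)
```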