NeMo-Skills
NeMo-Skills is a collection of pipelines to improve the "skills" of large language models (LLMs). We support everything needed for LLM development, from synthetic data generation to model training to evaluation on a wide range of benchmarks. Start developing on a local workstation and move to a large-scale Slurm cluster with just a one-line change.
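That local-to-Slurm switch typically amounts to pointing the same command at a different cluster configuration. The snippet below is a minimal sketch, not a verbatim recipe: the flag names (`--cluster`, `--server_type`, `--model`, `--output_dir`) follow patterns used in the NeMo-Skills documentation, the `my-slurm-cluster` config name is hypothetical, and input/prompt arguments are omitted, so check `ns generate --help` for the exact interface.

```bash
# Minimal sketch (flag names assumed, not verified): the same synthetic data
# generation job run locally and then on Slurm; only --cluster changes.

# on a local workstation
ns generate \
    --cluster=local \
    --server_type=vllm \
    --model=/workspace/my-model \
    --output_dir=/workspace/sdg-results

# the same job on a Slurm cluster (config defined during setup)
ns generate \
    --cluster=my-slurm-cluster \
    --server_type=vllm \
    --model=/workspace/my-model \
    --output_dir=/workspace/sdg-results
```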
Here are some of the features we support:
- Flexible LLM inference:
    - Seamlessly switch between API providers, a local server, and large-scale Slurm jobs for LLM inference.
    - Host models (on one or many nodes) with TensorRT-LLM, vLLM, sglang, Megatron or NeMo.
    - Scale SDG jobs from 1 GPU on a local machine all the way to tens of thousands of GPUs on a Slurm cluster.
- Model evaluation:
    - Evaluate your models on many popular benchmarks:
        - Math problem solving: hmmt_feb25, brumo25, aime24, aime25, omni-math (and many more)
        - Formal proofs in Lean: minif2f, proofnet
        - Coding skills: scicode, livecodebench, human-eval, mbpp
        - Chat/instruction following: ifeval, arena-hard, mt-bench
        - General knowledge: mmlu, mmlu-pro, gpqa
        - Long context: ruler
    - Easily parallelize each evaluation across many Slurm jobs, self-host LLM judges, bring your own prompts, or change the benchmark configuration in any other way (a sketch of an evaluation command follows this list).
- Model training: train models using NeMo-Aligner, NeMo-RL, or verl.
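To make the evaluation bullet above concrete, here is a hedged sketch of what a benchmark run can look like. The flag names (`--benchmarks`, `--server_gpus`, and so on) are assumptions modeled on the rest of the NeMo-Skills documentation rather than a verified command, so consult `ns eval --help` before running it.

```bash
# Hedged sketch of a benchmark evaluation (exact flags may differ; see `ns eval --help`).
# Self-hosts the model with vLLM on 8 GPUs and runs the aime24 benchmark.
ns eval \
    --cluster=local \
    --model=/workspace/my-model \
    --server_type=vllm \
    --server_gpus=8 \
    --benchmarks=aime24 \
    --output_dir=/workspace/eval-results
```

Pointing `--cluster` at a Slurm config instead of `local` lets the same evaluation fan out across many Slurm jobs.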
To get started, follow these steps, browse available pipelines, or run `ns --help` to see all available commands and their options.
You can find more examples of how to use NeMo-Skills on the tutorials page.
We've built and released many popular models and datasets using NeMo-Skills. See all of them in the Papers & Releases documentation.