NeMo-Skills

NeMo-Skills is a collection of pipelines to improve "skills" of large language models. You can use it to generate synthetic data, train and evaluate models, analyze outputs, and more! Here are some of the things we support:

  • Flexible inference: Seamlessly switch between API providers, a local server, and large-scale Slurm jobs for LLM inference.
  • Multiple formats: Use any of the NeMo, vLLM, SGLang, and TensorRT-LLM servers, and easily convert checkpoints from one format to another.
  • Model evaluation: Evaluate your models on many popular benchmarks:
    • Math problem solving: math, aime24, aime25, omni-math (and many more)
    • Formal proofs in Lean: minif2f, proofnet
    • Coding skills: human-eval, mbpp
    • Chat/instruction following: ifeval, arena-hard, mt-bench
    • General knowledge: mmlu, mmlu-pro, gpqa
  • Model training: Train models at speed-of-light using NeMo-Aligner.

To get started, follow this tutorial, browse available pipelines or run ns --help to see all available commands and their options.