NAS Parallel Benchmarks

The NAS Parallel Benchmarks (NPB) are a small set of programs designed to help evaluate the performance of parallel supercomputers. The NPB 1 benchmarks are derived from computational fluid dynamics (CFD) applications and consist of five kernels and three pseudo-applications. Problem sizes in NPB are predefined and grouped into classes. Reference implementations of NPB are available in commonly used programming models such as MPI and OpenMP.

Building the Benchmarks

  1. Download and unpack the NPB source code from

    wget
    tar xvzf NPB3.4.2.tar.gz
    cd NPB3.4.2/NPB3.4-OMP
  2. Create the make.def file to configure the build for NVIDIA HPC compilers:

    cat > config/make.def <<'EOF'
    FC = nvfortran
    FLINK = $(FC)
    F_LIB =
    F_INC =
    FFLAGS = -O3 -mp
    FLINKFLAGS = $(FFLAGS)
    CC = nvc
    CLINK = $(CC)
    C_LIB = -lm
    C_INC =
    CFLAGS = -O3 -mp
    CLINKFLAGS = $(CFLAGS)
    UCC = gcc
    BINDIR = ../bin
    RAND = randi8
    WTIME = wtime.c
    EOF
  3. Create the suite.def file to build all benchmarks with the D problem size:

    cat > config/suite.def <<'EOF'
    bt D
    cg D
    ep D
    lu D
    mg D
    sp D
    ua D
    EOF
  4. Compile all benchmarks:

    make -j suite

A successful compilation will generate these binaries in the bin/ directory:

    $ ls bin/
    bt.D.x cg.D.x ep.D.x lu.D.x mg.D.x sp.D.x ua.D.x
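Checking the build by eye works for seven binaries, but a small helper can make the check repeatable. The sketch below assumes the binaries follow the `<benchmark>.<class>.x` naming shown in the listing; the function name `missing_binaries` is hypothetical and not part of NPB.

```shell
#!/bin/bash
# Print the name of every expected binary that is absent or not executable.
# Usage: missing_binaries <bin_dir> <class> <benchmark>...
# (hypothetical helper, not part of the NPB distribution)
missing_binaries() {
    local bin_dir=$1 class=$2
    shift 2
    local name
    for name in "$@"; do
        # Assumed naming scheme: <benchmark>.<class>.x
        if [ ! -x "${bin_dir}/${name}.${class}.x" ]; then
            echo "${name}.${class}.x"
        fi
    done
}

# Example: check the class D binaries listed in suite.def.
missing_binaries bin D bt cg ep lu mg sp ua
```

If the helper prints nothing, every binary named in suite.def was built.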

Running the Benchmarks

Run each benchmark individually using the command shown below. In the command, replace ${BENCHMARK} with the name of the benchmark binary, for example cg.D.x, and replace ${THREADS} and ${FLAGS} with the appropriate values from the reference results below.
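A minimal sketch of what such a command might look like, modeled on the Grace CPU script in the Reference Results section. The exact form is an assumption: ${THREADS} is taken to set OMP_NUM_THREADS and ${FLAGS} to hold the numactl options. The function name `run_npb` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/bash
# Run one NPB binary with a given thread count and numactl options.
# Usage: run_npb <benchmark_binary> <threads> [numactl flags...]
# (hypothetical helper; the command layout is assumed from the Grace CPU script)
run_npb() {
    local benchmark=$1 threads=$2
    shift 2
    # Remaining arguments play the role of ${FLAGS} and go to numactl.
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} env OMP_NUM_THREADS="$threads" OMP_PROC_BIND=close \
        numactl "$@" "./bin/${benchmark}"
}

# Example: print the command for cg.D.x with 72 threads and memory bound to node 0.
DRY_RUN=1 run_npb cg.D.x 72 -m0
```

Calling `run_npb` without `DRY_RUN` would execute the benchmark, which requires numactl and the compiled binary to be present.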


Reference Results


These figures are provided as guidelines and should not be interpreted as performance targets.

Grace CPU Superchip, 480GB Memory Capacity

Use this script to run all the benchmarks on 72 cores of the Grace CPU:

    #!/bin/bash
    for BENCHMARK in bt cg ep lu mg sp ua ; do
        OMP_NUM_THREADS=72 OMP_PROC_BIND=close numactl -m0 ./bin/${BENCHMARK}.D.x
    done

Performance is reported on the line marked “Mops / total”. The expected performance is shown below.
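To collect that figure from saved output, a short filter can help. The sketch below assumes the result line has the form `label = value`, with a label spelled either `Mop/s total` or `Mops / total`; the function name `extract_mops` is hypothetical, and the number in the example is a placeholder rather than a measured result.

```shell
#!/bin/bash
# Read benchmark output on stdin and print the value from the total Mop/s line.
# The label pattern is an assumption and accepts both "Mop/s total" and "Mops / total".
# (hypothetical helper, not part of NPB)
extract_mops() {
    awk -F'=' '$0 ~ "Mop/?s[ /]*total" { gsub(/[ \t]/, "", $2); print $2 }'
}

# Example with a made-up output line; 1234.56 is a placeholder value.
printf ' Mop/s total     =                  1234.56\n' | extract_mops
```

In practice the output of each run could be piped through the filter, for example `./bin/cg.D.x | extract_mops`.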

Benchmark    Mops / total