CUDA Quantum in C++

This section gets new users started with some basic C++ examples using CUDA Quantum.

Introduction

Welcome to CUDA Quantum! We’re going to take a look at how to construct quantum programs using CUDA Quantum kernel expressions.

A CUDA Quantum kernel is any typed callable in the language that is annotated with the __qpu__ attribute. Let’s take a look at a very simple “Hello World” example: a CUDA Quantum kernel that prepares a GHZ state on a programmer-specified number of qubits.

// Compile and run with:
// nvq++ static_kernel.cpp -o ghz.x && ./ghz.x

#include <cudaq.h>

// Define a CUDA Quantum kernel that is fully specified
// at compile time via templates.
template <std::size_t N>
struct ghz {
  auto operator()() __qpu__ {

    // Compile-time, std::array-like qreg.
    cudaq::qreg<N> q;
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {

  auto kernel = ghz<10>{};
  auto counts = cudaq::sample(kernel);
  counts.dump();

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }

  return 0;
}

Here we see that we can define a custom struct that is templated on a std::size_t parameter. Our kernel expression is free to use this template parameter to allocate a register of qubits whose size is known at compile time. Within the kernel, we are free to apply various quantum operations, like a Hadamard on qubit 0, h(q[0]). Controlled operations are modifications of single-qubit operations, like the x<cudaq::ctrl>(q[0], q[1]) operation, which effects a controlled-X (CNOT). We can measure single qubits or entire registers; here mz(q) measures the whole register.
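To make the kernel's effect concrete, here is a classical sketch in plain C++ (no CUDA Quantum) that applies the same gate sequence, h(q[0]) followed by the chain of controlled-X gates, to a dense state vector. StateVector, apply_h, apply_cx and ghz_state are hypothetical helpers for illustration, and storing qubit i in bit i of the basis index is a convention of this sketch, not a claim about CUDA Quantum's internals.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type: amplitude of basis state k, where bit i of k
// is the value of qubit i (a convention chosen for this sketch).
using StateVector = std::vector<std::complex<double>>;

// Hadamard on qubit t: mixes each pair of amplitudes that differ only in bit t.
void apply_h(StateVector &psi, std::size_t t) {
  const double r = 1.0 / std::sqrt(2.0);
  const std::size_t mask = std::size_t{1} << t;
  for (std::size_t i = 0; i < psi.size(); ++i) {
    if (i & mask) continue;  // visit each pair once, from its bit-t = 0 member
    auto a = psi[i], b = psi[i | mask];
    psi[i] = r * (a + b);
    psi[i | mask] = r * (a - b);
  }
}

// Controlled-X: when control bit c is 1, swap the pair that differs in bit t.
void apply_cx(StateVector &psi, std::size_t c, std::size_t t) {
  const std::size_t cm = std::size_t{1} << c, tm = std::size_t{1} << t;
  for (std::size_t i = 0; i < psi.size(); ++i)
    if ((i & cm) && !(i & tm)) std::swap(psi[i], psi[i | tm]);
}

// Mirrors the ghz kernel: h(q[0]) followed by CNOTs q[i] -> q[i + 1].
StateVector ghz_state(std::size_t n) {
  assert(n >= 1 && n < 25);  // keep the dense vector small
  StateVector psi(std::size_t{1} << n, 0.0);
  psi[0] = 1.0;  // start in |0...0>
  apply_h(psi, 0);
  for (std::size_t i = 0; i + 1 < n; ++i) apply_cx(psi, i, i + 1);
  return psi;
}
```

For example, ghz_state(3) leaves amplitude 1/√2 on index 0 (all qubits 0) and index 7 (all qubits 1) and zero everywhere else, which is why sampling this kernel only ever yields the all-zeros or all-ones bit string.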

In this example we are interested in sampling the final state produced by this CUDA Quantum kernel. To do so, we use the generic cudaq::sample function, which returns a data type encoding the measured bit strings and the number of times each string was observed (here the default number of shots, 1000, is used).
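As a rough classical analogue of the counts that cudaq::sample returns, the sketch below draws 1000 shots from the ideal GHZ outcome distribution (all zeros or all ones, each with probability 1/2) and tallies the bit strings in a std::map. sample_counts and ghz_counts are hypothetical names; the real cudaq::sample obtains shots from the simulated or hardware state rather than from a hard-coded distribution.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: draw `shots` outcomes from a discrete distribution over
// bit strings and count how often each string occurs.
std::map<std::string, std::size_t>
sample_counts(const std::vector<std::string> &outcomes,
              const std::vector<double> &probs, std::size_t shots,
              unsigned seed = 1234) {
  assert(outcomes.size() == probs.size());
  std::mt19937 rng(seed);  // fixed seed so runs are reproducible
  std::discrete_distribution<std::size_t> pick(probs.begin(), probs.end());
  std::map<std::string, std::size_t> counts;
  for (std::size_t s = 0; s < shots; ++s) ++counts[outcomes[pick(rng)]];
  return counts;
}

// Ideal n-qubit GHZ statistics: "00...0" or "11...1", each with probability 1/2.
std::map<std::string, std::size_t> ghz_counts(std::size_t n,
                                              std::size_t shots = 1000) {
  return sample_counts({std::string(n, '0'), std::string(n, '1')}, {0.5, 0.5},
                       shots);
}
```

ghz_counts(10) yields two entries, 0000000000 and 1111111111, whose counts sum to 1000 and are each near 500; the exact split depends on the seed, just as repeated cudaq::sample runs vary from shot to shot.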

To compile and execute this code, we run the following:

nvq++ static_kernel.cpp -o ghz.x
./ghz.x

Computing Expectation Values

CUDA Quantum provides generic library functions enabling one to compute expectation values of quantum spin operators with respect to a parameterized CUDA Quantum kernel. Let’s take a look at an example of this:

// Compile and run with:
// nvq++ expectation_values.cpp -o d2.x && ./d2.x

#include <cudaq.h>
#include <cudaq/algorithm.h>

// The example here shows a simple use case for the cudaq::observe()
// function in computing expected values of provided spin_ops.

struct ansatz {
  auto operator()(double theta) __qpu__ {
    cudaq::qreg q(2);
    x(q[0]);
    ry(theta, q[1]);
    x<cudaq::ctrl>(q[1], q[0]);
  }
};

int main() {

  // Build up your spin op algebraically
  using namespace cudaq::spin;
  cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                     .21829 * z(0) - 6.125 * z(1);

  // Observe takes the kernel, the spin_op, and the concrete params for the
  // kernel
  double energy = cudaq::observe(ansatz{}, h, .59);
  printf("Energy is %lf\n", energy);
  return 0;
}

Here we define a parameterized CUDA Quantum kernel, a callable type named ansatz that takes as input a single angle theta. This angle is used as part of a single ry rotation.
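For this particular ansatz the prepared state and the expectation value of h can be worked out by hand, which gives a useful sanity check on the number cudaq::observe reports. The derivation assumes the standard R_y(θ) matrix convention and writes kets with qubit 0 first, |q_0 q_1⟩; both are conventions of this sketch rather than statements from the example.

```latex
% Kets are written |q_0 q_1>; standard R_y(\theta) convention assumed.
\begin{aligned}
|\psi(\theta)\rangle &= \mathrm{CX}_{q_1 \to q_0}\, R_y^{(q_1)}(\theta)\, X^{(q_0)}\,|00\rangle
  = \cos\tfrac{\theta}{2}\,|10\rangle + \sin\tfrac{\theta}{2}\,|01\rangle \\
\langle Z_0 \rangle &= -\cos\theta, \qquad \langle Z_1 \rangle = \cos\theta, \qquad
  \langle X_0 X_1 \rangle = \langle Y_0 Y_1 \rangle = \sin\theta \\
E(\theta) &= 5.907 - 2(2.1433)\sin\theta - (0.21829 + 6.125)\cos\theta
  = 5.907 - 4.2866\,\sin\theta - 6.34329\,\cos\theta \\
E(0.59) &\approx -1.7488
\end{aligned}
```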

In host code, we define the Hamiltonian operator we are interested in via the CUDA Quantum spin_op type. CUDA Quantum provides a generic function, cudaq::observe, which takes a parameterized kernel, the spin_op whose expectation value we wish to compute, and the runtime parameters at which to evaluate the parameterized kernel.

The return type of this function is a cudaq::observe_result, which contains all the data from the execution but is trivially convertible to a double, yielding the expectation value we are interested in.
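The expectation value for this example can be cross-checked with a small classical state-vector calculation. The sketch below is plain C++ with hypothetical helper names (ansatz_state, apply_pauli, energy); it assumes the standard R_y convention and that qubit i occupies bit i of the basis index. It illustrates what the expectation value means and is not how CUDA Quantum computes it.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>

using cd = std::complex<double>;
// Two-qubit state; basis index = q0 + 2 * q1 (assumed layout for this sketch).
using State2 = std::array<cd, 4>;

// Mirrors the ansatz: x(q[0]); ry(theta, q[1]); x<cudaq::ctrl>(q[1], q[0]).
State2 ansatz_state(double theta) {
  State2 psi{};  // all amplitudes zero
  psi[1] = 1.0;  // after x(q[0]): q0 = 1, q1 = 0 is index 1
  const double c = std::cos(theta / 2), s = std::sin(theta / 2);
  for (std::size_t i = 0; i < 2; ++i) {  // ry on q1 mixes index pairs (i, i + 2)
    cd a = psi[i], b = psi[i + 2];
    psi[i] = c * a - s * b;
    psi[i + 2] = s * a + c * b;
  }
  std::swap(psi[2], psi[3]);  // CNOT, control q1, target q0: swap indices 2 and 3
  return psi;
}

// Apply a single-qubit Pauli ('X', 'Y' or 'Z') to qubit q.
State2 apply_pauli(const State2 &psi, char p, std::size_t q) {
  State2 out{};
  const std::size_t m = std::size_t{1} << q;
  for (std::size_t i = 0; i < 4; ++i) {
    const bool bit = (i & m) != 0;
    switch (p) {
    case 'X': out[i ^ m] += psi[i]; break;                                 // flip bit
    case 'Y': out[i ^ m] += (bit ? cd(0, -1) : cd(0, 1)) * psi[i]; break;  // Y|0>=i|1>, Y|1>=-i|0>
    case 'Z': out[i] += (bit ? -1.0 : 1.0) * psi[i]; break;                // phase on |1>
    default: assert(false && "unknown Pauli");
    }
  }
  return out;
}

// Real part of <a|b>.
double real_inner(const State2 &a, const State2 &b) {
  cd acc = 0.0;
  for (std::size_t i = 0; i < 4; ++i) acc += std::conj(a[i]) * b[i];
  return acc.real();
}

// <h> for h = 5.907 - 2.1433 X0X1 - 2.1433 Y0Y1 + .21829 Z0 - 6.125 Z1.
double energy(double theta) {
  const State2 psi = ansatz_state(theta);
  auto two = [&](char p) {
    return real_inner(psi, apply_pauli(apply_pauli(psi, p, 0), p, 1));
  };
  auto one = [&](char p, std::size_t q) {
    return real_inner(psi, apply_pauli(psi, p, q));
  };
  return 5.907 - 2.1433 * two('X') - 2.1433 * two('Y') +
         0.21829 * one('Z', 0) - 6.125 * one('Z', 1);
}
```

With these conventions, energy(0.59) evaluates to approximately -1.749, matching the closed form 5.907 - 4.2866 sin θ - 6.34329 cos θ at θ = 0.59.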

To compile and execute this code, we run the following:

nvq++ expectation_values.cpp -o exp_vals.x
./exp_vals.x

Multi-control Synthesis

Now let’s take a look at how CUDA Quantum allows one to control a general unitary on an arbitrary number of control qubits. For this scenario, our general unitary can be described by another pre-defined CUDA Quantum kernel expression. Let’s take a look at the following example:

// Compile and run with:
// nvq++ multi_controlled_operations.cpp -o ccnot.x && ./ccnot.x

#include <cudaq.h>
#include <cudaq/algorithm.h>

// Here we demonstrate how one might apply multi-controlled
// operations on a general CUDA Quantum kernel.
struct ApplyX {
  void operator()(cudaq::qubit &q) __qpu__ { x(q); }
};

struct ccnot_test {
  // constrain the signature of the incoming kernel
  void operator()(cudaq::takes_qubit auto &&apply_x) __qpu__ {
    cudaq::qreg qs(3);

    x(qs);
    x(qs[1]);

    // Apply apply_x to qs[2], controlled on the first two
    // qubits of the allocated register (qs.front(2)).
    cudaq::control(apply_x, qs.front(2), qs[2]);

    mz(qs);
  }
};

int main() {
  // We can achieve the same thing as above via
  // a lambda expression.
  auto ccnot = []() __qpu__ {
    cudaq::qreg q(3);

    x(q);
    x(q[1]);

    x<cudaq::ctrl>(q[0], q[1], q[2]);

    mz(q);
  };

  auto counts = cudaq::sample(ccnot);

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }

  auto counts2 = cudaq::sample(ccnot_test{}, ApplyX{});

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts2) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }
}

In this example, we show two distinct ways of generating a Toffoli operation. The first, in host code, is a CUDA Quantum lambda that synthesizes a Toffoli via the general multi-control syntax available for any single-qubit quantum operation, x<cudaq::ctrl>(q[0], q[1], q[2]), where every qubit except the last is a control and the last is the target.

The second way to generate a Toffoli operation starts with a kernel that takes another kernel as input. CUDA Quantum exposes a way to synthesize a control on any general unitary described as another kernel: the cudaq::control() call. Here we take as input a kernel that applies an X operation to the given qubit. The control call takes the kernel to apply, then the control qubits (here the first two qubits, qs.front(2)), and finally trailing arguments that are forwarded to the applied kernel. Since apply_x takes a single target qubit, the one trailing argument, qs[2], is the target.
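Because both kernels in this example prepare a computational basis state before the Toffoli, the controlled operation acts classically and the outcome can be predicted by hand: x(qs) gives 111, x(qs[1]) gives 101, and since the controls are 1 and 0 the target is left alone. The plain C++ sketch below models a multi-controlled X on a bit string; mcx and toffoli_example are hypothetical helpers, and treating character i as qubit i is an assumption of the sketch (the result here, 101, reads the same in either bit order).

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical classical model of a multi-controlled X on a basis state
// stored as a bit string (character i = qubit i).
void mcx(std::string &bits, std::initializer_list<std::size_t> controls,
         std::size_t target) {
  for (std::size_t c : controls)
    if (bits.at(c) != '1') return;  // any control at 0: do nothing
  bits.at(target) = bits.at(target) == '1' ? '0' : '1';  // all controls 1: flip
}

// Mirrors both kernels: x on all qubits, x on qubit 1, then a Toffoli
// with controls 0 and 1 and target 2.
std::string toffoli_example() {
  std::string bits(3, '1');  // x(qs) applied to |000>
  bits[1] = '0';             // x(qs[1])
  mcx(bits, {0, 1}, 2);      // controlled X: controls qs[0], qs[1]; target qs[2]
  return bits;
}
```

So both cudaq::sample calls in the example should report a single bit string, 101, for every shot. Removing the x(qs[1]) line would leave both controls at 1, and the target would then flip to give 110.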

To compile and execute this code, we run the following:

nvq++ multi_controlled_operations.cpp -o mcx.x
./mcx.x

Simulations with cuQuantum

CUDA Quantum provides native support for cuQuantum-accelerated state vector and tensor network simulations. Let’s take a look at an example that takes impractically long on a standard CPU-only simulator but is readily simulated on an NVIDIA GPU-accelerated backend:

// Compile and run with:
// nvq++ --qpu cuquantum cuquantum_backends.cpp -o ghz.x && ./ghz.x

// This example is meant to demonstrate the cuQuantum
// target and its ability to easily handle a larger number
// of qubits compared to the CPU-only backend.

// Without the `--qpu cuquantum` flag, this example may appear
// to hang: the CPU-only backend takes a long time to
// simulate this number of qubits.

#include <cudaq.h>

// Define a quantum kernel with a runtime parameter
struct ghz {
  auto operator()(const int N) __qpu__ {

    // Dynamic, vector-like qreg
    cudaq::qreg q(N);
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {
  auto counts = cudaq::sample(ghz{}, 30);
  counts.dump();

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }

  return 0;
}

Here we generate a GHZ state on 30 qubits. To run with the built-in cuQuantum state vector support, we pass the --qpu cuquantum flag at compile time:

nvq++ --qpu cuquantum cuquantum_backends.cpp -o ghz.x
./ghz.x
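To give a sense of scale for the 30-qubit example: a dense state vector holds 2^30 (about 1.07 billion) complex amplitudes, and a state vector simulator must sweep over a large fraction of them for every gate it applies. The sketch below computes the storage this requires; state_vector_bytes is a hypothetical helper, and since the precision a given backend uses is not stated here, both single precision (complex<float>, 8 bytes per amplitude) and double precision (complex<double>, 16 bytes per amplitude) are shown.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a dense n-qubit state vector: 2^n amplitudes, each
// `bytes_per_amplitude` bytes (8 for complex<float>, 16 for complex<double>).
std::uint64_t state_vector_bytes(unsigned n_qubits,
                                 std::uint64_t bytes_per_amplitude) {
  // With 8- or 16-byte amplitudes, n < 60 keeps the product within 64 bits.
  assert(n_qubits < 60 && bytes_per_amplitude <= 16);
  return (std::uint64_t{1} << n_qubits) * bytes_per_amplitude;
}
```

For 30 qubits this gives 8589934592 bytes (8 GiB) in single precision and 17179869184 bytes (16 GiB) in double precision, and each additional qubit doubles both figures.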