CUDA Quantum in C++

Welcome to CUDA Quantum! This is an introduction by example to using CUDA Quantum in C++.

Introduction

We’re going to take a look at how to construct quantum programs using CUDA Quantum kernel expressions.

A CUDA Quantum kernel is any typed callable in the language that is annotated with the __qpu__ attribute. Let’s take a look at a very simple “Hello World” example: a CUDA Quantum kernel that prepares a GHZ state on a programmer-specified number of qubits.

// Compile and run with:
// ```
// nvq++ static_kernel.cpp -o ghz.x && ./ghz.x
// ```

#include <cudaq.h>

// Define a CUDA Quantum kernel that is fully specified
// at compile time via templates.
template <std::size_t N>
struct ghz {
  auto operator()() __qpu__ {

    // Compile-time, std::array-like `qreg`.
    cudaq::qreg<N> q;
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {

  auto kernel = ghz<10>{};
  auto counts = cudaq::sample(kernel);
  counts.dump();

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }

  return 0;
}

Here we see that we can define a custom struct that is templated on a size_t parameter. Our kernel expression is free to use this template parameter to allocate a register of qubits whose size is known at compile time. Within the kernel, we are free to apply various quantum operations, such as a Hadamard on qubit 0, h(q[0]). Controlled operations are modifications of single-qubit operations; for example, x<cudaq::ctrl>(q[0],q[1]) effects a controlled-X. We can measure single qubits or entire registers.

In this example we are interested in sampling the final state produced by this CUDA Quantum kernel. To do so, we use the generic cudaq::sample function, which returns a data type encoding the measured bit strings and the number of times each string was observed (here the default of 1000 shots is used).

To compile and execute this code, we run the following:

nvq++ static_kernel.cpp -o ghz.x
./ghz.x
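
To make the sampled result easier to interpret, the following is a minimal plain C++ sketch of what an ideal, noiseless run of this circuit produces. It does not use or reproduce CUDA Quantum; the helper names (apply_h, apply_cx, ghz_state, sample_counts), the seed parameter, and the convention that qubit k is bit k of the basis-state index are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Illustration only: none of these helpers are CUDA Quantum APIs.
// Qubit k is stored in bit k of the basis-state index.

// Apply a Hadamard to `target` on a real-valued state vector.
void apply_h(std::vector<double> &state, std::size_t target) {
  const std::size_t bit = std::size_t{1} << target;
  const double inv_sqrt2 = 1.0 / std::sqrt(2.0);
  for (std::size_t i = 0; i < state.size(); ++i) {
    if (i & bit)
      continue; // visit each (|0>, |1>) pair exactly once
    const double a = state[i], b = state[i | bit];
    state[i] = (a + b) * inv_sqrt2;
    state[i | bit] = (a - b) * inv_sqrt2;
  }
}

// Apply X to `target` on every basis state where `control` is 1.
void apply_cx(std::vector<double> &state, std::size_t control,
              std::size_t target) {
  const std::size_t cbit = std::size_t{1} << control;
  const std::size_t tbit = std::size_t{1} << target;
  for (std::size_t i = 0; i < state.size(); ++i)
    if ((i & cbit) && !(i & tbit))
      std::swap(state[i], state[i | tbit]);
}

// Same gate sequence as the kernel: H on qubit 0, then a chain of CNOTs.
std::vector<double> ghz_state(std::size_t n) {
  assert(n >= 1 && n < 20); // keep the dense vector small
  std::vector<double> state(std::size_t{1} << n, 0.0);
  state[0] = 1.0;
  apply_h(state, 0);
  for (std::size_t i = 0; i + 1 < n; ++i)
    apply_cx(state, i, i + 1);
  return state;
}

// Draw `shots` bit strings with probability |amplitude|^2 and histogram them.
std::map<std::string, std::size_t>
sample_counts(const std::vector<double> &state, std::size_t n,
              std::size_t shots, unsigned seed) {
  assert(state.size() == (std::size_t{1} << n));
  std::vector<double> probs;
  for (double amp : state)
    probs.push_back(amp * amp);
  std::mt19937 rng(seed);
  std::discrete_distribution<std::size_t> pick(probs.begin(), probs.end());
  std::map<std::string, std::size_t> counts;
  for (std::size_t s = 0; s < shots; ++s) {
    const std::size_t idx = pick(rng);
    std::string bits(n, '0');
    for (std::size_t k = 0; k < n; ++k)
      if ((idx >> k) & 1)
        bits[k] = '1';
    ++counts[bits];
  }
  return counts;
}
```

Calling sample_counts(ghz_state(3), 3, 1000, 42) returns a histogram containing only the all-zeros and all-ones strings, each with roughly half of the shots, which is the signature of a GHZ state.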

Computing Expectation Values

CUDA Quantum provides generic library functions enabling one to compute expectation values of quantum spin operators with respect to a parameterized CUDA Quantum kernel. Let’s take a look at an example of this:

// Compile and run with:
// ```
// nvq++ expectation_values.cpp -o d2.x && ./d2.x
// ```

#include <cudaq.h>
#include <cudaq/algorithm.h>

// The example here shows a simple use case for the `cudaq::observe`
// function in computing expected values of provided spin_ops.

struct ansatz {
  auto operator()(double theta) __qpu__ {
    cudaq::qreg q(2);
    x(q[0]);
    ry(theta, q[1]);
    x<cudaq::ctrl>(q[1], q[0]);
  }
};

int main() {

  // Build up your spin op algebraically
  using namespace cudaq::spin;
  cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                     .21829 * z(0) - 6.125 * z(1);

  // Observe takes the kernel, the spin_op, and the concrete
  // parameters for the kernel
  double energy = cudaq::observe(ansatz{}, h, .59);
  printf("Energy is %lf\n", energy);
  return 0;
}

Here we define a parameterized CUDA Quantum kernel, a callable type named ansatz that takes as input a single angle theta. This angle is used as part of a single ry rotation.

In host code, we define a Hamiltonian operator we are interested in via the CUDA Quantum spin_op type. CUDA Quantum provides a generic function cudaq::observe which takes a parameterized kernel, the spin_op whose expectation value we wish to compute, and the runtime parameters at which we evaluate the parameterized kernel.

The return type of this function is a cudaq::observe_result, which contains all the data from the execution but is trivially convertible to a double, yielding the expectation value we are interested in.

To compile and execute this code, we run the following:

nvq++ expectation_values.cpp -o exp_vals.x
./exp_vals.x
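
As a sanity check on the printed energy, the expectation value for this particular ansatz can be worked out by hand. The sketch below is plain C++, not CUDA Quantum; ansatz_energy is a hypothetical helper name. It relies on the observation that the three gates in the kernel prepare the state cos(theta/2)|q0=1,q1=0> + sin(theta/2)|q0=0,q1=1>, and evaluates each Pauli term of the Hamiltonian on that state.

```cpp
#include <cassert>
#include <cmath>

// Illustration only: `ansatz_energy` is a hypothetical helper, not a
// CUDA Quantum API. It evaluates <psi(theta)| H |psi(theta)> for
//   |psi> = cos(theta/2)|q0=1,q1=0> + sin(theta/2)|q0=0,q1=1>
// and the Hamiltonian used in the example above.
double ansatz_energy(double theta) {
  assert(std::isfinite(theta));
  const double c = std::cos(theta / 2.0);
  const double s = std::sin(theta / 2.0);

  // Pauli expectation values on the two-term state:
  const double xx = 2.0 * c * s;   // <X0 X1> swaps the two basis states
  const double yy = 2.0 * c * s;   // <Y0 Y1> has the same value here
  const double z0 = s * s - c * c; // <Z0>: -1 on q0=1, +1 on q0=0
  const double z1 = c * c - s * s; // <Z1>: +1 on q1=0, -1 on q1=1

  return 5.907 - 2.1433 * xx - 2.1433 * yy + 0.21829 * z0 - 6.125 * z1;
}
```

For theta = 0.59, ansatz_energy returns approximately -1.749, which is the value one would expect the program above to report, up to simulator precision.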

Multi-control Synthesis

Now let’s take a look at how CUDA Quantum allows one to control a general unitary on an arbitrary number of control qubits. For this scenario, our general unitary can be described by another pre-defined CUDA Quantum kernel expression. Let’s take a look at the following example:

// Compile and run with:
// ```
// nvq++ multi_controlled_operations.cpp -o ccnot.x && ./ccnot.x
// ```

#include <cudaq.h>
#include <cudaq/algorithm.h>

// Here we demonstrate how one might apply multi-controlled
// operations on a general CUDA Quantum kernel.
struct ApplyX {
  void operator()(cudaq::qubit &q) __qpu__ { x(q); }
};

struct ccnot_test {
  // constrain the signature of the incoming kernel
  void operator()(cudaq::takes_qubit auto &&apply_x) __qpu__ {
    cudaq::qreg qs(3);

    x(qs);
    x(qs[1]);

    // Control U (apply_x) on the first two qubits of
    // the allocated register.
    cudaq::control(apply_x, qs.front(2), qs[2]);

    mz(qs);
  }
};

int main() {
  // We can achieve the same thing as above via
  // a lambda expression.
  auto ccnot = []() __qpu__ {
    cudaq::qreg q(3);

    x(q);
    x(q[1]);

    x<cudaq::ctrl>(q[0], q[1], q[2]);

    mz(q);
  };

  auto counts = cudaq::sample(ccnot);

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }

  auto counts2 = cudaq::sample(ccnot_test{}, ApplyX{});

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts2) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }
}

In this example, we show two distinct ways of generating a Toffoli operation. The first, defined in host code, is a CUDA Quantum lambda that synthesizes a Toffoli via the general multi-control functionality available for any single-qubit quantum operation: x<cudaq::ctrl>(q[0], q[1], q[2]).

The second way to generate a Toffoli operation starts with a kernel that takes another kernel as input. CUDA Quantum exposes a way to synthesize a control on any general unitary described as another kernel: the cudaq::control() call. Here we take as input a kernel that applies an X operation to the given qubit. In the control call, we pass the kernel to be controlled, then the control qubits (here the first two qubits of the register, qs.front(2)), and finally the trailing parameters that serve as the arguments of the applied kernel (apply_x takes a single target qubit, qs[2]).

To compile and execute this code, we run the following:

nvq++ multi_controlled_operations.cpp -o mcx.x
./mcx.x
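
Because every gate in this example maps computational basis states to basis states, the expected measurement can be traced classically. The following plain C++ sketch does that; it is not CUDA Quantum code, controlled_x and ccnot_example are hypothetical names, and it assumes that character k of the bit string corresponds to qubit k.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustration only: a classical model of a multi-controlled X acting on a
// computational basis state. Character k of `bits` stands for qubit k.
void controlled_x(std::string &bits, const std::vector<std::size_t> &controls,
                  std::size_t target) {
  assert(target < bits.size());
  for (std::size_t c : controls) {
    assert(c < bits.size() && c != target);
    if (bits[c] != '1')
      return; // at least one control is 0: the target is left untouched
  }
  bits[target] = (bits[target] == '1') ? '0' : '1';
}

// Mirrors the gate sequence of both kernels in the example above.
std::string ccnot_example() {
  std::string bits = "000";
  for (char &b : bits)
    b = '1';                     // x(q): flip every qubit     -> "111"
  bits[1] = '0';                 // x(q[1]): flip qubit 1 back -> "101"
  controlled_x(bits, {0, 1}, 2); // Toffoli: controls 0 and 1, target 2
  return bits;
}
```

Since qubit 1 is 0 when the Toffoli is applied, the target is not flipped and ccnot_example() returns 101. Under the assumptions above, both kernels in the example should report that single bit string.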

Simulations with cuQuantum

CUDA Quantum provides native support for cuQuantum-accelerated state vector and tensor network simulations. Let’s take a look at an example that is too large for a standard CPU-only simulator, but can be trivially simulated via an NVIDIA GPU-accelerated backend:

// Compile and run with:
// ```
// nvq++ cuquantum_backends.cpp -o dyn.x --target nvidia && ./dyn.x
// ```

// This example is meant to demonstrate the cuQuantum
// GPU-accelerated backends and their ability to easily handle
// a larger number of qubits compared to the CPU-only backend.

// Without the `--target nvidia` flag, this seems to hang, i.e.
// it takes a long time for the CPU-only backend to handle
// this number of qubits.

#include <cudaq.h>

// Define a quantum kernel with a runtime parameter
struct ghz {
  auto operator()(const int N) __qpu__ {

    // Dynamic, vector-like `qreg`
    cudaq::qreg q(N);
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {
  auto counts = cudaq::sample(ghz{}, 30);
  counts.dump();

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }

  return 0;
}

Here we generate a GHZ state on 30 qubits. To run with the built-in cuQuantum state vector support, we pass the --target nvidia flag at compile time:

nvq++ --target nvidia cuquantum_backends.cpp -o ghz.x
./ghz.x
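
To see why 30 qubits is demanding, it helps to estimate the size of a dense state vector, which holds 2^N complex amplitudes. The sketch below is plain C++ with a hypothetical helper name, state_vector_bytes; the amplitude sizes used in the note after it (16 bytes for a pair of doubles, 8 bytes for a pair of floats) are assumptions, since the precision used by a given backend is not stated here.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: memory needed to store a dense state vector of
// 2^num_qubits amplitudes, each taking `bytes_per_amplitude` bytes.
std::uint64_t state_vector_bytes(unsigned num_qubits,
                                 std::uint64_t bytes_per_amplitude) {
  assert(num_qubits < 58); // keep the product within 64 bits for small sizes
  return (std::uint64_t{1} << num_qubits) * bytes_per_amplitude;
}
```

With these assumptions, state_vector_bytes(30, 16) is 17179869184 bytes (16 GiB) and state_vector_bytes(30, 8) is 8589934592 bytes (8 GiB). Each additional qubit doubles the requirement, and every gate has to touch the whole vector.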

Using Quantum Hardware Providers

CUDA Quantum contains support for using a set of hardware providers. For more information about executing quantum kernels on different hardware backends, please take a look at CUDA Quantum Hardware Backends.

The following code illustrates how to run kernels on Quantinuum’s backends.

// Compile and run with:
// ```
// nvq++ --target quantinuum --quantinuum-machine H1-2E quantinuum.cpp -o out.x
// ./out.x
// ```
// Assumes a valid set of credentials has been stored.
// To first confirm the correctness of the program locally,
// add the `--emulate` flag to the `nvq++` command above.

#include <cudaq.h>
#include <fstream>

// Define a simple quantum kernel to execute on Quantinuum.
struct ghz {
  // Maximally entangled state between 5 qubits.
  auto operator()() __qpu__ {
    cudaq::qreg q(5);
    h(q[0]);
    for (int i = 0; i < 4; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {
  // Submit to Quantinuum asynchronously, i.e., continue executing
  // code in the file until the job has been returned.
  auto future = cudaq::sample_async(ghz{});
  // ... classical code to execute in the meantime ...

  // Can write the future to file:
  {
    std::ofstream out("saveMe.json");
    out << future;
  }

  // Then come back and read it in later.
  cudaq::async_result<cudaq::sample_result> readIn;
  std::ifstream in("saveMe.json");
  in >> readIn;

  // Get the results of the read-in future.
  auto async_counts = readIn.get();
  async_counts.dump();

  // OR: Submit to Quantinuum synchronously, i.e., wait for the job
  // result to be returned before proceeding.
  auto counts = cudaq::sample(ghz{});
  counts.dump();
}

The following code illustrates how to run kernels on IonQ’s backends.

// Compile and run with:
// ```
// nvq++ --target ionq ionq.cpp -o out.x && ./out.x
// ```
// Assumes a valid set of credentials has been stored.

#include <cudaq.h>
#include <fstream>

// Define a simple quantum kernel to execute on IonQ.
struct ghz {
  // Maximally entangled state between 5 qubits.
  auto operator()() __qpu__ {
    cudaq::qreg q(5);
    h(q[0]);
    for (int i = 0; i < 4; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    // Note: All qubits will be measured at the end upon performing
    // the sampling. You may encounter a pre-flight error on IonQ
    // backends if you include explicit measurements.
  }
};

int main() {
  // Submit to IonQ asynchronously, i.e., continue executing
  // code in the file until the job has been returned.
  auto future = cudaq::sample_async(ghz{});
  // ... classical code to execute in the meantime ...

  // Can write the future to file:
  {
    std::ofstream out("saveMe.json");
    out << future;
  }

  // Then come back and read it in later.
  cudaq::async_result<cudaq::sample_result> readIn;
  std::ifstream in("saveMe.json");
  in >> readIn;

  // Get the results of the read-in future.
  auto async_counts = readIn.get();
  async_counts.dump();

  // OR: Submit to IonQ synchronously, i.e., wait for the job
  // result to be returned before proceeding.
  auto counts = cudaq::sample(ghz{});
  counts.dump();
}