CUDA Quantum in C++

Welcome to CUDA Quantum! This is an introduction by example to using CUDA Quantum in C++.

Introduction

We’re going to take a look at how to construct quantum programs using CUDA Quantum kernel expressions.

A CUDA Quantum kernel is any typed callable in the language that is annotated with the __qpu__ attribute. Let’s take a look at a very simple “Hello World” example, specifically a CUDA Quantum kernel that prepares a GHZ state on a programmer-specified number of qubits.

// Compile and run with:
// ```
// nvq++ static_kernel.cpp -o ghz.x && ./ghz.x
// ```

#include <cudaq.h>

// Define a CUDA Quantum kernel that is fully specified
// at compile time via templates.
template <std::size_t N>
struct ghz {
  auto operator()() __qpu__ {

    // Compile-time sized array like std::array
    cudaq::qarray<N> q;
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {

  auto kernel = ghz<10>{};
  auto counts = cudaq::sample(kernel);

  if (!cudaq::mpi::is_initialized() || cudaq::mpi::rank() == 0) {
    counts.dump();

    // Fine grain access to the bits and counts
    for (auto &[bits, count] : counts) {
      printf("Observed: %s, %lu\n", bits.data(), count);
    }
  }

  return 0;
}

Here we define a custom struct that is templated on a size_t parameter. The kernel expression is free to use this template parameter to allocate a register of qubits whose size is known at compile time. Within the kernel, we can apply various quantum operations, such as a Hadamard on qubit 0, h(q[0]). Controlled operations are modifications of single-qubit operations; for example, x<cudaq::ctrl>(q[0], q[1]) effects a controlled-X. We can measure single qubits or entire registers.

In this example we are interested in sampling the final state produced by this CUDA Quantum kernel. To do so, we leverage the generic cudaq::sample function, which returns a data type encoding the qubit measurement strings and the corresponding number of times that string was observed (here the default number of shots is used, 1000).
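To make the expected output concrete: an ideal N-qubit GHZ state yields only the all-zeros and all-ones bit strings, each with probability 1/2. The following sketch mimics that distribution in plain C++ without CUDA Quantum. The function name sample_ideal_ghz and its parameters are hypothetical and chosen only for illustration; real counts returned by cudaq::sample will fluctuate from run to run.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>

// Hypothetical helper (not part of CUDA Quantum): draw `shots` samples from
// the ideal GHZ distribution, which puts probability 1/2 on "00...0" and
// 1/2 on "11...1".
std::map<std::string, std::size_t>
sample_ideal_ghz(std::size_t num_qubits, std::size_t shots, unsigned seed) {
  assert(num_qubits > 0);
  std::mt19937 rng(seed);
  std::bernoulli_distribution coin(0.5);
  std::map<std::string, std::size_t> counts;
  for (std::size_t s = 0; s < shots; ++s) {
    // All qubits are perfectly correlated, so one coin flip fixes every bit.
    const char bit = coin(rng) ? '1' : '0';
    ++counts[std::string(num_qubits, bit)];
  }
  return counts;
}
```

Calling sample_ideal_ghz(10, 1000, 42) gives a map with at most two keys whose counts sum to 1000, which is the shape of result one would expect from the kernel above on a noiseless simulator.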

To compile and execute this code, we run the following:

nvq++ static_kernel.cpp -o ghz.x
./ghz.x

Computing Expectation Values

CUDA Quantum provides generic library functions enabling one to compute expectation values of quantum spin operators with respect to a parameterized CUDA Quantum kernel. Let’s take a look at an example of this:

// Compile and run with:
// ```
// nvq++ expectation_values.cpp -o d2.x && ./d2.x
// ```

#include <cudaq.h>
#include <cudaq/algorithm.h>

// The example here shows a simple use case for the `cudaq::observe`
// function in computing expected values of provided spin_ops.

struct ansatz {
  auto operator()(double theta) __qpu__ {
    cudaq::qvector q(2);
    x(q[0]);
    ry(theta, q[1]);
    x<cudaq::ctrl>(q[1], q[0]);
  }
};

int main() {

  // Build up your spin op algebraically
  using namespace cudaq::spin;
  cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                     .21829 * z(0) - 6.125 * z(1);

  // Observe takes the kernel, the spin_op, and the concrete
  // parameters for the kernel
  double energy = cudaq::observe(ansatz{}, h, .59);
  printf("Energy is %lf\n", energy);
  return 0;
}

Here we define a parameterized CUDA Quantum kernel, a callable type named ansatz that takes as input a single angle theta. This angle is used as part of a single ry rotation.

In host code, we define a Hamiltonian operator we are interested in via the CUDA Quantum spin_op type. CUDA Quantum provides a generic function cudaq::observe which takes a parameterized kernel, the spin_op whose expectation value we wish to compute, and the runtime parameters at which we evaluate the parameterized kernel.

The return type of this function is a cudaq::observe_result, which contains all the data from the execution but is trivially convertible to a double, yielding the expectation value we are interested in.
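As a sanity check on the number printed by the example, the expectation value can also be worked out by hand. The sketch below is plain C++, not CUDA Quantum code, and rests on a derivation that is an assumption of this note: the ansatz prepares cos(theta/2)|q0=1,q1=0> + sin(theta/2)|q0=0,q1=1>, for which the individual Pauli expectation values have simple closed forms. The function name ansatz_energy is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Closed-form cross-check for the two-qubit ansatz, assuming it prepares
//   cos(theta/2)|q0=1,q1=0> + sin(theta/2)|q0=0,q1=1>.
// For that state:
//   <X0 X1> = <Y0 Y1> = sin(theta), <Z0> = -cos(theta), <Z1> = cos(theta).
double ansatz_energy(double theta) {
  assert(std::isfinite(theta));
  const double xx = std::sin(theta);
  const double yy = std::sin(theta);
  const double z0 = -std::cos(theta);
  const double z1 = std::cos(theta);
  // Same coefficients as the spin_op `h` built in the example.
  return 5.907 - 2.1433 * xx - 2.1433 * yy + 0.21829 * z0 - 6.125 * z1;
}
```

At theta = 0.59 this evaluates to roughly -1.749, which is the value cudaq::observe should approximate on a noiseless simulator.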

To compile and execute this code, we run the following:

nvq++ expectation_values.cpp -o exp_vals.x
./exp_vals.x

Multi-control Synthesis

Now let’s take a look at how CUDA Quantum allows one to control a general unitary on an arbitrary number of control qubits. For this scenario, our general unitary can be described by another pre-defined CUDA Quantum kernel expression. Let’s take a look at the following example:

// Compile and run with:
// ```
// nvq++ multi_controlled_operations.cpp -o ccnot.x && ./ccnot.x
// ```

#include <cudaq.h>
#include <cudaq/algorithm.h>

// Here we demonstrate how one might apply multi-controlled
// operations on a general CUDA Quantum kernel.
struct ApplyX {
  void operator()(cudaq::qubit &q) __qpu__ { x(q); }
};

struct ccnot_test {
  // constrain the signature of the incoming kernel
  void operator()(cudaq::takes_qubit auto &&apply_x) __qpu__ {
    cudaq::qvector qs(3);

    x(qs);
    x(qs[1]);

    // Control U (apply_x) on the first two qubits of
    // the allocated register.
    cudaq::control(apply_x, qs.front(2), qs[2]);

    mz(qs);
  }
};

int main() {
  // We can achieve the same thing as above via
  // a lambda expression.
  auto ccnot = []() __qpu__ {
    cudaq::qvector q(3);

    x(q);
    x(q[1]);

    x<cudaq::ctrl>(q[0], q[1], q[2]);

    mz(q);
  };

  auto counts = cudaq::sample(ccnot);

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %lu\n", bits.data(), count);
  }

  auto counts2 = cudaq::sample(ccnot_test{}, ApplyX{});

  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts2) {
    printf("Observed: %s, %lu\n", bits.data(), count);
  }
}

In this example, we show two distinct ways to generate a Toffoli operation. The first, in host code, is a CUDA Quantum lambda that synthesizes a Toffoli via the general multi-control functionality available for any single-qubit quantum operation: x<cudaq::ctrl>(q[0], q[1], q[2]).

The second way to generate a Toffoli operation starts with a kernel that takes another kernel as input. CUDA Quantum exposes a way to synthesize a control on any general unitary described as another kernel: the cudaq::control() call. Here we take as input a kernel that applies an X operation to the given qubit. Within the control call, we pass the kernel, then the two control qubits, and finally the trailing arguments that are forwarded to the applied kernel (apply_x takes a single target qubit).
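Because every gate in these two kernels is either a bit flip or a controlled bit flip, the register never leaves a single computational basis state and the expected measurement can be traced classically. The sketch below does that in plain C++; trace_ccnot is a hypothetical helper, not a CUDA Quantum function.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical classical trace of the ccnot kernels: track the three bits
// through x(q), x(q[1]) and the Toffoli with controls q[0], q[1], target q[2].
std::string trace_ccnot() {
  std::array<bool, 3> q{false, false, false}; // start in |000>
  for (bool &b : q)
    b = !b;        // x(q): flip every qubit -> 111
  q[1] = !q[1];    // x(q[1]) -> 101
  if (q[0] && q[1])
    q[2] = !q[2];  // Toffoli fires only if both controls are 1 (not the case)
  std::string bits;
  for (bool b : q)
    bits += b ? '1' : '0';
  assert(bits.size() == 3);
  return bits;
}
```

The trace ends in 101: the control q[1] is 0, so the target is left untouched, and both sampling loops in the example should report that single bit string on a noiseless simulator.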

To compile and execute this code, we run the following:

nvq++ multi_controlled_operations.cpp -o mcx.x
./mcx.x

Simulations with cuQuantum

CUDA Quantum provides native support for cuQuantum-accelerated state vector and tensor network simulations. Let’s take a look at an example that is too large for a standard CPU-only simulator, but can be trivially simulated via an NVIDIA GPU-accelerated backend:

// Compile and run with:
// ```
// nvq++ cuquantum_backends.cpp -o dyn.x --target nvidia && ./dyn.x
// ```

// This example is meant to demonstrate the cuQuantum
// GPU-accelerated backends and their ability to easily handle
// a larger number of qubits compared to the CPU-only backend.

// On CPU-only backends, this seems to hang, i.e. it takes a long
// time to handle this number of qubits.

#include <cudaq.h>

// Define a quantum kernel with a runtime parameter
struct ghz {
  auto operator()(const int N) __qpu__ {

    // Dynamically sized vector of qubits
    cudaq::qvector q(N);
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {
  auto counts = cudaq::sample(/*shots=*/100, ghz{}, 28);

  if (!cudaq::mpi::is_initialized() || cudaq::mpi::rank() == 0) {
    counts.dump();

    // Fine grain access to the bits and counts
    for (auto &[bits, count] : counts) {
      printf("Observed: %s, %lu\n", bits.data(), count);
    }
  }

  return 0;
}

Here we generate a GHZ state on 28 qubits. To run with the built-in cuQuantum state vector support, we pass the --target nvidia flag at compile time:

nvq++ --target nvidia cuquantum_backends.cpp -o ghz.x
./ghz.x

Alternatively, we can set the environment variable CUDAQ_DEFAULT_SIMULATOR to nvidia.
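For a sense of scale, a full state vector holds 2^N complex amplitudes, so its memory footprint doubles with every added qubit. The sketch below computes that footprint in plain C++. The helper name state_vector_bytes is hypothetical, and the amplitude size (8 bytes for a pair of 32-bit floats, 16 bytes for a pair of 64-bit doubles) is an assumption about how a simulator might store its state, not a statement about any particular backend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for a full state vector of `num_qubits`
// qubits when each complex amplitude occupies `bytes_per_amplitude` bytes
// (8 for two 32-bit floats, 16 for two 64-bit doubles).
std::uint64_t state_vector_bytes(unsigned num_qubits,
                                 std::uint64_t bytes_per_amplitude) {
  assert(num_qubits < 58);           // keep the shift within 64 bits
  assert(bytes_per_amplitude <= 16); // and the product as well
  return (std::uint64_t{1} << num_qubits) * bytes_per_amplitude;
}
```

For the 28-qubit GHZ example this gives 2 GiB at 8 bytes per amplitude and 4 GiB at 16 bytes per amplitude, and every gate has to touch that whole array.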

Noisy Simulation

CUDA Quantum makes it simple to model noise within the simulation of your quantum program. Let’s take a look at the various built-in noise models we support, before concluding with a brief example of a custom noise model constructed from user-defined Kraus operators.

The following code illustrates how to run a simulation with depolarization noise.

// Compile and run with:
// ```
// nvq++ noise_depolarization.cpp --target density-matrix-cpu -o dyn.x
// && ./dyn.x
// ```
//
// Note: You must set the target to a density matrix backend for the noise
// to successfully impact the system.

#include <cudaq.h>

// CUDA Quantum supports several different models of noise. In this
// case, we will examine the modeling of depolarization noise. This
// depolarization will result in the qubit state decaying into a mix
// of the basis states, |0> and |1>, with a user provided probability.

int main() {

  // We will begin by defining an empty noise model that we will add
  // our depolarization channel to.
  cudaq::noise_model noise;

  // Depolarization channel with `1.0` probability of the qubit state
  // being scrambled.
  cudaq::depolarization_channel depolarization(1.);
  // We will apply the channel to any Y-gate on qubit 0. Meaning,
  // for each Y-gate on our qubit, the qubit will have a `1.0`
  // probability of decaying into a mixed state.
  noise.add_channel<cudaq::types::y>({0}, depolarization);

  // Our kernel will apply a Y-gate to qubit 0.
  // This will bring the qubit to the |1> state, where it will remain
  // with a probability of `1 - p = 0.0`.
  auto kernel = []() __qpu__ {
    cudaq::qubit q;
    y(q);
    mz(q);
  };

  // Now let's set the noise and we're ready to run the simulation!
  cudaq::set_noise(noise);

  // With noise, the measurements should be a roughly 50/50
  // mix between the |0> and |1> states.
  auto noisy_counts = cudaq::sample(kernel);
  noisy_counts.dump();

  // To confirm this, we can run the simulation again without noise.
  // Without noise, the qubit should still be in the |1> state.
  cudaq::unset_noise();
  auto noiseless_counts = cudaq::sample(kernel);
  noiseless_counts.dump();
}

The following code illustrates how to run a simulation with amplitude damping noise.

// Compile and run with:
// ```
// nvq++ noise_amplitude_damping.cpp --target density-matrix-cpu -o dyn.x
// && ./dyn.x
// ```
//
// Note: You must set the target to a density matrix backend for the noise
// to successfully impact the system.

#include <cudaq.h>

// CUDA Quantum supports several different models of noise. In this case,
// we will examine the modeling of energy dissipation within our system
// via environmental interactions. The result of this "amplitude damping"
// is to return the qubit to the |0> state with a user-specified probability.

int main() {

  // We will begin by defining an empty noise model that we will add
  // our damping channel to.
  cudaq::noise_model noise;

  // Amplitude damping channel with `1.0` probability of the qubit
  // decaying to the ground state.
  cudaq::amplitude_damping_channel ad(1.);

  // We will apply this channel to any Hadamard gate on the qubit.
  // Meaning, after each Hadamard on the qubit, there will be a
  // probability of `1.0` that the qubit decays back to ground.
  noise.add_channel<cudaq::types::h>({0}, ad);

  // The Hadamard gate here will bring the qubit to `1/sqrt(2) (|0> + |1>)`,
  // where it will remain with a probability of `1 - p = 0.0`.
  auto kernel = []() __qpu__ {
    cudaq::qubit q;
    h(q);
    mz(q);
  };

  // Now let's set the noise and we're ready to run the simulation!
  cudaq::set_noise(noise);

  // Our results should show all measurements in the |0> state, indicating
  // that the noise has successfully impacted the system.
  auto noisy_counts = cudaq::sample(kernel);
  noisy_counts.dump();

  // To confirm this, we can run the simulation again without noise.
  // The qubit will now have a 50/50 mix of measurements between
  // |0> and |1>.
  cudaq::unset_noise();
  auto noiseless_counts = cudaq::sample(kernel);
  noiseless_counts.dump();
}
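The effect described in the comments above can be written down in one line. The sketch below uses the textbook amplitude damping channel, which is assumed here to match what cudaq::amplitude_damping_channel implements; the function name prob_one_after_damping is hypothetical.

```cpp
#include <cassert>

// Textbook amplitude damping (an assumption, not taken from the CUDA Quantum
// sources): the |1> population shrinks by a factor (1 - p) and the lost
// weight moves to |0>.
double prob_one_after_damping(double prob_one_before, double p) {
  assert(p >= 0.0 && p <= 1.0);
  assert(prob_one_before >= 0.0 && prob_one_before <= 1.0);
  return (1.0 - p) * prob_one_before;
}
```

After the Hadamard the |1> population is 0.5, so with p = 1.0 the function returns 0 (every shot reads |0>), while p = 0.0 leaves the 50/50 split of the noiseless run.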

The following code illustrates how to run a simulation with bit-flip noise.

// Compile and run with:
// ```
// nvq++ noise_bit_flip.cpp --target density-matrix-cpu -o dyn.x
// && ./dyn.x
// ```
//
// Note: You must set the target to a density matrix backend for the noise
// to successfully impact the system.

#include <cudaq.h>

// CUDA Quantum supports several different models of noise. In this case,
// we will examine the modeling of decoherence of the qubit state. This
// will occur from "bit flip" errors, wherein the qubit has a user-specified
// probability of undergoing an X-180 rotation.

int main() {

  // We will begin by defining an empty noise model that we will add
  // these decoherence channels to.
  cudaq::noise_model noise;

  // Bit flip channel with `1.0` probability of the qubit flipping 180 degrees.
  cudaq::bit_flip_channel bf(1.);
  // We will apply this channel to any X gate on the qubit, giving each X-gate
  // a probability of `1.0` of undergoing an extra X-gate.
  noise.add_channel<cudaq::types::x>({0}, bf);

  // After the X-gate, the qubit will remain in the |1> state with a probability
  // of `1 - p = 0.0`.
  auto kernel = []() __qpu__ {
    cudaq::qubit q;
    x(q);
    mz(q);
  };

  // Now let's set the noise and we're ready to run the simulation!
  cudaq::set_noise(noise);

  // Our results should show all measurements in the |0> state, indicating
  // that the noise has successfully impacted the system.
  auto noisy_counts = cudaq::sample(kernel);
  noisy_counts.dump();

  // To confirm this, we can run the simulation again without noise.
  // We should now see the qubit in the |1> state.
  cudaq::unset_noise();
  auto noiseless_counts = cudaq::sample(kernel);
  noiseless_counts.dump();
}

The following code illustrates how to run a simulation with phase-flip noise.

// Compile and run with:
// ```
// nvq++ noise_phase_flip.cpp --target density-matrix-cpu -o dyn.x
// && ./dyn.x
// ```
//
// Note: You must set the target to a density matrix backend for the noise
// to successfully impact the system.

#include <cudaq.h>

// CUDA Quantum supports several different models of noise. In this
// case, we will examine the modeling of decoherence of the qubit phase.
// This will occur from "phase flip" errors, wherein the qubit has a
// user-specified probability of undergoing a Z-180 rotation.

int main() {

  // We will begin by defining an empty noise model that we will add
  // our phase flip channel to.
  cudaq::noise_model noise;

  // Phase flip channel with `1.0` probability of the qubit
  // undergoing a phase rotation of 180 degrees (π).
  cudaq::phase_flip_channel pf(1.);
  // We will apply this channel to any Z gate on the qubit.
  // Meaning, after each Z gate on qubit 0, there will be a
  // probability of `1.0` that the qubit undergoes an extra
  // Z rotation.
  noise.add_channel<cudaq::types::z>({0}, pf);

  auto kernel = []() __qpu__ {
    cudaq::qubit q;
    // Place qubit in superposition state.
    h(q);
    // Rotate on Z by 180 degrees.
    z(q);
    // Apply another Hadamard.
    h(q);
    mz(q);
  };

  // Now let's set the noise and we're ready to run the simulation!
  cudaq::set_noise(noise);

  // With noise, our Z-gate will effectively cancel out due
  // to the presence of a phase flip error on the gate with a
  // probability of `1.0`. This will put us back in the |0> state.
  auto noisy_counts = cudaq::sample(kernel);
  noisy_counts.dump();

  // To confirm this, we can run the simulation again without noise.
  // Without noise, we'd expect the qubit to end in the |1> state due
  // to the phase rotation between the two Hadamard gates.
  cudaq::unset_noise();
  auto noiseless_counts = cudaq::sample(kernel);
  noiseless_counts.dump();
}
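The H, Z, H sequence in this kernel is easy to follow by hand, since all amplitudes stay real. The sketch below simulates the single qubit in plain C++ and models the phase-flip error as one extra Z applied right after the Z gate. The type and function names are hypothetical; this is an illustration of the arithmetic, not of how CUDA Quantum applies noise channels internally.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using State = std::array<double, 2>; // real amplitudes of |0> and |1>

State hadamard(const State &s) {
  const double r = 1.0 / std::sqrt(2.0);
  return {r * (s[0] + s[1]), r * (s[0] - s[1])};
}

State pauli_z(const State &s) { return {s[0], -s[1]}; }

// Probability of measuring |1> after H, Z, (optional error Z), H on |0>.
double prob_one(bool phase_flip_error) {
  State s{1.0, 0.0};
  s = hadamard(s);
  s = pauli_z(s);
  if (phase_flip_error)
    s = pauli_z(s); // the extra Z that a probability-1.0 channel always adds
  s = hadamard(s);
  assert(std::abs(s[0] * s[0] + s[1] * s[1] - 1.0) < 1e-12); // still normalized
  return s[1] * s[1];
}
```

Without the error the two Hadamards turn the Z into a bit flip, so prob_one(false) is 1 up to rounding; with the error the two Z gates cancel and prob_one(true) is 0.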

The following code illustrates how to run a simulation with a custom noise model.

// Compile and run with:
// ```
// nvq++ noise_modeling.cpp --target density-matrix-cpu -o noise.x && ./noise.x
// ```

#include "cudaq.h"

int main() {
  // Define a kernel
  auto xgate = []() __qpu__ {
    cudaq::qubit q;
    x(q);
    mz(q);
  };

  // Run noise-less simulation
  auto counts = cudaq::sample(xgate);
  counts.dump();

  // Create a depolarizing Kraus channel made up of four Kraus operators.
  cudaq::kraus_channel depol({cudaq::complex{0.99498743710662, 0.0},
                              {0.0, 0.0},
                              {0.0, 0.0},
                              {0.99498743710662, 0.0}},

                             {cudaq::complex{0.0, 0.0},
                              {0.05773502691896258, 0.0},
                              {0.05773502691896258, 0.0},
                              {0.0, 0.0}},

                             {cudaq::complex{0.0, 0.0},
                              {0.0, -0.05773502691896258},
                              {0.0, 0.05773502691896258},
                              {0.0, 0.0}},

                             {cudaq::complex{0.05773502691896258, 0.0},
                              {0.0, 0.0},
                              {0.0, 0.0},
                              {-0.05773502691896258, 0.0}});

  // Create the noise model
  cudaq::noise_model noise;
  // Add the Kraus channel to the x operation on qubit 0.
  noise.add_channel<cudaq::types::x>({0}, depol);

  // Set the noise model
  cudaq::set_noise(noise);

  // Run the noisy simulation
  counts = cudaq::sample(xgate);
  counts.dump();

  // Unset the noise model when done. This is not necessary in this case but is
  // good practice in order to not interfere with future simulations.
  cudaq::unset_noise();
}
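A set of Kraus operators describes a valid channel only if the sum of K†K over all operators equals the identity. The sketch below checks that condition in plain C++ for the four matrices used in the example, whose entries are approximately sqrt(0.99) and sqrt(0.01/3). The names Matrix2, completeness_error and example_kraus_ops are hypothetical helpers for this check and are not part of CUDA Quantum.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using C = std::complex<double>;
using Matrix2 = std::array<C, 4>; // row-major 2x2 matrix

// Largest entry of |sum_k K_k^dagger K_k - I|; close to zero for a valid channel.
double completeness_error(const std::array<Matrix2, 4> &ops) {
  Matrix2 sum{};
  for (const auto &k : ops)
    for (int r = 0; r < 2; ++r)
      for (int c = 0; c < 2; ++c)
        for (int m = 0; m < 2; ++m)
          // (K^dagger K)[r][c] = sum_m conj(K[m][r]) * K[m][c]
          sum[2 * r + c] += std::conj(k[2 * m + r]) * k[2 * m + c];
  double err = 0.0;
  for (int r = 0; r < 2; ++r)
    for (int c = 0; c < 2; ++c)
      err = std::max(err, std::abs(sum[2 * r + c] - (r == c ? 1.0 : 0.0)));
  return err;
}

// The four operators from the example: scaled I, X, Y and Z.
std::array<Matrix2, 4> example_kraus_ops() {
  const double a = 0.99498743710662;
  const double b = 0.05773502691896258;
  const Matrix2 k0{C(a, 0), C(0, 0), C(0, 0), C(a, 0)};
  const Matrix2 k1{C(0, 0), C(b, 0), C(b, 0), C(0, 0)};
  const Matrix2 k2{C(0, 0), C(0, -b), C(0, b), C(0, 0)};
  const Matrix2 k3{C(b, 0), C(0, 0), C(0, 0), C(-b, 0)};
  return {k0, k1, k2, k3};
}
```

Evaluating completeness_error(example_kraus_ops()) gives a value at the level of floating-point rounding, so the matrices form a valid channel. The same check is a cheap guard to run on any hand-written operators before passing them to cudaq::kraus_channel.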

Using Quantum Hardware Providers

CUDA Quantum contains support for using a set of hardware providers. For more information about executing quantum kernels on different hardware backends, please take a look at CUDA Quantum Hardware Backends.

The following code illustrates how to run kernels on Quantinuum’s backends.

// Compile and run with:
// ```
// nvq++ --target quantinuum --quantinuum-machine H1-2E quantinuum.cpp -o out.x
// ./out.x
// ```
// Assumes a valid set of credentials has been stored.
// To first confirm the correctness of the program locally,
// add `--emulate` to the `nvq++` command above.

#include <cudaq.h>
#include <fstream>

// Define a simple quantum kernel to execute on Quantinuum.
struct ghz {
  // Maximally entangled state between 5 qubits.
  auto operator()() __qpu__ {
    cudaq::qvector q(5);
    h(q[0]);
    for (int i = 0; i < 4; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {
  // Submit to Quantinuum asynchronously, i.e., continue executing
  // code in the file until the job has been returned.
  auto future = cudaq::sample_async(ghz{});
  // ... classical code to execute in the meantime ...

  // Can write the future to file:
  {
    std::ofstream out("saveMe.json");
    out << future;
  }

  // Then come back and read it in later.
  cudaq::async_result<cudaq::sample_result> readIn;
  std::ifstream in("saveMe.json");
  in >> readIn;

  // Get the results of the read in future.
  auto async_counts = readIn.get();
  async_counts.dump();

  // OR: Submit to Quantinuum synchronously, i.e., wait for the job
  // result to be returned before proceeding.
  auto counts = cudaq::sample(ghz{});
  counts.dump();
}

The following code illustrates how to run kernels on IonQ’s backends.

// Compile and run with:
// ```
// nvq++ --target ionq ionq.cpp -o out.x && ./out.x
// ```
// This will submit the job to the IonQ ideal simulator target (default).
// Alternatively, we can enable hardware noise model simulation by specifying
// the `--ionq-noise-model`, e.g.,
// ```
// nvq++ --target ionq --ionq-machine simulator --ionq-noise-model aria-1
// ionq.cpp -o out.x && ./out.x
// ```
// where we set the noise model to mimic the 'aria-1' hardware device.
// Please refer to your IonQ Cloud dashboard for the list of simulator noise
// models.
// Note: `--ionq-machine simulator` is optional since 'simulator' is the
// default configuration if not provided. Assumes a valid set of credentials
// has been stored.

#include <cudaq.h>
#include <fstream>

// Define a simple quantum kernel to execute on IonQ.
struct ghz {
  // Maximally entangled state between 5 qubits.
  auto operator()() __qpu__ {
    cudaq::qvector q(5);
    h(q[0]);
    for (int i = 0; i < 4; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    auto result = mz(q);
  }
};

int main() {
  // Submit to IonQ asynchronously, i.e., continue executing
  // code in the file until the job has been returned.
  auto future = cudaq::sample_async(ghz{});
  // ... classical code to execute in the meantime ...

  // Can write the future to file:
  {
    std::ofstream out("saveMe.json");
    out << future;
  }

  // Then come back and read it in later.
  cudaq::async_result<cudaq::sample_result> readIn;
  std::ifstream in("saveMe.json");
  in >> readIn;

  // Get the results of the read in future.
  auto async_counts = readIn.get();
  async_counts.dump();

  // OR: Submit to IonQ synchronously, i.e., wait for the job
  // result to be returned before proceeding.
  auto counts = cudaq::sample(ghz{});
  counts.dump();
}