CUDA Quantum in C++
This section gets new users started with some basic C++ examples using CUDA Quantum.
Introduction
Welcome to CUDA Quantum! We’re going to take a look at how to construct quantum programs using CUDA Quantum kernel expressions.
CUDA Quantum kernels are any typed callable in the language (for example, a struct with a call operator, or a lambda) that is annotated with the __qpu__ attribute. Let’s take a look at a very simple “Hello World” example: a CUDA Quantum kernel that prepares a GHZ state on a programmer-specified number of qubits.
// Compile and run with:
// nvq++ static_kernel.cpp -o ghz.x && ./ghz.x
#include <cudaq.h>
#include <cstdio>
// Define a CUDA Quantum kernel that is fully specified
// at compile time via templates.
template <std::size_t N>
struct ghz {
  auto operator()() __qpu__ {
    // Compile-time, std::array-like qreg.
    cudaq::qreg<N> q;
    // Put qubit 0 into superposition...
    h(q[0]);
    // ...then entangle each qubit with the next via a controlled-X.
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    // Measure the entire register.
    mz(q);
  }
};
int main() {
  auto kernel = ghz<10>{};
  // Sample the kernel using the default number of shots.
  auto counts = cudaq::sample(kernel);
  counts.dump();
  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }
  return 0;
}
Here we see that we can define a custom struct that is templated on a std::size_t parameter. Our kernel expression is free to use this template parameter to allocate a register of qubits whose size is known at compile time. Within the kernel, we are free to apply various quantum operations, like a Hadamard on qubit 0, h(q[0]). Controlled operations are modifications of single-qubit operations, like the x<cudaq::ctrl>(q[0], q[1]) operation used to effect a controlled-X. We can measure single qubits or entire registers. Here, mz(q) measures the whole register.
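The following minimal classical state-vector sketch in plain C++ mirrors the ghz kernel above, which makes the gate sequence concrete. It is an illustration only. It is not CUDA Quantum code and does not reflect how any CUDA Quantum simulator is implemented. The helpers apply_h, apply_cx and ghz_state are hypothetical, and we assume qubit i corresponds to bit i of the amplitude index.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Full state vector; qubit i is bit i of the amplitude index (assumption).
using State = std::vector<std::complex<double>>;

// Hypothetical helper: Hadamard on qubit q.
void apply_h(State &psi, std::size_t q) {
  const double r = 1.0 / std::sqrt(2.0);
  const std::size_t mask = std::size_t{1} << q;
  for (std::size_t i = 0; i < psi.size(); ++i) {
    if (i & mask) continue;  // handle each (bit q = 0, bit q = 1) pair once
    const std::complex<double> a0 = psi[i], a1 = psi[i | mask];
    psi[i] = r * (a0 + a1);
    psi[i | mask] = r * (a0 - a1);
  }
}

// Hypothetical helper: controlled-X with control qubit c and target qubit t.
void apply_cx(State &psi, std::size_t c, std::size_t t) {
  assert(c != t);
  const std::size_t cmask = std::size_t{1} << c;
  const std::size_t tmask = std::size_t{1} << t;
  for (std::size_t i = 0; i < psi.size(); ++i) {
    // Where the control bit is 1, swap the target-0 and target-1 amplitudes.
    if ((i & cmask) && !(i & tmask)) std::swap(psi[i], psi[i | tmask]);
  }
}

// Hypothetical mirror of the ghz kernel: h(q[0]) followed by a CNOT ladder.
State ghz_state(std::size_t n) {
  assert(n >= 1 && n <= 20);       // keep the 2^n-entry vector small
  State psi(std::size_t{1} << n);  // value-initialised to all zeros
  psi[0] = 1.0;                    // start in |0...0>
  apply_h(psi, 0);
  for (std::size_t i = 0; i + 1 < n; ++i) apply_cx(psi, i, i + 1);
  return psi;
}
```

For example, ghz_state(4) should place amplitude 1/√2 on the all-zeros and all-ones basis states and zero everywhere else. This is why sampling a GHZ kernel should only ever report those two bitstrings.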
In this example we are interested in sampling the final state produced by this CUDA Quantum kernel. To do so, we leverage the generic cudaq::sample function, which returns a data type encoding the qubit measurement strings and the number of times each string was observed (here, the default of 1000 shots is used).
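Conceptually, sampling draws one basis state per shot with probability equal to its squared amplitude and tallies the resulting bitstrings. The plain C++ sketch below illustrates only that idea and is not the cudaq::sample implementation. Several details are assumptions of the sketch: the function name sample_counts, the fixed seed, and the convention that character i of each bitstring is qubit i.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: draw `shots` samples from a probability vector over
// the 2^n computational basis states and tally the observed bitstrings.
std::map<std::string, std::size_t> sample_counts(const std::vector<double> &probs,
                                                 std::size_t n_qubits,
                                                 std::size_t shots,
                                                 unsigned seed = 1234) {
  assert(probs.size() == (std::size_t{1} << n_qubits));
  std::mt19937 rng(seed);  // fixed seed so runs are reproducible
  std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
  std::map<std::string, std::size_t> counts;
  for (std::size_t s = 0; s < shots; ++s) {
    const std::size_t index = dist(rng);
    // Character q of the bitstring is bit q (qubit q) of the sampled index.
    std::string bits(n_qubits, '0');
    for (std::size_t q = 0; q < n_qubits; ++q) {
      if (index & (std::size_t{1} << q)) bits[q] = '1';
    }
    ++counts[bits];
  }
  return counts;
}
```

Given the probabilities of a 3-qubit GHZ state (one half each on 000 and 111), sample_counts returns only the keys "000" and "111". Their counts sum to the number of shots, and each lands close to half of it.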
To compile and execute this code, we run the following:
nvq++ static_kernel.cpp -o ghz.x
./ghz.x
Computing Expectation Values
CUDA Quantum provides generic library functions enabling one to compute expectation values of quantum spin operators with respect to a parameterized CUDA Quantum kernel. Let’s take a look at an example of this:
// Compile and run with:
// nvq++ expectation_values.cpp -o exp_vals.x && ./exp_vals.x
#include <cudaq.h>
#include <cudaq/algorithm.h>
#include <cstdio>
// The example here shows a simple use case for the cudaq::observe()
// function in computing expected values of provided spin_ops.
struct ansatz {
  auto operator()(double theta) __qpu__ {
    cudaq::qreg q(2);
    // Flip qubit 0, then rotate qubit 1 by theta about the Y axis.
    x(q[0]);
    ry(theta, q[1]);
    // Entangle the qubits with a CNOT controlled on qubit 1.
    x<cudaq::ctrl>(q[1], q[0]);
  }
};
int main() {
  // Build up your spin op algebraically
  using namespace cudaq::spin;
  cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                     .21829 * z(0) - 6.125 * z(1);
  // Observe takes the kernel, the spin_op, and the concrete params for the
  // kernel
  double energy = cudaq::observe(ansatz{}, h, .59);
  printf("Energy is %lf\n", energy);
  return 0;
}
Here we define a parameterized CUDA Quantum kernel, a callable type named ansatz that takes as input a single angle theta. This angle is used as part of a single ry rotation.
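For reference, the state this ansatz prepares can be worked out by hand. The derivation below is our own. It assumes the standard R_y(θ) matrix [[cos θ/2, −sin θ/2], [sin θ/2, cos θ/2]] and writes basis states as |q0 q1⟩, so |10⟩ means q[0] = 1 and q[1] = 0:

```latex
\begin{aligned}
|00\rangle
&\xrightarrow{\;X_0\;} |10\rangle
\xrightarrow{\;R_y(\theta)_1\;} \cos\tfrac{\theta}{2}\,|10\rangle + \sin\tfrac{\theta}{2}\,|11\rangle \\
&\xrightarrow{\;\mathrm{CX}_{1\to 0}\;} |\psi(\theta)\rangle
= \cos\tfrac{\theta}{2}\,|10\rangle + \sin\tfrac{\theta}{2}\,|01\rangle
\end{aligned}
```

The controlled-X with q[1] as control moves the |11⟩ amplitude to |01⟩. As a result, θ interpolates between |10⟩ and |01⟩.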
In host code, we define the Hamiltonian operator we are interested in via the CUDA Quantum spin_op type.
CUDA Quantum provides a generic function, cudaq::observe, which takes three inputs: a parameterized kernel, the spin_op whose expectation value we wish to compute, and the runtime parameters at which we evaluate the parameterized kernel. The return type of this function is a cudaq::observe_result. It contains all the data from the execution but is trivially convertible to a double, which yields the expectation value we are interested in.
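The result can be cross-checked classically. Working the Pauli terms out by hand for this ansatz gives ⟨X0 X1⟩ = ⟨Y0 Y1⟩ = sin θ, ⟨Z0⟩ = −cos θ and ⟨Z1⟩ = cos θ. The energy is therefore E(θ) = 5.907 − 4.2866 sin θ − 6.34329 cos θ, which is about −1.7488 at θ = 0.59. The sketch below is a plain C++ two-qubit state-vector illustration and is not how cudaq::observe works internally. The helpers apply_ry, apply_pauli, apply_cx, pauli_expectation and ansatz_energy are hypothetical, and qubit i is assumed to be bit i of the amplitude index.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>

using cd = std::complex<double>;
// Two-qubit state vector; qubit i is bit i of the index (an assumption).
using State = std::array<cd, 4>;

// Hypothetical helper: R_y(theta) = [[cos t/2, -sin t/2], [sin t/2, cos t/2]] on qubit q.
void apply_ry(State &psi, double theta, std::size_t q) {
  assert(q < 2);
  const double c = std::cos(theta / 2), s = std::sin(theta / 2);
  const std::size_t mask = std::size_t{1} << q;
  for (std::size_t i = 0; i < psi.size(); ++i) {
    if (i & mask) continue;
    const cd a0 = psi[i], a1 = psi[i | mask];
    psi[i] = c * a0 - s * a1;
    psi[i | mask] = s * a0 + c * a1;
  }
}

// Hypothetical helper: apply Pauli 'I', 'X', 'Y' or 'Z' to qubit q in place.
void apply_pauli(State &psi, char p, std::size_t q) {
  assert(q < 2);
  const std::size_t mask = std::size_t{1} << q;
  for (std::size_t i = 0; i < psi.size(); ++i) {
    if (i & mask) continue;
    const cd a0 = psi[i], a1 = psi[i | mask];
    if (p == 'X') { psi[i] = a1; psi[i | mask] = a0; }
    if (p == 'Y') { psi[i] = cd(0, -1) * a1; psi[i | mask] = cd(0, 1) * a0; }
    if (p == 'Z') { psi[i | mask] = -a1; }
    // 'I' leaves the pair unchanged.
  }
}

// Hypothetical helper: controlled-X with control qubit c and target qubit t.
void apply_cx(State &psi, std::size_t c, std::size_t t) {
  const std::size_t cmask = std::size_t{1} << c, tmask = std::size_t{1} << t;
  for (std::size_t i = 0; i < psi.size(); ++i)
    if ((i & cmask) && !(i & tmask)) std::swap(psi[i], psi[i | tmask]);
}

// <psi| P0 (x) P1 |psi>, where p0 acts on qubit 0 and p1 on qubit 1.
double pauli_expectation(const State &psi, char p0, char p1) {
  State phi = psi;
  apply_pauli(phi, p0, 0);
  apply_pauli(phi, p1, 1);
  cd acc = 0.0;
  for (std::size_t i = 0; i < psi.size(); ++i) acc += std::conj(psi[i]) * phi[i];
  return acc.real();  // Pauli strings are Hermitian, so the imaginary part is ~0
}

// Hypothetical mirror of the ansatz kernel combined with the spin_op h above.
double ansatz_energy(double theta) {
  State psi{};               // all amplitudes zero
  psi[0] = 1.0;              // |q0=0, q1=0>
  apply_pauli(psi, 'X', 0);  // x(q[0])
  apply_ry(psi, theta, 1);   // ry(theta, q[1])
  apply_cx(psi, 1, 0);       // x<cudaq::ctrl>(q[1], q[0])
  return 5.907 - 2.1433 * pauli_expectation(psi, 'X', 'X') -
         2.1433 * pauli_expectation(psi, 'Y', 'Y') +
         0.21829 * pauli_expectation(psi, 'Z', 'I') -
         6.125 * pauli_expectation(psi, 'I', 'Z');
}
```

Evaluating ansatz_energy(0.59) should agree with the closed form above to floating-point precision. Sweeping θ over a grid is one simple way to see where the minimum lies.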
To compile and execute this code, we run the following:
nvq++ expectation_values.cpp -o exp_vals.x
./exp_vals.x
Multi-control Synthesis
Now let’s take a look at how CUDA Quantum allows one to apply a general unitary controlled on an arbitrary number of control qubits. For this scenario, our general unitary can be described by another pre-defined CUDA Quantum kernel expression. Let’s take a look at the following example:
// Compile and run with:
// nvq++ multi_controlled_operations.cpp -o ccnot.x && ./ccnot.x
#include <cudaq.h>
#include <cudaq/algorithm.h>
#include <cstdio>
// Here we demonstrate how one might apply multi-controlled
// operations on a general CUDA Quantum kernel.
struct ApplyX {
  void operator()(cudaq::qubit &q) __qpu__ { x(q); }
};
struct ccnot_test {
  // constrain the signature of the incoming kernel
  void operator()(cudaq::takes_qubit auto &&apply_x) __qpu__ {
    cudaq::qreg qs(3);
    // Flip all three qubits, then flip qubit 1 back.
    x(qs);
    x(qs[1]);
    // Control U (apply_x) on the first two qubits of
    // the allocated register, targeting the third.
    cudaq::control(apply_x, qs.front(2), qs[2]);
    mz(qs);
  }
};
int main() {
  // We can achieve the same thing as above via
  // a lambda expression.
  auto ccnot = []() __qpu__ {
    cudaq::qreg q(3);
    x(q);
    x(q[1]);
    // Toffoli: q[0] and q[1] are the controls, q[2] is the target.
    x<cudaq::ctrl>(q[0], q[1], q[2]);
    mz(q);
  };
  auto counts = cudaq::sample(ccnot);
  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }
  auto counts2 = cudaq::sample(ccnot_test{}, ApplyX{});
  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts2) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }
  return 0;
}
In this example, we show two distinct ways of generating a Toffoli operation. The first, in host code, is a CUDA Quantum lambda that synthesizes a Toffoli via the general multi-control functionality available for any single-qubit quantum operation: x<cudaq::ctrl>(q[0], q[1], q[2]). The last qubit is the target and the preceding qubits are the controls.
The second way to generate a Toffoli operation starts with a kernel that takes another kernel as input. CUDA Quantum exposes a way to synthesize a control on any general unitary described as another kernel: the cudaq::control() call. Here we take as input a kernel that applies an X operation to the given qubit. Within the control call, we pass the kernel to apply, then the two control qubits (qs.front(2), the first two qubits of the register), and finally the target qubit qs[2]. The parameters following the controls are forwarded as arguments to the applied kernel (apply_x takes a single target qubit, here qs[2]).
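On computational basis states, a multi-controlled X flips the target bit only when every control bit is 1. The plain C++ sketch below traces what both Toffoli kernels above do. The helpers flip, mcx_on_bits and trace_ccnot_kernel are hypothetical, and this is a classical illustration rather than CUDA Quantum code. First, x(q) flips all three qubits to 111. Next, x(q[1]) flips qubit 1 back, giving 101. Because control qubit 1 is then 0, the Toffoli leaves the target alone, so every shot of either kernel should read 101.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Flip one character of a bitstring ('0' <-> '1').
char flip(char b) { return b == '1' ? '0' : '1'; }

// Hypothetical helper: X on `target`, controlled on every qubit in `controls`,
// acting on a computational basis state written as a bitstring
// (character i is qubit i, an assumption of this sketch).
std::string mcx_on_bits(std::string bits, const std::vector<std::size_t> &controls,
                        std::size_t target) {
  assert(target < bits.size());
  for (std::size_t c : controls) {
    assert(c < bits.size() && c != target);
    if (bits[c] != '1') return bits;  // a control is 0: the X is not applied
  }
  bits[target] = flip(bits[target]);
  return bits;
}

// Hypothetical trace of both Toffoli kernels in the example.
std::string trace_ccnot_kernel() {
  std::string bits = "000";             // a fresh 3-qubit register is |000>
  for (char &b : bits) b = flip(b);     // x(q): flip every qubit -> 111
  bits[1] = flip(bits[1]);              // x(q[1]): flip qubit 1 -> 101
  return mcx_on_bits(bits, {0, 1}, 2);  // Toffoli on q[0], q[1] -> q[2]
}
```

Changing x(q[1]) so that both controls stay at 1 (starting from 111) would make the Toffoli fire and yield 110 instead.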
To compile and execute this code, we run the following:
nvq++ multi_controlled_operations.cpp -o mcx.x
./mcx.x
Simulations with cuQuantum
CUDA Quantum provides native support for cuQuantum-accelerated state vector and tensor network simulations. Let’s take a look at an example that is too large for a standard CPU-only simulator but can be trivially simulated via an NVIDIA GPU-accelerated backend:
// Compile and run with:
// nvq++ --qpu cuquantum cuquantum_backends.cpp -o ghz.x && ./ghz.x
// This example is meant to demonstrate the cuQuantum
// target and its ability to easily handle a larger number
// of qubits compared to the CPU-only backend.
// Without the `--qpu cuquantum` flag, the CPU-only backend
// takes a very long time to simulate this many qubits and
// may appear to hang.
#include <cudaq.h>
#include <cstdio>
// Define a quantum kernel with a runtime parameter
struct ghz {
  auto operator()(const int N) __qpu__ {
    // Dynamic, vector-like qreg
    cudaq::qreg q(N);
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};
int main() {
  // Prepare and sample a 30-qubit GHZ state.
  auto counts = cudaq::sample(ghz{}, 30);
  counts.dump();
  // Fine grain access to the bits and counts
  for (auto &[bits, count] : counts) {
    printf("Observed: %s, %zu\n", bits.data(), count);
  }
  return 0;
}
Here we generate a GHZ state on 30 qubits. To run with the built-in cuQuantum state vector support, we pass the --qpu cuquantum flag at compile time:
nvq++ --qpu cuquantum cuquantum_backends.cpp -o ghz.x
./ghz.x
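A quick back-of-the-envelope calculation shows why 30 qubits strains a CPU-only state-vector simulator. A full state vector holds 2^30 ≈ 1.07 × 10^9 complex amplitudes. That is 8 GiB at single precision or 16 GiB at double precision, before any simulator workspace, and each gate touches a large fraction of those amplitudes. The helper below is a hypothetical illustration of that arithmetic only. It makes no assumption about which precision the cuquantum backend actually uses.

```cpp
#include <complex>
#include <cstdint>

// Hypothetical helper: bytes needed to store a full n-qubit state vector
// whose amplitudes have type Amplitude (e.g. std::complex<float>).
// Assumes n_qubits < 64 so the shift does not overflow.
template <typename Amplitude>
constexpr std::uint64_t state_vector_bytes(unsigned n_qubits) {
  return (std::uint64_t{1} << n_qubits) * sizeof(Amplitude);
}

// 2^30 amplitudes at 8 bytes each (complex<float>): 8 GiB.
static_assert(state_vector_bytes<std::complex<float>>(30) == (std::uint64_t{8} << 30),
              "30 qubits in single precision needs 8 GiB");
// 2^30 amplitudes at 16 bytes each (complex<double>): 16 GiB.
static_assert(state_vector_bytes<std::complex<double>>(30) == (std::uint64_t{16} << 30),
              "30 qubits in double precision needs 16 GiB");
```

Each additional qubit doubles these figures, which is why full state-vector simulation quickly outgrows host memory.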