12. Quantum Algorithmic Primitives¶
[1] The general philosophy of the CUDA-Q specification is that quantum device code should be encapsulated as stand-alone callable instances of generic signature, and that operations or primitive algorithms targeting a quantum coprocessor be implemented as adaptors on those callable instances. Adaptors, by definition, are generic functions that take any quantum kernel as input along with the runtime arguments for that kernel. Runtime arguments passed to adaptor functions flow through the adaptor to the provided kernel. This pattern allows general pre- and post-processing around concrete kernel execution.
12.1. cudaq::sample
¶
[1] A common task for near-term quantum execution is to sample the state of a given quantum circuit for a specified number of shots (circuit executions). The result of this task is typically a mapping of observed measurement bit strings to the number of times each was observed. This is typically termed the counts dictionary in the community.
[2] The CUDA-Q model enables this functionality via template functions within the
cudaq
namespace with the following structure:
template <typename ReturnType>
concept HasVoidReturnType = std::is_void_v<ReturnType>;
// Kernel only
template<typename QuantumKernel, typename... Args>
requires HasVoidReturnType<QuantumKernel, Args...>
sample_result sample(QuantumKernel&& kernel, Args&&... args);
// Specify shots
template<typename QuantumKernel, typename... Args>
requires HasVoidReturnType<QuantumKernel, Args...>
sample_result sample(std::size_t shots, QuantumKernel&& kernel, Args&&... args);
// Specify sample options (including shots and noise model)
template<typename QuantumKernel, typename... Args>
requires HasVoidReturnType<QuantumKernel, Args...>
sample_result sample(const sample_options &options,
QuantumKernel&& kernel, Args&&... args);
[3] This function takes as input a quantum kernel instance followed by the
concrete arguments at which the kernel should be invoked. CUDA-Q kernels
passed to this function must be entry-point kernels and return void
.
[4] Overloaded functions exist for specifying the number of shots to sample and the noise model to apply.
[5] The function returns an instance of the cudaq::sample_result
type which encapsulates
the counts dictionary produced by the sampling task. Programmers can
extract the result information in the following manner:
auto bell = []() __qpu__ { ... };
// Sample the state generated by bel
auto counts = cudaq::sample(bell)
// Print to standard out
counts.dump();
// Fine-grained access to the bits and counts
for (auto& [bits, count] : counts) {
printf("Observed: %s, %lu\n", bits, count);
}
@cudaq.kernel()
def bell():
...
# Sample the state generated by bell
counts = cudaq.sample(bell)
# Print to standard out
counts.dump()
# Fine-grained access to the bits and counts
for bits, count in counts:
print('Observed: {}, {}'.format(bits, count))
[6] CUDA-Q specifies the following structure for cudaq::sample_result
:
namespace cudaq {
using CountsDictionary = std::unordered_map<std::string, std::size_t>;
inline static const std::string GlobalRegisterName = "__global__";
class sample_result {
public:
sample_result() = default;
sample_result(const sample_result &);
~sample_result();
std::vector<std::string> register_names();
std::size_t count(std::string_view bitString,
const std::string_view registerName = GlobalRegisterName);
std::vector<std::string>
sequential_data(const std::string_view registerName = GlobalRegisterName);
CountsDictionary
to_map(const std::string_view registerName = GlobalRegisterName);
sample_result
get_marginal(const std::vector<std::size_t> &&marginalIndices,
const std::string_view registerName = GlobalRegisterName);
double expectation(const std::string_view registerName == GlobalRegisterName);
double probability(std::string_view bitString, const std::string_view registerName == GlobalRegisterName);
std::size_t size(const std::string_view registerName == GlobalRegisterName);
void dump();
void clear();
CountsDictionary::iterator begin();
CountsDictionary::iterator end();
};
}
[7] The sample_result
type enables one to encode measurement results from a
quantum circuit sampling task. It keeps track of a list of sample results, each
one corresponding to a measurement action during the sampling process and represented
by a unique register name. It also tracks a unique global register, the implicit sampling
of the state at the end of circuit execution. The API gives fine-grain access
to the measurement results for each register. To illustrate this, observe
auto kernel = []() __qpu__ {
cudaq::qubit q;
h(q);
auto reg1 = mz(q);
reset (q);
x(q);
};
cudaq::sample(kernel).dump();
@cudaq.kernel()
def kernel():
q = cudaq.qubit()
h(q)
reg1 = mz(q)
reset(q)
x(q)
cudaq.sample(kernel).dump()
should produce
{
__global__ : { 1:1000 }
reg1 : { 0:501 1:499 }
}
Here we see that we have measured a qubit in a uniform superposition to a
register named reg1
, and followed it with a reset and the application
of an NOT operation. The sample_result
returned for this sampling
tasks contains the default __global__
register as well as the user
specified reg1
register.
The contents of the __global__
register will depend on how your kernel
is written:
If no measurements appear in the kernel, then the
__global__
register is formed with implicit measurements being added for all the qubits defined in the kernel, and the measurements all occur at the end of the kernel. The order of the bits in the bitstring corresponds to the qubit allocation order specified in the kernel. That is - the[0]
element in the__global__
bitstring corresponds with the first declared qubit in the kernel. For example,
auto kernel = []() __qpu__ {
cudaq::qubit a, b;
x(a);
};
cudaq::sample(kernel).dump();
@cudaq.kernel
def kernel():
a, b = cudaq.qubit(), cudaq.qubit()
x(a)
cudaq.sample(kernel).dump()
should produce
{ __global__ : { 10:1000 } }
Conversely, if any measurements appear in the kernel, then only the measured qubits will appear in the
__global__
register. Similar to #1, the bitstring corresponds to the qubit allocation order specified in the kernel. Also (again, similar to #1), the values of the sampled qubits always correspond to the values at the end of the kernel execution. That is - if a qubit is measured in the middle of a kernel and subsequent operations change the state of the qubit, the qubit will be implicitly re-measured at the end of the kernel, and that re-measured value is the value that will appear in the__global__
register. For example,
auto kernel = []() __qpu__ {
cudaq::qubit a, b;
x(a);
mz(b);
mz(a);
};
cudaq::sample(kernel).dump();
@cudaq.kernel
def kernel():
a, b = cudaq.qubit(), cudaq.qubit()
x(a)
mz(b)
mz(a)
cudaq.sample(kernel).dump()
should produce
{ __global__ : { 10:1000 } }
Note
If you don’t specify any measurements in your kernel and allow the nvq++
compiler to perform passes that introduce ancilla qubits into your kernel, it
may be difficult to discern which qubits are the ancilla qubits vs which ones
are your qubits. In this case, it is recommended that you provide explicit
measurements in your kernel in order to only receive measurements from your
qubits and silently discard the measurements from the ancillary qubits.
[8] The API exposed by the sample_result
data type allows one to extract
the information contained at a variety of levels and for each available
register name. One can get the number of times a bit string was observed via
sample_result::count
, extract a std::unordered_map
representation via
sample_result::to_map
, get a new sample_result
instance over a subset of
measured qubits via sample_result::get_marginal
, and extract the
measurement data as it was produced sequentially (a vector of bit string observations
for each shot in the sampling process). One can also compute probabilities and expectation
values.
[9] There are specific requirements on input quantum kernels for the use of the
sample function which must be enforced by compiler implementations. The kernel
must be an entry-point kernel that returns void
.
[10] CUDA-Q also provides an asynchronous version of this function
(cudaq::sample_async
) which returns a sample_async_result
.
template<typename QuantumKernel, typename... Args>
async_sample_result sample_async(const std::size_t qpu_id, QuantumKernel&& kernel, Args&&... args);
Programmers can asynchronously launch sampling tasks on any qpu_id
.
[11] The async_sample_result
wraps a std::future<sample_result>
and exposes the same
get()
functionality to extract the results after asynchronous execution.
[12] For remote QPU systems with long queue times, the async_sample_result
type encodes job ID
information and can be persisted to file and loaded from file at a later time. After loading from file,
and when remote queue jobs are completed, one can invoke get()
and the results will
be retrieved and returned.
12.2. cudaq::observe
¶
[1] A common task in variational algorithms is the computation of the expected value of a given observable with respect to a parameterized quantum circuit (\(\langle H \rangle(𝚹) = \langle \psi(𝚹)|H|\psi(𝚹) \rangle\)).
[2] The cudaq::observe
function is provided to enable one to quickly compute
this expectation value via execution of the parameterized quantum circuit
with repeated measurements in the bases of the provided spin_op
terms. The
function has the following signature:
// Kernel only
template<typename QuantumKernel, typename... Args>
observe_result observe(QuantumKernel&&, cudaq::spin_op&, Args&&... args);
// Specify shots
template<typename QuantumKernel, typename... Args>
observe_result observe(std::size_t shots, QuantumKernel&&, cudaq::spin_op&, Args&&... args);
// Specify sample options (including shots and noise model)
template<typename QuantumKernel, typename... Args>
observe_result observe(const cudaq::observe_options &options,
QuantumKernel&&, cudaq::spin_op&, Args&&... args);
[3] cudaq::observe
takes as input an instantiated quantum kernel, the
cudaq::spin_op
whose expectation is requested, and the concrete
arguments used as input to the parameterized quantum kernel.
[4] cudaq::observe
returns an instance of the observe_result
type which can be implicitly
converted to a double
expectation value, but also retains all data directly
generated and used as part of that expectation value computation. The
observe_result
takes on the following form:
class observe_result {
public:
observe_result(double &e, spin_op &H);
observe_result(double &e, spin_op &H, MeasureCounts counts);
sample_results raw_data() { return data; };
operator double();
double expectation();
template <typename SpinOpType>
double expectation(SpinOpType term);
template <typename SpinOpType>
sample_result counts(SpinOpType term);
double id_coefficient()
void dump();
};
[5] The public API for observe_result
enables one to extract the
sample_result
data for each term in the provided spin_op
.
This return type can be used in the following way.
// I only care about the expected value, discard
// the fine-grain data produced
double expVal = cudaq::observe(kernel, spinOp, args...);
// I require the result with all generated data
auto result = cudaq::observe(kernel, spinOp, args...);
auto expVal = result.expectation();
auto X0X1Exp = result.expectation(x(0)*x(1));
auto X0X1Data = result.counts(x(0)*x(1));
result.dump();
# I require the result with all generated data
result = cudaq::observe(kernel, spinOp, *args)
expVal = result.expectation()
X0X1Exp = result.expectation(x(0)*x(1))
X0X1Data = result.counts(x(0)*x(1))
result.dump()
Here is an example of the utility of the cudaq::observe
function:
struct ansatz {
auto operator()(double theta) __qpu__ {
cudaq::qarray<2> q;
x(q[0]);
ry(theta, q[1]);
x<cudaq::ctrl>(q[1], q[0]);
}
};
int main() {
using namespace cudaq::spin; // make it easier to use pauli X,Y,Z below
spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
.21829 * z(0) - 6.125 * z(1);
double energy = cudaq::observe(ansatz{}, h, .59);
printf("Energy is %lf\n", energy);
return 0;
}
@cudaq.kernel()
def ansatz(theta : float):
q = cudaq.qvector(2)
x(q[0])
ry(theta, q[1])
x.ctrl(q[1], q[0])
h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
.21829 * z(0) - 6.125 * z(1)
energy = cudaq.observe(ansatz, h, .59).expectation()
print('Energy is {}'.format(energy))
[5] There are specific requirements on input quantum kernels for the use of the observe function which must be enforced by compiler implementations. The kernel must be an entry-point kernel that does not contain any conditional or measurement statements.
[6] By default on simulation backends, cudaq::observe
computes the true
analytic expectation value (i.e. without stochastic noise due to shots-based sampling).
If a specific shot count is provided then the returned expectation value will contain some
level of statistical noise. Overloaded observe
functions are provided to
specify the number of shots and/or specify the noise model to apply.
[7] CUDA-Q also provides an asynchronous version of this function
(cudaq::observe_async
) which returns a async_observe_result
.
template<typename QuantumKernel, typename... Args>
async_observe_result observe_async(const std::size_t qpu_id, QuantumKernel&& kernel, cudaq::spin_op&, Args&&... args);
Programmers can asynchronously launch sampling tasks on any qpu_id
.
[8] For remote QPU systems with long queue times, the async_observe_result
type encodes job ID
information for each execution and can be persisted to file and loaded from file at a later time. After loading from file,
and when remote queue jobs are completed, one can invoke get()
and the results will
be retrieved and returned.
12.3. cudaq::optimizer
(deprecated, functionality moved to CUDA-Q libraries)¶
The primary use case for cudaq::observe
is to leverage it as
the core of a broader objective function optimization workflow.
cudaq::observe
produces the expected value of a specified
spin_op
with respect to a given parameterized ansatz at a concrete
set of parameters, and often programmers will require an extremal value of that expected value
at a specific set of concrete parameters. This will directly require
abstractions for gradient-based and gradient-free optimization strategies.
The CUDA-Q model provides a cudaq::optimizer
data type that exposes
an optimize()
method that takes as input an
optimizable_function
to optimize and the number of independent
function dimensions. Implementations are free to implement this abstraction
in any way that is pertinent, but it is expected that most approaches will
enable optimization strategy extensibility. For example, programmers should
be able to instantiate a specific cudaq::optimizer
sub-type, thereby
dictating the underlying optimization algorithm in a type-safe manner.
Moreover, the optimizer should expose a public API of pertinent optimizer-specific
options that the programmer can customize.
CUDA-Q models the cudaq::optimizer
as follows:
namespace cudaq {
// Encode the optimal value and optimal parameters
using optimization_result = std::tuple<double, std::vector<double>>;
// Initialized with user specified callable of a specific signature
// Clients can query if the function computes gradients or not
class optimizable_function {
public:
template<typename Callable>
optimizable_function(Callable&&);
bool providesGradients() { return _providesGradients; }
double operator()(const std::vector<double> &x, std::vector<double> &dx);
};
class optimizer {
public:
virtual bool requiresGradients() = 0;
virtual optimization_result optimize(const int dimensions,
optimizable_function&& opt_function) = 0;
};
}
Here, optimization_result
should encode the optimal value and optimal
parameters achieved during the optimization workflow
(i.e. a tuple<double, std::vector<double>>
). The optimize method takes
as input the number of parameters (or dimensions of the objective function),
and a function-like object (i.e. std::function
or a lambda, something
optimizable_function
can be constructed from) that takes a
const std::vector<double>&
and std::vector<double>&
for the
function input parameters and gradient vector, respectively. The objective
function must return a double representing the scalar cost for the
objective function (e.g. the expected value from cudaq::observe()
).
Here is an example of how the cudaq::optimizer
is intended to be used:
auto ansatz = [](double theta, double phi) __qpu__ {...};
cudaq::spin_op H = ... ;
cudaq::optimizers::cobyla optimizer;
optimizer.max_eval = 200;
auto [opt_energy, opt_params] = optimizer.optimize(
2, [&](const std::vector<double> &x, std::vector<double> &grad_vec) {
return cudaq::observe(ansatz, H, x[0], x[1]);
});
12.4. cudaq::gradient
(deprecated, functionality moved to CUDA-Q libraries)¶
Typical optimization use cases will require the computation of gradients for the specified objective function. The gradient is a vector over all ansatz circuit parameters \(∂H(𝚹) / ∂𝚹_i\). There are a number of potential strategies for computing this gradient vector, but most require additional evaluations of the ansatz circuit on the quantum processor.
To enable true extensibility in gradient strategies, CUDA-Q programmers can
instantiate custom sub-types of the cudaq::gradient
type. The cudaq::gradient
type defines a compute(...)
method that takes a mutable reference to the
current gradient vector and is free to update that vector in a strategy-specific way.
The method also takes the current evaluation parameter vector, the cudaq::spin_op
used
in the current variational task, and the computed expected value at the given parameters.
The gradient strategy type takes the following form:
namespace cudaq {
class gradient {
public:
gradient(std::function<void(std::vector<double>)> &&kernel);
template <typename QuantumKernel, typename ArgsMapper>
gradient(QuantumKernel &&kernel, ArgsMapper &&argsMapper);
virtual void compute(std::vector<double>& x, std::vector<double> &dx,
spin_op& h, double exp_h) = 0;
virtual std::vector<double>
compute(const std::vector<double> &x,
std::function<double(std::vector<double>)> &func) = 0;
};
// gradient is intended for subclassing
class central_difference : public gradient {
public:
void compute(std::vector<double>& x, std::vector<double> &dx, spin_op& h,
double exp_h) override { ... }
};
}
The compute
function can make use of the quantum kernel parameterized ansatz, the
spin_op
for which the expected value is being computed, the
pre-computed expected value at the current iteration’s parameter, and the
concrete arguments for the given quantum kernel at this iteration.
A non-trivial aspect of the computation of gradients (in an extensible manner)
is that we model the gradient as a derivative over concrete parameters for the
circuit ansatz represented as a std::vector<double>
when the actual
quantum kernel may be defined with general variadic Args...
types.
To address this issue, programmers can provide a default translation
mechanism for mapping common quantum kernel ansatz functional expressions to a vector<double>
representation - the
ArgsMapper
callable template type. This type must implement the
std::tuple<Args...>(std::vector<double>&)
callable concept.
The overall CUDA-Q workflow for leveraging the cudaq::optimizer
will work as follows (here we demonstrate with an ansatz without the
default std::vector<double>
signature):
auto deuteron_n3_ansatz = [](double x0, double x1) __qpu__ {
cudaq::qarray<3> q;
x(q[0]);
ry(x0, q[1]);
ry(x1, q[2]);
x<cudaq::ctrl>(q[2], q[0]);
x<vctrl>(q[0], q[1]);
ry(-x0, q[1]);
x<cudaq::ctrl>(q[0], q[1]);
x<cudaq::ctrl>(q[1], q[0]);
};
cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
.21829 * z(0) - 6.125 * z(1);
cudaq::spin_op h3 = h + 9.625 - 9.625 * z(2) - 3.913119 * x(1) * x(2) -
3.913119 * y(1) * y(2);
// The above ansatz takes 2 doubles, not a single std::vector<double>, which
// the gradient type is expecting. So we must provide an ArgsMapper callable type
auto argsMapper = [](std::vector<double> x) {return std::make_tuple(x[0],x[1]);};
// Create the gradient strategy
cudaq::gradients::central_difference gradient(deuteron_n3_ansatz, argsMapper);
// Create the L-BFGS optimizer, requires gradients
cudaq::optimizers::lbfgs optimizer;
// Run the optimization routine.
auto [min_val, opt_params] = optimizer.optimize(
2, [&](const std::vector<double>& x, std::vector<double>& grad_vec) {
// Compute the cost, here its an energy
auto cost = cudaq::observe(deuteron_n3_ansatz, h3, x);
// Compute the gradient, results written to the grad_vec reference
gradient.compute(x, grad_vec, h3, cost);
// Return the cost to the optimizer
return cost;
});
// Print the results
printf("Optimizer found %lf at [%lf,%lf]\n", min_val, opt_params[0], opt_params[1]);