Quantum Algorithmic Primitives
******************************

**[1]** The general philosophy of the CUDA-Q specification is that quantum device code should be encapsulated as stand-alone callable instances of generic signature, and that operations or primitive algorithms targeting a quantum coprocessor be implemented as adaptors on those callable instances. Adaptors, by definition, are generic functions that take any quantum kernel as input along with the runtime arguments for that kernel. Runtime arguments passed to adaptor functions flow through the adaptor to the provided kernel. This pattern allows general pre- and post-processing around concrete kernel execution.

.. _cudaq-sample-spec:

:code:`cudaq::sample`
-------------------------

**[1]** A common task for near-term quantum execution is to sample the state of a given quantum circuit for a specified number of shots (circuit executions). The result of this task is typically a mapping of observed measurement bit strings to the number of times each was observed. This is commonly termed the counts dictionary in the community.

**[2]** The CUDA-Q model enables this functionality via template functions within the :code:`cudaq` namespace with the following structure:

.. code-block:: cpp

    template <typename QuantumKernel, typename... Args>
    concept HasVoidReturnType =
        std::is_void_v<std::invoke_result_t<QuantumKernel, Args...>>;

    // Kernel only
    template <typename QuantumKernel, typename... Args>
      requires HasVoidReturnType<QuantumKernel, Args...>
    sample_result sample(QuantumKernel&& kernel, Args&&... args);

    // Specify shots
    template <typename QuantumKernel, typename... Args>
      requires HasVoidReturnType<QuantumKernel, Args...>
    sample_result sample(std::size_t shots, QuantumKernel&& kernel, Args&&... args);

    // Specify sample options (including shots and noise model)
    template <typename QuantumKernel, typename... Args>
      requires HasVoidReturnType<QuantumKernel, Args...>
    sample_result sample(const sample_options &options,
                         QuantumKernel&& kernel, Args&&... args);

**[3]** This function takes as input a quantum kernel instance followed by the concrete arguments at which the kernel should be invoked. CUDA-Q kernels passed to this function must be entry-point kernels and return :code:`void`.

**[4]** Overloaded functions exist for specifying the number of shots to sample and the noise model to apply.

**[5]** The function returns an instance of the :code:`cudaq::sample_result` type which encapsulates the counts dictionary produced by the sampling task. Programmers can extract the result information in the following manner:

.. tab:: C++

    .. code-block:: cpp

        auto bell = []() __qpu__ { ... };

        // Sample the state generated by bell
        auto counts = cudaq::sample(bell);

        // Print to standard out
        counts.dump();

        // Fine-grained access to the bits and counts
        for (auto& [bits, count] : counts) {
          printf("Observed: %s, %lu\n", bits.data(), count);
        }

.. tab:: Python

    .. code-block:: python

        @cudaq.kernel()
        def bell():
            ...

        # Sample the state generated by bell
        counts = cudaq.sample(bell)

        # Print to standard out
        counts.dump()

        # Fine-grained access to the bits and counts
        for bits, count in counts:
            print('Observed: {}, {}'.format(bits, count))
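The shot-count and :code:`sample_options` overloads shown above can be used to control how the sampling is performed. The following is a minimal sketch (it assumes the :code:`bell` kernel defined above and a :code:`sample_options` type exposing :code:`shots` and :code:`noise` fields, as suggested by the overload comments):

.. code-block:: cpp

    // Sample with an explicit shot count
    auto counts1k = cudaq::sample(1000, bell);

    // Bundle execution settings (shot count, optional noise model) into an options struct
    cudaq::sample_options options;
    options.shots = 2000;
    auto counts2k = cudaq::sample(options, bell);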
**[6]** CUDA-Q specifies the following structure for :code:`cudaq::sample_result`:

.. code-block:: cpp

    namespace cudaq {
      using CountsDictionary = std::unordered_map<std::string, std::size_t>;
      inline static const std::string GlobalRegisterName = "__global__";

      class sample_result {
        public:
          sample_result() = default;
          sample_result(const sample_result &);
          ~sample_result();

          std::vector<std::string> register_names();

          std::size_t count(std::string_view bitString,
                            const std::string_view registerName = GlobalRegisterName);

          std::vector<std::string>
          sequential_data(const std::string_view registerName = GlobalRegisterName);

          CountsDictionary
          to_map(const std::string_view registerName = GlobalRegisterName);

          sample_result
          get_marginal(const std::vector<std::size_t> &&marginalIndices,
                       const std::string_view registerName = GlobalRegisterName);

          double expectation(const std::string_view registerName = GlobalRegisterName);
          double probability(std::string_view bitString,
                             const std::string_view registerName = GlobalRegisterName);

          std::size_t size(const std::string_view registerName = GlobalRegisterName);

          void dump();
          void clear();

          CountsDictionary::iterator begin();
          CountsDictionary::iterator end();
      };
    }

**[7]** The :code:`sample_result` type enables one to encode measurement results from a quantum circuit sampling task. It keeps track of a list of sample results, each one corresponding to a measurement action during the sampling process and represented by a unique register name. It also tracks a unique global register, the implicit sampling of the state at the end of circuit execution. The API gives fine-grained access to the measurement results for each register. To illustrate this, observe

.. tab:: C++

    .. code-block:: cpp

        auto kernel = []() __qpu__ {
          cudaq::qubit q;
          h(q);
          auto reg1 = mz(q);
          reset(q);
          x(q);
        };
        cudaq::sample(kernel).dump();

.. tab:: Python

    .. code-block:: python

        @cudaq.kernel()
        def kernel():
            q = cudaq.qubit()
            h(q)
            reg1 = mz(q)
            reset(q)
            x(q)

        cudaq.sample(kernel).dump()

should produce

.. code-block:: bash

    {
      __global__ : { 1:1000 }
      reg1 : { 0:501 1:499 }
    }

Here we have measured a qubit in a uniform superposition to a register named :code:`reg1`, and followed that with a reset and the application of a NOT operation. The :code:`sample_result` returned for this sampling task contains the default :code:`__global__` register as well as the user-specified :code:`reg1` register.
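Each register can then be queried independently through the :code:`sample_result` API described below. A minimal sketch, assuming the kernel above:

.. code-block:: cpp

    auto counts = cudaq::sample(kernel);

    // List all register names present in the result ("__global__" and "reg1")
    for (auto &name : counts.register_names())
      printf("register: %s\n", name.c_str());

    // Counts dictionary restricted to the mid-circuit register "reg1"
    for (auto &[bits, n] : counts.to_map("reg1"))
      printf("reg1 measured %s %lu times\n", bits.c_str(), n);

    // Number of "0" observations and their probability in that register
    std::size_t zeros = counts.count("0", "reg1");
    double probZero = counts.probability("0", "reg1");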
The contents of the :code:`__global__` register will depend on how your kernel is written:

1. If no measurements appear in the kernel, then the :code:`__global__` register is formed with implicit measurements added for *all* the qubits defined in the kernel, and the measurements all occur at the end of the kernel. The order of the bits in the bit string corresponds to the qubit allocation order specified in the kernel. That is, the :code:`[0]` element in the :code:`__global__` bit string corresponds with the first declared qubit in the kernel. For example,

   .. tab:: C++

       .. code-block:: cpp

           auto kernel = []() __qpu__ {
             cudaq::qubit a, b;
             x(a);
           };
           cudaq::sample(kernel).dump();

   .. tab:: Python

       .. code-block:: python

           @cudaq.kernel
           def kernel():
               a, b = cudaq.qubit(), cudaq.qubit()
               x(a)

           cudaq.sample(kernel).dump()

   should produce

   .. code-block:: bash

       { __global__ : { 10:1000 } }

2. Conversely, if any measurements appear in the kernel, then only the measured qubits will appear in the :code:`__global__` register. Similar to #1, the bit string corresponds to the qubit allocation order specified in the kernel. Also (again, similar to #1), the values of the sampled qubits always correspond to the values *at the end of the kernel execution*. That is, if a qubit is measured in the middle of a kernel and subsequent operations change the state of the qubit, the qubit will be implicitly re-measured at the end of the kernel, and that re-measured value is the value that appears in the :code:`__global__` register. For example,

   .. tab:: C++

       .. code-block:: cpp

           auto kernel = []() __qpu__ {
             cudaq::qubit a, b;
             x(a);
             mz(b);
             mz(a);
           };
           cudaq::sample(kernel).dump();

   .. tab:: Python

       .. code-block:: python

           @cudaq.kernel
           def kernel():
               a, b = cudaq.qubit(), cudaq.qubit()
               x(a)
               mz(b)
               mz(a)

           cudaq.sample(kernel).dump()

   should produce

   .. code-block:: bash

       { __global__ : { 10:1000 } }

.. note::

    If you do not specify any measurements in your kernel and you allow the :code:`nvq++` compiler to perform passes that introduce ancilla qubits into your kernel, it may be difficult to discern which qubits are ancilla qubits and which are your own. In this case, it is recommended that you provide explicit measurements in your kernel so that you only receive measurements from your qubits, silently discarding the measurements of the ancilla qubits.

**[8]** The API exposed by the :code:`sample_result` data type allows one to extract the information contained at a variety of levels and for each available register name. One can get the number of times a bit string was observed via :code:`sample_result::count`, extract a :code:`std::unordered_map` representation via :code:`sample_result::to_map`, get a new :code:`sample_result` instance over a subset of measured qubits via :code:`sample_result::get_marginal`, and extract the measurement data as it was produced sequentially via :code:`sample_result::sequential_data` (a vector of bit string observations, one per shot in the sampling process). One can also compute probabilities and expectation values.

**[9]** There are specific requirements on input quantum kernels for the use of the sample function which must be enforced by compiler implementations. The kernel must be an entry-point kernel that returns :code:`void`.

**[10]** CUDA-Q also provides an asynchronous version of this function (:code:`cudaq::sample_async`) which returns an :code:`async_sample_result`.

.. code-block:: cpp

    template <typename QuantumKernel, typename... Args>
    async_sample_result sample_async(const std::size_t qpu_id,
                                     QuantumKernel&& kernel, Args&&... args);

Programmers can asynchronously launch sampling tasks on any :code:`qpu_id`.

**[11]** The :code:`async_sample_result` type wraps a :code:`std::future<sample_result>` and exposes the same :code:`get()` functionality to extract the results after asynchronous execution.

**[12]** For remote QPU systems with long queue times, the :code:`async_sample_result` type encodes job ID information and can be persisted to file and loaded from file at a later time. After loading from file, and once the remote queue jobs are completed, one can invoke :code:`get()` and the results will be retrieved and returned.
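For example, a sampling task can be dispatched asynchronously and collected later. A minimal sketch, assuming the :code:`bell` kernel from earlier and the default QPU at :code:`qpu_id` 0:

.. code-block:: cpp

    // Launch the sampling task asynchronously on QPU 0
    auto handle = cudaq::sample_async(0, bell);

    // ... perform other classical work here ...

    // Block until the results are available
    auto counts = handle.get();
    counts.dump();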
:code:`cudaq::observe`
-------------------------

**[1]** A common task in variational algorithms is the computation of the expected value of a given observable with respect to a parameterized quantum circuit (:math:`\langle H \rangle(\Theta) = \langle \psi(\Theta) | H | \psi(\Theta) \rangle`).

**[2]** The :code:`cudaq::observe` function is provided to enable one to quickly compute this expectation value via execution of the parameterized quantum circuit with repeated measurements in the bases of the provided :code:`spin_op` terms. The function has the following signature:

.. code-block:: cpp

    // Kernel only
    template <typename QuantumKernel, typename... Args>
    observe_result observe(QuantumKernel&&, cudaq::spin_op&, Args&&... args);

    // Specify shots
    template <typename QuantumKernel, typename... Args>
    observe_result observe(std::size_t shots, QuantumKernel&&,
                           cudaq::spin_op&, Args&&... args);

    // Specify observe options (including shots and noise model)
    template <typename QuantumKernel, typename... Args>
    observe_result observe(const cudaq::observe_options &options,
                           QuantumKernel&&, cudaq::spin_op&, Args&&... args);

**[3]** :code:`cudaq::observe` takes as input an instantiated quantum kernel, the :code:`cudaq::spin_op` whose expectation value is requested, and the concrete arguments used as input to the parameterized quantum kernel.

**[4]** :code:`cudaq::observe` returns an instance of the :code:`observe_result` type which can be implicitly converted to a :code:`double` expectation value, but also retains all data directly generated and used as part of that expectation value computation. The :code:`observe_result` takes on the following form:

.. code-block:: cpp

    class observe_result {
      public:
        observe_result(double &e, spin_op &H);
        observe_result(double &e, spin_op &H, MeasureCounts counts);

        sample_result raw_data() { return data; };
        operator double();
        double expectation();

        template <typename SpinOpType>
        double expectation(SpinOpType term);

        template <typename SpinOpType>
        sample_result counts(SpinOpType term);

        double id_coefficient();
        void dump();
    };

**[5]** The public API for :code:`observe_result` enables one to extract the :code:`sample_result` data for each term in the provided :code:`spin_op`. This return type can be used in the following way.

.. tab:: C++

    .. code-block:: cpp

        // I only care about the expected value, discard
        // the fine-grained data produced
        double expVal = cudaq::observe(kernel, spinOp, args...);

        // I require the result with all generated data
        auto result = cudaq::observe(kernel, spinOp, args...);
        auto expVal = result.expectation();
        auto X0X1Exp = result.expectation(x(0)*x(1));
        auto X0X1Data = result.counts(x(0)*x(1));
        result.dump();

.. tab:: Python

    .. code-block:: python

        # I require the result with all generated data
        result = cudaq.observe(kernel, spinOp, *args)
        expVal = result.expectation()
        X0X1Exp = result.expectation(spin.x(0)*spin.x(1))
        X0X1Data = result.counts(spin.x(0)*spin.x(1))
        result.dump()

Here is an example of the utility of the :code:`cudaq::observe` function:

.. tab:: C++

    .. code-block:: cpp

        struct ansatz {
          auto operator()(double theta) __qpu__ {
            cudaq::qarray<2> q;
            x(q[0]);
            ry(theta, q[1]);
            x<cudaq::ctrl>(q[1], q[0]);
          }
        };

        int main() {
          // make it easier to use pauli X,Y,Z below
          using namespace cudaq::spin;

          spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                      .21829 * z(0) - 6.125 * z(1);
          double energy = cudaq::observe(ansatz{}, h, .59);
          printf("Energy is %lf\n", energy);
          return 0;
        }

.. tab:: Python

    .. code-block:: python

        from cudaq import spin

        @cudaq.kernel()
        def ansatz(theta : float):
            q = cudaq.qvector(2)
            x(q[0])
            ry(theta, q[1])
            x.ctrl(q[1], q[0])

        h = 5.907 - 2.1433 * spin.x(0) * spin.x(1) - 2.1433 * spin.y(0) * spin.y(1) \
            + .21829 * spin.z(0) - 6.125 * spin.z(1)

        energy = cudaq.observe(ansatz, h, .59).expectation()
        print('Energy is {}'.format(energy))

**[6]** There are specific requirements on input quantum kernels for the use of the observe function which must be enforced by compiler implementations. The kernel must be an entry-point kernel that does not contain any conditional or measurement statements.

**[7]** By default on simulation backends, :code:`cudaq::observe` computes the true analytic expectation value (i.e. without the stochastic noise due to shots-based sampling). If a specific shot count is provided, the returned expectation value will contain some level of statistical noise. Overloaded :code:`observe` functions are provided to specify the number of shots and/or the noise model to apply.
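For illustration, a finite shot count can be requested as follows. This is a minimal sketch building on the ansatz and Hamiltonian from the example above (the :code:`using namespace cudaq::spin` declaration is assumed to be in scope for :code:`z(0)`):

.. code-block:: cpp

    // Shots-based estimate of <H>; the statistical error shrinks as the shot count grows
    auto result = cudaq::observe(10000, ansatz{}, h, .59);
    double energy = result.expectation();

    // Per-term measurement data that contributed to the estimate
    auto z0Counts = result.counts(z(0));
    z0Counts.dump();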
**[8]** CUDA-Q also provides an asynchronous version of this function (:code:`cudaq::observe_async`) which returns an :code:`async_observe_result`.

.. code-block:: cpp

    template <typename QuantumKernel, typename... Args>
    async_observe_result observe_async(const std::size_t qpu_id,
                                       QuantumKernel&& kernel,
                                       cudaq::spin_op&, Args&&... args);

Programmers can asynchronously launch observation tasks on any :code:`qpu_id`.

**[9]** For remote QPU systems with long queue times, the :code:`async_observe_result` type encodes job ID information for each execution and can be persisted to file and loaded from file at a later time. After loading from file, and once the remote queue jobs are completed, one can invoke :code:`get()` and the results will be retrieved and returned.
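For example, the expectation value computation can be dispatched asynchronously and collected later. A minimal sketch, reusing the ansatz and Hamiltonian from the :code:`cudaq::observe` example and the default QPU at :code:`qpu_id` 0:

.. code-block:: cpp

    // Launch the expectation value computation asynchronously on QPU 0
    auto handle = cudaq::observe_async(0, ansatz{}, h, .59);

    // ... perform other classical work here ...

    // Block until the observe_result is available; it converts implicitly to double
    double energy = handle.get();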
:code:`cudaq::optimizer` (deprecated, functionality moved to CUDA-Q libraries)
------------------------------------------------------------------------------------

The primary use case for :code:`cudaq::observe` is to leverage it as the core of a broader objective function optimization workflow. :code:`cudaq::observe` produces the expected value of a specified :code:`spin_op` with respect to a given parameterized ansatz at a concrete set of parameters, and programmers often require the extremal value of that expected value over the space of concrete parameters. This directly requires abstractions for gradient-based and gradient-free optimization strategies.

The CUDA-Q model provides a :code:`cudaq::optimizer` data type that exposes an :code:`optimize()` method that takes as input an :code:`optimizable_function` to optimize and the number of independent function dimensions. Implementations are free to implement this abstraction in any way that is pertinent, but it is expected that most approaches will enable optimization strategy extensibility. For example, programmers should be able to instantiate a specific :code:`cudaq::optimizer` sub-type, thereby dictating the underlying optimization algorithm in a type-safe manner. Moreover, the optimizer should expose a public API of pertinent optimizer-specific options that the programmer can customize.

CUDA-Q models the :code:`cudaq::optimizer` as follows:

.. code-block:: cpp

    namespace cudaq {
      // Encode the optimal value and optimal parameters
      using optimization_result = std::tuple<double, std::vector<double>>;

      // Initialized with a user-specified callable of a specific signature.
      // Clients can query whether the function computes gradients or not.
      class optimizable_function {
        public:
          template <typename Callable>
          optimizable_function(Callable&&);
          bool providesGradients() { return _providesGradients; }
          double operator()(const std::vector<double> &x, std::vector<double> &dx);
      };

      class optimizer {
        public:
          virtual bool requiresGradients() = 0;
          virtual optimization_result
          optimize(const int dimensions, optimizable_function&& opt_function) = 0;
      };
    }

Here, :code:`optimization_result` encodes the optimal value and optimal parameters achieved during the optimization workflow (i.e. a :code:`std::tuple<double, std::vector<double>>`). The :code:`optimize` method takes as input the number of parameters (the dimension of the objective function) and a function-like object (i.e. a :code:`std::function` or a lambda, anything an :code:`optimizable_function` can be constructed from) that takes a :code:`const std::vector<double>&` and a :code:`std::vector<double>&` for the function input parameters and the gradient vector, respectively. The objective function must return a :code:`double` representing the scalar cost for the objective function (e.g. the expected value from :code:`cudaq::observe()`).

Here is an example of how the :code:`cudaq::optimizer` is intended to be used:

.. code-block:: cpp

    auto ansatz = [](double theta, double phi) __qpu__ { ... };

    cudaq::spin_op H = ... ;

    cudaq::optimizers::cobyla optimizer;
    optimizer.max_eval = 200;

    auto [opt_energy, opt_params] = optimizer.optimize(
        2, [&](const std::vector<double> &x, std::vector<double> &grad_vec) {
          return cudaq::observe(ansatz, H, x[0], x[1]);
        });

:code:`cudaq::gradient` (deprecated, functionality moved to CUDA-Q libraries)
-----------------------------------------------------------------------------------

Typical optimization use cases will require the computation of gradients for the specified objective function. The gradient is a vector over all ansatz circuit parameters (:math:`\partial \langle H \rangle(\Theta) / \partial \Theta_i`). There are a number of potential strategies for computing this gradient vector, but most require additional evaluations of the ansatz circuit on the quantum processor.

To enable true extensibility in gradient strategies, CUDA-Q programmers can instantiate custom sub-types of the :code:`cudaq::gradient` type. The :code:`cudaq::gradient` type defines a :code:`compute(...)` method that takes a mutable reference to the current gradient vector and is free to update that vector in a strategy-specific way. The method also takes the current evaluation parameter vector, the :code:`cudaq::spin_op` used in the current variational task, and the computed expected value at the given parameters. The gradient strategy type takes the following form:

.. code-block:: cpp

    namespace cudaq {
      class gradient {
        public:
          gradient(std::function<void(std::vector<double>)> &&kernel);

          template <typename QuantumKernel, typename ArgsMapper>
          gradient(QuantumKernel &&kernel, ArgsMapper &&argsMapper);

          virtual void compute(const std::vector<double>& x, std::vector<double> &dx,
                               spin_op& h, double exp_h) = 0;

          virtual std::vector<double>
          compute(const std::vector<double> &x,
                  std::function<double(std::vector<double>)> &func) = 0;
      };

      // gradient is intended for subclassing
      class central_difference : public gradient {
        public:
          void compute(const std::vector<double>& x, std::vector<double> &dx,
                       spin_op& h, double exp_h) override { ... }
      };
    }

The :code:`compute` function can make use of the quantum kernel parameterized ansatz, the :code:`spin_op` for which the expected value is being computed, the pre-computed expected value at the current iteration's parameters, and the concrete arguments for the given quantum kernel at this iteration.

A non-trivial aspect of computing gradients in an extensible manner is that the gradient is modeled as a derivative over concrete circuit-ansatz parameters represented as a :code:`std::vector<double>`, while the actual quantum kernel may be defined with general variadic :code:`Args...` types. To address this issue, programmers can provide a translation mechanism for mapping common quantum kernel ansatz functional expressions to a :code:`std::vector<double>` representation - the :code:`ArgsMapper` callable template type. This type must implement the :code:`std::tuple<Args...>(std::vector<double>&)` callable concept.

The overall CUDA-Q workflow for leveraging the :code:`cudaq::optimizer` works as follows (here we demonstrate with an ansatz that does not have the default :code:`std::vector<double>` signature):
.. code-block:: cpp

    auto deuteron_n3_ansatz = [](double x0, double x1) __qpu__ {
      cudaq::qarray<3> q;
      x(q[0]);
      ry(x0, q[1]);
      ry(x1, q[2]);
      x<cudaq::ctrl>(q[2], q[0]);
      x<cudaq::ctrl>(q[0], q[1]);
      ry(-x0, q[1]);
      x<cudaq::ctrl>(q[0], q[1]);
      x<cudaq::ctrl>(q[1], q[0]);
    };

    cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                       .21829 * z(0) - 6.125 * z(1);
    cudaq::spin_op h3 = h + 9.625 - 9.625 * z(2) - 3.913119 * x(1) * x(2) -
                        3.913119 * y(1) * y(2);

    // The above ansatz takes 2 doubles, not the single std::vector<double>
    // the gradient type expects, so we must provide an ArgsMapper callable type
    auto argsMapper = [](std::vector<double> x) { return std::make_tuple(x[0], x[1]); };

    // Create the gradient strategy
    cudaq::gradients::central_difference gradient(deuteron_n3_ansatz, argsMapper);

    // Create the L-BFGS optimizer, which requires gradients
    cudaq::optimizers::lbfgs optimizer;

    // Run the optimization routine.
    auto [min_val, opt_params] = optimizer.optimize(
        2, [&](const std::vector<double>& x, std::vector<double>& grad_vec) {
          // Compute the cost, here it is an energy
          double cost = cudaq::observe(deuteron_n3_ansatz, h3, x[0], x[1]);

          // Compute the gradient; results are written to the grad_vec reference
          gradient.compute(x, grad_vec, h3, cost);

          // Return the cost to the optimizer
          return cost;
        });

    // Print the results
    printf("Optimizer found %lf at [%lf,%lf]\n", min_val, opt_params[0], opt_params[1]);