ADAPT-QAOA algorithm

In this tutorial, we explain the ADAPT-QAOA algorithm introduced in this paper and show how to implement it in CUDA-Q.

In QAOA (explained in this Tutorial), the variational Ansatz consists of \(p\) layers, each containing the cost Hamiltonian \(H_C\) and a mixer, \(H_M\).

\[\ket{\psi_p (\beta, \gamma)} = \left ( \prod_{k=1} ^p [e^{-i H_M \beta_k} e^{-i H_C \gamma_k}] \right ) \ket{\psi_{\mathrm{ref}}}\]

where \(\ket{\psi_{\mathrm{ref}}} = \ket{+}^{\otimes n}\), \(n\) is the number of qubits, and \(\gamma, \beta\) are sets of variational parameters. If these parameters are chosen such that

\[\braket{\psi_p (\beta, \gamma)| H_C |\psi_p (\beta, \gamma) }\]

is minimized, then the resulting energy and state provide an approximate solution to the optimization problem encoded in \(H_C\). In the standard QAOA Ansatz, the mixer is chosen to be a single-qubit X rotation applied to all qubits.
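
To make the layer structure concrete, the kernel below sketches a single layer of this standard QAOA Ansatz in CUDA-Q. It is an illustration only and is not used in the rest of the tutorial; it assumes the cost Hamiltonian has already been decomposed into Pauli words and coefficients, in the same way as the ADAPT-QAOA kernels defined later.

[ ]:
# Illustrative sketch only (not part of the ADAPT-QAOA implementation below):
# a single layer of the standard QAOA Ansatz, assuming H_C is supplied as
# Pauli words and (real) coefficients.
import cudaq

@cudaq.kernel
def standard_qaoa_layer(qubits_num: int, ham_word: list[cudaq.pauli_word],
                        ham_coef: list[complex], gamma: float, beta: float):
    qubits = cudaq.qvector(qubits_num)
    h(qubits)  # reference state |+>^n

    # Cost layer: one exp_pauli rotation per Pauli term of H_C
    # (same convention as the kernels defined later in this tutorial)
    for i in range(len(ham_coef)):
        exp_pauli(gamma * ham_coef[i].real, qubits, ham_word[i])

    # Fixed X mixer on every qubit: rx(theta) applies exp(-i theta X / 2),
    # so rx(2*beta) implements exp(-i beta X)
    for i in range(qubits_num):
        rx(2.0 * beta, qubits[i])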

In ADAPT-QAOA, the single, fixed mixer \(H_M\) is replaced by a set of mixers \(A_k\) that change from one layer to the next.

\[\ket{\psi_p (\beta, \gamma) } = \left ( \prod_{k=1} ^p [e^{-i A_k \beta_k} e^{-i H_C \gamma_k}] \right ) \ket{\psi_{\mathrm{ref}}}\]

The ADAPT-QAOA algorithm can be summarized in three steps:

1- Define the operator set \(\{A_j\}\) (called the mixer pool), where each \(A_j = A^{\dagger}_j\), and select a suitable reference state as the initial state: \(\ket{\psi(0)} = \ket{\psi_{\mathrm{ref}}}\). As in standard QAOA, \(\ket{\psi_{\mathrm{ref}}} = \ket{+}^{\otimes n}\) is chosen.

2- Prepare the current Ansatz \(\ket{\psi(k-1)}\) and measure the energy gradient with respect to the pool, the \(j\)th component of which is given by \(-i \braket{\psi(k-1)|e^{i H_C \gamma_k} [H_C, A_j] e^{-i H_C \gamma_k} |\psi(k-1)}\), where the new variational parameter \(\gamma_k\) is set to a predefined value \(\gamma_0\). For the measurement, the commutator can be decomposed into a linear combination of Pauli strings whose expectation values are measured with standard variational-quantum-algorithm techniques. If the norm of the gradient is below a predefined threshold, the algorithm stops, and the current state and energy estimate approximate the desired solution. Otherwise, grow the Ansatz by adding the operator \(A_{\mathrm{max}}\) associated with the largest-magnitude component of the gradient: \(\ket{\psi(k)} = e^{-i A_{\mathrm{max}} \beta_k} e^{-i H_C \gamma_k} \ket{\psi(k-1)}\), where \(\beta_k\) is a new variational parameter.

3- Optimize all parameters currently in the Ansatz, \(\beta_m, \gamma_m\) with \(m = 1, 2, \ldots, k\), such that \(\braket{\psi(k)|H_C|\psi(k)}\) is minimized, and return to the second step.

Below is a schematic representation of the ADAPT-QAOA algorithm explained above.

[Schematic of the ADAPT-QAOA algorithm]

[ ]:
import numpy as np
import cudaq
from scipy.optimize import minimize
import random

cudaq.set_target("nvidia", option="fp64")

Simulation input:

The gradients used to select new operators are sensitive to the initial value of \(\gamma\). We initialize the parameter at \(\gamma_0 = 0.01\), as explained in the paper.

[22]:
# The Max-cut graph:

# Example 1
#true_energy=-12.0
#nodes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
#edges = [[0,4], [1,4], [0,6], [3,6], [5,6], [2,7], [1,8], [3,8], [5,8], [6,8], [7,8], [2,9], [8,9]]
#weight = [1.0] * len(edges)

# Example 2:
#true_energy=-5.0
nodes = [0, 1, 2, 3, 4]
edges = [[0, 1], [1, 2], [2, 3], [3, 0], [2, 4], [3, 4]]
weight = [1.0] * len(edges)

# Define the number of qubits.

qubits_num = len(nodes)

# Predefined values used to terminate the iteration. These can be changed based on user preference.
E_prev = 0.0
threshold = 1e-3   # Stop the calculation when the norm of the gradient falls below this value.
e_stop = 1e-7      # Stop the calculation when the energy change between steps satisfies dE <= e_stop, where dE = |E_current - E_previous|.

# Initial value of the problem-Hamiltonian parameter gamma used for the gradient calculation

init_gamma = 0.01

The problem Hamiltonian \(H_C\) of the max-cut graph:

\[H_C= - \frac{1}{2} \sum_{i,j} \omega_{i,j} (I - Z_i Z_j)\]
[23]:
def spin_ham(edges, weight):

    ham = 0

    for i, edge in enumerate(edges):

        qubitu, qubitv = edge
        # Add a term to the Hamiltonian for the edge (u, v)
        ham += 0.5 * weight[i] * (cudaq.spin.z(qubitu) * cudaq.spin.z(qubitv) -
                                  cudaq.spin.i(qubitu) * cudaq.spin.i(qubitv))

    return ham

# Collect coefficients from a spin operator so we can pass them to a kernel
def term_coefficients(ham: cudaq.SpinOperator) -> list[complex]:
    result = []
    ham.for_each_term(lambda term: result.append(term.get_coefficient()))
    return result

# Collect Pauli words from a spin operator so we can pass them to a kernel
def term_words(ham: cudaq.SpinOperator) -> list[str]:
    result = []
    ham.for_each_term(lambda term: result.append(term.to_string(False)))
    return result

# Generate the spin Hamiltonian:

ham = spin_ham(edges, weight)
ham_coef = term_coefficients(ham)
ham_word = term_words(ham)
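
As a quick check, the Pauli decomposition that will be passed to the kernels can be inspected by printing each Pauli word alongside its coefficient (an optional inspection cell; its output is not shown here):

[ ]:
# Optional: inspect the Pauli-term decomposition of H_C
for word, coef in zip(ham_word, ham_coef):
    print(word, coef.real)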

The operator pool \(A_j\):

Here, the mixer pool consists of all single-qubit \(X\), \(Y\), and \(Z\) operators together with the nine two-qubit Pauli products on each pair of distinct qubits, giving \(3n + 9\,n(n-1)/2\) operators (105 for \(n = 5\)).

[24]:
def mixer_pool(qubits_num):
    op = []

    for i in range(qubits_num):
        op.append(cudaq.spin.x(i))
        op.append(cudaq.spin.y(i))
        op.append(cudaq.spin.z(i))

    for i in range(qubits_num-1):
        for j in range(i+1, qubits_num):
            op.append(cudaq.spin.x(i) * cudaq.spin.x(j))
            op.append(cudaq.spin.y(i) * cudaq.spin.y(j))
            op.append(cudaq.spin.z(i) * cudaq.spin.z(j))

            op.append(cudaq.spin.y(i) * cudaq.spin.z(j))
            op.append(cudaq.spin.z(i) * cudaq.spin.y(j))

            op.append(cudaq.spin.z(i) * cudaq.spin.x(j))
            op.append(cudaq.spin.x(i) * cudaq.spin.z(j))

            op.append(cudaq.spin.y(i) * cudaq.spin.x(j))
            op.append(cudaq.spin.x(i) * cudaq.spin.y(j))


    return op
[25]:
# Generate the mixer pool

pools = mixer_pool(qubits_num)
#print([op.to_string(False) for op in pools])
print('Number of pool operators: ', len(pools))
Number of pool operators:  105

The commutator \([H_C,A_j]\):

[26]:
# Generate the commutators [H_C, A_j]

def commutator(pools, ham):
    com_op = []

    for op in pools:
        com_op.append(ham * op - op * ham)

    return com_op

grad_op = commutator(pools, ham)
[27]:
###########################################
# Get the initial state (psi_ref)

@cudaq.kernel
def initial_state(qubits_num:int):
    qubits = cudaq.qvector(qubits_num)
    h(qubits)

state = cudaq.get_state(initial_state, qubits_num)

#print(state)
###############################################

# Kernel that applies the cost layer e^{-i gamma_0 H_C} to the current state;
# the gradient components are then obtained by measuring -i[H_C, A_j] on the result
@cudaq.kernel
def grad(state:cudaq.State, ham_word:list[cudaq.pauli_word], ham_coef: list[complex], init_gamma:float):
    q = cudaq.qvector(state)

    for i in range(len(ham_coef)):
        exp_pauli(init_gamma * ham_coef[i].real, q, ham_word[i])


# The ADAPT-QAOA circuit: alternating cost layers and the mixers selected so far

@cudaq.kernel
def kernel_qaoa(qubits_num:int, ham_word:list[cudaq.pauli_word], ham_coef:list[complex],
        mixer_pool:list[cudaq.pauli_word], gamma:list[float], beta:list[float], layer:list[int], num_layer:int):

    qubits = cudaq.qvector(qubits_num)

    h(qubits)

    idx = 0
    for p in range(num_layer):
        for i in range(len(ham_coef)):
            exp_pauli(gamma[p] * ham_coef[i].real, qubits, ham_word[i])

        for j in range(layer[p]):
            exp_pauli(beta[idx+j], qubits, mixer_pool[idx+j])
        idx = idx + layer[p]

Beginning of ADAPT-QAOA iteration:

[28]:
# Beginning of the ADAPT-QAOA iteration

beta = []
gamma = []

mixer_pool = []
layer = []


istep = 1
while True:

    print('Step: ', istep)

    #####################
    # Compute the gradient with respect to the pool and select the operator with the largest component.
    # If the norm of the gradient is below the predefined threshold, stop the calculation.

    gradient_vec = []
    for op in grad_op:
        op = op * -1j
        gradient_vec.append(cudaq.observe(grad, op , state, ham_word, ham_coef, init_gamma).expectation())

    # Compute the norm of the gradient vector
    norm = np.linalg.norm(np.array(gradient_vec))
    print('Norm of the gradient: ', norm)

    if norm <= threshold:
        print('\n', 'Final Result: ', '\n')
        print('Final mixer_pool: ', mixer_pool)
        print('Number of layers: ', len(layer))
        print('Number of mixer operators in each layer: ', layer)
        print('Final Energy: ', result_vqe.fun)

        break

    else:
        temp_pool = []
        tot_pool = 0

        max_grad = np.max(np.abs(gradient_vec))

        for i in range(len(pools)):
            if np.abs(gradient_vec[i]) >= max_grad:
                tot_pool += 1
                temp_pool.append(pools[i])

        # Fix the seed so that ties between equally large gradient components are broken reproducibly
        random.seed(42)

        layer.append(1)
        random_mixer = random.choice(temp_pool)

        mixer_pool = mixer_pool + [random_mixer.to_string(False)]

        print('Mixer pool at step', istep)
        print(mixer_pool)

        num_layer = len(layer)
        print('Number of layers: ', num_layer)

        beta_count = layer[num_layer-1]
        init_beta = [0.0] * beta_count
        beta = beta + init_beta
        gamma = gamma + [init_gamma]
        theta = gamma + beta

        def cost(theta):

            theta = theta.tolist()
            gamma = theta[:num_layer]
            beta = theta[num_layer:]


            energy = cudaq.observe(kernel_qaoa, ham, qubits_num, ham_word, ham_coef,
                                    mixer_pool, gamma, beta, layer, num_layer).expectation()
            #print(energy)
            return energy

        result_vqe = minimize(cost, theta, method='BFGS', jac='2-point', tol=1e-5)
        print('Result from the step ', istep)
        print('Optimized Energy: ', result_vqe.fun)

        dE = np.abs(result_vqe.fun - E_prev)
        E_prev = result_vqe.fun
        theta = result_vqe.x.tolist()
        erg = result_vqe.fun

        print('dE= :', dE)
        gamma=theta[:num_layer]
        beta=theta[num_layer:]

        if dE <= e_stop:
            print('\n', 'Final Result: ', '\n')
            print('Final mixer_pool: ', mixer_pool)
            print('Number of layers: ', len(layer))
            print('Number of mixer operators in each layer: ', layer)
            print('Final Energy= ', erg)

            break

        else:

            # Compute the state at the current step; it seeds the gradient evaluation in the next iteration
            state = cudaq.get_state(kernel_qaoa, qubits_num, ham_word, ham_coef, mixer_pool, gamma, beta, layer, num_layer)
            #print('State at step ', istep)
            #print(state)
            istep+=1
            print('\n')


Step:  1
Norm of the gradient:  3.466322556913865
Mixer pool at step 1
['YZ']
Number of layers:  1
Result from the step  1
Optimized Energy:  -3.499999999999957
dE= : 3.499999999999957


Step:  2
Norm of the gradient:  3.162404152906434
Mixer pool at step 2
['YZ', 'IIZIY']
Number of layers:  2
Result from the step  2
Optimized Energy:  -3.999999999976162
dE= : 0.49999999997620526


Step:  3
Norm of the gradient:  1.4145319953645363
Mixer pool at step 3
['YZ', 'IIZIY', 'ZIIY']
Number of layers:  3
Result from the step  3
Optimized Energy:  -4.499999999993414
dE= : 0.500000000017252


Step:  4
Norm of the gradient:  0.01414371076153925
Mixer pool at step 4
['YZ', 'IIZIY', 'ZIIY', 'IIXIX']
Number of layers:  4
Result from the step  4
Optimized Energy:  -4.999999999996571
dE= : 0.5000000000031566


Step:  5
Norm of the gradient:  5.585137881516482e-06

 Final Result:

Final mixer_pool:  ['YZ', 'IIZIY', 'ZIIY', 'IIXIX']
Number of layers:  4
Number of mixer operators in each layer:  [1, 1, 1, 1]
Final Energy:  -4.999999999996571
[29]:
print('\n', 'Sampling the Final ADAPT QAOA circuit', '\n')
# Sample the circuit
count = cudaq.sample(kernel_qaoa, qubits_num, ham_word, ham_coef, mixer_pool, gamma, beta, layer, num_layer, shots_count=5000)
print('The most probable max cut: ', count.most_probable())
print('All bitstrings from circuit sampling: ', count)

 Sampling the Final ADAPT QAOA circuit

The most probable max cut:  10100
All bitstrings from circuit sampling:  { 01011:2438 10100:2562 }
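
As a sanity check, the cut value of the sampled bitstring can be recomputed classically from the edge list; for this graph it should equal 5, matching the optimized energy of \(-5.0\) found above. The small helper below is an illustrative addition (it assumes the bitstring is ordered with qubit 0 first; a bitstring and its complement describe the same cut, so either sampled string gives the same value):

[ ]:
# Classically recompute the cut value of the most probable bitstring
def cut_value(bitstring, edges, weight):
    # An edge contributes its weight when its endpoints lie on different sides of the cut
    return sum(w for (u, v), w in zip(edges, weight) if bitstring[u] != bitstring[v])

best = count.most_probable()
print('Cut value of', best, ':', cut_value(best, edges, weight))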

[10]:
print(cudaq.__version__)
CUDA-Q Version cu12-latest (https://github.com/NVIDIA/cuda-quantum 5e9dc6aebbcf9c301808574d04dd4bc0e1d6dbdf)