Optimizers and Gradients¶
Many quantum algorithms require the optimization of quantum circuit parameters with respect to an expectation value. CUDA-Q provides a comprehensive suite of optimization tools for hybrid quantum-classical algorithms like VQE (Variational Quantum Eigensolver).
This notebook will demonstrate:
Built-in CUDA-Q Optimizers: Adam, SGD, SPSA, COBYLA, NelderMead, LBFGS, and GradientDescent
Optimizer Parameters: Detailed configuration options with defaults and tuning guidance
Gradient Strategies: CentralDifference, ForwardDifference, and ParameterShift
Third-Party Optimizers: Integration with SciPy
Parallel Parameter Shift: Multi-GPU gradient computation
CUDA-Q Optimizer Overview¶
CUDA-Q includes the following optimizers:
Gradient-Free Optimizers (no gradients required):¶
COBYLA: Constrained Optimization BY Linear Approximations
NelderMead: Simplex-based derivative-free optimizer
SPSA: Simultaneous Perturbation Stochastic Approximation (excellent for noisy functions)
Gradient-Based Optimizers (require gradients):¶
Adam: Adaptive Moment Estimation with momentum (recommended for most cases)
SGD: Stochastic Gradient Descent
LBFGS: Limited-memory BFGS quasi-Newton method
GradientDescent: Basic gradient descent
First, let’s set up the kernel and Hamiltonian that we’ll use throughout the examples.
[1]:
import cudaq
from cudaq import spin
import numpy as np
hamiltonian = (5.907 - 2.1433 * spin.x(0) * spin.x(1) -
               2.1433 * spin.y(0) * spin.y(1) +
               0.21829 * spin.z(0) - 6.125 * spin.z(1))
@cudaq.kernel
def kernel(angles: list[float]):
    qubits = cudaq.qvector(2)
    x(qubits[0])
    ry(angles[0], qubits[1])
    x.ctrl(qubits[1], qubits[0])
initial_params = np.random.normal(0, np.pi, 1)
1. Built-in CUDA-Q Optimizers and Gradients¶
CUDA-Q provides several optimizers with configurable parameters. Let’s explore the most commonly used optimizers: Adam, SGD, and SPSA.
1.1 Adam Optimizer with Parameter Configuration¶
Adam (Adaptive Moment Estimation) combines momentum and adaptive learning rates for efficient optimization. It’s particularly effective for problems with noisy gradients.
Configurable Parameters:
- step_size (default: 0.01): Learning rate for parameter updates
- beta1 (default: 0.9): Exponential decay rate for the first moment (momentum)
- beta2 (default: 0.999): Exponential decay rate for the second moment (adaptive learning)
- epsilon (default: 1e-8): Small constant for numerical stability
- batch_size (default: 1): Number of samples per batch
- f_tol (default: 1e-4): Convergence tolerance
- max_iterations: Maximum number of iterations
- initial_parameters: Starting parameter values
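The update rule behind these parameters can be sketched in plain NumPy. This is a minimal illustration on a classical quadratic, not CUDA-Q's internal implementation; the function and variable names are our own:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, step_size=0.01,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update: momentum (beta1), adaptive scaling (beta2), bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second moment estimate
    m_hat = m / (1 - beta1**t)                # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    theta = theta - step_size * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v

# Minimize f(theta) = theta^2 (gradient 2*theta), starting from theta = 2.0
theta, m, v = 2.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, step_size=0.1)
print(theta)  # converges toward 0
```

Because the second moment normalizes each parameter's step by its own gradient history, Adam is less sensitive to the raw gradient scale than plain gradient descent, which is why it tolerates noisy gradients well.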
The optimizer and gradient are specified below. An objective function is then defined that uses a lambda expression to evaluate the cost (a CUDA-Q observe expectation value); the gradient is calculated by passing that lambda to the gradient strategy's compute method.
[2]:
# Configure Adam optimizer with custom parameters
optimizer = cudaq.optimizers.Adam()
optimizer.step_size = 0.1 # Learning rate
optimizer.beta1 = 0.9 # First moment decay
optimizer.beta2 = 0.999 # Second moment decay
optimizer.epsilon = 1e-8 # Numerical stability
optimizer.max_iterations = 100 # Maximum iterations
optimizer.initial_parameters = initial_params # Set initial parameters
# Use CentralDifference gradient strategy
gradient = cudaq.gradients.CentralDifference()
def objective_function(parameter_vector: list[float],
                       hamiltonian=hamiltonian,
                       gradient_strategy=gradient,
                       kernel=kernel) -> tuple[float, list[float]]:
    """
    Objective function for gradient-based optimizers.
    Returns: (cost, gradient_vector)
    """
    get_result = lambda parameter_vector: cudaq.observe(
        kernel, hamiltonian, parameter_vector).expectation()
    cost = get_result(parameter_vector)
    gradient_vector = gradient_strategy.compute(parameter_vector, get_result, cost)
    return cost, gradient_vector
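CentralDifference approximates each partial derivative from two shifted evaluations, (f(θ+h) − f(θ−h)) / 2h. The same rule can be checked on a classical stand-in cost function; this is a minimal NumPy sketch with illustrative names, not the CUDA-Q API:

```python
import numpy as np

def central_difference(f, theta, h=1e-4):
    """Approximate the gradient of f at theta, one shifted pair per parameter."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[i] = h                     # perturb only parameter i
        grad[i] = (f(theta + shift) - f(theta - shift)) / (2 * h)
    return grad

# Stand-in cost with a known gradient: f(t) = sin(t0) + t1**2
f = lambda t: np.sin(t[0]) + t[1]**2
g = central_difference(f, [0.0, 1.0])
print(g)  # close to the analytic gradient [cos(0), 2] = [1.0, 2.0]
```

Note that this costs two function evaluations per parameter, which is why gradient-based optimizers become expensive for circuits with many parameters.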
Now run the optimizer to find the optimal energy and parameters. Adam will use adaptive learning rates for each parameter.
[3]:
energy, parameter = optimizer.optimize(dimensions=1, function=objective_function)
print(f"\n=== Adam Optimizer Results ===")
print(f"Minimized <H> = {energy:.6f}")
print(f"Optimal parameters: {[round(p, 6) for p in parameter]}")
=== Adam Optimizer Results ===
Minimized <H> = -1.744713
Optimal parameters: [-5.721116]
1.2 SGD (Stochastic Gradient Descent) Optimizer¶
SGD is a fundamental optimization algorithm that updates parameters by taking steps proportional to the negative gradient.
Configurable Parameters:
- step_size (default: 0.01): Learning rate for parameter updates
- batch_size (default: 1): Number of samples per batch
- f_tol (default: 1e-4): Convergence tolerance
- max_iterations: Maximum number of iterations
- initial_parameters: Starting parameter values
SGD is simpler than Adam and can be effective when you understand your problem well enough to tune the learning rate appropriately.
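The plain SGD update is just θ ← θ − η ∇f(θ). A minimal sketch on a classical quadratic (our own illustration, not CUDA-Q internals):

```python
import numpy as np

def sgd(grad_fn, theta, step_size=0.05, max_iterations=100):
    """Plain gradient descent: step against the gradient at a fixed rate."""
    for _ in range(max_iterations):
        theta = theta - step_size * grad_fn(theta)
    return theta

# Minimize f(theta) = (theta - 3)^2; its gradient is 2*(theta - 3)
theta = sgd(lambda t: 2 * (t - 3.0), theta=0.0)
print(theta)  # approaches 3.0
```

With a fixed step size, convergence depends entirely on how well the learning rate matches the curvature of the cost landscape, which is the tuning burden the text refers to.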
[4]:
# Configure SGD optimizer
sgd_optimizer = cudaq.optimizers.SGD()
sgd_optimizer.step_size = 0.05 # Learning rate
sgd_optimizer.batch_size = 1 # Stochastic mode
sgd_optimizer.max_iterations = 100 # Maximum iterations
sgd_optimizer.f_tol = 1e-6 # Convergence tolerance
sgd_optimizer.initial_parameters = initial_params
# Run optimization
sgd_energy, sgd_params = sgd_optimizer.optimize(dimensions=1, function=objective_function)
print(f"\n=== SGD Optimizer Results ===")
print(f"Minimized <H> = {sgd_energy:.6f}")
print(f"Optimal parameters: {[round(p, 6) for p in sgd_params]}")
=== SGD Optimizer Results ===
Minimized <H> = -1.748865
Optimal parameters: [-5.688733]
1.3 SPSA (Simultaneous Perturbation Stochastic Approximation)¶
SPSA is a gradient-free stochastic optimization algorithm that is particularly useful for noisy objective functions (like quantum hardware with shot noise). It approximates gradients using simultaneous perturbations and requires only 2 function evaluations per iteration regardless of problem dimension.
Configurable Parameters:
- step_size (default: 0.3): Evaluation step size for gradient approximation
- gamma (default: 0.101): Scaling exponent for the step size schedule
- max_iterations: Maximum number of iterations
- initial_parameters: Starting parameter values
Key Advantage: SPSA does not require gradients, making it ideal for noisy functions and quantum hardware.
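The key mechanism, one random simultaneous perturbation yielding a full gradient estimate from two evaluations, can be sketched classically. This is a minimal illustration with our own names and schedules, not CUDA-Q's implementation:

```python
import numpy as np

def spsa_minimize(f, theta, a=0.1, c=0.1, max_iterations=200, seed=0):
    """SPSA: estimate the whole gradient from only 2 evaluations per iteration."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    for k in range(1, max_iterations + 1):
        ck = c / k**0.101                  # perturbation size decay (cf. gamma)
        ak = a / k**0.602                  # step size decay
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 directions
        # Two evaluations perturb *all* parameters at once; dividing the
        # difference by each +/-1 entry of delta recovers a gradient estimate
        g_hat = (f(theta + ck * delta) - f(theta - ck * delta)) / (2 * ck) * delta
        theta = theta - ak * g_hat
    return theta

# Noisy quadratic with minimum at [1, -2], mimicking shot noise
noise = np.random.default_rng(1)
f = lambda t: np.sum((t - np.array([1.0, -2.0]))**2) + 0.01 * noise.normal()
theta = spsa_minimize(f, [0.0, 0.0])
print(theta)  # near [1, -2]
```

The two-evaluation cost per iteration holds no matter how many parameters the circuit has, which is the source of SPSA's scaling advantage.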
[5]:
# Configure SPSA optimizer
spsa_optimizer = cudaq.optimizers.SPSA()
spsa_optimizer.step_size = 0.3 # Evaluation step size
spsa_optimizer.gamma = 0.101 # Scaling exponent
spsa_optimizer.max_iterations = 100 # Maximum iterations
spsa_optimizer.initial_parameters = initial_params
# Define gradient-free objective function
def spsa_objective(parameter_vector: list[float]) -> float:
    """
    Objective function for gradient-free optimizers like SPSA.
    Returns: cost only (no gradient)
    """
    return cudaq.observe(kernel, hamiltonian, parameter_vector).expectation()
# Run optimization
spsa_energy, spsa_params = spsa_optimizer.optimize(dimensions=1, function=spsa_objective)
print(f"\n=== SPSA Optimizer Results ===")
print(f"Minimized <H> = {round(spsa_energy, 6)}")
print(f"Optimal parameters: {[round(p, 6) for p in spsa_params]}")
=== SPSA Optimizer Results ===
Minimized <H> = -1.748668
Optimal parameters: [-5.681724]
2. Third-Party Optimizers¶
CUDA-Q optimizers can work alongside third-party optimization libraries like SciPy. This provides flexibility to use familiar optimization tools while leveraging CUDA-Q’s quantum simulation capabilities.
The same VQE procedure can be accomplished using SciPy. In this case, a simple cost function is defined and provided as input to the standard SciPy minimize function.
[6]:
from scipy.optimize import minimize
def cost(theta):
    exp_val = cudaq.observe(kernel, hamiltonian, theta).expectation()
    return exp_val

result = minimize(cost, initial_params, method='COBYLA', options={'maxiter': 40})
print(result)
message: Optimization terminated successfully.
success: True
status: 1
fun: -1.748865011330396
x: [ 5.943e-01]
nfev: 26
maxcv: 0.0
3. Parallel Parameter Shift Gradients¶
CUDA-Q’s mqpu backend allows for parallel computation of parameter shift gradients using multiple simulated QPUs. Gradients computed this way can be used in any of the previously discussed optimization procedures. Below is an example demonstrating how parallel gradient evaluation can be used for a VQE procedure.
The parameter shift procedure computes two expectation values for each parameter: one with the parameter shifted forward and one with it shifted backward. These are used to estimate the gradient contribution for that parameter.
The following code defines a function that takes a kernel, a Hamiltonian (spin operator), and the circuit parameters and produces a parameter shift gradient with shift epsilon. The first step of the function builds xplus and xminus, arrays consisting of the shifted parameters.
Next, a for loop iterates over all of the parameters and uses cudaq.observe_async to compute the expectation values. This command also takes qpu_id as an input, which specifies the GPU that will simulate the ith QPU. The evaluations are distributed round-robin over the available devices; the example below sets num_qpus = 1 to run on a single device, and uncommenting the mqpu target line allows the batch to spread across multiple GPUs.
The results are saved in the g_plus and g_minus lists as asynchronous handles; the expectation values are retrieved with calls like g_plus[1].get().expectation() to compute the finite differences and construct the final gradient.
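The xplus/xminus construction is easy to verify in isolation: np.tile stacks one copy of the parameter vector per parameter, and adding np.eye(n) * epsilon shifts exactly one entry in each row. A standalone NumPy check with illustrative values:

```python
import numpy as np

eps = np.pi / 4
params = np.array([0.5, 1.0, -0.3])

# One row per parameter; row i has only parameter i shifted
x = np.tile(params, (len(params), 1))
xplus = x + np.eye(x.shape[0]) * eps
xminus = x - np.eye(x.shape[0]) * eps

print(xplus[0])   # parameter 0 shifted forward by pi/4, others unchanged
print(xminus[2])  # parameter 2 shifted backward by pi/4, others unchanged
```

Each row can then be evaluated independently, which is exactly what makes the loop below trivially parallel across QPUs.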
[7]:
import numpy as np

# cudaq.set_target('nvidia', option='mqpu')
num_qpus = 1
epsilon = np.pi / 4

def batched_gradient_function(kernel, parameters, hamiltonian, epsilon):
    # Prepare arrays of parameters corresponding to the plus and minus shifts
    x = np.tile(parameters, (len(parameters), 1))
    xplus = x + (np.eye(x.shape[0]) * epsilon)
    xminus = x - (np.eye(x.shape[0]) * epsilon)
    g_plus = []
    g_minus = []
    qpu_counter = 0  # Iterate over the number of GPU resources available
    for i in range(x.shape[0]):
        g_plus.append(cudaq.observe_async(kernel, hamiltonian, xplus[i],
                                          qpu_id=qpu_counter % num_qpus))
        qpu_counter += 1
        g_minus.append(cudaq.observe_async(kernel, hamiltonian, xminus[i],
                                           qpu_id=qpu_counter % num_qpus))
        qpu_counter += 1
    # Use the expectation values to compute the gradient
    gradient = [(g_plus[i].get().expectation() - g_minus[i].get().expectation())
                / (2 * epsilon) for i in range(len(g_minus))]
    return gradient
This function can be used in a VQE procedure as presented below. The batched_gradient_function is used to evaluate the gradient at each optimization step. This objective function returns the cost and gradient at the current parameter values and can be used with any SciPy optimizer that uses gradients (like L-BFGS-B).
[8]:
def objective_function(parameter_vector,
                       hamiltonian=hamiltonian,
                       kernel=kernel,
                       epsilon=epsilon):
    """
    Objective function for VQE with parallel parameter shift gradients.
    Computes both cost and gradient at the current parameter values.
    """
    # Compute cost at current parameters
    cost = cudaq.observe(kernel, hamiltonian, parameter_vector).expectation()
    # Compute gradient at current parameters using parallel parameter shift
    gradient_vector = batched_gradient_function(kernel, parameter_vector,
                                                hamiltonian, epsilon)
    return cost, gradient_vector
[9]:
# Run VQE optimization with parallel parameter shift gradients
result_vqe = minimize(objective_function, initial_params, method='L-BFGS-B', jac=True, tol=1e-8, options={'maxiter': 50})
print("\n=== VQE with Parallel Parameter Shift Gradients ===")
print(f"Optimized energy: {result_vqe.fun:.6f}")
print(f"Optimal parameters: {result_vqe.x}")
print(f"Number of iterations: {result_vqe.nit}")
print(f"Success: {result_vqe.success}")