Executing Quantum Circuits

CUDA-Q provides four ways to execute quantum kernels:

  1. sample: yields measurement counts

  2. run: yields individual return values from multiple executions

  3. observe: yields expectation values

  4. get_state: yields the quantum statevector of the computation

Asynchronous programming lets a program start a potentially long-running task and remain responsive to other events while that task runs, rather than waiting until it has finished. Once the task has finished, the program is presented with the result.

Executing the quantum kernel is the most intensive task in the computation, so each execution function has an asynchronous counterpart that can be parallelized given access to multiple quantum processing units (multi-QPU): sample_async, run_async, observe_async and get_state_async. Since multi-QPU platforms are not yet feasible, we emulate each QPU with a GPU.
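The launch-then-collect pattern described above can be sketched in plain Python. This is only an analogy built on the standard library's concurrent.futures, not the CUDA-Q API: long_running_task is a hypothetical stand-in for a kernel execution, and Future.result() plays the role that get() plays on the handles returned by the CUDA-Q async functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_running_task(task_id, duration=0.05):
    """Hypothetical stand-in for an expensive kernel execution."""
    time.sleep(duration)
    return task_id * task_id


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a future for each task.
    futures = [pool.submit(long_running_task, i) for i in range(4)]

    # The program stays free to do other work while the tasks run.
    other_work = sum(range(10))

    # result() blocks only when the value is actually needed.
    results = [future.result() for future in futures]

print(other_work, results)
```

The results come back in submission order because each future is queried in the order it was created, regardless of which task finished first.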

Sample

Quantum states collapse upon measurement and hence need to be sampled many times to gather statistics. The CUDA-Q sample call enables this:

[1]:
import cudaq
import numpy as np

qubit_count = 2

# Define the simulation target.
cudaq.set_target("qpp-cpu")

# Define a quantum kernel function.

@cudaq.kernel
def kernel(qubit_count: int):
    qvector = cudaq.qvector(qubit_count)

    # 2-qubit GHZ state.
    h(qvector[0])
    for i in range(1, qubit_count):
        x.ctrl(qvector[0], qvector[i])

    # If we don't specify measurements, all qubits are measured in
    # the Z-basis by default. We can also request this explicitly:
    # mz(qvector)


print(cudaq.draw(kernel, qubit_count))

result = cudaq.sample(kernel, qubit_count, shots_count=1000)

print(result)
     ╭───╮
q0 : ┤ h ├──●──
     ╰───╯╭─┴─╮
q1 : ─────┤ x ├
          ╰───╯

{ 00:492 11:508 }

Note that there is a subtle difference between how sample executes on a simulator target and on a QPU target. In simulation mode, the quantum state is built once and then sampled \(s\) times, where \(s\) equals the shots_count. In hardware execution mode, the quantum state collapses upon measurement and hence must be rebuilt for every shot.
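The simulation-mode behaviour can be illustrated with a toy model in plain Python. This is a sketch under the assumption that the final statevector is already known (here the 2-qubit GHZ amplitudes); sample_from_state is a hypothetical helper and not part of CUDA-Q. The state is computed once and every shot is drawn from the same probability distribution.

```python
import math
import random
from collections import Counter

# Amplitudes of the 2-qubit GHZ state, computed once.
amplitudes = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

# Bitstring label for each index, with qubit 0 written on the left.
labels = ["00", "10", "01", "11"]

# Born rule: the probability of an outcome is |amplitude|^2.
probabilities = [abs(a) ** 2 for a in amplitudes]


def sample_from_state(probabilities, labels, shots, seed=None):
    """Draw `shots` outcomes from a fixed distribution (hypothetical helper)."""
    rng = random.Random(seed)
    return Counter(rng.choices(labels, weights=probabilities, k=shots))


counts = sample_from_state(probabilities, labels, shots=1000, seed=7)
print(dict(counts))
```

On hardware there is no stored statevector to draw from, so the circuit itself is run once per shot.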

A number of helpful tools for processing the SampleResult object produced by sample can be found in the API docs.

Sample Async

sample also supports asynchronous execution for the arguments it accepts. One can parallelize over various kernels, variational parameters or even distribute shot counts over multiple QPUs.

Run

The run API executes a quantum kernel multiple times and returns each individual result. Unlike sample, which collects measurement statistics as counts, run preserves each individual return value from every execution. This is useful when you need to analyze the distribution of returned values rather than just aggregated measurement counts.

Key points about run:

  - Requires a kernel that returns a non-void value

  - Returns a list containing all individual execution results

  - Supports scalar types (bool, int, float) and custom data classes as return types

[2]:
import cudaq
from dataclasses import dataclass

# Define the simulation target
cudaq.set_target("qpp-cpu")


# Define a quantum kernel that returns an integer
@cudaq.kernel
def simple_ghz(num_qubits: int) -> int:
    # Allocate qubits
    qubits = cudaq.qvector(num_qubits)

    # Create GHZ state
    h(qubits[0])
    for i in range(1, num_qubits):
        x.ctrl(qubits[0], qubits[i])

    # Measure and return total number of qubits in state |1⟩
    result = 0
    for i in range(num_qubits):
        if mz(qubits[i]):
            result += 1

    return result


# Execute the kernel 20 times
num_qubits = 3
results = cudaq.run(simple_ghz, num_qubits, shots_count=20)

print(f"Executed {len(results)} shots")
print(f"Results: {results}")
print(f"Possible values: Either 0 or {num_qubits} due to GHZ state properties")

# Count occurrences of each result
value_counts = {}
for value in results:
    value_counts[value] = value_counts.get(value, 0) + 1

print("\nCounts of each result:")
for value, count in value_counts.items():
    print(f"{value}: {count} times")

Executed 20 shots
Results: [0, 0, 0, 0, 0, 3, 3, 3, 0, 0, 3, 0, 0, 3, 3, 3, 0, 0, 0, 0]
Possible values: Either 0 or 3 due to GHZ state properties

Counts of each result:
0: 13 times
3: 7 times

Return Custom Data Types

The run API also supports returning custom data types using Python’s data classes. This allows returning multiple values from your quantum computation in a structured way.

[3]:
import cudaq

from dataclasses import dataclass


# Define a custom dataclass to return from our quantum kernel
@dataclass(slots=True)
class MeasurementResult:
    first_qubit: bool
    last_qubit: bool
    total_ones: int


@cudaq.kernel
def bell_pair_with_data() -> MeasurementResult:
    # Create a bell pair
    qubits = cudaq.qvector(2)
    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])

    # Measure both qubits
    first_result = mz(qubits[0])
    last_result = mz(qubits[1])

    # Return custom data structure with results
    total = 0
    if first_result:
        total = 1
    if last_result:
        total = total + 1

    return MeasurementResult(first_result, last_result, total)


# Run the kernel 10 times and get all results
results = cudaq.run(bell_pair_with_data, shots_count=10)

# Analyze the results
print("Individual measurement results:")
for i, res in enumerate(results):
    print(
        f"Shot {i}: {{{res.first_qubit}, {res.last_qubit}}}\ttotal ones={res.total_ones}"
    )

# Verify the Bell state correlations
correlated_count = sum(
    1 for res in results if res.first_qubit == res.last_qubit)
print(
    f"\nCorrelated measurements: {correlated_count}/{len(results)} ({correlated_count/len(results)*100:.1f}%)"
)
Individual measurement results:
Shot 0: {True, True}    total ones=2
Shot 1: {True, True}    total ones=2
Shot 2: {True, True}    total ones=2
Shot 3: {False, False}  total ones=0
Shot 4: {False, False}  total ones=0
Shot 5: {True, True}    total ones=2
Shot 6: {False, False}  total ones=0
Shot 7: {False, False}  total ones=0
Shot 8: {True, True}    total ones=2
Shot 9: {True, True}    total ones=2

Correlated measurements: 10/10 (100.0%)

Run Async

Similar to sample_async above, run also supports asynchronous execution for the arguments it accepts.

NOTE: Currently, run and run_async are only supported on simulator targets.

Observe

The observe function allows us to calculate expectation values. We must supply a spin operator in the form of a Hamiltonian, \(H\), from which we would like to calculate \(\bra{\psi}H\ket{\psi}\).

[4]:
from cudaq import spin

# Define a Hamiltonian in terms of Pauli Spin operators.
hamiltonian = spin.z(0) + spin.y(1) + spin.x(0) * spin.z(0)

# Compute the expectation value given the state prepared by the kernel.
result = cudaq.observe(kernel, hamiltonian, qubit_count).expectation()

print('<H> =', result)
<H> = 0.0
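The value above can be cross-checked by hand with NumPy. The sketch below is independent of CUDA-Q: it builds the same Hamiltonian as a 4x4 matrix and evaluates \(\bra{\psi}H\ket{\psi}\) for the 2-qubit GHZ state. The helper on_qubit is hypothetical, and it assumes qubit 0 is the least significant bit of the statevector index.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def on_qubit(op, target, qubit_count):
    """Embed a single-qubit operator into the full space (hypothetical helper)."""
    full = np.eye(1, dtype=complex)
    for q in range(qubit_count):
        # np.kron puts its first argument on the more significant bits,
        # so each later qubit is attached on the left.
        full = np.kron(op if q == target else I2, full)
    return full


# Same terms as spin.z(0) + spin.y(1) + spin.x(0) * spin.z(0).
hamiltonian = (on_qubit(Z, 0, 2) + on_qubit(Y, 1, 2) +
               on_qubit(X @ Z, 0, 2))

# 2-qubit GHZ state (|00> + |11>) / sqrt(2).
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# np.vdot conjugates its first argument, giving <psi|H|psi>.
expectation = np.vdot(psi, hamiltonian @ psi)
print(expectation)
```

Each term vanishes on its own: Z on qubit 0 averages to zero over the two branches, while Y on qubit 1 and the product X·Z on qubit 0 (which equals -iY) both flip a single qubit and so map the state onto components it has no overlap with.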

Observe Async

observe can be a time-intensive task. We can parallelize the execution of observe via the arguments it accepts.

[5]:
# Set the simulation target to a multi-QPU platform
# cudaq.set_target("nvidia", option = 'mqpu')

# Measuring the expectation value of 2 different hamiltonians in parallel
hamiltonian_1 = spin.x(0) + spin.y(1) + spin.z(0)*spin.y(1)
# hamiltonian_2 = spin.z(1) + spin.y(0) + spin.x(1)*spin.x(0)

# Asynchronous execution on multiple QPUs via NVIDIA GPUs.
result_1 = cudaq.observe_async(kernel, hamiltonian_1, qubit_count, qpu_id=0)
# result_2 = cudaq.observe_async(kernel, hamiltonian_2, qubit_count, qpu_id=1)

# Retrieve results
print(result_1.get().expectation())
# print(result_2.get().expectation())
1.1102230246251565e-16

Above, we parallelized the observe call over the hamiltonian parameter. However, we can parallelize over any of the arguments it accepts by iterating over the qpu_id.

Get state

The get_state function gives us access to the quantum statevector of the computation. Remember that this is only feasible in simulation mode.

[6]:
# Compute the statevector of the kernel
result = cudaq.get_state(kernel, qubit_count)

print(np.array(result))
[0.70710678+0.j 0.        +0.j 0.        +0.j 0.70710678+0.j]

The statevector generated by the get_state command follows Big-endian convention for associating numbers with their binary representations, which places the least significant bit on the left. That is, for the example of a 2-bit system, we have the following translation between integers and bits:

\[\begin{split}\begin{matrix} \text{Integer} & \text{Binary representation}\\ & \text{least significant bit on left}\\ 0 =\textcolor{red}{0}*2^0+\textcolor{blue}{0}*2^1 & \textcolor{red}{0}\textcolor{blue}{0} \\ 1 = \textcolor{red}{1}*2^0 + \textcolor{blue}{0} *2^1 & \textcolor{red}{1}\textcolor{blue}{0}\\ 2 = \textcolor{red}{0}*2^0 + \textcolor{blue}{1}*2^1 & \textcolor{red}{0}\textcolor{blue}{1} \\ 3 = \textcolor{red}{1}*2^0 + \textcolor{blue}{1}*2^1 & \textcolor{red}{1}\textcolor{blue}{1} \end{matrix}\end{split}\]
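The translation in the table can be written as a small helper in plain Python. index_to_bits is a hypothetical name, not a CUDA-Q function; it is shown only to make the mapping between a statevector index and its bitstring label concrete.

```python
def index_to_bits(index, qubit_count):
    """Return the bitstring for a statevector index, least significant bit on the left."""
    # format() writes the most significant bit first, so reverse the string.
    return format(index, f"0{qubit_count}b")[::-1]


# Label the amplitudes printed by get_state for the 2-qubit example.
amplitudes = [0.70710678, 0.0, 0.0, 0.70710678]
labelled = {index_to_bits(i, 2): amp for i, amp in enumerate(amplitudes)}

for bits, amp in labelled.items():
    print(bits, amp)
```

The non-zero amplitudes sit at indices 0 and 3, which the helper labels 00 and 11, the same two outcomes that sample reports for this kernel.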

Get State Async

Similar to observe_async above, get_state also supports asynchronous execution for the arguments it accepts.

[7]:
print(cudaq.__version__)
CUDA-Q Version proto-0.8.0-developer (https://github.com/NVIDIA/cuda-quantum cd3ef17fc8354e5e7428e3abd34f8d5e14c8b09a)