Dynamics Simulation¶

CUDA-Q enables the design, simulation and execution of quantum dynamics via the evolve API. Specifically, this API allows us to solve the time evolution of quantum systems or models. In the simulation mode, CUDA-Q provides the dynamics backend target, which is based on the cuQuantum library, optimized for performance and scale on NVIDIA GPU.

Quick Start¶

In the example below, we demonstrate a simple time evolution simulation workflow comprising of the following steps:

Define a quantum system model

A quantum system model is defined by a Hamiltonian. For example, a superconducting transmon qubit can be modeled by the following Hamiltonian

\[H = \frac{\omega_z}{2} \sigma_z + \omega_x \cos(\omega_d t)\sigma_x,\]

where \(\sigma_z\) and \(\sigma_x\) are Pauli Z and X operators, respectively.

Using CUDA-Q operator, the above time-dependent Hamiltonian can be set up as follows.

Python

omega_z = 6.5
omega_x = 4.0
omega_d = 0.5

import numpy as np
from cudaq import spin, ScalarOperator

# Qubit Hamiltonian
hamiltonian = 0.5 * omega_z * spin.z(0)
# Add modulated driving term to the Hamiltonian
hamiltonian += omega_x * ScalarOperator(lambda t: np.cos(omega_d * t)) * spin.x(
    0)

In particular, ScalarOperator provides an easy way to model arbitrary time-dependent control signals. Details about CUDA-Q operator, including builtin operators that it supports can be found here.

C++

    // Parameters
    double omega_z = 6.5;
    double omega_x = 4.0;
    double omega_d = 0.5;

    // Qubit Hamiltonian
    auto hamiltonian = spin_op(0.5 * omega_z * spin_op::z(0));

    // Time dependent modulation
    auto mod_func =
        [omega_d](const parameter_map &params) -> std::complex<double> {
      auto it = params.find("t");
      if (it != params.end()) {
        double t = it->second.real();
        const auto result = std::cos(omega_d * t);
        return result;
      }
      throw std::runtime_error("Cannot find the time parameter.");
    };

    hamiltonian += mod_func * spin_op::x(0) * omega_x;

Details about CUDA-Q operator, including builtin operators that it supports can be found here.

Setup the evolution simulation

The below code snippet shows how to simulate the time-evolution of the above system with cudaq.evolve.

Python

import cudaq
import cupy as cp
from cudaq.dynamics import Schedule

# Set the target to our dynamics simulator
cudaq.set_target("dynamics")

# Dimensions of sub-systems: a single two-level system.
dimensions = {0: 2}

# Initial state of the system (ground state).
rho0 = cudaq.State.from_data(
    cp.array([[1.0, 0.0], [0.0, 0.0]], dtype=cp.complex128))

# Schedule of time steps.
steps = np.linspace(0, t_final, n_steps)
schedule = Schedule(steps, ["t"])

# Run the simulation.
evolution_result = cudaq.evolve(
    hamiltonian,
    dimensions,
    schedule,
    rho0,
    observables=[spin.x(0), spin.y(0), spin.z(0)],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.ALL)

C++

    double t_final = 1.0;
    int n_steps = 100;

    // Define dimensions of subsystem (single two-level system)
    cudaq::dimension_map dimensions = {{0, 2}};

    // Initial state (ground state)
    auto psi0 =
        cudaq::state::from_data(std::vector<std::complex<double>>{1.0, 0.0});

    // Schedule of time steps
    std::vector<double> steps(n_steps);
    for (int i = 0; i < n_steps; i++)
      steps[i] = i * t_final / (n_steps - 1);

    schedule schedule(steps, {"t"});

    // Numerical integrator
    // Here we choose a Runge-`Kutta` method for time evolution.
    cudaq::integrators::runge_kutta integrator(4, 0.01);

    // Observables to track
    auto observables = {cudaq::spin_op::x(0), cudaq::spin_op::y(0),
                        cudaq::spin_op::z(0)};

    // Run simulation
    // We evolve the system under the defined Hamiltonian. No collapsed
    // operators are provided (closed system evolution). The evolution returns
    // expectation values for all defined observables at each time step.
    auto evolution_result =
        cudaq::evolve(hamiltonian, dimensions, schedule, psi0, integrator, {},
                      observables, cudaq::IntermediateResultSave::All);

Specifically, we need to set up the simulation by providing:

The system model in terms of a Hamiltonian as well as any decoherence terms, so-called collapse_operators.
The dimensionality of component systems in the model. CUDA-Q evolve allows users to model arbitrary multi-level systems, such as photonic Fock space.
The initial quantum state.
The time schedule, aka time steps, of the evolution.
Any ‘observable’ operator that we want to measure the expectation value with respect to the evolving state.

Note

By default, evolve will only return the final state and expectation values. To save intermediate results (at each time step specified in the schedule), the store_intermediate_results flag must be set to True.

Retrieve and plot the results

Once the simulation is complete, we can retrieve the final state and the expectation values as well as intermediate values at each time step (with store_intermediate_results=cudaq.IntermediateResultSave.ALL).

Note

Storing intermediate states can be memory-intensive, especially for large systems. If you only need the intermediate expectation values, you can set store_intermediate_results to cudaq.IntermediateResultSave.EXPECTATION_VALUES (Python) / cudaq::IntermediateResultSave::ExpectationValue (C++) instead.

For example, we can plot the Pauli expectation value for the above simulation as follows.

Python

get_result = lambda idx, res: [
    exp_vals[idx].expectation() for exp_vals in res.expectation_values()
]

import matplotlib.pyplot as plt

plt.plot(steps, get_result(0, evolution_result))
plt.plot(steps, get_result(1, evolution_result))
plt.plot(steps, get_result(2, evolution_result))
plt.ylabel("Expectation value")
plt.xlabel("Time")
plt.legend(("Sigma-X", "Sigma-Y", "Sigma-Z"))

In particular, for each time step, evolve captures an array of expectation values, one for each observable. Hence, we convert them into sequences for plotting purposes.

C++

    // Extract and print results
    for (size_t i = 0; i < steps.size(); i++) {
      double ex =
          evolution_result.expectation_values.value()[0][i].expectation();
      double ey =
          evolution_result.expectation_values.value()[1][i].expectation();
      double ez =
          evolution_result.expectation_values.value()[2][i].expectation();
      std::cout << steps[i] << " " << ex << " " << ey << " " << ez << "\n";
    }

Examples that illustrate how to use the dynamics target are available in the CUDA-Q repository.

Operator¶

CUDA-Q provides builtin definitions for commonly-used operators, such as the ladder operators (\(a\) and \(a^\dagger\)) of a harmonic oscillator, the Pauli spin operators for a two-level system, etc.

Here is a list of those operators.

Builtin Operators¶
Name	Description
`identity`	Identity operator
`zero`	Zero or null operator
`annihilate`	Bosonic annihilation operator (\(a\))
`create`	Bosonic creation operator (\(a^\dagger\))
`number`	Number operator of a bosonic mode (equivalent to \(a^\dagger a\))
`parity`	Parity operator of a bosonic mode (defined as \(e^{i\pi a^\dagger a}\))
`displace`	Displacement operator of complex amplitude \(\alpha\) (`displacement`). It is defined as \(e^{\alpha a^\dagger - \alpha^* a}\).
`squeeze`	Squeezing operator of complex squeezing amplitude \(z\) (`squeezing`). It is defined as \(\exp(\frac{1}{2}(z^*a^2 - z a^{\dagger 2}))\).
`position`	Position operator (equivalent to \((a^\dagger + a)/2\))
`momentum`	Momentum operator (equivalent to \(i(a^\dagger - a)/2\))
`spin.x`	Pauli \(\sigma_x\) operator
`spin.y`	Pauli \(\sigma_y\) operator
`spin.z`	Pauli \(\sigma_z\) operator
`spin.plus`	Pauli raising (\(\sigma_+\)) operator
`spin.minus`	Pauli lowering (\(\sigma_-\)) operator

As an example, let’s look at the Jaynes-Cummings model, which describes the interaction between a two-level atom and a light (Boson) field.

Mathematically, the Hamiltonian can be expressed as

\[H = \omega_c a^\dagger a + \omega_a \frac{\sigma_z}{2} + \frac{\Omega}{2}(a\sigma_+ + a^\dagger \sigma_-).\]

This Hamiltonian can be converted to CUDA-Q Operator representation with

Python

from cudaq import operators

hamiltonian = omega_c * operators.create(1) * operators.annihilate(1) \
                + (omega_a / 2) * spin.z(0) \
                + (Omega / 2) * (operators.annihilate(1) * spin.plus(0) + operators.create(1) * spin.minus(0))

C++

    // Jaynes-Cummings Hamiltonian
    auto jc_hamiltonian =
        omega_c * boson_op::create(1) * boson_op::annihilate(1) +
        (omega_a / 2.0) * spin_op::z(0) +
        (Omega / 2.0) * (boson_op::annihilate(1) * spin_op::plus(0) +
                         boson_op::create(1) * spin_op::minus(0));

In the above code snippet, we map the cavity light field to degree index 1 and the two-level atom to degree index 0. The description of composite quantum system dynamics is independent from the Hilbert space of the system components. The latter is specified by the dimension map that is provided to the cudaq.evolve call.

Builtin operators support both dense and multi-diagonal sparse formats. Depending on the sparsity of operator matrix and/or the sub-system dimension, CUDA-Q will either use the dense or multi-diagonal data formats for optimal performance.

Specifically, the following environment variable options are applicable to the dynamics target. Any environment variables must be set prior to setting the target or running “import cudaq”.

**Additional environment variable options for the `dynamics` target**¶
Option	Value	Description
`CUDAQ_DYNAMICS_MIN_MULTIDIAGONAL_DIMENSION`	Non-negative number	The minimum sub-system dimension on which the operator acts to activate multi-diagonal data format. For example, if a minimum dimension configuration of `N` is set, all operators acting on degrees of freedom (sub-system) whose dimension is less than or equal to `N` would always use the dense format. The final data format to be used depends on the next configuration. The default is 4.
`CUDAQ_DYNAMICS_MAX_DIAGONAL_COUNT_FOR_MULTIDIAGONAL`	Non-negative number	The maximum number of diagonals for multi-diagonal representation. If the operator matrix has more diagonals than this value, the dense format will be used. Default is 1, i.e., operators with only one diagonal line (center, lower, or upper) will use the multi-diagonal sparse storage.

Time-Dependent Dynamics¶

In the previous examples of operator construction, we assumed that the systems under consideration were described by time-independent Hamiltonian. However, we may want to simulate systems whose Hamiltonian operators have explicit time dependence.

CUDA-Q provides multiple ways to construct time-dependent operators.

Time-dependent coefficient

CUDA-Q ScalarOperator can be used to wrap a Python/C++ function that returns the coefficient value at a specific time.

As an example, we will look at a time-dependent Hamiltonian of the form \(H = H_0 + f(t)H_1\), where \(f(t)\) is the time-dependent driving strength given as \(cos(\omega t)\).

The following code sets up the problem

Python

# Define the static (drift) and control terms
H0 = spin.z(0)
H1 = spin.x(0)
H = H0 + ScalarOperator(lambda t: np.cos(omega * t)) * H1

C++

    // Hamiltonian with driving frequency
    double omega = M_PI;
    auto H0 = spin_op::z(0);
    auto H1 = spin_op::x(0);
    auto mod_func =
        [omega](const std::unordered_map<std::string, std::complex<double>>
                    &parameters) {
          auto entry = parameters.find("t");
          if (entry == parameters.end())
            throw std::runtime_error("Cannot find value of expected parameter");
          const auto t = entry->second.real();
          return std::cos(omega * t);
        };
    auto driven_hamiltonian = H0 + mod_func * H1;

Time-dependent operator

We can also construct a time-dependent operator from a function that returns a complex matrix representing the time dynamics of that operator.

As an example, let’s looks at the displacement operator. It can be defined as follows:

Python

import numpy
import scipy
from cudaq import operators, NumericType
from numpy.typing import NDArray


def displacement_matrix(
        dimension: int,
        displacement: NumericType) -> NDArray[numpy.complexfloating]:
    """
    Returns the displacement operator matrix.
    Args:
        displacement: Amplitude of the displacement operator.
            See also https://en.wikipedia.org/wiki/Displacement_operator.
    """
    displacement = complex(displacement)
    term1 = displacement * operators.create(0).to_matrix({0: dimension})
    term2 = numpy.conjugate(displacement) * operators.annihilate(0).to_matrix(
        {0: dimension})
    return scipy.linalg.expm(term1 - term2)


# The second argument here indicates the defined operator
# acts on a single degree of freedom, which can have any dimension.
# An argument [2], for example, would indicate that it can only
# act on a single degree of freedom with dimension two.
operators.define("displace", [0], displacement_matrix)


def displacement(degree: int) -> operators.MatrixOperatorElement:
    """
    Instantiates a displacement operator acting on the given degree of freedom.
    """
    return operators.instantiate("displace", [degree])

C++

    auto displacement_matrix =
        [](const std::vector<int64_t> &dimensions,
           const std::unordered_map<std::string, std::complex<double>>
               &parameters) -> cudaq::complex_matrix {
      // Returns the displacement operator matrix.
      //  Args:
      //   - displacement: Amplitude of the displacement operator.
      // See also https://en.wikipedia.org/wiki/Displacement_operator.
      std::size_t dimension = dimensions[0];
      auto entry = parameters.find("displacement");
      if (entry == parameters.end())
        throw std::runtime_error("missing value for parameter 'displacement'");
      auto displacement_amplitude = entry->second;
      auto create = cudaq::complex_matrix(dimension, dimension);
      auto annihilate = cudaq::complex_matrix(dimension, dimension);
      for (std::size_t i = 0; i + 1 < dimension; i++) {
        create[{i + 1, i}] = std::sqrt(static_cast<double>(i + 1));
        annihilate[{i, i + 1}] = std::sqrt(static_cast<double>(i + 1));
      }
      auto term1 = displacement_amplitude * create;
      auto term2 = std::conj(displacement_amplitude) * annihilate;
      return (term1 - term2).exponential();
    };

    cudaq::matrix_handler::define("displace_op", {-1}, displacement_matrix);

    // Instantiate a displacement operator acting on the given degree of
    // freedom.
    auto displacement = [](std::size_t degree) {
      return cudaq::matrix_handler::instantiate("displace_op", {degree});
    };

The defined operator is parameterized by the displacement amplitude. To create simulate the evolution of an operator under a time dependent displacement amplitude, we can define how the amplitude changes in time:

Python

import cudaq

# Define a system consisting of a single degree of freedom (0) with dimension 3.
system_dimensions = {0: 3}
system_operator = displacement(0)

# Define the time dependency of the system operator as a schedule that linearly
# increases the displacement parameter from 0 to 1.
time_dependence = Schedule(numpy.linspace(0, 1, 100), ['displacement'])
initial_state = cudaq.State.from_data(
    numpy.ones(3, dtype=numpy.complex128) / numpy.sqrt(3))

# Simulate the evolution of the system under this time dependent operator.
cudaq.evolve(system_operator, system_dimensions, time_dependence, initial_state)

C++

    // Define a system consisting of a single degree of freedom (0) with
    // dimension 3.
    cudaq::dimension_map system_dimensions{{0, 3}};
    auto system_operator = displacement(0);

    // Define the time dependency of the system operator as a schedule that
    // linearly increases the displacement parameter from 0 to 1.
    cudaq::schedule time_dependence(cudaq::linspace(0, 1, 100),
                                    {"displacement"});
    const std::vector<std::complex<double>> state_vec(3, 1.0 / std::sqrt(3.0));
    auto initial_state = cudaq::state::from_data(state_vec);
    cudaq::integrators::runge_kutta integrator(4, 0.01);
    // Simulate the evolution of the system under this time dependent operator.
    cudaq::evolve(system_operator, system_dimensions, time_dependence,
                  initial_state, integrator);

Let’s say we want to add a squeezing term to the system operator. We can independently vary the squeezing amplitude and the displacement amplitude by instantiating a schedule with a custom function that returns the desired value for each parameter:

Python

system_operator = displacement(0) + operators.squeeze(0)


# Define a schedule such that displacement amplitude increases linearly in time
# but the squeezing amplitude decreases, that is follows the inverse schedule.
def parameter_values(time_steps):

    def compute_value(param_name, step_idx):
        match param_name:
            case 'displacement':
                return time_steps[int(step_idx)]
            case 'squeezing':
                return time_steps[-int(step_idx + 1)]
            case _:
                raise ValueError(f"value for parameter {param_name} undefined")

    return Schedule(range(len(time_steps)), system_operator.parameters.keys(),
                    compute_value)


time_dependence = parameter_values(numpy.linspace(0, 1, 100))
cudaq.evolve(system_operator, system_dimensions, time_dependence, initial_state)

C++

    auto hamiltonian = displacement(0) + cudaq::matrix_op::squeeze(0);

    // Define a schedule such that displacement amplitude increases linearly in
    // time but the squeezing amplitude decreases, that is follows the inverse
    // schedule. def parameter_values(time_steps):
    auto parameter_values = [](const std::vector<double> &time_steps) {
      auto compute_value = [time_steps](const std::string &param_name,
                                        const std::complex<double> &step) {
        int step_idx = (int)step.real();
        if (param_name == "displacement")
          return time_steps[step_idx];
        if (param_name == "squeezing")
          return time_steps[time_steps.size() - (step_idx + 1)];

        throw std::runtime_error("value for parameter " + param_name +
                                 " undefined");
      };

      std::vector<std::complex<double>> steps;
      for (int i = 0; i < time_steps.size(); ++i)
        steps.emplace_back(i);
      return cudaq::schedule(steps, {"displacement", "squeezing"},
                             compute_value);
    };

    auto time_dependence_param = parameter_values(cudaq::linspace(0, 1, 100));
    cudaq::evolve(hamiltonian, system_dimensions, time_dependence_param,
                  initial_state, integrator);

Compile and Run C++ program

C++

nvq++ --target dynamics dynamics.cpp -o dynamics && ./dynamics

Super-operator Representation¶

In the previous examples, we assumed that the system dynamics is driven by a Lindblad master equation, which is specified by the Hamiltonian operator and the collapse operators.

However, we may want to simulate an arbitrary state evolution equation, whereby the right-hand-side of the differential equation is provided as a generic super-operator.

CUDA-Q provides a SuperOperator (Python) / super_op (C++) class that can be used to represent the right-hand-side of the evolution equation. A super-operator can be constructed as a linear combination (sum) of left and/or right multiplication actions of Operator instances.

As an example, we will look at specifying the Schrodinger’s equation \(\frac{d|\Psi\rangle}{dt} = -i H |\Psi\rangle\) as a super-operator.

Python

import cudaq
from cudaq import spin, Schedule, RungeKuttaIntegrator
import numpy as np

hamiltonian = 2.0 * np.pi * 0.1 * spin.x(0)
steps = np.linspace(0, 1, 10)
schedule = Schedule(steps, ["t"])
dimensions = {0: 2}
# initial state
psi0 = cudaq.dynamics.InitialState.ZERO
# Create a super-operator that represents the evolution of the system
# under the Hamiltonian `-iH|psi>`, where `H` is the Hamiltonian.
se_super_op = cudaq.SuperOperator()
# Apply `-iH|psi>` super-operator
se_super_op += cudaq.SuperOperator.left_multiply(-1j * hamiltonian)
evolution_result = cudaq.evolve(se_super_op,
                                dimensions,
                                schedule,
                                psi0,
                                observables=[spin.z(0)],
                                store_intermediate_results=True,
                                integrator=RungeKuttaIntegrator())

C++

    const cudaq::dimension_map dims = {{0, 2}};
    cudaq::product_op<cudaq::matrix_handler> ham_ =
        2.0 * M_PI * 0.1 * cudaq::spin_op::x(0);
    cudaq::sum_op<cudaq::matrix_handler> ham(ham_);
    constexpr int numSteps = 10;
    cudaq::schedule schedule(cudaq::linspace(0.0, 1.0, numSteps), {"t"});
    auto initialState =
        cudaq::state::from_data(std::vector<std::complex<double>>{1.0, 0.0});
    cudaq::integrators::runge_kutta integrator(1, 0.001);
    // Create a super-operator to evolve the system under the Schrödinger
    // equation `-iH * |psi>`, where `H` is the Hamiltonian.
    cudaq::super_op sup;
    // Apply `-iH * |psi>` superop
    sup +=
        cudaq::super_op::left_multiply(std::complex<double>(0.0, -1.0) * ham);
    auto result = cudaq::evolve(sup, dims, schedule, initialState, integrator,
                                {cudaq::spin_op::z(0)},
                                cudaq::IntermediateResultSave::All);

The super-operator, once constructed, can be used in the evolve API instead of the Hamiltonian and collapse operators as shown in the above examples.

Numerical Integrators¶

For Python, CUDA-Q provides a set of numerical integrators, to be used with the dynamics backend target.

Numerical Integrators¶
Name	Description
`RungeKuttaIntegrator`	Explicit 4th-order Runge-Kutta method (default integrator)
`ScipyZvodeIntegrator`	Complex-valued variable-coefficient ordinary differential equation solver (provided by SciPy)
`CUDATorchDiffEqDopri5Integrator`	Runge-Kutta of order 5 of Dormand-Prince-Shampine (provided by `torchdiffeq`)
`CUDATorchDiffEqAdaptiveHeunIntegrator`	Runge-Kutta of order 2 (provided by `torchdiffeq`)
`CUDATorchDiffEqBosh3Integrator`	Runge-Kutta of order 3 of Bogacki-Shampine (provided by `torchdiffeq`)
`CUDATorchDiffEqDopri8Integrator`	Runge-Kutta of order 8 of Dormand-Prince-Shampine (provided by `torchdiffeq`)
`CUDATorchDiffEqEulerIntegrator`	Euler method (provided by `torchdiffeq`)
`CUDATorchDiffEqExplicitAdamsIntegrator`	Explicit Adams-Bashforth method (provided by `torchdiffeq`)
`CUDATorchDiffEqImplicitAdamsIntegrator`	Implicit Adams-Bashforth-Moulton method (provided by `torchdiffeq`)
`CUDATorchDiffEqMidpointIntegrator`	Midpoint method (provided by `torchdiffeq`)
`CUDATorchDiffEqRK4Integrator`	Fourth-order Runge-Kutta with 3/8 rule (provided by `torchdiffeq`)

Note

To use Torch-based integrators, users need to install torchdiffeq (e.g., with pip install torchdiffeq). This is an optional dependency of CUDA-Q, thus will not be installed by default.

Note

If you are using CUDA 12.8 on Blackwell, you may need to install nightly torch.

See Blackwell Torch Dependencies for more information.

Warning

Torch-based integrators require a CUDA-enabled Torch installation. Depending on your platform (e.g., aarch64), the default Torch pip package may not have CUDA support.

The below command can be used to verify your installation:

python3 -c "import torch; print(torch.version.cuda)"

If the output is a ‘None’ string, it indicates that your Torch installation does not support CUDA. In this case, you need to install a CUDA-enabled Torch package via other mechanisms, e.g., building Torch from source or using their Docker images.

For C++, CUDA-Q provides Runge-Kutta integrator, to be used with the dynamics backend target.

Numerical Integrators¶
Name	Description
`runge_kutta`	1st-order (Euler method), 2nd-order (Midpoint method), and 4th-order (classical Runge-Kutta method).

Batch simulation¶

CUDA-Q dynamics target supports batch simulation, which allows users to run multiple simulations simultaneously. This batching capability applies to (1) multiple initial states and/or (2) multiple Hamiltonians.

Batching can significantly improve performance when simulating many small identical system dynamics, e.g., parameter sweeping or tomography.

For example, we can simulate the time evolution of multiple initial states with the same Hamiltonian as follows:

Python

import cudaq
import cupy as cp
import numpy as np
from cudaq import spin, Schedule, RungeKuttaIntegrator
# Set the target to our dynamics simulator
cudaq.set_target("dynamics")

# Qubit Hamiltonian
hamiltonian = 2 * np.pi * 0.1 * spin.x(0)

# Dimensions of sub-system. We only have a single degree of freedom of dimension 2 (two-level system).
dimensions = {0: 2}

# Initial states in the `SIC-POVM` set: https://en.wikipedia.org/wiki/SIC-POVM
psi_1 = cudaq.State.from_data(cp.array([1.0, 0.0], dtype=cp.complex128))
psi_2 = cudaq.State.from_data(
    cp.array([1.0 / np.sqrt(3.0), np.sqrt(2.0 / 3.0)], dtype=cp.complex128))
psi_3 = cudaq.State.from_data(
    cp.array([
        1.0 / np.sqrt(3.0),
        np.sqrt(2.0 / 3.0) * np.exp(1j * 2.0 * np.pi / 3.0)
    ],
             dtype=cp.complex128))
psi_4 = cudaq.State.from_data(
    cp.array([
        1.0 / np.sqrt(3.0),
        np.sqrt(2.0 / 3.0) * np.exp(1j * 4.0 * np.pi / 3.0)
    ],
             dtype=cp.complex128))

# We run the evolution for all the SIC state to determine the process tomography.
sic_states = [psi_1, psi_2, psi_3, psi_4]
# Schedule of time steps.
steps = np.linspace(0, 10, 101)
schedule = Schedule(steps, ["time"])

# Run the batch simulation.
evolution_results = cudaq.evolve(
    hamiltonian,
    dimensions,
    schedule,
    sic_states,
    observables=[spin.x(0), spin.y(0), spin.z(0)],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.EXPECTATION_VALUE,
    integrator=RungeKuttaIntegrator())

C++

  // Qubit Hamiltonian
  auto hamiltonian = 2 * M_PI * 0.1 * cudaq::spin_op::x(0);

  // A single qubit with dimension 2.
  cudaq::dimension_map dimensions = {{0, 2}};

  // Initial states in the `SIC-POVM` set:
  // https://en.wikipedia.org/wiki/SIC-POVM
  auto psi_1 =
      cudaq::state::from_data(std::vector<std::complex<double>>{1.0, 0.0});
  auto psi_2 = cudaq::state::from_data(std::vector<std::complex<double>>{
      1.0 / std::sqrt(3.0), std::sqrt(2.0 / 3.0)});
  auto psi_3 = cudaq::state::from_data(std::vector<std::complex<double>>{
      1.0 / std::sqrt(3.0),
      std::sqrt(2.0 / 3.0) *
          std::exp(std::complex<double>{0.0, 1.0} * 2.0 * M_PI / 3.0)});
  auto psi_4 = cudaq::state::from_data(std::vector<std::complex<double>>{
      1.0 / std::sqrt(3.0),
      std::sqrt(2.0 / 3.0) *
          std::exp(std::complex<double>{0.0, 1.0} * 4.0 * M_PI / 3.0)});
  // We run the evolution for all the SIC state to determine the process
  // tomography.
  std::vector<cudaq::state> sic_states = {psi_1, psi_2, psi_3, psi_4};

  // Schedule of time steps.
  std::vector<double> steps = cudaq::linspace(0.0, 10.0, 101);
  cudaq::schedule schedule(steps);

  // A default Runge-`Kutta` integrator
  cudaq::integrators::runge_kutta integrator;

  // Run the batch simulation.
  auto evolve_results = cudaq::evolve(
      hamiltonian, dimensions, schedule, sic_states, integrator, {},
      {cudaq::spin_op::x(0), cudaq::spin_op::y(0), cudaq::spin_op::z(0)},
      cudaq::IntermediateResultSave::ExpectationValue);

Similarly, we can also batch simulate the time evolution of multiple Hamiltonians as follows:

Python

import cudaq
import cupy as cp
import numpy as np
from cudaq import spin, Schedule, ScalarOperator, RungeKuttaIntegrator
# Set the target to our dynamics simulator
cudaq.set_target("dynamics")

# Dimensions of sub-system.
dimensions = {0: 2}

# Qubit resonant frequency
omega_z = 10.0 * 2 * np.pi

# Transverse term
omega_x = 2 * np.pi

# Harmonic driving frequency (sweeping in the +/- 10% range around the resonant frequency).
omega_drive = np.linspace(0.1 * omega_z, 1.1 * omega_z, 16)

# Initial state of the system (ground state).
psi0 = cudaq.State.from_data(cp.array([1.0, 0.0], dtype=cp.complex128))

# Batch the Hamiltonian operator together
hamiltonians = [
    0.5 * omega_z * spin.z(0) + omega_x *
    ScalarOperator(lambda t, omega=omega: np.cos(omega * t)) * spin.x(0)
    for omega in omega_drive
]

# Initial states for each Hamiltonian in the batch.
# Here, we use the ground state for all Hamiltonians.
initial_states = [psi0] * len(hamiltonians)

# Schedule of time steps.
steps = np.linspace(0, 0.5, 5000)
schedule = Schedule(steps, ["t"])

# Run the batch simulation.
evolution_results = cudaq.evolve(
    hamiltonians,
    dimensions,
    schedule,
    initial_states,
    observables=[spin.x(0), spin.y(0), spin.z(0)],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.EXPECTATION_VALUE,
    integrator=RungeKuttaIntegrator())

In this example, we show the most generic batching capability, where each Hamiltonian in the batch corresponds to a specific initial state. In other words, the vector of Hamiltonians and the vector of initial states are of the same length. If only one initial state is provided, it will be used for all Hamiltonians in the batch.

C++

  // Dimensions of sub-system
  cudaq::dimension_map dimensions = {{0, 2}};
  // Qubit resonant frequency
  const double omega_z = 10.0 * 2 * M_PI;
  // Transverse driving term
  const double omega_x = 2 * M_PI;
  // Harmonic driving frequency (sweeping in the +/- 10% range around the
  // resonant frequency).
  const auto omega_drive = cudaq::linspace(0.9 * omega_z, 1.1 * omega_z, 16);
  const auto zero_state =
      cudaq::state::from_data(std::vector<std::complex<double>>{1.0, 0.0});
  // List of Hamiltonians to be batched together
  std::vector<cudaq::spin_op> hamiltonians;

  for (const auto &omega : omega_drive) {
    auto mod_func =
        [omega](const cudaq::parameter_map &params) -> std::complex<double> {
      auto it = params.find("t");
      if (it != params.end()) {
        double t = it->second.real();
        return std::cos(omega * t);
      }
      throw std::runtime_error("Cannot find the time parameter.");
    };

    // Add the Hamiltonian for each drive frequency to the batch.
    hamiltonians.emplace_back(0.5 * omega_z * cudaq::spin_op::z(0) +
                              mod_func * cudaq::spin_op::x(0) * omega_x);
  }

  // The qubit starts in the |0> state for all operators in the batch.
  std::vector<cudaq::state> initial_states(hamiltonians.size(), zero_state);
  // Schedule of time steps
  const std::vector<double> steps = cudaq::linspace(0.0, 0.5, 5000);
  // The schedule carries the time parameter `labelled` `t`, which is used by
  // the callback.
  cudaq::schedule schedule(steps, {"t"});

  // A default Runge-`Kutta` integrator (4`th` order) with time step `dt`
  // depending on the schedule.
  cudaq::integrators::runge_kutta integrator;

  // Run the batch simulation.
  auto evolve_results = cudaq::evolve(
      hamiltonians, dimensions, schedule, initial_states, integrator, {},
      {cudaq::spin_op::x(0), cudaq::spin_op::y(0), cudaq::spin_op::z(0)},
      cudaq::IntermediateResultSave::ExpectationValue);

The results of the batch simulation will be returned as a list of evolve result objects, one for each Hamiltonian in the batch. For example, we can extract the time evolution results of the expectation values for each Hamiltonian in the batch as follows:

Python

# The results of the batched evolution is an array of evolution results,
# one for each Hamiltonian operator in the batch.

# For example, we can split the results into separate arrays for each observable.
all_exp_val_x = []
all_exp_val_y = []
all_exp_val_z = []
# Iterate over the evolution results in the batch:
for evolution_result in evolution_results:
    # Extract the expectation values for each observable at the respective Hamiltonian operator in the batch.
    exp_val_x = [
        exp_vals[0].expectation()
        for exp_vals in evolution_result.expectation_values()
    ]
    exp_val_y = [
        exp_vals[1].expectation()
        for exp_vals in evolution_result.expectation_values()
    ]
    exp_val_z = [
        exp_vals[2].expectation()
        for exp_vals in evolution_result.expectation_values()
    ]

    # Append the results to the respective lists.
    # These will be nested lists, where each inner list corresponds to the results for a specific Hamiltonian operator in the batch.
    all_exp_val_x.append(exp_val_x)
    all_exp_val_y.append(exp_val_y)
    all_exp_val_z.append(exp_val_z)

The expectation values are returned as a list of lists, where each inner list corresponds to the expectation values for the observables at each time step for the respective Hamiltonian in the batch.

all_exp_val_x = [[0.0, ...], [0.0, ...], ..., [0.0, ...]]
all_exp_val_y = [[0.0, ...], [0.0, ...], ..., [0.0, ...]]
all_exp_val_z = [[1.0, ...], [1.0, ...], ..., [1.0, ...]]

C++

  // The results of the batched evolution is an array of evolution results, one
  // for each Hamiltonian operator in the batch.

  // For example, we can split the results into separate arrays for each
  // observable. These will be nested lists, where each inner list corresponds
  // to the results for a specific Hamiltonian operator in the batch.
  std::vector<std::vector<double>> all_exp_val_x;
  std::vector<std::vector<double>> all_exp_val_y;
  std::vector<std::vector<double>> all_exp_val_z;
  // Iterate over the evolution results in the batch:
  for (auto &evolution_result : evolve_results) {
    // Extract the expectation values for each observable at the respective
    // Hamiltonian operator in the batch.
    std::vector<double> exp_val_x, exp_val_y, exp_val_z;
    for (auto &exp_vals : evolution_result.expectation_values.value()) {
      exp_val_x.push_back(exp_vals[0].expectation());
      exp_val_y.push_back(exp_vals[1].expectation());
      exp_val_z.push_back(exp_vals[2].expectation());
    }

    // Append the results to the respective lists.
    all_exp_val_x.push_back(exp_val_x);
    all_exp_val_y.push_back(exp_val_y);
    all_exp_val_z.push_back(exp_val_z);
  }

The expectation values are returned as a list of lists, where each inner list corresponds to the expectation values for the observables at each time step for the respective Hamiltonian in the batch.

all_exp_val_x = [[0.0, ...], [0.0, ...], ..., [0.0, ...]]
all_exp_val_y = [[0.0, ...], [0.0, ...], ..., [0.0, ...]]
all_exp_val_z = [[1.0, ...], [1.0, ...], ..., [1.0, ...]]

Collapse operators and super-operators can also be batched in a similar manner. Specifically, if the collapse_operators parameter is a nested list of operators (\(\{\{L\}_1, \{\{L\}_2, ...\}\)), then each set of collapsed operators in the list will be applied to the corresponding Hamiltonian in the batch.

In order for all Hamiltonians to be batched, they must have the same structure, i.e., same number of product terms and those terms must act on the same degrees of freedom. The order of the terms in the Hamiltonian does not matter, nor do the coefficient values/callback functions and the specific operators on those product terms. Here are a couple of examples of Hamiltonians that can or cannot be batched:

First Hamiltonian	Second Hamiltonian	Batchable?
\(H_1 = \omega_1 \sigma_z(0)\)	\(H_2 = \omega_2 \sigma_z(0)\)	Yes (different coefficients, same operator)
\(H_1 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_x(1)\)	\(H_2 = \omega_z \sigma_z(0) + \sin(\omega_xt) \sigma_x(1)\)	Yes (same structure, different callback coefficients)
\(H_1 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_x(1)\)	\(H_2 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_y(1)\)	Yes (different operators on the same degree of freedom)
\(H_1 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_x(1)\)	\(H_2 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_x(1) + \cos(\omega_yt) \sigma_y(1)\)	No (different number of product terms)
\(H_1 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_{xx}(0, 1)\)	\(H_2 = \omega_z \sigma_z(0) + \cos(\omega_xt) \sigma_x(0)\sigma_x(1)\)	No (different structures, two-body operators vs. tensor product of single-body operators)

When the Hamiltonians are not batchable, CUDA-Q will still run the simulations, but each Hamiltonian will be simulated separately in a sequential manner. CUDA-Q will log a warning “The input Hamiltonian and collapse operators are not compatible for batching. Running the simulation in non-batched mode.”, when that happens.

Note

Depending on the number of Hamiltonian operators together with factors such as the integrator, schedule step size, and whether intermediate results are stored, the batch simulation can be memory-intensive. If you encounter out-of-memory issues, the max_batch_size parameter can be used to limit the number of Hamiltonians that are batched together in one run. For example, if you set max_batch_size=2, then we will run the simulations in batches of 2 Hamiltonians at a time, i.e., the first two Hamiltonians will be simulated together, then the next two, and so on.

Python

# Run the batch simulation with a maximum batch size of 2.
# This means that the evolution will be performed in batches of 2 Hamiltonian operators at a time, which can be useful for memory management or
# performance tuning.
results = cudaq.evolve(
    hamiltonians,
    dimensions,
    schedule,
    initial_states,
    observables=[spin.x(0), spin.y(0), spin.z(0)],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.EXPECTATION_VALUE,
    integrator=RungeKuttaIntegrator(),
    max_batch_size=2)  # Set the maximum batch size to 2

C++

  // Run the batch simulation with a maximum batch size of 2.
  // This means that the evolution will be performed in batches of 2 Hamiltonian
  // operators at a time, which can be useful for memory management or
  // performance tuning.
  auto results = cudaq::evolve(
      hamiltonians, dimensions, schedule, initial_states, integrator, {},
      {cudaq::spin_op::x(0), cudaq::spin_op::y(0), cudaq::spin_op::z(0)},
      cudaq::IntermediateResultSave::ExpectationValue, /*max_batch_size=*/2);

Multi-GPU Multi-Node Execution¶

CUDA-Q dynamics target supports parallel execution on multiple GPUs. To enable parallel execution, the application must initialize MPI as follows.

Python

cudaq.mpi.initialize()

# Set the target to our dynamics simulator
cudaq.set_target("dynamics")

# Initial state (expressed as an enum)
psi0 = cudaq.dynamics.InitialState.ZERO

# Run the simulation
evolution_result = cudaq.evolve(
    H,
    dimensions,
    schedule,
    psi0,
    observables=[],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.NONE,
    integrator=RungeKuttaIntegrator())

cudaq.mpi.finalize()

mpiexec -np <N> python3 program.py

where N is the number of processes.

C++

    cudaq::mpi::initialize();
    // Initial state (expressed as an enum)
    auto psi0 = cudaq::InitialState::ZERO;

    // Run the simulation
    auto evolution_result =
        cudaq::evolve(H, dimensions, schedule, psi0, integrator);

    cudaq::mpi::finalize();

nvq++ --target dynamics example.cpp -o a.out
mpiexec -np <N> a.out

where N is the number of processes.

By initializing the MPI execution environment (via cudaq.mpi.initialize()) in the application code and invoking it via an MPI launcher, we have activated the multi-node multi-GPU feature of the dynamics target. Specifically, it will detect the number of processes (GPUs) and distribute the computation across all available GPUs.

Note

The number of MPI processes must be a power of 2, one GPU per process.

Note

Not all integrators are capable of handling distributed state. Errors will be raised if parallel execution is activated but the selected integrator does not support distributed state.

Examples¶

The Dynamics Examples section of the docs contains a number of excellent dynamics examples demonstrating how to simulate basic physics models, specific qubit modalities, and utilize multi-GPU multi-Node capabilities.