Simulations with cuQuantum

CUDA-Q supports cuQuantum-accelerated state vector and tensor network simulations. Let’s look at an example that, at larger qubit counts, is impractical for a standard CPU-only simulator but runs easily on an NVIDIA GPU-accelerated backend:

# This example is meant to demonstrate the cuQuantum
# GPU-accelerated backends and their ability to easily handle
# a larger number of qubits compared to the CPU-only backend.
#
# This will take a noticeably longer time to execute on
# CPU-only backends.

import cudaq

qubit_count = 5
# We can set a larger `qubit_count` if running on a GPU backend.
# qubit_count = 28


@cudaq.kernel
def kernel(qubit_count: int):
    qvector = cudaq.qvector(qubit_count)
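    # Put only the first qubit in superposition; the CNOT chain below
    # then entangles the remaining qubits into a GHZ state.
    h(qvector[0])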
    for qubit in range(qubit_count - 1):
        x.ctrl(qvector[qubit], qvector[qubit + 1])
    mz(qvector)


result = cudaq.sample(kernel, qubit_count, shots_count=100)

if (not cudaq.mpi.is_initialized()) or (cudaq.mpi.rank() == 0):
    print(result)

Here we generate a GHZ state on qubit_count qubits: 5 as written, or 28 if the commented-out assignment is enabled on a GPU backend. The built-in cuQuantum state vector backend is selected by default if a local GPU is present. Alternatively, the target can be set manually by calling cudaq.set_target("nvidia").
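
To put these qubit counts in perspective, a full state vector holds 2^n complex amplitudes for n qubits. The sketch below is plain Python and does not call CUDA-Q; it only estimates the memory such a vector occupies. The helper name statevector_bytes and the 8-byte and 16-byte amplitude sizes (single- and double-precision complex) are illustrative assumptions, not values read from any backend.

```python
def statevector_bytes(qubit_count, bytes_per_amplitude=8):
    """Estimate the memory needed for a full state vector.

    Assumes one complex amplitude per basis state, i.e. 2**qubit_count
    amplitudes. The default of 8 bytes per amplitude corresponds to
    single-precision complex numbers (an assumption for illustration).
    """
    return (1 << qubit_count) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (5, 28, 30):
        single = statevector_bytes(n) / 2**30        # 8 bytes per amplitude
        double = statevector_bytes(n, 16) / 2**30    # 16 bytes per amplitude
        print(f"{n} qubits: {single:g} GiB (8 B/amp), {double:g} GiB (16 B/amp)")
```

Under these assumptions, 28 qubits need 2 GiB at 8 bytes per amplitude and 4 GiB at 16 bytes, and each additional qubit doubles the requirement. Every gate also has to touch those 2^28 amplitudes, which is where GPU acceleration helps.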

// Compile and run with:
// ```
// nvq++ cuquantum_backends.cpp -o ghz.x --target nvidia && ./ghz.x
// ```

// This example is meant to demonstrate the cuQuantum
// GPU-accelerated backends and their ability to easily handle
// a larger number of qubits compared to the CPU-only backend.

// On CPU-only backends, this can appear to hang because it takes
// a long time to simulate this number of qubits.

#include <cstdio>
#include <cudaq.h>

// Define a quantum kernel with a runtime parameter
struct ghz {
  auto operator()(const int N) __qpu__ {

    // Dynamically sized vector of qubits
    cudaq::qvector q(N);
    h(q[0]);
    for (int i = 0; i < N - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    mz(q);
  }
};

int main() {
  auto counts = cudaq::sample(/*shots=*/100, ghz{}, 28);

  if (!cudaq::mpi::is_initialized() || cudaq::mpi::rank() == 0) {
    counts.dump();

    // Fine grain access to the bits and counts
    for (auto &[bits, count] : counts) {
      printf("Observed: %s, %zu\n", bits.data(), count);
    }
  }

  return 0;
}

Here we generate a GHZ state on 28 qubits. To run with the built-in cuQuantum state vector support, we pass the --target nvidia flag at compile time:

nvq++ --target nvidia cuquantum_backends.cpp -o ghz.x
./ghz.x

Alternatively, we can set the environment variable CUDAQ_DEFAULT_SIMULATOR to nvidia.
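
As a sanity check on the sampled counts, an ideal GHZ circuit (a Hadamard on the first qubit followed by a chain of CNOTs) should only ever yield the all-zeros and all-ones bitstrings, each with probability 1/2. The following sketch reproduces that with a tiny sparse simulator in plain Python. It does not use CUDA-Q, and the helper names (apply_h, apply_cnot, ghz_probabilities) are hypothetical, chosen only for illustration.

```python
import math


def apply_h(state, qubit):
    """Apply a Hadamard to `qubit` of a sparse state {basis_index: amplitude}."""
    out = {}
    s = 1.0 / math.sqrt(2.0)
    mask = 1 << qubit
    for idx, amp in state.items():
        if idx & mask:
            # |1> -> (|0> - |1>) / sqrt(2)
            out[idx ^ mask] = out.get(idx ^ mask, 0.0) + s * amp
            out[idx] = out.get(idx, 0.0) - s * amp
        else:
            # |0> -> (|0> + |1>) / sqrt(2)
            out[idx] = out.get(idx, 0.0) + s * amp
            out[idx | mask] = out.get(idx | mask, 0.0) + s * amp
    # Drop amplitudes that cancelled to (numerically) zero.
    return {i: a for i, a in out.items() if abs(a) > 1e-12}


def apply_cnot(state, control, target):
    """Flip `target` in every basis state where `control` is set."""
    out = {}
    for idx, amp in state.items():
        new_idx = idx ^ (1 << target) if idx & (1 << control) else idx
        out[new_idx] = amp
    return out


def ghz_probabilities(qubit_count):
    """Return {bitstring: probability} for the ideal GHZ circuit."""
    state = {0: 1.0}  # start in |00...0>
    state = apply_h(state, 0)
    for q in range(qubit_count - 1):
        state = apply_cnot(state, q, q + 1)
    return {format(i, f"0{qubit_count}b"): abs(a) ** 2 for i, a in state.items()}


if __name__ == "__main__":
    print(ghz_probabilities(5))
```

Because only two basis states ever carry amplitude, this sparse representation stays small at any qubit count. A general state vector simulator cannot rely on that and stores all 2^n amplitudes.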