NVIDIA Quantum Cloud¶
NVIDIA Quantum Cloud (NVQC) offers universal access to the world’s most powerful computing platform, for every quantum researcher to do their life’s work. To learn more about NVQC visit this link.
Apply for early access here. Access to the Quantum Cloud early access program requires an NVIDIA Developer account.
Quick Start¶
Once you have been approved for early access to NVQC, you will be able to follow these instructions to use it.
1. Follow the instructions in your NVQC Early Access welcome email to obtain an API Key for NVQC. You can also find the instructions here (link available only for approved users).
2. Set the environment variable NVQC_API_KEY to the API Key obtained above.
export NVQC_API_KEY=<your NVQC API key>
You may wish to persist that environment variable between bash sessions, e.g., by adding it to your $HOME/.bashrc file.
3. Run your first NVQC example.
The following is a typical CUDA-Q kernel example. By selecting the nvqc target, the quantum circuit simulation will run on NVQC in the cloud rather than running locally.
import cudaq
cudaq.set_target("nvqc")
num_qubits = 25
# Define a simple quantum kernel to execute on NVQC.
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(num_qubits)
# Maximally entangled state between 25 qubits.
kernel.h(qubits[0])
for i in range(num_qubits - 1):
    kernel.cx(qubits[i], qubits[i + 1])
kernel.mz(qubits)
counts = cudaq.sample(kernel)
print(counts)
[2024-03-14 19:26:31.438] Submitting jobs to NVQC service with 1 GPU(s). Max execution time: 3600 seconds (excluding queue wait time).
================ NVQC Device Info ================
GPU Device Name: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version: 12.2 / 11.8
Total global memory (GB): 79.1
Memory Clock Rate (MHz): 2619.000
GPU Clock Rate (MHz): 1980.000
==================================================
{ 1111111111111111111111111:486 0000000000000000000000000:514 }
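The same example can also be written in C++: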
#include <cudaq.h>

// Define a simple quantum kernel to execute on NVQC.
struct ghz {
  // Maximally entangled state between 25 qubits.
  auto operator()() __qpu__ {
    constexpr int NUM_QUBITS = 25;
    cudaq::qvector q(NUM_QUBITS);
    h(q[0]);
    for (int i = 0; i < NUM_QUBITS - 1; i++) {
      x<cudaq::ctrl>(q[i], q[i + 1]);
    }
    auto result = mz(q);
  }
};

int main() {
  auto counts = cudaq::sample(ghz{});
  counts.dump();
}
The code above is saved in nvqc_intro.cpp and compiled with the following command, targeting the nvqc platform.
nvq++ nvqc_intro.cpp -o nvqc_intro.x --target nvqc
./nvqc_intro.x
[2024-03-14 19:25:05.545] Submitting jobs to NVQC service with 1 GPU(s). Max execution time: 3600 seconds (excluding queue wait time).
================ NVQC Device Info ================
GPU Device Name: "NVIDIA H100 80GB HBM3"
CUDA Driver Version / Runtime Version: 12.2 / 11.8
Total global memory (GB): 79.1
Memory Clock Rate (MHz): 2619.000
GPU Clock Rate (MHz): 1980.000
==================================================
{
__global__ : { 1111111111111111111111111:487 0000000000000000000000000:513 }
result : { 1111111111111111111111111:487 0000000000000000000000000:513 }
}
Simulator Backend Selection¶
NVQC hosts all CUDA-Q simulator backends (see CUDA-Q Simulation Backends).
You may use the backend (Python) or --nvqc-backend (C++) option to select the simulator to be used by the service.
For example, to request the tensornet simulator backend, the user can do the following for Python or C++, respectively.
cudaq.set_target("nvqc", backend="tensornet")
nvq++ nvqc_sample.cpp -o nvqc_sample.x --target nvqc --nvqc-backend tensornet
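As an illustrative sketch (the 50-qubit count is an assumption for demonstration, not part of the original example), the GHZ sample from the quick start can be scaled well beyond state vector memory limits once the tensornet backend is selected:

import cudaq

# Request the tensornet simulator backend on NVQC.
cudaq.set_target("nvqc", backend="tensornet")

# A 50-qubit state vector (2^50 amplitudes) cannot be stored explicitly;
# the tensor network simulator avoids materializing it.
num_qubits = 50
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(num_qubits)
kernel.h(qubits[0])
for i in range(num_qubits - 1):
    kernel.cx(qubits[i], qubits[i + 1])
kernel.mz(qubits)

counts = cudaq.sample(kernel)
print(counts)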
Note
By default, the single-GPU single-precision custatevec-fp32 simulator backend will be selected if backend information is not specified.
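Conversely, here is a minimal sketch of explicitly requesting double precision on a single GPU, using the custatevec-fp64 backend listed in the table below:

# Explicitly select the double-precision single-GPU state vector backend.
cudaq.set_target("nvqc", backend="custatevec-fp64")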
Multiple GPUs¶
Some CUDA-Q simulator backends are capable of multi-GPU distribution as detailed in CUDA-Q Simulation Backends.
For example, the nvidia-mgpu backend can partition and distribute a state vector simulation across multiple GPUs to simulate a larger number of qubits, whose state vector size grows beyond the memory size of a single GPU.
To select a specific number of GPUs on the NVQC managed service, use the ngpus (Python) or --nvqc-ngpus (C++) option.
cudaq.set_target("nvqc", backend="nvidia-mgpu", ngpus=4)
nvq++ nvqc_sample.cpp -o nvqc_sample.x --target nvqc --nvqc-backend nvidia-mgpu --nvqc-ngpus 4
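For instance, here is a sketch scaling the quick-start GHZ example past single-GPU memory (the 33-qubit count is illustrative: 2^33 double-precision amplitudes at 16 bytes each require 128 GB, more than one 80 GB GPU):

import cudaq

# Distribute the double-precision state vector across 4 GPUs on NVQC.
cudaq.set_target("nvqc", backend="nvidia-mgpu", ngpus=4)

num_qubits = 33  # 128 GB state vector; exceeds a single 80 GB GPU.
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(num_qubits)
kernel.h(qubits[0])
for i in range(num_qubits - 1):
    kernel.cx(qubits[i], qubits[i + 1])
kernel.mz(qubits)

counts = cudaq.sample(kernel)
print(counts)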
Note
If your NVQC subscription does not contain service instances that have the specified number of GPUs, you may encounter the following error.
Unable to find NVQC deployment with 16 GPUs.
Available deployments have {1, 2, 4, 8} GPUs.
Please check your `ngpus` value (Python) or `--nvqc-ngpus` value (C++).
Note
Not all simulator backends are capable of utilizing multiple GPUs. When requesting a multiple-GPU service with a single-GPU simulator backend, you might encounter the following log message:
The requested backend simulator (custatevec-fp32) is not capable of using all 4 GPUs requested.
Only one GPU will be used for simulation.
Please refer to CUDA-Q documentation for a list of multi-GPU capable simulator backends.
If you don't need a multi-GPU backend, consider removing the ngpus value (Python) or --nvqc-ngpus value (C++) to use the default of 1 GPU and better utilize NVQC resources.
Please refer to the table below for a list of backend simulator names along with their multi-GPU capability.
Name | Description | GPU Accelerated | Multi-GPU
---|---|---|---
qpp-cpu | CPU-only state vector simulator | no | no
dm | CPU-only density matrix simulator | no | no
custatevec-fp32 | Single-precision cuStateVec state vector simulator | yes | no
custatevec-fp64 | Double-precision cuStateVec state vector simulator | yes | no
nvidia-mgpu | Double-precision multi-GPU distributed state vector simulator | yes | yes
tensornet-mps | Double-precision cuTensorNet matrix product state simulator | yes | no
tensornet | Double-precision cuTensorNet tensor network contraction simulator | yes | yes
Multiple QPUs Asynchronous Execution¶
NVQC provides scalable QPU virtualization services, whereby clients can submit asynchronous jobs simultaneously to NVQC. These jobs are handled by a pool of service worker instances.
For example, in the following code snippet, the nqpus (Python) or --nvqc-nqpus (C++) configuration option is used to instantiate 3 virtual QPU instances, which submit simulation jobs to NVQC to calculate the expectation value along with parameter-shift gradients simultaneously.
import cudaq
from cudaq import spin
import math
# Use NVQC with 3 virtual QPUs
cudaq.set_target("nvqc", nqpus=3)
print("Number of QPUs:", cudaq.get_target().num_qpus())
# Create the parameterized ansatz
kernel, theta = cudaq.make_kernel(float)
qreg = kernel.qalloc(2)
kernel.x(qreg[0])
kernel.ry(theta, qreg[1])
kernel.cx(qreg[1], qreg[0])
# Define its spin Hamiltonian.
hamiltonian = (5.907 - 2.1433 * spin.x(0) * spin.x(1) -
               2.1433 * spin.y(0) * spin.y(1) + 0.21829 * spin.z(0) -
               6.125 * spin.z(1))


def opt_gradient(parameter_vector):
    # Evaluate energy and gradient on different remote QPUs
    # (i.e., concurrent job submissions to NVQC)
    energy_future = cudaq.observe_async(kernel,
                                        hamiltonian,
                                        parameter_vector[0],
                                        qpu_id=0)
    plus_future = cudaq.observe_async(kernel,
                                      hamiltonian,
                                      parameter_vector[0] + 0.5 * math.pi,
                                      qpu_id=1)
    minus_future = cudaq.observe_async(kernel,
                                       hamiltonian,
                                       parameter_vector[0] - 0.5 * math.pi,
                                       qpu_id=2)
    return (energy_future.get().expectation(), [
        (plus_future.get().expectation() - minus_future.get().expectation()) /
        2.0
    ])
optimizer = cudaq.optimizers.LBFGS()
optimal_value, optimal_parameters = optimizer.optimize(1, opt_gradient)
print("Ground state energy =", optimal_value)
print("Optimal parameters =", optimal_parameters)
#include <cudaq.h>
#include <cudaq/algorithm.h>
#include <cudaq/gradients.h>
#include <cudaq/optimizers.h>
#include <iostream>

int main() {
  using namespace cudaq::spin;
  cudaq::spin_op h = 5.907 - 2.1433 * x(0) * x(1) - 2.1433 * y(0) * y(1) +
                     .21829 * z(0) - 6.125 * z(1);
  auto [ansatz, theta] = cudaq::make_kernel<double>();
  auto q = ansatz.qalloc();
  auto r = ansatz.qalloc();
  ansatz.x(q);
  ansatz.ry(theta, r);
  ansatz.x<cudaq::ctrl>(r, q);

  // Run VQE with a gradient-based optimizer.
  // Delegate cost function and gradient computation across different
  // NVQC-based QPUs.
  // Note: this needs to be compiled with `--nvqc-nqpus 3` to create 3 virtual
  // QPUs.
  cudaq::optimizers::lbfgs optimizer;
  auto [opt_val, opt_params] = optimizer.optimize(
      /*dim=*/1, /*opt_function=*/[&](const std::vector<double> &params,
                                      std::vector<double> &grads) {
        // Queue asynchronous jobs to do energy evaluations across multiple
        // QPUs.
        auto energy_future =
            cudaq::observe_async(/*qpu_id=*/0, ansatz, h, params[0]);
        const double paramShift = M_PI_2;
        auto plus_future = cudaq::observe_async(/*qpu_id=*/1, ansatz, h,
                                                params[0] + paramShift);
        auto minus_future = cudaq::observe_async(/*qpu_id=*/2, ansatz, h,
                                                 params[0] - paramShift);
        grads[0] = (plus_future.get().expectation() -
                    minus_future.get().expectation()) /
                   2.0;
        return energy_future.get().expectation();
      });
  std::cout << "Minimum energy = " << opt_val << " (expected -1.74886).\n";
}
The code above is saved in nvqc_vqe.cpp and compiled with the following command, targeting the nvqc platform with 3 virtual QPUs.
nvq++ nvqc_vqe.cpp -o nvqc_vqe.x --target nvqc --nvqc-nqpus 3
./nvqc_vqe.x
Note
The NVQC managed service has a pool of worker instances processing incoming requests on a first-come, first-served basis. Thus, the attainable speedup from using multiple virtual QPUs versus sequential execution on a single QPU depends on the NVQC service load. For example, if the number of free workers is greater than the number of requested virtual QPUs, a linear (ideal) speedup could be achieved. On the other hand, if all the service workers are busy, multi-QPU distribution may not deliver any substantial speedup.
FAQ¶
How do I get more information about my NVQC API submission?
The environment variable NVQC_LOG_LEVEL can be used to turn certain logs on and off. There are three levels:
- Info (info): basic information about NVQC is logged to the console. This is the default.
- Off (off or 0): disable all NVQC logging.
- Trace (trace): log additional information for each NVQC job execution, including timing.
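For example, here is a minimal sketch of enabling trace logging from Python (the variable can equally be exported in the shell; it is assumed here that it must be set before the NVQC target is initialized):

import os

# Enable per-job trace logging before initializing the NVQC target.
os.environ["NVQC_LOG_LEVEL"] = "trace"

import cudaq

cudaq.set_target("nvqc")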
I want to persist my API key to a configuration file.
You may persist your NVQC API Key to a credential configuration file in lieu of using the NVQC_API_KEY environment variable. The configuration file can be generated as follows, replacing the api_key value with your NVQC API Key.
echo "key: <api_key>" >> $HOME/.nvqc_config