Tensor Network Simulators

CUDA-Q provides two tensor-network simulator backends, tensornet and tensornet-mps, accelerated with the cuTensorNet library. Detailed technical information on the simulator can be found in the cuTensorNet documentation. These backends are available for use from both C++ and Python.

Tensor network simulators are suitable for large-scale simulation of certain classes of quantum circuits involving many qubits, beyond the memory limit of state vector based simulators. For example, computing the expectation value of a Hamiltonian via cudaq::observe can be performed efficiently, thanks to cuTensorNet's contraction optimization capabilities. On the other hand, conditional circuits, i.e., those with mid-circuit measurements or reset, are supported by both backends but may result in poor performance.
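To see why state vector simulators hit a memory wall, the following back-of-the-envelope sketch estimates the storage needed for a dense state vector. It assumes 16 bytes per amplitude in double precision (complex128) and 8 bytes in single precision (complex64); the function name is illustrative and not part of CUDA-Q.

```python
# Rough memory estimate for a dense state vector (illustrative helper).
# Assumes complex128 amplitudes (16 bytes) for double precision and
# complex64 amplitudes (8 bytes) for single precision.


def statevector_bytes(num_qubits: int, fp32: bool = False) -> int:
    """Bytes needed to store all 2**num_qubits amplitudes."""
    bytes_per_amplitude = 8 if fp32 else 16
    return (2 ** num_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (30, 40, 50):
        gib = statevector_bytes(n) / 2**30
        print(f"{n} qubits (fp64): {gib:,.0f} GiB")
```

Under these assumptions, 30 qubits already need 16 GiB in double precision, and each additional qubit doubles the requirement.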

Multi-GPU multi-node

The tensornet backend represents quantum states and circuits as tensor networks in an exact form (no approximation). Measurement samples and expectation values are computed via tensor network contractions. This backend supports multi-GPU, multi-node distribution of tensor operations required to evaluate and simulate the circuit.

The tensornet target supports both single- and double-precision floating point.

To execute a program on the tensornet target using a single GPU, use the following commands:

Python

Double Precision (Default):

python3 program.py [...] --target tensornet

Single Precision:

python3 program.py [...] --target tensornet --target-option fp32

The target can also be defined in the application code by calling

cudaq.set_target('tensornet')

for the default double-precision setting, or

cudaq.set_target('tensornet', option='fp32')

for the single-precision setting.

If a target is set in the application code, this target will override the --target command line flag given during program invocation.

C++

nvq++ --target tensornet program.cpp [...] -o program.x
./program.x

If you have multiple GPUs available on your system, you can use MPI to automatically distribute the computation across the visible GPUs.

Note

If you installed the CUDA-Q Python wheels, distribution across multiple GPUs is currently not supported for this backend. We will add support for it in future releases. For more information, see this GitHub issue.

Use the following commands to enable distribution across multiple GPUs (adjust the value of the -np flag as needed to reflect available GPU resources on your system):

Python

mpiexec -np 2 python3 program.py [...] --target tensornet

In addition to using MPI in the simulator, you can use it in your application code by installing mpi4py and invoking the program with the command

mpiexec -np 2 python3 -m mpi4py program.py [...] --target tensornet

C++

nvq++ --target tensornet program.cpp [...] -o program.x
mpiexec -np 2 ./program.x

Note

MPI parallelization on the tensornet backend requires CUDA-Q’s MPI support. Please refer to the instructions on how to enable MPI parallelization within CUDA-Q. CUDA-Q containers are shipped with a pre-built MPI plugin; hence no additional setup is needed.

Note

If the CUTENSORNET_COMM_LIB environment variable is set following the activation procedure described in the cuTensorNet documentation, the cuTensorNet MPI plugin will take precedence over the built-in support from CUDA-Q.

Specific aspects of the simulation can be configured by setting the following environment variables:

`CUDA_VISIBLE_DEVICES=X`: Makes the process only see GPU X on multi-GPU nodes. Each MPI process must only see its own dedicated GPU. For example, if you run 8 MPI processes on a DGX system with 8 GPUs, each MPI process should be assigned its own dedicated GPU via CUDA_VISIBLE_DEVICES when invoking mpiexec (or mpirun) commands.
`OMP_PLACES=cores`: Set this environment variable to improve CPU parallelization.
`OMP_NUM_THREADS=X`: To enable CPU parallelization, set X to NUMBER_OF_CORES_PER_NODE/NUMBER_OF_GPUS_PER_NODE.
`CUDAQ_TENSORNET_CONTROLLED_RANK=X`: Specify the maximum number of control qubits for which the full tensor body of a controlled gate is expanded. If the number of control qubits is greater than this value, the gate is applied as a controlled tensor operator to the tensor network state. Default value is 1.
`CUDAQ_TENSORNET_OBSERVE_CONTRACT_PATH_REUSE=X`: Set this environment variable to TRUE (ON) or FALSE (OFF) to enable or disable contraction path reuse when computing expectation values. Default is OFF.
`CUDAQ_TENSORNET_NUM_HYPER_SAMPLES=X`: Specify the number of hyper samples used in the tensor network contraction path finder. Default value is 8 if not specified.
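The per-rank settings above can be combined into one helper. The sketch below is hypothetical (tensornet_rank_env is not a CUDA-Q API); it assumes every node has the same number of GPUs and that the node-local rank is available, which Open MPI exposes as OMPI_COMM_WORLD_LOCAL_RANK (other MPI implementations use different names). Because CUDA_VISIBLE_DEVICES must be in place before the CUDA runtime initializes, such values are typically exported by a small launcher script that mpiexec invokes, rather than set after the simulator has started.

```python
import os


def tensornet_rank_env(local_rank: int, cores_per_node: int, gpus_per_node: int) -> dict:
    """Environment settings for one MPI rank (hypothetical helper).

    Pins each rank on a node to its own GPU and splits the node's CPU
    cores evenly between the GPUs.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return {
        # Each rank sees only its dedicated GPU.
        "CUDA_VISIBLE_DEVICES": str(local_rank % gpus_per_node),
        "OMP_PLACES": "cores",
        # NUMBER_OF_CORES_PER_NODE / NUMBER_OF_GPUS_PER_NODE, at least 1.
        "OMP_NUM_THREADS": str(max(1, cores_per_node // gpus_per_node)),
    }


if __name__ == "__main__":
    # Open MPI-specific variable; defaults to rank 0 when run without MPI.
    local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    # gpus_per_node=8 assumes a DGX-style node with 8 GPUs.
    env = tensornet_rank_env(local_rank, os.cpu_count() or 1, gpus_per_node=8)
    print(env)
```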

Note

This backend requires an NVIDIA GPU and CUDA runtime libraries. If you do not have these dependencies installed, you may encounter an error stating Invalid simulator requested. See the section Dependencies and Compatibility for more information about how to install dependencies.

Matrix product state¶

The tensornet-mps backend is based on the matrix product state (MPS) representation of the state vector/wave function, exploiting the sparsity in the tensor network via tensor decomposition techniques such as QR and SVD. As such, this backend is an approximate simulator: the number of singular values kept may be truncated to keep the MPS size tractable. The tensornet-mps backend only supports single-GPU simulation. Its approximate nature allows it to handle a large number of qubits for certain classes of quantum circuits with a relatively small memory footprint.
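The memory advantage of the MPS representation can be estimated with a rough upper bound: each of the n site tensors has at most shape (χ, 2, χ), where χ is the maximum bond dimension. The sketch below assumes 16 bytes per complex element in double precision (8 in single precision) and ignores scratch space and the smaller boundary tensors; the function name is illustrative.

```python
def mps_bytes_upper_bound(num_qubits: int, max_bond: int, fp32: bool = False) -> int:
    """Upper bound on MPS storage: num_qubits site tensors of shape (chi, 2, chi)."""
    bytes_per_element = 8 if fp32 else 16
    elements_per_site = max_bond * 2 * max_bond
    return num_qubits * elements_per_site * bytes_per_element


if __name__ == "__main__":
    size = mps_bytes_upper_bound(100, 64)
    print(f"100 qubits, bond dimension 64: {size / 2**20:.1f} MiB")
```

Under these assumptions, 100 qubits at a bond dimension of 64 (the default CUDAQ_MPS_MAX_BOND) fit in about 12.5 MiB, whereas a dense 100-qubit state vector is far beyond any GPU's memory.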

The tensornet-mps target supports both single- and double-precision floating point.

To execute a program on the tensornet-mps target, use the following commands:

Python

Double Precision (Default):

python3 program.py [...] --target tensornet-mps

Single Precision:

python3 program.py [...] --target tensornet-mps --target-option fp32

The target can also be defined in the application code by calling

cudaq.set_target('tensornet-mps')

for the default double-precision setting, or

cudaq.set_target('tensornet-mps', option='fp32')

for the single-precision setting.

If a target is set in the application code, this target will override the --target command line flag given during program invocation.

C++

Double Precision (Default):

nvq++ --target tensornet-mps program.cpp [...] -o program.x
./program.x

Single Precision:

nvq++ --target tensornet-mps --target-option fp32 program.cpp [...] -o program.x
./program.x

Specific aspects of the simulation can be configured by defining the following environment variables:

`CUDAQ_MPS_MAX_BOND=X`: The maximum number of singular values to keep (fixed extent truncation). Default: 64.
`CUDAQ_MPS_ABS_CUTOFF=X`: The absolute cutoff for singular values during truncation. Singular values smaller than this value will be trimmed out. Default: 1e-5.
`CUDAQ_MPS_RELATIVE_CUTOFF=X`: The cutoff relative to the largest singular value. Singular values smaller than this fraction of the largest singular value will be trimmed out. Default: 1e-5.
`CUDAQ_MPS_SVD_ALGO=X`: The SVD algorithm to use. Valid values are: GESVD (QR algorithm), GESVDJ (Jacobi method), GESVDP (polar decomposition), GESVDR (randomized methods). Default: GESVDJ.
`CUDAQ_MPS_GAUGE=X`: The optional gauge option to improve accuracy of the MPS simulation. Valid values are: FREE (gauge is disabled) or SIMPLE (simple update algorithm). By default, no gauge configuration is set, so the default cuQuantum MPS setting is used (see the cuQuantum documentation).
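The interplay of CUDAQ_MPS_MAX_BOND, CUDAQ_MPS_ABS_CUTOFF and CUDAQ_MPS_RELATIVE_CUTOFF can be illustrated with a conceptual sketch that truncates a list of singular values. It is not the cuTensorNet implementation; it assumes that a value is discarded when it falls below either cutoff and that at most max_bond of the remaining values are kept.

```python
def truncate_singular_values(singular_values, max_bond=64, abs_cutoff=1e-5, rel_cutoff=1e-5):
    """Return the singular values kept after truncation (conceptual sketch).

    Defaults mirror the documented defaults of CUDAQ_MPS_MAX_BOND,
    CUDAQ_MPS_ABS_CUTOFF and CUDAQ_MPS_RELATIVE_CUTOFF.
    """
    values = sorted(singular_values, reverse=True)
    if not values:
        return []
    # A value survives only if it passes both the absolute and the relative cutoff.
    threshold = max(abs_cutoff, rel_cutoff * values[0])
    kept = [s for s in values if s >= threshold]
    # Fixed-extent truncation: keep at most max_bond of the largest values.
    return kept[:max_bond]
```

Raising max_bond or lowering either cutoff keeps more singular values, trading memory and runtime for accuracy.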

Note

This backend requires an NVIDIA GPU and CUDA runtime libraries. If you do not have these dependencies installed, you may encounter an error stating Invalid simulator requested. See the section Dependencies and Compatibility for more information about how to install dependencies.

Note

The parallelism of the Jacobi method (the default CUDAQ_MPS_SVD_ALGO setting) gives better GPU performance on small and medium-size matrices. If you expect a large number of singular values (e.g., when increasing the CUDAQ_MPS_MAX_BOND setting), please adjust the CUDAQ_MPS_SVD_ALGO setting accordingly.

Note

Both tensornet-mps and tensornet backends will allocate a scratch space on GPU device memory for their operations. For example, the scratch space can be used to store the contracted reduced density matrix to generate measurement bit strings.

By default, these backends reserve 50% of free memory for their scratch space. This ratio can be customized using the CUDAQ_TENSORNET_SCRATCH_SIZE_PERCENTAGE environment variable. Valid settings must be between 5% and 95%. Users may encounter runtime errors, e.g., insufficient workspace or CUDA memory allocation errors, when setting CUDAQ_TENSORNET_SCRATCH_SIZE_PERCENTAGE close to either limit.
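As a rough illustration of the percentage rule, the hypothetical helper below converts a free-memory figure into the reserved scratch size and rejects settings outside the documented 5 to 95 range. It assumes CUDAQ_TENSORNET_SCRATCH_SIZE_PERCENTAGE holds a plain number such as 50; the actual parsing in CUDA-Q may differ.

```python
def scratch_bytes(free_bytes: int, percentage: float = 50.0) -> int:
    """Scratch space reserved from free GPU memory (hypothetical helper).

    Mirrors the documented rule: default 50%, valid range 5% to 95%.
    """
    if not 5.0 <= percentage <= 95.0:
        raise ValueError("CUDAQ_TENSORNET_SCRATCH_SIZE_PERCENTAGE must be between 5 and 95")
    return int(free_bytes * percentage / 100.0)


if __name__ == "__main__":
    free = 80 * 2**30  # e.g., 80 GiB of free device memory
    print(f"default reservation: {scratch_bytes(free) / 2**30:.0f} GiB")
```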

Note

All floating-point data, e.g., gate matrices, noise channel Kraus operator matrices, contracted state vector, etc., are converted to the target’s precision setting, if not already in that precision format. Hence, users need to take into account potential precision loss when using the single-precision setting.
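To get a feel for the size of this conversion error, the sketch below rounds the Hadamard matrix entry 1/√2 from double to single precision using only the Python standard library (struct round-trips the value through an IEEE single). It only illustrates per-element rounding; how such errors accumulate over a full circuit depends on the circuit.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    h = 1.0 / math.sqrt(2.0)  # Hadamard matrix entry
    h32 = to_fp32(h)
    print(f"fp64:      {h!r}")
    print(f"fp32:      {h32!r}")
    print(f"abs error: {abs(h - h32):.3e}")
```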

Fermioniq

Fermioniq offers a cloud-based tensor-network emulation platform, Ava, for the approximate simulation of large-scale quantum circuits beyond the memory limit of state vector and exact tensor network based methods.

The level of approximation can be controlled by setting the bond dimension: larger values yield more accurate simulations at the expense of longer computation time. For a detailed description of Ava, see the online documentation.

Users of CUDA-Q can access a simplified version of the full Fermioniq emulator (Ava) from either C++ or Python. This version currently supports emulation of quantum circuits without noise, and can return measurement samples and/or compute expectation values of observables.

Note

In order to use the Fermioniq emulator, users must provide access credentials. These can be requested by contacting info@fermioniq.com.

The credentials must be set via two environment variables: FERMIONIQ_ACCESS_TOKEN_ID and FERMIONIQ_ACCESS_TOKEN_SECRET.
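A simple pre-flight check can catch missing credentials before a job is submitted. The helper below is a hypothetical sketch, not part of CUDA-Q; it only checks that both documented environment variables are set and non-empty, not that the tokens are valid.

```python
import os

REQUIRED_VARS = ("FERMIONIQ_ACCESS_TOKEN_ID", "FERMIONIQ_ACCESS_TOKEN_SECRET")


def missing_fermioniq_credentials(env=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_fermioniq_credentials()
    if missing:
        print("Missing Fermioniq credentials: " + ", ".join(missing))
    else:
        print("Fermioniq credentials found.")
```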

Python

The target to which quantum kernels are submitted can be controlled with the cudaq.set_target() function.

cudaq.set_target('fermioniq')

You will have to specify a remote configuration id for the Fermioniq backend when setting the target.

cudaq.set_target("fermioniq",**{
    "remote_config": remote_config_id
})

For a comprehensive list of all remote configurations, please contact Fermioniq directly.

If your organization requires you to define a project id, you have to specify it when setting the target.

cudaq.set_target("fermioniq",**{
    "project_id": project_id
})
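The Python snippets above pass each option on its own. Assuming both options can be supplied in a single set_target call (each is simply a keyword argument), a small hypothetical helper can collect whichever ones apply. The identifiers in the usage comment are placeholders.

```python
def fermioniq_target_options(remote_config=None, project_id=None):
    """Collect optional Fermioniq target options, omitting unset values (hypothetical helper)."""
    options = {"remote_config": remote_config, "project_id": project_id}
    return {key: value for key, value in options.items() if value is not None}


# Usage (requires CUDA-Q and valid credentials; values are placeholders):
#   import cudaq
#   cudaq.set_target("fermioniq", **fermioniq_target_options(
#       remote_config="<remote_config_id>", project_id="<project_id>"))
```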

C++

To target quantum kernel code for execution on the Fermioniq backend, pass the flag --target fermioniq to the nvq++ compiler. CUDA-Q will authenticate via the Fermioniq REST API using the environment variables set earlier.

nvq++ --target fermioniq src.cpp ...

You will have to specify a remote configuration id for the Fermioniq backend during compilation.

nvq++ --target fermioniq --fermioniq-remote-config <remote_config_id> src.cpp ...

For a comprehensive list of all remote configurations, please contact Fermioniq directly.

If your organization requires you to define a project id, you have to specify it during compilation.

nvq++ --target fermioniq --fermioniq-project-id <project_id> src.cpp ...

To specify the bond dimension, pass the --fermioniq-bond-dim flag.

nvq++ --target fermioniq --fermioniq-bond-dim 10 src.cpp ...
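The three nvq++ flags above can presumably be combined in one invocation. The sketch below assembles such a command line in Python, for example for use in a build script; the helper name and the placeholder values are hypothetical, and only the flags themselves come from the commands above.

```python
import shlex


def nvqpp_fermioniq_command(source, remote_config=None, project_id=None, bond_dim=None):
    """Assemble an nvq++ command line for the fermioniq target (hypothetical helper)."""
    cmd = ["nvq++", "--target", "fermioniq"]
    if remote_config is not None:
        cmd += ["--fermioniq-remote-config", remote_config]
    if project_id is not None:
        cmd += ["--fermioniq-project-id", project_id]
    if bond_dim is not None:
        cmd += ["--fermioniq-bond-dim", str(bond_dim)]
    cmd.append(source)
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it.
    print(shlex.join(nvqpp_fermioniq_command("src.cpp", remote_config="my-config", bond_dim=10)))
```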