1. Machine Model

CUDA Quantum presumes the existence of one or more classical host processors, zero or more NVIDIA graphics processing units (GPUs), and zero or more quantum processing units (QPUs). Each QPU is composed of a classical quantum control system (distributed FPGAs, GPUs, etc.) and a register of quantum bits (qubits) whose state evolves via signals from the classical control system. This model enables potential quantum process parallelism (concurrent execution of quantum circuits on independent QPUs), dependent quantum parallelism via quantum message passing and inter-QPU entanglement, and QPU thread-level parallelism (the ability for concurrent execution of independent quantum circuits on a single QPU qubit connectivity fabric). The model assumes a classical memory space for the host processor which inherits the C++ memory model semantics. The model assumes a classical memory space for each control system driving the evolution of the multi-qubit state. This control system memory space should enable primitive arithmetic variable declarations, stores, and loads, as well as qubit measurement persistence and loading for fast-feedback and conditional circuit execution. The quantum memory space on an individual QPU is modeled as an infinite register of logical qubits and physical connectivity constraints are hidden from the programmer. Note that compiler implementations of the CUDA Quantum model are free to enable developer access to the details of the backend’s qubit connectivity for the purpose of developing novel placement strategies.

CUDA Quantum considers general D-level quantum information systems, qudits. Qudits are non-copyable, and can be allocated in chunks, called quantum containers. Quantum containers come in two flavors - memory owning and non-owning (views). Moreover, the size of a quantum container can be specified at either compile-time or dynamically at runtime. As all qudits are non-copyable, qudits and containers thereof can only be passed by reference. Each qudit is assigned a unique identifying index, and upon deallocation that index is returned and can be reused by subsequent allocations. Deallocation occurs implicitly when the qudit goes out of scope. Uncomputation of qudit state should occur automatically via compiler implementations of the CUDA Quantum model, but programmers should be able to specify manual uncomputation at qudit instantiation.

The CUDA Quantum model considers both remotely hosted QPU execution models as well as tightly-coupled quantum-classical architectures. Remotely hosted models support batch circuit execution where each circuit may contain simple classical-quantum integration (primitive measure-reset and simple conditional statements). Tightly-coupled execution models provide streaming instruction execution, measure-reset, and fast-feedback of qubit measurement results. This multi-modal execution model directly influences the syntax and semantics of quantum kernel expressions and their associated host-code context.