CUDA-Q QEC C++ API
Code
-
class code : public cudaqx::extension_point<code, const heterogeneous_map&>
Base class for quantum error correcting codes in CUDA-Q.
This class provides the core interface and functionality for implementing quantum error correcting codes in CUDA-Q. It defines the basic operations that any QEC code must support and provides infrastructure for syndrome measurement and error correction experiments.
To implement a new quantum error correcting code:
Create a new class that inherits from code
Implement the protected virtual methods:
Define quantum kernels for each required logical operation (these are the fault tolerant logical operation implementations)
Register the operations in your constructor using the operation_encodings map on the base class
Register your new code type using CUDAQ_REGISTER_TYPE
Example implementation:
__qpu__ void x_kernel(patch p);
__qpu__ void z_kernel(patch p);

class my_code : public qec::code {
protected:
  std::size_t get_num_data_qubits() const override { return 7; }
  std::size_t get_num_ancilla_qubits() const override { return 6; }
  std::size_t get_num_ancilla_x_qubits() const override { return 3; }
  std::size_t get_num_ancilla_z_qubits() const override { return 3; }

public:
  my_code(const heterogeneous_map& options) : code() {
    // Can use user-specified options, e.g.
    auto d = options.get<int>("distance");
    operation_encodings.insert(std::make_pair(operation::x, x_kernel));
    operation_encodings.insert(std::make_pair(operation::z, z_kernel));
    // Register other required operations...
    // Define the default stabilizers!
    m_stabilizers = qec::stabilizers({"XXXX", "ZZZZ"});
  }

  CUDAQ_EXTENSION_CUSTOM_CREATOR_FUNCTION(
      my_code,
      static std::unique_ptr<qec::code> create(const heterogeneous_map &options) {
        return std::make_unique<my_code>(options);
      })
};
CUDAQ_REGISTER_TYPE(my_code)
Supported quantum operations for error correcting codes
Subclassed by cudaq::qec::repetition::repetition, cudaq::qec::steane::steane, cudaq::qec::surface_code::surface_code
Public Types
-
using one_qubit_encoding = cudaq::qkernel<void(patch)>
Type alias for single qubit quantum kernels.
-
using two_qubit_encoding = cudaq::qkernel<void(patch, patch)>
Type alias for two qubit quantum kernels.
-
using stabilizer_round = cudaq::qkernel<std::vector<cudaq::measure_result>(patch, const std::vector<std::size_t>&, const std::vector<std::size_t>&)>
Type alias for stabilizer measurement kernels.
-
using encoding = std::variant<one_qubit_encoding, two_qubit_encoding, stabilizer_round>
Type alias for quantum operation encodings.
Public Functions
-
virtual std::size_t get_num_data_qubits() const = 0
Get the number of physical data qubits needed for the code.
- Returns:
Number of data qubits
-
virtual std::size_t get_num_ancilla_qubits() const = 0
Get the total number of ancilla qubits needed.
- Returns:
Total number of ancilla qubits
-
virtual std::size_t get_num_ancilla_x_qubits() const = 0
Get number of ancilla qubits needed for X stabilizer measurements.
- Returns:
Number of X-type ancilla qubits
-
virtual std::size_t get_num_ancilla_z_qubits() const = 0
Get number of ancilla qubits needed for Z stabilizer measurements.
- Returns:
Number of Z-type ancilla qubits
-
virtual std::size_t get_num_x_stabilizers() const = 0
Get the number of X stabilizers that can be measured.
- Returns:
Number of X-type stabilizers
-
virtual std::size_t get_num_z_stabilizers() const = 0
Get the number of Z stabilizers that can be measured.
- Returns:
Number of Z-type stabilizers
-
cudaqx::tensor<uint8_t> get_parity() const
Get the full parity check matrix H = (Hx | Hz)
- Returns:
Tensor representing the parity check matrix
-
cudaqx::tensor<uint8_t> get_parity_x() const
Get the X component of the parity check matrix.
- Returns:
Tensor representing Hx
-
cudaqx::tensor<uint8_t> get_parity_z() const
Get the Z component of the parity check matrix.
- Returns:
Tensor representing Hz
-
cudaqx::tensor<uint8_t> get_pauli_observables_matrix() const
Get Lx stacked on Lz.
- Returns:
Tensor representing pauli observables
-
cudaqx::tensor<uint8_t> get_observables_x() const
Get the Lx observables.
- Returns:
Tensor representing Lx
-
cudaqx::tensor<uint8_t> get_observables_z() const
Get the Lz observables.
- Returns:
Tensor representing Lz
-
inline const std::vector<cudaq::spin_op_term> &get_stabilizers() const
Get the stabilizer generators.
- Returns:
Reference to stabilizers
Public Static Functions
-
static std::unique_ptr<code> get(const std::string &name, const std::vector<cudaq::spin_op_term> &stabilizers, const heterogeneous_map options = {})
Factory method to create a code instance with specified stabilizers.
- Parameters:
name – Name of the code to create
stabilizers – Stabilizer generators for the code
options – Optional code-specific configuration options
- Returns:
Unique pointer to created code instance
-
struct patch
Represents a logical qubit patch for quantum error correction.
This type is for CUDA-Q kernel code only.
This structure defines a patch of qubits used in quantum error correction codes. It consists of data qubits and ancilla qubits for X and Z stabilizer measurements.
-
class repetition : public cudaq::qec::code
Implementation of the repetition quantum error correction code.
Public Functions
-
repetition(const heterogeneous_map&)
Constructs a repetition code instance.
-
class steane : public cudaq::qec::code
Steane code implementation.
Public Functions
-
steane(const heterogeneous_map&)
Constructor for the Steane code.
-
class stabilizer_grid
Generates and keeps track of the 2D grid of stabilizers in the rotated surface code, following the same layout convention as in: https://arxiv.org/abs/2311.10687. The grid is arranged from left to right, top to bottom (row-major storage). A grid_length = 4 example:

(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

Each entry on the grid can be an X stabilizer, a Z stabilizer, or empty, as is needed on the edges. A grid length of 4 corresponds to a distance-3 surface code, which results in:

e(0,0) e(0,1) Z(0,2) e(0,3)
X(1,0) Z(1,1) X(1,2) e(1,3)
e(2,0) X(2,1) Z(2,2) X(2,3)
e(3,0) Z(3,1) e(3,2) e(3,3)
This is seen through the print_stabilizer_grid() member function. To get rid of the empty sites, the print_stabilizer_coords() function is used:

Z(0,2)
X(1,0) Z(1,1) X(1,2)
X(2,1) Z(2,2) X(2,3)
Z(3,1)

and to get the familiar visualization of the distance-3 surface code, print_stabilizer_indices() results in:

Z0
X0 Z1 X1
X2 Z2 X3
Z3

The data qubits are located at the four corners of each of the weight-4 stabilizers. They are also organized with index increasing from left to right, top to bottom:

d0 d1 d2
d3 d4 d5
d6 d7 d8
Public Functions
-
stabilizer_grid(uint32_t distance)
Construct the grid from the code’s distance.
-
stabilizer_grid()
Empty constructor.
-
void print_stabilizer_grid() const
Print a 2d grid of stabilizer roles.
-
void print_stabilizer_coords() const
Print a 2d grid of stabilizer coords.
-
void print_stabilizer_indices() const
Print a 2d grid of stabilizer indices.
-
void print_data_grid() const
Print a 2d grid of data qubit indices.
-
void print_stabilizer_maps() const
Print the coord <–> indices maps.
-
void print_stabilizers() const
Print the stabilizers in sparse pauli format.
-
std::vector<cudaq::spin_op_term> get_spin_op_stabilizers() const
Get the stabilizers as a vector of cudaq::spin_op_terms.
-
std::vector<cudaq::spin_op_term> get_spin_op_observables() const
Get the observables as a vector of cudaq::spin_op_terms.
Public Members
-
uint32_t distance = 0
The distance of the code determines the number of data qubits per dimension.
-
uint32_t grid_length = 0
Length of the stabilizer grid. For distance d data qubits per dimension, the stabilizer grid has length d + 1.
-
std::vector<surface_role> roles
Flattened vector of the stabilizer grid sites' roles (grid index -> role), stored in row-major order.
-
std::vector<vec2d> x_stab_coords
x stab index -> 2d coord
-
std::vector<vec2d> z_stab_coords
z stab index -> 2d coord
-
std::map<vec2d, size_t> x_stab_indices
2d coord -> x stab index
-
std::map<vec2d, size_t> z_stab_indices
2d coord -> z stab index
-
std::vector<vec2d> data_coords
data index -> 2d coord data qubits are in an offset 2D coord system from stabilizers
-
std::map<vec2d, size_t> data_indices
2d coord -> data index
-
std::vector<std::vector<size_t>> x_stabilizers
Each element is an X stabilizer specified by the data qubits it has support on. In the surface code, stabilizers can have weight 2 or weight 4, so x_stabilizers[i].size() is either 2 or 4.
-
std::vector<std::vector<size_t>> z_stabilizers
Each element is a Z stabilizer specified by the data qubits it has support on.
-
class surface_code : public cudaq::qec::code
surface_code implementation
Public Functions
-
surface_code(const heterogeneous_map&)
Constructor for the surface_code.
- CUDAQ_EXTENSION_CUSTOM_CREATOR_FUNCTION(surface_code, static std::unique_ptr<cudaq::qec::code> create(const cudaqx::heterogeneous_map &options) { return std::make_unique<surface_code>(options); })
Extension creator function for the surface_code.
-
stabilizer_grid grid
Grid to keep track of the topological arrangement of qubits.
Detector Error Model
-
struct detector_error_model
A detector error model (DEM) for a quantum error correction circuit. A DEM can be created from a QEC circuit and a noise model. It contains information about which errors flip which detectors. This is used by the decoder to help make predictions about observables flips.
Shared size parameters among the matrix types.
detector_error_matrix: num_detectors x num_error_mechanisms [d, e]
error_rates: num_error_mechanisms
observables_flips_matrix: num_observables x num_error_mechanisms [k, e]
Note
The C++ API for this class may change in the future. The Python API is more likely to be backwards compatible.
Public Functions
-
std::size_t num_detectors() const
Return the number of rows in the detector_error_matrix.
-
std::size_t num_error_mechanisms() const
Return the number of columns in the detector_error_matrix, error_rates, and observables_flips_matrix.
-
std::size_t num_observables() const
Return the number of rows in the observables_flips_matrix.
-
void canonicalize_for_rounds(uint32_t num_syndromes_per_round)
Put the detector_error_matrix into canonical form, where the rows and columns are ordered in a way that is amenable to the round-based decoding process.
Public Members
-
cudaqx::tensor<uint8_t> detector_error_matrix
The detector error matrix is a specific kind of circuit-level parity-check matrix where each row represents a detector, and each column represents an error mechanism. The entries of this matrix are H[i,j] = 1 if detector i is triggered by error mechanism j, and 0 otherwise.
-
std::vector<double> error_rates
The list of weights has length equal to the number of columns of detector_error_matrix; it assigns a likelihood to each error mechanism.
-
cudaqx::tensor<uint8_t> observables_flips_matrix
The observables flips matrix is a specific kind of circuit-level parity-check matrix where each row represents a Pauli observable, and each column represents an error mechanism. The entries of this matrix are O[i,j] = 1 if Pauli observable i is flipped by error mechanism j, and 0 otherwise.
-
std::optional<std::vector<std::size_t>> error_ids
Error mechanism ID. From a probability perspective, each error mechanism ID is independent of all other error mechanism IDs. Of all errors with the same ID, only one of them can happen. That is, errors sharing the same ID are correlated with each other.
-
cudaq::qec::detector_error_model cudaq::qec::dem_from_memory_circuit(const code &code, operation statePrep, std::size_t numRounds, cudaq::noise_model &noise)
Given a memory circuit setup, generate a DEM.
- Parameters:
code – QEC Code to sample
statePrep – Initial state preparation operation
numRounds – Number of stabilizer measurement rounds
noise – Noise model to apply
- Returns:
Detector error model
-
detector_error_model cudaq::qec::x_dem_from_memory_circuit(const code &code, operation statePrep, std::size_t numRounds, cudaq::noise_model &noise)
Given a memory circuit setup, generate a DEM for X stabilizers.
- Parameters:
code – QEC Code to sample
statePrep – Initial state preparation operation
numRounds – Number of stabilizer measurement rounds
noise – Noise model to apply
- Returns:
Detector error model
-
detector_error_model cudaq::qec::z_dem_from_memory_circuit(const code &code, operation statePrep, std::size_t numRounds, cudaq::noise_model &noise)
Given a memory circuit setup, generate a DEM for Z stabilizers.
- Parameters:
code – QEC Code to sample
statePrep – Initial state preparation operation
numRounds – Number of stabilizer measurement rounds
noise – Noise model to apply
- Returns:
Detector error model
Decoder Interfaces
-
class decoder : public cudaqx::extension_point<decoder, const cudaqx::tensor<uint8_t>&, const cudaqx::heterogeneous_map&>
The decoder base class should be subclassed by specific decoder implementations. The heterogeneous_map provides a placeholder for arbitrary constructor parameters that can be unique to each specific decoder.
Public Functions
-
decoder(const cudaqx::tensor<uint8_t> &H)
Constructor.
- Parameters:
H – Decoder’s parity check matrix represented as a tensor. The tensor is required to be rank 2 and must be of dimensions syndrome_size x block_size. Subsequent decode calls will use the same H.
-
virtual decoder_result decode(const std::vector<float_t> &syndrome) = 0
Decode a single syndrome.
- Parameters:
syndrome – A vector of syndrome measurements where the floating point value is the probability that the syndrome measurement is a |1>. The length of the syndrome vector should be equal to syndrome_size.
- Returns:
Vector of length block_size with soft probabilities of errors in each index.
-
virtual decoder_result decode(const cudaqx::tensor<uint8_t> &syndrome)
Decode a single syndrome.
- Parameters:
syndrome – An order-1 tensor of syndrome measurements where a 1 bit represents that the syndrome measurement is a |1>. The length of the syndrome vector should be equal to syndrome_size.
- Returns:
Vector of length block_size of errors in each index.
-
virtual std::future<decoder_result> decode_async(const std::vector<float_t> &syndrome)
Decode a single syndrome.
- Parameters:
syndrome – A vector of syndrome measurements where the floating point value is the probability that the syndrome measurement is a |1>.
- Returns:
std::future of a vector of length block_size with soft probabilities of errors in each index.
-
virtual std::vector<decoder_result> decode_batch(const std::vector<std::vector<float_t>> &syndrome)
Decode multiple independent syndromes (may be done in serial or parallel depending on the specific implementation)
- Parameters:
syndrome – A vector of N syndrome measurements where the floating point value is the probability that the syndrome measurement is a |1>.
- Returns:
2-D vector of size N x block_size with soft probabilities of errors in each index.
-
uint32_t get_num_msyn_per_decode() const
Get the number of measurement syndromes per decode call. This depends on D_sparse, so you must have called set_D_sparse() first.
-
void set_O_sparse(const std::vector<std::vector<uint32_t>> &O_sparse)
Set the observable matrix.
-
void set_O_sparse(const std::vector<int64_t> &O_sparse)
Set the observable matrix, using a single long vector with -1 as row terminators.
-
void set_D_sparse(const std::vector<std::vector<uint32_t>> &D_sparse)
Set the D_sparse matrix.
-
void set_D_sparse(const std::vector<int64_t> &D_sparse)
Set the D_sparse matrix, using a single long vector with -1 as row terminators.
-
void set_decoder_id(uint32_t decoder_id)
Set the decoder id.
-
uint32_t get_decoder_id() const
Get the decoder id.
-
bool enqueue_syndrome(const uint8_t *syndrome, std::size_t syndrome_length)
Enqueue a syndrome for decoding (pointer version)
- Returns:
True if enough syndromes have been enqueued to trigger a decode.
-
bool enqueue_syndrome(const std::vector<uint8_t> &syndrome)
Enqueue a syndrome for decoding (vector version)
- Returns:
True if enough syndromes have been enqueued to trigger a decode.
-
const uint8_t *get_obs_corrections() const
Get the current observable corrections.
-
std::size_t get_num_observables() const
Get the number of observables.
-
void clear_corrections()
Clear any stored corrections.
-
void reset_decoder()
Reset the decoder, clearing all per-shot memory and corrections.
-
inline virtual bool supports_graph_dispatch() const
Returns true if this decoder supports graph-based realtime dispatch via capture_decode_graph().
-
inline virtual void *capture_decode_graph(int reserved_sms = 0)
Capture a CUDA graph for realtime dispatch.
Returns a pointer to a cudaq::qec::realtime::graph_resources struct (caller must include realtime/graph_resources.h to interpret it). Returns nullptr if graph dispatch is not supported. The decoder retains ownership of the returned pointer.
-
inline virtual void release_decode_graph(void *graph_resources)
Release graph resources previously returned by capture_decode_graph().
-
virtual ~decoder() = default
Destructor.
-
virtual std::string get_version() const
Get the version of the decoder. Subclasses that are not part of the standard GitHub repo should override this to provide a more tailored version string.
- Returns:
A string containing the version of the decoder
Public Static Functions
-
static std::unique_ptr<decoder> get(const std::string &name, const cudaqx::tensor<uint8_t> &H, const cudaqx::heterogeneous_map ¶m_map = cudaqx::heterogeneous_map())
This get overload supports default values.
-
struct decoder_result
Decoder results.
Public Members
-
bool converged = false
Whether or not the decoder converged.
-
std::vector<float_t> result
Vector of length block_size with soft probabilities of errors in each index.
-
std::optional<cudaqx::heterogeneous_map> opt_results
Optional additional results from the decoder stored in a heterogeneous map. For equality comparison, this field is treated as a boolean flag - two decoder_results are considered equal only if both have empty opt_results (either std::nullopt or an empty map). If either result has non-empty opt_results, they are considered not equal.
Built-in Decoders
NVIDIA QLDPC Decoder
- class nv_qldpc_decoder
A general purpose Quantum Low-Density Parity-Check Decoder (QLDPC) decoder based on GPU accelerated belief propagation (BP). Since belief propagation is an iterative method, decoding can be improved with a second-stage post-processing step. Optionally, ordered statistics decoding (OSD) can be chosen to perform the second stage of decoding.
An [[n,k,d]] quantum error correction (QEC) code encodes k logical qubits into an n qubit data block, with a code distance d. Quantum low-density parity-check (QLDPC) codes are characterized by sparse parity-check matrices (or Tanner graphs), corresponding to a bounded number of parity checks per data qubit.
Requires a CUDA-Q compatible GPU. See the CUDA-Q GPU Compatibility List for a list of valid GPU configurations.
References: Decoding Across the Quantum LDPC Code Landscape
Note
It is required to create decoders with the get_decoder API from the CUDA-QX extension points API, such as:

import cudaq_qec as qec
import numpy as np

H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)  # sample 3x7 PCM
opts = dict()  # see below for options
# Note: H must be in row-major order. If you use
# `scipy.sparse.csr_matrix.todense()` to get the parity check
# matrix, you must specify todense(order='C') to get a row-major
# matrix.
nvdec = qec.get_decoder('nv-qldpc-decoder', H, **opts)

std::size_t block_size = 7;
std::size_t syndrome_size = 3;
cudaqx::tensor<uint8_t> H;
std::vector<uint8_t> H_vec = {1, 0, 0, 1, 0, 1, 1,
                              0, 1, 0, 1, 1, 0, 1,
                              0, 0, 1, 0, 1, 1, 1};
H.copy(H_vec.data(), {syndrome_size, block_size});
cudaqx::heterogeneous_map nv_custom_args;
nv_custom_args.insert("use_osd", true); // See below for options
auto nvdec = cudaq::qec::get_decoder("nv-qldpc-decoder", H, nv_custom_args);
Note
The "nv-qldpc-decoder" implements the cudaq_qec.Decoder interface for Python and the cudaq::qec::decoder interface for C++, so it supports all the methods in those respective classes.
- Parameters:
H – Parity check matrix (tensor format)
params –
Heterogeneous map of parameters:
- use_sparsity (bool): Whether or not to use a sparse matrix solver
- error_rate (double): Probability of an error (in 0-1 range) on a block data bit (defaults to 0.001)
- error_rate_vec (double): Vector of length “block size” containing the probability of an error (in 0-1 range) on a block data bit (defaults to 0.001). This overrides error_rate.
- max_iterations (int): Maximum number of BP iterations to perform (defaults to 30)
- n_threads (int): Number of CUDA threads to use for the GPU decoder (defaults to smart selection based on parity matrix size)
- use_osd (bool): Whether or not to use an OSD post processor if the initial BP algorithm fails to converge on a solution
- osd_method (int): 1=OSD-0, 2=Exhaustive, 3=Combination Sweep (defaults to 1). Ignored unless use_osd is true.
- osd_order (int): OSD postprocessor order (defaults to 0). Ref: Decoding Across the Quantum LDPC Code Landscape
  - For osd_method=2 (Exhaustive), the number of possible permutations searched after OSD-0 grows by 2^osd_order.
  - For osd_method=3 (Combination Sweep), this is the λ parameter. All weight 1 permutations and the first λ bits worth of weight 2 permutations are searched after OSD-0. This is (syndrome_length - block_size + λ * (λ - 1) / 2) additional permutations.
  - For other osd_method values, this is ignored.
- bp_batch_size (int): Number of syndromes that will be decoded in parallel for the BP decoder (defaults to 1)
- osd_batch_size (int): Number of syndromes that will be decoded in parallel for OSD (defaults to the number of concurrent threads supported by the hardware)
- iter_per_check (int): Number of iterations between BP convergence checks (defaults to 1, and max is max_iterations). Introduced in 0.4.0.
- clip_value (float): Value to clip the BP messages to. Should be a non-negative value (defaults to 0.0, which disables clipping). Introduced in 0.4.0.
- repeatable (bool): Whether to make the BP algorithm (and Relay BP algorithm if enabled) bit-for-bit repeatable. Defaults to False. You must set clip_value to a non-zero value to use this option. Setting this option to True makes it run approximately 5-10% slower, but you are guaranteed to get repeatable results, which is often useful for both timing and detailed syndrome analysis. Introduced in 0.6.0.
- bp_method (int): Core BP algorithm to use (defaults to 0). Introduced in 0.4.0, expanded in 0.5.0:
  - 0: sum-product
  - 1: min-sum (introduced in 0.4.0)
  - 2: min-sum+mem (uniform memory strength, requires use_sparsity=True. Introduced in 0.5.0)
  - 3: min-sum+dmem (disordered memory strength, requires use_sparsity=True. Introduced in 0.5.0)
- composition (int): Iteration strategy (defaults to 0). Introduced in 0.5.0:
  - 0: Standard (single run)
  - 1: Sequential relay (multiple gamma legs). Requires: bp_method=3, use_sparsity=True, and srelay_config
- scale_factor (float): The scale factor to use for min-sum. Defaults to 1.0. When set to 0.0, the scale factor is dynamically computed based on the number of iterations. Introduced in 0.4.0.
- proc_float (string): The processing float type to use. Defaults to “fp64”. Valid values are “fp32” and “fp64”. Introduced in 0.5.0.
- gamma0 (float): Memory strength parameter. Required for bp_method=2, and for composition=1 (sequential relay). Introduced in 0.5.0.
- gamma_dist (vector<float>): Gamma distribution interval [min, max] for disordered memory strength. Required for bp_method=3 if explicit_gammas not provided. Introduced in 0.5.0.
- explicit_gammas (vector<vector<float>>): Explicit gamma values for each variable node. For bp_method=3 with composition=0, provide a 2D vector where each row has block_size columns. For composition=1 (Sequential relay), provide num_sets rows (one per relay leg). Overrides gamma_dist if provided. Introduced in 0.5.0.
- srelay_config (heterogeneous_map): Sequential relay configuration (required for composition=1). Contains the following parameters. Introduced in 0.5.0:
  - pre_iter (int): Number of pre-iterations to run before relay legs
  - num_sets (int): Number of relay sets (legs) to run
  - stopping_criterion (string): When to stop relay legs:
    - ”All”: Run all legs
    - ”FirstConv”: Stop relay after first convergence
    - ”NConv”: Stop after N convergences (requires stop_nconv parameter)
  - stop_nconv (int): Number of convergences to wait for before stopping (required only when stopping_criterion="NConv")

Note
Starting in version 0.6.0, convergence during the pre_iter phase counts as a successful convergence towards the stopping criteria. Prior to 0.6.0, convergence during pre-iterations did not count.

- bp_seed (int): Seed for random number generation used in bp_method=3 (disordered memory BP). Optional parameter, defaults to 42 if not provided. Introduced in 0.5.0.
- opt_results (heterogeneous_map): Optional results to return. This field can be left empty if no additional results are desired. Choices are:
  - bp_llr_history (int): Return the last bp_llr_history iterations of the BP LLR history. Minimum value is 0 and maximum value is max_iterations. The actual number of returned iterations might be fewer than bp_llr_history if BP converges before the requested number of iterations. Introduced in 0.4.0. Note: Not supported for composition=1.
  - num_iter (bool): If true, return the number of BP iterations run. Introduced in 0.5.0.
Sliding Window Decoder
- class sliding_window
The Sliding Window Decoder is a wrapper around a standard decoder that introduces two key differences:
1. Sliding Window Decoding: The decoding process is performed incrementally, one window at a time. The window size is specified by the user. This allows decoding to begin before all syndromes have been received, potentially reducing overall latency in multi-round QEC codes.
2. Partial Syndrome Support: Unlike standard decoders, the decode function (and its variants like decode_batch) can accept partial syndromes. If partial syndromes are provided, the return vector will be empty; the decoder will not complete the processing and will remain in an intermediate state, awaiting future syndromes. The return vector is only non-empty once enough data has been provided to match the original syndrome size (calculated from the Parity Check Matrix).
Sliding window decoders are advantageous in QEC codes subject to circuit-level noise across multiple syndrome extraction rounds. These decoders permit syndrome processing to begin before the complete syndrome measurement sequence is obtained, potentially reducing the overall decoding latency. However, this approach introduces a trade-off: the reduction in latency typically comes at the cost of increased logical error rates. Therefore, the viability of sliding window decoding depends critically on the specific code parameters, noise model, and latency requirements of the system under consideration.
Sliding window decoding imposes only a single structural constraint on the parity check matrices: each syndrome extraction round must produce a constant number of syndrome measurements. Notably, the decoder makes no assumptions about temporal correlations or periodicity in the underlying noise process.
Streaming Syndrome Interface
For real-time applications, the decoder provides an enqueue_syndrome() method that accepts syndrome data one round at a time. This allows the host to feed syndrome measurements as they arrive without waiting for all rounds to complete. The decoder automatically manages internal buffering and triggers window decodes at appropriate boundaries.
References: Toward Low-latency Iterative Decoding of QLDPC Codes Under Circuit-Level Noise
Note
It is required to create decoders with the get_decoder API from the CUDA-QX extension points API, such as:

import cudaq
import cudaq_qec as qec
import numpy as np

cudaq.set_target('stim')
num_rounds = 5
code = qec.get_code('surface_code', distance=num_rounds)
noise = cudaq.NoiseModel()
noise.add_all_qubit_channel("x", cudaq.Depolarization2(0.001), 1)
statePrep = qec.operation.prep0
dem = qec.z_dem_from_memory_circuit(code, statePrep, num_rounds, noise)
inner_decoder_params = {'use_osd': True, 'max_iterations': 50}
opts = {
    'error_rate_vec': np.array(dem.error_rates),
    'window_size': 1,
    'num_syndromes_per_round': dem.detector_error_matrix.shape[0] // num_rounds,
    'inner_decoder_name': 'single_error_lut',
    'inner_decoder_params': inner_decoder_params,
}
swdec = qec.get_decoder('sliding_window', dem.detector_error_matrix, **opts)

#include "cudaq/qec/code.h"
#include "cudaq/qec/decoder.h"
#include "cudaq/qec/experiments.h"
#include "common/NoiseModel.h"

int main() {
  // Generate a Detector Error Model.
  int num_rounds = 5;
  auto code = cudaq::qec::get_code(
      "surface_code", cudaqx::heterogeneous_map{{"distance", num_rounds}});
  cudaq::noise_model noise;
  noise.add_all_qubit_channel("x", cudaq::depolarization2(0.001), 1);
  auto statePrep = cudaq::qec::operation::prep0;
  auto dem = cudaq::qec::z_dem_from_memory_circuit(*code, statePrep,
                                                   num_rounds, noise);

  // Use the DEM to create a sliding window decoder.
  auto inner_decoder_params =
      cudaqx::heterogeneous_map{{"use_osd", true}, {"max_iterations", 50}};
  auto opts = cudaqx::heterogeneous_map{
      {"error_rate_vec", dem.error_rates},
      {"window_size", 1},
      {"num_syndromes_per_round",
       dem.detector_error_matrix.shape()[0] / num_rounds},
      {"inner_decoder_name", "single_error_lut"},
      {"inner_decoder_params", inner_decoder_params}};
  auto swdec = cudaq::qec::get_decoder("sliding_window",
                                       dem.detector_error_matrix, opts);
  return 0;
}
Note
The "sliding_window" decoder implements the cudaq_qec.Decoder interface for Python and the cudaq::qec::decoder interface for C++, so it supports all the methods in those respective classes.
- Parameters:
H – Parity check matrix (tensor format)
params –
Heterogeneous map of parameters:
- error_rate_vec (double): Vector of length “block size” containing the probability of an error (in 0-1 range). This vector is used to populate the error_rate_vec parameter for the inner decoder (automatically sliced correctly according to each window).
- window_size (int): The number of rounds of syndrome data in each window. (Defaults to 1.)
- step_size (int): The number of rounds to advance the window by each time. (Defaults to 1.)
- num_syndromes_per_round (int): The number of syndromes per round. (Must be provided.)
- straddle_start_round (bool): When forming a window, should error mechanisms that span the start round and any preceding rounds be included? (Defaults to False.)
- straddle_end_round (bool): When forming a window, should error mechanisms that span the end round and any subsequent rounds be included? (Defaults to True.)
- inner_decoder_name (string): The name of the inner decoder to use.
- inner_decoder_params (Python dict or C++ heterogeneous_map): A dictionary of parameters to pass to the inner decoder.
TensorRT Decoder
- class trt_decoder
A GPU-accelerated quantum error correction decoder based on NVIDIA TensorRT. This decoder leverages TensorRT’s optimized inference engine to perform fast neural network-based decoding of quantum error correction syndromes.
The TRT decoder supports loading pre-trained neural network models in ONNX format or directly loading pre-built TensorRT engine files for maximum performance. It automatically optimizes the model for the target GPU architecture and supports various precision modes (FP16, BF16, INT8, FP8) to balance accuracy and speed.
Neural network-based decoders can be trained to perform syndrome decoding for specific quantum error correction codes and noise models. The TRT decoder provides a high-performance inference engine for these models, with automatic CUDA graph optimization for reduced latency.
Requires a CUDA-capable GPU and TensorRT installation. See the CUDA-Q GPU Compatibility List for a list of valid GPU configurations.
Note
It is required to create decoders with the
get_decoder API from the CUDA-QX extension points API, such as:

import cudaq_qec as qec
import numpy as np

# Create a simple parity check matrix (not used by the TRT decoder)
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)

# Option 1: Load from ONNX model (builds TRT engine)
trt_dec = qec.get_decoder('trt_decoder', H,
                          onnx_load_path='model.onnx',
                          precision='fp16',
                          engine_save_path='model.engine')

# Option 2: Load pre-built TRT engine (faster startup)
trt_dec = qec.get_decoder('trt_decoder', H,
                          engine_load_path='model.engine')
#include "cudaq/qec/decoder.h" std::size_t block_size = 7; std::size_t syndrome_size = 3; cudaqx::tensor<uint8_t> H; // Create a simple parity check matrix (not used by the TRT decoder) std::vector<uint8_t> H_vec = {1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1}; H.copy(H_vec.data(), {syndrome_size, block_size}); // Option 1: Load from ONNX model (builds TRT engine) cudaqx::heterogeneous_map params1; params1.insert("onnx_load_path", "model.onnx"); params1.insert("precision", "fp16"); params1.insert("engine_save_path", "model.engine"); auto trt_dec1 = cudaq::qec::get_decoder("trt_decoder", H, params1); // Option 2: Load pre-built TRT engine (faster startup) cudaqx::heterogeneous_map params2; params2.insert("engine_load_path", "model.engine"); auto trt_dec2 = cudaq::qec::get_decoder("trt_decoder", H, params2);
Note
The
"trt_decoder"implements thecudaq_qec.Decoderinterface for Python and thecudaq::qec::decoderinterface for C++, so it supports all the methods in those respective classes.Note
The parity check matrix
H is not used by the TRT decoder. The neural network model encodes the decoding logic, so the parity check matrix is only required to satisfy the decoder interface. You can pass any valid parity check matrix of appropriate dimensions.
Note
Batch Processing: The TRT decoder automatically handles batch size optimization. Models trained with batch_size > 1 will receive zero-padded inputs when using
decode() on a single syndrome. When using decode_batch(), provide syndromes in multiples of the model's batch size for optimal performance.
- Parameters:
H – Parity check matrix (tensor format). Note: This parameter is not used by the TRT decoder but is required by the decoder interface.
params –
Heterogeneous map of parameters:
Required (choose one):
onnx_load_path (string): Path to an ONNX model file. The decoder will build a TensorRT engine from this model. Cannot be used together with engine_load_path.
engine_load_path (string): Path to a pre-built TensorRT engine file. Provides faster initialization since the engine is already optimized. Cannot be used together with onnx_load_path.
Optional:
engine_save_path (string): Path to save the built TensorRT engine. Only applicable when using onnx_load_path. Saving the engine allows for faster initialization in subsequent runs by using engine_load_path.
precision (string): Precision mode for inference (defaults to "best"). Valid options:
"fp16": Use FP16 (half precision) - good balance of speed and accuracy
"bf16": Use BF16 (bfloat16) - available on newer GPUs (Ampere+)
"int8": Use INT8 quantization - fastest but requires calibration
"fp8": Use FP8 precision - available on Hopper GPUs
"tf32": Use TensorFloat-32 - available on Ampere+ GPUs
"noTF32": Disable TF32 and use standard FP32
"best": Let TensorRT automatically choose the best precision (default)
Note: If the requested precision is not supported by the hardware, the decoder will fall back to FP32 with a warning.
memory_workspace (size_t): Memory workspace size in bytes for TensorRT engine building (defaults to 1 GB = 1073741824 bytes). Larger workspaces may allow TensorRT to explore more optimization strategies.
use_cuda_graph (bool): Enable CUDA graph optimization for improved performance (defaults to True). CUDA graphs capture inference operations and replay them with reduced kernel launch overhead, providing ~20% speedup. The optimization is applied automatically on the first decode call. Automatically disabled for models with dynamic shapes or multiple optimization profiles. Set to False to force the traditional execution path.
batch_size (automatic): The decoder automatically detects the model's batch size from the first input dimension. For models with batch_size > 1, the decode() method automatically zero-pads single syndromes to fill the batch. The decode_batch() method requires the number of syndromes to be an integral multiple of the model's batch size.
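The zero-padding behavior described for decode() can be sketched as follows (a plain-NumPy illustration of the documented behavior, not the decoder's internals):

```python
import numpy as np

# Sketch of the zero-padding described above: when decode() receives a single
# syndrome but the model was built with batch_size > 1, the input is padded
# with zero rows to fill the batch. Illustration only, not the decoder's code.
def pad_to_batch(syndrome, batch_size):
    batch = np.zeros((batch_size, len(syndrome)), dtype=np.uint8)
    batch[0] = syndrome
    return batch

batch = pad_to_batch(np.array([1, 0, 1], dtype=np.uint8), batch_size=4)
print(batch.shape)  # (4, 3)
```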
Real-Time Decoding
The Real-Time Decoding API enables low-latency error correction on quantum hardware by allowing CUDA-Q quantum kernels to interact with decoders during circuit execution. This API is designed for use cases where corrections must be calculated and applied within qubit coherence times.
The real-time decoding system supports simulation environments for local testing and hardware integration (e.g., on Quantinuum’s Helios QPU).
Core Decoding Functions
These functions can be called from within CUDA-Q quantum kernels (__qpu__ functions) to interact with real-time decoders.
-
void cudaq::qec::decoding::enqueue_syndromes(std::uint64_t decoder_id, const std::vector<cudaq::measure_result> &syndromes, std::uint64_t tag = 0)
Enqueue syndromes for decoding.
- Parameters:
decoder_id – The ID of the decoder to use.
syndromes – The syndromes to enqueue.
tag – The tag to use for the syndrome (currently useful for logging only)
-
std::vector<bool> cudaq::qec::decoding::get_corrections(std::uint64_t decoder_id, std::uint64_t return_size, bool reset = false)
Get the corrections for a given decoder.
- Parameters:
decoder_id – The ID of the decoder to use.
return_size – The number of correction bits to return. This is expected to match the number of observables in the decoder.
reset – Whether to reset the decoder corrections after retrieving them.
- Returns:
The corrections (detected bit flips) for the given decoder, based on all of the decoded syndromes since the last time any corrections were reset.
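Conceptually, corrections accumulate across all syndromes decoded since the last reset: each decode contributes predicted observable flips, which combine by XOR. A minimal Python sketch of that bookkeeping, using a hypothetical decode_flips stand-in for the real decoder:

```python
# Hypothetical stand-in decoder: maps a syndrome to predicted observable flips.
# Toy rule for illustration only: flip observable 0 iff syndrome parity is odd.
def decode_flips(syndrome):
    return [sum(syndrome) % 2]

class CorrectionsTracker:
    """Sketch of get_corrections semantics: XOR-accumulate flips until reset."""
    def __init__(self, num_observables):
        self.corrections = [0] * num_observables

    def enqueue(self, syndrome):
        flips = decode_flips(syndrome)
        self.corrections = [c ^ f for c, f in zip(self.corrections, flips)]

    def get_corrections(self, reset=False):
        out = list(self.corrections)
        if reset:
            self.corrections = [0] * len(self.corrections)
        return out

t = CorrectionsTracker(num_observables=1)
t.enqueue([1, 0, 0])   # odd parity -> flip
t.enqueue([1, 1, 0])   # even parity -> no flip
print(t.get_corrections(reset=True))  # [1]
```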
Configuration API
The configuration API enables setting up decoders before circuit execution. Decoders are configured using YAML files or programmatically constructed configuration objects.
-
int cudaq::qec::decoding::config::configure_decoders(multi_decoder_config &config)
Configure the decoders (
multi_decoder_config variant). This function configures local decoders and, if running on remote target hardware, submits the configuration to the remote target for further processing.
- Parameters:
config – The configuration to use.
- Returns:
0 on success, non-zero on failure.
-
int cudaq::qec::decoding::config::configure_decoders_from_file(const char *config_file)
Configure the decoders from a file. This function configures local decoders and, if running on remote target hardware, submits the configuration to the remote target for further processing.
- Parameters:
config_file – The file to read the configuration from.
- Returns:
0 on success, non-zero on failure.
-
int cudaq::qec::decoding::config::configure_decoders_from_str(const char *config_str)
Configure the decoders from a string. This function configures local decoders and, if running on remote target hardware, submits the configuration to the remote target for further processing.
- Parameters:
config_str – The string to read the configuration from.
- Returns:
0 on success, non-zero on failure.
Helper Functions
Real-time decoding requires converting matrices to sparse format for efficient decoder configuration. The following utility functions are essential:
cudaq::qec::pcm_to_sparse_vec() for converting a dense PCM to a sparse PCM. Usage in real-time decoding:

config.H_sparse = cudaq::qec::pcm_to_sparse_vec(dem.detector_error_matrix);
config.O_sparse = cudaq::qec::pcm_to_sparse_vec(dem.observables_flips_matrix);

cudaq::qec::pcm_from_sparse_vec() for converting a sparse PCM to a dense PCM.
cudaq::qec::generate_timelike_sparse_detector_matrix() for generating a sparse detector matrix. Usage in real-time decoding:

config.D_sparse = cudaq::qec::generate_timelike_sparse_detector_matrix(
    numSyndromesPerRound, numRounds, false);
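The sparse format these helpers exchange (per the Parity Check Matrix Utilities descriptions, row-wise column indices terminated by -1) can be illustrated with a plain-Python round trip, a sketch of the format rather than the library code:

```python
# Sketch of the -1-terminated sparse PCM format: for each row, emit the column
# indices of its nonzero entries, then -1 to mark the end of the row.
def to_sparse_vec(pcm):
    out = []
    for row in pcm:
        out.extend(c for c, v in enumerate(row) if v)
        out.append(-1)
    return out

def from_sparse_vec(sparse, num_rows, num_cols):
    pcm = [[0] * num_cols for _ in range(num_rows)]
    r = 0
    for x in sparse:
        if x == -1:
            r += 1  # row terminator
        else:
            pcm[r][x] = 1
    return pcm

H = [[1, 1, 0],
     [0, 1, 1]]
sv = to_sparse_vec(H)
print(sv)  # [0, 1, -1, 1, 2, -1]
assert from_sparse_vec(sv, 2, 3) == H
```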
See also Parity Check Matrix Utilities for additional PCM manipulation functions.
Realtime Pipeline API
The realtime pipeline API provides the reusable host-side runtime for
low-latency QEC pipelines that combine GPU inference with optional CPU
post-processing. The published reference is generated from
cudaq/qec/realtime/pipeline.h.
Note
This API is experimental and subject to change.
Configuration
-
struct core_pinning
CPU core affinity settings for pipeline threads.
-
struct pipeline_stage_config
Configuration for a single pipeline stage.
Public Members
-
int num_workers = 8
Number of GPU worker threads (max 64).
-
int num_slots = 32
Number of ring buffer slots.
-
size_t slot_size = 16384
Size of each ring buffer slot in bytes.
-
core_pinning cores
CPU core affinity settings.
-
void *external_ringbuffer = nullptr
When non-null, the pipeline uses this caller-owned ring buffer (cudaq_ringbuffer_t*) instead of allocating its own. The caller is responsible for lifetime. ring_buffer_injector is unavailable in this mode (the FPGA/emulator owns the producer side).
GPU Stage
-
struct gpu_worker_resources
Per-worker GPU resources returned by the gpu_stage_factory.
Each worker owns a captured CUDA graph, a dedicated stream, and optional pre/post launch callbacks for DMA staging or result extraction.
Public Members
-
cudaGraphExec_t graph_exec = nullptr
Instantiated CUDA graph for this worker.
-
cudaStream_t stream = nullptr
Dedicated CUDA stream for graph launches.
-
void (*pre_launch_fn)(void *user_data, void *slot_dev, cudaStream_t stream) = nullptr
Optional callback invoked before graph launch (e.g. DMA copy).
-
void *pre_launch_data = nullptr
Opaque user data passed to
pre_launch_fn.
-
void (*post_launch_fn)(void *user_data, void *slot_dev, cudaStream_t stream) = nullptr
Optional callback invoked after graph launch.
-
void *post_launch_data = nullptr
Opaque user data passed to
post_launch_fn.
-
uint32_t function_id = 0
RPC function ID that this worker handles.
-
void *user_context = nullptr
Opaque user context passed to cpu_stage_callback.
-
using cudaq::qec::realtime::experimental::gpu_stage_factory = std::function<gpu_worker_resources(int worker_id)>
Factory called once per worker during start().
- Param worker_id:
Zero-based worker index assigned by the pipeline.
- Return:
GPU resources for the given worker. Any handles, callbacks, and user data returned here must remain valid until the pipeline stops.
CPU Stage
-
struct cpu_stage_context
Context passed to the CPU stage callback for each completed GPU workload.
The callback reads
gpu_output, performs post-processing (e.g. MWPM decoding), and writes the result into response_buffer.
Public Members
-
int worker_id
Index of the worker thread invoking this callback.
-
int origin_slot
Ring buffer slot that originated this request.
-
const void *gpu_output
Pointer to GPU inference output (nullptr in poll mode).
-
size_t gpu_output_size
Size of GPU output in bytes.
-
void *response_buffer
Destination buffer for the RPC response.
-
size_t max_response_size
Maximum number of bytes that can be written to
response_buffer.
-
void *user_context
Opaque user context from gpu_worker_resources::user_context.
-
using cudaq::qec::realtime::experimental::cpu_stage_callback = std::function<size_t(const cpu_stage_context &ctx)>
CPU stage callback type.
- Param ctx:
Poll-mode view of the current worker state and response buffer.
- Return:
Number of bytes written into
ctx.response_buffer. Return 0 if no GPU result is ready yet (poll again). Return DEFERRED_COMPLETION to release the worker immediately while deferring slot completion to a later complete_deferred() call.
-
static constexpr size_t cudaq::qec::realtime::experimental::DEFERRED_COMPLETION = SIZE_MAX
Sentinel return value from cpu_stage_callback: release the worker (idle_mask) but do NOT signal slot completion (tx_flags). The caller is responsible for calling realtime_pipeline::complete_deferred(slot) once the deferred work (e.g. a separate decode thread) finishes.
Completion
-
struct completion
Metadata for a completed (or errored) pipeline request.
-
using cudaq::qec::realtime::experimental::completion_callback = std::function<void(const completion &c)>
Callback invoked by the consumer thread for each completed request.
- Param c:
Metadata for the completed or errored request.
Ring Buffer Injector
-
class ring_buffer_injector
Writes RPC-framed requests into the pipeline’s ring buffer, simulating FPGA DMA deposits.
Created via realtime_pipeline::create_injector(). The parent realtime_pipeline must outlive the injector. Not available when the pipeline is configured with an external ring buffer.
Public Functions
-
~ring_buffer_injector()
Destroy the injector state.
-
ring_buffer_injector(ring_buffer_injector&&) noexcept
Move-construct an injector.
-
ring_buffer_injector &operator=(ring_buffer_injector&&) noexcept
Move-assign an injector.
-
bool try_submit(uint32_t function_id, const void *payload, size_t payload_size, uint64_t request_id)
Try to submit a request without blocking.
- Parameters:
function_id – RPC function identifier.
payload – Pointer to the payload data.
payload_size – Size of the payload in bytes.
request_id – Caller-assigned request identifier.
- Returns:
True if accepted, false if all slots are busy (backpressure).
-
void submit(uint32_t function_id, const void *payload, size_t payload_size, uint64_t request_id)
Submit a request, spinning until a slot becomes available.
- Parameters:
function_id – RPC function identifier.
payload – Pointer to the payload data.
payload_size – Size of the payload in bytes.
request_id – Caller-assigned request identifier.
Pipeline
-
class realtime_pipeline
Orchestrates GPU inference and CPU post-processing for low-latency realtime QEC decoding.
The pipeline manages a ring buffer, a host dispatcher thread, per-worker GPU streams with captured CUDA graphs, optional CPU worker threads for post-processing (e.g. PyMatching), and a consumer thread for completion signaling. It supports both an internal ring buffer (for software testing via ring_buffer_injector) and an external ring buffer (for FPGA RDMA).
Public Functions
-
explicit realtime_pipeline(const pipeline_stage_config &config)
Construct a pipeline and allocate ring buffer resources.
Note
Construction allocates the backing ring buffer or binds the caller-provided external ring so ringbuffer_bases can be queried before start.
- Parameters:
config – Stage configuration (slots, slot size, workers, etc.).
-
~realtime_pipeline()
Stop the pipeline if needed and release owned resources.
-
void set_gpu_stage(gpu_stage_factory factory)
Register the GPU stage factory. Must be called before start().
- Parameters:
factory – Callback that returns gpu_worker_resources per worker.
-
void set_cpu_stage(cpu_stage_callback callback)
Register the CPU worker callback. Must be called before start().
- Parameters:
callback – Function invoked by each worker thread to poll for and process completed GPU workloads. If not set, the pipeline operates in GPU-only mode with completion signaled via cudaLaunchHostFunc.
-
void set_completion_handler(completion_callback handler)
Register the completion callback. Must be called before start().
- Parameters:
handler – Function invoked by the consumer thread for each completed or errored request.
-
void start()
Allocate resources, build dispatcher config, and spawn all threads.
- Throws:
std::logic_error – If the GPU stage factory was not registered.
std::logic_error – If GPU-only mode is requested with an external ring buffer.
-
void stop()
Signal shutdown, join all threads, free resources.
Note
Safe to call multiple times. Subsequent calls are no-ops once the pipeline has fully stopped.
-
ring_buffer_injector create_injector()
Create a software injector for testing without FPGA hardware.
- Throws:
std::logic_error – if the pipeline uses an external ring buffer.
- Returns:
A ring_buffer_injector bound to this pipeline’s ring buffer.
-
void complete_deferred(int slot)
Signal that deferred processing for a slot is complete.
Call from any thread after the cpu_stage callback returned DEFERRED_COMPLETION and the deferred work has finished writing the response into the slot’s ring buffer area.
- Parameters:
slot – Ring buffer slot index to complete.
-
ring_buffer_bases ringbuffer_bases() const
Return the host and device base addresses of the RX data ring.
Note
In external-ring mode these pointers are the caller-provided ring addresses. In internal mode they refer to the owned mapped ring buffer.
- Returns:
Struct containing both base pointers.
-
struct ring_buffer_bases
Host and device base addresses of the RX data ring.
-
struct Stats
Pipeline throughput and backpressure statistics.
-
struct Stats
Pipeline throughput and backpressure statistics.
Public Members
-
uint64_t submitted
Total requests submitted to the ring buffer.
-
uint64_t completed
Total requests that completed (success or error).
-
uint64_t dispatched
Total packets dispatched by the host dispatcher.
-
uint64_t backpressure_stalls
Cumulative producer backpressure stalls.
-
struct ring_buffer_bases
Host and device base addresses of the RX data ring.
Public Members
-
uint8_t *rx_data_host
Host-mapped base pointer for the RX data ring.
-
uint8_t *rx_data_dev
Device-mapped base pointer for the RX data ring.
Parity Check Matrix Utilities
-
std::vector<std::vector<std::uint32_t>> cudaq::qec::dense_to_sparse(const cudaqx::tensor<uint8_t> &pcm)
Return a sparse representation of the PCM.
- Parameters:
pcm – The PCM to convert to a sparse representation.
- Returns:
A vector of vectors that sparsely represents the PCM. The size of the outer vector is the number of columns in the PCM, and the i-th element contains an inner vector of the row indices of the non-zero elements in the i-th column of the PCM.
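The documented return format (one inner vector of row indices per column) can be sketched in a few lines of Python:

```python
import numpy as np

# Sketch of the documented dense_to_sparse return format: one inner list per
# PCM column, holding the row indices of that column's nonzero entries.
def dense_to_sparse(pcm):
    pcm = np.asarray(pcm, dtype=np.uint8)
    return [np.nonzero(pcm[:, c])[0].tolist() for c in range(pcm.shape[1])]

H = np.array([[1, 0, 1],
              [1, 1, 0]], dtype=np.uint8)
print(dense_to_sparse(H))  # [[0, 1], [1], [0]]
```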
-
cudaqx::tensor<uint8_t> cudaq::qec::generate_random_pcm(std::size_t n_rounds, std::size_t n_errs_per_round, std::size_t n_syndromes_per_round, int weight, std::mt19937_64 &&rng)
Generate a random PCM with the given parameters.
- Parameters:
n_rounds – The number of rounds in the PCM.
n_errs_per_round – The number of errors per round in the PCM.
n_syndromes_per_round – The number of syndromes per round in the PCM.
weight – The column weight of the PCM.
rng – The random number generator to use (e.g. std::mt19937_64(your_seed))
- Returns:
A random PCM with the given parameters.
-
std::vector<std::int64_t> cudaq::qec::generate_timelike_sparse_detector_matrix(std::uint32_t num_syndromes_per_round, std::uint32_t num_rounds, bool include_first_round = false)
Generate a sparse detector matrix for a given number of syndromes per round and number of rounds. Timelike here means that each round of syndrome bits are xor’d against the preceding round.
- Parameters:
num_syndromes_per_round – The number of syndromes per round.
num_rounds – The number of rounds.
include_first_round – Whether to include the first round in the detector matrix.
- Returns:
The detector matrix format is CSR-like, with -1 values indicating the end of a row.
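A simplified sketch of the timelike construction: each detector row pairs a syndrome bit with the same bit from the preceding round, emitted in the -1-terminated sparse-row format. The exact row and column ordering here is an assumption for illustration:

```python
# Sketch of timelike detectors: round r's syndrome bits are xor'd against
# round r-1's, so each detector row references two flattened syndrome columns
# (round * num_syndromes_per_round + s), terminated by -1.
# Ordering assumptions are for illustration only, not the library code.
def timelike_detectors(num_syndromes_per_round, num_rounds, include_first_round=False):
    rows = []
    if include_first_round:
        for s in range(num_syndromes_per_round):
            rows.append([s, -1])  # round 0 bits, compared against nothing
    for r in range(1, num_rounds):
        for s in range(num_syndromes_per_round):
            prev = (r - 1) * num_syndromes_per_round + s
            cur = r * num_syndromes_per_round + s
            rows.append([prev, cur, -1])
    return [x for row in rows for x in row]

print(timelike_detectors(2, 2))  # [0, 2, -1, 1, 3, -1]
```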
-
std::vector<std::int64_t> cudaq::qec::generate_timelike_sparse_detector_matrix(std::uint32_t num_syndromes_per_round, std::uint32_t num_rounds, std::vector<std::int64_t> first_round_matrix)
Generate a sparse detector matrix for a given number of syndromes per round and number of rounds. Timelike here means that each round of syndrome bits are xor’d against the preceding round. The first round is supplied by the user, to allow for a mixture of detectors and non-detectors.
- Parameters:
num_syndromes_per_round – The number of syndromes per round.
num_rounds – The number of rounds.
first_round_matrix – User specified detector matrix for the first round.
- Returns:
The detector matrix format is CSR-like, with -1 values indicating the end of a row.
-
std::tuple<cudaqx::tensor<uint8_t>, std::uint32_t, std::uint32_t> cudaq::qec::get_pcm_for_rounds(const cudaqx::tensor<uint8_t> &pcm, std::uint32_t num_syndromes_per_round, std::uint32_t start_round, std::uint32_t end_round, bool straddle_start_round = false, bool straddle_end_round = false)
Get a sub-PCM for a range of rounds. It is recommended (but not required) that you call sort_pcm_columns() before calling this function.
- Parameters:
pcm – The PCM to get a sub-PCM for.
num_syndromes_per_round – The number of syndromes per round.
start_round – The start round (0-based).
end_round – The end round (0-based).
straddle_start_round – Whether to include columns that straddle the start_round (defaults to false)
straddle_end_round – Whether to include columns that straddle the end_round (defaults to false)
- Returns:
A tuple containing the new PCM restricted to the rounds [start_round, end_round], the index of the first column included, and the index of the last column included.
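The straddle semantics can be sketched by classifying each column by the rounds its nonzero rows touch (a simplified illustration; the real function also returns the first and last column indices kept):

```python
# Sketch: keep columns whose round span fits in [start_round, end_round],
# optionally letting columns extend past the start/end round when the
# corresponding straddle flag is set. Illustration only, not the library code.
def cols_for_rounds(sparse_cols, num_syndromes_per_round, start_round, end_round,
                    straddle_start=False, straddle_end=False):
    kept = []
    for c, rows in enumerate(sparse_cols):
        lo = min(rows) // num_syndromes_per_round   # earliest round touched
        hi = max(rows) // num_syndromes_per_round   # latest round touched
        ok_start = lo >= start_round or (straddle_start and hi >= start_round)
        ok_end = hi <= end_round or (straddle_end and lo <= end_round)
        if ok_start and ok_end:
            kept.append(c)
    return kept

# One syndrome per round; columns touch rounds {0}, {0,1}, {1}, {1,2}, {2}
cols = [[0], [0, 1], [1], [1, 2], [2]]
print(cols_for_rounds(cols, 1, 1, 1))                     # [2]
print(cols_for_rounds(cols, 1, 1, 1, straddle_end=True))  # [2, 3]
```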
-
std::vector<std::uint32_t> cudaq::qec::get_sorted_pcm_column_indices(const std::vector<std::vector<std::uint32_t>> &row_indices, std::uint32_t num_syndromes_per_round = 0)
Return a vector of column indices that would sort the PCM columns in topological order.
This function tries to make a matrix that is close to a block diagonal matrix from its input. Columns are first sorted by the index of the first non-zero entry in the column, and if those match, then they are sorted by the index of the last non-zero entry in the column. This ping pong continues for the indices of the second non-zero element and the second-to-last non-zero element, and so forth.
- Parameters:
row_indices – For each column, a vector of row indices that have a non-zero value in that column.
num_syndromes_per_round – The number of syndromes per round. (Defaults to 0, which means that no secondary per-round sorting will occur.)
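The ping-pong comparison described above amounts to sorting columns by the key (first row, last row, second row, second-to-last row, ...). A plain-Python sketch of that ordering:

```python
# Sketch of the "ping pong" ordering: each column's sort key interleaves its
# nonzero row indices from both ends - first, last, second, second-to-last, ...
def pingpong_key(rows):
    rows = sorted(rows)
    key, i, j = [], 0, len(rows) - 1
    while i <= j:
        key.append(rows[i])
        if i != j:
            key.append(rows[j])
        i, j = i + 1, j - 1
    return key

# Columns given by their nonzero row indices
cols = [[2, 5], [0, 9], [0, 3], [2, 4]]
order = sorted(range(len(cols)), key=lambda c: pingpong_key(cols[c]))
print(order)  # [2, 1, 3, 0]
```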
-
std::vector<std::uint32_t> cudaq::qec::get_sorted_pcm_column_indices(const cudaqx::tensor<uint8_t> &pcm, std::uint32_t num_syndromes_per_round = 0)
Return a vector of column indices that would sort the PCM columns in topological order.
- Parameters:
num_syndromes_per_round – The number of syndromes per round. (Defaults to 0, which means that no secondary per-round sorting will occur.)
-
std::pair<cudaqx::tensor<uint8_t>, std::vector<std::uint32_t>> cudaq::qec::pcm_extend_to_n_rounds(const cudaqx::tensor<uint8_t> &pcm, std::size_t num_syndromes_per_round, std::uint32_t n_rounds)
Extend a PCM to the given number of rounds.
- Parameters:
pcm – The PCM to extend.
num_syndromes_per_round – The number of syndromes per round.
n_rounds – The number of rounds to extend the PCM to.
- Returns:
A pair of the new PCM and the list of column indices from the original PCM that were used to form the new PCM.
-
cudaqx::tensor<uint8_t> cudaq::qec::pcm_from_sparse_vec(const std::vector<std::int64_t> &sparse_vec, std::size_t num_rows, std::size_t num_cols)
Return a PCM from a sparse representation.
- Parameters:
sparse_vec – The sparse representation of the PCM, where -1 separates rows.
num_rows – The number of rows in the PCM.
num_cols – The number of columns in the PCM.
- Returns:
A PCM tensor.
-
bool cudaq::qec::pcm_is_sorted(const cudaqx::tensor<uint8_t> &pcm, std::uint32_t num_syndromes_per_round = 0)
Check if a PCM is sorted.
- Parameters:
pcm – The PCM to check.
num_syndromes_per_round – The number of syndromes per round.
- Returns:
True if the PCM is sorted, false otherwise.
-
std::vector<std::int64_t> cudaq::qec::pcm_to_sparse_vec(const cudaqx::tensor<uint8_t> &pcm)
Return a sparse representation of the PCM.
- Parameters:
pcm – The PCM to convert to a sparse representation.
- Returns:
A vector of integers that represents the PCM in a sparse format, where -1 separates rows.
-
cudaqx::tensor<uint8_t> cudaq::qec::reorder_pcm_columns(const cudaqx::tensor<uint8_t> &pcm, const std::vector<std::uint32_t> &column_order, uint32_t row_begin = 0, uint32_t row_end = std::numeric_limits<uint32_t>::max())
Reorder the columns of a PCM according to the given column order. Note: this may return a subset of the columns in the original PCM if the
column_order does not contain all of the columns in the original PCM.
- Parameters:
pcm – The PCM to reorder.
column_order – The column order to use for reordering.
row_begin – The first row to include in the reordering. Leave at the default value to include all rows.
row_end – The last row to include in the reordering. Leave at the default value to include all rows.
- Returns:
A new PCM with the columns reordered according to the given column order.
-
cudaqx::tensor<uint8_t> cudaq::qec::shuffle_pcm_columns(const cudaqx::tensor<uint8_t> &pcm, std::mt19937_64 &&rng)
Randomly permute the columns of a PCM.
- Parameters:
pcm – The PCM to permute.
rng – The random number generator to use (e.g. std::mt19937_64(your_seed))
- Returns:
A new PCM with the columns permuted randomly.
-
std::pair<cudaqx::tensor<uint8_t>, std::vector<double>> cudaq::qec::simplify_pcm(const cudaqx::tensor<uint8_t> &pcm, const std::vector<double> &weights, std::uint32_t num_syndromes_per_round = 0)
Simplify a PCM by removing duplicate columns and 0-weight columns, and combine the probability weight vectors accordingly.
- Parameters:
pcm – The PCM to simplify.
weights – The probability weight vectors to combine.
num_syndromes_per_round – The number of syndromes per round. (Defaults to 0.)
- Returns:
A new PCM with the columns sorted in topological order, and the probability weight vectors combined accordingly.
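When duplicate columns are merged, their probabilities must combine; a natural rule is the probability that an odd number of the merged mechanisms fire, p = p1(1 - p2) + p2(1 - p1). Both this rule and the sketch below are assumptions for illustration, not the library's exact implementation:

```python
# Sketch of duplicate-column merging. The combination rule assumed here is the
# probability that an odd number of merged mechanisms fire:
#   p = p1*(1 - p2) + p2*(1 - p1)
# This rule is an assumption for illustration, not necessarily the library's.
def simplify(cols, weights):
    merged = {}
    for col, w in zip(cols, weights):
        if w == 0.0:
            continue  # drop 0-weight columns
        key = tuple(sorted(col))
        if key in merged:
            p = merged[key]
            merged[key] = p * (1 - w) + w * (1 - p)
        else:
            merged[key] = w
    return list(merged.keys()), list(merged.values())

cols = [[0, 1], [0, 1], [2]]          # columns by nonzero row indices
weights = [0.1, 0.2, 0.0]
new_cols, new_w = simplify(cols, weights)
print(new_cols, round(new_w[0], 4))  # [(0, 1)] 0.26
```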
Common
-
enum class cudaq::qec::operation
Enum describing all supported logical operations.
Values:
-
enumerator x
Logical X gate.
-
enumerator y
Logical Y gate.
-
enumerator z
Logical Z gate.
-
enumerator h
Logical Hadamard gate.
-
enumerator s
Logical S gate.
-
enumerator cx
Logical controlled-X gate.
-
enumerator cy
Logical controlled-Y gate.
-
enumerator cz
Logical controlled-Z gate.
-
enumerator stabilizer_round
Stabilizer measurement round.
-
enumerator prep0
Prepare logical |0⟩ state.
-
enumerator prep1
Prepare logical |1⟩ state.
-
enumerator prepp
Prepare logical |+⟩ state.
-
enumerator prepm
Prepare logical |-⟩ state.
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_code_capacity(const cudaqx::tensor<uint8_t> &H, std::size_t numShots, double error_probability)
Sample syndrome measurements with code capacity noise.
- Parameters:
H – Parity check matrix of a QEC code
numShots – Number of measurement shots
error_probability – Probability of bit flip on data
- Returns:
Tuple containing syndrome measurements and data qubit measurements
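Code-capacity sampling can be sketched directly: flip each data qubit independently with the given probability and read the syndrome as H·e mod 2 (a NumPy illustration of the noise model, not the library code):

```python
import numpy as np

# Sketch of the code-capacity noise model: each data qubit flips independently
# with probability error_probability, and the syndrome is H @ e (mod 2).
# Illustration of the sampling model only, not the library implementation.
def sample_code_capacity(H, num_shots, error_probability, seed=0):
    rng = np.random.default_rng(seed)
    H = np.asarray(H, dtype=np.uint8)
    data = (rng.random((num_shots, H.shape[1])) < error_probability).astype(np.uint8)
    syndromes = ((data @ H.T) % 2).astype(np.uint8)
    return syndromes, data

H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)
syndromes, data = sample_code_capacity(H, num_shots=100, error_probability=0.1, seed=1)
print(syndromes.shape, data.shape)  # (100, 2) (100, 3)
```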
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_code_capacity(const cudaqx::tensor<uint8_t> &H, std::size_t numShots, double error_probability, unsigned seed)
Sample syndrome measurements with code capacity noise.
- Parameters:
H – Parity check matrix of a QEC code
numShots – Number of measurement shots
error_probability – Probability of bit flip on data
seed – RNG seed for reproducible experiments
- Returns:
Tuple containing syndrome measurements and data qubit measurements
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_code_capacity(const code &code, std::size_t numShots, double error_probability)
Sample syndrome measurements with code capacity noise.
- Parameters:
code – QEC Code to sample
numShots – Number of measurement shots
error_probability – Probability of bit flip on data
- Returns:
Tuple containing syndrome measurements and data qubit measurements
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_code_capacity(const code &code, std::size_t numShots, double error_probability, unsigned seed)
Sample syndrome measurements with code capacity noise.
- Parameters:
code – QEC Code to sample
numShots – Number of measurement shots
error_probability – Probability of bit flip on data
seed – RNG seed for reproducible experiments
- Returns:
Tuple containing syndrome measurements and data qubit measurements
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_memory_circuit(const code &code, std::size_t numShots, std::size_t numRounds = 1)
Sample syndrome measurements starting from |0⟩ state.
- Parameters:
code – QEC Code to sample
numShots – Number of measurement shots
numRounds – Number of stabilizer measurement rounds
- Returns:
Tuple containing syndrome measurements and data qubit measurements (mz for z basis state prep, mx for x basis)
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_memory_circuit(const code &code, std::size_t numShots, std::size_t numRounds, cudaq::noise_model &noise)
Sample syndrome measurements from |0⟩ state with noise.
- Parameters:
code – QEC Code to sample
numShots – Number of measurement shots
numRounds – Number of stabilizer measurement rounds
noise – Noise model to apply
- Returns:
Tuple containing syndrome measurements and data qubit measurements (mz for z basis state prep, mx for x basis)
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_memory_circuit(const code &code, operation statePrep, std::size_t numShots, std::size_t numRounds = 1)
Sample syndrome measurements from the memory circuit.
- Parameters:
code – QEC Code to sample
statePrep – Initial state preparation operation
numShots – Number of measurement shots
numRounds – Number of stabilizer measurement rounds
- Returns:
Tuple containing syndrome measurements and data qubit measurements (mz for z basis state prep, mx for x basis)
-
std::tuple<cudaqx::tensor<uint8_t>, cudaqx::tensor<uint8_t>> cudaq::qec::sample_memory_circuit(const code &code, operation statePrep, std::size_t numShots, std::size_t numRounds, cudaq::noise_model &noise)
Sample syndrome measurements with circuit-level noise.
- Parameters:
code – QEC Code to sample
statePrep – Initial state preparation operation
numShots – Number of measurement shots
numRounds – Number of stabilizer measurement rounds
noise – Noise model to apply
- Returns:
Tuple containing syndrome measurements and data qubit measurements (mz for z basis state prep, mx for x basis)