CUDA-Q QEC Python API

Code

Detector Error Model

Decoder Interfaces

Built-in Decoders

NVIDIA QLDPC Decoder

class nv_qldpc_decoder

A general-purpose quantum low-density parity-check (QLDPC) decoder based on GPU-accelerated belief propagation (BP). Since belief propagation is an iterative method, decoding can be improved with a second-stage post-processing step. Optionally, ordered statistics decoding (OSD) can be chosen to perform this second stage.

An [[n,k,d]] quantum error correction (QEC) code encodes k logical qubits into an n qubit data block, with a code distance d. Quantum low-density parity-check (QLDPC) codes are characterized by sparse parity-check matrices (or Tanner graphs), corresponding to a bounded number of parity checks per data qubit.

Requires a CUDA-Q compatible GPU. See the CUDA-Q GPU Compatibility List for a list of valid GPU configurations.

References: Decoding Across the Quantum LDPC Code Landscape

Note

It is required to create decoders with the get_decoder API from the CUDA-QX extension points API, for example in Python:

import cudaq_qec as qec
import numpy as np
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8) # sample 3x7 PCM
opts = dict() # see below for options
# Note: H must be in row-major order. If you use
# `scipy.sparse.csr_matrix.todense()` to get the parity check
# matrix, you must specify todense(order='C') to get a row-major
# matrix.
nvdec = qec.get_decoder('nv-qldpc-decoder', H, **opts)

The equivalent C++ example:

#include "cudaq/qec/decoder.h"

std::size_t block_size = 7;
std::size_t syndrome_size = 3;
cudaqx::tensor<uint8_t> H;

std::vector<uint8_t> H_vec = {1, 0, 0, 1, 0, 1, 1,
                              0, 1, 0, 1, 1, 0, 1,
                              0, 0, 1, 0, 1, 1, 1};
H.copy(H_vec.data(), {syndrome_size, block_size});

cudaqx::heterogeneous_map nv_custom_args;
nv_custom_args.insert("use_osd", true);
// See below for options

auto nvdec = cudaq::qec::get_decoder("nv-qldpc-decoder", H, nv_custom_args);

Note

The "nv-qldpc-decoder" implements the cudaq_qec.Decoder interface for Python and the cudaq::qec::decoder interface for C++, so it supports all the methods in those respective classes.

Parameters:
  • H – Parity check matrix (tensor format)

  • params

    Heterogeneous map of parameters:

    • use_sparsity (bool): Whether or not to use a sparse matrix solver

    • error_rate (double): Probability of an error (in 0-1 range) on a block data bit (defaults to 0.001)

    • error_rate_vec (vector<double>): Vector of length “block size” containing the probability of an error (in 0-1 range) on each block data bit. This overrides error_rate.

    • max_iterations (int): Maximum number of BP iterations to perform (defaults to 30)

    • n_threads (int): Number of CUDA threads to use for the GPU decoder (defaults to smart selection based on parity matrix size)

    • use_osd (bool): Whether or not to use an OSD post processor if the initial BP algorithm fails to converge on a solution

    • osd_method (int): 1=OSD-0, 2=Exhaustive, 3=Combination Sweep (defaults to 1). Ignored unless use_osd is true.

    • osd_order (int): OSD postprocessor order (defaults to 0). Ref: Decoding Across the Quantum LDPC Code Landscape

      • For osd_method=2 (Exhaustive), the number of possible permutations searched after OSD-0 grows by 2^osd_order.

      • For osd_method=3 (Combination Sweep), this is the λ parameter. All weight 1 permutations and the first λ bits worth of weight 2 permutations are searched after OSD-0. This is (block_size - syndrome_length + λ * (λ - 1) / 2) additional permutations.

      • For other osd_method values, this is ignored.

    • bp_batch_size (int): Number of syndromes that will be decoded in parallel for the BP decoder (defaults to 1)

    • osd_batch_size (int): Number of syndromes that will be decoded in parallel for OSD (defaults to the number of concurrent threads supported by the hardware)

    • iter_per_check (int): Number of iterations between BP convergence checks (defaults to 1, and max is max_iterations). Introduced in 0.4.0.

    • clip_value (float): Value to clip the BP messages to. Should be a non-negative value (defaults to 0.0, which disables clipping). Introduced in 0.4.0.

    • bp_method (int): Core BP algorithm to use (defaults to 0). Introduced in 0.4.0, expanded in 0.5.0:

      • 0: sum-product

      • 1: min-sum (introduced in 0.4.0)

      • 2: min-sum+mem (uniform memory strength, requires use_sparsity=True. Introduced in 0.5.0)

      • 3: min-sum+dmem (disordered memory strength, requires use_sparsity=True. Introduced in 0.5.0)

    • composition (int): Iteration strategy (defaults to 0). Introduced in 0.5.0:

      • 0: Standard (single run)

      • 1: Sequential relay (multiple gamma legs). Requires: bp_method=3, use_sparsity=True, and srelay_config

    • scale_factor (float): The scale factor to use for min-sum. Defaults to 1.0. When set to 0.0, the scale factor is dynamically computed based on the number of iterations. Introduced in 0.4.0.

    • proc_float (string): The processing float type to use. Defaults to “fp64”. Valid values are “fp32” and “fp64”. Introduced in 0.5.0.

    • gamma0 (float): Memory strength parameter. Required for bp_method=2, and for composition=1 (sequential relay). Introduced in 0.5.0.

    • gamma_dist (vector<float>): Gamma distribution interval [min, max] for disordered memory strength. Required for bp_method=3 if explicit_gammas not provided. Introduced in 0.5.0.

    • explicit_gammas (vector<vector<float>>): Explicit gamma values for each variable node. For bp_method=3 with composition=0, provide a 2D vector where each row has block_size columns. For composition=1 (Sequential relay), provide num_sets rows (one per relay leg). Overrides gamma_dist if provided. Introduced in 0.5.0.

    • srelay_config (heterogeneous_map): Sequential relay configuration (required for composition=1). Contains the following parameters. Introduced in 0.5.0:

      • pre_iter (int): Number of pre-iterations to run before relay legs

      • num_sets (int): Number of relay sets (legs) to run

      • stopping_criterion (string): When to stop relay legs:

        • ”All”: Run all legs

        • ”FirstConv”: Stop relay after first convergence

        • ”NConv”: Stop after N convergences (requires stop_nconv parameter)

      • stop_nconv (int): Number of convergences to wait for before stopping (required only when stopping_criterion="NConv")

    • bp_seed (int): Seed for random number generation used in bp_method=3 (disordered memory BP). Optional parameter, defaults to 42 if not provided. Introduced in 0.5.0.

    • opt_results (heterogeneous_map): Optional results to return. This field can be left empty if no additional results are desired. Choices are:

      • bp_llr_history (int): Return the last bp_llr_history iterations of the BP LLR history. Minimum value is 0 and maximum value is max_iterations. The actual number of returned iterations might be fewer than bp_llr_history if BP converges before the requested number of iterations. Introduced in 0.4.0. Note: Not supported for composition=1.

      • num_iter (bool): If true, return the number of BP iterations run. Introduced in 0.5.0.
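
Once constructed, the decoder is used through the standard cudaq_qec.Decoder interface. The sketch below builds on the Python example above (nvdec and H); the injected single-bit error, the rounding of the returned soft decisions, and the use of the converged and result fields of the returned DecoderResult are illustrative assumptions about typical usage.

import numpy as np

# Sketch: decode the syndrome of a single injected bit-flip (assumes `nvdec`
# and `H` from the construction example above).
error = np.zeros(7, dtype=np.uint8)
error[3] = 1                          # flip data bit 3
syndrome = (H @ error) % 2            # parity checks triggered by the error

result = nvdec.decode(syndrome.tolist())
print(result.converged)               # True if BP (or BP+OSD) found a solution
correction = [int(round(x)) for x in result.result]
print(correction)                     # estimated error pattern of length 7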

Sliding Window Decoder

class sliding_window

The Sliding Window Decoder is a wrapper around a standard decoder that introduces two key differences:

1. Sliding Window Decoding: The decoding process is performed incrementally, one window at a time. The window size is specified by the user. This allows decoding to begin before all syndromes have been received, potentially reducing overall latency in multi-round QEC codes.

2. Partial Syndrome Support: Unlike standard decoders, the decode function (and its variants like decode_batch) can accept partial syndromes. If partial syndromes are provided, the return vector will be empty; the decoder does not complete processing and remains in an intermediate state, awaiting future syndromes. The return vector is only non-empty once enough data has been provided to match the original syndrome size (calculated from the parity check matrix). A sketch of this round-by-round usage appears after the parameter list below.

Sliding window decoders are advantageous in QEC codes subject to circuit-level noise across multiple syndrome extraction rounds. These decoders permit syndrome processing to begin before the complete syndrome measurement sequence is obtained, potentially reducing the overall decoding latency. However, this approach introduces a trade-off: the reduction in latency typically comes at the cost of increased logical error rates. Therefore, the viability of sliding window decoding depends critically on the specific code parameters, noise model, and latency requirements of the system under consideration.

Sliding window decoding imposes only a single structural constraint on the parity check matrices: each syndrome extraction round must produce a constant number of syndrome measurements. Notably, the decoder makes no assumptions about temporal correlations or periodicity in the underlying noise process.

References: Toward Low-latency Iterative Decoding of QLDPC Codes Under Circuit-Level Noise

Note

It is required to create decoders with the get_decoder API from the CUDA-QX extension points API, for example in Python:

import cudaq
import cudaq_qec as qec
import numpy as np

cudaq.set_target('stim')
num_rounds = 5
code = qec.get_code('surface_code', distance=num_rounds)
noise = cudaq.NoiseModel()
noise.add_all_qubit_channel("x", cudaq.Depolarization2(0.001), 1)
statePrep = qec.operation.prep0
dem = qec.z_dem_from_memory_circuit(code, statePrep, num_rounds, noise)
inner_decoder_params = {'use_osd': True, 'max_iterations': 50}
opts = {
    'error_rate_vec': np.array(dem.error_rates),
    'window_size': 1,
    'num_syndromes_per_round': dem.detector_error_matrix.shape[0] // num_rounds,
    'inner_decoder_name': 'single_error_lut',
    'inner_decoder_params': inner_decoder_params,
}
swdec = qec.get_decoder('sliding_window', dem.detector_error_matrix, **opts)

The equivalent C++ example:

#include "cudaq/qec/code.h"
#include "cudaq/qec/decoder.h"
#include "cudaq/qec/experiments.h"
#include "common/NoiseModel.h"

int main() {
    // Generate a Detector Error Model.
    int num_rounds = 5;
    auto code = cudaq::qec::get_code(
        "surface_code", cudaqx::heterogeneous_map{{"distance", num_rounds}});
    cudaq::noise_model noise;
    noise.add_all_qubit_channel("x", cudaq::depolarization2(0.001), 1);
    auto statePrep = cudaq::qec::operation::prep0;
    auto dem = cudaq::qec::z_dem_from_memory_circuit(*code, statePrep, num_rounds,
                                                    noise);
    // Use the DEM to create a sliding window decoder.
    auto inner_decoder_params =
        cudaqx::heterogeneous_map{{"use_osd", true}, {"max_iterations", 50}};
    auto opts = cudaqx::heterogeneous_map{
        {"error_rate_vec", dem.error_rates},
        {"window_size", 1},
        {"num_syndromes_per_round",
        dem.detector_error_matrix.shape()[0] / num_rounds},
        {"inner_decoder_name", "single_error_lut"},
        {"inner_decoder_params", inner_decoder_params}};
    auto swdec = cudaq::qec::get_decoder("sliding_window",
                                        dem.detector_error_matrix, opts);

    return 0;
}

Note

The "sliding_window" decoder implements the cudaq_qec.Decoder interface for Python and the cudaq::qec::decoder interface for C++, so it supports all the methods in those respective classes.

Parameters:
  • H – Parity check matrix (tensor format)

  • params

    Heterogeneous map of parameters:

    • error_rate_vec (vector<double>): Vector of length “block size” containing the probability of an error (in 0-1 range). This vector is used to populate the error_rate_vec parameter for the inner decoder (automatically sliced according to each window).

    • window_size (int): The number of rounds of syndrome data in each window. (Defaults to 1.)

    • step_size (int): The number of rounds to advance the window by each time. (Defaults to 1.)

    • num_syndromes_per_round (int): The number of syndromes per round. (Must be provided.)

    • straddle_start_round (bool): Whether error mechanisms that span the window's start round and any preceding rounds are included when forming a window. (Defaults to False.)

    • straddle_end_round (bool): Whether error mechanisms that span the window's end round and any subsequent rounds are included when forming a window. (Defaults to True.)

    • inner_decoder_name (string): The name of the inner decoder to use.

    • inner_decoder_params (Python dict or C++ heterogeneous_map): A dictionary of parameters to pass to the inner decoder.
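
The partial-syndrome behavior described above can be exercised by feeding the decoder one round of syndrome data at a time. The sketch below reuses swdec, dem, and num_rounds from the Python example above and uses an illustrative all-zero syndrome; the assumption that the empty return vector surfaces in Python as an empty result list is part of the sketch.

num_syndromes_per_round = dem.detector_error_matrix.shape[0] // num_rounds
full_syndrome = [0.0] * dem.detector_error_matrix.shape[0]  # illustrative all-zero syndrome

for r in range(num_rounds):
    start = r * num_syndromes_per_round
    chunk = full_syndrome[start:start + num_syndromes_per_round]
    result = swdec.decode(chunk)  # partial syndrome: result stays empty until all rounds are in
    if len(result.result) > 0:
        print(f"Correction available after round {r + 1}: {result.result}")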

Tensor Network Decoder

class cudaq_qec.plugin.decoders.tensor_network_decoder.TensorNetworkDecoder

A general class for tensor network decoders for quantum error correction codes.

This decoder constructs a tensor network representation of a quantum code using its parity check matrix, logical observables, and noise model. The tensor network is based on the Tanner graph of the code and can be contracted to compute the probability that a logical observable has flipped, given a syndrome.

The decoder supports both single-syndrome and batch decoding, and can run on CPU or GPU (using cuTensorNet if available).

The Tensor Network Decoder is a Python-only implementation and requires Python 3.11 or higher. C++ APIs are not available for this decoder.

Due to its additional dependencies, the Tensor Network Decoder requires installing CUDA-Q QEC with the optional pip extra: pip install cudaq-qec[tensor-network-decoder].

The Tensor Network Decoder has the same GPU support as the NVIDIA QLDPC Decoder. However, if you are using a V100 GPU (SM70), you will need to pin your cuTensor version to 2.2 by running pip install cutensor_cu12==2.2. Note that this GPU will no longer be supported by the Tensor Network Decoder once CUDA-Q 0.5.0 is released.

Note

It is recommended to create decoders using the cudaq_qec plugin API:

import cudaq_qec as qec
import numpy as np

# Example: [3,1] repetition code
H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)
logical_obs = np.array([[1, 1, 1]], dtype=np.uint8)
noise_model = [0.1, 0.1, 0.1]

decoder = qec.get_decoder("tensor_network_decoder", H, logical_obs=logical_obs, noise_model=noise_model)

syndrome = [0.0, 1.0]
result = decoder.decode(syndrome)

Tensor Network Structure

The tensor network constructed by this decoder is based on the Tanner graph of the code, extended with noise and logical observable tensors. The structure is illustrated below:

      open/output index < logical observable
          --------
             |
s1      s2   |     s3   < syndromes               : product of 2D vectors [1 , 1-2pi] (pi is the probability detector i flipped)
|       |    |     |                        ----|
c1      c2  l1     c3   < checks / logical      | : delta tensors
|     / |   | \    |                            |
H   H   H   H  H   H    < Hadamard matrices     | TANNER (bipartite) GRAPH
  \ |   |  /   |  /                             |
    e1  e2     e3       < errors                | : delta tensors
    |   |     /                            -----|
     \ /     /
    P(e1, e2, e3)       < noise / error model     : classical probability density

ci, ej, lk are delta tensors represented sparsely as indices.

Parameters:
  • H – Parity check matrix (numpy.ndarray), shape (num_checks, num_qubits)

  • logical_obs – Logical observable matrix (numpy.ndarray), shape (1, num_qubits)

  • noise_model – Noise model, either a list of probabilities (length = num_qubits) or a quimb.tensor.TensorNetwork

  • check_inds – (optional) List of check index names

  • error_inds – (optional) List of error index names

  • logical_inds – (optional) List of logical index names

  • logical_tags – (optional) List of logical tags

  • contract_noise_model – (bool, optional) Whether to contract the noise model at initialization (default: True)

  • dtype – (str, optional) Data type for tensors (default: “float32”)

  • device – (str, optional) Device for tensor operations (“cpu”, “cuda”, or “cuda:X”, default: “cuda”)

Methods

decode(syndrome)

Decode a single syndrome by contracting the tensor network.

Parameters:

syndrome – List of float values (soft-decision probabilities) for each check.

Returns:

DecoderResult with the probability that the logical observable flipped.

decode_batch(syndrome_batch)

Decode a batch of syndromes.

Parameters:

syndrome_batch – numpy.ndarray of shape (batch_size, num_checks)

Returns:

List of DecoderResult objects with the probability that the logical observable has flipped for each syndrome.

optimize_path(optimize=None, batch_size=-1)

Optimize the contraction path for the tensor network.

Parameters:
  • optimize – Optimization options or None

  • batch_size – (int, optional) Batch size for optimization (default: -1, no batching)

Returns:

Optimizer info object
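
The methods above can be combined as in the following sketch, which reuses the repetition-code decoder from the example above; the batch_size choice and the interpretation of the result field as the flip probability are illustrative assumptions.

import numpy as np

# Sketch: optimize the contraction path for batched contraction, then decode
# a batch of soft syndromes with the repetition-code decoder from above.
syndrome_batch = np.array([[0.0, 1.0],
                           [1.0, 1.0],
                           [0.0, 0.0],
                           [1.0, 0.0]])               # shape (batch_size, num_checks)

decoder.optimize_path(optimize=None, batch_size=syndrome_batch.shape[0])
results = decoder.decode_batch(syndrome_batch)
for i, res in enumerate(results):
    print(f"syndrome {i}: P(logical flip) = {res.result}")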

Real-Time Decoding

The Real-Time Decoding API enables low-latency error correction on quantum hardware by allowing CUDA-Q quantum kernels to interact with decoders during circuit execution. This API is designed for use cases where corrections must be calculated and applied within qubit coherence times.

The real-time decoding system supports both simulation environments for local testing and hardware integration (e.g., on Quantinuum’s Helios QPU).

Core Decoding Functions

These functions can be called from within CUDA-Q quantum kernels (@cudaq.kernel decorated functions) to interact with real-time decoders.

cudaq_qec.qec.enqueue_syndromes(decoder_id, syndromes, tag=0)

Enqueue syndrome measurements for decoding.

Parameters:
  • decoder_id – Unique identifier for the decoder instance (matches configured decoder ID)

  • syndromes – List of syndrome measurement results from stabilizer measurements

  • tag – Optional tag for logging and debugging (default: 0)

Example:

import cudaq
import cudaq_qec as qec
from cudaq_qec import patch

@cudaq.kernel
def measure_and_decode(logical: patch, decoder_id: int):
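    # measure_stabilizers is assumed to be a user-defined kernel that returns the stabilizer measurement results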
    syndromes = measure_stabilizers(logical)
    qec.enqueue_syndromes(decoder_id, syndromes, 0)

cudaq_qec.qec.get_corrections(decoder_id, return_size, reset=False)

Retrieve calculated corrections from the decoder.

Parameters:
  • decoder_id – Unique identifier for the decoder instance

  • return_size – Number of correction bits to return (typically equals number of logical observables)

  • reset – Whether to reset accumulated corrections after retrieval (default: False)

Returns:

List of boolean values indicating detected bit flips for each logical observable

Example:

@cudaq.kernel
def apply_corrections(logical: patch, decoder_id: int):
    corrections = qec.get_corrections(decoder_id, 1, False)
    if corrections[0]:
        x(logical.data)  # Apply transversal X correction

cudaq_qec.qec.reset_decoder(decoder_id)

Reset decoder state, clearing all queued syndromes and accumulated corrections.

Parameters:

decoder_id – Unique identifier for the decoder instance to reset

Example:

@cudaq.kernel
def run_experiment(decoder_id: int):
    qec.reset_decoder(decoder_id)  # Reset at start of each shot
    # ... perform experiment ...

Configuration API

The configuration API enables setting up decoders before circuit execution. Decoders are configured using YAML files or programmatically constructed configuration objects.

cudaq_qec.configure_decoders(config)

Configure decoders from a multi_decoder_config object.

Parameters:

config – multi_decoder_config object containing decoder specifications

Returns:

0 on success, non-zero error code on failure

cudaq_qec.configure_decoders_from_file(config_file)

Configure decoders from a YAML file.

Parameters:

config_file – Path to YAML configuration file

Returns:

0 on success, non-zero error code on failure

cudaq_qec.configure_decoders_from_str(config_str)

Configure decoders from a YAML string.

Parameters:

config_str – YAML configuration as a string

Returns:

0 on success, non-zero error code on failure

cudaq_qec.finalize_decoders()

Finalize and clean up decoder resources. Should be called before program exit.
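
A typical host-side lifecycle, sketched below, is to configure decoders before launching kernels and finalize them afterwards; the file name decoders.yaml is a placeholder for a valid configuration file.

import cudaq_qec as qec

# Sketch of the configuration lifecycle ("decoders.yaml" is a placeholder path).
status = qec.configure_decoders_from_file("decoders.yaml")
if status != 0:
    raise RuntimeError(f"Decoder configuration failed with error code {status}")

# ... launch CUDA-Q kernels that call qec.enqueue_syndromes / qec.get_corrections ...

qec.finalize_decoders()  # release decoder resources before program exit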

Helper Functions

Real-time decoding requires converting matrices to sparse format for efficient decoder configuration. The following utility functions are essential:

cudaq_qec.pcm_to_sparse_vec(pcm)

Convert a parity check matrix (PCM) to sparse vector representation for decoder configuration.

Parameters:

pcm – Dense binary matrix as numpy array (e.g., dem.detector_error_matrix or dem.observables_flips_matrix)

Returns:

Sparse vector (list of integers) where -1 separates rows

Usage in real-time decoding:

config.H_sparse = qec.pcm_to_sparse_vec(dem.detector_error_matrix)
config.O_sparse = qec.pcm_to_sparse_vec(dem.observables_flips_matrix)

cudaq_qec.pcm_from_sparse_vec(sparse_vec, num_rows, num_cols)

Convert sparse vector representation back to a dense parity check matrix.

Parameters:
  • sparse_vec – Sparse representation (from YAML or decoder config)

  • num_rows – Number of rows in the output matrix

  • num_cols – Number of columns in the output matrix

Returns:

Dense binary matrix as numpy array
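
As a quick sanity check, sketched here from the documented semantics of the sparse format, converting a small dense PCM to the sparse vector form and back should reproduce the original matrix:

import numpy as np
import cudaq_qec as qec

H = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=np.uint8)

sparse = qec.pcm_to_sparse_vec(H)      # list of integers with -1 separating rows
H_dense = qec.pcm_from_sparse_vec(sparse, H.shape[0], H.shape[1])
assert np.array_equal(H, H_dense)      # round trip recovers the dense PCM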

cudaq_qec.generate_timelike_sparse_detector_matrix(num_syndromes_per_round, num_rounds, include_first_round)

Generate the D_sparse matrix that encodes how detectors relate across syndrome measurement rounds.

Parameters:
  • num_syndromes_per_round – Number of syndrome measurements per round (typically code distance squared)

  • num_rounds – Total number of syndrome measurement rounds

  • include_first_round – Boolean (False for standard memory experiments) or list for custom first round

Returns:

Sparse matrix encoding detector relationships

Usage in real-time decoding:

config.D_sparse = qec.generate_timelike_sparse_detector_matrix(
    numSyndromesPerRound, numRounds, False)

See also Parity Check Matrix Utilities for additional PCM manipulation functions.

Common

Parity Check Matrix Utilities