Real-Time Decoding

Real-time decoding enables CUDA-Q QEC decoders to operate in low-latency, online environments where decoders run concurrently with quantum computations. This capability is essential for quantum error correction on real quantum hardware, where corrections must be calculated and applied within qubit coherence times.

The real-time decoding framework supports two primary deployment scenarios:

Hardware Integration: Decoders running on classical computers connected to real quantum processing units (QPUs) via low-latency networks
Simulation Mode: Decoders operating in simulated environments for testing and development on local systems

Workflow Overview

Real-time decoding integrates seamlessly into quantum error correction pipelines through a carefully designed four-stage workflow. This workflow separates the computationally intensive characterization phase from the latency-critical runtime phase, ensuring that decoders can operate efficiently during quantum circuit execution.

The workflow consists of four stages:

Detector Error Model (DEM) Generation: Before running a quantum program, the user first characterizes how errors propagate through the quantum circuit. The library internally uses Memory Syndrome Matrix (MSM) representations to track error propagation, but this complexity is abstracted through helper functions like z_dem_from_memory_circuit. The user simply provides a quantum code, noise model, and circuit parameters, and receives a complete detector error model that maps error mechanisms to syndrome patterns. This step is performed once during development.
Decoder Configuration and Saving: Using the DEM, the user configures decoder instances with the specific error model data. This includes converting parity check matrices to sparse format, setting decoder-specific parameters (like lookup table depth or BP iterations), and assigning unique IDs to each logical qubit’s decoder. The configuration is then saved to a YAML file, capturing all the information decoders need to interpret syndrome measurements correctly. This creates a portable, reusable configuration that separates characterization from execution.
Decoder Loading and Initialization: Just before circuit execution, the user loads the saved YAML configuration file. The library parses the configuration, instantiates the appropriate decoder implementations, initializes internal data structures, and registers the decoders with the CUDA-Q runtime. For GPU-based decoders, matrices are transferred to device memory; for lookup table decoders, syndrome-to-correction mappings are constructed. This initialization takes milliseconds to seconds depending on code size and happens before quantum operations begin.
Real-Time Decoding: During quantum circuit execution, the decoding API is used within quantum kernels to interact with decoders. As the circuit measures stabilizers, syndromes are enqueued to the decoder, which processes them concurrently. When corrections are needed, the decoder is queried and the suggested operations are applied to the logical qubits. This entire process happens within the coherence time constraints of the quantum hardware.

Real-Time Decoding Example

Here are two examples demonstrating real-time decoding in Python and C++:

Python

#!/usr/bin/env python3
"""
Simple 3-qubit repetition code with real-time decoding.
This is the most basic QEC example possible.
"""

import os

os.environ["CUDAQ_DEFAULT_SIMULATOR"] = "stim"

import cudaq
import cudaq_qec as qec


# Prepare logical |0⟩
@cudaq.kernel
def prep0(logical: qec.patch):
    for i in range(logical.data.size()):
        reset(logical.data[i])


# Measure ZZ stabilizers for 3-qubit repetition code
@cudaq.kernel
def measure_stabilizers(logical: qec.patch) -> list[bool]:
    for i in range(logical.ancz.size()):
        reset(logical.ancz[i])

    # Z0Z1 stabilizer
    cx(logical.data[0], logical.ancz[0])
    cx(logical.data[1], logical.ancz[0])

    # Z1Z2 stabilizer
    cx(logical.data[1], logical.ancz[1])
    cx(logical.data[2], logical.ancz[1])

    return [mz(logical.ancz[0]), mz(logical.ancz[1])]


# [Begin QEC Circuit]
# QEC circuit with real-time decoding
@cudaq.kernel
def qec_circuit() -> list[bool]:
    qec.reset_decoder(0)

    data = cudaq.qvector(3)
    ancz = cudaq.qvector(2)
    ancx = cudaq.qvector(0)
    logical = patch(data, ancx, ancz)

    prep0(logical)

    # 3 rounds of syndrome measurement
    for _ in range(3):
        syndromes = measure_stabilizers(logical)
        qec.enqueue_syndromes(0, syndromes, 0)

    # Get corrections and apply them
    corrections = qec.get_corrections(0, 3, False)
    for i in range(3):
        if corrections[i]:
            x(data[i])

    return mz(data)


# [End QEC Circuit]


def main():
    # Get 3-qubit repetition code
    code = qec.get_code("repetition", distance=3)

    # [Begin DEM Generation]
    # Step 1: Generate detector error model
    print("Step 1: Generating DEM...")
    cudaq.set_target("stim")

    noise = cudaq.NoiseModel()
    noise.add_all_qubit_channel("x", cudaq.Depolarization2(0.01), 1)

    dem = qec.z_dem_from_memory_circuit(code, qec.operation.prep0, 3, noise)
    # [End DEM Generation]

    # [Begin Save DEM]
    # Save decoder config
    config = qec.decoder_config()
    config.id = 0
    config.type = "multi_error_lut"
    config.block_size = dem.detector_error_matrix.shape[1]
    config.syndrome_size = dem.detector_error_matrix.shape[0]
    config.H_sparse = qec.pcm_to_sparse_vec(dem.detector_error_matrix)
    config.O_sparse = qec.pcm_to_sparse_vec(dem.observables_flips_matrix)

    # Calculate numRounds from DEM (we send 1 additional round, so add 1)
    num_syndromes_per_round = 2  # Z0Z1 and Z1Z2
    num_rounds = dem.detector_error_matrix.shape[
        0] // num_syndromes_per_round + 1
    config.D_sparse = qec.generate_timelike_sparse_detector_matrix(
        num_syndromes_per_round, num_rounds, False)
    lut_config = qec.multi_error_lut_config()
    lut_config.lut_error_depth = 2
    config.set_decoder_custom_args(lut_config)

    multi_config = qec.multi_decoder_config()
    multi_config.decoders = [config]

    with open("config.yaml", 'w') as f:
        f.write(multi_config.to_yaml_str(200))
    print("Saved config to config.yaml")
    # [End Save DEM]

    # Step 2: Load config and run circuit
    print("\nStep 2: Running circuit with decoding...")
    # [Begin Load DEM]
    qec.configure_decoders_from_file("config.yaml")
    # [End Load DEM]

    run_result = cudaq.run(qec_circuit, shots_count=10)
    print("Ran 10 shots")

    qec.finalize_decoders()

    print("\nDone!")


if __name__ == "__main__":
    main()
# [End Documentation]

C++

// Simple 3-qubit repetition code with real-time decoding
// This is the most basic QEC example possible

#include "cudaq.h"
#include "cudaq/qec/code.h"
#include "cudaq/qec/experiments.h"
#include "cudaq/qec/pcm_utils.h"
#include "cudaq/qec/realtime/decoding.h"
#include "cudaq/qec/realtime/decoding_config.h"
#include <common/NoiseModel.h>
#include <fstream>

// [Begin Save DEM]
// Save decoder configuration to YAML file
void save_dem(const cudaq::qec::detector_error_model &dem,
              const std::string &filename) {
  // Create decoder config
  cudaq::qec::decoding::config::decoder_config config;
  config.id = 0;
  config.type = "multi_error_lut";
  config.block_size = dem.num_error_mechanisms();
  config.syndrome_size = dem.num_detectors();
  config.H_sparse = cudaq::qec::pcm_to_sparse_vec(dem.detector_error_matrix);
  config.O_sparse = cudaq::qec::pcm_to_sparse_vec(dem.observables_flips_matrix);

  // Calculate numRounds from DEM (we send 1 additional round, so add 1)
  uint64_t numSyndromesPerRound = 2; // Z0Z1 and Z1Z2
  auto numRounds = dem.num_detectors() / numSyndromesPerRound + 1;
  config.D_sparse = cudaq::qec::generate_timelike_sparse_detector_matrix(
      numSyndromesPerRound, numRounds, false);

  cudaq::qec::decoding::config::multi_error_lut_config lut_config;
  lut_config.lut_error_depth = 2;
  config.decoder_custom_args = lut_config;

  cudaq::qec::decoding::config::multi_decoder_config multi_config;
  multi_config.decoders.push_back(config);

  std::ofstream file(filename);
  file << multi_config.to_yaml_str(200);
  file.close();
  printf("Saved config to %s\n", filename.c_str());
}
// [End Save DEM]

// [Begin Load DEM]
// Load decoder configuration from YAML file
void load_dem(const std::string &filename) {
  std::ifstream file(filename);
  std::string yaml((std::istreambuf_iterator<char>(file)),
                   std::istreambuf_iterator<char>());
  auto config =
      cudaq::qec::decoding::config::multi_decoder_config::from_yaml_str(yaml);
  cudaq::qec::decoding::config::configure_decoders(config);
  printf("Loaded config from %s\n", filename.c_str());
}
// [End Load DEM]

// Prepare logical |0⟩
__qpu__ void prep0(cudaq::qec::patch logical) {
  for (std::size_t i = 0; i < logical.data.size(); ++i) {
    cudaq::reset(logical.data[i]);
  }
}

// Measure ZZ stabilizers for 3-qubit repetition code
__qpu__ std::vector<cudaq::measure_result>
measure_stabilizers(cudaq::qec::patch logical) {
  for (std::size_t i = 0; i < logical.ancz.size(); ++i) {
    cudaq::reset(logical.ancz[i]);
  }

  // Z0Z1 stabilizer
  cudaq::x<cudaq::ctrl>(logical.data[0], logical.ancz[0]);
  cudaq::x<cudaq::ctrl>(logical.data[1], logical.ancz[0]);

  // Z1Z2 stabilizer
  cudaq::x<cudaq::ctrl>(logical.data[1], logical.ancz[1]);
  cudaq::x<cudaq::ctrl>(logical.data[2], logical.ancz[1]);

  return {mz(logical.ancz[0]), mz(logical.ancz[1])};
}

// [Begin QEC Circuit]
// QEC circuit with real-time decoding
__qpu__ int64_t qec_circuit() {
  cudaq::qec::decoding::reset_decoder(0);

  cudaq::qvector data(3);
  cudaq::qvector ancz(2);
  cudaq::qvector ancx; // Empty for repetition code
  cudaq::qec::patch logical(data, ancx, ancz);

  prep0(logical);

  // 3 rounds of syndrome measurement
  for (int round = 0; round < 3; ++round) {
    auto syndromes = measure_stabilizers(logical);
    cudaq::qec::decoding::enqueue_syndromes(0, syndromes);
  }

  // Get corrections and apply them
  auto corrections = cudaq::qec::decoding::get_corrections(0, 3);
  for (std::size_t i = 0; i < 3; ++i) {
    if (corrections[i])
      cudaq::x(data[i]);
  }

  return cudaq::to_integer(mz(data));
}
// [End QEC Circuit]

int main() {
  auto code = cudaq::qec::get_code("repetition",
                                   cudaqx::heterogeneous_map{{"distance", 3}});

  // [Begin DEM Generation]
  // Step 1: Generate detector error model
  printf("Step 1: Generating DEM...\n");
  cudaq::noise_model noise;
  noise.add_all_qubit_channel("x", cudaq::depolarization2(0.01), 1);

  auto dem = cudaq::qec::z_dem_from_memory_circuit(
      *code, cudaq::qec::operation::prep0, 3, noise);
  // [End DEM Generation]

  save_dem(dem, "config.yaml");

  // Step 2: Load config and run circuit
  printf("\nStep 2: Running circuit with decoding...\n");
  load_dem("config.yaml");

  cudaq::run(10, qec_circuit);
  printf("Ran 10 shots\n");

  cudaq::qec::decoding::config::finalize_decoders();

  printf("\nDone!\n");
  return 0;
}
// [End Documentation]

The examples above showcase the main components of the real-time decoding workflow:

Decoder configuration file: Initializes and configures the decoders before circuit execution.
Quantum kernel: Uses the real-time decoding API to interact with the decoders, primarily through reset_decoder, enqueue_syndromes, and get_corrections.
Syndrome extraction: Measures the stabilizers of the logical qubits.
Correction application: Applies the corrections to the logical qubits.
Logical observable measurement: Measures the logical observables of the logical qubits.
Decoder finalization: Frees up resources after circuit execution.

The API is designed to be called from within quantum kernels (marked with @cudaq.kernel in Python or __qpu__ in C++). The runtime automatically routes these calls to the appropriate backend—whether a simulation environment on the local machine or a low-latency connection to quantum hardware. The API is device-agnostic, so the same kernel code works across different deployment scenarios.

The user is required to provide a configuration file or generate one if it is not present. The generation process depends on the decoder type and the detector error model studied in other sections of the documentation. Moreover, the user must write an appropriate kernel that describes the correct syndrome extraction and correction application logic.

The next section provides instructions to generate a configuration file, write a quantum kernel, and compile and run the examples correctly.

Configuration

The configuration process transforms a quantum circuit’s error characteristics into a format that decoders can efficiently process. This section walks through each step in detail, showing how to go from circuit simulation to a fully configured real-time decoder.

Step 1: Generate Detector Error Model

The first step is to characterize the quantum circuit’s behavior under noise. A detector error model (DEM) captures the relationship between physical errors and the syndrome patterns they produce. This characterization is circuit-specific and depends on the code structure, noise model, and measurement schedule.

Under the hood, the CUDA-Q QEC library uses the Memory Syndrome Matrix (MSM) representation to efficiently encode error propagation information. The MSM captures all possible error chains and their syndrome signatures, tracking how errors propagate through the circuit over time. However, this complexity is abstracted away from the user through convenient helper functions.

The library provides a family of dem_from_memory_circuit functions that automatically handle the MSM generation and processing:

z_dem_from_memory_circuit: For circuits measuring Z-basis stabilizers (used in the example below)
x_dem_from_memory_circuit: For circuits measuring X-basis stabilizers
dem_from_memory_circuit: General-purpose function for arbitrary stabilizer measurements

These functions take a quantum code, an initial state preparation operation, the number of measurement rounds, and a noise model, then return a complete detector error model ready for decoder configuration. The user simply needs to configure the noise model and specify the circuit structure—the library handles all the error tracking and matrix construction automatically.

Here is how to generate a DEM for a circuit:

Python

    # Step 1: Generate detector error model
    print("Step 1: Generating DEM...")
    cudaq.set_target("stim")

    noise = cudaq.NoiseModel()
    noise.add_all_qubit_channel("x", cudaq.Depolarization2(0.01), 1)

    dem = qec.z_dem_from_memory_circuit(code, qec.operation.prep0, 3, noise)

C++

  // Step 1: Generate detector error model
  printf("Step 1: Generating DEM...\n");
  cudaq::noise_model noise;
  noise.add_all_qubit_channel("x", cudaq::depolarization2(0.01), 1);

  auto dem = cudaq::qec::z_dem_from_memory_circuit(
      *code, cudaq::qec::operation::prep0, 3, noise);

Step 2: Configure and Save Decoder

Once a DEM has been generated, the next step is to package this information into a decoder configuration and save it to a YAML file. The configuration structure holds all the parameters a decoder needs: the parity check matrix (H_sparse), the observable flip matrix (O_sparse), the detector error matrix (D_sparse), and decoder-specific tuning parameters.

These matrices are generated in sparse matrix format, which is crucial for performance. They can be large considering error correcting codes with large number of physical qubits, and moreover, real-time decoders process thousands of syndrome measurements per second, and take decision based on these matrices, so compact representations are essential. The helper function pcm_to_sparse_vec is used to convert the dense binary matrices into a space-efficient format where -1 marks row boundaries and integers represent column indices of non-zero elements.

Each decoder type has its own configuration structure with specific parameters. For lookup table decoders, the user specifies how many simultaneous errors to consider. For belief propagation decoders, the user sets iteration limits and convergence criteria. The configuration API provides type-safe structures for each decoder, ensuring that all required parameters are included.

The configuration is then saved to a YAML file for reuse. The YAML format is human-readable, making it easy to inspect, modify, and share configurations across different execution environments.

Here is how to create and save a decoder configuration:

Python

    # Save decoder config
    config = qec.decoder_config()
    config.id = 0
    config.type = "multi_error_lut"
    config.block_size = dem.detector_error_matrix.shape[1]
    config.syndrome_size = dem.detector_error_matrix.shape[0]
    config.H_sparse = qec.pcm_to_sparse_vec(dem.detector_error_matrix)
    config.O_sparse = qec.pcm_to_sparse_vec(dem.observables_flips_matrix)

    # Calculate numRounds from DEM (we send 1 additional round, so add 1)
    num_syndromes_per_round = 2  # Z0Z1 and Z1Z2
    num_rounds = dem.detector_error_matrix.shape[
        0] // num_syndromes_per_round + 1
    config.D_sparse = qec.generate_timelike_sparse_detector_matrix(
        num_syndromes_per_round, num_rounds, False)
    lut_config = qec.multi_error_lut_config()
    lut_config.lut_error_depth = 2
    config.set_decoder_custom_args(lut_config)

    multi_config = qec.multi_decoder_config()
    multi_config.decoders = [config]

    with open("config.yaml", 'w') as f:
        f.write(multi_config.to_yaml_str(200))
    print("Saved config to config.yaml")

C++

// Save decoder configuration to YAML file
void save_dem(const cudaq::qec::detector_error_model &dem,
              const std::string &filename) {
  // Create decoder config
  cudaq::qec::decoding::config::decoder_config config;
  config.id = 0;
  config.type = "multi_error_lut";
  config.block_size = dem.num_error_mechanisms();
  config.syndrome_size = dem.num_detectors();
  config.H_sparse = cudaq::qec::pcm_to_sparse_vec(dem.detector_error_matrix);
  config.O_sparse = cudaq::qec::pcm_to_sparse_vec(dem.observables_flips_matrix);

  // Calculate numRounds from DEM (we send 1 additional round, so add 1)
  uint64_t numSyndromesPerRound = 2; // Z0Z1 and Z1Z2
  auto numRounds = dem.num_detectors() / numSyndromesPerRound + 1;
  config.D_sparse = cudaq::qec::generate_timelike_sparse_detector_matrix(
      numSyndromesPerRound, numRounds, false);

  cudaq::qec::decoding::config::multi_error_lut_config lut_config;
  lut_config.lut_error_depth = 2;
  config.decoder_custom_args = lut_config;

  cudaq::qec::decoding::config::multi_decoder_config multi_config;
  multi_config.decoders.push_back(config);

  std::ofstream file(filename);
  file << multi_config.to_yaml_str(200);
  file.close();
  printf("Saved config to %s\n", filename.c_str());
}

Step 3: Load Configuration

Before running quantum circuits with real-time decoding, the saved decoder configuration must be loaded and initialized. This step bridges the gap between the offline characterization phase (Steps 1-2) and the online execution phase (Step 4), preparing the decoder instances for real-time operation.

The configuration loading process performs several important operations:

YAML Parsing: The configuration file is parsed and validated to ensure all required fields are present and properly formatted. This includes checking matrix dimensions, decoder parameters, and metadata.
Decoder Instantiation: Based on the decoder type specified in the configuration (e.g., multi_error_lut, nv-qldpc-decoder), the appropriate decoder implementation is instantiated and allocated resources on the GPU or CPU.
Matrix Initialization: The sparse matrices (H_sparse, O_sparse, D_sparse) are loaded into the decoder’s internal data structures. For GPU-based decoders, this includes transferring data to device memory.
Decoder-Specific Initialization: Each decoder type performs its own preparation: lookup table decoders build syndrome-to-correction mappings, belief propagation decoders initialize message-passing structures, and sliding window decoders configure their buffering mechanisms.
Backend Registration: The decoder instances are registered with the CUDA-Q runtime so they can be accessed from quantum kernels using their unique IDs.

This initialization happens quickly, typically only a few milliseconds for small codes and up to a few seconds for large distance codes with complex decoders. Since it occurs before quantum circuit execution, it does not impact the latency-critical decoding operations.

The separation of configuration from execution provides significant benefits: users can maintain a library of configurations for different code distances, noise levels, and decoder types, then simply load the appropriate one when running experiments. Configurations can be version-controlled alongside code, shared across research teams, and validated offline before deployment to quantum hardware.

Here is how to load a decoder configuration:

Python

    qec.configure_decoders_from_file("config.yaml")

C++

// Load decoder configuration from YAML file
void load_dem(const std::string &filename) {
  std::ifstream file(filename);
  std::string yaml((std::istreambuf_iterator<char>(file)),
                   std::istreambuf_iterator<char>());
  auto config =
      cudaq::qec::decoding::config::multi_decoder_config::from_yaml_str(yaml);
  cudaq::qec::decoding::config::configure_decoders(config);
  printf("Loaded config from %s\n", filename.c_str());
}

Step 4: Use in Quantum Kernels

With decoders configured and initialized, they can be used within quantum kernels. The real-time decoding API provides three key functions that integrate seamlessly with CUDA-Q’s quantum programming model: reset_decoder prepares a decoder for a new shot, enqueue_syndromes sends syndrome measurements to the decoder for processing, and get_corrections retrieves the decoder’s recommended corrections.

These functions are designed to be called from within quantum kernels (marked with @cudaq.kernel in Python or __qpu__ in C++). The runtime automatically routes these calls to the appropriate backend - whether that is a simulation environment on the local machine or a low-latency connection to quantum hardware. The API is device-agnostic, so the same kernel code works across different deployment scenarios.

The typical usage pattern is: reset the decoder at the start of each shot, enqueue syndromes after each stabilizer measurement round, then get corrections before measuring the logical observables. Decoders process syndromes asynchronously, so by the time get_corrections is called, the decoder has usually finished its analysis. If decoding takes longer than expected, get_corrections will block until results are available.

Note

While resetting the decoder at the beginning of each shot isn’t strictly required, it is strongly recommended to ensure that when running on a remote QPU, any potential errors encountered in one shot do not affect future shot results.

Here is how to use the real-time decoding API in quantum kernels:

Python

# QEC circuit with real-time decoding
@cudaq.kernel
def qec_circuit() -> list[bool]:
    qec.reset_decoder(0)

    data = cudaq.qvector(3)
    ancz = cudaq.qvector(2)
    ancx = cudaq.qvector(0)
    logical = patch(data, ancx, ancz)

    prep0(logical)

    # 3 rounds of syndrome measurement
    for _ in range(3):
        syndromes = measure_stabilizers(logical)
        qec.enqueue_syndromes(0, syndromes, 0)

    # Get corrections and apply them
    corrections = qec.get_corrections(0, 3, False)
    for i in range(3):
        if corrections[i]:
            x(data[i])

    return mz(data)

C++

// QEC circuit with real-time decoding
__qpu__ int64_t qec_circuit() {
  cudaq::qec::decoding::reset_decoder(0);

  cudaq::qvector data(3);
  cudaq::qvector ancz(2);
  cudaq::qvector ancx; // Empty for repetition code
  cudaq::qec::patch logical(data, ancx, ancz);

  prep0(logical);

  // 3 rounds of syndrome measurement
  for (int round = 0; round < 3; ++round) {
    auto syndromes = measure_stabilizers(logical);
    cudaq::qec::decoding::enqueue_syndromes(0, syndromes);
  }

  // Get corrections and apply them
  auto corrections = cudaq::qec::decoding::get_corrections(0, 3);
  for (std::size_t i = 0; i < 3; ++i) {
    if (corrections[i])
      cudaq::x(data[i]);
  }

  return cudaq::to_integer(mz(data));
}

Backend Selection

CUDA-Q QEC’s real-time decoding system is designed to work seamlessly across different execution environments. The backend selection determines where quantum circuits run and how decoders communicate with the quantum processor. Understanding the differences between simulation and hardware backends helps the user develop efficiently and deploy confidently.

Simulation Backend

The simulation backend is the primary tool during development, testing, and algorithm validation. It runs entirely on the local machine, using quantum simulators like Stim to execute circuits while decoders process syndromes and calculation corrections. This setup is ideal for rapid iteration: the user can test decoder configurations, validate circuit logic, and debug syndrome processing without waiting for hardware access or paying for compute time.

The simulation backend mimics real-time decoding’s concurrent operation by running the decoder(s) within the same process as the simulator. This means that other than GPU hardware differences between the local environment and the remote NVQLink decoders, the decoders behave the same way whether testing locally or running on a quantum computer. The main difference is that simulation does not have the same strict latency constraints, making it easier to experiment with complex decoder configurations.

Use the simulation backend for local development and testing:

Python

import cudaq
import cudaq_qec as qec

cudaq.set_target("stim")  # Or other simulator
qec.configure_decoders_from_file("config.yaml")

# Run circuit with noise model
results = cudaq.run(my_circuit, shots_count=100,
                   noise_model=cudaq.NoiseModel())

C++

# Compile with simulation support
nvq++ -std=c++20 my_circuit.cpp -lcudaq-qec \
      -lcudaq-qec-realtime-decoding \
      -lcudaq-qec-realtime-decoding-simulation

./a.out

Quantinuum Hardware Backend

The Quantinuum hardware backend connects quantum circuits to real ion-trap quantum computers. Unlike the simulation backend where decoders run on the local machine, the Quantinuum backend uploads the decoder configuration to Quantinuum’s infrastructure, where decoders run on GPU-equipped servers co-located with the quantum hardware. This architecture minimizes latency between syndrome measurements and correction application.

Important Setup Requirements:

Configuration Upload: When configure_decoders_from_file() or configure_decoders() is called, the decoder configuration is automatically base64-encoded and uploaded to Quantinuum’s REST API (api/gpu_decoder_configs/v1beta/). This happens before job submission. The configuration includes all decoder parameters, error models, and sparse matrices.
Extra Payload Provider: The user must specify extra_payload_provider="decoder" when setting the target. This registers a payload provider that injects the decoder configuration UUID into each job request, telling Quantinuum which decoder configuration to use for the circuit.
Backend Compilation: For C++, the user must link against -lcudaq-qec-realtime-decoding-quantinuum instead of the simulation library. This library implements the Quantinuum-specific communication protocol for syndrome transmission.
Configuration Lifetime: Decoder configurations persist on Quantinuum’s servers and are referenced by UUID. If the configuration is modified, it must be uploaded again - the system will generate a new UUID and use the new configuration for subsequent jobs.

Note: The real-time decoding interfaces are experimental, and subject to change. Real-time decoding on Quantinuum’s Helios-1 device is currently only available to partners and collaborators. Please email QCSupport@quantinuum.com for more information.

Emulation vs. Hardware Modes:

Emulation mode (emulate=True) is particularly valuable for testing the deployment setup without consuming hardware credits. Running with this flag performs a local, noise-free simulation without any actual submission to Quantinuum’s servers.

Use the Quantinuum backend for hardware or emulation:

Python

cudaq.set_target("quantinuum",
                 emulate=False,  # True for emulation
                 machine="Helios-1",
                 extra_payload_provider="decoder")

qec.configure_decoders_from_file("config.yaml")
results = cudaq.run(my_circuit, shots_count=100)

C++

# Compile for Quantinuum
nvq++ --target quantinuum --quantinuum-machine Helios-1 \
      my_circuit.cpp -lcudaq-qec \
      -lcudaq-qec-realtime-decoding \
      -lcudaq-qec-realtime-decoding-quantinuum

./a.out

Compilation and Execution Examples

This section provides complete, tested compilation and execution commands for both simulation and hardware backends, extracted from the CUDA-Q QEC test infrastructure. The section begins with common usage patterns that guide decoder and compilation choices, then provides the specific commands needed for each backend.

Common Use Cases

Before diving into compilation details, it is helpful to understand the typical scenarios and how they map to decoder choices and workflow parameters. A full set of common examples is provided to guide development. These examples describe the complete workflow for developing an application that uses real-time decoding in a single file. The relevant C++ and Python examples can be found at the following path: libs/qec/unittests/realtime/app_examples. The files have names like surface_code-1.cpp and surface_code_1.py. The rest of this section shows how to compile and run these 2 examples.

These examples provide comprehensive support for application development with real-time decoding. The subsequent step, once the user has chosen the appropriate decoder and the appropriate backend, is to compile and execute the application. Instructions are provided below for both the simulation and the hardware backends.

C++ Compilation

Simulation Backend (Stim)

Compile with the simulation backend for local testing:

nvq++ --target stim surface_code-1.cpp         \
      -lcudaq-qec                              \
      -lcudaq-qec-realtime-decoding            \
      -lcudaq-qec-realtime-decoding-simulation \
      -o surface_code-1

# Execute
./surface_code-1 --distance 3 --num_shots 1000 --save_dem config.yaml

Key Points:

--target stim: Use the Stim quantum simulator
-lcudaq-qec: Core QEC library with codes and experiments
-lcudaq-qec-realtime-decoding: Real-time decoding core API
-lcudaq-qec-realtime-decoding-simulation: Simulation-specific decoder backend

Quantinuum Backend (Hardware)

Compile for actual Quantinuum hardware:

nvq++ --target quantinuum                         \
      --quantinuum-machine Helios-1               \
      --quantinuum-extra-payload-provider decoder \
      surface_code-1.cpp                          \
      -lcudaq-qec                                 \
      -lcudaq-qec-realtime-decoding               \
      -lcudaq-qec-realtime-decoding-quantinuum    \
      -Wl,--export-dynamic                        \
      -o surface_code-1-quantinuum-hardware

# Execute
export CUDAQ_QUANTINUUM_CREDENTIALS=<credentials_file_path>
./surface_code-1-quantinuum-hardware --distance 3 --num_shots 100 --load_dem config.yaml

Key Points:

Use Quantinuum target names: Helios-1, Helios-1E, Helios-1SC, etc.
Currently only Helios-1 will run the GPU decoders. The Helios-1E emulator will not run the GPU decoders.
Set CUDAQ_QUANTINUUM_CREDENTIALS environment variable with the user’s credentials. Check out the Quantinuum hardware backend documentation for more information.

Emulated Quantinuum Compilation Workflow

Compile for Quantinuum emulation mode:

nvq++ --target quantinuum --emulate            \
      --quantinuum-machine Helios-Fake         \
      surface_code-1.cpp                       \
      -lcudaq-qec                              \
      -lcudaq-qec-realtime-decoding            \
      -lcudaq-qec-realtime-decoding-quantinuum \
      -Wl,--export-dynamic                     \
      -o surface_code-1-quantinuum-emulate

# Execute
./surface_code-1-quantinuum-emulate --distance 3 --num_shots 1000 --load_dem config.yaml

Key Points:

--target quantinuum --emulate: Emulate Quantinuum compilation path
--quantinuum-machine Helios-Fake: Specify machine (Helios-Fake for emulation)
-lcudaq-qec-realtime-decoding-quantinuum: Quantinuum-specific decoder backend (replaces -simulation)
-Wl,--export-dynamic: Required linker flag for dynamic symbol resolution

Note

When running with --emulate, there is no noise being applied because there is currently no way to express noise in target-specific QIR. Therefore, when running with emulation, users will see noise-free sample data.

Python Execution

Simulation Backend (Stim)

# Generate a decoder configuration file
python3 surface_code-1.py --distance 3 --save_dem config.yaml
# Run the circuit with the decoder configuration
python3 surface_code-1.py --distance 3 --load_dem config.yaml --num_shots 1000

Quantinuum Backend (Hardware)

python3 surface_code-1.py --distance 3 --load_dem config.yaml --num_shots 1000 --target quantinuum --machine-name Helios-1

Key Points:

Use real machine names (check Quantinuum portal for available machines)
Reduce shot count for hardware experiments (hardware time is expensive)

Emulated Quantinuum Compilation Workflow

python3 surface_code-1.py --distance 3 --load_dem config.yaml --num_shots 1000 --target quantinuum --emulate

Key Points:

emulate=True: Emulate Quantinuum compilation path
Decoder config is automatically uploaded to Quantinuum’s servers when cudaq_qec.configure_decoders_from_file() (Python) or cudaq::qec::decoding::config::configure_decoders_from_file() (C++) is called

Complete Workflow Example

Given that the user follows the structure of the examples provided, where each executable takes terminal arguments to configure the application, the following workflow can be used to compile and execute the application.

# Phase 1: Generate Detector Error Model (DEM)
# This is done once per code/distance/noise configuration

## C++
./surface_code-1 --distance 3 --num_shots 1000 --p_spam 0.01 \
                 --save_dem config_d3.yaml --num_rounds 12 --decoder_window 6

## Python
python surface_code-1.py --distance 3 --num_shots 1000 --p_spam 0.01 \
                         --save_dem config_d3.yaml --num_rounds 12 --decoder_window 6

# Phase 2: Run with Real-Time Decoding
# Use the saved DEM configuration

## Simulation
./surface_code-1 --distance 3 --num_shots 1000 --load_dem config_d3.yaml \
                 --num_rounds 12 --decoder_window 6

## Quantinuum Emulation
./surface_code-1-quantinuum-emulate --distance 3 --num_shots 1000 --load_dem config_d3.yaml \
                        --num_rounds 12 --decoder_window 6

## Quantinuum Hardware
export CUDAQ_QUANTINUUM_CREDENTIALS=credentials.json
./surface_code-1-quantinuum-hardware --distance 3 --num_shots 100 --load_dem config_d3.yaml \
                      --num_rounds 12 --decoder_window 6

Application Parameters:

--distance: Code distance (3, 5, 7, etc.)
--num_shots: Number of circuit repetitions
--p_spam: Physical error rate for noise model (DEM generation only)
--save_dem: Generate and save DEM configuration to file
--load_dem: Load existing DEM configuration from file
--num_rounds: Total number of syndrome measurement rounds
--decoder_window: Number of rounds processed per decoding window

Debugging and Environment Variables

Useful Environment Variables:

# Enable decoder configuration debugging
export CUDAQ_QEC_DEBUG_DECODER=1

# Set default simulator
export CUDAQ_DEFAULT_SIMULATOR=stim

# Dump JIT IR for debugging compilation issues
export CUDAQ_DUMP_JIT_IR=1

# Set Quantinuum credentials file
export CUDAQ_QUANTINUUM_CREDENTIALS=/path/to/credentials.json

The variables can be set in the user’s environment or in a script. They are valid both for python and C++ applications, however, they must be set before importing the cudaq or cudaq_qec libraries.

Common Compilation Issues:

Missing libraries: Ensure all -lcudaq-qec-* libraries are linked
Wrong backend library: Use -simulation for Stim, -quantinuum for Quantinuum
Missing --export-dynamic flag: Required for Quantinuum targets
Wrong target flags: --emulate with Helios-Fake for emulation, remove for hardware

Common Runtime Issues:

“Decoder X not found”: Call configure_decoders_from_file() before circuit execution
“Configuration upload failed”: Check network connectivity and Quantinuum credentials
Dimension mismatch errors: Verify DEM dimensions match the circuit’s syndrome count
High error rates: Check decoder window size matches DEM generation window

Decoder Selection

The page CUDA-Q QEC Decoders provides information about which decoders are compatible with real-time decoding.

Troubleshooting

Even with careful configuration, issues may be encountered during real-time decoding. This section covers the most common problems and their solutions, organized by symptom. When troubleshooting, start by isolating whether the issue is in DEM generation, decoder configuration, or runtime execution.

Configuration Upload Failures (Quantinuum Backend)

When using the Quantinuum backend, the decoder configuration must be uploaded to their REST API before job submission. Upload failures prevent the quantum program from running and can be difficult to diagnose without knowing what to look for.

Possible Issues:

Network connectivity problems: Connection to Quantinuum’s servers is interrupted or unstable
Configuration too large: Decoder configuration exceeds Quantinuum’s upload size limits (typically happens with large distance codes and lookup tables)
Invalid credentials: API authentication fails due to expired or incorrect credentials
Malformed configuration: YAML structure is invalid or contains unsupported parameters

Solutions:

Enable debug logging: Set CUDAQ_QEC_DEBUG_DECODER=1 environment variable to see the exact configuration being uploaded and any error messages from the REST API
Check network: Verify that Quantinuum’s API endpoints can be reached before running the program. Test with a simple job submission first.
Reduce configuration size: If uploads fail due to size, switch from lookup table decoders to QLDPC (much more compact), or use sliding window with smaller windows
Validate YAML locally: Before uploading, test that multi_decoder_config::from_yaml_str() can parse the configuration file without errors
Check credentials: Ensure the Quantinuum API credentials are valid and have not expired. Refresh tokens if necessary.
Test with emulation: Try emulate=True first - emulation uses the same upload infrastructure but provides faster feedback if there are configuration issues

Verification:

After fixing configuration issues, the following log messages should appear:

[info] Initializing realtime decoding library with config file: config.yaml
[info] Initializing decoders...
[info] Creating decoder 0 of type multi_error_lut
[info] Done initializing decoder 0 in 0.234 seconds

If errors appear instead, check the full error message - it often contains specific details about what failed (network timeout, size limit, parsing error, etc.).

Real-Time Decoding

Workflow Overview

Real-Time Decoding Example

Configuration

Step 1: Generate Detector Error Model

Step 2: Configure and Save Decoder

Step 3: Load Configuration

Step 4: Use in Quantum Kernels

Backend Selection

Simulation Backend

Quantinuum Hardware Backend

Compilation and Execution Examples

Common Use Cases

C++ Compilation

Python Execution

Complete Workflow Example

Debugging and Environment Variables

Decoder Selection

Troubleshooting

Configuration Upload Failures (Quantinuum Backend)

See Also