CUDA-Q Realtime Messaging Protocol

This document defines the RPC (Remote Procedure Call) payload encoding that the realtime dispatch kernel uses to receive input data and return results. It complements the CUDA-Q Realtime Host API documentation, which focuses on wiring and API usage.

Scope

  • RPC header/response wire format

  • PTP timestamp propagation for latency measurement

  • Payload encoding and type system

  • Schema contract and payload interpretation

  • Function dispatch semantics

Note: This protocol is hardware-agnostic. Although the CUDA-Q Realtime Host API documentation covers implementation details for both GPU- and CPU-based dispatchers, the wire format and encoding rules specified here apply to both.

RPC Header / Response

Each ring-buffer slot is interpreted as:

| RPCHeader | payload bytes (arg_len) | unused padding (slot_size - header - payload) |
struct RPCHeader {
  uint32_t magic;          // RPC_MAGIC_REQUEST
  uint32_t function_id;    // fnv1a_hash("handler_name")
  uint32_t arg_len;        // payload bytes following this header
  uint32_t request_id;     // caller-assigned ID, echoed in the response
  uint64_t ptp_timestamp;  // PTP send timestamp (set by sender; 0 if unused)
};

struct RPCResponse {
  uint32_t magic;          // RPC_MAGIC_RESPONSE
  int32_t  status;         // 0 = success
  uint32_t result_len;     // bytes of response payload
  uint32_t request_id;     // echoed from RPCHeader::request_id
  uint64_t ptp_timestamp;  // echoed from RPCHeader::ptp_timestamp
};

Both structs are 24 bytes with no padding: four 32-bit fields (offsets 0-15) followed by the 64-bit ptp_timestamp at offset 16.
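As a minimal sketch of the layout claim above, the following C++ restates both structs and has the compiler check their sizes and field offsets with static_assert. It assumes a C++11 compiler. Copying a slot's bytes directly into these structs additionally assumes a little-endian host, because the wire format is little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same field order and types as the protocol structs. With these types,
// natural alignment already produces 24 bytes with no interior padding,
// so no packing pragma is needed on common ABIs.
struct RPCHeader {
  uint32_t magic;
  uint32_t function_id;
  uint32_t arg_len;
  uint32_t request_id;
  uint64_t ptp_timestamp;
};

struct RPCResponse {
  uint32_t magic;
  int32_t  status;
  uint32_t result_len;
  uint32_t request_id;
  uint64_t ptp_timestamp;
};

// Compile-time guards: the build fails if a compiler ever inserts padding.
static_assert(sizeof(RPCHeader) == 24, "RPCHeader must be 24 bytes");
static_assert(offsetof(RPCHeader, function_id) == 4, "function_id at 4");
static_assert(offsetof(RPCHeader, arg_len) == 8, "arg_len at 8");
static_assert(offsetof(RPCHeader, request_id) == 12, "request_id at 12");
static_assert(offsetof(RPCHeader, ptp_timestamp) == 16, "ptp_timestamp at 16");

static_assert(sizeof(RPCResponse) == 24, "RPCResponse must be 24 bytes");
static_assert(offsetof(RPCResponse, status) == 4, "status at 4");
static_assert(offsetof(RPCResponse, result_len) == 8, "result_len at 8");
static_assert(offsetof(RPCResponse, request_id) == 12, "request_id at 12");
static_assert(offsetof(RPCResponse, ptp_timestamp) == 16, "ptp_timestamp at 16");
```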

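Magic values (32-bit, stored little-endian):

  • RPC_MAGIC_REQUEST = 0x43555152 ('CUQR' when read most-significant byte first; on the wire: 52 51 55 43)

  • RPC_MAGIC_RESPONSE = 0x43555153 ('CUQS' when read most-significant byte first; on the wire: 53 51 55 43)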
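Because the constants are stored little-endian, their ASCII names appear reversed on the wire: a request slot begins with the bytes 'R' 'Q' 'U' 'C'. The sketch below serializes and checks the magic values one byte at a time, so the result does not depend on host byte order. The helper names (store_u32_le, load_u32_le, is_request, is_response) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t RPC_MAGIC_REQUEST  = 0x43555152u;  // 'CUQR', MSB first
constexpr uint32_t RPC_MAGIC_RESPONSE = 0x43555153u;  // 'CUQS', MSB first

// Write a 32-bit value least-significant byte first.
inline void store_u32_le(uint8_t* out, uint32_t v) {
  out[0] = static_cast<uint8_t>(v);
  out[1] = static_cast<uint8_t>(v >> 8);
  out[2] = static_cast<uint8_t>(v >> 16);
  out[3] = static_cast<uint8_t>(v >> 24);
}

// Read a little-endian 32-bit value back, independent of host byte order.
inline uint32_t load_u32_le(const uint8_t* in) {
  return static_cast<uint32_t>(in[0]) |
         (static_cast<uint32_t>(in[1]) << 8) |
         (static_cast<uint32_t>(in[2]) << 16) |
         (static_cast<uint32_t>(in[3]) << 24);
}

// Classify a slot by its first four bytes.
inline bool is_request(const uint8_t* slot) {
  return load_u32_le(slot) == RPC_MAGIC_REQUEST;
}
inline bool is_response(const uint8_t* slot) {
  return load_u32_le(slot) == RPC_MAGIC_RESPONSE;
}
```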

Request ID Semantics

request_id is a caller-assigned opaque 32-bit value included in every request. The dispatch kernel copies it verbatim into the corresponding RPCResponse. The protocol does not interpret or constrain the value; its meaning is defined by the application.

Typical uses:

  • Shot index: The sender sets request_id to the shot number, enabling out-of-order or pipelined verification of responses.

  • Sequence number: Monotonically increasing counter for detecting lost or duplicated messages.

  • Unused: Set to 0 when not needed. The dispatcher echoes it regardless.

The dispatcher echoes request_id in all dispatch paths (cooperative, regular, and graph-launch).
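The sequence-number use of request_id can be sketched as a small receiver-side tracker. SequenceTracker and SeqStatus are hypothetical names. The sketch assumes the sender increments request_id by one per message and that responses arrive in order, so it reports a late response the same way as a duplicate.

```cpp
#include <cassert>
#include <cstdint>

enum class SeqStatus { InOrder, Gap, Stale };

// Hypothetical tracker for request_id used as a sequence number.
class SequenceTracker {
 public:
  explicit SequenceTracker(uint32_t first_id = 0) : expected_(first_id) {}

  // Classify an echoed request_id. Unsigned subtraction keeps the
  // comparison correct across 32-bit wraparound.
  SeqStatus observe(uint32_t request_id, uint32_t* lost = nullptr) {
    const uint32_t delta = request_id - expected_;
    if (delta == 0) {                 // exactly the next expected ID
      ++expected_;
      return SeqStatus::InOrder;
    }
    if (delta < 0x80000000u) {        // ahead: 'delta' messages were skipped
      if (lost) *lost = delta;
      expected_ = request_id + 1;
      return SeqStatus::Gap;
    }
    return SeqStatus::Stale;          // behind: duplicate or late arrival
  }

 private:
  uint32_t expected_;
};
```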

PTP Timestamp Semantics

ptp_timestamp is a 64-bit field carrying a Precision Time Protocol (PTP) send timestamp. It enables end-to-end latency measurement from the moment a message leaves the sender (e.g., FPGA) to the moment a response is produced.

The dispatch kernel copies ptp_timestamp verbatim from the incoming RPCHeader into the corresponding RPCResponse. Individual RPC handlers do not need to read, interpret, or propagate this field; it is handled entirely by the dispatch infrastructure.

Typical uses:

  • FPGA-injected timestamp: The FPGA writes the PTP time-of-day into ptp_timestamp just before transmitting each message. The receiver compares the echoed timestamp against the PTP clock at capture time to compute round-trip latency.

  • Software timestamp: A software sender (e.g., playback tool) may set the field to a host-side PTP or monotonic clock value for profiling.

  • Unused: Set to 0 when latency measurement is not needed. The dispatcher echoes it regardless.

The encoding is opaque to the protocol; the 64-bit value is echoed without interpretation. By convention, the field carries a PTP time-of-day in nanoseconds, but senders and receivers may agree on any encoding.

The dispatcher echoes ptp_timestamp in all dispatch paths (cooperative, regular, and graph-launch).
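Under the nanosecond convention above, turning an echoed timestamp into a latency figure takes a single subtraction. The helper below is a hypothetical sketch. It assumes the receiver's capture clock follows the same PTP time base as the sender, and it rejects unset (zero) timestamps and negative intervals instead of reporting them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper. echoed_ptp_ns: ptp_timestamp copied from the
// RPCResponse, assumed to be PTP time-of-day in nanoseconds.
// capture_ns: receiver's PTP clock reading when the response was captured.
// Returns false if the sender left the field at 0 or if the capture time
// precedes the send time (clock skew or a corrupted timestamp).
inline bool round_trip_ns(uint64_t echoed_ptp_ns, uint64_t capture_ns,
                          uint64_t* latency_ns) {
  if (echoed_ptp_ns == 0 || capture_ns < echoed_ptp_ns) return false;
  *latency_ns = capture_ns - echoed_ptp_ns;
  return true;
}
```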

Function ID Semantics

function_id selects which handler the dispatcher invokes for a given RPC message. The dispatcher looks up function_id in the function table (an array of function pointers paired with their IDs) and calls the matching entry.

See the documentation for the CUDA-Q Realtime Host API for function ID hashing, handler naming, and function table registration details.
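For orientation only, the sketch below pairs a 32-bit FNV-1a hash with a linear scan of the function table. It assumes fnv1a_hash is standard 32-bit FNV-1a (offset basis 0x811C9DC5, prime 0x01000193) applied to the bytes of the handler name. The Handler signature and FunctionEntry layout are hypothetical stand-ins for the definitions in the Host API documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime.
// Assumption: the Host API's fnv1a_hash uses exactly this variant.
constexpr uint32_t fnv1a_hash(const char* s, uint32_t h = 0x811C9DC5u) {
  return *s ? fnv1a_hash(s + 1, (h ^ static_cast<uint8_t>(*s)) * 0x01000193u)
            : h;
}

// Hypothetical handler signature and table entry, for illustration only.
using Handler = int32_t (*)(const uint8_t* payload, uint32_t arg_len,
                            uint8_t* result, uint32_t* result_len);

struct FunctionEntry {
  uint32_t function_id;
  Handler  fn;
};

// Linear scan over the table; returns nullptr for an unknown function_id.
inline Handler lookup(const FunctionEntry* table, size_t n, uint32_t id) {
  for (size_t i = 0; i < n; ++i)
    if (table[i].function_id == id) return table[i].fn;
  return nullptr;
}
```

Because the hash is constexpr, firmware-side and host-side code can derive the same function_id from a handler name at compile time.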

Schema and Payload Interpretation

The RPC payload is typeless on the wire. The bytes following RPCHeader are an opaque blob from the protocol’s perspective.

Payload interpretation is defined by the handler schema, which is registered in the dispatcher’s function table during setup. The schema specifies:

  • Number of arguments

  • Type and size of each argument

  • Number of return values

  • Type and size of each return value

Out-of-band contract: The client (e.g., FPGA) firmware and the dispatcher's function table must agree on the schema for each function_id. Schemas are not validated at runtime, so mismatches must be caught during integration testing.

For handlers with multiple arguments, the payload is a concatenation of argument data in schema order:

| RPCHeader | arg0_bytes | arg1_bytes | arg2_bytes | ... |

The dispatcher uses the schema to determine where each argument begins and ends within the payload.
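Arguments are concatenated without delimiters, so each argument's offset is the running sum of the preceding size_bytes values in the schema. The function below is a hypothetical sketch of that computation. It also checks that the arguments fit within arg_len, a cheap guard that the protocol itself does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper. sizes[i] is the schema's size_bytes for argument i,
// in schema order. offsets[i] receives the start of argument i, measured
// from the first payload byte (just past the 24-byte RPCHeader).
// Returns false if the arguments would extend past arg_len.
inline bool compute_arg_offsets(const uint32_t* sizes, size_t num_args,
                                uint32_t arg_len, uint32_t* offsets) {
  uint64_t cursor = 0;  // 64-bit so large sizes cannot wrap around
  for (size_t i = 0; i < num_args; ++i) {
    offsets[i] = static_cast<uint32_t>(cursor);
    cursor += sizes[i];
    if (cursor > arg_len) return false;
  }
  return true;
}
```

For Example 2 (sizes 16 and 4), this yields payload offsets 0 and 16, which are bytes 24 and 40 of the slot.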

Type System

Standardized payload type identifiers used in handler schemas:

enum PayloadTypeID : uint8_t {
  TYPE_UINT8           = 0x10,
  TYPE_INT32           = 0x11,
  TYPE_INT64           = 0x12,
  TYPE_FLOAT32         = 0x13,
  TYPE_FLOAT64         = 0x14,
  TYPE_ARRAY_UINT8     = 0x20,
  TYPE_ARRAY_INT32     = 0x21,
  TYPE_ARRAY_FLOAT32   = 0x22,
  TYPE_ARRAY_FLOAT64   = 0x23,
  TYPE_BIT_PACKED      = 0x30   // Bit-packed data (LSB-first)
};

Schema type descriptor (see the CUDA-Q Realtime Host API documentation for the full definition):

struct cudaq_type_desc_t {
  uint8_t  type_id;       // PayloadTypeID value
  uint8_t  reserved[3];
  uint32_t size_bytes;    // Total size in bytes
  uint32_t num_elements;  // Interpretation depends on type_id
};

The num_elements field interpretation:

  • Scalar types (TYPE_UINT8, TYPE_INT32, etc.): unused, set to 1

  • Array types (TYPE_ARRAY_*): number of array elements

  • TYPE_BIT_PACKED: number of bits (not bytes)
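Together with the bit-packed sizing rule from the encoding rules (size_bytes = ceil(num_elements / 8)), the num_elements rules determine the size_bytes that a descriptor should carry. The following C++11 sketch computes that expected size and could serve as an integration-test check on a schema. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum PayloadTypeID : uint8_t {
  TYPE_UINT8 = 0x10, TYPE_INT32 = 0x11, TYPE_INT64 = 0x12,
  TYPE_FLOAT32 = 0x13, TYPE_FLOAT64 = 0x14,
  TYPE_ARRAY_UINT8 = 0x20, TYPE_ARRAY_INT32 = 0x21,
  TYPE_ARRAY_FLOAT32 = 0x22, TYPE_ARRAY_FLOAT64 = 0x23,
  TYPE_BIT_PACKED = 0x30
};

struct cudaq_type_desc_t {
  uint8_t  type_id;
  uint8_t  reserved[3];
  uint32_t size_bytes;
  uint32_t num_elements;
};

// Bytes per element for scalar and array types; 0 for unknown IDs.
inline uint32_t element_width(uint8_t type_id) {
  switch (type_id) {
    case TYPE_UINT8: case TYPE_ARRAY_UINT8: return 1;
    case TYPE_INT32: case TYPE_FLOAT32:
    case TYPE_ARRAY_INT32: case TYPE_ARRAY_FLOAT32: return 4;
    case TYPE_INT64: case TYPE_FLOAT64: case TYPE_ARRAY_FLOAT64: return 8;
    default: return 0;
  }
}

// size_bytes implied by type_id and num_elements.
inline uint64_t expected_size_bytes(const cudaq_type_desc_t& d) {
  if (d.type_id == TYPE_BIT_PACKED)               // num_elements counts bits
    return (uint64_t{d.num_elements} + 7) / 8;
  const uint64_t width = element_width(d.type_id);
  const bool is_array = (d.type_id & 0xF0) == 0x20;
  return is_array ? width * d.num_elements        // num_elements counts elements
                  : width;                        // scalar: num_elements unused
}

// True if size_bytes agrees with the rules. Unknown types and empty
// payloads are rejected.
inline bool desc_is_consistent(const cudaq_type_desc_t& d) {
  const uint64_t want = expected_size_bytes(d);
  return want != 0 && want == d.size_bytes;
}
```

A custom packing that repurposes num_elements, such as the 4-bit soft-readout layout described later, would intentionally fail this check.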

Note: For arbitrary binary data or vendor-specific formats, use TYPE_ARRAY_UINT8.

Encoding rules:

  • All multi-byte integers: little-endian

  • Floating-point: IEEE 754 format

  • Arrays: tightly packed elements (no padding)

  • Bit-packed data: LSB-first within each byte, size_bytes = ceil(num_elements / 8)
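One portable way to follow these rules is to write integers one byte at a time instead of copying host memory, and to move a float's IEEE 754 bit pattern through memcpy. The helpers below are a hypothetical sketch. They compile on any host but assume that float is IEEE 754 binary32, which the static_assert checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "expects IEEE 754 binary32 float");

// Integers: least-significant byte first, regardless of host byte order.
inline void put_u32_le(uint8_t* out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out[i] = static_cast<uint8_t>(v >> (8 * i));
}
inline void put_u64_le(uint8_t* out, uint64_t v) {
  for (int i = 0; i < 8; ++i) out[i] = static_cast<uint8_t>(v >> (8 * i));
}
// Signed values travel as their two's-complement bit pattern.
inline void put_i32_le(uint8_t* out, int32_t v) {
  put_u32_le(out, static_cast<uint32_t>(v));
}
inline uint32_t get_u32_le(const uint8_t* in) {
  return static_cast<uint32_t>(in[0]) | (static_cast<uint32_t>(in[1]) << 8) |
         (static_cast<uint32_t>(in[2]) << 16) |
         (static_cast<uint32_t>(in[3]) << 24);
}
// Floats: copy the IEEE 754 bit pattern into an integer, then write it LE.
inline void put_f32_le(uint8_t* out, float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  put_u32_le(out, bits);
}
inline float get_f32_le(const uint8_t* in) {
  const uint32_t bits = get_u32_le(in);
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
// Arrays: elements back to back, 4 bytes apart, no padding.
inline void put_f32_array_le(uint8_t* out, const float* v, size_t n) {
  for (size_t i = 0; i < n; ++i) put_f32_le(out + 4 * i, v[i]);
}
```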

Payload Encoding

The payload contains the argument data for the handler function. The encoding depends on the argument types specified in the handler schema.

Single-Argument Payloads

For handlers with one argument, the payload contains the argument data directly:

| RPCHeader | argument_bytes |

Multi-Argument Payloads

For handlers with multiple arguments, arguments are concatenated in schema order with no padding or delimiters:

| RPCHeader | arg0_bytes | arg1_bytes | arg2_bytes | ... |

The schema specifies the size of each argument, allowing the dispatcher to compute offsets.

Size Constraints

The total payload must fit in a single ring-buffer slot:

total_size = sizeof(RPCHeader) + arg_len ≤ slot_size
max_payload_bytes = slot_size - sizeof(RPCHeader)
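The size constraint translates directly into a pair of checks. In this hypothetical sketch, the header size is hard-coded to 24 bytes per the struct definitions, and the sum is computed in 64 bits so that an oversized arg_len cannot wrap around.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kRpcHeaderBytes = 24;  // sizeof(RPCHeader)

// Largest payload that a slot of slot_size bytes can carry.
constexpr uint32_t max_payload_bytes(uint32_t slot_size) {
  return slot_size > kRpcHeaderBytes ? slot_size - kRpcHeaderBytes : 0;
}

// True if a message with arg_len payload bytes fits in one slot.
constexpr bool fits_in_slot(uint32_t arg_len, uint32_t slot_size) {
  return uint64_t{kRpcHeaderBytes} + arg_len <= slot_size;
}
```

For example, a 256-byte slot (an illustrative size, not a protocol default) leaves room for max_payload_bytes(256) == 232 payload bytes.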

Encoding Examples

Example 1: Handler with signature void process(int32_t count, float threshold)

Schema:

  • arg0: TYPE_INT32, 4 bytes

  • arg1: TYPE_FLOAT32, 4 bytes

Wire encoding:

Offset | Content
-------|--------
0-23   | RPCHeader { magic, function_id, arg_len=8, request_id, ptp_timestamp }
24-27  | count (int32_t, little-endian)
28-31  | threshold (float, IEEE 754)
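Combining the header layout with Example 1, the sketch below fills a request slot for process(count, threshold). The function name is hypothetical, and function_id is passed in as a parameter rather than computed. Every field is written explicitly in little-endian order, so the result matches the offset table regardless of host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t RPC_MAGIC_REQUEST = 0x43555152u;

inline void store_u32_le(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}
inline void store_u64_le(uint8_t* p, uint64_t v) {
  for (int i = 0; i < 8; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Hypothetical encoder for Example 1. Returns the bytes written (32),
// or 0 if the slot is too small.
inline size_t encode_process_request(uint8_t* slot, size_t slot_size,
                                     uint32_t function_id, uint32_t request_id,
                                     uint64_t ptp_timestamp,
                                     int32_t count, float threshold) {
  const uint32_t arg_len = 4 + 4;                  // int32 + float32
  if (slot_size < 24 + arg_len) return 0;
  store_u32_le(slot + 0, RPC_MAGIC_REQUEST);
  store_u32_le(slot + 4, function_id);
  store_u32_le(slot + 8, arg_len);
  store_u32_le(slot + 12, request_id);
  store_u64_le(slot + 16, ptp_timestamp);
  store_u32_le(slot + 24, static_cast<uint32_t>(count));  // arg0 at 24-27
  uint32_t bits;
  std::memcpy(&bits, &threshold, sizeof bits);     // IEEE 754 bit pattern
  store_u32_le(slot + 28, bits);                   // arg1 at 28-31
  return 24 + arg_len;
}
```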

Example 2: Handler with signature void decode(const uint8_t* bits, uint32_t num_bits)

Schema:

  • arg0: TYPE_BIT_PACKED, size_bytes=16, num_elements=128

  • arg1: TYPE_INT32, size_bytes=4, num_elements=1 (carries the uint32_t num_bits; the encoding is identical for values below 2^31)

Wire encoding:

Offset | Content
-------|--------
0-23   | RPCHeader { magic, function_id, arg_len=20, request_id, ptp_timestamp }
24-39  | bits (bit-packed, LSB-first, 128 bits)
40-43  | num_bits=128 (uint32_t, little-endian)

Bit-Packed Data Encoding

For TYPE_BIT_PACKED arguments:

  • Bits are packed LSB-first within each byte

  • Payload length: size_bytes = ceil(num_elements / 8) bytes

  • The schema specifies both size_bytes (storage) and num_elements (actual bit count)

Example for 10 bits (size_bytes=2, num_elements=10):

bits:    b0 b1 b2 b3 b4 b5 b6 b7 b8 b9
byte[0]: b0 b1 b2 b3 b4 b5 b6 b7   (LSB-first)
byte[1]: b8 b9 0  0  0  0  0  0    (unused bits set to zero)
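A straightforward LSB-first packer follows from the rule that bit i lands in byte i / 8 at bit position i % 8. The names below are hypothetical. The sketch zero-initializes its output, so the unused high bits in the last byte stay zero, as the diagram requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs num_bits 0/1 values LSB-first into ceil(num_bits / 8) bytes.
// Only the low bit of each input byte is used.
inline std::vector<uint8_t> pack_bits_lsb_first(const uint8_t* bits,
                                                size_t num_bits) {
  std::vector<uint8_t> out((num_bits + 7) / 8, 0);
  for (size_t i = 0; i < num_bits; ++i)
    if (bits[i] & 1u) out[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  return out;
}

// Reads bit i back from a packed buffer.
inline bool get_bit(const uint8_t* packed, size_t i) {
  return (packed[i / 8] >> (i % 8)) & 1u;
}
```

Packing the 10 bits b0..b9 = 1 0 1 1 0 0 0 0 1 1 this way yields byte[0] = 0x0D and byte[1] = 0x03.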

The handler can use num_elements from the schema to determine how many bits are valid, avoiding the need to pass the bit count as a separate argument (though some handlers may still choose to do so for flexibility).

Use case: TYPE_BIT_PACKED is suitable for binary measurements where each measurement result is 0 or 1 (1 bit per measurement).

Multi-Bit Measurement Encoding

For applications requiring richer measurement data (e.g., soft readout, leakage detection), use array types instead of TYPE_BIT_PACKED:

4-bit soft readout (confidence values 0-15):

Use TYPE_ARRAY_UINT8 with custom packing (2 measurements per byte):

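  • Schema: TYPE_ARRAY_UINT8, size_bytes = ceil(num_measurements / 2), num_elements = num_measurements (this custom packing repurposes num_elements as the measurement count rather than the byte count, so both ends must agree on it)

  • Encoding: byte[i] holds measurement[2i] in its low nibble and measurement[2i+1] in its high nibble; if num_measurements is odd, the final high nibble is unused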
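The 4-bit layout can be sketched the same way, with two measurements per byte. The helper names are hypothetical. Values are masked to 4 bits, so an out-of-range input is silently truncated rather than rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs n 4-bit values: measurement 2i goes in the low nibble of byte i,
// measurement 2i+1 in the high nibble. Output is ceil(n / 2) bytes.
inline std::vector<uint8_t> pack_nibbles(const uint8_t* m, size_t n) {
  std::vector<uint8_t> out((n + 1) / 2, 0);
  for (size_t i = 0; i < n; ++i) {
    const uint8_t v = m[i] & 0x0F;  // keep the 0-15 range
    out[i / 2] |= static_cast<uint8_t>(i % 2 == 0 ? v : v << 4);
  }
  return out;
}

// Reads measurement i back from a nibble-packed buffer.
inline uint8_t get_nibble(const uint8_t* packed, size_t i) {
  return (i % 2 == 0) ? (packed[i / 2] & 0x0F) : (packed[i / 2] >> 4);
}
```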

8-bit soft readout (confidence values 0-255):

Use TYPE_ARRAY_UINT8 with one byte per measurement:

  • Schema: TYPE_ARRAY_UINT8, size_bytes = num_measurements, num_elements = num_measurements

  • Encoding: byte[i] = measurement[i]

Floating-point confidence values:

Use TYPE_ARRAY_FLOAT32:

  • Schema: TYPE_ARRAY_FLOAT32, size_bytes = num_measurements × 4, num_elements = num_measurements

  • Encoding: IEEE 754 single-precision floats, tightly packed

Leakage/erasure-resolving readout (values beyond binary):

Use TYPE_ARRAY_UINT8 or TYPE_ARRAY_INT32 depending on the range of measurement outcomes (e.g., 0=ground, 1=excited, 2=leakage state).

Response Encoding

The response is written to the TX ring buffer slot (separate from the RX buffer that contains the request):

| RPCResponse | result_bytes |

Like the request payload, the response payload encoding is defined by the handler schema. The schema’s results[] array specifies the type and size of each return value.

Single-Result Response

For handlers returning one value, the result is written directly after the response header.

Example response for a handler returning a single uint8_t:

Schema:

  • result0: TYPE_UINT8, size_bytes=1, num_elements=1

Wire encoding:

Offset | Content                                    | Value (hex)
-------|--------------------------------------------|--------------
0-3    | magic (RPC_MAGIC_RESPONSE)                 | 53 51 55 43
4-7    | status (0 = success)                       | 00 00 00 00
8-11   | result_len                                 | 01 00 00 00
12-15  | request_id (echoed from request)           | XX XX XX XX
16-23  | ptp_timestamp (echoed from request)        | XX XX XX XX XX XX XX XX
24     | result value (uint8_t)                     | 03
25-... | unused padding                             | XX XX XX XX
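On the dispatcher side, producing the single-result response above amounts to writing five header fields and one payload byte. The following is a hypothetical sketch: write_u8_response is not a Host API function. It echoes request_id and ptp_timestamp as the protocol requires and leaves the padding after offset 24 untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t RPC_MAGIC_RESPONSE = 0x43555153u;

// Little-endian writers, independent of host byte order.
inline void store_u32_le(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}
inline void store_u64_le(uint8_t* p, uint64_t v) {
  for (int i = 0; i < 8; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Hypothetical helper: writes RPCResponse + one uint8_t result into a TX slot.
// request_id and ptp_timestamp are the values read from the request header.
// Returns the number of bytes written, or 0 if the slot is too small.
inline size_t write_u8_response(uint8_t* tx_slot, size_t slot_size,
                                uint32_t request_id, uint64_t ptp_timestamp,
                                uint8_t value) {
  if (slot_size < 24 + 1) return 0;
  store_u32_le(tx_slot + 0, RPC_MAGIC_RESPONSE);
  store_u32_le(tx_slot + 4, 0);               // status: success
  store_u32_le(tx_slot + 8, 1);               // result_len: one byte
  store_u32_le(tx_slot + 12, request_id);     // echoed verbatim
  store_u64_le(tx_slot + 16, ptp_timestamp);  // echoed verbatim
  tx_slot[24] = value;                        // result0 (TYPE_UINT8)
  return 24 + 1;
}
```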

Multi-Result Response

For handlers returning multiple values, results are concatenated in schema order (same pattern as multi-argument requests):

| RPCResponse | result0_bytes | result1_bytes | ... |

Example: Handler returning correction (uint8_t) + confidence (float)

Schema:

  • result0: TYPE_UINT8, size_bytes=1, num_elements=1

  • result1: TYPE_FLOAT32, size_bytes=4, num_elements=1

Wire encoding:

Offset | Content
-------|--------
0-23   | RPCResponse { magic, status=0, result_len=5, request_id, ptp_timestamp }
24     | correction (uint8_t)
25-28  | confidence (float32, IEEE 754)

Status Codes

  • status = 0: Success

  • status > 0: Handler-specific error

  • status < 0: Protocol-level error
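On the client side, the correction + confidence response can be checked and decoded as sketched below. ResponseKind, DecodeResult, and parse_decode_response are hypothetical names. The float starts at offset 25, which is not 4-byte aligned, so the sketch assembles it one byte at a time and reinterprets the bits with memcpy rather than casting a pointer. status is split into the three ranges listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t RPC_MAGIC_RESPONSE = 0x43555153u;

// Little-endian 32-bit read, one byte at a time (safe at any alignment).
inline uint32_t load_u32_le(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

enum class ResponseKind { Ok, HandlerError, ProtocolError, Invalid };

struct DecodeResult {
  uint8_t correction;
  float   confidence;
};

// Hypothetical parser for: RPCResponse (24 B) | correction (1 B) | confidence (4 B).
inline ResponseKind parse_decode_response(const uint8_t* rx, size_t len,
                                          uint32_t expected_request_id,
                                          DecodeResult* out) {
  // Short buffer, bad magic, or a response to a different request.
  if (len < 24 || load_u32_le(rx) != RPC_MAGIC_RESPONSE ||
      load_u32_le(rx + 12) != expected_request_id)
    return ResponseKind::Invalid;

  // status is a signed 32-bit field; reinterpret the raw bits.
  const uint32_t raw_status = load_u32_le(rx + 4);
  int32_t status;
  std::memcpy(&status, &raw_status, sizeof status);
  if (status > 0) return ResponseKind::HandlerError;   // handler-specific
  if (status < 0) return ResponseKind::ProtocolError;  // protocol-level

  // Success: result_len must match the schema (1 + 4 bytes).
  if (load_u32_le(rx + 8) != 5 || len < 24 + 5) return ResponseKind::Invalid;
  out->correction = rx[24];
  const uint32_t bits = load_u32_le(rx + 25);  // offset 25 is unaligned
  std::memcpy(&out->confidence, &bits, sizeof bits);
  return ResponseKind::Ok;
}
```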

QEC-Specific Usage Example

This section shows how the realtime messaging protocol is used for quantum error correction (QEC) decoding. This is one application of the protocol; other use cases follow the same pattern.

QEC Terminology

In QEC applications, the following terminology applies:

  • Measurement result: Raw readout value from a QPU measurement (0 or 1 for binary readout)

  • Detection event: XOR’d measurement results as dictated by the parity check (stabilizer) matrix

  • Syndrome: The full history or set of detection events used by the decoder

The decoder consumes detection events (often called “syndrome data” colloquially) and produces corrections.
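One way to make the XOR relationship concrete is a binary detector matrix D, in which row i marks which measurement results feed detection event i. D is a hypothetical input here, because how it is derived from the stabilizer matrix and the round structure is application-specific. The resulting 0/1 events would then be bit-packed LSB-first before being sent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: D is a num_events x m.size() matrix of 0/1 bytes in
// row-major order. Detection event i is the XOR of every measurement result
// m[j] for which D[i][j] == 1. Returns an empty vector on a shape mismatch.
inline std::vector<uint8_t> detection_events(const std::vector<uint8_t>& D,
                                             size_t num_events,
                                             const std::vector<uint8_t>& m) {
  const size_t num_meas = m.size();
  if (D.size() != num_events * num_meas) return {};
  std::vector<uint8_t> events(num_events, 0);
  for (size_t i = 0; i < num_events; ++i)
    for (size_t j = 0; j < num_meas; ++j)
      events[i] ^= static_cast<uint8_t>(D[i * num_meas + j] & m[j] & 1u);
  return events;
}
```

For a single stabilizer measured over three rounds, D = [[1,0,0],[1,1,0],[0,1,1]] compares each round with the previous one, and measurement results 1 1 0 produce detection events 1 0 1.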

QEC Decoder Handler

Typical QEC decoder signature:

void qec_decode(const uint8_t* detection_events, uint32_t num_events,
                uint8_t* correction);

Schema:

  • arg0: TYPE_BIT_PACKED, variable size (detection events, 1 bit per event)

  • arg1: TYPE_INT32, 4 bytes (number of detection events)

  • result0: TYPE_UINT8, 1 byte (correction bits, packed LSB-first into a single byte)

Decoding Rounds

For QEC applications, one RPC message typically corresponds to one decoding round (one invocation of the decoder with a set of detection events). The boundaries of each decoding round are determined by the quantum control system (e.g., FPGA) when building RPC messages.

Note: The term “shot” is often used in quantum computing to mean one full execution of a quantum program (repeated num_shots times for statistics). In the context of realtime decoding, we use “decoding round” to avoid confusion, as there may be many RPC invocations during a single quantum program execution.