- class trt_decoder
A GPU-accelerated quantum error correction decoder based on NVIDIA TensorRT. This decoder leverages TensorRT’s optimized inference engine to perform fast neural network-based decoding of quantum error correction syndromes.
The TRT decoder supports loading pre-trained neural network models in ONNX format or directly loading pre-built TensorRT engine files for maximum performance. It automatically optimizes the model for the target GPU architecture and supports various precision modes (FP16, BF16, INT8, FP8) to balance accuracy and speed.
Neural network-based decoders can be trained to perform syndrome decoding for specific quantum error correction codes and noise models. The TRT decoder provides a high-performance inference engine for these models, with automatic CUDA graph optimization for reduced latency.
Requires a CUDA-capable GPU and TensorRT installation. See the CUDA-Q GPU Compatibility List for a list of valid GPU configurations.
Note

Decoders must be created with the get_decoder API from the CUDA-QX extension points API, for example:

```python
import cudaq_qec as qec
import numpy as np

# Create a simple parity check matrix (not used by the TRT decoder)
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)

# Option 1: Load from ONNX model (builds TRT engine)
trt_dec = qec.get_decoder('trt_decoder', H,
                          onnx_load_path='model.onnx',
                          precision='fp16',
                          engine_save_path='model.engine')

# Option 2: Load pre-built TRT engine (faster startup)
trt_dec = qec.get_decoder('trt_decoder', H,
                          engine_load_path='model.engine')
```
#include "cudaq/qec/decoder.h" std::size_t block_size = 7; std::size_t syndrome_size = 3; cudaqx::tensor<uint8_t> H; // Create a simple parity check matrix (not used by the TRT decoder) std::vector<uint8_t> H_vec = {1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1}; H.copy(H_vec.data(), {syndrome_size, block_size}); // Option 1: Load from ONNX model (builds TRT engine) cudaqx::heterogeneous_map params1; params1.insert("onnx_load_path", "model.onnx"); params1.insert("precision", "fp16"); params1.insert("engine_save_path", "model.engine"); auto trt_dec1 = cudaq::qec::get_decoder("trt_decoder", H, params1); // Option 2: Load pre-built TRT engine (faster startup) cudaqx::heterogeneous_map params2; params2.insert("engine_load_path", "model.engine"); auto trt_dec2 = cudaq::qec::get_decoder("trt_decoder", H, params2);
Note

The "trt_decoder" implements the cudaq_qec.Decoder interface for Python and the cudaq::qec::decoder interface for C++, so it supports all the methods in those respective classes.

Note

The parity check matrix H is not used by the TRT decoder. The neural network model encodes the decoding logic, so the parity check matrix is only required to satisfy the decoder interface. You can pass any valid parity check matrix of appropriate dimensions.
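As a sketch of this point (the dimensions and the model.engine path below are hypothetical), any binary matrix of the right shape satisfies the interface:

```python
import cudaq_qec as qec
import numpy as np

# Hypothetical dimensions: a model with 3 syndrome bits and 7 data bits.
syndrome_size, block_size = 3, 7

# Any binary matrix of shape (syndrome_size, block_size) satisfies the
# interface; an identity block keeps the placeholder full rank.
H = np.zeros((syndrome_size, block_size), dtype=np.uint8)
H[:, :syndrome_size] = np.eye(syndrome_size, dtype=np.uint8)

decoder = qec.get_decoder('trt_decoder', H, engine_load_path='model.engine')
```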
Note

Batch Processing: The TRT decoder automatically handles batch size optimization. Models trained with batch_size > 1 will receive zero-padded inputs when using decode() on a single syndrome. When using decode_batch(), provide syndromes in multiples of the model's batch size for optimal performance. For example:
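The sketch below illustrates both calls. It assumes a pre-built model.engine exists (see the construction example above) and, hypothetically, that the model's detected batch size is 8; the DecoderResult field names follow the cudaq_qec.Decoder interface.

```python
import cudaq_qec as qec
import numpy as np

# Placeholder parity check matrix (not used by the TRT decoder)
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)
decoder = qec.get_decoder('trt_decoder', H, engine_load_path='model.engine')

# Single syndrome: zero-padded internally up to the model batch size.
res = decoder.decode([1.0, 0.0, 1.0])
print(res.converged, res.result)

# Batch decode: submit a multiple of the model's batch size for best
# throughput (here 16 = 2 x 8 syndromes, assuming batch_size == 8).
syndromes = [[1.0, 0.0, 1.0]] * 16
results = decoder.decode_batch(syndromes)
print(len(results))  # one DecoderResult per syndrome
```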
- Parameters:

H – Parity check matrix (tensor format). Note: This parameter is not used by the TRT decoder but is required by the decoder interface.
params – Heterogeneous map of parameters:
Required (choose one):
- onnx_load_path (string): Path to ONNX model file. The decoder will build a TensorRT engine from this model. Cannot be used together with engine_load_path.
- engine_load_path (string): Path to pre-built TensorRT engine file. Provides faster initialization since the engine is already optimized. Cannot be used together with onnx_load_path.
Optional:
- engine_save_path (string): Path to save the built TensorRT engine. Only applicable when using onnx_load_path. Saving the engine allows for faster initialization in subsequent runs via engine_load_path.
- precision (string): Precision mode for inference (defaults to "best"). Valid options:
  - "fp16": Use FP16 (half precision) - good balance of speed and accuracy
  - "bf16": Use BF16 (bfloat16) - available on newer GPUs (Ampere+)
  - "int8": Use INT8 quantization - fastest but requires calibration
  - "fp8": Use FP8 precision - available on Hopper GPUs
  - "tf32": Use TensorFloat-32 - available on Ampere+ GPUs
  - "noTF32": Disable TF32 and use standard FP32
  - "best": Let TensorRT automatically choose the best precision (default)

  Note: If the requested precision is not supported by the hardware, the decoder will fall back to FP32 with a warning.
- memory_workspace (size_t): Memory workspace size in bytes for TensorRT engine building (defaults to 1 GB = 1073741824 bytes). Larger workspaces may allow TensorRT to explore more optimization strategies.
- use_cuda_graph (bool): Enable CUDA graph optimization for improved performance (defaults to True). CUDA graphs capture inference operations and replay them with reduced kernel launch overhead, providing ~20% speedup. The optimization is applied automatically on the first decode call and is automatically disabled for models with dynamic shapes or multiple optimization profiles. Set to False to force the traditional execution path.
- batch_size (automatic): The decoder automatically detects the model's batch size from the first input dimension. For models with batch_size > 1, the decode() method automatically zero-pads single syndromes to fill the batch. The decode_batch() method requires the number of syndromes to be an integral multiple of the model's batch size.
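A minimal sketch combining the optional parameters above; the values are illustrative, not recommendations, and model.onnx is a hypothetical path:

```python
import cudaq_qec as qec
import numpy as np

# Placeholder parity check matrix (not used by the TRT decoder)
H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1]], dtype=np.uint8)

# Illustrative settings: a 2 GiB build workspace, BF16 precision (falls back
# to FP32 with a warning if unsupported), and CUDA graphs left enabled.
decoder = qec.get_decoder('trt_decoder', H,
                          onnx_load_path='model.onnx',
                          precision='bf16',
                          memory_workspace=2 * 1024**3,
                          use_cuda_graph=True,
                          engine_save_path='model.engine')
```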