← Back to digest

Pauli Correlation Encoding for mRNA Secondary Structure Prediction: Problem-Aware Decoding for Dense-Constraint QUBOs

Triet Friedhoff, Mihir Metkar, Wade Davis, Vaibhaw Kumar, Alexey Galda (Moderna / IBM Quantum) · arXiv:2605.20163 · submitted 19 May 2026 · score 8/10 (HIGH)

Abstract

Pauli Correlation Encoding (PCE) compresses m binary variables onto n = O(m1/k) qubits for a tunable compression order k ≥ 2 by mapping them to commuting Pauli correlators, but the continuous expectation values it produces must be decoded into feasible binary solutions, a challenge that becomes acute for problems with dense constraints. We apply PCE to the mRNA secondary structure prediction problem, formulated as a densely-constrained QUBO. We train throughout with a QUBO-space sigmoid loss that preserves the QUBO penalty structure directly. For decoding, we introduce the Problem-Aware Guided Decoder (PAGD), which scores candidate variable commitments by the product of their marginal QUBO energy reduction and a trained expectation-value prior, with constraint-aware feasibility pruning at each step. On six benchmark mRNA sequences (30–60 nt, 50–240 variables, 7–14 qubits), PAGD with 100 restarts achieves 75–100% near-optimal recovery, defined as P(gap<1%), for sequences up to m=152 variables, vs. 0–30% for the sign-rounding+local-search baseline. We demonstrate that PCE-trained expectation values provide a useful prior. Hardware-scale tests extend the pipeline to three 102–105 nt instances (694–745 variables, ~190 000 pair constraints, 23 qubits) on IBM Heron processors. The circuits transpile SWAP-free into 480 native two-qubit gates at depth 256, and PAGD-decoded gaps on QPU runs match or beat simulator means for all three instances, including exact CPLEX-optimum recovery for one sequence.

Executive summary

Friedhoff et al. apply Pauli Correlation Encoding (PCE) — a polynomial-compression scheme mapping m classical bits to ~m1/k qubits via commuting Pauli correlators — to densely constrained QUBOs, specifically mRNA secondary structure prediction. PCE's central pain point is decoding: continuous expectation values must be rounded to feasible binary strings, which is hopeless under dense pairwise constraints. Their contribution is a Problem-Aware Guided Decoder (PAGD) that interleaves marginal-energy reduction with feasibility pruning, plus a QUBO-space sigmoid loss that preserves the penalty structure during training. The pipeline scales: 745-variable mRNA folds on 23 qubits of an IBM Heron device, transpiled SWAP-free to a heavy-hex subgraph, and hardware results match simulator on all three biological-scale sequences. For Yuan, this is a parallel to Y2 in spirit (compressed encoding for hard-constrained binary optimisation) but a substantially different mechanism — and a Y6-class hardware demonstration on Heron.

Main contribution

Three pieces. (i) A QUBO-space sigmoid loss: rather than training PCE expectation values in Ising space and post-hoc decoding, they push the loss into QUBO space directly by mapping each Pauli correlator through a sigmoid into a [0,1] continuous relaxation of the binary variable, then evaluating the QUBO penalty exactly on that relaxation. This preserves the penalty structure (rather than smearing it through a hyperbolic-tangent change of variables). (ii) A Problem-Aware Guided Decoder (PAGD): at each step, score every still-unfixed variable's commitment to 0 or 1 by the product of its trained expectation-value prior and the marginal QUBO energy reduction it would provide, with hard pruning of choices that violate dense constraints. (iii) Hardware-aware Informed-k ansatz topology: choose entangling-pair edges by ranking QUBO-importance pairs and routing the resulting maximum spanning tree onto the device's heavy-hex coupling map, padding with native heavy-hex edges. Result: SWAP-free transpilation to 480 native two-qubit gates at depth 256 on 23 qubits, end-to-end deployable on IBM Heron.

Key results / experimental protocol

Detailed walkthrough

The setup (Sec. Introduction). mRNA secondary structure prediction is formulated as a Sankoff-style QUBO with O(m²) pairing variables and dense constraints (no two pairs may share a base, no pairs may cross). Densely constrained QUBOs are the worst case for any naive expectation-value decoder: a small expectation error per variable compounds into infeasibility almost surely.

PCE compression and the decoding problem (Sec. Methods). PCE maps binary variable xi to an expectation ⟨Pi⟩ of a commuting Pauli correlator. With k qubits per "block", up to ~3k commuting correlators can be defined. Training optimises the circuit parameters so that the correlator expectations approximately match the optimal binary assignment. The hard part is decoding: a continuous [-1,1] expectation does not directly yield a feasible bitstring under dense constraints. The naïve sign-rounding baseline almost always lands infeasible.

Loss in QUBO space (Sec. Training Loss). Rather than the standard Ising-space penalty (which sees the encoded variables as continuous Ising spins and applies the quadratic cost), the authors apply a sigmoid σ to each correlator to push it onto [0,1] (a relaxation of the binary variable), and evaluate the QUBO penalty exactly on this [0,1] relaxation. This means soft-feasibility violations are surfaced during training, not just at decoding. A theoretical analysis shows the QUBO-space loss preserves the penalty structure (sharp barriers around feasibility) while the Ising-space loss flattens them.

The Problem-Aware Guided Decoder (Sec. Decoding Schemes). Greedy with feasibility lookahead. Start from the fully unfixed problem. At each step, for each (unfixed variable i, candidate value v ∈ {0,1}) pair, compute score(i,v) = prior(i,v) × ΔEQUBO(i,v) where prior comes from the trained expectation, and ΔEQUBO is the marginal QUBO-energy reduction. Then check feasibility — if committing xi=v would force a constraint violation under any feasible completion, prune; otherwise commit greedily. Restart K times from different random orderings/break-ties to get an empirical decoded-gap distribution.

Ansatz design (Sec. Methods — Problem-Informed Ansatz). Layerwise parameter sharing keeps the parameter count at 2p+1 regardless of qubit number, making COBYLA optimisation tractable. The Informed-k topology connects the top-k QUBO-importance pairs, picking up the most relevant problem structure with the fewest entangling gates. Hardware-aware variant routes this onto the device's heavy-hex coupling map: 22 of 22 informed pairs land natively, 2 padding edges add unavoidable heavy-hex constraints.

Results — simulator (Sec. Results). Across 20 seeds per configuration, the PAGD + Informed-k + layerwise pipeline outperforms every ablation by a wide margin. The improvement comes from (a) the QUBO-space loss training, (b) the PAGD feasibility pruning, and (c) the multi-restart aggregation. Fig. 4 isolates each contribution; trained EV is the single largest factor.

Results — hardware (Sec. Hardware-Scale Results). On three real mRNA sequences (102–105 nt, 694–745 variables, 23 qubits on ibm_aachen), the QPU samples are decoded by PAGD and yield gaps that match or beat the simulator's mean for all three instances, with exact CPLEX optimum recovered on one. The SWAP-free transpilation is achieved by aligning the Informed-k spanning tree to the heavy-hex coupling map. Depth is 256, well within Heron's coherence budget. Fig. 6 shows the 23-qubit subgraph used; Fig. 7 shows the lowest-energy folds for all benchmark sequences.

Figures

Figure 1
Figure 1. Circuit ansatz under the layerwise parameterization: a single R_y angle θ(ℓ) shared across all qubits within layer ℓ, and a single MS angle φ(ℓ) shared across all entangling pairs in layer ℓ — 2p+1 trainable parameters at depth p. NN-topology shown.
Figure 2
Figure 2. Ansatz-topology comparison across four benchmark sequences at PAGD-K10 (layerwise, 20 seeds per topology). Shaded background separates p=2, p=4, p=6 regions. Four topologies compared: NN, Informed-k (top-k QUBO-importance pairs), Informed-2k, and the hardware-aware variant.
Figure 3
Figure 3. Circuit-depth p selection across the six benchmark sequences (PCE, Informed-k ansatz, layerwise, 20 seeds per (m,p)). Each subplot is one size class; normalised median PAGD-K10 decoded gap as a function of p.
Figure 4
Figure 4. Decoder comparison across six benchmark sequences. Each sequence evaluated at the optimal depth p for its size class. (a) Near-optimal recovery rate P(gap<1%) with Wilson 95% CI. (b) Median optimality gap with IQR. PAGD-K100 dramatically outperforms the Sign+LS baseline.
Figure 5
Figure 5. P(gap<1%) versus restart count K ∈ {1, 10, 50, 100, 200} on seq_240 (m=240, n=14 qubits, 20 seeds, layerwise, Informed-k, p=6). Three EV conditions: PCE-trained, untrained circuit, random EV.
Figure 6
Figure 6. Hardware-aware Informed-k ansatz on the 23-qubit subgraph used for seq_745 on ibm_aachen, drawn on a fragment of the surrounding heavy-hex device (grey). Node labels are physical qubit indices. Solid blue: 22 native pairs of the QUBO-importance maximum spanning tree. Solid red: 2 remaining native heavy-hex edges within the subgraph.
Figure 7
Figure 7. Lowest-energy mRNA secondary-structure folds for all nine benchmark sequences. Top two rows: simulator-evaluated m ∈ {50, 80, 120, 152, 195, 240}. Bottom row: m ∈ {694, 715, 745} (≈100 nt each) — also deployed on IBM Heron QPUs.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan