← Back to digest

Efficient Fourier-Based Linear Combination of Unitaries and Applications in Quantum Optimization

Almudena Carrera Vazquez, Daniel J. Egger, Stefan Woerner (IBM Quantum Zurich) · arXiv:2605.18985 · submitted 18 May 2026 · score 9/10 (HIGH)

Abstract

We investigate ancilla-free linear combination of unitaries (LCU) as a framework for approximating complex quantum circuits. This is particularly effective for quantum optimization algorithms, where candidate solutions can be evaluated classically and the task is to sample high-quality bitstrings rather than reproduce the full output distribution. We show that Fourier-based LCU constructions efficiently decompose broad classes of diagonal and non-diagonal unitaries, replacing highly connected qubit interactions with single-qubit gate layers or significantly simpler structures at the cost of a polynomial sampling overhead. Applied to algorithms such as QAOA, this yields efficient, hardware-friendly decompositions of, for instance, cardinality-constraint penalties and the fully connected XY-mixer, while maintaining rigorous performance guarantees compared to fully coherent implementations. Furthermore, we establish a formal connection between Fourier-based quantum penalties and Lagrangian relaxation, offering a unified perspective on constraint handling. We validate our approach using exact statevector simulations of 12-qubit circuits and large-scale experiments on 106 superconducting qubits.

Executive summary

Carrera Vazquez, Egger, and Woerner (IBM) present an ancilla-free Fourier-based LCU approximation for quantum optimization that recasts highly-connected operators (cardinality penalties, fully-connected XY mixers) into a sum of much simpler unitaries that are sampled classically run-by-run rather than coherently summed via an ancilla. The scheme exchanges circuit depth/connectivity for a polynomial sampling overhead and comes with rigorous performance guarantees relative to the coherent QAOA. They run a 106-qubit densest-k-subgraph (DkS) experiment on an IBM Heron-class device and a 12-qubit benchmark with both penalty and XY-mixer constructions. For Yuan, this is one of the most directly relevant papers of the cycle: the cardinality-constraint penalty + XY mixer combination is exactly the constraint regime of Y2 (quasi-binary portfolio) and Y4 (Grover for cardinality), and the IBM 106-qubit demonstration shares the noise regime of Y3.

Main contribution

The paper formalises an ancilla-free LCU sampler: a target unitary U is written as a (signed) convex combination U = Σj αj Vj of cheap basis circuits, and one stochastically draws Vj per shot, post-processing measurement outcomes by classical re-weighting. They build a Fourier-LCU library: diagonal unitaries diag(eiφ(x)) get truncated Fourier series in the bit-pattern, and non-diagonal permutation-invariant unitaries (e.g. XY mixers acting on a Hamming-weight sector) decompose via SU(2) Fourier modes. The penalty version replaces costly all-to-all RZZ ladders for the (Σxi−k)² cardinality term by Z-only LCU layers; the XY-mixer version replaces the fully-connected complete-bipartite XY graph by O(N) RZRY circuits. The sampling overhead is controlled by the "1-norm" of the LCU coefficients ΓU, which they bound polynomially. They also prove a formal correspondence between the Fourier-penalty parameter and Lagrangian-relaxation multipliers — i.e. each shot effectively realises a different Lagrangian relaxation, and the average reproduces a soft constraint.

Key theorems / lemmas / algorithms

Detailed walkthrough

Setup (Sec. 2). The authors observe that QAOA's job is not to reproduce a coherent output state — only to sample good bitstrings. So a unitary U only needs to be implemented in a way that yields the same diagonal measurement statistics, which is exactly what an LCU+stochastic post-selection accomplishes without an ancilla. Concretely: pick Vj with probability |αj|/Γ, run, measure bitstring x, and weight the empirical estimator by sign(αj). The variance overhead is Γ², making this a depth-vs-shots trade.

Diagonal Fourier (Sec. 3). A penalty term λ·(Σixi−k)² generates a unitary that, in the computational basis, has phases φ(x) = λ(s−k)² where s is the Hamming weight. They write eiγφ as a Fourier series in s, truncate at degree D, and obtain an LCU whose basis circuits are products of single-qubit RZd) — no two-qubit gates at all. Γ scales polynomially in D, and D required for fixed accuracy scales gracefully with system size. This is the central technical move: a hard, all-to-all penalty becomes a stack of single-qubit gates, at the cost of polynomially more shots.

XY-mixer Fourier (Sec. 4 + App. D). The hard XY-mixer constructed from the fully connected XY graph preserves Hamming weight (the cardinality-constraint feasible space) but normally requires O(N²) ladder of RXX+YY gates. The authors give a Fourier-on-SU(2) decomposition that realises the same mixer on the symmetric subspace using O(N) basis circuits of single-qubit RY/RZ after a column of Hadamards. They handle initialisation in the Hamming-weight-k sector via a warm-started Dicke-state preparation (citing Egger 2021).

Lagrangian unification (Sec. 5). A surprising structural result: in expectation, the Fourier-LCU penalty is equivalent to averaging over a family of Lagrangian relaxations of the constrained problem, each with a different multiplier drawn from the Fourier coefficients. So the sampler simultaneously explores a continuum of penalty strengths — a form of automatic Lagrange-multiplier search.

Numerical validation — 12 qubits (Sec. 6.1–6.2). Densest-k-subgraph (DkS, k=4) on a 3-regular 12-node graph. Three experiments per construction: full coherent, truncated LCU, and a comparison using CVaR sampling. Distributions of objective values and of feasible-only subsets show the LCU's Γ-scaled distribution overlaps the coherent distribution remarkably tightly — confirming the theory empirically. The XY-mixer variant restricts samples to the cardinality-feasible sector by construction; the penalty version uses an Egger-style warm-started initialisation followed by the LCU penalty layer.

Hardware — 106 qubits on heavy-hex (Sec. 6.3 + App. F). A 5×3 SWAP-extended heavy-hex graph yields a 106-node DkS instance solved on an IBM Heron-class device. They compare CVaR distributions across ten repetitions with the coherent analytical expectation and CPLEX optimum. The LCU approach reaches CVaR within a small factor of the coherent expectation and finds high-quality bitstrings — concretely demonstrating that the depth reduction translates to noise-robustness on real hardware. The best found solution on hardware (highlighted on the device graph in Fig. 8) is very close to CPLEX's optimum (Fig. 9).

Appendices. A: Hilbert-space decomposition justifying the symmetric-sector restriction. B: continuous LCU on SU(2). C: bounding ΓU. D: full XY-mixer LCU construction. E: additional applications — Erdős–Rényi MaxCut, skewed Sherrington–Kirkpatrick, Maximum Independent Set with indicator penalties, and index tracking with risk constraints (essentially a portfolio-style problem). The appendix demonstrating index tracking with risk constraints is unusually relevant — it brings the LCU technology directly into the portfolio-optimisation regime (Y2/Y3).

Figures

Figure 1
Figure 1. The considered 3-regular 12-node graph and corresponding optimal subgraph (highlighted in green) for k=4 (densest-k-subgraph benchmark used throughout the 12-qubit simulations).
Figure 2
Figure 2. 12-node densest-k-subgraph simulations with the penalty-LCU construction. Left: full sample distribution of objective values across experiments (clipped to [−10, 4]); Right: same restricted to feasible (cardinality-k) samples. The "Exp.1/Γ₂" trace shows the coherent distribution scaled by the LCU sampling overhead.
Figure 3
Figure 3. 12-node DkS simulations with the XY-mixer-LCU construction (warm-started Dicke initialisation). Same panel structure as Fig. 2. The XY mixer automatically restricts samples to the Hamming-weight-k sector.
Figure 4
Figure 4. Cumulative distribution functions of CVaR values and best found solutions across ten repetitions for each of the three hardware experiments on the 106-node SWAP-extended heavy-hex graph. Analytical coherent-penalty-QAOA expectation and CPLEX optimum shown for reference.
Figure 5
Figure 5. 106-node problem graph (5×3 heavy-hex grid + three SWAP layers). Green nodes/edges: best found solution on the IBM quantum hardware.
Figure 6
Figure 6. Same 106-node graph as Fig. 5; green nodes/edges show an optimal solution found by CPLEX (classical reference).

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan