Efficient Fourier-Based Linear Combination of Unitaries and Applications in Quantum Optimization

Almudena Carrera Vazquez, Daniel J. Egger, Stefan Woerner (IBM Quantum Zurich) · arXiv:2605.18985 · submitted 18 May 2026 · score 9/10 (HIGH)

Abstract

We investigate ancilla-free linear combination of unitaries (LCU) as a framework for approximating complex quantum circuits. This is particularly effective for quantum optimization algorithms, where candidate solutions can be evaluated classically and the task is to sample high-quality bitstrings rather than reproduce the full output distribution. We show that Fourier-based LCU constructions efficiently decompose broad classes of diagonal and non-diagonal unitaries, replacing highly connected qubit interactions with single-qubit gate layers or significantly simpler structures at the cost of a polynomial sampling overhead. Applied to algorithms such as QAOA, this yields efficient, hardware-friendly decompositions of, for instance, cardinality-constraint penalties and the fully connected XY-mixer, while maintaining rigorous performance guarantees compared to fully coherent implementations. Furthermore, we establish a formal connection between Fourier-based quantum penalties and Lagrangian relaxation, offering a unified perspective on constraint handling. We validate our approach using exact statevector simulations of 12-qubit circuits and large-scale experiments on 106 superconducting qubits.

Executive summary

Carrera Vazquez, Egger, and Woerner (IBM) present an ancilla-free Fourier-based LCU approximation for quantum optimization that recasts highly-connected operators (cardinality penalties, fully-connected XY mixers) into a sum of much simpler unitaries that are sampled classically run-by-run rather than coherently summed via an ancilla. The scheme exchanges circuit depth/connectivity for a polynomial sampling overhead and comes with rigorous performance guarantees relative to the coherent QAOA. They run a 106-qubit densest-k-subgraph (DkS) experiment on an IBM Heron-class device and a 12-qubit benchmark with both penalty and XY-mixer constructions. For Yuan, this is one of the most directly relevant papers of the cycle: the cardinality-constraint penalty + XY mixer combination is exactly the constraint regime of Y2 (quasi-binary portfolio) and Y4 (Grover for cardinality), and the IBM 106-qubit demonstration shares the noise regime of Y3.

Main contribution

The paper formalises an ancilla-free LCU sampler: a target unitary U is written as a (signed) convex combination U = Σ_j α_j V_j of cheap basis circuits, and one stochastically draws V_j per shot, post-processing measurement outcomes by classical re-weighting. They build a Fourier-LCU library: diagonal unitaries diag(e^iφ(x)) get truncated Fourier series in the bit-pattern, and non-diagonal permutation-invariant unitaries (e.g. XY mixers acting on a Hamming-weight sector) decompose via SU(2) Fourier modes. The penalty version replaces costly all-to-all R_ZZ ladders for the (Σx_i−k)² cardinality term by Z-only LCU layers; the XY-mixer version replaces the fully-connected complete-bipartite XY graph by O(N) R_ZR_Y circuits. The sampling overhead is controlled by the "1-norm" of the LCU coefficients Γ_U, which they bound polynomially. They also prove a formal correspondence between the Fourier-penalty parameter and Lagrangian-relaxation multipliers — i.e. each shot effectively realises a different Lagrangian relaxation, and the average reproduces a soft constraint.

Key theorems / lemmas / algorithms

LCU sampling lemma: for U = Σ_j α_j V_j, sampling V_j with probability ∝ |α_j| and re-weighting by sign(α_j) gives an unbiased estimator of any diagonal observable expectation with variance overhead Γ_U² = (Σ|α_j|)² (Sec. 2, ancilla-free LCU).
Fourier decomposition of diagonal cost unitaries (Sec. 3): truncating the Fourier series of e^iγ·φ(x) at degree D gives an LCU with D+1 basis unitaries, each consisting of single-qubit R_Z rotations only — replacing all-to-all ZZ ladders.
Fourier-LCU for permutation-invariant non-diagonal unitaries (Sec. 4): explicit construction for the fully connected XY-mixer on the Hamming-weight-k sector — basis circuits consist of single-qubit R_Y/R_Z rotations after a Hadamard layer.
Constraint–Lagrangian correspondence (Sec. 5): each Fourier coefficient corresponds to a Lagrangian multiplier; averaging the per-shot Lagrangian-relaxed problem reproduces the original constrained QAOA expectation.
Performance theorem: for sampling tasks, the LCU sampler with Γ_U sampling overhead inherits the coherent QAOA's approximation-ratio guarantee with a multiplicative factor 1/Γ_U on tail probabilities (App. C).

Detailed walkthrough

Setup (Sec. 2). The authors observe that QAOA's job is not to reproduce a coherent output state — only to sample good bitstrings. So a unitary U only needs to be implemented in a way that yields the same diagonal measurement statistics, which is exactly what an LCU+stochastic post-selection accomplishes without an ancilla. Concretely: pick V_j with probability |α_j|/Γ, run, measure bitstring x, and weight the empirical estimator by sign(α_j). The variance overhead is Γ², making this a depth-vs-shots trade.

Diagonal Fourier (Sec. 3). A penalty term λ·(Σ_ix_i−k)² generates a unitary that, in the computational basis, has phases φ(x) = λ(s−k)² where s is the Hamming weight. They write e^iγφ as a Fourier series in s, truncate at degree D, and obtain an LCU whose basis circuits are products of single-qubit R_Z(γ_d) — no two-qubit gates at all. Γ scales polynomially in D, and D required for fixed accuracy scales gracefully with system size. This is the central technical move: a hard, all-to-all penalty becomes a stack of single-qubit gates, at the cost of polynomially more shots.

XY-mixer Fourier (Sec. 4 + App. D). The hard XY-mixer constructed from the fully connected XY graph preserves Hamming weight (the cardinality-constraint feasible space) but normally requires O(N²) ladder of R_XX+YY gates. The authors give a Fourier-on-SU(2) decomposition that realises the same mixer on the symmetric subspace using O(N) basis circuits of single-qubit R_Y/R_Z after a column of Hadamards. They handle initialisation in the Hamming-weight-k sector via a warm-started Dicke-state preparation (citing Egger 2021).

Lagrangian unification (Sec. 5). A surprising structural result: in expectation, the Fourier-LCU penalty is equivalent to averaging over a family of Lagrangian relaxations of the constrained problem, each with a different multiplier drawn from the Fourier coefficients. So the sampler simultaneously explores a continuum of penalty strengths — a form of automatic Lagrange-multiplier search.

Numerical validation — 12 qubits (Sec. 6.1–6.2). Densest-k-subgraph (DkS, k=4) on a 3-regular 12-node graph. Three experiments per construction: full coherent, truncated LCU, and a comparison using CVaR sampling. Distributions of objective values and of feasible-only subsets show the LCU's Γ-scaled distribution overlaps the coherent distribution remarkably tightly — confirming the theory empirically. The XY-mixer variant restricts samples to the cardinality-feasible sector by construction; the penalty version uses an Egger-style warm-started initialisation followed by the LCU penalty layer.

Hardware — 106 qubits on heavy-hex (Sec. 6.3 + App. F). A 5×3 SWAP-extended heavy-hex graph yields a 106-node DkS instance solved on an IBM Heron-class device. They compare CVaR distributions across ten repetitions with the coherent analytical expectation and CPLEX optimum. The LCU approach reaches CVaR within a small factor of the coherent expectation and finds high-quality bitstrings — concretely demonstrating that the depth reduction translates to noise-robustness on real hardware. The best found solution on hardware (highlighted on the device graph in Fig. 8) is very close to CPLEX's optimum (Fig. 9).

Appendices. A: Hilbert-space decomposition justifying the symmetric-sector restriction. B: continuous LCU on SU(2). C: bounding Γ_U. D: full XY-mixer LCU construction. E: additional applications — Erdős–Rényi MaxCut, skewed Sherrington–Kirkpatrick, Maximum Independent Set with indicator penalties, and index tracking with risk constraints (essentially a portfolio-style problem). The appendix demonstrating index tracking with risk constraints is unusually relevant — it brings the LCU technology directly into the portfolio-optimisation regime (Y2/Y3).

Figures

Figure 1. The considered 3-regular 12-node graph and corresponding optimal subgraph (highlighted in green) for k=4 (densest-k-subgraph benchmark used throughout the 12-qubit simulations).

Figure 2. 12-node densest-k-subgraph simulations with the penalty-LCU construction. Left: full sample distribution of objective values across experiments (clipped to [−10, 4]); Right: same restricted to feasible (cardinality-k) samples. The "Exp.1/Γ₂" trace shows the coherent distribution scaled by the LCU sampling overhead.

Figure 3. 12-node DkS simulations with the XY-mixer-LCU construction (warm-started Dicke initialisation). Same panel structure as Fig. 2. The XY mixer automatically restricts samples to the Hamming-weight-k sector.

Figure 4. Cumulative distribution functions of CVaR values and best found solutions across ten repetitions for each of the three hardware experiments on the 106-node SWAP-extended heavy-hex graph. Analytical coherent-penalty-QAOA expectation and CPLEX optimum shown for reference.

Figure 5. 106-node problem graph (5×3 heavy-hex grid + three SWAP layers). Green nodes/edges: best found solution on the IBM quantum hardware.

Figure 6. Same 106-node graph as Fig. 5; green nodes/edges show an optimal solution found by CPLEX (classical reference).

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Y1 (warm-started iterative QAOA): the paper uses warm-started initial states (cited as Egger 2021) for both penalty and XY-mixer constructions — same Egger-school warm-start idea Yuan extends in Y1. The LCU technique is complementary to iterative warm-starting and can be composed with it.
Y2 (quasi-binary encoding + hard XY-mixer for portfolio): directly parallel — Y2 builds hard mixers that preserve cardinality without penalties; this paper gives an alternative compilation of the same fully-connected XY hard mixer that drastically reduces two-qubit gate count. Combining Y2's quasi-binary encoding with this paper's Fourier-LCU mixer is a concrete proposal worth attempting.
Y3 (DGMVP portfolio QAOA, layerwise opt, shot-noise scaling): the appendix on index tracking with risk constraints is essentially a portfolio-tracking problem; the LCU sampling overhead language fits naturally with Y3's shot-noise-scaling framework. The 106-qubit hardware demonstration is in the same noise regime Y3 analyses.
Y4 (Grover + ADMM for cardinality-constrained BO): shares the cardinality-constraint scope but uses LCU sampling instead of Grover amplification. Yuan's ADMM hybrid splits problem structure differently — these are complementary depth-vs-shots trade-offs.
Y5 (Pauli-sparse Gibbs states, SDP dequantisation): indirect — both papers exploit structural sparsity (Fourier decomposition vs. Pauli sparsity) to reduce coherent resources at the cost of classical sampling.
Y6 (PBR on Heron2): shared hardware platform (Heron-class IBM device, 106 qubits) and shared noise-regime understanding.

Recommended action for Yuan

Cite in next portfolio/QAOA draft: this is the most direct competitor to Y2's quasi-binary + hard-mixer approach as a constraint-handling technique on near-term hardware; benchmarking the two side-by-side on the DGMVP instances of Y3 would make a strong follow-up paper.
Read App. E "Index tracking with risk constraints" carefully: it is a portfolio-style problem and Yuan's CVaR-QAOA + DGMVP machinery is the natural comparison baseline; consider whether the LCU sampling overhead actually beats Y2/Y3's resource counts.
Email Egger/Woerner: they are explicit collaborators of the warm-start school Yuan builds on (Y1); a discussion combining iterative warm-starting (Y1) with Fourier-LCU XY-mixer compilation would be mutually useful.