← Back to digest

Second-Order FALQON Parameter Transfer for the Max-Cut Problem on 3-Regular Graphs

Gabriel Fernandes Thomaz, Eduarda Rodrigues Monteiro, Jerusa Marchi, Marcelo Zen Pretto, Alisson dos Passos Fumaco, Evandro Chagas Ribeiro da Rosa · arXiv:2605.04253 · 2026-05-07 (cross from cs.ET) · score 9/10

Abstract

The Feedback-based Algorithm for Quantum Optimization (FALQON) offers a deterministic alternative to variational quantum algorithms by bypassing classical optimization loops. However, maintaining convergence on large problem instances often requires restricting the time step, necessitating quantum circuit depths that exceed Noisy Intermediate-Scale Quantum (NISQ) hardware capabilities. This paper investigates the parameter transferability of second-order FALQON applied to the Max-Cut problem on 3-regular graphs. Through numerical experiments evaluating quantum circuits up to 16 layers on graphs up to 24 nodes, we demonstrate a highly advantageous scaling behavior: transferring feedback parameters optimized on small instances to larger target graphs yields significantly higher approximation ratios than natively optimizing the parameters directly on the larger graphs. This performance advantage arises because parameters trained on smaller instances can safely adopt aggressively larger time steps. By offloading the expensive parameter discovery phase to small-scale instances, this transfer strategy simultaneously reduces computational overhead and enhances the approximation ratio, thereby bringing FALQON closer to practical viability on near-term quantum architectures.

Executive summary

The authors take FALQON — a deterministic, feedback-driven cousin of QAOA in which the layer parameters βk are computed online from measured commutator expectation values rather than chosen by a classical optimiser — and show that for 3-regular Max-Cut, the (Δt, βk) sequence learned on a tiny n=6 graph transfers to graphs up to n=24, where it actually beats native re-optimisation at fixed depth (l=16). The mechanism is precisely the warm-starting intuition Yuan exploits in Y1: native optimisation on a larger instance is forced to retreat to ever-smaller Δt to preserve monotone convergence, which strangles state evolution within a fixed circuit-depth budget. The transferred parameters smuggle in a larger Δt that the small-instance landscape "safely" sanctioned, giving more useful state motion per layer. For Yuan's programme, this is the closest analogue to Y1's iterative warm-start logic that has appeared on arXiv this announce cycle.

Main contribution

For second-order FALQON applied to Max-Cut on 3-regular graphs, the authors empirically establish (i) that the optimal time step decays as Δt ≈ 0.4984 · n−0.51101/(2√n), formalising the small-step bottleneck that natively-trained 16-layer FALQON suffers as n grows; and (ii) that transferring (Δt, βk) from a small training graph (especially ntrain=6) to a larger target graph yields strictly higher approximation ratios than the native run, across all ntarget ∈ {6,…,24}, averaged over a 20×20 cross-evaluation per (ntrain, ntarget) pair. The paper reframes parameter transfer not as "good-enough re-use" but as an algorithmic improvement in its own right: it lets a depth-16 circuit access the larger-Δt regime that small instances tolerated, side-stepping the depth/Δt trade-off the second-order discretisation was designed to relax.

Key results / algorithmic ingredients

Detailed walkthrough

Setup. Section II frames Max-Cut canonically as ground-state preparation of HC = −½ Σ(i,j)∈E(1 − ZiZj) with the standard transverse-field mixer HM = Σ Xi. FALQON in its original first-order form picks βk = −⟨A⟩k with A = i[HM, HC]; this guarantees a monotone decrease of ⟨HC⟩ in the small-Δt limit but at the cost of a depth that grows as 1/Δt for fixed accuracy. Second-order FALQON (Arai et al. 2025, cited as [arai_scalable_2025]) augments the feedback law with ⟨B⟩k and ⟨C⟩k nested-commutator terms, giving βk = −(⟨A⟩k + Δt ⟨C⟩k) / (2 Δt ⟨B⟩k). The point of the second-order correction is precisely to let the practitioner pick a larger stable Δt — and the transfer experiment in this paper exploits that headroom.

Numerical protocol. Section IV describes the scan: for each n ∈ {6, 8, …, 24}, generate 20 random 3-regular graphs (the standard Y1 ensemble); for the training subset ntrain ∈ {6, 8, …, 18}, sweep Δt ∈ [0.1, 1.0] at step 0.001, run 16 FALQON layers, and pick the (Δt, {βk}) that maximises the approximation ratio at layer 16. An early-stopping rule terminates the sweep when divergence is detected. Ground-truth Max-Cut energies for normalising the approximation ratio are computed by classical simulated annealing per graph instance. The averaged optimal Δt vs. n is fitted on log-log axes (Figure 1) and produces the n−0.511 power law.

Transfer evaluation. The cross-evaluation matrix in Section IV runs 400 (source-graph × target-graph) trials per (ntrain, ntarget) cell with ntrain ≤ ntarget; the cell value is the mean approximation ratio after 16 layers when the source's (Δt, βk) drive the target's HC. The full matrix is rendered as the Figure 2 heatmap; the diagonal is the natively-optimised baseline.

The non-intuitive finding. Section V is where the paper does its real work. Naïvely, one would expect transfer quality to degrade as the structural mismatch (here, just |ntrain − ntarget|) grows. Instead, transferred approximation ratios are nearly flat in ntarget, and native optimisation degrades. The authors' diagnosis: native optimisation is a victim of its own caution. As n grows, the spectral norm of HC grows, so to remain inside the second-order stability envelope the native optimiser is pushed to smaller Δt — exactly the n−1/2 law of Figure 1. With l=16 hard-capped, that means each native run buys progressively less state motion. The transferred (Δt, βk) from n=6 carries Δt ≈ 0.20 — almost twice what the native optimiser at n=24 dares pick (≈0.115) — and the target run completes more useful evolution per layer.

Section VI's framing — and a Y1 connection. The authors explicitly call this a paradigm shift: the value of the second-order discretisation was supposed to be the larger Δt it tolerates, but native re-optimisation re-tightens Δt as n grows; transfer recovers the freedom that the second-order scheme was designed to give. This is structurally close to Y1's iterative warm-starting story for QAOA — there the warm start is a measurement-induced bias on |ψ0⟩, here the warm start is a (Δt, βk) schedule from a smaller instance. Both move the expensive optimisation off the large-instance critical path and exploit local structural similarity in 3-regular Max-Cut. The honest caveat the authors flag in Section VI — that the success likely depends on uniform local structure (3-regularity) and would not generalise to high-degree-variance graphs — is exactly the limitation that Y1's parameter-concentration / iterative warm-starting story should also be sanity-checked against.

What is and isn't done. No noise simulation: pure noiseless statevector. No hardware execution. No theoretical proof of why transfer works — explicitly listed as future work, alongside (a) extending to larger n, (b) other graph classes / problems, (c) noise resilience, (d) "graph reduction" — i.e., given a large instance, how to algorithmically choose a small representative training graph. The cap at n=24 keeps the entire study in the classically-tractable regime, so the quantum-advantage question is not addressed.

Figures

Figure 1: optimal Δt versus graph size on log-log axes
Figure 1. Average optimal time step (Δt) that maximizes the approximation ratio after 16 layers, plotted on a log-log scale as a function of the 3-regular graph size (n). The monotonically decreasing trend exhibits a strict power-law decay with a fitted regression of Δt ≈ 0.4984 · n−0.5110, indicating that the time step scales approximately as 1/(2√n).
Figure 2: FALQON parameter transferability matrix heatmap
Figure 2. FALQON parameter transferability matrix. The heatmap displays the average approximation ratio achieved when evaluating 16-layer FALQON circuits using parameters optimized on training graphs (n_train, y-axis) applied to the Hamiltonians of target graphs (n_target, x-axis).
Figure 3: native vs. transferred approximation ratio versus target size
Figure 3. Approximation ratio of natively optimized parameters (solid red line) versus parameters transferred from smaller instances (dashed lines). The native performance decays significantly as the target graph size (n) increases due to the algorithm selecting progressively smaller time steps (Δt) that restrict exploration within the fixed 16-layer limit. In contrast, transferring parameters from smaller graphs (particularly n = 6) significantly outperforms native optimization on large graphs.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography. Bibliography scanned for arXiv IDs 2502.09704, 2304.06915, 2410.16265, 2603.14744, 2510.08292, 2510.11213 and for "Yuan" + (warm-start | quasi-binary | QAOA portfolio | cardinality binary | Goemans-Williamson Pauli | PBR Heron) — no hits in main.tex or main_bibliography.bib.

Overlap with Y1–Y6

Recommended action for Yuan