Second-Order FALQON Parameter Transfer for the Max-Cut Problem on 3-Regular Graphs

Gabriel Fernandes Thomaz, Eduarda Rodrigues Monteiro, Jerusa Marchi, Marcelo Zen Pretto, Alisson dos Passos Fumaco, Evandro Chagas Ribeiro da Rosa · arXiv:2605.04253 · 2026-05-07 (cross from cs.ET) · score 9/10

Abstract

The Feedback-based Algorithm for Quantum Optimization (FALQON) offers a deterministic alternative to variational quantum algorithms by bypassing classical optimization loops. However, maintaining convergence on large problem instances often requires restricting the time step, necessitating quantum circuit depths that exceed Noisy Intermediate-Scale Quantum (NISQ) hardware capabilities. This paper investigates the parameter transferability of second-order FALQON applied to the Max-Cut problem on 3-regular graphs. Through numerical experiments evaluating quantum circuits up to 16 layers on graphs up to 24 nodes, we demonstrate a highly advantageous scaling behavior: transferring feedback parameters optimized on small instances to larger target graphs yields significantly higher approximation ratios than natively optimizing the parameters directly on the larger graphs. This performance advantage arises because parameters trained on smaller instances can safely adopt aggressively larger time steps. By offloading the expensive parameter discovery phase to small-scale instances, this transfer strategy simultaneously reduces computational overhead and enhances the approximation ratio, thereby bringing FALQON closer to practical viability on near-term quantum architectures.

Executive summary

The authors take FALQON, a deterministic, feedback-driven cousin of QAOA in which the layer parameters β_k are computed online from measured commutator expectation values rather than chosen by a classical optimiser, and show that for 3-regular Max-Cut the (Δt, β_k) sequence learned on a tiny n=6 graph transfers to graphs up to n=24, where it beats native re-optimisation at fixed depth (l=16). The mechanism echoes the warm-starting intuition Yuan exploits in Y1. Native optimisation on a larger instance is forced to retreat to ever-smaller Δt to preserve monotone convergence, which throttles state evolution within a fixed circuit-depth budget. The transferred parameters carry over a larger Δt that was stable on the small instance, giving more useful state motion per layer. For Yuan's programme, this is the closest analogue to Y1's iterative warm-start logic to appear on arXiv in this announcement cycle.

Main contribution

For second-order FALQON applied to Max-Cut on 3-regular graphs, the authors empirically establish (i) that the optimal time step decays as Δt ≈ 0.4984 · n^−0.5110 ≈ 1/(2√n), formalising the small-step bottleneck that natively-trained 16-layer FALQON suffers as n grows; and (ii) that transferring (Δt, β_k) from a small training graph (especially n_train=6) to a larger target graph yields higher mean approximation ratios than the native run, with the gap widening as n_target approaches 24; each (n_train, n_target) cell averages a 20×20 cross-evaluation. The paper reframes parameter transfer not as "good-enough re-use" but as an algorithmic improvement in its own right: it lets a depth-16 circuit access the larger-Δt regime that small instances tolerated, side-stepping the depth/Δt trade-off the second-order discretisation was designed to relax.

Key results / algorithmic ingredients

Algorithm (second-order FALQON feedback law): at layer k, set β_k = −(⟨A⟩_k + Δt ⟨C⟩_k) / (2 Δt ⟨B⟩_k), where A = i[H_M, H_C], B = ½[[H_M, H_C], H_M], C = [[H_M, H_C], H_C]. Layer evolution: |ψ_{k+1}⟩ = e^{−iΔt β_k H_M} e^{−iΔt H_C} |ψ_k⟩.
Empirical scaling law (Section IV): the optimal Δt, located by sweeping the interval [0.1, 1.0] at resolution 0.001 and averaged over 20 random 3-regular graphs per size, obeys Δt ≈ 0.4984 · n^−0.5110 ≈ 1/(2√n).
Transfer protocol (Section IV): for each (n_train, n_target) with n_train ≤ n_target, evaluate all 20×20=400 pairs, applying (Δt, β_k) from the source's optimum to the target's H_C; report mean approximation ratio.
Headline empirical result (Section V, Figure 3): transferring from n_train=6 yields a near-flat approximation-ratio curve in n_target, while native optimisation degrades steadily — at large n_target, the transferred-from-6 curve is well above native.
Mechanism (Section V): at n=6 the optimum sits near Δt≈0.20; at n=24 native picks Δt≈0.115. The larger borrowed Δt drives more state motion per layer, and the second-order feedback law's enhanced stability permits this without divergence.
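
The fitted scaling law quoted in the list above is easy to tabulate against its rounded form 1/(2√n). The following is a minimal sketch using only the two fit constants reported in this digest; the function names `fitted_dt` and `rounded_dt` are hypothetical and introduced here for illustration.

```python
import math

# Regression constants as quoted in the digest (Figure 1 fit).
FIT_PREFACTOR = 0.4984
FIT_EXPONENT = -0.5110


def fitted_dt(n: int) -> float:
    """Time step predicted by the fitted power law for an n-node graph."""
    return FIT_PREFACTOR * n ** FIT_EXPONENT


def rounded_dt(n: int) -> float:
    """The 1/(2*sqrt(n)) approximation of the same law."""
    return 1.0 / (2.0 * math.sqrt(n))


# Tabulate both forms over the graph sizes used in the study.
for n in range(6, 25, 2):
    print(f"n={n:2d}  fit={fitted_dt(n):.4f}  rounded={rounded_dt(n):.4f}")
```

At n=6 the fit gives roughly 0.20, matching the quoted small-instance optimum. Extrapolated to n=24 it gives roughly 0.098, somewhat below the ≈0.115 native value quoted above; the digest does not say which graph sizes the regression covers, so that gap is left unreconciled here.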

Detailed walkthrough

Setup. Section II frames Max-Cut canonically as ground-state preparation of H_C = −½ Σ_{(i,j)∈E} (1 − Z_i Z_j) with the standard transverse-field mixer H_M = Σ_i X_i. FALQON in its original first-order form picks β_k = −⟨A⟩_k with A = i[H_M, H_C]; this guarantees a monotone decrease of ⟨H_C⟩ in the small-Δt limit but at the cost of a depth that grows as 1/Δt for fixed accuracy. Second-order FALQON (Arai et al. 2025, cited as [arai_scalable_2025]) augments the feedback law with ⟨B⟩_k and ⟨C⟩_k nested-commutator terms, giving β_k = −(⟨A⟩_k + Δt ⟨C⟩_k) / (2 Δt ⟨B⟩_k). The point of the second-order correction is precisely to let the practitioner pick a larger stable Δt, and the transfer experiment in this paper exploits that headroom.
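
One way to see where the second-order law comes from is to expand the post-layer energy to second order in Δt and minimise over β. The derivation below is a reconstruction from the definitions of A, B and C given above, assuming the expectation values are taken in |ψ_k⟩; it is not necessarily the route taken by the authors or by Arai et al. The symbols K and E_k(β) are introduced here for illustration.

```latex
% K := [H_M, H_C] is anti-Hermitian, so A = iK, B = \tfrac12 [K, H_M], C = [K, H_C] are Hermitian.
\begin{align}
E_k(\beta)
  &= \langle\psi_k|\, e^{i\Delta t H_C} e^{i\beta\Delta t H_M}\, H_C\,
     e^{-i\beta\Delta t H_M} e^{-i\Delta t H_C} \,|\psi_k\rangle \\
  &= \langle H_C\rangle_k
     + \beta\,\Delta t\,\bigl(\langle A\rangle_k + \Delta t\,\langle C\rangle_k\bigr)
     + \beta^2 \Delta t^2\,\langle B\rangle_k
     + O(\Delta t^3), \\
\frac{\partial E_k}{\partial \beta} = 0
  \;&\Longrightarrow\;
  \beta_k = -\frac{\langle A\rangle_k + \Delta t\,\langle C\rangle_k}{2\,\Delta t\,\langle B\rangle_k}.
\end{align}
```

Two consequences follow from this reading. The stationary point is a minimum only when ⟨B⟩_k > 0, and ⟨B⟩ vanishes in any eigenstate of H_M (including the usual |+⟩ initial state), so a practical implementation presumably needs a seed or fallback rule for the first layer. Dropping the Δt² terms leaves E_k ≈ ⟨H_C⟩_k + β Δt ⟨A⟩_k, for which the first-order choice β_k = −⟨A⟩_k gives the monotone decrease mentioned above.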

Numerical protocol. Section IV describes the scan: for each n ∈ {6, 8, …, 24}, generate 20 random 3-regular graphs (the standard Y1 ensemble); for the training subset n_train ∈ {6, 8, …, 18}, sweep Δt ∈ [0.1, 1.0] at step 0.001, run 16 FALQON layers, and pick the (Δt, {β_k}) that maximises the approximation ratio at layer 16. An early-stopping rule terminates the sweep when divergence is detected. Ground-truth Max-Cut energies for normalising the approximation ratio are computed by classical simulated annealing per graph instance. The averaged optimal Δt vs. n is fitted on log-log axes (Figure 1) and produces the n^−0.511 power law.
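
The native sweep described above can be sketched on a toy instance with a dense statevector simulation. This is a minimal illustration under stated assumptions, not the authors' code: the 6-node prism graph stands in for one random 3-regular graph, the Δt grid is far coarser than 0.001, the exact optimum is found by brute force rather than simulated annealing, the early-stopping rule is omitted, and the fallback to the first-order rule whenever ⟨B⟩ is not safely positive is an assumption of this sketch. All function names are hypothetical, and no claim is made that the toy reproduces the paper's numbers.

```python
import numpy as np

# Hypothetical toy instance: the 6-node prism graph, which is 3-regular.
PRISM_EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)


def kron_all(mats):
    """Kronecker product of a list of matrices."""
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out


def cost_diagonal(n, edges):
    """Diagonal of H_C = -(1/2) sum (1 - Z_i Z_j): minus the cut size per bitstring."""
    z = np.arange(2 ** n)
    diag = np.zeros(2 ** n)
    for i, j in edges:
        diag -= ((z >> i) & 1) ^ ((z >> j) & 1)
    return diag


def mixer(n):
    """Dense H_M = sum_i X_i (only feasible for small n)."""
    return sum(kron_all([X if q == i else I2 for q in range(n)]) for i in range(n))


def mixer_unitary(n, theta):
    """exp(-i * theta * H_M) as a tensor product of single-qubit rotations."""
    return kron_all([np.cos(theta) * I2 - 1j * np.sin(theta) * X] * n)


def run_falqon(diag, n, dt, layers=16, eps=1e-9):
    """Run the feedback loop natively; return (betas, approximation ratio)."""
    hc, hm = np.diag(diag).astype(complex), mixer(n)
    k = hm @ hc - hc @ hm                                      # K = [H_M, H_C]
    ops = (1j * k, 0.5 * (k @ hm - hm @ k), k @ hc - hc @ k)   # A, B, C
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)        # |+>^n
    betas = []
    for _ in range(layers):
        a, b, c = (np.vdot(psi, op @ psi).real for op in ops)
        # Feedback law as quoted in the digest. ASSUMPTION: fall back to the
        # first-order rule beta = -<A> when <B> is not safely positive.
        beta = -(a + dt * c) / (2 * dt * b) if b > eps else -a
        psi = mixer_unitary(n, dt * beta) @ (np.exp(-1j * dt * diag) * psi)
        betas.append(beta)
    energy = np.vdot(psi, diag * psi).real
    return betas, energy / diag.min()


def sweep_dt(diag, n, dts, layers=16):
    """Keep the (dt, betas, ratio) triple with the best final approximation ratio."""
    best = None
    for dt in dts:
        betas, ratio = run_falqon(diag, n, dt, layers)
        if best is None or ratio > best[2]:
            best = (dt, betas, ratio)
    return best


DIAG = cost_diagonal(6, PRISM_EDGES)
best_dt, best_betas, best_ratio = sweep_dt(DIAG, 6, np.arange(0.1, 1.0, 0.05))
print(f"best dt={best_dt:.2f}, ratio after 16 layers={best_ratio:.3f}")
```

Because the dense operators scale as 4^n, this form is only useful for the smallest training sizes; it is meant to make the sweep-and-select logic concrete rather than to be efficient.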

Transfer evaluation. The cross-evaluation matrix in Section IV runs 400 (source-graph × target-graph) trials per (n_train, n_target) cell with n_train ≤ n_target; the cell value is the mean approximation ratio after 16 layers when the source's (Δt, β_k) drive the target's H_C. The full matrix is rendered as the Figure 2 heatmap; the diagonal is the natively-optimised baseline.
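
Replaying a transferred schedule needs no feedback measurements at all: the source's (Δt, β_k) are simply applied to the target's H_C. The sketch below shows one way to compute a single cell of the cross-evaluation matrix. The cube graph target, the placeholder schedules and every function name are assumptions made for illustration; the β values are arbitrary and are not taken from the paper.

```python
import numpy as np

# Hypothetical target: the 8-node cube graph (3-regular and bipartite, so max cut = 12).
CUBE_EDGES = [(a, a ^ (1 << b)) for a in range(8) for b in range(3) if a < a ^ (1 << b)]


def cost_diagonal(n, edges):
    """Diagonal of H_C = -(1/2) sum (1 - Z_i Z_j): minus the cut size per bitstring."""
    z = np.arange(2 ** n)
    diag = np.zeros(2 ** n)
    for i, j in edges:
        diag -= ((z >> i) & 1) ^ ((z >> j) & 1)
    return diag


def apply_mixer(psi, n, theta):
    """Apply exp(-i * theta * sum_i X_i) one qubit at a time."""
    u = np.array([[np.cos(theta), -1j * np.sin(theta)],
                  [-1j * np.sin(theta), np.cos(theta)]])
    psi = psi.reshape((2,) * n)
    for q in range(n):
        psi = np.moveaxis(np.tensordot(u, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)


def replay(diag, n, dt, betas):
    """Approximation ratio on a target graph under a fixed, transferred schedule."""
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)   # |+>^n
    for beta in betas:
        psi = apply_mixer(np.exp(-1j * dt * diag) * psi, n, dt * beta)
    return np.vdot(psi, diag * psi).real / diag.min()


def transfer_cell(schedules, targets):
    """Mean ratio over every (source schedule, target graph) pair of one matrix cell."""
    ratios = [replay(diag, n, dt, betas)
              for dt, betas in schedules
              for n, diag in targets]
    return float(np.mean(ratios))


target = (8, cost_diagonal(8, CUBE_EDGES))
# PLACEHOLDER schedules with arbitrary beta values, standing in for source optima.
placeholder_schedules = [(0.2, [-0.5] * 16), (0.2, [0.3] * 16)]
print(f"cell mean ratio: {transfer_cell(placeholder_schedules, [target]):.3f}")
```

In the paper's protocol each cell would hold 20 source schedules and 20 target graphs, giving the 400 pairs mentioned above. Applying the mixer qubit by qubit keeps memory at one statevector, which matters once the target reaches n=24.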

The non-intuitive finding. Section V is where the paper does its real work. Naïvely, one would expect transfer quality to degrade as the structural mismatch (here, just |n_train − n_target|) grows. Instead, transferred approximation ratios are nearly flat in n_target, while native optimisation degrades. The authors' diagnosis: native optimisation is a victim of its own caution. As n grows, the spectral norm of H_C grows, so to remain inside the second-order stability envelope the native optimiser is pushed to smaller Δt, which is exactly the n^−1/2 law of Figure 1. With l=16 hard-capped, each native run therefore buys progressively less state motion. The schedule transferred from n=6 carries Δt ≈ 0.20, roughly 1.7 times the value the native optimiser selects at n=24 (≈0.115), so the target run completes more useful evolution per layer.
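
A back-of-envelope way to quantify "state motion per layer" is the cumulative cost-Hamiltonian evolution time over the fixed depth. The quantity T below is a proxy introduced here, not a figure of merit from the paper, and it ignores the β-dependent mixer angles; the Δt values are the ones quoted in this digest.

```latex
% T: cumulative evolution time under H_C across l layers (symbol introduced for illustration).
\begin{align}
T &= l\,\Delta t, \\
T_{\text{native}}(n = 24) &\approx 16 \times 0.115 = 1.84, \\
T_{\text{transfer}}(6 \to 24) &\approx 16 \times 0.20 = 3.20, \\
\frac{T_{\text{transfer}}}{T_{\text{native}}} &\approx \frac{0.20}{0.115} \approx 1.74 .
\end{align}
```

On this crude measure the transferred schedule gets about 74% more cost-Hamiltonian evolution out of the same 16 layers.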

Section VI's framing — and a Y1 connection. The authors explicitly call this a paradigm shift: the value of the second-order discretisation was supposed to be the larger Δt it tolerates, but native re-optimisation re-tightens Δt as n grows; transfer recovers the freedom that the second-order scheme was designed to give. This is structurally close to Y1's iterative warm-starting story for QAOA — there the warm start is a measurement-induced bias on |ψ₀⟩, here the warm start is a (Δt, β_k) schedule from a smaller instance. Both move the expensive optimisation off the large-instance critical path and exploit local structural similarity in 3-regular Max-Cut. The honest caveat the authors flag in Section VI — that the success likely depends on uniform local structure (3-regularity) and would not generalise to high-degree-variance graphs — is exactly the limitation that Y1's parameter-concentration / iterative warm-starting story should also be sanity-checked against.

What is and isn't done. No noise simulation: pure noiseless statevector. No hardware execution. No theoretical proof of why transfer works — explicitly listed as future work, alongside (a) extending to larger n, (b) other graph classes / problems, (c) noise resilience, (d) "graph reduction" — i.e., given a large instance, how to algorithmically choose a small representative training graph. The cap at n=24 keeps the entire study in the classically-tractable regime, so the quantum-advantage question is not addressed.

Figures

Figure 1. Average optimal time step (Δt) that maximizes the approximation ratio after 16 layers, plotted on a log-log scale as a function of the 3-regular graph size (n). The monotonically decreasing trend exhibits a strict power-law decay with a fitted regression of Δt ≈ 0.4984 · n^−0.5110, indicating that the time step scales approximately as 1/(2√n).

Figure 2. FALQON parameter transferability matrix. The heatmap displays the average approximation ratio achieved when evaluating 16-layer FALQON circuits using parameters optimized on training graphs (n_train, y-axis) applied to the Hamiltonians of target graphs (n_target, x-axis).

Figure 3. Approximation ratio of natively optimized parameters (solid red line) versus parameters transferred from smaller instances (dashed lines). The native performance decays significantly as the target graph size (n) increases due to the algorithm selecting progressively smaller time steps (Δt) that restrict exploration within the fixed 16-layer limit. In contrast, transferring parameters from smaller graphs (particularly n = 6) significantly outperforms native optimization on large graphs.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography. Bibliography scanned for arXiv IDs 2502.09704, 2304.06915, 2410.16265, 2603.14744, 2510.08292, 2510.11213 and for "Yuan" + (warm-start | quasi-binary | QAOA portfolio | cardinality binary | Goemans-Williamson Pauli | PBR Heron) — no hits in main.tex or main_bibliography.bib.

Overlap with Y1–Y6

Y1 (iterative warm-started QAOA, 3-regular Max-Cut): direct method-and-scope kinship. Both papers attack the same testbed (3-regular Max-Cut, n up to ~24, NISQ-budget circuit depths) and both move the expensive optimisation off the large-instance critical path — Y1 via measurement-based warm-start of |ψ₀⟩, this paper via parameter transfer of (Δt, β_k) from a small training graph. The core mechanism in both cases is exploiting local structural similarity in 3-regular instances. The 1/(2√n) Δt scaling reported here is a useful empirical lever Y1 could potentially absorb when comparing iterations across graph sizes.
Y2 (quasi-binary encoding QAOA, hard mixers, portfolio): shared territory of layered ansätze for combinatorial optimisation, but the constraint structure (hard cardinality / encoding) is absent here — Max-Cut is unconstrained binary optimisation, so the mixer choice is generic.
Y3 (QAOA DGMVP portfolio, layerwise optimisation): Y3's layerwise-optimisation finding has the same flavour — restricting parameter discovery per layer is strictly more robust than naive joint optimisation. This paper's "optimise on small, transfer to large" is a cousin: amortise the optimisation over a smaller workload. Worth cross-citing.
Y4 (Grover + ADMM cardinality-constrained): no overlap. Different algorithmic family (Grover-search, structured feasible spaces).
Y5 (GW SDP via Pauli-sparse Gibbs states): tangential — Y5 attacks Max-Cut's SDP relaxation rather than QAOA-style direct optimisation. The contrast with FALQON's cost/time-step trade-off is a useful framing for the Y5 narrative.
Y6 (PBR on Heron2): no overlap.

Recommended action for Yuan

Cite in any future Y1 follow-up. This is the cleanest contemporary appearance of "transfer/warm-start beats native at fixed depth on 3-regular Max-Cut" outside of Y1's own warm-start story; the n^−1/2 Δt law and the heatmap evidence are useful supporting empirical context for Y1's positioning.
Read deeper before claiming distinct novelty. The authors do not cite Y1, so they may be unaware of it. If Y1's iterative warm-start mechanism subsumes parameter-transfer-from-small-instance as a special case, that is worth documenting. If they are genuinely distinct (Y1's measurement-based warm-start vs. their parameter-schedule transfer), a short comparison paragraph in the next Y1 revision would establish priority cleanly.
Consider an email to the corresponding author (Evandro Chagas Ribeiro da Rosa, UFSC / Eldorado) pointing to Y1. They appear to be unaware of the iterative warm-start literature; the connection would be welcome and may surface a useful collaboration on the "graph reduction" future-work item, which is exactly the question of how to choose a small representative instance for warm-starting.
Adapt the n^−1/2 Δt scaling as a benchmark. If Y1's iterative scheme implicitly chooses an effective Δt, plotting that against n would let Yuan pin down whether warm-starting buys exactly the "stay at the small-n Δt" headroom this paper isolates, or something stronger.
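
For the benchmarking suggestion above, a log-log least-squares fit is enough to extract an exponent from any (n, Δt) series and compare it with −0.511. The sketch below assumes NumPy and uses synthetic stand-in data generated from the digest's quoted fit, not measured values; `fit_power_law` is a hypothetical helper name.

```python
import numpy as np


def fit_power_law(ns, dts):
    """Least-squares fit of dt = a * n**p on log-log axes; returns (a, p)."""
    p, log_a = np.polyfit(np.log(ns), np.log(dts), 1)
    return float(np.exp(log_a)), float(p)


# SYNTHETIC stand-in data: generated from the quoted law, not from any experiment.
ns = np.arange(6, 26, 2, dtype=float)
dts = 0.4984 * ns ** -0.5110

a, p = fit_power_law(ns, dts)
print(f"prefactor a={a:.4f}, exponent p={p:.4f}")
```

Feeding Y1's effective per-iteration Δt (if one can be defined) into the same helper would give a directly comparable exponent; an exponent closer to zero than −0.511 would suggest the warm start retains more step size than native FALQON does.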