Tensor network surrogate models for variational quantum computation

Ryo Watanabe, Dries Sels, Joseph Tindall (Osaka / Boston / Flatiron) · arXiv:2604.20180 · new submission 2026-04-23 · score 8/10

Abstract

We adopt a two-dimensional tensor-network (TN) ansatz to simulate variational quantum algorithms on two-dimensional qubit architectures, demonstrating its capability to accurately simulate deep circuits through the Quantum Approximate Optimization Algorithm (QAOA) applied to Ising spin-glass problems on heavy-hexagonal and square lattices. For heavy-hexagonal problems with up to three-body interactions, parameters trained on small instances and transferred to systems an order of magnitude larger improve the sampled energy distribution only up to intermediate depths, indicating a fundamental limit of parameter concentration as a transfer strategy. By extending the training itself with TN simulations on larger system sizes, we avoid local minima and obtain lower-energy samples. Analyses of entanglement growth and importance sampling show that the simulation remains classically feasible with moderate bond dimension. We find that parameter concentration also persists on square lattices, albeit at substantially higher computational cost to perform reliable sampling. Overall, our TN framework not only provides an efficient and controlled framework for benchmarking variational quantum algorithms on two-dimensional lattices, but also serves as an effective surrogate model for training variational algorithms.

Executive summary

Watanabe, Sels and Tindall push the classical-simulability frontier of QAOA on IBM-native heavy-hex and square lattices to depths p=100 and 127 qubits — a regime previously inaccessible to both state-vector simulators and real hardware. Their tool is a connectivity-aware 2D tensor-network ansatz with belief-propagation truncations and boundary-MPS sampling. The headline scientific finding is that “parameter concentration” (training on small instances, transferring to larger ones) breaks down at intermediate depth; training directly on n′=35 TN simulations (beyond state-vector) is needed to escape local minima at n=127. This is directly relevant to Yuan's Y1 and Y3: the same parameter-concentration/transfer argument that Yuan relies on for layerwise DGMVP optimisation has a ceiling, and this paper characterises where it hits.

Main contribution

They show that (i) a single-site BP-approximated TN ansatz with bond dimension χ=32 faithfully simulates QAOA on heavy-hex IBM lattices to p=100 (with measurable importance-sampling variance and entanglement well below the χ-imposed ceiling), (ii) parameter transfer from n′=16 saturates at moderate depth and cannot reach the global optimum at n=127, (iii) larger training instances (n′=35, themselves optimised inside the TN simulator) do escape local minima and produce lower-energy samples on the 127-qubit target — demonstrating that TN simulations can serve as a training surrogate, not just a forward-simulation benchmark.

Key algorithmic ingredients

Detailed walkthrough

Section II.B fixes the problem class: Ising spin-glasses on (a) heavy-hex lattices with linear/quadratic/cubic couplings uniform on {±1}, and (b) square lattices with only nearest-neighbour quadratic couplings ±1. Both targets are drawn from the IBM benchmark papers (Kim 2023 Nature; Pelofske 2024) specifically to enable head-to-head comparison with hardware runs.
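To make the problem class concrete, here is a minimal sketch of an Ising spin-glass cost function with linear, quadratic and cubic couplings drawn uniformly from {±1}. The 5-site chain, the `edges` and `triples` lists and all function names are illustrative assumptions, not the heavy-hex geometry or the code used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy geometry: a 5-site chain standing in for a lattice fragment.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
# Hypothetical three-body terms: each interior site together with its two neighbours.
triples = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]

def random_instance(rng):
    """Draw linear, quadratic and cubic couplings uniformly from {-1, +1}."""
    h = rng.choice([-1, 1], size=n)
    J = rng.choice([-1, 1], size=len(edges))
    K = rng.choice([-1, 1], size=len(triples))
    return h, J, K

def energy(z, h, J, K):
    """Classical cost of a spin configuration z in {-1, +1}^n."""
    e = int(np.dot(h, z))
    e += sum(int(Jk) * int(z[i] * z[j]) for Jk, (i, j) in zip(J, edges))
    e += sum(int(Kk) * int(z[i] * z[j] * z[k]) for Kk, (i, j, k) in zip(K, triples))
    return e

def brute_force_minimum(h, J, K):
    """Exhaustive ground-state energy; only feasible for tiny n."""
    best = None
    for bits in range(2 ** n):
        z = np.array([1 - 2 * ((bits >> q) & 1) for q in range(n)])
        e = energy(z, h, J, K)
        if best is None or e < best:
            best = e
    return best

h, J, K = random_instance(rng)
print("ground-state energy of the toy instance:", brute_force_minimum(h, J, K))
```

Exhaustive search is only a reference for toy sizes; at n=127 the minimum energies quoted in the figure captions have to come from other methods.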

Section III.A describes parameter optimisation. They train on k=100 problem instances at small n′, take the median of the trained angles across instances, and reuse these on the target n. This is parameter concentration with median aggregation; the interpolation ansatz keeps the BOBYQA search in a C=10-dimensional space even at depth 100.
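The aggregation and interpolation steps can be sketched as follows. This is a sketch under stated assumptions: linear interpolation between control points is one plausible form of the interpolation ansatz, the number of control points per schedule (`C = 5` here) is arbitrary, and the "trained" angles are synthetic stand-ins for optimiser output. The names `expand_schedule` and `aggregate` are hypothetical.

```python
import numpy as np

def expand_schedule(control, p):
    """Linearly interpolate C control values onto p layer angles (assumed scheme)."""
    control = np.asarray(control, dtype=float)
    knots = np.linspace(0.0, 1.0, len(control))
    layers = (np.arange(p) + 0.5) / p  # layer midpoints in (0, 1)
    return np.interp(layers, knots, control)

def aggregate(trained):
    """Element-wise median over instances; `trained` has shape (k, C)."""
    return np.median(np.asarray(trained, dtype=float), axis=0)

rng = np.random.default_rng(0)
k, C, p = 100, 5, 100
# Synthetic stand-in for per-instance optimiser output: a shared ramp plus noise.
ramp = np.linspace(0.1, 0.8, C)
trained = ramp + 0.05 * rng.standard_normal((k, C))

gamma_control = aggregate(trained)                # one control vector for all instances
gamma_layers = expand_schedule(gamma_control, p)  # p angles reused on the larger target
print(gamma_layers.shape)
```

The point of the design is that the optimiser only ever sees the control vector, so the search dimension stays fixed while p grows.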

Section III.B is the TN machinery. They use BP to truncate tensors after each gate: converged BP messages furnish a local environment, and SVD truncation to bond dimension χ keeps the maximum-overlap projection within that environment. For the cubic ZZZ rotations in the heavy-hex cost Hamiltonian, they decompose each rotation into a single-qubit Z rotation and two-qubit CNOTs. Boundary-MPS sampling follows Rudolph's 2025 code; the amplitude and norm MPS ranks Rm and RM are independent knobs, and the importance weight ω = P/Q quantifies faithfulness.
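The ZZZ decomposition can be checked numerically. The sketch below verifies the textbook identity that exp(−iθ Z⊗Z⊗Z) equals a single-qubit Z rotation on the last qubit conjugated by a CNOT parity ladder. Whether the paper uses exactly this gate ordering is an assumption, and the helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
P0 = np.diag([1.0, 0.0])  # projector |0><0|
P1 = np.diag([0.0, 1.0])  # projector |1><1|

def kron(*ops):
    """Tensor product of a sequence of single-qubit operators."""
    out = np.eye(1)
    for op in ops:
        out = np.kron(out, op)
    return out

def cnot(control, target, n=3):
    """CNOT on n qubits, built as |0><0| (x) I + |1><1| (x) X."""
    keep = [I2] * n
    flip = [I2] * n
    keep[control] = P0
    flip[control] = P1
    flip[target] = X
    return kron(*keep) + kron(*flip)

def zzz_rotation(theta):
    """Exact exp(-i theta ZZZ); the generator is diagonal."""
    return np.diag(np.exp(-1j * theta * np.diag(kron(Z, Z, Z))))

def zzz_via_cnots(theta):
    """Write the parity into qubit 2, rotate it about Z, then uncompute."""
    rz = np.diag(np.exp(-1j * theta * np.diag(Z)))  # exp(-i theta Z)
    ladder = cnot(1, 2) @ cnot(0, 2)
    return ladder.conj().T @ kron(I2, I2, rz) @ ladder

print(np.allclose(zzz_rotation(0.37), zzz_via_cnots(0.37)))
```

In a TN simulator the practical benefit is that only one- and two-site gates ever need to be applied and truncated.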

Section IV.A presents the 127-qubit heavy-hex results. Figure 2 (ibm_washington_0) shows the energy histogram at p=10, 50, 100 against the Pelofske p=5 baseline. Deeper schedules clearly drive to lower energies, but there is no further improvement beyond p=50: the parameter transfer from n′=16 has saturated. Panels (b)–(e) of the same figure show importance-weight distributions: means near unity, variances small at p=10 but significantly larger at p=50 and 100. In other words, the TN state preserves the norm, but sampling becomes harder at depth.
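The role of the weight statistics can be illustrated on a toy distribution. In this sketch P is taken to be the target probability and Q the distribution actually sampled from (an assumption about the notation); the 64-outcome distributions, the mixing parameter and the function name `weight_stats` are all hypothetical and unrelated to the boundary-MPS sampler.

```python
import numpy as np

def weight_stats(p_target, q_proposal, n_samples, rng):
    """Sample from Q, then return mean, variance and ESS fraction of w = P/Q."""
    idx = rng.choice(len(q_proposal), size=n_samples, p=q_proposal)
    w = p_target[idx] / q_proposal[idx]
    ess_fraction = w.sum() ** 2 / (w ** 2).sum() / n_samples
    return float(w.mean()), float(w.var()), float(ess_fraction)

rng = np.random.default_rng(0)
dim = 64
p_target = rng.dirichlet(np.ones(dim))  # toy target distribution
uniform = np.full(dim, 1.0 / dim)

# Degrade the proposal step by step by mixing in the uniform distribution.
for mix in (0.0, 0.5, 0.9):
    q = (1.0 - mix) * p_target + mix * uniform
    mean, var, ess = weight_stats(p_target, q, 20000, rng)
    print(f"mix={mix:.1f}  mean={mean:.3f}  var={var:.3f}  ESS fraction={ess:.3f}")
```

A mean near one says the proposal is properly normalised against the target; a growing variance (equivalently a shrinking effective sample size) says more samples are needed for the same statistical quality.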

Section IV.B contains the novel contribution. Since n′=27,35 are beyond state-vector reach, the authors run the BOBYQA loop inside the TN simulator with χ=8. Figure 5 (ibm_washington_0_TN) shows that n′=27-trained parameters look almost identical to n′=16 (because n′=27 already hits optimum and saturates), but n′=35-trained parameters visibly escape the local minimum and produce lower energies at the n=127 target — already at p=10. Critically, the entanglement Scut (panels d–f) is lower for the better-performing n′=35 schedule, which is exactly the self-consistent regime where TN simulation stays cheap.
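The link between entanglement and simulation cost can be made concrete with a small state-vector sketch: the entropy across a cut is computed from the Schmidt values, and a state truncated to χ Schmidt values cannot exceed ln χ. This uses a plain SVD on a dense vector rather than the BP-gauged truncation described above, and the function names are hypothetical.

```python
import numpy as np

def cut_entropy(psi, n_left, n_right):
    """Von Neumann entropy (in nats) across a left/right cut of a pure state."""
    s = np.linalg.svd(psi.reshape(2 ** n_left, 2 ** n_right), compute_uv=False)
    p = s ** 2
    p = p / p.sum()
    p = p[p > 1e-15]  # drop numerical zeros before taking the log
    return float(-(p * np.log(p)).sum())

def truncate(psi, n_left, n_right, chi):
    """Keep the chi largest Schmidt values across the cut and renormalise."""
    u, s, vh = np.linalg.svd(psi.reshape(2 ** n_left, 2 ** n_right),
                             full_matrices=False)
    out = (u[:, :chi] * s[:chi]) @ vh[:chi]
    return out.reshape(-1) / np.linalg.norm(out)

rng = np.random.default_rng(0)
psi = rng.standard_normal(2 ** 8) + 1j * rng.standard_normal(2 ** 8)
psi /= np.linalg.norm(psi)

chi = 4
print("entropy before truncation:", cut_entropy(psi, 4, 4))
print("entropy after truncation: ", cut_entropy(truncate(psi, 4, 4, chi), 4, 4))
print("ceiling ln(chi):          ", np.log(chi))
```

A schedule whose entropy stays far below ln χ loses little in truncation, which is why lower Scut and cheap simulation go together.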

Section IV.C extends to square lattices, where BP fails (short loops) and the norm is not well-preserved. They compensate by normalising ω by its mean and using larger boundary-MPS ranks (RM=64, 98). Parameter concentration still works; accuracy still scales with depth; but the cost per sample balloons, forcing GPU boundary-MPS contractions.
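Normalising ω by its mean is the standard self-normalised importance-sampling step. The sketch below shows why it helps when the norm is not preserved: an unknown overall factor in the weights cancels. Using the normalised weights to reweight a sampled observable is an assumption about how they are applied, and the function name is hypothetical.

```python
import numpy as np

def self_normalised_estimate(values, weights):
    """Weighted mean with weights rescaled so that they average to one."""
    values = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.mean()  # removes any unknown overall normalisation factor
    return float(np.mean(w * values))

energies = [-3.0, -1.0, 2.0]  # toy sampled energies
weights = [0.5, 1.0, 1.5]     # toy importance weights

# Rescaling every weight by the same constant leaves the estimate unchanged.
print(self_normalised_estimate(energies, weights))
print(self_normalised_estimate(energies, [7.0 * w for w in weights]))
```

The price is a small bias at finite sample size, which is one reason larger boundary-MPS ranks are still needed to keep the weights well behaved.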

The two cross-cutting messages for Yuan: (1) Parameter concentration is not a free lunch — there is a ceiling, and it scales with the representativeness of the training instance, not just its size. This is directly relevant to Y3's layerwise-optimisation story. (2) Classical TN simulations can now serve as training surrogates for variational quantum algorithms at scales beyond state-vector reach. That changes the calculus of “is this QAOA experiment useful” in the practical-quantum-advantage sense central to Y3.

Figures

Figure 1. Heavy-hexagonal lattices used in this work: (a) ibmq_guadalupe (n=16), (b) ibm_geneva (n=27), (c) 2×2 grid (n=35), (d) ibm_washington (n=127).
Figure 2. (a) Sampled-energy histogram for the 127-qubit ibm_washington instance (minimum energy −200); baseline (γ*,β*) at p=5 vs optimised schedules at p=10, 50, 100. (b–e) Importance-sampling weights ω for each schedule. χ=32, Rm=χ, RM=1.
Figure 3. Dependence on norm-MPS rank RM∈{4,8,16,32} at p=100 for the same ibm_washington instance; sampling variance drops sharply with RM while the energy histogram shifts only slightly.
Figure 4. Bipartite entanglement entropy Scut along bisecting edges vs normalised circuit step j/p. Shallow schedules grow linearly; deep schedules saturate well below the χ=32 ceiling of ln 32 ≈ 3.5.
Figure 5. (a–c) Sampled-energy histograms for ibm_washington comparing parameters trained on n′=27 vs n′=35 at p=10,50,100. (d–f) Entanglement entropy; n′=35 training gives lower Scut alongside lower energies.
Figure 6. Sampling results for a 6×6 square-lattice Ising spin-glass (ground-state energy −46) with RM=32, 64 at p=10, 25, plus corresponding normalised-weight distributions.
Figure 7. 27-qubit ibm_geneva instance (minimum energy −42); at p=50 the sampling variance drops to 3.56×10⁻⁸, effectively exact.
Figure 8. Bipartite entanglement entropy on the ibm_geneva TN ansatz at p=10, 50 and the baseline (γ*,β*); the p=50 “optimal” schedule drives Scut nearly to zero.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan