Divide-and-Conquer Neural Network Surrogates for Quantum Sampling: Accelerating Markov Chain Monte Carlo in Large-Scale Constrained Optimization Problems

Yuya Kawamata, Yuichiro Nakano, Keisuke Fujii (Osaka / Kyoto / RIKEN) · arXiv:2604.20701 · new submission 2026-04-23 · score 9/10

Abstract

Sampling problems are promising candidates for demonstrating quantum advantage, and one approach known as quantum-enhanced Markov chain Monte Carlo [Layden, D. et al., Nature 619, 282–287 (2023)] uses quantum samples as a proposal distribution to accelerate convergence to a target distribution. On the other hand, many practical problems are large-scale and constrained, making it difficult to construct efficient proposal distributions in classical methods and slowing down MCMC mixing. In this work, we propose a divide-and-conquer neural network surrogate framework for quantum sampling to accelerate MCMC under fixed Hamming weight constraints. Our method divides the interaction graph for an Ising problem into subgraphs, generates samples using QAOA for those subproblems with an XY mixer, and trains neural network surrogates conditioned on the Hamming weight to provide proposal distributions for each subset while preserving the constraint. In numerical experiments of Boltzmann sampling on 3-regular graphs, our method consistently accelerated mixing as the system size N increased, with average improvements in the autocorrelation decay rate constant by speedup factors of about 20.3 and 7.6 over classical pair-flip methods based on nearest-neighbor and non-nearest-neighbor exchanges, respectively. We also applied the method to an MNIST feature mask optimization problem with N=784, obtaining faster energy convergence and a 2.03% higher classification accuracy. These results show that our method enables efficient and scalable MCMC and can outperform classical methods for practical applications on NISQ devices.

Executive summary

Kawamata, Nakano and Fujii combine three threads that each independently matter to Yuan: (i) QAOA with an XY mixer on 3-regular graphs (Y1's benchmark problem + Y2's hard-mixer flavour), (ii) cardinality/Hamming-weight constraints (Y4's regime), and (iii) a divide-and-conquer decomposition followed by a conditional MADE neural-net surrogate that takes the QAOA sample distribution offline — removing the need to run QAOA for every MCMC proposal. The result: 20× / 7× speedups over Kawasaki (pair-flip) dynamics on 3-regular graph Boltzmann sampling up to N=512, and a usable NISQ-practical recipe for MNIST (N=784) feature mask optimisation under Hamming-weight constraint. Score 9/10 because the method stack is exactly the confluence of Y1, Y2 and Y4 — this is the paper most likely to influence Yuan's next draft.

Main contribution

The authors push the Layden “quantum-enhanced MCMC” paradigm to large, constrained systems by (i) decomposing the interaction graph into small blocks (|B|≤16 qubits) whose QAOA can be run on NISQ devices, (ii) training a conditional Masked Autoencoder for Distribution Estimation (MADE) per block, conditioned on Hamming weight, so the same trained network supports any feasible weight, and (iii) using the resulting block-local proposal distribution inside a Metropolis–Hastings chain that preserves the fixed-Hamming-weight constraint globally. The key constraint-preserving ingredient is the XY mixer HM = ½ Σ (XiXj + YiYj), which confines the QAOA evolution to the Hamming-weight-K subspace.

Key algorithmic ingredients

Detailed walkthrough

Sec. II.A sets up QAOA with an XY mixer on a constrained binary Ising problem. The XY mixer is the textbook tool for Y2-style hard-constraint enforcement: it commutes with the total spin projection along Z, so the state stays inside the Hamming-weight-K subspace. Sec. II.B recaps Layden 2023's quantum-enhanced MCMC: use QAOA samples as proposal distribution for Metropolis–Hastings, accept based on the Boltzmann ratio, avoid the exponentially slow mixing of classical local-move chains. The authors flag the cost issue — running QAOA per proposal is prohibitive — and cite Nakano 2025's MADE surrogate approach as the starting point.

Sec. III (Methods, their Fig. 1 concept_algo) lays out the full pipeline: (1) build two types of block partitioning (so any single spin-flip chain touches multiple blocks over time), (2) run QAOA with XY mixer and sample each block, (3) train a conditional MADE per block with Hamming weight as conditioning input, (4) use the MADE to generate MH proposals. The conditional MADE (Fig. 2, cmade) extends standard MADE (Germain 2015) with active/inactive connections gated by the conditioning Hamming weight — in effect, one neural network covers all feasible weights rather than requiring a separate network per K.

Sec. IV.A is the 3-regular-graph Boltzmann-sampling benchmark. They fix QAOA size |B|=16 and scale system size N=16, 32, 64, 128, 256, 512. Fig. 3 (N128B16_mcmc) shows the autocorrelation decay for N=128, with the fit ρ(l) = A exp(−τl) defining the decay rate. Fig. 4 (3regular_system) plots τ vs N: the proposed method maintains a speedup factor of ~20.3 over nearest-neighbour Kawasaki and ~7.6 over long-range Kawasaki uniformly across N — importantly, the gap does not close with N. This is the scaling argument Yuan cares about: fixed-size QAOA (|B|=16, NISQ-friendly) gives super-constant speedup that persists into the large-N regime classical methods bog down in.

Sec. IV.B fixes N=256 and varies QAOA block size |B|=2…16 (Fig. 5, 3regular_block). Below |B|=6 the proposed method is no better than Kawasaki; at |B|≥6 it surpasses both variants; and performance continues to improve monotonically with |B|. The operational message: the QAOA block size is the knob, and for 3-regular graphs the useful regime starts at |B|∼6 and scales gracefully up to the largest tested |B|=16.

Sec. IV.C is the MNIST application. They pose feature-mask selection as a QUBO with N=784 variables under fixed Hamming weight (cardinality) constraint — this is a bona fide Y4-style problem. Fig. 6 (mnist_energy) shows the proposed method drives the best-so-far energy down faster than global Kawasaki; at MCMC step 50, classification accuracy is 2.03% higher than the Kawasaki baseline. This is a concrete practical datum for “QAOA-on-NISQ helps a real ML preprocessing step” — not a contrived benchmark.

The cardinality constraint + XY mixer + block decomposition combo is the structural signal: this paper essentially builds a hybrid MCMC of the kind Y4's ADMM framework would use as the quantum-inner-loop proposal distribution. The 3-regular graph benchmark is Y1's target problem family. The hard-mixer philosophy is Y2's. It is difficult to find a quant-ph paper from this week more concentrated on Yuan's research axes.

Figures

Figure 1
Figure 1. Schematic of the proposed method. Four main components: (1) build two block partitions, (2) run QAOA + sampling per block, (3) train neural-network surrogates, (4) perform MCMC.
Figure 2
Figure 2. Conditional MADE. Thick lines indicate active connections, thin lines indicate absent connections; the Hamming weight conditions which paths are active.
Figure 3
Figure 3. Autocorrelation for N=128 with QAOA size |B|=16. Shaded region = std over 12 MCMC runs; solid = mean; dashed = fit A exp(−τ l). Decay rate τ defines convergence speed.
Figure 4
Figure 4. Decay rate τ vs system size N for 3-regular graphs at fixed QAOA size |B|=16. Larger τ = faster mixing.
Figure 5
Figure 5. Decay rate τ vs QAOA block size |B| for a 3-regular graph at N=256.
Figure 6
Figure 6. MCMC result for MNIST feature selection (N=784). Blue = proposed method; red = global Kawasaki dynamics; black dashed = minimum energy over multiple β values.
Figure 7
Figure 7. Autocorrelation ρ(l) for 3-regular graphs at fixed |B|=16 and N=16, 32, 64, 128, 256, 512 (panels a–f).
Figure 8
Figure 8. Autocorrelation ρ(l) for a 3-regular graph at fixed N=256 and QAOA size |B|=2, 4, 6, 8, 10, 12, 14, 16 (panels a–h).

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan