Quantum End-to-End Learning for Contextual Combinatorial Optimization

Jaehwan Lee, Changhyun Kwon (KAIST / Omelet) · arXiv:2605.20222 · submitted 2026-05-21 · score 8/10 (HIGH)

Abstract

Contextual combinatorial optimization (CCO) plays a critical role in decision-making under uncertainty, yet remains a significant challenge. We present Quantum End-to-End Learning (QEL), the first quantum computing-based end-to-end learning framework for CCO that leverages Quantum Approximate Optimization Algorithms. Inspired by the integration of state preparation and evolution in data re-uploading, we propose a context re-uploading phase-separator that jointly captures the complex relations among contexts, uncertain coefficients, and optimal solutions. This allows a contextual encoder to be seamlessly integrated within a quantum surrogate policy, enabling joint end-to-end training with a stationarity guarantee. Exploiting an optimization-aware structure grounded in physical principles that classical methods cannot readily leverage, our approach demonstrates practicality by directly training on task loss despite the discreteness and nonconvexity, while avoiding calls to NP-hard optimization solvers. QEL empirically achieves competitive performance while requiring substantially fewer parameters than classical benchmarks, highlighting its industrial-level potential for the future quantum era.

Executive summary

QEL is a hybrid quantum framework that fuses QAOA with data re-uploading to handle contextual combinatorial optimization — problems where the cost coefficients y are not known at decision time but a context x is. The novelty is a context re-uploading phase-separator: a context-dependent final Hamiltonian whose Ising weights are produced by a small linear or logistic encoder gw(x), then exponentiated inside each QAOA layer. The authors prove the QEL loss is equivalent to empirical risk minimisation of the task loss (Theorem 3.1) and that a diminishing-step parameter-shift SGD converges to stationarity (Theorem 3.2). Empirically on 16- and 25-variable MaxCut, QAP, and bipartite-matching instances, QEL matches or beats classical PnO baselines (LODL, List-LTR) with one to three orders of magnitude fewer trainable parameters. For Yuan, this is closely adjacent to Y2/Y3 — same QAOA-on-Ising machinery, but with the twist that the cost coefficients are learnt from context rather than supplied as data.

Main contribution

The paper's central technical move is replacing the fixed final Hamiltonian HF = ΣhiZi + ΣyijZiZj with a context-conditioned Hamiltonian ĤF = ΣhiZi + Σgw(xij)ZiZj, where gw is either linear (Eq. 13) or logistic (Eq. 14). Repeated phase-separator applications (the QAOA depth p) double as data re-uploading layers, so a single trainable parameter set φ = {θI, θF, w} encodes the predictor and the variational policy. Theorem 3.1 ("QEL as End-to-End Learning") shows that minimising the variational expectation of HF(y) against the QEL ansatz is exactly the empirical risk minimisation problem of contextual optimisation — eliminating the differentiation-through-solver headache that classical PnO approaches face for discrete decisions.

Key theorems / lemmas / algorithms

Detailed walkthrough

Section 2 (preliminaries) rehearses the variational principle and the adiabatic theorem, then formulates the standard Ising final Hamiltonian (Eq. 11) and observes that any QUBO reduces to it via Zi = 1 − 2xi. Standard QAOA territory — the value-add comes in §3.

Section 3.2 (context re-uploading phase-separator) is the novelty. The authors note an awkward mismatch: QML uses fixed amplitude/kernel encodings; QAOA strictly requires the equal-superposition initial state. Conventional contextual-optimisation pipelines therefore cannot plug an ML predictor into the QAOA circuit. Their fix is to embed the contextual encoder inside the phase-separator. Each phase-separator application exp[−iθFk ĤF(gw(x))] simultaneously (i) injects edge-level context features xij and (ii) advances the variational evolution. Because the angle θFk is bounded by the parameter-shift mechanics, they argue a logistic encoder (Eq. 14) — saturating in [0, w0] — better matches the rotation-angle range than a raw linear encoder. Depth p doubles as the data re-uploading count.

Section 3.3 (training and test-time process) ties the construction to empirical risk minimisation. The proof of Theorem 3.1 is two lines but conceptually important: because HF is diagonal in the computational basis, ⟨ψ|HF(y)|ψ⟩ = Ez∼π[f(z, y)], so the variational principle's lower-bound objective is exactly a policy-gradient-style expected cost. They use full-batch parameter-shift gradients (Schuld 2019; Wierichs 2022). Theorem 3.2 deferes regularity conditions to Appendix F; the key claim is that despite finite-shot stochasticity, diminishing step sizes (Robbins-Monro conditions: Σηt = ∞, Σηt² < ∞) recover both expected-norm and almost-sure stationarity.

Section 4 (experiments) uses three problems at 16 and 25 variables: MaxCut, QAP, and a bipartite-matching variant (BMP) from Elmachtoub & Grigas. Constraints are handled by penalty terms (the authors flag constraint-preserving mixers as future work — this is a key point of contact with Y2). Baselines are LODL (Shah 2022) and List-LTR (Mandi 2022). The headline results (Table 1): on MaxCut-25V, QEL-Log p=4 achieves 4.21% regret with 13 parameters versus LODL's 8.88% with 30,188 parameters — a 3-order-of-magnitude parameter reduction at half the regret. QAP results are mixed (LODL wins on QAP) but the parameter ratio is similar. For BMP-25V (the constrained problem), QEL-Log p=4 hits 60% relative regret versus 80% for LODL, but neither is competitive in absolute terms — penalisation hurts both. Increasing p from 3 to 4 generally lowers regret, consistent with QAOA → adiabatic in the p→∞ limit. QEL-Log dominates QEL-Lin everywhere, suggesting nonlinear context encoders matter.

Section 5 (conclusion) calls out constraint-preserving mixers, higher-order objectives, and real-hardware validation as immediate follow-ups. The authors do not benchmark on noisy simulators or actual NISQ devices — all results are noiseless GPU statevector simulation. For Yuan's portfolio-optimisation lineage (Y3) which extensively studied thermal-relaxation regimes, this is a notable gap.

Figures

Figure 1
Figure 1. Schematic of a parametrized quantum circuit (PQC) and the hybrid algorithm loop. 𝓔 denotes the data encoding unitary, and 𝓡1, …, 𝓡r denote parametrized rotation unitaries.
Figure 2
Figure 2. Architecture of the QEL ansatz extending context re-uploading phase-separators. The contextual encoder gw(x) repeatedly uploads latent information of contexts directly into the context-dependent final Hamiltonian ĤF in the phase-separator.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan