Benchmarking Quantum Annealing for a Greenhouse-Inspired Control QUBO

Hamzeh Alavirad, Maryam Bahrami Zanjani (QUSMOS) · arXiv:2605.27670 · submitted 2026-05-28 · score 8/10

Abstract

We benchmark current annealing-based optimization workflows on a greenhouse-inspired quadratic unconstrained binary optimization problem for binary heater scheduling, where the horizon H denotes the number of hourly control decisions. For the main one-day instance (H=24), all solver outputs are decoded back into heater schedules and evaluated in the original greenhouse simulator using the same physical objective and feasibility criterion. Classical simulated annealing and path-integral simulated quantum annealing produce feasible near-optimal solutions in all repetitions, with best objectives close to the exact optimum. In contrast, the tested D-Wave Leap Hybrid BQM workflow is less reliable and does not outperform the classical baselines under 15–60 s requested time limits. Direct D-Wave QPU execution on reduced instances remains feasible in all runs and recovers the exact optimum for H=10 and H=12, but the exact-hit rate drops from 5/10 to 2/10 and then to 0/10 at H=14, with substantially higher variance than the classical baselines. The results do not indicate quantum advantage, but provide a reproducible, physically decoded benchmark that exposes the current strengths and limitations of classical, hybrid, and direct quantum annealing workflows on structured control QUBOs.

Executive summary

This is exactly the kind of negative-result QUBO benchmark Yuan’s Y3 paper builds its case on. A structured combinatorial control problem (binary heater scheduling over a 24-hour horizon) is mapped to a 312-variable BQM and compared against an exact ground truth and against classical simulated annealing (SA) and path-integral simulated quantum annealing (PIA). The full H=24 instance is run through Leap Hybrid BQM, while direct D-Wave QPU execution (Advantage2_system1) is limited to reduced horizons (H=10, 12, 14). Across every comparison the classical baselines win on mean decoded objective, and the QPU’s exact-hit rate degrades from 5/10 at H=10 to 0/10 at H=14. The paper makes no quantum-advantage claim and explicitly contextualises its findings as a reproducible, physically-decoded benchmark. For Yuan, the relevance is direct: it is independent confirmation that, on a structured constrained-binary-optimisation QUBO at the scales currently reachable on real hardware, simulated annealing remains a strong baseline that hybrid/QPU workflows fail to beat. This is the same headline conclusion as Y3’s thermal-regime DGMVP portfolio result.

Main contribution

The contribution is twofold. (1) A reproducible greenhouse-inspired control QUBO benchmark with a two-state thermal model, exogenous solar/outdoor-temperature/electricity-price forcing, a quadratic growth proxy, hard temperature-bound constraints encoded via binary slack variables, and a switching-penalty term. The H=24 instance yields a 312-variable BQM with 4 596 quadratic couplers (24 control bits + 288 slack bits, with M=6 slack bits per bound per step). (2) A clean head-to-head comparison of exact enumeration, SA, PIA, Leap Hybrid BQM (15/30/60 s), and direct Advantage2 QPU execution at reduced horizons. Every QUBO-based method is evaluated not on raw BQM energy but on the decoded simulator-side objective $J_{\text{total}} = G_{\text{tot}} - J_E - J_S$ (total growth minus energy cost minus switching cost), obtained by re-rolling the schedule through the original simulator and checking temperature-bound feasibility separately. This is an explicit separation of internal-energy minimisation from decoded physical performance, two things that prior single-objective annealing benchmarks routinely conflate.
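The variable count and the role of the slack bits can be checked with a short sketch. The counts (H=24, M=6, two bounds per step) and the penalty weight 120 come from the text; the binary-weighted slack encoding, the 0.25 °C slack resolution, and all function names are assumptions made for illustration and may differ from the paper's actual construction.

```python
import itertools

H = 24        # hourly control decisions (from the text)
M = 6         # slack bits per bound per step (from the text)
BOUNDS = 2    # lower and upper temperature bound

n_control = H
n_slack = H * BOUNDS * M
n_total = n_control + n_slack   # 24 + 288 = 312


def slack_value(bits, step):
    """Binary-weighted slack: step * sum(2^m * b_m). Encoding is assumed."""
    return step * sum(b << m for m, b in enumerate(bits))


def upper_bound_penalty(temp, t_max, bits, step, weight):
    """Quadratic penalty weight * (temp + slack - t_max)^2 for temp <= t_max."""
    return weight * (temp + slack_value(bits, step) - t_max) ** 2


def best_penalty(temp, t_max, step, weight, m_bits=M):
    """Smallest penalty over all slack assignments (what an ideal solver finds)."""
    return min(
        upper_bound_penalty(temp, t_max, bits, step, weight)
        for bits in itertools.product((0, 1), repeat=m_bits)
    )


print(n_control, n_slack, n_total)
print(best_penalty(22.0, 26.0, step=0.25, weight=120.0))  # bound satisfied
print(best_penalty(27.0, 26.0, step=0.25, weight=120.0))  # bound violated
```

With this encoding a temperature inside the bound can be absorbed by some slack assignment at zero penalty, while a violation of 1 °C always costs at least the penalty weight. Slack resolution limits how exactly an in-bound temperature can be absorbed, which is one way internal BQM energy and decoded feasibility can drift apart.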

Key algorithms / settings

Detailed walkthrough

The benchmark is structured as a single-actuator (binary heater) model-predictive-control problem over a 24-hour horizon at $\Delta t = 1$ h. The plant is a discrete-time linear two-state thermal system, with air temperature $T_{\text{air},k}$ and a slower body/canopy temperature $T_{\text{body},k}$, driven by outdoor temperature, a half-sine solar radiation profile, and the heater. A growth proxy $g_k = L_k\,(g_{\max} - \eta\,(T_{\text{grow},k} - T^{*})^2)$ penalises departures from a preferred growth temperature $T^{*} = 22$°C, scaled by a saturating light factor $L_k$. Energy use is metered as $E_k = P_{\text{heater}}\,u_k\,\Delta t$, electricity cost as $C_k = \lambda_E\,p_k\,E_k$, switching cost is $\lambda_S$ per heater toggle, and the temperature trajectory is constrained to 16–26°C.
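A minimal simulator sketch shows how a heater schedule is turned into the decoded objective. The formulas for growth, energy, cost, and switching follow the description above, and the 100 kW heater rating is inferred from the reported 1 100 kWh over 11 on-intervals. All thermal coefficients, the forcing profiles, the light-saturation constant, the weights, the choice of growth temperature as the mean of air and body temperature, and the use of air temperature for the feasibility check are hypothetical placeholders, not values from the paper.

```python
import math

DT = 1.0                                   # hours per step (from the text)
P_HEATER = 100.0                           # kW, inferred from 1 100 kWh / 11 steps
T_STAR, T_MIN, T_MAX = 22.0, 16.0, 26.0    # deg C (from the text)
# Everything below is a hypothetical placeholder, not a value from the paper.
G_MAX, ETA = 2.0, 0.05
LAMBDA_E, LAMBDA_S = 0.5, 1.0
K_OUT, K_BODY, K_SOL, K_HEAT, K_SLOW = 0.15, 0.10, 0.004, 0.03, 0.20


def forcing(k):
    """Assumed outdoor temperature, half-sine solar profile, and price tariff."""
    hour = k % 24
    t_out = 8.0 + 4.0 * math.sin(2.0 * math.pi * (hour - 9) / 24.0)
    solar = max(0.0, 600.0 * math.sin(math.pi * (hour - 6) / 12.0))
    price = 0.30 if 7 <= hour < 21 else 0.15
    return t_out, solar, price


def evaluate(u, t_air=18.0, t_body=18.0):
    """Roll a binary schedule through the toy plant and score it."""
    growth = energy = cost = 0.0
    feasible = True
    for k, uk in enumerate(u):
        t_out, solar, price = forcing(k)
        light = solar / (solar + 200.0)              # saturating light factor
        t_grow = 0.5 * (t_air + t_body)              # assumed definition
        growth += light * (G_MAX - ETA * (t_grow - T_STAR) ** 2)
        e_k = P_HEATER * uk * DT                     # E_k = P_heater u_k dt
        energy += e_k
        cost += LAMBDA_E * price * e_k               # C_k = lambda_E p_k E_k
        d_air = (K_OUT * (t_out - t_air) + K_BODY * (t_body - t_air)
                 + K_SOL * solar + K_HEAT * P_HEATER * uk)
        d_body = K_SLOW * (t_air - t_body)
        t_air, t_body = t_air + DT * d_air, t_body + DT * d_body
        feasible = feasible and (T_MIN <= t_air <= T_MAX)
    switches = sum(a != b for a, b in zip(u, u[1:]))
    j_total = growth - cost - LAMBDA_S * switches    # G_tot - J_E - J_S
    return {"j_total": j_total, "growth": growth, "cost": cost,
            "energy_kwh": energy, "switches": switches, "feasible": feasible}


print(evaluate([1] * 11 + [0] * 13))
```

Because the placeholder coefficients are not calibrated, the printed objective and feasibility flag say nothing about the paper's numbers; the sketch only illustrates the structure of the decoded evaluation.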

Section 3 unrolls the discrete-time dynamics, writes the state $x_k$ as an affine function of the binary heater sequence (a control-independent offset at step $k$ plus $\sum_t \Gamma_{k,t}\,u_t$), and substitutes into the growth and feasibility expressions. Because $T_{\text{grow},k}$ is affine in $u$, the squared growth deviation is quadratic in $u$; the energy and switching terms are linear-quadratic; and the bounds enter as two one-sided quadratic penalties with lower- and upper-slack binary registers. The authors note explicitly (and somewhat unusually for QUBO benchmark papers) that they use both lower and upper slacks even though only one is required, doubling the slack footprint and the coupler-graph density. This design choice is called out as a likely contributor to the dense BQM and is flagged for future improvement.
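The affine rollout can be illustrated with a small numerical check: for a linear plant, the state at step k equals the heater-off response plus a weighted sum of the past control bits. The matrices, disturbance, and initial state below are hypothetical, and the names `free` and `gamma` are illustrative labels for the offset and the Γ coefficients, not the paper's notation.

```python
# Hypothetical two-state linear plant: x_{k+1} = A x_k + d_k + B u_k
A = [[0.75, 0.10], [0.20, 0.80]]
B = [3.0, 0.0]


def mat_vec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def disturbance(k):
    """Assumed exogenous forcing (stands in for outdoor temperature and solar)."""
    return [1.2 + 0.1 * k, 0.0]


def rollout(u, x0):
    """Direct step-by-step simulation; returns x_0 .. x_H."""
    xs = [list(x0)]
    for k, uk in enumerate(u):
        ax = mat_vec(A, xs[-1])
        d = disturbance(k)
        xs.append([ax[0] + d[0] + B[0] * uk, ax[1] + d[1] + B[1] * uk])
    return xs


def affine_terms(horizon, x0):
    """Offset (heater-off response) and Gamma[k][t] = A^(k-1-t) B for t < k."""
    free = rollout([0] * horizon, x0)
    gamma = [[[0.0, 0.0] for _ in range(horizon)] for _ in range(horizon + 1)]
    for t in range(horizon):
        col = list(B)                      # effect of u_t on x_{t+1}
        for k in range(t + 1, horizon + 1):
            gamma[k][t] = col
            col = mat_vec(A, col)          # propagate one more step
    return free, gamma


def affine_state(u, free, gamma, k):
    """x_k reconstructed as offset + sum_t Gamma[k][t] * u_t."""
    x = list(free[k])
    for t, ut in enumerate(u):
        x[0] += gamma[k][t][0] * ut
        x[1] += gamma[k][t][1] * ut
    return x


u = [1, 0, 0, 1, 1, 0, 1, 0]
x0 = [18.0, 18.0]
free, gamma = affine_terms(len(u), x0)
direct = rollout(u, x0)
max_err = max(
    abs(affine_state(u, free, gamma, k)[i] - direct[k][i])
    for k in range(len(u) + 1) for i in range(2)
)
print("max reconstruction error:", max_err)
```

Since every state component is affine in the bits, any squared deviation of a temperature from a target or a bound expands into linear and pairwise terms in u, which is what makes the problem expressible as a QUBO. It also explains the dense coupling: each step's temperature depends on all earlier heater bits.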

Section 4 spells out the solver protocol with a level of reproducibility detail that is rare for D-Wave benchmarks: explicit dwave-system v1.34.0, dimod v0.12.21, dwave-samplers v1.7.0 versions; explicit annealing time (20 µs); the solver Advantage2_system1; the hybrid solver tag hybrid_binary_quadratic_model_version2p. The paper also flags the limitation that EmbeddingComposite’s default chain strength was used and chain-break statistics were not recorded — an honest disclaimer that the QPU degradation at H=14 cannot be cleanly attributed to logical-problem difficulty vs. embedding quality.

Section 5’s results are the headline. At H=24, exact enumeration ($2^{24} \approx 17$M schedules, 1 028 s wall clock) gives J=−145.14. SA achieves −150.11 best and −161.60 mean in 31.84 s; PIA is comparable in 3.6 s (best −149.11, mean −169.94). The Leap Hybrid BQM at 30 s is markedly worse: best −174.32, mean −203.84, and only 7 of 10 runs return a feasible decoded schedule. Increasing the requested hybrid time to 60 s actually decreases the feasible-run rate to 2/10 (Table 3). The authors carefully caveat that this non-monotonicity could be a sample-size artefact (10 submissions per setting) and that the hybrid solver is not designed for H=24 problem sizes, which is an important and honest framing. Crucially, when the hybrid solution is feasible, it tends to over-heat (the best hybrid schedule uses 1 300 kWh over 13 heater-on intervals, vs. 1 100 kWh over 11 for the exact solution), driving growth slightly above exact (41.68 vs. 40.36) but leaving the objective well below it.
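The exact baseline amounts to scoring every binary schedule and keeping the best feasible one. The sketch below shows that pattern on a deliberately tiny one-state toy plant at H=8; the plant, its coefficients, and the function names are hypothetical stand-ins for the paper's simulator, and the objective is maximised, consistent with the reported values where the exact J is the largest.

```python
import itertools


def toy_evaluate(u):
    """Hypothetical one-state plant: returns (objective, feasible)."""
    temp = 18.0
    reward = 0.0
    feasible = True
    for uk in u:
        temp = 0.8 * temp + 0.2 * 10.0 + 4.0 * uk     # relax to 10 C, heater adds 4 C
        feasible = feasible and (16.0 <= temp <= 26.0)
        reward += -0.05 * (temp - 22.0) ** 2 - 0.3 * uk
    switches = sum(a != b for a, b in zip(u, u[1:]))
    return reward - 0.1 * switches, feasible


def exact_optimum(horizon, evaluate):
    """Enumerate all 2^horizon schedules; keep the best feasible objective."""
    best_j, best_u = None, None
    for u in itertools.product((0, 1), repeat=horizon):
        j, feasible = evaluate(u)
        if feasible and (best_j is None or j > best_j):
            best_j, best_u = j, u
    return best_j, best_u


best_j, best_u = exact_optimum(8, toy_evaluate)
print("schedules at H=24:", 2 ** 24)
print("best feasible schedule at H=8:", best_u, best_j)
```

The cost doubles with every added hour, which is why full enumeration is practical as ground truth at H=24 but would not be at the multi-day horizons mentioned as future work.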

On the reduced direct-QPU scan (Section 5, Table 4), the exact-hit rate cleanly degrades with horizon: 5/10 at H=10, 2/10 at H=12, and 0/10 at H=14, while SA and PIA stay essentially at the exact optimum (PIA mean −6.79 ± 0.38 at H=12, −36.74 ± 2.33 at H=14). The QPU results also show a much larger run-to-run spread (±12.71, ±18.07, ±23.38 for H=10, 12, 14) than SA/PIA. Importantly, all QPU samples remain feasible: the degradation is in optimality, not constraint satisfaction. This is a cleaner story than typical D-Wave benchmarks, which often blur feasibility-failure into “quantum failure.”
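The per-horizon statistics quoted above (exact-hit rate, best, mean ± spread) can be computed from the decoded objectives of repeated runs as in the following sketch. The run values are made up for illustration, the tolerance is an assumption, and whether the paper reports a sample or population standard deviation is not stated, so the sample version used here is also an assumption.

```python
import statistics


def summarize(run_objectives, exact_objective, tol=1e-6):
    """Exact-hit count plus best / mean / sample std of decoded objectives."""
    hits = sum(abs(j - exact_objective) <= tol for j in run_objectives)
    return {
        "hits": hits,
        "runs": len(run_objectives),
        "best": max(run_objectives),          # objective is maximised
        "mean": statistics.mean(run_objectives),
        "std": statistics.stdev(run_objectives),
    }


# Hypothetical decoded objectives from four repetitions of one solver.
runs = [-5.0, -5.0, -7.5, -9.0]
summary = summarize(runs, exact_objective=-5.0)
print(f"{summary['hits']}/{summary['runs']} exact hits, "
      f"best {summary['best']}, mean {summary['mean']:.3f} +/- {summary['std']:.3f}")
```

Reporting hits and spread separately from feasibility keeps "missed the optimum" distinct from "violated a constraint", which is the distinction the QPU scan relies on.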

Section 5’s discussion is unusually honest about why the hybrid solver underperforms: penalty scaling (Abound=120 chosen empirically), the dense coupling structure from the rollout dynamics, the mismatch between internal BQM energy and decoded simulator objective, and the doubled-slack representation. Section 6 lists limitations clearly — coarse 1-hour sampling, single binary actuator, no manual chain-strength tuning, no embedding-quality logging — in a way that pre-empts reviewer push-back. Section 7’s conclusion repeats the no-quantum-advantage framing and points to multi-day horizons (H=72, H=168) plus richer multi-actuator models as next steps.

Figures

Toy model overview
Figure 1. Overview of the greenhouse toy-model simulator over the benchmark horizon, with the horizontal axis expressed as clock time. From top to bottom, the panels show: (i) the thermal trajectories $T_{\text{out}}$, $T_{\text{air}}$, and $T_{\text{body}}$ together with the bounds $T_{\min}$, $T_{\max}$, and the reference temperature $T^{*}$; (ii) the prescribed solar radiation and the derived light factor; (iii) the heater control variable $u_k \in \{0,1\}$, shown for both commanded and applied heater signals; (iv) the instantaneous growth proxy and cumulative growth; (v) the interval energy use and cumulative energy consumption; and (vi) the electricity price tariff used to weight energy consumption over the day.
Air temperature trajectories
Figure 2. Best feasible air-temperature trajectories for the H=24 benchmark. Only the best feasible decoded solution of each solver workflow is shown to keep the comparison readable. The admissible interval is defined by $T_{\min} = 16$°C and $T_{\max} = 26$°C.
Body temperature trajectories
Figure 3. Body-temperature trajectories for the best feasible decoded solutions of the H=24 benchmark. The body temperature is smoother than the air temperature and mainly serves as a secondary diagnostic of the two-state thermal model.
Cumulative growth
Figure 4. Cumulative growth proxy for the best feasible decoded solutions of the H=24 benchmark. The figure is provided as an additional diagnostic because the main growth–energy–switching trade-off is reported quantitatively in Table 2.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan