A Quantum Approach to Stochastic Optimization in Insurance Underwriting

Bordelon, Garfinkel, Dixit, Whitehead, Holzbauer, Mijares Vilarino, Maldonado Romo, Mitra, Kumar, Utke (Allstate / IBM Quantum) · arXiv:2605.01169 · submitted 2026-05-05 (announce cycle 2026-05-05) · 40 pages · score 9/10

Abstract

The presence of stochastic elements in combinatorial optimization problems makes them particularly challenging, as such problems quickly become intractable for classical computers even at relatively small sizes. In this work, we propose a novel quantum-classical hybrid scheme for solving a class of stochastic optimization problems known as chance-constrained knapsack problems, in which item weights follow probability distributions and constraints may be violated within a specified risk tolerance. Our method employs knapsack-specific QAOA-based circuits to generate samples which, when combined with a self-consistent classical recovery scheme introduced in this work, produce high-quality solutions. Experiments carried out on IBM Heron processors, using circuits with depths up to 177 and comprising 3443 gates acting on as many as 150 qubits, yield solutions that indicate improvement over classical optimization schemes. The proposed quantum-classical scheme paves the way to tackling such problems, with the potential to outperform approaches that rely solely on classical computation.

Executive summary

This is a 150-qubit utility-scale demonstration on IBM's Heron processors of QAOA-style sampling for a constrained binary optimization problem with stochastic constraints — chance-constrained knapsack (CCKP). The pipeline combines a knapsack-specific QAOA ansatz inspired by the Lagrangian-Dual Discretized Adiabatic scheme, CVaR-based variational training, parameter transfer from small surrogate problems, and a novel self-consistent classical recovery loop. At n=150 with 5 constraints, the quantum scheme reaches 93.1% of Gurobi's best while the strongest classical heuristic only manages 55.8%; at n=100 with 20 constraints the gap is 95.7% vs 41.86%. For Yuan, this is the closest May-cycle paper to Y2/Y3/Y4: it does end-to-end portfolio-style binary optimization with constraint-handling QAOA on the same superconducting hardware family used in Y6, with hybrid quantum-classical recovery analogous to Y4's ADMM workflow.

Main contribution

The paper extends QAOA-based combinatorial optimization to stochastic constraints. Theorem 1 reformulates the probabilistic feasibility condition into a deterministic second-order cone constraint under closed-under-convolution distributions. The quantum circuit is a Trotterized LD-DAQC ansatz with both X and XX mixer terms applied along a linear chain (well-matched to IBM heavy-hex). Three innovations: (i) a CVaR upper-tail loss with quadratic feasibility penalty λ; (ii) a Hungarian-algorithm parameter-transfer scheme that aligns constraints by a 4-d feature vector (mean weight, weight density, mean σ, capacity-utilization ratio) so trained parameters from a small instance warm-start a large one; (iii) a self-consistent recovery loop with two-phase per-sample bit-flip repair (feasibility restoration via "relief" maximization, then greedy improvement) and softmax-weighted probability vector updates batched over K mini-batches.

Key theorems and algorithms

Theorem 1 (deterministic reformulation): for weights drawn from a distribution closed under convolution with strictly increasing CDF Φ, the chance constraint P(Σ w̃ x ≤ C) ≥ 1−ε is equivalent to Σ μ x + Φ⁻¹(1−ε)·√(Σ σ² x) ≤ C (a second-order cone constraint).
Algorithm 1 (LD-DAQC ansatz): per-layer Hamiltonians H_problem = Σ (α_p v_i − Σ_m β_p^(m) μ_i^(m)) Z_i and H_mixer = −½ Σ γ_p X_i − ½ Σ γ_p X_iX_i+1, with the X+XX mixer enabling Hamming-weight transitions Δk ∈ {±1, 0, ±2}.
Two-state toy lemma (Sect. 2.3.1): on the {|10⟩, |01⟩} edge, the optimal saturation of the feasibility-throttled mixer is sin(2γ⋆) sin(δ⋆) = min(1, 2r̂/(μ_j−μ_j+1)) — three regimes: full transfer, throttled, and reversed.
Full-chain probability bound (Eq. 16): P(T) ≈ |ω_T|² (1 + 2γ Σ_{(j,j′)∈boundary} sin(2(φ_j − φ_j′))) — the XX-chain acts as a bank of parallel reduced-cost comparators.
Algorithm 2 (parameter transfer): Hungarian algorithm on a constraint-feature distance matrix to permute β-parameters before transfer from n_src qubits to n_tgt qubits.
Algorithm 3 (self-consistent recovery): Phase 1 maximizes "relief"_i/max(v_i, δ) + η(1 − p_i) to remove items from infeasible samples; Phase 2 greedily adds items in decreasing v_i order while feasible. Outer loop updates a selection-probability vector p via softmax-weighted, batched empirical averages.

Detailed walkthrough

The chance-constrained knapsack maximizes Σ v_i x_i subject to P(Σ w̃_i^(m) x_i ≤ C^(m)) ≥ 1−ε. Theorem 1 turns this into a tractable SOCP for stable distributions (Normal, Cauchy, Lévy); the paper restricts to Normal weights. The classical benchmarking phase (Sect. 2.2) sweeps 3,900 instances over n∈{20…150}, M∈{1…100}, μ_max∈{5…100}, σ²∈{0.5…3}, and constraint density / capacity ratios — fixing v_max=50, μ_max=5, r_C=0.25, r_D=0.5 as the hardest configuration. Gurobi proves optimality within 180 s for 100% of n=20 instances but only 36% of n=150 instances; for the largest instances Gurobi runs were extended up to 30 hours to obtain the "best" reference.

The variational circuit (Sect. 2.3) lays Z-rotations weighted by α_p v_i − Σ β_p^(m) μ_i^(m) on each qubit, followed by alternating X and XX rotations along a linear chain. The two-state toy analysis (Sect. 2.3.1) derives three operating regimes parameterized by the marginal residual r̂: when capacity is loose, the optimizer can saturate the bound at sin(2γ) sin(δ)=1; when tight, capacity throttles transfer; when violated, the optimizer is forced to reverse amplitude flow toward the lighter item. Section 2.3.2 generalizes this to the full chain and shows that |ω_T|² is boosted by a sum over boundary edges of "reduced-cost" terms sin(2(φ_j−φ_j′)) — a clean mechanistic explanation of why this ansatz concentrates amplitude on near-feasible high-value subsets.

The variational training uses CVaR with the upper-tail (most adverse outcomes), λ set to Σ v_i as a worst-case penalty bound, and COBYLA as the classical optimizer. To address barren plateaus (cited Larocca 2025, McClean 2018), training is restricted to small instances and parameters are warm-transferred to large instances via the Hungarian-aligned scheme. MPS simulation with bond dimension 512 covers up to n=150 before becoming prohibitive.

Hardware experiments use ibm_fez and ibm_pittsburgh (Heron processors) with 2¹⁵ shots, transpile optimization level 3, dynamical decoupling with the XX sequence. The headline circuit at n=150 has p=7 layers, total depth 177, 161 two-qubit depth, 3443 gates.

Section 2.4 introduces the recovery loop, drawing inspiration from Robledo-Moreno 2025's quantum-chemistry configuration-recovery scheme. Phase 1 removes items via a "relief" score that combines per-unit-profit relief with a low-frequency-item bonus (η weighted by 1−p_i); Phase 2 greedily adds back items in decreasing v_i order, checking the non-separable square-root term in the SOCP residual at each candidate. The outer loop updates p^new via softmax-weighted empirical means averaged over K random batches, terminating when ‖p^new−p‖₁/n < ε_conv.

Results (Sect. 3): for n ≤ 50, both quantum and classical recover Gurobi's optimum (κ=100%). At n=75, quantum reaches 97.8% vs 92.09% (best classical heuristic, Tabu Search). The largest divergence is at n=100 with M=20: quantum 95.7%, best classical 41.86% — a 54-point gap. Only at n=100 with M=5 does Parallel Tempering edge out quantum (99.74% vs 96.4%). The CDF analysis (Fig. p_dependence) shows that increasing layer depth p from 1 to 4 raises feasible-sample probability from 11.5%±2.8% to 25.4%±13.3%, with the optimal solution itself appearing at probability ~10⁻³ at p=4. The recovery loop (Fig. cdf_recovered) does most of its work in the first iteration and reaches full SOCP-feasibility after 3 loops.

Figures

QAOA ansatz — Figure 1. QAOA-based variational quantum circuit with P layers and three variational parameters θ=(α, β, γ) ∈ ℝ^(P×3). The circuit contains parametrized single-qubit X and two-qubit XX rotations where two-qubit gates are arranged over a line. The phase separation block applies diagonal R_Z(2(α_p v_i − Σ_m β_p^(m) μ_i^(m))) gates, where layers are indexed by p=1,…,P.

Gurobi gap evolution — Figure 2. Gurobi optimality gap evolution over time for different problem instances when Gurobi runtime is capped at 30 minutes. The plot shows gap convergence on a log-log scale; colours represent problem sizes (n) and markers indicate constraint count (M).

Classical solver comparison — Figure 3. Final solution quality comparison across all classical solvers after 1800 s for different problem instances. y-axis: closeness κ to Gurobi's best as a percentage; 100% (green dashed line) indicates matching Gurobi. Tabu Search result is missing for n=100, M=20 because it failed to find any feasible solution.

Layer-depth dependence of feasibility — Figure 4. Empirical CDF with shaded confidence regions for feasible region (L < 0). Inset (log y-axis) zooms into the near-optimal region (−120 ≤ L ≤ −80). Vertical dashed line at L=−114 marks the optimal; dotted at L=0 the feasibility threshold. Increasing p concentrates probability mass on feasible solutions.

CDF evolution iter — Figure 5. Empirical CDF of penalized profit for a 100-item, 5-constraint knapsack from hardware-sampled circuits. Comparison of a Hadamard baseline and increasing fine-tuning iterations following parameter transfer from a 50-item, 5-constraint knapsack. Successive fine-tuning shifts the distribution rightward; this saturates around 500 iterations.

CDF after recovery iterations — Figure 6. Empirical distribution of feasibility-aware profit before and after applying recovery-algorithm iterations. Profit is penalized by the unscaled worst-case SOCP violation. The dominant transformation occurs in the first recovery loop; after three loops, all bitstrings are SOCP-feasible.

Bitstring surface before recovery — Figure 7. SOCP surface plot (before recovery) of bitstrings for a 100-item, 5-constraint instance: profit vs worst SOCP residual; colour bar = observed sampling probability. Blue line = classical-best profit; intersection with r*=0 outlines the SOCP-feasible zone.

Bitstring surface after recovery — Figure 8. SOCP surface plot (after recovery): probability mass coalesces near the intersection of low constraint residuals and the classical-best profit. Nearly all bitstrings become CCKPS-feasible.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Y2 (quasi-binary QAOA / portfolio): Strong method overlap. Both papers attack constrained binary optimization with QAOA, both use CVaR-QAOA, both apply iterative refinement. The differences: Y2's quasi-binary encoding uses ~2log₂(U−L+1) qubits per variable with hard constraint-preserving mixers (no penalty); Bordelon et al. use a one-hot binary encoding with X+XX mixer plus a quadratic feasibility penalty in the loss — they accept some constraint violation and rely on classical recovery. Yuan's mixer is hard-feasibility, theirs is soft.
Y3 (QAOA DGMVP portfolio): Both are end-to-end QAOA pipelines for portfolio-style discrete optimization on real hardware. Y3 found that thermal relaxation precludes quantum advantage and that the favourable scaling is in the shot-noise regime; Bordelon et al. claim quantitative outperformance vs classical heuristics at n≥75 on IBM Heron, but the comparison is against heuristic solvers rather than provably optimal Gurobi for large n. The CCKP setting is novel relative to Y3.
Y4 (Grover + ADMM cardinality-constrained BO): Method-level parallel — both papers use a hybrid quantum-classical decomposition where the quantum subroutine generates samples and a classical post-processing step refines them. Y4 has provable ε-approximation guarantees via ADMM; Bordelon et al.'s recovery loop is heuristic (greedy bit-flip with self-consistent probability updates). The cardinality constraint in Y4 is exact; here CCKP is a probabilistic generalization of the multi-knapsack.
Y6 (PBR test on Heron2): Same hardware family — both papers run on IBM Heron processors. Their utility-scale circuits at n=150, depth 177, 3443 gates push hardware in the same regime Y6 stresses with PBR violation tests across 156 qubits.
Y1 (warm-started iterative QAOA): Adjacent. Both use iterative warm-starting; Y1's measurement-based warm-start improves 3-regular MaxCut while Bordelon et al. transfer optimized parameters from small instances via Hungarian-matched constraint features. Different problem class.
Y5 (GW SDP via Gibbs / Pauli sparsity): Tangential.

Recommended action for Yuan

Cite in next QAOA-portfolio draft. This is currently the strongest 2026 utility-scale benchmark for constrained QAOA; it directly invites comparison with Y2's hard-mixer / quasi-binary approach. Worth a bench paragraph in any Y2/Y3 follow-up.
Read Sect. 2.3.1–2.3.2 carefully. The two-state and full-chain analyses give a clean mechanistic picture of how an X+XX mixer redistributes amplitude under feasibility constraints; that machinery may transfer to Y2's quasi-binary mixer analysis where similar Hamming-weight redistribution arguments apply.
Compare quantitatively against Y3's DGMVP scaling. Bordelon et al. claim 95.7% closeness at n=100 with M=20 — running the same instance set through Y3's QAOA-DGMVP pipeline (or Y2's quasi-binary CVaR-QAOA) would provide a direct head-to-head and could be a quick-turnaround benchmark study.
Possibly contact the IBM Quantum co-authors (Kumar, Mijares Vilarino, Maldonado Romo). If the source code is released as promised, integrating Yuan's hard-mixer baseline into their CCKP pipeline would be a natural collaboration.