← Back to digest

A Quantum Approach to Stochastic Optimization in Insurance Underwriting

Bordelon, Garfinkel, Dixit, Whitehead, Holzbauer, Mijares Vilarino, Maldonado Romo, Mitra, Kumar, Utke (Allstate / IBM Quantum) · arXiv:2605.01169 · submitted 2026-05-05 (announce cycle 2026-05-05) · 40 pages · score 9/10

Abstract

The presence of stochastic elements in combinatorial optimization problems makes them particularly challenging, as such problems quickly become intractable for classical computers even at relatively small sizes. In this work, we propose a novel quantum-classical hybrid scheme for solving a class of stochastic optimization problems known as chance-constrained knapsack problems, in which item weights follow probability distributions and constraints may be violated within a specified risk tolerance. Our method employs knapsack-specific QAOA-based circuits to generate samples which, when combined with a self-consistent classical recovery scheme introduced in this work, produce high-quality solutions. Experiments carried out on IBM Heron processors, using circuits with depths up to 177 and comprising 3443 gates acting on as many as 150 qubits, yield solutions that indicate improvement over classical optimization schemes. The proposed quantum-classical scheme paves the way to tackling such problems, with the potential to outperform approaches that rely solely on classical computation.

Executive summary

This is a 150-qubit utility-scale demonstration on IBM's Heron processors of QAOA-style sampling for a constrained binary optimization problem with stochastic constraints — chance-constrained knapsack (CCKP). The pipeline combines a knapsack-specific QAOA ansatz inspired by the Lagrangian-Dual Discretized Adiabatic scheme, CVaR-based variational training, parameter transfer from small surrogate problems, and a novel self-consistent classical recovery loop. At n=150 with 5 constraints, the quantum scheme reaches 93.1% of Gurobi's best while the strongest classical heuristic only manages 55.8%; at n=100 with 20 constraints the gap is 95.7% vs 41.86%. For Yuan, this is the closest May-cycle paper to Y2/Y3/Y4: it does end-to-end portfolio-style binary optimization with constraint-handling QAOA on the same superconducting hardware family used in Y6, with hybrid quantum-classical recovery analogous to Y4's ADMM workflow.

Main contribution

The paper extends QAOA-based combinatorial optimization to stochastic constraints. Theorem 1 reformulates the probabilistic feasibility condition into a deterministic second-order cone constraint under closed-under-convolution distributions. The quantum circuit is a Trotterized LD-DAQC ansatz with both X and XX mixer terms applied along a linear chain (well-matched to IBM heavy-hex). Three innovations: (i) a CVaR upper-tail loss with quadratic feasibility penalty λ; (ii) a Hungarian-algorithm parameter-transfer scheme that aligns constraints by a 4-d feature vector (mean weight, weight density, mean σ, capacity-utilization ratio) so trained parameters from a small instance warm-start a large one; (iii) a self-consistent recovery loop with two-phase per-sample bit-flip repair (feasibility restoration via "relief" maximization, then greedy improvement) and softmax-weighted probability vector updates batched over K mini-batches.

Key theorems and algorithms

Detailed walkthrough

The chance-constrained knapsack maximizes Σ vi xi subject to Pi(m) xi ≤ C(m)) ≥ 1−ε. Theorem 1 turns this into a tractable SOCP for stable distributions (Normal, Cauchy, Lévy); the paper restricts to Normal weights. The classical benchmarking phase (Sect. 2.2) sweeps 3,900 instances over n∈{20…150}, M∈{1…100}, μmax∈{5…100}, σ²∈{0.5…3}, and constraint density / capacity ratios — fixing vmax=50, μmax=5, rC=0.25, rD=0.5 as the hardest configuration. Gurobi proves optimality within 180 s for 100% of n=20 instances but only 36% of n=150 instances; for the largest instances Gurobi runs were extended up to 30 hours to obtain the "best" reference.

The variational circuit (Sect. 2.3) lays Z-rotations weighted by αp vi − Σ βp(m) μi(m) on each qubit, followed by alternating X and XX rotations along a linear chain. The two-state toy analysis (Sect. 2.3.1) derives three operating regimes parameterized by the marginal residual : when capacity is loose, the optimizer can saturate the bound at sin(2γ) sin(δ)=1; when tight, capacity throttles transfer; when violated, the optimizer is forced to reverse amplitude flow toward the lighter item. Section 2.3.2 generalizes this to the full chain and shows that |ωT|² is boosted by a sum over boundary edges of "reduced-cost" terms sin(2(φj−φj′)) — a clean mechanistic explanation of why this ansatz concentrates amplitude on near-feasible high-value subsets.

The variational training uses CVaR with the upper-tail (most adverse outcomes), λ set to Σ vi as a worst-case penalty bound, and COBYLA as the classical optimizer. To address barren plateaus (cited Larocca 2025, McClean 2018), training is restricted to small instances and parameters are warm-transferred to large instances via the Hungarian-aligned scheme. MPS simulation with bond dimension 512 covers up to n=150 before becoming prohibitive.

Hardware experiments use ibm_fez and ibm_pittsburgh (Heron processors) with 2¹⁵ shots, transpile optimization level 3, dynamical decoupling with the XX sequence. The headline circuit at n=150 has p=7 layers, total depth 177, 161 two-qubit depth, 3443 gates.

Section 2.4 introduces the recovery loop, drawing inspiration from Robledo-Moreno 2025's quantum-chemistry configuration-recovery scheme. Phase 1 removes items via a "relief" score that combines per-unit-profit relief with a low-frequency-item bonus (η weighted by 1−pi); Phase 2 greedily adds back items in decreasing vi order, checking the non-separable square-root term in the SOCP residual at each candidate. The outer loop updates pnew via softmax-weighted empirical means averaged over K random batches, terminating when ‖pnewp‖₁/n < εconv.

Results (Sect. 3): for n ≤ 50, both quantum and classical recover Gurobi's optimum (κ=100%). At n=75, quantum reaches 97.8% vs 92.09% (best classical heuristic, Tabu Search). The largest divergence is at n=100 with M=20: quantum 95.7%, best classical 41.86% — a 54-point gap. Only at n=100 with M=5 does Parallel Tempering edge out quantum (99.74% vs 96.4%). The CDF analysis (Fig. p_dependence) shows that increasing layer depth p from 1 to 4 raises feasible-sample probability from 11.5%±2.8% to 25.4%±13.3%, with the optimal solution itself appearing at probability ~10⁻³ at p=4. The recovery loop (Fig. cdf_recovered) does most of its work in the first iteration and reaches full SOCP-feasibility after 3 loops.

Figures

QAOA ansatz
Figure 1. QAOA-based variational quantum circuit with P layers and three variational parameters θ=(α, β, γ) ∈ ℝ^(P×3). The circuit contains parametrized single-qubit X and two-qubit XX rotations where two-qubit gates are arranged over a line. The phase separation block applies diagonal R_Z(2(α_p v_i − Σ_m β_p^(m) μ_i^(m))) gates, where layers are indexed by p=1,…,P.
Gurobi gap evolution
Figure 2. Gurobi optimality gap evolution over time for different problem instances when Gurobi runtime is capped at 30 minutes. The plot shows gap convergence on a log-log scale; colours represent problem sizes (n) and markers indicate constraint count (M).
Classical solver comparison
Figure 3. Final solution quality comparison across all classical solvers after 1800 s for different problem instances. y-axis: closeness κ to Gurobi's best as a percentage; 100% (green dashed line) indicates matching Gurobi. Tabu Search result is missing for n=100, M=20 because it failed to find any feasible solution.
Layer-depth dependence of feasibility
Figure 4. Empirical CDF with shaded confidence regions for feasible region (L < 0). Inset (log y-axis) zooms into the near-optimal region (−120 ≤ L ≤ −80). Vertical dashed line at L=−114 marks the optimal; dotted at L=0 the feasibility threshold. Increasing p concentrates probability mass on feasible solutions.
CDF evolution iter
Figure 5. Empirical CDF of penalized profit for a 100-item, 5-constraint knapsack from hardware-sampled circuits. Comparison of a Hadamard baseline and increasing fine-tuning iterations following parameter transfer from a 50-item, 5-constraint knapsack. Successive fine-tuning shifts the distribution rightward; this saturates around 500 iterations.
CDF after recovery iterations
Figure 6. Empirical distribution of feasibility-aware profit before and after applying recovery-algorithm iterations. Profit is penalized by the unscaled worst-case SOCP violation. The dominant transformation occurs in the first recovery loop; after three loops, all bitstrings are SOCP-feasible.
Bitstring surface before recovery
Figure 7. SOCP surface plot (before recovery) of bitstrings for a 100-item, 5-constraint instance: profit vs worst SOCP residual; colour bar = observed sampling probability. Blue line = classical-best profit; intersection with r*=0 outlines the SOCP-feasible zone.
Bitstring surface after recovery
Figure 8. SOCP surface plot (after recovery): probability mass coalesces near the intersection of low constraint residuals and the classical-best profit. Nearly all bitstrings become CCKPS-feasible.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan