CVaR-Assisted Custom Penalty Function for Constrained Optimization

Xin Wei Lee, Hoong Chuin Lau (Singapore Management University) · arXiv:2604.20088 · new submission 2026-04-23 · score 9/10

Abstract

Constrained combinatorial optimization problems are frequently reformulated as quadratic unconstrained binary optimization (QUBO) models in order to leverage emerging quantum optimization algorithms such as the Variational Quantum Eigensolver (VQE) and the Quantum Approximate Optimization Algorithm (QAOA). However, standard QUBO formulations enforce inequality constraints through slack variables and quadratic penalties, which can significantly increase the problem size and distort the optimization landscape. In this work, we propose a slack-free penalty formulation for constrained binary optimization that eliminates auxiliary slack variables and preserves the feasibility structure of the original problem. The proposed approach introduces a nonlinear custom penalty function to enforce inequality constraints directly in the objective function. To address the computational challenges associated with evaluating nonlinear penalties in variational quantum algorithms, we employ the finite-sampling method that avoids the exponential complexity required by exact expectation computation. Furthermore, we integrate the Conditional Value-at-Risk (CVaR) objective to improve optimization robustness and guide the search toward high-quality solutions. The proposed framework is evaluated on instances of the multi-dimensional knapsack problem, a classical benchmark in combinatorial optimization. We showcase that the proposed custom-penalty formulation combined with CVaR sampling achieves improved optimality gaps and more consistent performance compared with conventional slack-based QUBO formulations. The results suggest that careful penalty design can play a critical role in enabling quantum and hybrid quantum–classical algorithms for constrained optimization problems that arise in operations research.

Executive summary

Lee & Lau tackle the same pain point that Y2 and Y4 address — that standard QUBO slack-variable encoding of inequality constraints bloats the qubit count and distorts the landscape. Their alternative keeps the constraint violation as a nonlinear (Heaviside-step) penalty on the original variables and computes its expectation via finite sampling rather than Pauli-level exact diagonalisation, eliminating the O(2t) blow-up that had previously made custom penalties impractical. CVaR post-selection on the sampled bitstrings then sharpens the optimisation landscape so that VQE/QAOA converges to feasible, high-quality solutions. The message for Yuan: you can now get Y2-style “hard-constraint-preserving” behaviour from a vanilla HEA + penalty pipeline — without designing a constraint-preserving mixer — and it is benchmarked on up-to-50-qubit multi-dimensional knapsack.

Main contribution

The authors replace the exact-diagonalisation estimator of a non-linear penalty expectation ⟨ξ(Hj)⟩ (their earlier QCE paper, exponential in the number of variables per constraint) with a plain Monte-Carlo estimator (θ) = (1/M) Σm L(x(m)), justified by Hoeffding's inequality (Eq. 9–11 of the paper). This reduces per-iteration cost to a fixed shot budget M ≥ (R2/2ε2) ln(2/δ) where R is the loss range. They then stack CVaR-α on top, which costs a further factor of α in samples while concentrating the loss on low-energy tail states. The combined framework is validated on 12 OR-Library multi-dimensional knapsack instances (10–50 variables, 2–10 constraints) with a single-layer RY–CZ hardware-efficient ansatz and POWELL optimiser.

Key algorithmic ingredients

Detailed walkthrough

Section II.A formalises the problem: given minx f(x) subject to equality (gi=0) and inequality (hj≤0) constraints on x∈{0,1}n, the slack QUBO is Lslack = f + Σλ0gi2 + Σλ1(hj + Σ 2l yl)2. Two problems are highlighted: (i) feasibility is ambiguous for slack values that happen to zero the penalty even when x is infeasible, and (ii) binary encoding of slack requires ⌈log2 sj⌉ extra qubits per constraint. Figure 1 (num-qubits-comparison) quantifies this — up to ~80 extra qubits on their 50-variable MDKP instance (pet7), which is a significant hit against current 100-qubit devices.

Section II.B is the crux. Given the custom-penalty Hamiltonian H = Hf + Σλj ξ(Hj), linearity lets you decompose ⟨H⟩ = ⟨Hf⟩ + Σλj ⟨ξ(Hj)⟩. ⟨Hf⟩ decomposes into Pauli-Z expectations, but the non-linear ξ(Hj) term does not — the authors' prior work (“step-QCE”) required populating all 2t eigenvalues of the t-variable constraint Hamiltonian. The finite-sampling estimator sidesteps this entirely: sample the state, evaluate the full (nonlinear) loss on each bitstring, average. The price is a statistical error ε governed by Hoeffding's inequality (Eq. 9), with shot budget scaling as R22. The authors explicitly compute R = Ctrue + 2dΣvi for MDKP and flag coefficient rescaling as an open knob.

Section II.C introduces CVaR-α: after sampling M shots, sort bitstrings by loss and average only the bottom ⌈αM⌉. The authors derive that CVaR needs only α·MFS samples to match the FS estimator — a 10× shot saving at α=0.1 — plus an O(kn log n) sort overhead per evaluation.

Section IV (Analysis on Sampling Cost) works through the Hoeffding bookkeeping. Notably, at α=0.1 and a sufficiently converged state, the probability of sampling the quasi-optimum x* is pα = min(px*/α, 1); so CVaR already “passes” once px* > α. This is precisely the scaling argument Y2 and Y3 engaged with for portfolio QAOA, and it explains why Y3's DGMVP analysis consistently found CVaR more shot-efficient.

Section V (Results). Figure 2 (slack-vs-noslack) compares mean optimality gap Δ across all 12 instances. With finite sampling alone the custom penalty wins on 7/12 instances; with CVaR it wins on all 12, and every CVaR-custom run reaches feasibility (pb4 fails to find a feasible point at all under the slack formulation). Figure 3 (mdkp-optgap) plots box plots over 20 random initialisations; CVaR medians are consistently <0.1, and for the 50-qubit pet7 instance the median optimality gap reaches 7.6×10−3. Figure 4 (mdkp-opt-prob) makes the “curse of dimensionality” point: FS quasi-optimum probability px* decays with qubit count, but CVaR stabilises at ∼0.1 (exactly α), because once the quasi-optimum occupies α of the distribution CVaR saturates. Figure 5 (mdkp-nfev) shows CVaR needs fewer function evaluations to converge.

Implication: on a size-for-size comparison, a single-layer HEA + custom-penalty + CVaR beats the slack-QUBO + QAOA/VQE baseline established by the Monit benchmark. This is exactly the regime where Y2's quasi-binary encoding + hard mixer excels, but with a different tradeoff — no mixer design is needed, at the cost of amplifying the penalty range R.

Figures

Figure 1
Figure 1. Comparison between the number of qubits required for the custom penalty and the common slack variable formulation, across the multi-dimensional knapsack instances used in this work.
Figure 2
Figure 2. Mean optimality gaps: custom penalty (solid) vs slack (dashed), with finite sampling (blue) and CVaR (red), averaged over 20 random initialisations per instance.
Figure 3
Figure 3. Box plots of optimality gap per instance comparing finite sampling (blue) and CVaR (orange) under the custom-penalty formulation.
Figure 4
Figure 4. Probability of sampling the quasi-optimum x* for FS (turquoise) vs CVaR (pink); FS decays with qubits while CVaR stabilises near α=0.1.
Figure 5
Figure 5. Number of function evaluations to convergence for FS (red) vs CVaR (green); CVaR typically converges faster.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan