CVaR-Assisted Custom Penalty Function for Constrained Optimization

Xin Wei Lee, Hoong Chuin Lau (Singapore Management University) · arXiv:2604.20088 · new submission 2026-04-23 · score 9/10

Abstract

Constrained combinatorial optimization problems are frequently reformulated as quadratic unconstrained binary optimization (QUBO) models in order to leverage emerging quantum optimization algorithms such as the Variational Quantum Eigensolver (VQE) and the Quantum Approximate Optimization Algorithm (QAOA). However, standard QUBO formulations enforce inequality constraints through slack variables and quadratic penalties, which can significantly increase the problem size and distort the optimization landscape. In this work, we propose a slack-free penalty formulation for constrained binary optimization that eliminates auxiliary slack variables and preserves the feasibility structure of the original problem. The proposed approach introduces a nonlinear custom penalty function to enforce inequality constraints directly in the objective function. To address the computational challenges associated with evaluating nonlinear penalties in variational quantum algorithms, we employ the finite-sampling method that avoids the exponential complexity required by exact expectation computation. Furthermore, we integrate the Conditional Value-at-Risk (CVaR) objective to improve optimization robustness and guide the search toward high-quality solutions. The proposed framework is evaluated on instances of the multi-dimensional knapsack problem, a classical benchmark in combinatorial optimization. We showcase that the proposed custom-penalty formulation combined with CVaR sampling achieves improved optimality gaps and more consistent performance compared with conventional slack-based QUBO formulations. The results suggest that careful penalty design can play a critical role in enabling quantum and hybrid quantum–classical algorithms for constrained optimization problems that arise in operations research.

Executive summary

Lee & Lau tackle the same pain point that Y2 and Y4 address: standard QUBO slack-variable encoding of inequality constraints bloats the qubit count and distorts the landscape. Their alternative keeps the constraint violation as a nonlinear (Heaviside-step) penalty on the original variables and estimates its expectation by finite sampling rather than Pauli-level exact diagonalisation. This eliminates the O(2^t) blow-up (t = number of variables in a constraint) that had previously made custom penalties impractical. CVaR post-selection on the sampled bitstrings then sharpens the optimisation landscape so that VQE/QAOA converges to feasible, high-quality solutions. The message for Yuan: you can now get behaviour close to Y2's "hard-constraint-preserving" results (every CVaR-custom run reached feasibility) from a vanilla HEA + penalty pipeline, without designing a constraint-preserving mixer. The approach is benchmarked on multi-dimensional knapsack instances of up to 50 qubits.

Main contribution

The authors replace the exact-diagonalisation estimator of a nonlinear penalty expectation ⟨ξ(H_j)⟩ (their earlier QCE paper, exponential in the number of variables per constraint) with a plain Monte-Carlo estimator Ê(θ) = (1/M) Σ_m L(x^(m)), justified by Hoeffding's inequality (Eq. 9–11 of the paper). This reduces the per-iteration cost to a fixed shot budget M ≥ (R²/(2ε²)) ln(2/δ), where R is the loss range. They then stack CVaR-α on top. Only the lowest ⌈αM⌉ losses enter the average, which concentrates the objective on low-energy tail states but needs roughly 1/α times as many shots for the same precision. The combined framework is validated on 12 OR-Library multi-dimensional knapsack instances (10–50 variables, 2–10 constraints) with a single-layer R_Y–CZ hardware-efficient ansatz and the POWELL optimiser.
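The shot budget above is simple enough to evaluate directly. Below is a minimal Python sketch, assuming i.i.d. losses confined to an interval of width R. `hoeffding_shots` is a hypothetical helper, not the authors' code.

```python
import math


def hoeffding_shots(loss_range: float, eps: float, delta: float) -> int:
    """Smallest shot count M with M >= R^2 / (2 eps^2) * ln(2 / delta).

    Hoeffding's inequality for M i.i.d. losses confined to an interval of
    width R gives P(|E_hat - E| >= eps) <= 2 exp(-2 M eps^2 / R^2); solving
    2 exp(-2 M eps^2 / R^2) <= delta for M yields the bound above.
    """
    if loss_range <= 0 or eps <= 0 or not 0 < delta < 1:
        raise ValueError("need R > 0, eps > 0 and 0 < delta < 1")
    return math.ceil(loss_range**2 / (2 * eps**2) * math.log(2 / delta))


# The budget grows with R^2: a 10x larger loss range needs ~100x more shots.
print(hoeffding_shots(1.0, 0.1, 0.05))   # 185
print(hoeffding_shots(10.0, 0.1, 0.05))  # 18445
```

The quadratic dependence on R explains why coefficient rescaling matters: shrinking the loss range by 10× cuts the budget by roughly 100×.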

Key algorithmic ingredients

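Custom penalty loss: L_custom(x) = f(x) + Σ_j λ_j ξ[h_j(x)], with ξ chosen as the Heaviside step Θ(h) = 1 for h > 0 and 0 for h ≤ 0 (Eq. 4–5). Feasible points (all h_j ≤ 0) incur no penalty, so feasible and infeasible bitstrings are separated without slack variables.
Finite-sampling expectation (Eq. 8): Ê(θ) = (1/M) Σ_m L(x^(m)), with x^(m) drawn from |⟨x|ψ(θ)⟩|². The cost depends on M, not on the exponential eigenvalue structure of ξ(H_j).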
Hoeffding shot budget (Eq. 10–11): M ≥ R²/(2ε²) · ln(2/δ); motivates coefficient rescaling to keep R small.
CVaR estimator (Eq. 12): CVaR(α;θ) = (1/⌈αM⌉) Σ_{m=1}^{⌈αM⌉} L(x^(m)), where the M shots are sorted so that L(x^(1)) ≤ L(x^(2)) ≤ … ≤ L(x^(M)); α = 0.1 in experiments.
Penalty factor: λ = 2λ_UB with λ_UB = Σ_i v_i (v_i are the item values). Because λ_UB is the largest possible objective gain, any single constraint violation outweighs it.
Ansatz: |ψ(θ)⟩ = ⊗_i R_Y(θ_{n+i}) · (linear CZ chain) · ⊗_i R_Y(θ_i) |0⟩^⊗n, i.e. 2n parameters in a single layer. Deeper HEAs are avoided on the argument that they introduce spurious minima.
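The single-layer ansatz listed above can be simulated in plain NumPy for small checks. This sketch assumes that qubit 0 maps to the most significant bit and that the CZ chain acts on neighbours (i, i+1). The helper names are hypothetical, and the classical POWELL loop is omitted. The output probabilities are what the finite-sampling and CVaR estimators consume.

```python
import numpy as np


def ry(theta: float) -> np.ndarray:
    """Single-qubit R_Y(theta) = exp(-i theta Y / 2), a real 2x2 rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state: np.ndarray, gate: np.ndarray, q: int, n: int) -> np.ndarray:
    """Apply a 2x2 gate to qubit q of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))  # gate's output axis comes first
    return np.moveaxis(psi, 0, q).reshape(-1)


def apply_cz(state: np.ndarray, q1: int, q2: int, n: int) -> np.ndarray:
    """CZ: flip the sign of every amplitude with qubits q1 and q2 both in |1>."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1], idx[q2] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)


def hea_probabilities(theta: np.ndarray, n: int) -> np.ndarray:
    """|<x|psi(theta)>|^2 for R_Y layer -> linear CZ chain -> R_Y layer on |0...0>.

    theta has 2n entries: theta[:n] for the first layer, theta[n:] for the second.
    """
    state = np.zeros(2**n)
    state[0] = 1.0
    for i in range(n):
        state = apply_1q(state, ry(theta[i]), i, n)
    for i in range(n - 1):
        state = apply_cz(state, i, i + 1, n)
    for i in range(n):
        state = apply_1q(state, ry(theta[n + i]), i, n)
    return np.abs(state) ** 2


probs = hea_probabilities(np.random.default_rng(0).uniform(0, 2 * np.pi, 6), n=3)
print(probs.sum())  # 1.0 up to rounding
```

Because R_Y and CZ are real gates, the state stays real. Exact simulation still scales as 2^n, so this is for toy validation only, not for the 50-qubit instances.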

Detailed walkthrough

Section II.A formalises the problem. Given min_x f(x) subject to equality (g_i = 0) and inequality (h_j ≤ 0) constraints on x ∈ {0,1}ⁿ, the slack QUBO is L_slack = f + Σ_i λ₀ g_i² + Σ_j λ₁ (h_j + Σ_l 2^l y_{j,l})². Two problems are highlighted. (i) Feasibility is ambiguous: the penalty depends on the slack bits y as well as on x, so a feasible x paired with a mismatched slack is still penalised, which blurs the separation between feasible and infeasible states. (ii) Binary encoding of a slack ranging over 0…s_j requires ⌈log₂(s_j + 1)⌉ extra qubits per constraint. Figure 1 (num-qubits-comparison) quantifies this: up to ~80 extra qubits on their 50-variable MDKP instance (pet7), a significant overhead for current ~100-qubit devices.
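To make the slack-free loss and the slack overhead concrete, here is a sketch on a toy three-item, two-constraint MDKP. It assumes integer weights and capacities (so slack j ranges over 0…c_j), one shared penalty factor λ = 2Σ_i v_i, and MDKP written as minimising −v·x. The instance and the function names are illustrative, not from the paper.

```python
import math
import numpy as np


def custom_penalty_loss(x, v, W, c, lam=None) -> float:
    """L_custom(x) = -v.x + lam * sum_j Theta(W_j.x - c_j) for an MDKP.

    The maximisation max v.x s.t. W x <= c is written as a minimisation, and
    Theta(h) = 1 if h > 0 else 0, so feasible bitstrings pay no penalty.
    """
    x = np.asarray(x)
    if lam is None:
        lam = 2 * np.sum(v)  # lambda = 2 * lambda_UB with lambda_UB = sum_i v_i
    n_violated = np.count_nonzero(W @ x - c > 0)
    return float(-(v @ x) + lam * n_violated)


def slack_qubits(c) -> int:
    """Extra qubits for binary slack s_j in {0, ..., c_j} (integer data assumed)."""
    return sum(math.ceil(math.log2(cj + 1)) for cj in c)


v = np.array([10, 7, 4])              # item values (toy instance)
W = np.array([[3, 2, 1], [1, 4, 2]])  # two knapsack constraints
c = np.array([4, 5])                  # capacities

print(custom_penalty_loss([1, 0, 1], v, W, c))  # -14.0: feasible, no penalty
print(custom_penalty_loss([1, 1, 0], v, W, c))  # 25.0: first constraint violated
print(slack_qubits(c))                          # 6 extra qubits on top of n = 3
```

The custom loss needs no extra qubits. The slack formulation would add six qubits to this three-variable problem, and that overhead grows with the number of constraints and with log₂ of each capacity.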

Section II.B is the crux. Given the custom-penalty Hamiltonian H = H_f + Σ_j λ_j ξ(H_j), linearity lets you decompose ⟨H⟩ = ⟨H_f⟩ + Σ_j λ_j ⟨ξ(H_j)⟩. ⟨H_f⟩ decomposes into Pauli-Z expectations, but the nonlinear ξ(H_j) term does not. The authors' prior work (“step-QCE”) required populating all 2^t eigenvalues of the t-variable constraint Hamiltonian. The finite-sampling estimator sidesteps this entirely: sample the state, evaluate the full (nonlinear) loss on each bitstring, and average. The price is a statistical error ε governed by Hoeffding's inequality (Eq. 9), with the shot budget scaling as R²/ε². The authors explicitly compute R = C_true + 2dΣv_i for MDKP (d = number of constraints) and flag coefficient rescaling as an open knob.
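The finite-sampling estimator can be sketched with NumPy: draw shots from an output distribution, evaluate the full nonlinear loss on each bitstring, and average. The Dirichlet-random distribution below is an assumed stand-in for a circuit's |⟨x|ψ(θ)⟩|². The exact expectation is computed only to show agreement; it needs a full 2^n enumeration that is feasible only at toy size.

```python
import numpy as np


def sample_bitstrings(probs, n, shots, rng):
    """Draw `shots` basis states from probs = |<x|psi>|^2 as 0/1 rows.

    Qubit 0 is taken as the most significant bit of the basis-state index.
    """
    idx = rng.choice(len(probs), size=shots, p=probs)
    return (idx[:, None] >> np.arange(n - 1, -1, -1)) & 1


def finite_sampling_estimate(probs, loss_fn, n, shots, rng):
    """E_hat = (1/M) sum_m L(x^(m)); loss_fn may be arbitrarily nonlinear."""
    return float(np.mean([loss_fn(x) for x in sample_bitstrings(probs, n, shots, rng)]))


# Toy nonlinear loss: one knapsack constraint enforced by a Heaviside step.
v, w, cap = np.array([10, 7, 4]), np.array([3, 2, 1]), 4
lam = 2 * v.sum()


def loss(x):
    return -(v @ x) + lam * (w @ x > cap)


n = 3
rng = np.random.default_rng(7)
probs = rng.dirichlet(np.ones(2**n))  # stand-in for a circuit's output distribution

# Exact expectation by full enumeration: feasible only at toy size.
all_x = (np.arange(2**n)[:, None] >> np.arange(n - 1, -1, -1)) & 1
exact = float(sum(p * loss(x) for p, x in zip(probs, all_x)))
estimate = finite_sampling_estimate(probs, loss, n, shots=20_000, rng=rng)
print(f"exact {exact:.3f}  finite-sampling {estimate:.3f}")
```

For this loss the range is R = 39. With 20,000 shots the Hoeffding radius at δ = 10⁻⁶ is about 0.74, so the two numbers should agree to within that.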

Section II.C introduces CVaR-α: after sampling M shots, sort the bitstrings by loss and average only the lowest ⌈αM⌉. Applying the same Hoeffding argument to the ⌈αM⌉ retained shots gives ⌈αM⌉ ≥ M_FS, i.e. roughly M_FS/α shots per evaluation (10× more at α = 0.1). The saving shows up instead in fewer optimiser iterations (Figure 5). Each evaluation also carries an O(kn log n) sort overhead.
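Here is a sketch of the CVaR estimator (Eq. 12 as summarised above): sort the M shot losses and average the lowest ⌈αM⌉. The rounding guard and the name `cvar_estimate` are illustrative choices, not taken from the paper.

```python
import math
import numpy as np


def cvar_estimate(losses, alpha: float) -> float:
    """CVaR(alpha) = mean of the ceil(alpha * M) lowest losses among M shots."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # round() strips float noise such as 0.3 * 10 = 3.0000000000000004
    k = math.ceil(round(alpha * len(losses), 9))
    k = min(max(k, 1), len(losses))
    return float(losses[:k].mean())


shot_losses = [5, -3, 2, -10, 7, 0, 1, 4, -1, 3]
print(cvar_estimate(shot_losses, 0.2))  # -6.5: mean of -10 and -3
print(cvar_estimate(shot_losses, 1.0))  # 0.8: alpha = 1 recovers the plain FS mean
```

Setting α = 1 recovers the plain finite-sampling mean, so one function covers both estimators. The sort dominates the extra classical cost. `np.partition` could make the selection linear-time, but a full sort keeps the sketch simple.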

Section IV (Analysis on Sampling Cost) works through the Hoeffding bookkeeping. Notably, at α = 0.1 and a sufficiently converged state, the probability that a shot retained in the CVaR tail is the quasi-optimum x* is p_α = min(p_x*/α, 1). CVaR therefore already “passes” once p_x* ≥ α, because the tail is then filled entirely by x*. This is the same scaling argument Y2 and Y3 engaged with for portfolio QAOA, and it is consistent with Y3's DGMVP finding that CVaR was more shot-efficient.
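The saturation argument can be checked on the exact (infinite-shot) CVaR tail of a discrete distribution. Fill probability mass α with the lowest-loss states and read off the share taken by x*. This sketch uses a hypothetical three-state distribution in which state 0 stands in for x*. `cvar_tail_weights` is an illustrative helper, not from the paper.

```python
import numpy as np


def cvar_tail_weights(probs, losses, alpha: float) -> np.ndarray:
    """Share of each basis state in the lowest-loss alpha fraction of the distribution.

    States are taken in order of increasing loss until probability mass alpha
    is filled; the returned weights sum to 1 when sum(probs) >= alpha.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.zeros_like(probs)
    remaining = alpha
    for i in np.argsort(losses, kind="stable"):
        take = min(probs[i], remaining)
        weights[i] = take / alpha
        remaining -= take
        if remaining <= 0:
            break
    return weights


losses = [-10.0, 0.0, 3.0]  # state 0 plays the role of the quasi-optimum x*
for p_star in (0.02, 0.05, 0.3):
    probs = [p_star, (1 - p_star) / 2, (1 - p_star) / 2]
    # weight of x* in the tail equals min(p_star / alpha, 1)
    print(p_star, cvar_tail_weights(probs, losses, alpha=0.1)[0])
```

Once p_x* reaches α the tail consists entirely of x*, so the CVaR value stops changing as p_x* grows further. This matches the plateau near α = 0.1 described for Figure 4.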

Section V (Results). Figure 2 (slack-vs-noslack) compares the mean optimality gap Δ across all 12 instances. With finite sampling alone the custom penalty wins on 7/12 instances. With CVaR it wins on all 12, and every CVaR-custom run reaches feasibility (under the slack formulation, pb4 never finds a feasible point). Figure 3 (mdkp-optgap) shows box plots over 20 random initialisations. CVaR medians are consistently below 0.1, and for the 50-qubit pet7 instance the median optimality gap reaches 7.6×10⁻³. Figure 4 (mdkp-opt-prob) makes the “curse of dimensionality” point. Under FS the quasi-optimum probability p_x* decays with qubit count, whereas under CVaR it stabilises near 0.1, the value of α. Once the quasi-optimum fills the α tail, the CVaR objective saturates and stops rewarding further concentration. Figure 5 (mdkp-nfev) shows that CVaR needs fewer function evaluations to converge.

Implication: at matched instance size, a single-layer HEA + custom penalty + CVaR beats the slack-QUBO + QAOA/VQE baseline established by the Monit benchmark. This is exactly the regime where Y2's quasi-binary encoding + hard mixer excels, but with a different tradeoff. No mixer design is needed, at the cost of a larger loss range R, which raises the Hoeffding shot budget.

Figures

Figure 1. Comparison between the number of qubits required for the custom penalty and the common slack variable formulation, across the multi-dimensional knapsack instances used in this work.

Figure 2. Mean optimality gaps: custom penalty (solid) vs slack (dashed), with finite sampling (blue) and CVaR (red), averaged over 20 random initialisations per instance.

Figure 3. Box plots of optimality gap per instance comparing finite sampling (blue) and CVaR (orange) under the custom-penalty formulation.

Figure 4. Probability of sampling the quasi-optimum x* for FS (turquoise) vs CVaR (pink); FS decays with qubits while CVaR stabilises near α=0.1.

Figure 5. Number of function evaluations to convergence for FS (red) vs CVaR (green); CVaR typically converges faster.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Y2 (quasi-binary / hard-mixer / CVaR-QAOA for portfolio) — Strong methodological parallel. Y2 achieves feasibility by encoding + mixer design that traps the state inside the feasible subspace; this paper achieves a comparable effect by using a vanilla HEA plus a nonlinear step penalty evaluated via Monte-Carlo sampling. Both use CVaR for the same reason (noise-resilient, convergence-sharpening). A direct comparison of the two feasibility-enforcement philosophies on the same MDKP instances would be a clean paper.
Y4 (Grover + ADMM cardinality-constrained BO) — Same problem class (constrained binary optimisation), different algorithmic primitive. Y4's ADMM with ε-approximation guarantees and O(√(C(n,k)/M)) rotations sits at the opposite end — exponential rigour, no variational handholding. This paper's finite-sampling trick does not guarantee convergence to the optimum but makes the variational route competitive on meaningful instance sizes.
Y3 (QAOA DGMVP portfolio, layerwise + dual annealing) — Both papers emphasise that careful optimisation-landscape design is the main lever for NISQ-era utility. This paper's single-layer HEA + CVaR is in the same design philosophy as Y3's layerwise scheduling: keep the classical optimisation tractable, let sampling structure do the work.
Y1 (warm-started QAOA) — Only loose overlap. This paper uses random initialisation; warm-starting the HEA parameters from the CVaR-optimised landscape of a smaller MDKP instance would be a natural extension.

Recommended action for Yuan

Read in detail, then cite in the next portfolio/QAOA draft. This paper is the clearest recent articulation of “why keep slack variables if sampling gives you the penalty for free” and directly adjoins Y2's feasibility-preserving framework.
Benchmark against Y2: run the authors' MDKP-pet7 instance (50 qubits) through the Y2 quasi-binary + hard-mixer pipeline and compare sample efficiency, penalty-factor sensitivity, and wall-clock on identical hardware. A head-to-head would be a strong section in a follow-up Y2 paper.
Consider a joint warm-start extension: use Y1's measurement-based iteration to warm-start the single-layer HEA parameters of this method — might close the remaining FS/CVaR gap on the larger instances (pet5, pet7).