Quantum Search without Global Diffusion

John Burke, Ciaran Mc Goldrick (Trinity College Dublin) · arXiv:2604.15435 · submitted 2026-04-16 · score 8/10 (HIGH)

Abstract

Quantum search is among the most important algorithms in quantum computing, offering fundamentally new approaches to optimisation and information processing. At its core is quantum amplitude amplification, a technique that achieves a quadratic speedup over classical search by combining two global reflections: the oracle, which marks the target, and the diffusion operator, which reflects about the initial state. We show that this speedup can be preserved when the oracle is the only global operator, with all other operations acting locally on non-overlapping partitions of the search register. We present a recursive construction that, when the initial and target states both decompose as tensor products over these chosen partitions, admits an exact closed-form solution for the algorithm's dynamics. This is enabled by an intriguing degeneracy in the principal angles between successive reflections, which collapse to just two distinct values governed by a single recursively defined angle. Applied to unstructured search, a problem that naturally satisfies the tensor decomposition, the approach retains the O(√N) oracle complexity of Grover search when each partition contains at least log₂(log₂ N) qubits. On an 18-qubit search problem, partitioning into two stages reduces the non-oracle circuit depth by as much as 51%–96% relative to Grover, requiring up to 9% additional oracle calls.

Executive summary

Burke and Mc Goldrick prove that the global diffusion operator in Grover-style amplitude amplification can be replaced entirely by local reflections acting on a tensor-product partition of the register, provided the prepared initial state and the target state both factor across that partition. The oracle remains the only global operator, but each non-oracle step touches at most n_i qubits, slashing transpilation depth. The construction yields an exact closed-form success probability with oracle overhead O(1 / Π cos θ_i), which collapses to 1 once each partition has ≥ log₂(log₂ N) qubits — i.e. the asymptotic O(√N) is preserved. For Yuan, who is actively designing structured-feasible-space Grover algorithms for cardinality-constrained binary optimisation (Y4), this paper offers a directly usable depth-reduction primitive for the inner Grover oracle.

Main contribution

The paper introduces a recursive reflection W_i = (S_ψi W_i−1)^t_i S_ψi (W_i−1 S_ψi)^t_i with W₀ = S_x (the oracle), where S_ψi is the partial diffusion operator on register i. The non-trivial observation is that although the eigenspaces of W_i grow exponentially in dimension (2ⁱ⁻¹), the principal angles between W_i and S_ψi collapse to two values {γ_i, π/2 − γ_i} where sin(2γ_i) = sin(2θ_i) sin(2 t_i−1 γ_i−1). This degeneracy reduces the high-dimensional rotation to a scalar recursion and yields the closed-form success probability ‖P_{x_m} (S_{ψ_m} W_m−1)^t_m |ψ⟩‖² = ½[1 − cos(2θ_m)/cos(2γ_m) · cos(2(2t_m+1)γ_m)]. The intermediate iterate counts t_i = 1 (i < m) are proven oracle-optimal. After amplifying register m, lower registers are measured, the search is repeated on the shrinking subspace, and the total cost forms a geometric series dominated by the first round.

Key theorems / lemmas / algorithms

Angle-degeneracy theorem (Sec. Algorithm/proof): for all i, the principal angles between W_i−1 and S_ψi reduce to exactly two values {γ_i, π/2 − γ_i} satisfying the scalar recursion sin(2γ_i) = sin(2θ_i) sin(2 t_i−1 γ_i−1), γ₁ = θ₁.
Closed-form stage success (Eq. 6 in §Algorithm/overview): after t_m outer iterations, ‖P_{x_m} (S_{ψ_m} W_m−1)^t_m |ψ⟩‖² = ½[1 − cos(2θ_m)/cos(2γ_m) · cos(2(2t_m+1)γ_m)], with optimal iterate t_m* = ⌊π/(4γ_m) − ½⌉.
Per-stage oracle complexity (Eq. 16 in §Algorithm/final_complexity): C ≤ C_QAA / Π_i cos(θ_i), so the new scheme inflates the standard amplitude-amplification cost only by the product of local-overlap cosines.
Total oracle complexity (Eq. 18): dominated by the first round; the rest forms a geometric series with ratio 2^1−s/2 for equal stage size s.
Asymptotic optimality for unstructured search: for partition sizes s ≥ log₂(log₂ N), both the overhead factor and the geometric tail converge to 1, recovering Grover's optimal O(√N) query complexity.
Optimal iterate ordering: t_i = 1 for all intermediate i < m is oracle-optimal; doubling any earlier iterate yields at most the same improvement in γ as doubling t_m directly.

Detailed walkthrough

Setup (§Algorithm/overview). The n-qubit search register is split into m blocks of sizes n₁, …, n_m, and the authors assume both the state preparation A and the target |x⟩ factor as A = ⊗ A_i, |x⟩ = ⊗ |x_i⟩. The local partial diffusion S_ψi = A_i(𝟙 − 2|0⟩⟨0|)A_i† acts only on register i (and trivially elsewhere). The recursive operator W_i is built from S_ψi and W_i−1; the oracle S_x = W₀ is the only operator that couples all qubits. Each W_i is itself a reflection.

Why it works (§Algorithm/proof). The key insight is that despite W_i living on a 2ⁱ-dimensional subspace, the principal angles between the eigenspaces of two successive reflections collapse to two values, governed by a single recursively defined angle γ_i. This is the “scalar reduction” the authors highlight in the abstract: a high-dimensional rotation behaves like a 2D rotation parameterised by γ_i. The angle recursion sin(2γ_i) = sin(2θ_i) sin(2 t_i−1 γ_i−1) with γ₁ = θ₁ closes the analysis.

Complexity (§Algorithm/final_complexity). Setting t_i = 1 for all i < m (proven optimal), the authors obtain sin(2γ_m) = 2^m sin(θ) Π_i cos(θ_i), with sin θ = Π sin(θ_i) the global overlap. Convexity of sin(x)/x then gives γ_m ≥ 2^m−1 θ Π cos(θ_i), hence t_m* ≤ π / (2^m+1 θ Π cos(θ_i)) and a single-round cost C = 2^m−1 t_m* ≤ C_QAA / Π cos(θ_i). The whole algorithm then loops over a sequence of shrinking sub-searches; for equal partition size s the tail is a geometric series with ratio 2^1−s/2, so for s ≥ 6 it sums to under 1.15× the leading term.

Unstructured-search instantiation (§Algorithm/grover_search). For uniform-superposition initial state and a single bitstring target, θ_i = arcsin(2^−s/2) and the overhead 1 / Π cos(θ_i) is bounded by exp(log₂ N / (s · 2^s+1)). For s ≥ log₂ log₂ N this exponent goes to zero, so the algorithm matches Grover's O(√N) query count asymptotically while every non-oracle gate touches at most s qubits — log of the global register size. This matches the asymptotic depth scaling earlier work hinted at (Grover-trade-offs, hardwareGrover2), but here it is derived from a single closed-form recursion rather than from a chain of partial-search reductions.

Empirical verification (§Evaluation, 18 qubits). The authors implement the recursive algorithm in Qiskit for n = 18 and search for the all-ones target. Two configurations are tested: 3-stage [6,6,6] with iterate counts (102, 25, 6) and 2-stage [9,9] with (201, 17). Over 1000 shots the empirical success rates (96.5%, 99.5%) agree to within statistical fluctuation with the theory (96.7%, 99.8%). Importantly, when the algorithm fails it tends to fail partially: in the 3-stage configuration 57% of failed shots still had the correct first-stage substring. This sequential structure means that for partial-target retrieval the effective success rate climbs (98.4% if only the first 6 bits are needed).

Depth comparison. The diffusion-operator circuit depth is compared under three MCX-decomposition scenarios (S1: no ancilla, S2: dirty V-chain via idle inter-partition qubits, S3: one clean ancilla). Under S1, the 3-stage / 2-stage configurations reduce diffusion depth by 95% / 81% over Grover, at a cost of 30% / 9% more oracle calls. Under S2 these climb to 97% / 96%. Crucially, the recursive scheme stays shallower overall whenever the oracle depth does not exceed roughly 3–11× the diffusion-operator depth, with the 2-stage variant tolerating the heaviest oracles.

Scaling (§Evaluation/Scaling). Extrapolating to n = 50 with k = 2, 3, 4 stages, failure probability drops below 10⁻⁴, 10⁻⁴, 10⁻³ respectively, with oracle overhead under 1%, 1%, 5%. For oracle depth comparable to diffusion depth, the 4-stage configuration achieves ~35% total-depth reduction at n = 50; even at 10× oracle depth the recursive scheme still saves about 5%. As a concrete example the authors apply the model to AES-128 (oracle ~10× diffusion depth), getting an estimated 5% time/cost reduction over a standard Grover key-search implementation.

Figures

Figure 1. Simulation and depth comparison of the recursive search scheme against standard Grover search on an 18-qubit single-target problem. Top row: measurement outcome distributions for the 3-stage [6,6,6] (left) and 2-stage [9,9] (right) configurations over 1000 shots. Bottom left: diffusion-operator circuit depth (log scale) for both configurations under three MCX-decomposition scenarios (S1 no ancilla, S2 dirty V-chain, S3 one clean ancilla); percentages are recursive depth relative to Grover. Bottom right: total oracle-call counts, with percentages showing recursive calls relative to a Grover baseline matched to the same success probability.

Figure 2. Scaling behaviour of the recursive search scheme for k = 2, 3, 4 stages across n = 20–50 qubits with a single-target unstructured search. Left: failure probability 1 − p (log scale). Right: relative oracle overhead, defined as the fractional increase in oracle calls over standard Grover search matched to the same success probability.

Figure 3. Depth scaling of the recursive scheme relative to standard Grover for k = 2, 3, 4 stages across n = 20–50. Top row: break-even oracle-to-diffusion depth ratio under S1, S2, S3. Bottom row: total circuit depth of the recursive scheme relative to Grover, for fixed oracle-to-diffusion ratios 1×, 5×, 10×. Dashed red line marks parity; values below indicate a depth advantage.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography. The bib (citations.bib, 50 KB) was scanned for the six arXiv IDs and for an author+title fingerprint; nothing matched. The paper's prior-work focus is on partial-search variants and hardware-Grover trade-offs, not on portfolio-optimisation Grover (Y4) or warm-started QAOA (Y1).

Overlap with Y1–Y6

Y4 (Grover-based cardinality-constrained binary optimisation): directly relevant — Y4 builds a Grover-style algorithm with O(√(C(n,k)/M)) rotations on a structured feasible space of fixed-Hamming-weight strings. Burke–Mc Goldrick's recursive partial-diffusion scheme requires only that the initial and target states factor across some partition. Y4's symmetric-superposition Dicke-state initial state does not naturally factor over qubit-blocks (it is permutation-symmetric across the whole register), so the construction would need adaptation to fixed-cardinality blocks. But the idea that the diffusion can be local while preserving √-speedup is highly relevant to the Grover-circuit-depth bottleneck Y4 highlights as the limiting factor on near-term implementations.
Y1 (iterative warm-started QAOA): indirect — the paper's "measure inner registers, restart on the residue" pattern is structurally similar to Y1's iterative warm-starting, except that here the residue is a smaller register rather than an updated parameter set.
Y3 (QAOA-DGMVP portfolio): peripheral — both papers focus on near-term-hardware depth/oracle-cost trade-offs, but the methods are otherwise orthogonal.
Y2, Y5, Y6: no meaningful overlap.

Recommended action for Yuan

Implement for comparison on the Y4 oracle: the most useful action is to check whether Y4's fixed-cardinality Grover oracle can be cast into a form where the initial state and target state both factor over some partition of the qubit register. If so, the recursive-local-diffusion variant from §Algorithm/grover_search could replace the global Dicke-state diffusion at near-zero asymptotic cost, with potentially significant depth savings (the bottleneck Y4 already identifies). Worth one afternoon's prototype.
Cite in the next Y4 revision (or the follow-up paper): when discussing "extensions to reduce non-oracle depth on the cardinality-constrained search", this paper is the clearest currently-published reference point.
Read the proof of the angle-degeneracy theorem (§Algorithm/proof): the scalar reduction of an exponentially growing eigenspace is the kind of structural trick that recurs in Grover-style analyses (including Y4's). The proof technique itself is worth absorbing in case it applies to multi-marked or weighted-search regimes Yuan may encounter.