Quantum Search without Global Diffusion

John Burke, Ciaran Mc Goldrick (Trinity College Dublin) · arXiv:2604.15435 · submitted 2026-04-16 · score 8/10 (HIGH)

← Back to digest

Abstract

Quantum search is among the most important algorithms in quantum computing, offering fundamentally new approaches to optimisation and information processing. At its core is quantum amplitude amplification, a technique that achieves a quadratic speedup over classical search by combining two global reflections: the oracle, which marks the target, and the diffusion operator, which reflects about the initial state. We show that this speedup can be preserved when the oracle is the only global operator, with all other operations acting locally on non-overlapping partitions of the search register. We present a recursive construction that, when the initial and target states both decompose as tensor products over these chosen partitions, admits an exact closed-form solution for the algorithm's dynamics. This is enabled by an intriguing degeneracy in the principal angles between successive reflections, which collapse to just two distinct values governed by a single recursively defined angle. Applied to unstructured search, a problem that naturally satisfies the tensor decomposition, the approach retains the O(√N) oracle complexity of Grover search when each partition contains at least log2(log2 N) qubits. On an 18-qubit search problem, partitioning into two stages reduces the non-oracle circuit depth by as much as 51%–96% relative to Grover, requiring up to 9% additional oracle calls.

Executive summary

Burke and Mc Goldrick prove that the global diffusion operator in Grover-style amplitude amplification can be replaced entirely by local reflections acting on a tensor-product partition of the register, provided the prepared initial state and the target state both factor across that partition. The oracle remains the only global operator, but each non-oracle step touches at most ni qubits, slashing transpilation depth. The construction yields an exact closed-form success probability with oracle overhead O(1 / Π cos θi), which collapses to 1 once each partition has ≥ log2(log2 N) qubits — i.e. the asymptotic O(√N) is preserved. For Yuan, who is actively designing structured-feasible-space Grover algorithms for cardinality-constrained binary optimisation (Y4), this paper offers a directly usable depth-reduction primitive for the inner Grover oracle.

Main contribution

The paper introduces a recursive reflection Wi = (Sψi Wi−1)ti Sψi (Wi−1 Sψi)ti with W0 = Sx (the oracle), where Sψi is the partial diffusion operator on register i. The non-trivial observation is that although the eigenspaces of Wi grow exponentially in dimension (2i−1), the principal angles between Wi and Sψi collapse to two values {γi, π/2 − γi} where sin(2γi) = sin(2θi) sin(2 ti−1 γi−1). This degeneracy reduces the high-dimensional rotation to a scalar recursion and yields the closed-form success probability ‖Pxm (Sψm Wm−1)tm |ψ⟩‖² = ½[1 − cos(2θm)/cos(2γm) · cos(2(2tm+1)γm)]. The intermediate iterate counts ti = 1 (i < m) are proven oracle-optimal. After amplifying register m, lower registers are measured, the search is repeated on the shrinking subspace, and the total cost forms a geometric series dominated by the first round.

Key theorems / lemmas / algorithms

Detailed walkthrough

Setup (§Algorithm/overview). The n-qubit search register is split into m blocks of sizes n1, …, nm, and the authors assume both the state preparation A and the target |x⟩ factor as A = ⊗ Ai, |x⟩ = ⊗ |xi⟩. The local partial diffusion Sψi = Ai(𝟙 − 2|0⟩⟨0|)Ai† acts only on register i (and trivially elsewhere). The recursive operator Wi is built from Sψi and Wi−1; the oracle Sx = W0 is the only operator that couples all qubits. Each Wi is itself a reflection.

Why it works (§Algorithm/proof). The key insight is that despite Wi living on a 2i-dimensional subspace, the principal angles between the eigenspaces of two successive reflections collapse to two values, governed by a single recursively defined angle γi. This is the “scalar reduction” the authors highlight in the abstract: a high-dimensional rotation behaves like a 2D rotation parameterised by γi. The angle recursion sin(2γi) = sin(2θi) sin(2 ti−1 γi−1) with γ1 = θ1 closes the analysis.

Complexity (§Algorithm/final_complexity). Setting ti = 1 for all i < m (proven optimal), the authors obtain sin(2γm) = 2m sin(θ) Πi cos(θi), with sin θ = Π sin(θi) the global overlap. Convexity of sin(x)/x then gives γm ≥ 2m−1 θ Π cos(θi), hence tm* ≤ π / (2m+1 θ Π cos(θi)) and a single-round cost C = 2m−1 tm* ≤ CQAA / Π cos(θi). The whole algorithm then loops over a sequence of shrinking sub-searches; for equal partition size s the tail is a geometric series with ratio 21−s/2, so for s ≥ 6 it sums to under 1.15× the leading term.

Unstructured-search instantiation (§Algorithm/grover_search). For uniform-superposition initial state and a single bitstring target, θi = arcsin(2−s/2) and the overhead 1 / Π cos(θi) is bounded by exp(log2 N / (s · 2s+1)). For s ≥ log2 log2 N this exponent goes to zero, so the algorithm matches Grover's O(√N) query count asymptotically while every non-oracle gate touches at most s qubits — log of the global register size. This matches the asymptotic depth scaling earlier work hinted at (Grover-trade-offs, hardwareGrover2), but here it is derived from a single closed-form recursion rather than from a chain of partial-search reductions.

Empirical verification (§Evaluation, 18 qubits). The authors implement the recursive algorithm in Qiskit for n = 18 and search for the all-ones target. Two configurations are tested: 3-stage [6,6,6] with iterate counts (102, 25, 6) and 2-stage [9,9] with (201, 17). Over 1000 shots the empirical success rates (96.5%, 99.5%) agree to within statistical fluctuation with the theory (96.7%, 99.8%). Importantly, when the algorithm fails it tends to fail partially: in the 3-stage configuration 57% of failed shots still had the correct first-stage substring. This sequential structure means that for partial-target retrieval the effective success rate climbs (98.4% if only the first 6 bits are needed).

Depth comparison. The diffusion-operator circuit depth is compared under three MCX-decomposition scenarios (S1: no ancilla, S2: dirty V-chain via idle inter-partition qubits, S3: one clean ancilla). Under S1, the 3-stage / 2-stage configurations reduce diffusion depth by 95% / 81% over Grover, at a cost of 30% / 9% more oracle calls. Under S2 these climb to 97% / 96%. Crucially, the recursive scheme stays shallower overall whenever the oracle depth does not exceed roughly 3–11× the diffusion-operator depth, with the 2-stage variant tolerating the heaviest oracles.

Scaling (§Evaluation/Scaling). Extrapolating to n = 50 with k = 2, 3, 4 stages, failure probability drops below 10−4, 10−4, 10−3 respectively, with oracle overhead under 1%, 1%, 5%. For oracle depth comparable to diffusion depth, the 4-stage configuration achieves ~35% total-depth reduction at n = 50; even at 10× oracle depth the recursive scheme still saves about 5%. As a concrete example the authors apply the model to AES-128 (oracle ~10× diffusion depth), getting an estimated 5% time/cost reduction over a standard Grover key-search implementation.

Figures

Figure 1
Figure 1. Simulation and depth comparison of the recursive search scheme against standard Grover search on an 18-qubit single-target problem. Top row: measurement outcome distributions for the 3-stage [6,6,6] (left) and 2-stage [9,9] (right) configurations over 1000 shots. Bottom left: diffusion-operator circuit depth (log scale) for both configurations under three MCX-decomposition scenarios (S1 no ancilla, S2 dirty V-chain, S3 one clean ancilla); percentages are recursive depth relative to Grover. Bottom right: total oracle-call counts, with percentages showing recursive calls relative to a Grover baseline matched to the same success probability.
Figure 2
Figure 2. Scaling behaviour of the recursive search scheme for k = 2, 3, 4 stages across n = 20–50 qubits with a single-target unstructured search. Left: failure probability 1 − p (log scale). Right: relative oracle overhead, defined as the fractional increase in oracle calls over standard Grover search matched to the same success probability.
Figure 3
Figure 3. Depth scaling of the recursive scheme relative to standard Grover for k = 2, 3, 4 stages across n = 20–50. Top row: break-even oracle-to-diffusion depth ratio under S1, S2, S3. Bottom row: total circuit depth of the recursive scheme relative to Grover, for fixed oracle-to-diffusion ratios 1×, 5×, 10×. Dashed red line marks parity; values below indicate a depth advantage.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography. The bib (citations.bib, 50 KB) was scanned for the six arXiv IDs and for an author+title fingerprint; nothing matched. The paper's prior-work focus is on partial-search variants and hardware-Grover trade-offs, not on portfolio-optimisation Grover (Y4) or warm-started QAOA (Y1).

Overlap with Y1–Y6

Recommended action for Yuan