Quantum Search without Global Diffusion
Abstract
Quantum search is among the most important algorithms in quantum computing, offering fundamentally new approaches to optimisation and information processing. At its core is quantum amplitude amplification, a technique that achieves a quadratic speedup over classical search by combining two global reflections: the oracle, which marks the target, and the diffusion operator, which reflects about the initial state. We show that this speedup can be preserved when the oracle is the only global operator, with all other operations acting locally on non-overlapping partitions of the search register. We present a recursive construction that, when the initial and target states both decompose as tensor products over these chosen partitions, admits an exact closed-form solution for the algorithm's dynamics. This is enabled by an intriguing degeneracy in the principal angles between successive reflections, which collapse to just two distinct values governed by a single recursively defined angle. Applied to unstructured search, a problem that naturally satisfies the tensor decomposition, the approach retains the O(√N) oracle complexity of Grover search when each partition contains at least log2(log2 N) qubits. On an 18-qubit search problem, partitioning into two stages reduces the non-oracle circuit depth by as much as 51%–96% relative to Grover, requiring up to 9% additional oracle calls.
Executive summary
Burke and Mc Goldrick prove that the global diffusion operator in Grover-style amplitude amplification can be replaced entirely by local reflections acting on a tensor-product partition of the register, provided the prepared initial state and the target state both factor across that partition. The oracle remains the only global operator, but each non-oracle step touches at most ni qubits, slashing transpilation depth. The construction yields an exact closed-form success probability with oracle overhead O(1 / Π cos θi), which collapses to 1 once each partition has ≥ log2(log2 N) qubits — i.e. the asymptotic O(√N) is preserved. For Yuan, who is actively designing structured-feasible-space Grover algorithms for cardinality-constrained binary optimisation (Y4), this paper offers a directly usable depth-reduction primitive for the inner Grover oracle.
Main contribution
The paper introduces a recursive reflection Wi = (Sψi Wi−1)ti Sψi (Wi−1 Sψi)ti with W0 = Sx (the oracle), where Sψi is the partial diffusion operator on register i. The non-trivial observation is that although the eigenspaces of Wi grow exponentially in dimension (2i−1), the principal angles between Wi and Sψi collapse to two values {γi, π/2 − γi} where sin(2γi) = sin(2θi) sin(2 ti−1 γi−1). This degeneracy reduces the high-dimensional rotation to a scalar recursion and yields the closed-form success probability ‖Pxm (Sψm Wm−1)tm |ψ⟩‖² = ½[1 − cos(2θm)/cos(2γm) · cos(2(2tm+1)γm)]. The intermediate iterate counts ti = 1 (i < m) are proven oracle-optimal. After amplifying register m, lower registers are measured, the search is repeated on the shrinking subspace, and the total cost forms a geometric series dominated by the first round.
Key theorems / lemmas / algorithms
- Angle-degeneracy theorem (Sec. Algorithm/proof): for all i, the principal angles between Wi−1 and Sψi reduce to exactly two values {γi, π/2 − γi} satisfying the scalar recursion sin(2γi) = sin(2θi) sin(2 ti−1 γi−1), γ1 = θ1.
- Closed-form stage success (Eq. 6 in §Algorithm/overview): after tm outer iterations, ‖Pxm (Sψm Wm−1)tm |ψ⟩‖² = ½[1 − cos(2θm)/cos(2γm) · cos(2(2tm+1)γm)], with optimal iterate tm* = ⌊π/(4γm) − ½⌉.
- Per-stage oracle complexity (Eq. 16 in §Algorithm/final_complexity): C ≤ CQAA / Πi cos(θi), so the new scheme inflates the standard amplitude-amplification cost only by the product of local-overlap cosines.
- Total oracle complexity (Eq. 18): dominated by the first round; the rest forms a geometric series with ratio 21−s/2 for equal stage size s.
- Asymptotic optimality for unstructured search: for partition sizes s ≥ log2(log2 N), both the overhead factor and the geometric tail converge to 1, recovering Grover's optimal O(√N) query complexity.
- Optimal iterate ordering: ti = 1 for all intermediate i < m is oracle-optimal; doubling any earlier iterate yields at most the same improvement in γ as doubling tm directly.
Detailed walkthrough
Setup (§Algorithm/overview). The n-qubit search register is split into m blocks of sizes n1, …, nm, and the authors assume both the state preparation A and the target |x⟩ factor as A = ⊗ Ai, |x⟩ = ⊗ |xi⟩. The local partial diffusion Sψi = Ai(𝟙 − 2|0⟩⟨0|)Ai† acts only on register i (and trivially elsewhere). The recursive operator Wi is built from Sψi and Wi−1; the oracle Sx = W0 is the only operator that couples all qubits. Each Wi is itself a reflection.
Why it works (§Algorithm/proof). The key insight is that despite Wi living on a 2i-dimensional subspace, the principal angles between the eigenspaces of two successive reflections collapse to two values, governed by a single recursively defined angle γi. This is the “scalar reduction” the authors highlight in the abstract: a high-dimensional rotation behaves like a 2D rotation parameterised by γi. The angle recursion sin(2γi) = sin(2θi) sin(2 ti−1 γi−1) with γ1 = θ1 closes the analysis.
Complexity (§Algorithm/final_complexity). Setting ti = 1 for all i < m (proven optimal), the authors obtain sin(2γm) = 2m sin(θ) Πi cos(θi), with sin θ = Π sin(θi) the global overlap. Convexity of sin(x)/x then gives γm ≥ 2m−1 θ Π cos(θi), hence tm* ≤ π / (2m+1 θ Π cos(θi)) and a single-round cost C = 2m−1 tm* ≤ CQAA / Π cos(θi). The whole algorithm then loops over a sequence of shrinking sub-searches; for equal partition size s the tail is a geometric series with ratio 21−s/2, so for s ≥ 6 it sums to under 1.15× the leading term.
Unstructured-search instantiation (§Algorithm/grover_search). For uniform-superposition initial state and a single bitstring target, θi = arcsin(2−s/2) and the overhead 1 / Π cos(θi) is bounded by exp(log2 N / (s · 2s+1)). For s ≥ log2 log2 N this exponent goes to zero, so the algorithm matches Grover's O(√N) query count asymptotically while every non-oracle gate touches at most s qubits — log of the global register size. This matches the asymptotic depth scaling earlier work hinted at (Grover-trade-offs, hardwareGrover2), but here it is derived from a single closed-form recursion rather than from a chain of partial-search reductions.
Empirical verification (§Evaluation, 18 qubits). The authors implement the recursive algorithm in Qiskit for n = 18 and search for the all-ones target. Two configurations are tested: 3-stage [6,6,6] with iterate counts (102, 25, 6) and 2-stage [9,9] with (201, 17). Over 1000 shots the empirical success rates (96.5%, 99.5%) agree to within statistical fluctuation with the theory (96.7%, 99.8%). Importantly, when the algorithm fails it tends to fail partially: in the 3-stage configuration 57% of failed shots still had the correct first-stage substring. This sequential structure means that for partial-target retrieval the effective success rate climbs (98.4% if only the first 6 bits are needed).
Depth comparison. The diffusion-operator circuit depth is compared under three MCX-decomposition scenarios (S1: no ancilla, S2: dirty V-chain via idle inter-partition qubits, S3: one clean ancilla). Under S1, the 3-stage / 2-stage configurations reduce diffusion depth by 95% / 81% over Grover, at a cost of 30% / 9% more oracle calls. Under S2 these climb to 97% / 96%. Crucially, the recursive scheme stays shallower overall whenever the oracle depth does not exceed roughly 3–11× the diffusion-operator depth, with the 2-stage variant tolerating the heaviest oracles.
Scaling (§Evaluation/Scaling). Extrapolating to n = 50 with k = 2, 3, 4 stages, failure probability drops below 10−4, 10−4, 10−3 respectively, with oracle overhead under 1%, 1%, 5%. For oracle depth comparable to diffusion depth, the 4-stage configuration achieves ~35% total-depth reduction at n = 50; even at 10× oracle depth the recursive scheme still saves about 5%. As a concrete example the authors apply the model to AES-128 (oracle ~10× diffusion depth), getting an estimated 5% time/cost reduction over a standard Grover key-search implementation.
Figures
Citations to Yuan's papers
citations.bib, 50 KB) was scanned for the six arXiv IDs and for an author+title fingerprint; nothing matched. The paper's prior-work focus is on partial-search variants and hardware-Grover trade-offs, not on portfolio-optimisation Grover (Y4) or warm-started QAOA (Y1).Overlap with Y1–Y6
- Y4 (Grover-based cardinality-constrained binary optimisation): directly relevant — Y4 builds a Grover-style algorithm with O(√(C(n,k)/M)) rotations on a structured feasible space of fixed-Hamming-weight strings. Burke–Mc Goldrick's recursive partial-diffusion scheme requires only that the initial and target states factor across some partition. Y4's symmetric-superposition Dicke-state initial state does not naturally factor over qubit-blocks (it is permutation-symmetric across the whole register), so the construction would need adaptation to fixed-cardinality blocks. But the idea that the diffusion can be local while preserving √-speedup is highly relevant to the Grover-circuit-depth bottleneck Y4 highlights as the limiting factor on near-term implementations.
- Y1 (iterative warm-started QAOA): indirect — the paper's "measure inner registers, restart on the residue" pattern is structurally similar to Y1's iterative warm-starting, except that here the residue is a smaller register rather than an updated parameter set.
- Y3 (QAOA-DGMVP portfolio): peripheral — both papers focus on near-term-hardware depth/oracle-cost trade-offs, but the methods are otherwise orthogonal.
- Y2, Y5, Y6: no meaningful overlap.
Recommended action for Yuan
- Implement for comparison on the Y4 oracle: the most useful action is to check whether Y4's fixed-cardinality Grover oracle can be cast into a form where the initial state and target state both factor over some partition of the qubit register. If so, the recursive-local-diffusion variant from §Algorithm/grover_search could replace the global Dicke-state diffusion at near-zero asymptotic cost, with potentially significant depth savings (the bottleneck Y4 already identifies). Worth one afternoon's prototype.
- Cite in the next Y4 revision (or the follow-up paper): when discussing "extensions to reduce non-oracle depth on the cardinality-constrained search", this paper is the clearest currently-published reference point.
- Read the proof of the angle-degeneracy theorem (§Algorithm/proof): the scalar reduction of an exponentially growing eigenspace is the kind of structural trick that recurs in Grover-style analyses (including Y4's). The proof technique itself is worth absorbing in case it applies to multi-marked or weighted-search regimes Yuan may encounter.