← Back to digest

Quantum Search without Global Diffusion

John Burke, Ciaran Mc Goldrick (Trinity College Dublin) · arXiv:2604.15435 · new submission 2026-04-20 · score 8/10 (HIGH)

Abstract

Quantum search is among the most important algorithms in quantum computing, offering fundamentally new approaches to optimisation and information processing. At its core is quantum amplitude amplification, a technique that achieves a quadratic speedup over classical search by combining two global reflections: the oracle, which marks the target, and the diffusion operator, which reflects about the initial state. We show that this speedup can be preserved when the oracle is the only global operator, with all other operations acting locally on non-overlapping partitions of the search register. We present a recursive construction that, when the initial and target states both decompose as tensor products over these chosen partitions, admits an exact closed-form solution for the algorithm's dynamics. This is enabled by an intriguing degeneracy in the principal angles between successive reflections, which collapse to just two distinct values governed by a single recursively defined angle. Applied to unstructured search, a problem that naturally satisfies the tensor decomposition, the approach retains the O(√N) oracle complexity of Grover search when each partition contains at least log₂(log₂ N) qubits. On an 18-qubit search problem, partitioning into two stages reduces the non-oracle circuit depth by as much as 51%–96% relative to Grover, requiring up to 9% additional oracle calls. For larger problem sizes this oracle overhead rapidly diminishes, and valuable depth reductions persist when the oracle circuit is substantially deeper than the diffusion operator. More broadly, these results show that a global diffusion operator is not necessary to achieve the quadratic speedup in quantum search, offering a new perspective on this foundational algorithm. Moreover, the scalar reduction at the heart of our analysis inspires and motivates new directions and innovations in quantum algorithm design and evaluation.

Executive summary

This paper shows that the quadratic speedup of Grover/amplitude-amplification survives even when the diffusion operator is fully replaced by reflections that act only within non-overlapping qubit partitions — the oracle becomes the only operator that touches all qubits. A recursive construction over partitions admits a closed-form analysis because the principal angles between successive reflection eigenspaces collapse to just two distinct values, despite the underlying eigenspaces growing exponentially. For an 18-qubit search, the recursive scheme cuts non-oracle circuit depth by 51–96% with only ~9% extra oracle calls, and the overhead vanishes asymptotically when partition sizes exceed log₂(log₂ N). For Yuan, this is a direct hardware-aware variant of Grover search that pairs naturally with Y4's cardinality-constrained Grover algorithm: the diffusion bottleneck in Y4-style problems is exactly what this paper attacks.

Main contribution

The authors prove that amplitude amplification can be performed using a recursive search procedure built entirely from partial (per-partition) diffusion operators, with the oracle as the only global operation. The key technical observation: although alternating partial and global reflections normally generates dynamics on an exponentially growing invariant subspace (the "two-operator" case alone requires diagonalising a 4×4 unitary), a recursive construction that builds reflections only from local partial diffusers and the oracle has the remarkable property that the principal angles between successive reflections collapse to exactly two distinct values, both determined by a single recursively-defined angle θ_i per partition. This yields an exact closed-form expression for success probability and optimal iteration count, with oracle overhead O(1/∏ᵢ cos(θᵢ)) relative to standard amplitude amplification. For unstructured single-target search, this overhead converges to 1 when each partition has at least log₂(log₂ N) qubits, preserving the full √N speedup. Each round amplifies one register's component of the target; that register is then measured and the procedure recurses on the remaining qubits.

Key theorems / algorithms

Detailed walkthrough

The paper is organised as: §1 introduction; §2 preliminaries (amplitude amplification, partial diffusion operators); §3 the recursive algorithm and its closed-form analysis; §4 evaluation (simulation + scaling); §5 discussion; §6 conclusion.

Conceptual setup (§1, §2): Standard amplitude amplification rotates within a 2-dimensional plane spanned by |ψ⟩ = A|0⟩ and the target |x⟩, using two global reflections (oracle S_x and diffuser S_ψ). The partial diffusion operator reflects only within a subset of qubits and is naturally suited to algorithms where A factorises across partitions (Grover for unstructured search is the canonical case, since |ψ⟩ is a product of Hadamards). Earlier work (Grover 2002, hardware-Grover variants) used partial diffusers to reduce global diffusion calls, but always retained at least one global non-oracle operation. Even two-operator alternations break the elegant 2-D geometry and need a 4×4 unitary analysis.

The recursive construction (§3): The authors define a sequence of reflection operators W_0, W_1, …, W_{m−1} where W_0 = S_{ψ_1} (the partial diffuser on partition 1) and W_i = (S_{ψ_i} W_{i−1})^{t_i} S_{ψ_i} (W_{i−1} S_{ψ_i})^{t_i}. Each W_i is a reflection-like operator built only from partial diffusers and W_{i−1}; only the oracle ever spans the full register. The full algorithm is to prepare ⊗ Aᵢ|0⟩, apply (S_{ψ_m} W_{m−1})^{t_m}, measure the final partition (which now contains amplified amplitude on the target substring x_m), and recurse on the remaining partitions.

Why it works — the angle collapse (§3.2): Naively the eigenspaces of W_i grow as 2^{n_1 + … + n_i}, but the authors show that the principal angles between successive W_i and S_{ψ_{i+1}} take only two distinct values, both functions of a single recursive angle. The result is a scalar recursion for the algorithm's dynamics — the closed-form expression for success probability after t iterations at each stage. This is the analytical anchor that makes the whole construction tractable.

Oracle overhead (§3.6): The overhead factor 1/∏ cos(θᵢ) is benign for unstructured search where θᵢ → 0 quickly; when each partition has at least log₂(log₂ N) qubits, the overhead converges to 1 and the √N speedup is preserved exactly. For partitions smaller than this threshold, the constant prefactor grows, but the asymptotic scaling is unchanged.

Evaluation (§4): The authors test an 18-qubit single-target search in Qiskit. For a 3-stage [6,6,6] configuration and a 2-stage [9,9] configuration, simulated success matches theory tightly (96.5% vs 96.7% theoretical for 3-stage; 99.5% vs 99.8% for 2-stage), with the "useful" failure structure that 57% of failures still recover the correct first-stage substring. They evaluate three multi-controlled-X decomposition scenarios (S1: no ancillae; S2: dirty V-chain using idle qubits; S3: one clean ancilla). Under S1 the diffusion depth drops by 95% (3-stage) and 81% (2-stage) at 30% / 9% oracle overhead; under S2 the savings rise to 97% / 96%; under S3 they narrow but remain substantial (63% / 51%). The scaling analysis (§4.2) extrapolates to n = 20–50 qubits and finds break-even oracle-to-diffusion depth ratios of ~10³ for the 2-stage configuration at n = 50 under S2 — i.e., even oracles three orders of magnitude deeper than the diffuser still yield net depth savings. A crossover appears around n = 25 where 3- and 4-stage configurations become more depth-efficient than 2-stage as n grows.

Practical implication: For AES-128, where the oracle-to-diffusion depth ratio is estimated around 10×, the recursive scheme yields ~5% total depth reduction at n = 50 qubits. Modest, but real, and scales favourably with n. More broadly, the work raises a structural question: now that the non-oracle cost is essentially eliminated, is the oracle itself the binding constraint, and how should we restructure oracle design accordingly?

Figures

Verification plot
Figure 1 (verification_plot_combined). Simulation and depth comparison of the recursive search scheme against standard Grover search on an 18-qubit single-target problem. Top row: Measurement outcome distributions for the 3-stage [6,6,6] (left) and 2-stage [9,9] (right) configurations over 1000 shots, showing the proportion of shots returning the correct target bitstring and partial matches. Bottom left: Diffusion operator circuit depth (log scale) for both configurations under three decomposition scenarios. S1: no ancillae, S2: dirty V-chain using idle qubits, S3: one clean ancilla. The circuits were transpiled onto a fully connected topology with CX as the native two-qubit gate. Percentages indicate the recursive scheme's depth for implementing all of the diffusion steps relative to Grover search. Bottom right: Total oracle call counts, with percentages showing the recursive scheme's calls relative to the Grover baseline matched to the same success probability.
Scaling properties
Figure 2 (fig1_properties). Scaling behaviour of the recursive search scheme for k = 2, 3, 4 stages of near equal size, applied to unstructured searches over n = 20 to n = 50 qubits with a single target state. Left: Failure probability 1 − p on a logarithmic scale. Right: Relative oracle overhead, defined as the fractional increase in oracle calls over standard Grover search matched to the same success probability.
Depth scaling
Figure 3 (fig2_savings). Depth scaling of the recursive search scheme relative to standard Grover search for k = 2, 3, 4 stages across n = 20 to n = 50 qubits. Top row: Break-even oracle-to-diffusion depth ratio under different multi-controlled X gate decomposition scenarios (S1, S2, S3). The recursive scheme achieves a lower total circuit depth than Grover whenever the oracle depth is below this ratio times the global diffusion operator depth. Bottom row: Total circuit depth of the recursive scheme relative to Grover, assuming a single oracle operator contributes 1×, 5×, and 10× the depth of a single diffusion operator respectively. The dashed red line marks parity with Grover; values below indicate a depth advantage for the recursive scheme.
No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Recommended action for Yuan