Quantum Search without Global Diffusion

John Burke, Ciaran Mc Goldrick (Trinity College Dublin) · arXiv: 2604.15435 · submitted 2026-04-16 · score 8/10 (HIGH)

Abstract

Quantum search is among the most important algorithms in quantum computing, offering fundamentally new approaches to optimisation and information processing. At its core is quantum amplitude amplification, a technique that achieves a quadratic speedup over classical search by combining two global reflections: the oracle, which marks the target, and the diffusion operator, which reflects about the initial state. We show that this speedup can be preserved when the oracle is the only global operator, with all other operations acting locally on non-overlapping partitions of the search register. We present a recursive construction that, when the initial and target states both decompose as tensor products over these chosen partitions, admits an exact closed-form solution for the algorithm's dynamics. This is enabled by an intriguing degeneracy in the principal angles between successive reflections, which collapse to just two distinct values governed by a single recursively defined angle. Applied to unstructured search, a problem that naturally satisfies the tensor decomposition, the approach retains the O(√N) oracle complexity of Grover search when each partition contains at least log₂(log₂ N) qubits. On an 18-qubit search problem, partitioning into two stages reduces the non-oracle circuit depth by as much as 51%–96% relative to Grover, requiring up to 9% additional oracle calls. For larger problem sizes this oracle overhead rapidly diminishes, and valuable depth reductions persist when the oracle circuit is substantially deeper than the diffusion operator. More broadly, these results show that a global diffusion operator is not necessary to achieve the quadratic speedup in quantum search, offering a new perspective on this foundational algorithm.

Executive summary

Burke & Mc Goldrick show that Grover's quadratic speedup survives even if the diffusion operator is stripped of all globality: every non-oracle gate can act on a local partition of the register, provided the initial and target states factor over the chosen partition. They obtain a closed-form recursive scheme with oracle overhead 1/∏cos(θᵢ), which converges to unity whenever the partition size s ≥ log₂(log₂ N). For Yuan this is directly relevant to Y4: the Grover construction for cardinality-constrained binary optimisation requires hardware-efficient multi-controlled oracles and amplitude-amplification iterations, and localising the diffusion step lowers circuit depth for every marked-element search. The 51%–96% non-oracle depth reductions reported at 18 qubits are exactly the near-term overhead knob that limits NISQ Grover demonstrations.

Main contribution

The authors prove that the global diffusion reflection S_ψ is dispensable when A = ⊗ᵢ Aᵢ and |x⟩ = ⊗ᵢ|xᵢ⟩. They define partial diffusers S_{ψᵢ} = Aᵢ(I − 2|0⟩⟨0|)Aᵢ† = I − 2|ψᵢ⟩⟨ψᵢ| on register i only and construct recursively Wᵢ = (S_{ψᵢ} Wᵢ₋₁)^{tᵢ} S_{ψᵢ} (Wᵢ₋₁ S_{ψᵢ})^{tᵢ}, with W₀ = S_x. The core technical discovery is that the principal angles between the high-dimensional eigenspaces of S_{ψᵢ} and Wᵢ₋₁ collapse to just two values γᵢ, π/2−γᵢ governed by a scalar recursion sin(2γᵢ) = sin(2θᵢ) sin(2tᵢ₋₁γᵢ₋₁), γ₁ = θ₁. Despite exponential growth of the underlying eigenspaces, the dynamics reduce to a scalar equation whose closed form yields both the success probability and the optimal iteration count. Each outer round measures register m and repeats on a shrinking residue of the register — the total oracle cost is dominated by the first round and the remainder sums as a geometric series.

Key theorems / lemmas / algorithms

Decomposition premise (Eq. 1): The recursive scheme requires both the state-preparation unitary and the target state to factor as tensor products over the partition of the n-qubit register.
Scalar angle recursion: sin(2γᵢ) = sin(2θᵢ) sin(2tᵢ₋₁γᵢ₋₁), with γ₁ = θ₁, captures the full algorithmic dynamics in a one-dimensional difference equation.
Projection property (Eq. 4): ‖P_{ψᵢ}(S_{ψᵢ} Wᵢ₋₁)^{tᵢ}|ψᵢ, φᵢ₋₁⟩‖² = cos²(2tᵢγᵢ), with orthogonal complement giving sin²(2tᵢγᵢ) — the scalar result survives despite arbitrary lower-register state φᵢ₋₁.
Per-stage success probability: ‖P_{xₘ}(S_{ψₘ} W_{m-1})^{tₘ}|ψ⟩‖² = ½[1 − (cos 2θₘ/cos 2γₘ) cos(2(2tₘ+1)γₘ)], with optimal iterate t*ₘ = ⌊π/(4γₘ) − ½⌉.
Oracle complexity bound: total cost ≤ C_Grover(n)/[∏cos(θᵢ)] · 1/(1 − 2^{1 − s/2}) for equal partition size s, converging to Grover's O(√N) query complexity whenever s ≥ log₂(log₂ N).
Empirical depth reduction: on the 18-qubit test under V-chain MCX decomposition (scenario S2), the 3-stage and 2-stage configurations reduce diffusion-circuit depth by 97% and 96% respectively, at 30% and 9% additional oracle calls.

Detailed walkthrough

The paper is organised around a familiar two-dimensional picture for Grover that the authors generalise recursively. Sections 2 and 3 set up notation for partial and global reflections and define S_{ψᵢ} as a reflection about |ψᵢ⟩ = Aᵢ|0ᵢ⟩. The recursion in Section 4 alternates an outer operator S_{ψᵢ} (acting locally on register i) with an inner reflection Wᵢ₋₁ that — crucially — is itself built from the oracle and the partial diffusers on the lower registers. The construction is not merely two alternating reflections: each Wᵢ is a conjugation, Wᵢ = Uᵢ S_{ψᵢ} Uᵢ⁻¹ with Uᵢ = (S_{ψᵢ} Wᵢ₋₁)^{tᵢ}, so it is itself a reflection whose action is characterised by principal angles to S_{ψᵢ}.

The central proof (Subsection 4.1 in the source, algorithm/proof.tex) tracks two invariants: the eigenspaces of Wᵢ containing |ψᵢ⟩ and those containing |ψᵢ⊥⟩. For a pair of reflections the dynamics are controlled by the canonical correlation (principal) angles between their reflected subspaces. A priori these subspaces have dimension 2^(i-1), so one would expect up to 2^(i-1) distinct principal angles. The authors show that all of them collapse to just γᵢ and π/2 − γᵢ, governed by the scalar recursion above. The physical consequence is that the operator (S_{ψᵢ} Wᵢ₋₁)^{tᵢ} acts on the two-dimensional plane span{|ψᵢ⟩, |ψᵢ⊥⟩} as a rotation by 2tᵢγᵢ, independent of how the lower registers are configured.

In Subsection 4.2 the recursion is unwound for the full algorithm applied to state |ψ⟩ = ⊗ᵢ|ψᵢ⟩. The closed-form success probability yields the optimal iteration count t*ₘ = ⌊π/(4γₘ) − ½⌉, matching the usual Grover formula in the limit m = 1. The measurement-and-restart structure (Figure 2 of the paper, circuits/full_search.tex) is the practical wrapper: after the outer stage concentrates amplitude on register m, the remaining m−1 registers are measured, reset to |ψᵢ⟩, and the algorithm recurses on the register-removed subproblem. Because each sub-problem is strictly smaller and the probability of restarting is suppressed by cos(θᵢ), the overhead from the tail rounds is a bounded geometric series.

Subsection 4.3 specialises to unstructured single-target search. With uniform partition size s = n/m, one has sin(θᵢ) = 2^(-s/2) and the product 1/∏cos(θᵢ) ≤ exp[log₂(N)/(s·2^(s+1))]. This is where the surprising log log threshold enters: for s ≥ log₂(log₂ N) the exponent vanishes and the scheme matches O(√N). Thus non-oracle operations can act on a number of qubits that is doubly logarithmic in the problem size, while retaining Grover's asymptotic optimality.

Section 5 (Evaluation) validates the predictions with Qiskit noiseless simulation on an 18-qubit search of |1⟩^{⊗18}. For the 3-stage [6,6,6] configuration with iteration counts (102, 25, 6), the target bitstring was measured in 96.5% of 1000 shots versus a 96.7% theoretical prediction; the 2-stage [9,9] configuration achieved 99.5% versus the 99.8% bound. Three MCX decomposition scenarios are compared — S1 (no ancillae, strictly within-partition), S2 (dirty V-chain using idle qubits from other partitions), and S3 (one clean ancilla). Under S2 the diffusion-circuit depth drops by 97% (3-stage) and 96% (2-stage), with an oracle overhead of 30% and 9% respectively. Even under S3, which already permits efficient Grover compilation via a clean ancilla, the recursive scheme saves 63% and 51% of the diffusion depth.

Section 5.2 extrapolates to n = 20…50. The failure probability of the 2-stage configuration drops below 10⁻⁷ by n = 50, and the 3-stage failure below 10⁻⁴. The break-even oracle-to-diffusion depth ratio — the point where the recursive scheme is equivalent to Grover — rises to ~10³ for the 2-stage at n = 50 under S2. For an AES-128 oracle (estimated ~10× depth ratio) the scheme would still save roughly 5% of the total circuit depth.

Figures

Figure 1. Simulation and depth comparison of the recursive search scheme against standard Grover search on an 18-qubit single-target problem. Top row: measurement-outcome distributions for the 3-stage [6,6,6] and 2-stage [9,9] configurations over 1000 shots, showing proportion of shots returning the correct target bitstring and partial matches. Bottom left: diffusion-operator circuit depth (log scale) under MCX scenarios S1 (no ancillae), S2 (dirty V-chain using idle qubits), S3 (one clean ancilla). Bottom right: total oracle-call counts.

Figure 2. Scaling behaviour of the recursive scheme for k = 2, 3, 4 stages over n = 20 to 50 qubits, single-target unstructured search. Left: failure probability 1 − p on a logarithmic scale. Right: relative oracle overhead (fractional increase over Grover matched to the same success probability).

Figure 3. Depth scaling of the recursive search scheme relative to standard Grover for k = 2, 3, 4 stages across n = 20 to 50 qubits. Top row: break-even oracle-to-diffusion depth ratio under MCX decomposition scenarios S1, S2, S3. Bottom row: total circuit depth of the recursive scheme relative to Grover assuming oracle depth contributes 1×, 5×, and 10× the depth of a single diffusion operator. Dashed red line marks parity with Grover; values below indicate a depth advantage.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 found in bibliography.

Overlap with Y1–Y6

Y4 — Grover + ADMM for cardinality-constrained BO (arXiv:2603.14744) — PRIMARY overlap. Y4's algorithm requires Grover rotations on the structured feasible space C(n,k) with oracle complexity O(√(C(n,k)/M)). Burke & Mc Goldrick's recursive construction would apply directly whenever Y4's feasible-space encoding admits a tensor-product preparation A = ⊗Aᵢ — e.g., when the cardinality constraint is enforced via independent bit-flag registers. Adopting local diffusers would shrink the non-oracle depth per Grover step by the 51%–96% factors demonstrated here, at a 5%–30% oracle-overhead cost that is tolerable because Y4's oracle is the expensive operation.
Y2 — quasi-binary QAOA (arXiv:2304.06915) — method-adjacent. The Y2 hard-constraint mixer lives inside a structured feasible space; the local-diffusion principle here is an orthogonal technique that could potentially be adapted to warm-started or adaptive quantum search protocols built from quasi-binary encodings.
Y1 — iterative warm-started QAOA (arXiv:2502.09704) — conceptually aligned. Both works exploit structure of the search landscape to reduce per-iteration cost — Y1 via warm-starts, Burke & Mc Goldrick via tensor-factored initial states. Neither subsumes the other.
Y3, Y5, Y6 — no direct overlap. Y3 studies QAOA for portfolio optimisation under noise; Y5 solves SDPs via Gibbs states; Y6 is an experimental PBR test. None intersect this paper's scope.

Recommended action for Yuan

Adapt for Y4 benchmarks (highest priority). Prototype Burke & Mc Goldrick's recursive scheme on top of Y4's cardinality-oracle Grover to quantify end-to-end circuit-depth savings on the DGMVP-feasible space. The log₂(log₂ N) partition threshold means modest numbers of cardinality-flag qubits per block already retain asymptotic complexity.
Cite in the next Y4 draft. Natural companion citation both for discussing hardware-aware Grover and for the oracle-depth-dominated regime (where the overhead vanishes).
Email the authors. Specifically ask whether the partial-diffusion construction has been analysed under depolarising noise — a key bottleneck for Yuan's NISQ-era optimisation work and something the paper leaves open.