Quantum Search without Global Diffusion
Abstract
Quantum search is among the most important algorithms in quantum computing, offering fundamentally new approaches to optimisation and information processing. At its core is quantum amplitude amplification, a technique that achieves a quadratic speedup over classical search by combining two global reflections: the oracle, which marks the target, and the diffusion operator, which reflects about the initial state. We show that this speedup can be preserved when the oracle is the only global operator, with all other operations acting locally on non-overlapping partitions of the search register. We present a recursive construction that, when the initial and target states both decompose as tensor products over these chosen partitions, admits an exact closed-form solution for the algorithm's dynamics. This is enabled by an intriguing degeneracy in the principal angles between successive reflections, which collapse to just two distinct values governed by a single recursively defined angle. Applied to unstructured search, a problem that naturally satisfies the tensor decomposition, the approach retains the O(√N) oracle complexity of Grover search when each partition contains at least log2(log2 N) qubits. On an 18-qubit search problem, partitioning into two stages reduces the non-oracle circuit depth by as much as 51%–96% relative to Grover, requiring up to 9% additional oracle calls.
Executive summary
Burke and Mc Goldrick prove that the global diffusion operator in Grover/amplitude amplification is not strictly necessary: if the state-preparation unitary and target state both factor as tensor products across a partition of the qubit register, the global reflection can be replaced entirely by local, register-scoped reflections, with the oracle as the only operator that still spans the full register. The proof identifies a surprising degeneracy — the principal angles between successive reflection subspaces (whose ambient dimension grows exponentially) collapse to two values governed by a scalar recursion. For unstructured search the overhead is negligible once each partition holds at least log2(log2 N) qubits, and at n = 50 the two-stage scheme matches Grover's success probability at oracle cost increase <0.1% while slashing non-oracle depth. For Yuan, this is directly load-bearing for Y4's Grover-ADMM pipeline: the dominant hardware cost of a Grover rotation for a cardinality-constrained feasible set is the non-oracle diffusion on the full n-qubit register, exactly what this paper localises.
Main contribution
The central object is a family of reflection operators Wi, defined recursively from the oracle Sx and local partial-diffusion reflections Sψi = I − 2|ψi〉〈ψi| acting only on register i:
Wi = (Sψi Wi−1)ti Sψi (Wi−1 Sψi)ti, W0 = Sx
Within the span of the initial and target register states, the product Sψi Wi−1 acts as a rotation with two distinct principal angles {γi, π/2 − γi} where sin(2γi) = sin(2θi) · sin(2ti−1γi−1) and γ1 = θ1. This scalar recursion yields a closed-form success probability per stage and an oracle overhead bounded by 1/∏i cos(θi) relative to standard amplitude amplification. Measure the non-final registers, reinitialise, repeat on the shrinking remainder — the target is recovered stage-by-stage with a geometric-series overhead dominated by the first round.
Key theorems / lemmas / algorithms
- Principal-angle degeneracy (Sec. 4. proof): Despite eigenspaces of dimension 2i−1, principal angles between Wi−1's reflected subspace and the Sψi-subspace collapse to two values governed by a single recursively defined angle γi.
- Stage success probability (Eq. stage_succ): ||Pxm(SψmWm−1)tm|ψ〉||2 = ½[1 − (cos 2θm/cos 2γm) cos(2(2tm+1)γm)], with optimal iteration count t*m = ⌊π/(4γm) − ½⌉.
- Oracle-overhead bound (Sec. final_complexity): Total oracle cost relative to standard AA is bounded by 1/∏k=1m cos(θk); for unstructured single-target search with partition sizes ni ≥ log2(log2 N), the O(√N) Grover complexity is retained.
- Algorithm (Sec. algorithm): Outer loop over stages m→1. At stage i, apply (Sψi Wi−1)ti to the full register, measure registers 1…i−1, reinitialise, descend to the next stage. Qiskit reference implementation publicly available.
Detailed walkthrough
The paper's pitch is structural: standard Grover/AA requires two global reflections — an oracle Sx that marks the target, and a diffusion operator Sψ = I − 2|ψ〉〈ψ| that reflects about the initial state. The second is typically implemented as a multi-controlled operation that touches every qubit; on near-term hardware with limited connectivity, this is expensive. Prior work on partial-diffusion operators (Grover-Radhakrishnan-Korepin, partial search, Zalka-style tradeoffs) reduces but does not eliminate the global reflection; when global diffusion is replaced entirely, the dynamics become a unitary on an exponentially growing subspace and do not admit a closed-form analysis in general.
The key insight of Sec. 3 (algorithm overview) is that if A = ⊗i Ai and |x〉 = ⊗i |xi〉 factor across a register partition of sizes n1+…+nm = n (unstructured search with Hadamards satisfies this trivially with any partition), then the recursive reflection Wi defined above acts on span{|ψi〉, |ψi⊥〉} as a rotation with just two principal angles. The proof in Sec. algorithm/proof.tex is an induction: at each level of the recursion the ambient invariant subspace doubles, but the reflection only "sees" two directions per register. As a consequence, the squared projections onto |ψi〉 and |ψi⊥〉 are cos2(2tiγi) and sin2(2tiγi) irrespective of the lower-register state — which is what makes the stage-by-stage analysis tractable.
The success-probability formula is then derived by tracking projections onto the register-m target component. Measurement collapses the outer registers; the target register is in a mixed state on span{|ψm〉, |ψm⊥〉}. The remaining registers are reinitialised and the search iterates on a shrinking register. Because each stage introduces only a multiplicative 1/cos(θi) in query count, and the partition sizes geometrically shrink (for unstructured search) or can be held uniform, the total oracle budget is a geometric series dominated by the first round.
Sec. 5 (Evaluation) does two things. First, an 18-qubit Qiskit simulation: for target |1〉⊗18, a 3-stage [6,6,6] configuration with (102, 25, 6) iterations per stage measures the target 96.5% of the time vs. the theoretical 96.7%; the 2-stage [9,9] configuration with (201, 17) iterations achieves 99.5% vs. 99.8%. Grover with matched success probability is the baseline. The authors evaluate non-oracle circuit depth under three MCX-decomposition scenarios — S1 (no ancillae), S2 (dirty V-chain using idle qubits), S3 (one clean ancilla). Under S1 at the 3-stage configuration, diffusion depth drops by 95%; under S2 by 97%; with one clean ancilla (S3) by 63%. The oracle-call overheads are 30% (3-stage) and 9% (2-stage). Importantly, the authors compute the break-even oracle-to-diffusion-depth ratio at which the recursive scheme still beats Grover in total depth: for 3-stage it is ∼3.1, 3.2, 1.7 (S1/S2/S3); for 2-stage it is ∼8.8, 10.6, 5.2.
Sec. 5.2 (Scaling analysis) extends to n = 20–50 qubits using the closed-form expressions. The headline: both failure probability and oracle overhead improve with n for fixed stage count. By n = 50, 2-stage failure < 10−7 with <0.1% oracle overhead; 3-stage < 10−4 with <1%; 4-stage < 10−3 with <5%. At this scale the break-even oracle depth ratio under S2 reaches ∼103 for 2-stage, meaning the scheme is depth-favourable even when the oracle is three orders of magnitude deeper than a global diffuser. Applied to an AES-128 key-recovery oracle (oracle-to-diffusion depth ratio ∼10×), the recursive scheme reduces total depth by ∼5%.
A subtle point: success probability per stage saturates below Grover's (99.99%+) because each stage's probability is bounded by the scalar recursion — the paper is explicit that quadratic-speedup preservation comes with a small probability deficit that shrinks with n. For applications that tolerate a ~10−3–10−7 miss rate (essentially all quantum-advantage arguments for combinatorial search), this is a very favourable trade: dramatically shallower non-oracle circuits at essentially free success probability. For applications requiring exact success (deterministic Grover variants), this scheme is not a drop-in.
Figures
Citations to Yuan's papers
Overlap with Y1–Y6
- Y4 (Grover + ADMM for cardinality-constrained BO) — primary overlap: Y4's algorithm relies on Grover rotations over a structured feasible space (cardinality-k n-bit strings, size C(n,k)) with total query cost O(√(C(n,k)/M)). The dominant per-rotation hardware cost is the diffusion over the full n-qubit register. Burke–Mc Goldrick replace that global diffusion with local partial diffusers over register partitions, at ≤9% extra oracle queries. In Y4's Grover-subroutine inner loop this could directly translate to 50–95% non-oracle depth reduction for the cardinality-feasible-space search, especially on heavy-hex topology (which Y4 is very likely to target since IBM Heron is Yuan's hardware of choice in Y6). Caveat: the partition-factoring condition |x〉 = ⊗i |xi〉 is trivial for unstructured search; adapting it to cardinality-constrained Grover (where the marked states live on a cardinality-k shell) requires checking whether the initial cardinality-preserving state can be written as a tensor product across some partition — generally it cannot, which is a nontrivial conceptual hurdle worth investigating.
- Y1 & Y3 (QAOA warm-starting and DGMVP) — tangential: No direct method link; QAOA is not Grover-structured. However the general philosophy of reducing non-oracle/non-cost-Hamiltonian global operators in favour of local structure aligns with parameter/compilation efficiency work. The "scalar reduction on exponentially growing subspaces" analysis style is potentially portable to QAOA trajectories.
- Y2 (quasi-binary QAOA) — no direct overlap.
- Y5 (Gibbs + Pauli sparsity SDP) — no direct overlap.
- Y6 (PBR on Heron2) — hardware scope only: Both target NISQ superconducting hardware; Burke–Mc Goldrick's heavy-hex depth numbers are directly relevant to what Yuan's group runs on Heron.
Recommended action for Yuan
- Adapt to Y4 follow-up. The most tangible opportunity: investigate whether the recursive local-diffusion scheme extends to cardinality-constrained Grover. Key question — does the Dicke-state (or symmetric superposition on the cardinality-k shell) admit a tensor-product decomposition over some block partition? If not exactly, is there an approximate factorisation that preserves the quadratic speedup within an acceptable overhead? Concrete next step: write a one-page analysis applying the Wi recursion to the Dicke-state initial condition used in Y4.
- Cite on next Y4 revision. Even without direct adaptation, this is the state of the art on "Grover with local diffusion" as of April 2026 — any future Y4 experimental section implementing Grover rotations on heavy-hex hardware should cite Burke–Mc Goldrick when discussing diffusion-circuit depth.
- Read deeper before adapting. The proof of the principal-angle degeneracy is the paper's technical core; Yuan should read Sec. algorithm/proof.tex carefully before building on the scheme, since the degeneracy is what makes the scalar recursion tractable and is also likely what breaks in structured-subspace generalisations.