Improvement of performance of Grover's algorithm on three generations of Heron family IBM QPUs without and with topological dynamical decoupling

Authors: Tihomir G. Tenev, Nayden P. Nedev, Nikolay V. Vitanov · arXiv:2604.23228 · submission cycle 2026-04-28 · score 8/10 (HIGH)

Abstract

We investigate the performance of Grover's algorithm on three different generations of IBM Heron QPUs. On the Heron family of IBM QPUs, the success probabilities for three, four and five qubits without dynamical decoupling are better than the results reported for previous generations of QPUs. The success probability as a function of the number of iterations of the Grover operator is considered. The improvement of the five-qubit results with the help of topological dynamical decoupling is also studied. For a six-qubit case on a Heron r3 QPU, a clear result for finding the sought-after bitstring is reported for a theoretically suboptimal number of iterations of the Grover operator with the help of dynamical decoupling.

Executive summary

An empirical hardware study running Grover's search at 3–6 qubits on three IBM Heron-generation devices (Torino r1, Marrakesh r2, Pittsburgh r3) with and without dynamical-decoupling (DD) sequences. Headline results: (i) without DD, success probabilities on the new Heron r3 (Pittsburgh) already exceed the DD-protected numbers of prior generations for n=3, 4, 5; (ii) in the 6-qubit case, the success probability at the theoretically optimal 6 iterations is at or below the 1/64 random-guess floor on all three devices, but with T4 or XY4 DD at 2–3 iterations Pittsburgh recovers a clean target signal with success probability up to ~0.06; (iii) the recently proposed topological DD family T_n matches or beats XY4 in some configurations (by ~14% with T2 on Pittsburgh at 5 qubits), with the shorter sequences (T2, T4) generally best. The study runs on the same Heron platform that Y6 used for the PBR test, and Grover's algorithm underlies Y4's cardinality-constrained quantum advantage claim, so this paper is a useful empirical baseline for both lines.

Main contribution

A controlled cross-device benchmark of Grover at increasing problem sizes on the three Heron generations (calibration-data-aware run scheduling, fixed transpiler seed=1234, 10 000 shots/run, balanced 0/1 target bitstrings), combined with a head-to-head comparison of CPMG, XY4, and the topological T_n DD sequences (Nedev 2025) in the 5-qubit and 6-qubit cases. The empirical conclusions are that (a) Heron r3 (Pittsburgh) is the first IBM superconducting platform to support useful unprotected Grover at 5 qubits and DD-assisted Grover at 6 qubits, and (b) in the 5-qubit case the best T_n sequence outperforms XY4 by ~14% on Pittsburgh.

Key experimental protocol

Hardware: ibm_torino (Heron r1), ibm_marrakesh (Heron r2), ibm_pittsburgh (Heron r3). Calibration data (T1, T2, readout error, 2Q-gate error) were collected over more than two weeks; runs were scheduled when most metrics were better than their mean over that period.
Compilation: Qiskit 2.1.0, qiskit-ibm-runtime 0.40.1, transpiler optimization level 3, seed=1234. Same physical qubit subset across all 5-qubit runs on a given device.
Targets: balanced bitstrings — "010" (3q), "0101" (4q), "01011" (5q), "010110" (6q) — chosen to mitigate the population asymmetry between |0⟩ and |1⟩ states.
Iteration sweep: 5q with 1–4 Grover iterations (theoretical optimum 4); 6q with 1–6 (theoretical optimum 6).
2Q-gate counts at 5q: 127, 263, 402, 538 across iterations 1–4 (same on all three devices).
2Q-gate counts at 6q: Torino/Marrakesh 375 / 752 / 1117 / 1523 / 1916 / 2315; Pittsburgh differs slightly (386 / 765 / 1151 / 1539 / 1908 / 2302) due to native-gate differences.
DD sequences: CPMG = X-X; XY4 = X-Y-X-Y; topological T_n sequences from Nedev 2025 with phases φ_k = (k-1)(n/2-k)/(n/2) π; sweep T2–T12.
Statistics: 10 000 shots per run, 99% binomial confidence intervals.
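The theoretical optima quoted above (4 iterations at 5q, 6 at 6q) follow from the standard noiseless Grover formulas. A minimal Python sketch, assuming a single marked bitstring and the common estimate floor(π/4·√N); the function names are illustrative, not taken from the paper:

```python
import math


def optimal_iterations(n_qubits: int, n_marked: int = 1) -> int:
    """Common estimate floor(pi/4 * sqrt(N/M)) for N = 2**n_qubits basis states."""
    n_states = 2 ** n_qubits
    return math.floor(math.pi / 4 * math.sqrt(n_states / n_marked))


def ideal_success(n_qubits: int, iterations: int, n_marked: int = 1) -> float:
    """Noiseless probability of measuring a marked state: sin^2((2k+1) * theta)."""
    n_states = 2 ** n_qubits
    theta = math.asin(math.sqrt(n_marked / n_states))
    return math.sin((2 * iterations + 1) * theta) ** 2


for n in (3, 4, 5, 6):
    k = optimal_iterations(n)
    print(f"n={n}: k_opt={k}, ideal P={ideal_success(n, k):.3f}, "
          f"random-guess floor 1/{2 ** n}={1 / 2 ** n:.4f}")
```

For n=5 and n=6 this reproduces the optima of 4 and 6 iterations used in the sweeps, with ideal success probabilities close to 1; the hardware numbers below should be read against that noiseless reference and the 1/2^n floor.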

Detailed walkthrough

The 3–5 qubit case (Section III-A): the success probability increases monotonically across Heron generations for every problem size. For n=5, Pittsburgh (Heron r3) reaches ~0.35 without DD, "almost twice as good" as DD-protected results on previous-generation IBM QPUs. The authors attribute this to longer T1 and T2 and lower 2Q-gate error on Heron r3; the calibration values are tabulated in the paper.

The 5q iteration sweep (Section III-B) reveals the canonical noise-vs-signal trade-off: the highest unprotected success probability is reached at fewer than the theoretically optimal 4 iterations, namely 2 on Torino, 2–3 on Marrakesh and 3 on Pittsburgh (peaking near 0.38). The 2Q-gate count grows by ~137 per iteration (127 → 263 → 402 → 538), and past the device-dependent sweet spot the accumulated gate error outweighs the additional algorithmic amplification. This is the same mechanism Y3 quantifies for QAOA on portfolio optimisation: in the thermal-relaxation regime, deeper circuits stop helping.
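The shift of the optimum to fewer iterations can be reproduced qualitatively with a toy model. The sketch below assumes a hypothetical global-depolarizing channel: the circuit runs ideally with probability F^g (F a per-2Q-gate fidelity, g the 2Q-gate count from the protocol list) and otherwise returns a uniformly random bitstring. The model, the function names and the value F=0.998 are illustrative assumptions, not the paper's noise model or a measured Heron figure.

```python
import math

# 2Q-gate counts of the 5-qubit circuit for 1..4 Grover iterations (protocol list).
GATES_5Q = [127, 263, 402, 538]
N_QUBITS = 5


def ideal_success(n_qubits: int, k: int) -> float:
    """Noiseless single-target Grover success probability after k iterations."""
    theta = math.asin(1 / math.sqrt(2 ** n_qubits))
    return math.sin((2 * k + 1) * theta) ** 2


def toy_success(n_qubits: int, k: int, n_2q: int, f_2q: float) -> float:
    """Hypothetical global-depolarizing toy model: ideal output with probability
    f_2q**n_2q, otherwise a uniformly random bitstring (probability 1/2**n)."""
    survive = f_2q ** n_2q
    return survive * ideal_success(n_qubits, k) + (1 - survive) / 2 ** n_qubits


def best_iteration(f_2q: float) -> int:
    """Iteration count (1-based) that maximizes the toy success probability."""
    probs = [toy_success(N_QUBITS, k, g, f_2q) for k, g in enumerate(GATES_5Q, start=1)]
    return 1 + max(range(len(probs)), key=probs.__getitem__)


for f in (1.0, 0.998):  # 0.998 is illustrative only, not a measured Heron fidelity
    probs = [toy_success(N_QUBITS, k, g, f) for k, g in enumerate(GATES_5Q, start=1)]
    print(f"F_2q={f}: {[round(p, 3) for p in probs]}, best k={best_iteration(f)}")
```

With perfect gates the optimum stays at 4 iterations; with the hypothetical F=0.998 it moves to 3, mirroring the trend in the sweep. The model ignores idle decoherence, readout error and crosstalk, so it illustrates the mechanism but should not be used to fit the measured curves.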

The DD comparison (Section III-C, 5 qubits): on Torino, T8 gives the largest enhancement (~30% over the unprotected case); on Marrakesh, XY4 and T4 are tied; on Pittsburgh, T2 outperforms XY4 by ~14%. The trend across T_n is non-monotone: the success probability oscillates with pulse count, and the shorter sequences (T2, T4) outperform T10 and T12. The authors attribute this to a trade-off between the number of inserted DD blocks (which grows when each block is short) and the protection each block provides. With star-topology qubit selection, two qubits carry most of the gate load while the other three are mostly idle; DD acts more strongly on the idle qubits, which partly explains why pulse-count and timing details matter so much.
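To make the T_n phase rule from the protocol list concrete, the sketch below evaluates φ_k = (k-1)(n/2-k)/(n/2)·π exactly as transcribed in this digest, assuming k runs over the n pulses k = 1..n of T_n and reducing phases modulo 2π. Both the indexing convention and the transcription are assumptions to be checked against Nedev 2025 before use; the function name is hypothetical.

```python
from fractions import Fraction


def tn_phases_over_pi(n: int) -> list[Fraction]:
    """Phases phi_k / pi of the T_n sequence for k = 1..n, reduced modulo 2.

    Uses phi_k = (k-1)(n/2 - k)/(n/2) * pi, which simplifies to (k-1)(n-2k)/n * pi.
    Exact fractions avoid floating-point noise in the modular reduction.
    """
    return [Fraction((k - 1) * (n - 2 * k), n) % 2 for k in range(1, n + 1)]


for n in (2, 4, 6, 8, 10, 12):  # the T2..T12 sweep
    print(f"T{n}: phi_k/pi =", [str(p) for p in tn_phases_over_pi(n)])
```

Keeping the phases as exact fractions of π makes it easy to compare the sequences pulse by pulse, or to map them onto phase-shifted X pulses in a compiler pass.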

The 6q case (Section III-E) is the headline experimental result. Without DD, the success probability at the optimal 6 iterations is at or below the 1/64 random-guess floor on all three devices: unprotected Grover fails on Heron at 6 qubits. With T4 or XY4, Pittsburgh produces an unambiguous target peak at 2–3 iterations (~0.06 with T4 at 2 iterations, ~0.05 at 3 iterations), well above the 1/64 floor and clearly distinguishable from non-target bitstrings (Figures 8–10 in the paper's PDF; see the figure note below). On Torino and Marrakesh the 6q result is at or below the floor regardless of DD.
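Why ~0.06 and ~0.05 count as "well above" 1/64 at 10 000 shots can be checked with a 99% binomial interval. The paper states only "99% binomial confidence intervals"; the Wilson score interval below is one common choice and is an assumption here, as is converting the rounded digest values into 600 and 500 target counts out of 10 000 shots.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, shots: int, confidence: float = 0.99) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (one common choice; assumed here)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~2.576 for a two-sided 99% interval
    p_hat = successes / shots
    denom = 1 + z ** 2 / shots
    centre = (p_hat + z ** 2 / (2 * shots)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / shots + z ** 2 / (4 * shots ** 2)) / denom
    return centre - half, centre + half


SHOTS = 10_000
FLOOR = 1 / 64  # random-guess probability for one target among 2**6 bitstrings

for label, hits in [("T4, 2 iterations (~0.06)", 600), ("3 iterations (~0.05)", 500)]:
    lo, hi = wilson_interval(hits, SHOTS)
    print(f"{label}: 99% CI [{lo:.4f}, {hi:.4f}], above floor: {lo > FLOOR}")
```

At this shot count the 99% half-width around 0.05–0.06 is roughly 0.006, so both lower bounds sit far above 0.0156; a count near 156 hits, by contrast, gives an interval that contains the floor.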

Section IV's conclusion is properly cautious: the Grover circuit at 6 qubits on Heron r3 returns the right answer with small but resolvable probability for sub-optimal iteration counts when paired with topological or XY4 DD. This is, to the authors' knowledge, the first 6-qubit Grover demonstration with a clean target signal on superconducting hardware. The signal is fragile: at 4–6 iterations on Pittsburgh the success probability collapses back toward the floor.

Figure rendering: the paper's figures are EPS-only and this digest pipeline lacks an EPS converter, so figures could not be embedded inline. Refer to figures 1, 7, and 8–10 in the source PDF for the iteration sweep plots (5q, 6q) and the bitstring histograms confirming target dominance on Pittsburgh under DD.

Citations to Yuan's papers

No direct citation to any of Y1–Y6 was found in the bibliography.

Overlap with Y1–Y6

Y1 (warm-started QAOA): Indirect. Both papers run a single quantum subroutine repeatedly on real hardware, but Y1 is QAOA-based and this paper uses plain Grover. The shared ground is the methodology of QPU-noise-aware execution.
Y2 (quasi-binary portfolio QAOA): Tangential — no direct method overlap.
Y3 (QAOA DGMVP, thermal noise crossover): Genuinely relevant. Y3's central conclusion is that thermal relaxation forecloses quantum advantage on current devices but DD-protected execution might restore it. This paper supplies fresh empirical numbers on the gate-error / decoherence rates on the same Heron family Y3 simulates and offers DD recipes (specifically T2, T4) that demonstrably help Grover. Y3's QAOA simulations could re-cost circuits using the 2Q-gate counts and DD overheads measured here.
Y4 (Grover + ADMM cardinality-constrained): Strong scope overlap. Y4 hinges on Grover-type amplitude amplification with O(√(C(n,k)/M)) rotations achieving ε-approximate optima. The Heron-r3 6-qubit result here is the most relevant published baseline for what Y4's algorithm can be expected to deliver on near-term superconducting hardware: at 6 qubits Grover already falls to the random-guess floor without DD, and DD becomes essential. Y4's hardware-resource estimation should cite this paper for current 2Q-gate counts and DD overheads on Heron.
Y5 (GW + Pauli sparsity): Unrelated — SDP relaxations and amplitude amplification operate in different problem classes.
Y6 (PBR test on Heron2): Direct platform overlap. Y6 ran the PBR no-go test on ibm_marrakesh (Heron r2) at 156 qubits; this paper runs Grover on the same Heron r2 device and on its r1 and r3 siblings. The pulse-count, gate-error, and DD numbers reported here can be cross-referenced with the calibration data Y6 used for the PBR test. In particular, the topological T_n DD sequences from Nedev 2025 might also help a Y6 follow-up at sites where the current PBR violations are marginal.
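For the Y4 item above, a rough sense of scale: assuming the standard Grover prefactor π/4 in Y4's O(√(C(n,k)/M)) rotation count (an assumption; Y4's actual constant and oracle structure may differ), the sketch compares that count with the 2–3 iterations that gave a resolvable 6-qubit signal on Pittsburgh. The function and constant names are hypothetical.

```python
import math


def rotation_estimate(n: int, k: int, m_marked: int = 1) -> int:
    """floor(pi/4 * sqrt(C(n, k) / M)): hypothetical stand-in for Y4's O(.) rotation count."""
    return math.floor(math.pi / 4 * math.sqrt(math.comb(n, k) / m_marked))


# Largest iteration count with a clean 6-qubit signal under DD on Pittsburgh (this paper).
RESOLVABLE_ITERATIONS_6Q = 3

for n, k in [(6, 3), (8, 4), (10, 5), (12, 6)]:
    r = rotation_estimate(n, k)
    # Crude comparison of iteration counts only: per-iteration gate cost also
    # depends on circuit width and oracle, which this check ignores.
    print(f"n={n}, k={k}: C(n,k)={math.comb(n, k)}, ~{r} rotations, "
          f"<= {RESOLVABLE_ITERATIONS_6Q} iterations: {r <= RESOLVABLE_ITERATIONS_6Q}")
```

Even modest instances such as C(10,5)=252 call for about 12 rotations under this assumption, several times deeper than anything resolved on Heron r3 here.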

Recommended action for Yuan

Cite in Y4's hardware-cost discussion. If Y4 (arXiv:2603.14744) makes hardware-feasibility claims for its Grover step, this paper's 2Q-gate counts and per-iteration noise scaling on Heron r3 are the right benchmark numbers to anchor the analysis.
Cite in any Y6 follow-up on the same Heron family, especially if the work extends to comparative DD studies. The Vitanov group (Sofia) and the topological-DD sequence proposal (Nedev 2025) should be on Yuan's radar for any paper combining Heron + foundations tests.
Read deeper rather than email immediately: this is a standard hardware benchmark, and adoption is mostly a citation/comparison action rather than a collaboration trigger. If a Y4 hardware demonstration is on the roadmap, a co-implementation with the Sofia group's DD pipeline could be a useful extension paper.