Evidence for the utility of quantum computing before fault tolerance
Kim et al. — Nature 618, 500-505 (2023)
In Plain Language
What this paper does: This high-profile IBM paper claimed "evidence for quantum utility": that a 127-qubit quantum computer could produce accurate results for circuits that are difficult for classical computers to simulate by brute force. It simulated the time evolution of a kicked Ising model (a physics model for interacting magnets) on the processor's heavy-hex qubit lattice, using error mitigation to correct for hardware noise.
Why it matters: This is one of the most contested claims in recent quantum computing: can current, pre-fault-tolerant hardware do anything that is classically intractable? Classical simulation groups challenged the paper's results by reproducing its observables with approximate classical methods. Reproducing the key experimental signatures tests whether the claims hold up.
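Our scope: Mechanism verification, not a replication. The original ran 127 qubits at up to 20 Trotter steps (60 layers of two-qubit gates), mitigating errors with zero-noise extrapolation based on a learned noise model (probabilistic error amplification, PEA). We ran 5-9 qubits at up to 10 Trotter steps with simple gate-folding ZNE, plus one readout-mitigation comparison run with TREX. Circuits at our scale are trivially classically simulable, so we tested whether the error-mitigation methodology works, not the quantum-utility claim.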
What we found: All 3 mechanism claims were confirmed on the QI emulator and on the tuna9_12edge hardware run; the other backends confirmed 1 or 2 claims each. ZNE improved accuracy 14.1x on the emulator, 8x on tuna9_12edge, and 2.3-3.1x on ibm_marrakesh and QI Tuna-9. The mitigation technique works as described, but our small-scale test cannot address the paper's central quantum-utility argument.
Key Terms
Kicked Ising model—A physics model where quantum spins (tiny magnets) interact and are periodically "kicked" — used to study quantum dynamics and chaos
ZNE—Zero-Noise Extrapolation: run the same circuit at several deliberately amplified noise levels, then extrapolate the results back to estimate the zero-noise answer
Quantum utility—The claim that a quantum computer can produce reliable, useful results at a scale beyond brute-force classical simulation; weaker than quantum advantage, which requires outperforming every classical method on a specific task
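The kicked Ising dynamics defined above can be made concrete with a small noiseless statevector simulation. This is a minimal sketch, not the code used in the paper or in our runs: it assumes an open 1D chain (the paper used a 127-qubit heavy-hex lattice, and our runs used the Tuna-9 topology), fixes the ZZ angle at theta_J = -pi/2 so that every ZZ gate is a Clifford operation, and applies the ZZ layer before the RX "kick" within each Trotter step. All function names are hypothetical.

```python
import numpy as np


def apply_rzz(state, n, i, j, theta):
    """Apply exp(-i*theta/2 * Z_i Z_j); the gate is diagonal in the computational basis."""
    idx = np.arange(2 ** n)
    zi = 1 - 2 * ((idx >> i) & 1)  # Z_i eigenvalue: +1 if bit i is 0, -1 if it is 1
    zj = 1 - 2 * ((idx >> j) & 1)
    return state * np.exp(-0.5j * theta * zi * zj)


def apply_rx(state, n, q, theta):
    """Apply RX(theta) = exp(-i*theta/2 * X) to qubit q by mixing basis states that differ in bit q."""
    partner = np.arange(2 ** n) ^ (1 << q)
    return np.cos(theta / 2) * state - 1j * np.sin(theta / 2) * state[partner]


def magnetization(state, n):
    """M_z = (1/n) * sum_q <Z_q>."""
    idx = np.arange(2 ** n)
    probs = np.abs(state) ** 2
    z_total = sum(1 - 2 * ((idx >> q) & 1) for q in range(n))
    return float(np.dot(probs, z_total) / n)


def kicked_ising_mz(n, depth, theta_h, theta_j=-np.pi / 2):
    """Noiseless M_z after `depth` Trotter steps on an open chain, starting from all spins up."""
    edges = [(q, q + 1) for q in range(n - 1)]  # assumption: 1D chain, not heavy-hex
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0  # |00...0>: every spin up, so M_z = +1 initially
    for _ in range(depth):
        for i, j in edges:
            state = apply_rzz(state, n, i, j, theta_j)
        for q in range(n):
            state = apply_rx(state, n, q, theta_h)
    return magnetization(state, n)


for d in (1, 5, 10):
    # At theta_h = 0 the RX kicks are identities and the ZZ gates only add phases,
    # so the ideal magnetization stays exactly 1 at every depth.
    print(d, round(kicked_ising_mz(5, d, theta_h=0.0), 6))
```

The theta_h = 0 case is the Clifford benchmark used in the claims below: because its exact answer (M_z = 1) is known at every depth, any deviation measured on hardware is directly attributable to noise.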
Claim-by-Claim Comparison
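Each claim from the paper is tested on multiple backends, both emulated and hardware. The tables record whether each backend reproduced the claimed behavior; for the ZNE improvement claim, the measured improvement factor is compared against the paper's ~10x.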
Unmitigated magnetization M_z decays monotonically with Trotter depth due to noise accumulation
| Backend | Observed | Agreement | Status |
|---|---|---|---|
| QI Emulator | Yes | match | PASS |
| IBM Torino | Yes | match | PASS |
| ibm_marrakesh | Yes | match | PASS |
| QI Tuna-9 | Yes | match | PASS |
| tuna9_12edge | Yes | match | PASS |
| ibm_torino_9q_trex | Yes | match | PASS |
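The monotonic decay in this claim is what even the simplest noise model predicts. The sketch below assumes a global depolarizing channel after every two-qubit gate (real devices are not this simple) and uses hypothetical values for the gate count per Trotter step and the per-gate error rate. At theta_h = 0, where the ideal M_z is exactly 1, each depolarizing event multiplies the expectation value by (1 - p).

```python
def toy_noisy_mz(depth, gates_per_step, p_error, ideal_mz=1.0):
    """Global depolarizing toy model: rho -> (1 - p) rho + p I / 2^n after each gate.

    Every traceless observable, including M_z, is scaled by (1 - p) per gate,
    so after depth * gates_per_step gates it is scaled by (1 - p) ** (depth * gates_per_step).
    """
    return ideal_mz * (1.0 - p_error) ** (depth * gates_per_step)


# Hypothetical parameters: 8 two-qubit gates per Trotter step, 1% error per gate.
curve = [toy_noisy_mz(d, gates_per_step=8, p_error=0.01) for d in range(11)]
for d, mz in enumerate(curve):
    print(f"d={d:2d}  M_z={mz:.3f}")
```

Because every factor is below 1, the toy curve decreases strictly with depth; in this picture, ZNE's job is to undo that exponential shrinkage.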
ZNE error mitigation recovers the ideal M_z at the Clifford point (theta_h = 0) across depths
| Backend | Observed | Agreement | Status |
|---|---|---|---|
| QI Emulator | Yes | match | PASS |
| IBM Torino | -- | -- | NOT TESTED |
| ibm_marrakesh | Yes | match | PASS |
| QI Tuna-9 | Yes | match | PASS |
| tuna9_12edge | Yes | match | PASS |
| ibm_torino_9q_trex | No | mismatch | PARTIAL |
ibm_torino_9q_trex: This run used TREX (twirled readout error extinction), a readout-error mitigation method, instead of ZNE, on the 9-qubit Tuna-9 topology mapped onto IBM Torino. TREX corrects only readout errors, not gate noise, so it cannot recover the ideal M_z in deep circuits: the error grows from 5.2% at d=1 (TREX M_z = 0.948) to 20.3% at d=10 (TREX M_z = 0.797). The mean absolute error (MAE) across depths is 0.113 with TREX versus 0.150 raw, only a marginal improvement. This confirms that readout mitigation alone is insufficient for deep circuits.
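The percentages above follow from the exact ideal value M_z = 1 at theta_h = 0. A minimal check using only the numbers quoted in this section (the helper names are hypothetical):

```python
IDEAL_MZ = 1.0  # exact magnetization at the Clifford point theta_h = 0


def relative_error(measured, ideal=IDEAL_MZ):
    """Relative deviation of a measured M_z from the ideal value."""
    return abs(measured - ideal) / abs(ideal)


def mean_absolute_error(measured_values, ideal=IDEAL_MZ):
    """MAE of a list of M_z values (one per depth) against the ideal value."""
    return sum(abs(m - ideal) for m in measured_values) / len(measured_values)


print(f"TREX d=1:  {relative_error(0.948):.1%}")
print(f"TREX d=10: {relative_error(0.797):.1%}")
# Improvement factor = raw MAE / mitigated MAE (0.150 / 0.113 is about 1.33, reported as 1.3x)
print(f"MAE improvement: {0.150 / 0.113:.2f}x")
```

`mean_absolute_error` only shows how the MAE figures are defined; the per-depth M_z values behind 0.113 and 0.150 are not listed in this section.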
ZNE error mitigation substantially improves accuracy over unmitigated results
| Backend | Improvement factor | Gap to paper's ~10x | Status |
|---|---|---|---|
| QI Emulator | 14.1x | -4.1 | PASS |
| IBM Torino | -- | -- | NOT TESTED |
| ibm_marrakesh | 3.1x | +6.9 | PARTIAL |
| QI Tuna-9 | 2.3x | +7.7 | PARTIAL |
| tuna9_12edge | 8.0x | +2.0 | PASS |
| ibm_torino_9q_trex | 1.3x | +8.7 | PARTIAL |
The gap column is 10 minus the measured improvement factor, so a negative value means the backend exceeded the paper's ~10x.
ibm_marrakesh: ZNE with gate folding on IBM Marrakesh achieves a 3.1x improvement (M_z error 3.2% raw -> 1.0% with ZNE). This is lower than the emulator's 14.1x, most likely because real hardware noise is not purely depolarizing: coherent errors and crosstalk do not scale with the fold factor in the simple way that gate-folding extrapolation assumes. The paper's PEA method instead learns a model of the actual noise and amplifies it in a controlled way, reaching ~10x on 127 qubits.
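Gate-folding ZNE has two steps: amplify the noise by replacing each gate G with G G^-1 G (scale factor 3, or 5 with two extra pairs), then fit the measured values against the scale factor and evaluate the fit at zero. The sketch below illustrates both steps under stated assumptions: gates are plain strings, the noisy values come from a hypothetical exponential-decay model rather than hardware, and the choice between linear and Richardson (full-polynomial) extrapolation is ours, since this section does not say which fit the runs used. It is not PEA, which amplifies a learned noise model instead of folding gates.

```python
import numpy as np


def fold_gates(circuit, scale):
    """Local unitary folding: each gate G becomes G (G^-1 G)^k, with scale = 2k + 1."""
    if scale < 1 or scale % 2 == 0:
        raise ValueError("scale must be an odd positive integer")
    k = (scale - 1) // 2
    folded = []
    for gate in circuit:
        folded.append(gate)
        # G^-1 G is the identity ideally, but adds real noise on hardware.
        folded.extend([gate + "^-1", gate] * k)
    return folded


def zne_extrapolate(scales, values, order=1):
    """Fit a polynomial of the given order to (scale, value) pairs and evaluate it at scale 0.

    order=1 is linear extrapolation; order=len(scales)-1 is Richardson extrapolation.
    """
    coeffs = np.polyfit(scales, values, order)
    return float(np.polyval(coeffs, 0.0))


def improvement_factor(raw, mitigated, ideal):
    """Error reduction factor: |raw - ideal| / |mitigated - ideal|."""
    return abs(raw - ideal) / abs(mitigated - ideal)


print(fold_gates(["CZ", "RX"], 3))  # ['CZ', 'CZ^-1', 'CZ', 'RX', 'RX^-1', 'RX']

# Hypothetical noisy data: M_z(scale) = exp(-0.1 * scale), ideal value 1.0.
scales = np.array([1.0, 3.0, 5.0])
values = np.exp(-0.1 * scales)
linear = zne_extrapolate(scales, values, order=1)      # about 0.974
richardson = zne_extrapolate(scales, values, order=2)  # about 0.998
print(linear, improvement_factor(values[0], linear, 1.0))
print(richardson, improvement_factor(values[0], richardson, 1.0))
```

On this smooth toy curve the Richardson fit recovers far more of the ideal value than the linear fit. On hardware, where noise does not scale cleanly with the fold factor and higher-order fits also amplify shot noise, the gain is smaller.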
QI Tuna-9: ZNE on the 9-qubit Tuna-9 topology achieves 3.1x at d=1 and 1.5x at d=3 (mean 2.3x). This is below the paper's ~10x with PEA, but the d=1 result matches ibm_marrakesh's 3.1x with the same basic ZNE method. The hardware noise appears to be dephasing-dominated rather than depolarizing, which simple gate folding cannot fully exploit. Decoherence also limits how far the noise can usefully be amplified: at d=3, fold=3 requires 180 CZ gates.
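The 180-CZ figure shows why folding is expensive. Folding multiplies every folded two-qubit gate by the scale factor, so the fold=3 circuit at d=3 contains three times the CZ count of the unfolded circuit, which implies 180 / 3 = 60 CZ unfolded. The sketch below makes that scaling explicit and, under a hypothetical global-depolarizing model with an assumed 99% CZ fidelity (not a measured Tuna-9 value), estimates how much of the ideal signal survives at each scale factor.

```python
def folded_two_qubit_count(base_count, scale):
    """Folding G -> G (G^-1 G)^k turns each two-qubit gate into `scale` = 2k + 1 gates."""
    if scale < 1 or scale % 2 == 0:
        raise ValueError("scale must be an odd positive integer")
    return base_count * scale


def surviving_signal(n_gates, gate_fidelity):
    """Toy global-depolarizing model: the ideal signal is scaled by gate_fidelity per gate."""
    return gate_fidelity ** n_gates


base_cz = 180 // 3  # unfolded CZ count at d=3, inferred from 180 CZ at fold=3
for scale in (1, 3, 5):
    n_cz = folded_two_qubit_count(base_cz, scale)
    # Assumed 99% CZ fidelity: roughly 0.55, 0.16 and 0.05 of the signal survive.
    print(scale, n_cz, round(surviving_signal(n_cz, 0.99), 3))
```

With only a few percent of the signal left at the largest scale factor, the extrapolation rests on points dominated by shot noise; this is one plausible reason for the drop from 3.1x at d=1 to 1.5x at d=3.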
ibm_torino_9q_trex: TREX (readout mitigation) on the 9-qubit topology achieves only a 1.3x improvement over raw results (TREX MAE 0.113 vs raw MAE 0.150), the weakest of all mitigation methods tested. TREX corrects readout errors only; in deep Ising circuits, where gate noise dominates, readout mitigation provides little benefit. Contrast this with the H2 VQE experiment, where TREX achieved a 119x improvement on a shallow circuit whose error was readout-dominated. Key finding: the mitigation method must match the dominant error source.
Cross-Backend Summary
| Backend | Claims Tested | Passed | Pass Rate | Primary Issue |
|---|---|---|---|---|
| QI Emulator | 3 | 3 | 100% | -- |
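| IBM Torino | 1 | 1 | 100% | -- (only the unmitigated-decay claim tested) |
| ibm_marrakesh | 3 | 2 | 67% | ZNE gain 3.1x, below the paper's ~10x |
| QI Tuna-9 | 3 | 2 | 67% | ZNE gain 2.3x, below the paper's ~10x |
| tuna9_12edge | 3 | 3 | 100% | -- |
| ibm_torino_9q_trex | 3 | 1 | 33% | TREX does not correct gate noise |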
Key Findings
QI Emulator: 3/3 claims matched. The simulation pipeline correctly reproduces the published physics.
IBM Torino: 1/1 claim matched. Only the unmitigated-decay claim was tested on this backend.
ibm_marrakesh: 2/3 claims matched. Hardware noise limited the ZNE improvement to 3.1x, short of the paper's ~10x.
QI Tuna-9: 2/3 claims matched. Hardware noise limited the ZNE improvement to a mean of 2.3x, short of the paper's ~10x.
tuna9_12edge: 3/3 claims matched, including an 8x ZNE improvement.
ibm_torino_9q_trex: 1/3 claims matched. This run used TREX instead of ZNE, and readout mitigation cannot correct the gate noise that dominates deep circuits.