
LiH (lithium hydride) — the molecule we computed from first principles and ran on three quantum platforms.
We Computed a Molecule from First Principles and Ran It on Three Quantum Platforms
From molecular geometry to quantum hardware measurements in one automated pipeline. The emulator nailed it. IBM Fez tried its best.
Most quantum chemistry tutorials start with a Hamiltonian that's already been computed for you. We wanted to see the full pipeline: start from a molecule's geometry, compute the electronic structure from scratch, map it to qubits, generate measurement circuits, and run them on real hardware.
The result: two molecules (H2 and LiH), three platforms (QI emulator, IBM Fez, IBM Torino), 12 unique circuit designs, and nearly 300,000 total measurements. Here’s what we learned.
The Pipeline
Quantum chemistry on a quantum computer requires four layers of translation:
- Molecular integrals: PySCF runs Hartree-Fock on the molecular geometry in the STO-3G basis set and computes the one- and two-electron integrals over the resulting molecular orbitals. For LiH, that means 4 electrons in 6 spatial orbitals.
- Active space selection: Putting all 6 orbitals on a quantum computer would take 12 qubits. CASCI(2,2) instead keeps 2 electrons in 2 active orbitals, freezing the doubly occupied Li 1s core and discarding the 3 remaining virtual orbitals. The 2 active orbitals hold 4 spin orbitals, which map to 4 qubits via Jordan-Wigner.
- Qubit Hamiltonian: OpenFermion converts the fermionic Hamiltonian to a weighted sum of Pauli strings. H2 gives 5 terms. LiH gives 27 terms across 9 measurement groups.
- Measurement circuits: Each Pauli term requires measuring qubits in specific bases (Z, X, or Y). Terms sharing the same basis requirements are grouped into one circuit. H2 needs 3 circuits. LiH needs 9.
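The grouping step can be sketched in a few lines of Python. This is a minimal illustration assuming greedy qubit-wise commuting (QWC) grouping, a common heuristic; the post does not say which grouping routine the repository actually uses, and a greedy pass can produce different groups for a different term order. Pauli strings are written with qubit 0 as the leftmost character, and the helper names are hypothetical.

```python
def qubitwise_commute(p, q):
    """True if, on every qubit, the two Pauli letters match or one is I."""
    return all(a == b or "I" in (a, b) for a, b in zip(p, q))


def group_terms(paulis):
    """Greedy grouping: each term joins the first group whose members it
    qubit-wise commutes with; otherwise it starts a new group (= new circuit)."""
    groups = []
    for term in paulis:
        for group in groups:
            if all(qubitwise_commute(term, other) for other in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


def group_basis(group):
    """Per-qubit measurement basis for a group: its non-identity letter, else Z."""
    basis = []
    for k in range(len(group[0])):
        letters = {term[k] for term in group} - {"I"}
        basis.append(letters.pop() if letters else "Z")
    return "".join(basis)


h2_groups = group_terms(["ZI", "IZ", "ZZ", "XX", "YY"])
print(h2_groups)                            # [['ZI', 'IZ', 'ZZ'], ['XX'], ['YY']]
print([group_basis(g) for g in h2_groups])  # ['ZZ', 'XX', 'YY']
```

For the H2 terms this reproduces the 3 circuits described above. The four LiH cross-terms (YXXY, YYXX, XXYY, XYYX) each end up in a group of their own, because every pair disagrees on at least one qubit.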
Only one step runs on quantum hardware: preparing the trial wavefunction and measuring it in the right bases. The quantum computer’s job is narrow but critical. A classical optimizer then adjusts the circuit parameters to minimize the energy; this loop is VQE (Variational Quantum Eigensolver). Everything else (integrals, qubit mapping, energy reconstruction) runs on a laptop.
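The classical energy-reconstruction step amounts to turning each circuit’s bitstring counts into Pauli expectation values and taking the coefficient-weighted sum. The sketch below assumes bitstrings written with qubit 0 leftmost (Qiskit’s count keys use the opposite order, so they would need reversing) and assumes each circuit already applied its group’s basis rotations (H before measuring X, S† then H before measuring Y), so every term reduces to a parity of measured bits. The function names and toy counts are hypothetical, not taken from the repository.

```python
def pauli_expectation(counts, pauli):
    """<P> from counts measured after the group's basis rotations.

    Every shot contributes +1 or -1: the parity of the bits on the qubits
    where P is not the identity. Bitstrings put qubit 0 leftmost.
    """
    shots = sum(counts.values())
    total = 0
    for bits, n in counts.items():
        parity = sum(int(b) for b, p in zip(bits, pauli) if p != "I") % 2
        total += -n if parity else n
    return total / shots


def reconstruct_energy(constant, coefficients, counts_for_term):
    """E = constant + sum_i c_i <P_i>.

    coefficients: {pauli_string: coefficient}
    counts_for_term: {pauli_string: counts of the circuit that measured it}
    """
    return constant + sum(
        c * pauli_expectation(counts_for_term[p], p) for p, c in coefficients.items()
    )


z_counts = {"01": 3, "10": 1}  # toy Z-basis counts, not experimental data
print(pauli_expectation(z_counts, "ZI"), pauli_expectation(z_counts, "ZZ"))  # 0.5 -1.0
```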
H2: The Warmup (2 Qubits, 1 CNOT)
Molecular hydrogen at R=0.735 Å is the canonical VQE test case. The Hamiltonian has 5 Pauli terms (ZI, IZ, ZZ, XX, YY) plus a constant offset, requiring 3 measurement circuits (one each for the Z, X, and Y bases). The ansatz, the parameterized trial circuit, is a single Ry rotation followed by a CNOT. About as simple as quantum chemistry gets.
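To make the VQE loop concrete, here is a small NumPy statevector sketch with the same term structure as the H2 Hamiltonian (II, ZI, IZ, ZZ, XX, YY). The coefficients are made up for illustration and are not the values used in the experiment. The post describes the ansatz only as one Ry rotation and one CNOT; the initial X on qubit 1, which places the state in the |01〉/|10〉 sector that the fidelity column refers to, is our assumption. A dense grid scan over θ stands in for the classical optimizer.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"I": I2, "X": X, "Y": Y, "Z": Z}

# CNOT with qubit 0 as control and qubit 1 as target (qubit 0 = left tensor factor)
CNOT_01 = np.array([[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=complex)

# Hypothetical coefficients with H2's term structure; NOT the experiment's values
TERMS = {"II": -0.4, "ZI": -0.4, "IZ": 0.4, "ZZ": 0.2, "XX": 0.1, "YY": 0.1}


def pauli_matrix(label):
    """Tensor product of a Pauli string; the leftmost letter acts on qubit 0."""
    m = np.eye(1, dtype=complex)
    for ch in label:
        m = np.kron(m, PAULI[ch])
    return m


def hamiltonian(terms):
    return sum(c * pauli_matrix(p) for p, c in terms.items())


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def ansatz(theta):
    """Assumed circuit: X on qubit 1, Ry(theta) on qubit 0, CNOT(0 -> 1).
    Output: cos(theta/2)|01> + sin(theta/2)|10>."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0                          # |00>
    psi = np.kron(I2, X) @ psi            # X on qubit 1 -> |01>
    psi = np.kron(ry(theta), I2) @ psi    # Ry on qubit 0
    return CNOT_01 @ psi


def energy(theta, h):
    psi = ansatz(theta)
    return float(np.real(np.conj(psi) @ h @ psi))


h = hamiltonian(TERMS)
# A dense grid scan over the single parameter stands in for the classical optimizer.
thetas = np.linspace(-np.pi, np.pi, 20001)
energies = np.array([energy(t, h) for t in thetas])
best = thetas[np.argmin(energies)]
e_vqe, e_exact = energies.min(), np.linalg.eigvalsh(h)[0]
print(f"VQE {e_vqe:.6f} | exact {e_exact:.6f} | P(|01>) {abs(ansatz(best)[1]) ** 2:.3f}")
```

For these made-up coefficients the ground state lies entirely in the |01〉/|10〉 sector, so the one-parameter ansatz reaches the exact eigenvalue. Without hardware noise, the only error left would come from finite sampling.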
The QI emulator achieves chemical accuracy (1.3 mHa error, entirely from finite sampling). On IBM Fez, 89.6% of Z-basis shots return the dominant |01〉 bitstring, but noise still pushes the energy up by 87 mHa. IBM Torino, running the same circuit, is 3.8x worse.
H2 platform comparison table
| Platform | Chip | Energy (Ha) | Error (mHa) | \|01〉 fidelity |
|---|---|---|---|---|
| QI Emulator | qxelarator | −1.1382 | 1.3 | 98.3% |
| IBM Fez | Heron r2 (156q) | −1.0506 | 87.0 | 89.6% |
| IBM Torino | Heron (133q) | −0.8100 | 328.0 | 76.0% |
FCI (full configuration interaction) reference: −1.1395 Ha (Hartree, the atomic unit of energy). Chemical accuracy threshold: 1.6 mHa (milliHartree, about 1 kcal/mol), the conventional bar below which an energy error is small enough for chemically meaningful predictions.
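The threshold arithmetic is easy to check directly. A minimal sketch, using the CODATA-based conversion 1 Ha ≈ 627.5095 kcal/mol and the H2 numbers from the table above; the helper names are hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095                    # 1 Ha in kcal/mol (CODATA, rounded)
CHEMICAL_ACCURACY_HA = 1.0 / HARTREE_TO_KCAL_PER_MOL  # 1 kcal/mol, about 1.594 mHa


def error_mha(energy_ha, reference_ha):
    """Signed error in milliHartree (positive = above the reference)."""
    return (energy_ha - reference_ha) * 1000.0


def within_chemical_accuracy(energy_ha, reference_ha):
    return abs(energy_ha - reference_ha) <= CHEMICAL_ACCURACY_HA


FCI_H2 = -1.1395   # FCI reference quoted above
QI_H2 = -1.1382    # QI emulator energy from the H2 table
print(round(CHEMICAL_ACCURACY_HA * 1000, 3))  # 1.594
print(round(error_mha(QI_H2, FCI_H2), 1), within_chemical_accuracy(QI_H2, FCI_H2))  # 1.3 True
```

The emulator’s 1.3 mHa error clears the bar; an error of 1.7 mHa or more does not.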
H2 Potential Energy Surface on the QI Emulator
We also ran the full H2 potential energy surface: 15 bond distances from 0.3 to 3.0 Å, each requiring 3 circuits, for 45 circuits and 184,320 measurements in total (4,096 shots per circuit). The emulator traced the exact FCI curve with 10 out of 15 points within chemical accuracy. The remaining 5 points missed by 1.7–3.8 mHa, entirely due to finite-shot statistics.
This confirms that the ansatz can represent the exact H2 ground state: on the emulator, every deviation from FCI is shot noise, so any additional deviation on real hardware comes from device noise.
LiH: The Real Test (4 Qubits, 3 CNOTs)
With H2 as a sanity check, we moved to the real target. Lithium hydride doubles the qubit count and triples the entangling gates: 4 qubits, a 3-CNOT entangling layer, 27 Pauli terms, and 9 measurement circuits. The Hamiltonian was computed entirely from first principles:
- PySCF Hartree-Fock + CASCI(2,2) for LiH at R=1.6 Å
- OpenFermion Jordan-Wigner mapping to 4-qubit Pauli operators
- Hardware-efficient ansatz: two layers of Ry rotations interleaved with a CNOT chain (q0→q1→q2→q3)
- Classical VQE optimization (8 parameters) matched the CASCI energy exactly
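A NumPy statevector sketch of the 8-parameter ansatz follows. It assumes the layout Ry layer, then the CNOT chain q0→q1→q2→q3, then a second Ry layer, which is our reading of the description above, and it treats qubit 0 as the leftmost character of bitstrings and Pauli labels. The parameters in the usage line are not the optimized VQE parameters; they simply prepare |1111〉, the dominant LiH state reported below, so the Z-basis expectation values come out at ±1.

```python
import numpy as np

N_QUBITS = 4  # qubit 0 is the leftmost character of bitstrings and Pauli labels


def bit(index, q):
    """Value of qubit q in basis state `index` (qubit 0 = most significant bit)."""
    return (index >> (N_QUBITS - 1 - q)) & 1


def apply_ry(state, theta, q):
    """Apply Ry(theta) = [[c, -s], [s, c]] to qubit q of a real statevector."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    mask = 1 << (N_QUBITS - 1 - q)
    out = state.copy()
    for i in range(len(state)):
        if not bit(i, q):
            a0, a1 = state[i], state[i | mask]
            out[i] = c * a0 - s * a1
            out[i | mask] = s * a0 + c * a1
    return out


def apply_cnot(state, control, target):
    """Permute amplitudes: flip `target` wherever `control` is 1."""
    mask = 1 << (N_QUBITS - 1 - target)
    out = np.empty_like(state)
    for i in range(len(state)):
        out[i ^ mask if bit(i, control) else i] = state[i]
    return out


def hea_state(params):
    """Assumed layout: Ry layer (params[0:4]), CNOT chain q0->q1->q2->q3,
    second Ry layer (params[4:8])."""
    assert len(params) == 2 * N_QUBITS
    state = np.zeros(2 ** N_QUBITS)
    state[0] = 1.0  # |0000>
    for q in range(N_QUBITS):
        state = apply_ry(state, params[q], q)
    for q in range(N_QUBITS - 1):
        state = apply_cnot(state, q, q + 1)
    for q in range(N_QUBITS):
        state = apply_ry(state, params[N_QUBITS + q], q)
    return state


def z_expectation(state, label):
    """<P> for a Z/I-only Pauli string such as 'ZIII' or 'ZZII'."""
    total = 0.0
    for i, amp in enumerate(state):
        parity = sum(bit(i, q) for q, ch in enumerate(label) if ch == "Z") % 2
        total += abs(amp) ** 2 * (-1) ** parity
    return total


psi = hea_state([0.0] * 4 + [np.pi] * 4)
print(round(abs(psi[0b1111]) ** 2, 6), round(z_expectation(psi, "ZIII"), 6),
      round(z_expectation(psi, "ZZII"), 6))  # 1.0 -1.0 1.0
```

Setting only the first Ry on qubit 0 to π/2 instead shows the CNOT chain at work: it spreads the superposition into (|0000〉 + |1111〉)/√2.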
The 27 Pauli terms group into 9 measurement circuits. One Z-basis circuit handles the 11 diagonal terms (ZZ, ZI, IZ combinations). Four two-qubit rotation circuits handle 12 terms (paired XX, YY, XY). The remaining four circuits each handle a single four-qubit cross-term (YXXY, YYXX, XXYY, XYYX) — these require rotating all four qubits simultaneously.
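The basis changes for these groups follow a fixed rule: apply H before measuring X, and S† then H before measuring Y, so that an ordinary Z-basis readout yields the desired Pauli. The sketch below emits OpenQASM 2 text using the standard qelib1.inc gates. The post does not say which circuit format its generator produces (Quantum Inspire circuits are written in cQASM), so treat the output format and the function names as assumptions.

```python
def basis_rotation(pauli):
    """OpenQASM 2 gate lines that turn a Z-basis readout into a readout of `pauli`.
    Qubit k corresponds to the k-th character of the Pauli string."""
    gates = []
    for q, p in enumerate(pauli):
        if p == "X":
            gates.append(f"h q[{q}];")    # H maps X onto Z
        elif p == "Y":
            gates.append(f"sdg q[{q}];")  # S-dagger then H maps Y onto Z
            gates.append(f"h q[{q}];")
        elif p not in ("I", "Z"):
            raise ValueError(f"not a Pauli letter: {p!r}")
    return gates


def measurement_circuit(pauli, ansatz_lines=()):
    """Wrap ansatz gates (supplied by the caller) with the basis change and readout."""
    n = len(pauli)
    lines = ["OPENQASM 2.0;", 'include "qelib1.inc";', f"qreg q[{n}];", f"creg c[{n}];"]
    lines += list(ansatz_lines)
    lines += basis_rotation(pauli)
    lines.append("measure q -> c;")
    return "\n".join(lines)


print(measurement_circuit("YXXY"))
```

A cross-term like YXXY needs a rotation on every qubit, while the Z-basis circuit needs none at all.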
The Three-Way Comparison
| Platform | Chip | Active-space E (Ha) | vs Ideal (mHa) | \|1111〉 fidelity |
|---|---|---|---|---|
| QI Emulator | qxelarator | −10.7547 | 0.2 | 100% |
| IBM Fez | Heron r2 (156q) | −10.4003 | 354.3 | 81.0% |
Ideal VQE: −10.7546 Ha. The QI emulator is 0.2 mHa off — pure shot noise. IBM Fez is 354 mHa off — hardware noise depolarizing the quantum state.
Where does the 354 mHa error come from? The Z-basis measurement tells the story: the emulator produces |1111〉 100% of the time, while IBM Fez spreads 19% of probability across wrong bitstrings. Every Pauli expectation value is biased toward zero — classic depolarization — adding up to ~300 mHa of systematic energy shift.
Z-basis distribution + per-term Pauli analysis
| State | QI Emulator | IBM Fez |
|---|---|---|
| \|1111〉 (correct) | 100.0% | 81.0% |
| \|0111〉 | 0% | 5.2% |
| \|1110〉 | 0% | 4.3% |
| \|1011〉 | 0% | 3.2% |
| Other (12 states) | 0% | 6.3% |
Because the prepared state is essentially |1111〉, a noiseless device returns exactly ±1.0 for each Z-basis term, as the emulator does. On IBM Fez, their magnitudes shrink to 0.73–0.89:
| Pauli term | Coefficient | QI 〈P〉 | IBM 〈P〉 | IBM error |
|---|---|---|---|---|
| ZIII | +0.617 | −1.000 | −0.867 | +0.133 |
| IZII | +0.617 | −1.000 | −0.859 | +0.141 |
| IIZI | +0.371 | −1.000 | −0.892 | +0.108 |
| IIIZ | +0.371 | −1.000 | −0.820 | +0.180 |
| ZZII | −0.122 | +1.000 | +0.765 | −0.235 |
| IIZZ | −0.084 | +1.000 | +0.875 | −0.126 |
The six terms above are off by 0.108–0.235. Each ideal contribution (coefficient × 〈P〉) is negative, so shrinking 〈P〉 toward zero always raises the energy and the errors never cancel: weighted by their coefficients (0.08–0.62), these six terms alone add about 315 mHa. The off-diagonal terms (XX, YY, XY) contribute less because their coefficients are 10–50x smaller.
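The weighted sum can be reproduced from the table. In this short sketch, each term’s contribution to the energy error is its coefficient times the shift in its expectation value (IBM minus emulator). Only the six tabulated terms are included, so the total accounts for part of the 354 mHa, not all of it.

```python
# (term, coefficient, emulator <P>, IBM Fez <P>) copied from the table above
Z_TERMS = [
    ("ZIII", +0.617, -1.000, -0.867),
    ("IZII", +0.617, -1.000, -0.859),
    ("IIZI", +0.371, -1.000, -0.892),
    ("IIIZ", +0.371, -1.000, -0.820),
    ("ZZII", -0.122, +1.000, +0.765),
    ("IIZZ", -0.084, +1.000, +0.875),
]


def energy_shift(terms):
    """Per-term energy error in Ha: coefficient * (noisy <P> - ideal <P>)."""
    return {name: c * (noisy - ideal) for name, c, ideal, noisy in terms}


shifts = energy_shift(Z_TERMS)
# Each ideal contribution c * <P> is negative, so shrinking <P> toward zero
# makes every shift positive: the errors add up instead of cancelling.
total = sum(shifts.values())
print(f"{total * 1000:.0f} mHa")  # 315 mHa
```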
How Noise Scales with Circuit Complexity
H2 vs LiH noise scaling comparison
| | H2 (2 qubits) | LiH (4 qubits) | Scaling |
|---|---|---|---|
| Qubits | 2 | 4 | 2x |
| CNOT gates | 1 | 3 | 3x |
| Pauli terms | 5 | 27 | 5.4x |
| Circuits | 3 | 9 | 3x |
| Total shots | 12,288 | 36,864 | 3x |
| QI emulator error | 1.3 mHa | 0.2 mHa | — |
| IBM Fez error | 87 mHa | 354 mHa | 4.1x |
Going from 2 to 4 qubits (and 1 to 3 CNOTs), IBM noise grows 4.1x — faster than the 3x increase in gate count. Each additional CNOT introduces entangling errors and extends the circuit duration, giving decoherence more time to act.
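The emulator, by contrast, actually gets better (1.3 → 0.2 mHa). LiH’s prepared state is essentially the single basis state |1111〉, so the Z-basis terms, which carry the largest coefficients, return the same value on every shot and contribute no shot noise. H2’s state is a superposition of |01〉 and |10〉, so its XX and YY terms fluctuate from shot to shot.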
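The shot-noise argument can be made quantitative: a Pauli measurement returns ±1 on every shot, so its estimate from N shots has standard error √((1 − 〈P〉²)/N). A minimal sketch, assuming 4,096 shots per circuit (the figure implied by 36,864 shots over 9 circuits) and treating terms as independent, which ignores the covariance between terms read from the same circuit:

```python
import math

SHOTS = 4096  # shots per circuit implied above (36,864 shots / 9 circuits)


def pauli_std_error(expectation, shots=SHOTS):
    """Standard error of a +/-1-valued Pauli estimate: sqrt((1 - <P>^2) / shots)."""
    return math.sqrt(max(0.0, 1.0 - expectation ** 2) / shots)


def energy_std_error(terms, shots=SHOTS):
    """terms: (coefficient, expectation) pairs. Treats every term as sampled
    independently, ignoring covariance between terms from the same circuit."""
    return math.sqrt(sum((c * pauli_std_error(e, shots)) ** 2 for c, e in terms))


print(pauli_std_error(-1.0), pauli_std_error(0.0))  # 0.0 0.015625
```

A term pinned at ±1 contributes nothing, which is why a state close to a single computational basis state, like LiH’s |1111〉, carries so little shot noise; a term sitting at zero costs |c|/64 Ha at 4,096 shots.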
The implication for scaling: a 6-qubit molecule like BeH2 would need ~5 CNOTs, and the noise trend suggests IBM errors of 700+ mHa — approaching the Hamiltonian’s entire energy range. Without error mitigation, molecules beyond 4 qubits are likely noise-dominated on current hardware.
TREX Error Mitigation: Does It Help?
For H2, TREX (Twirled Readout Error eXtinction) achieved chemical accuracy on the first try. Can it rescue LiH?
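We re-ran the LiH experiment using IBM’s EstimatorV2 with resilience_level=1, which enables TREX. TREX applies random X flips to the qubits just before measurement and undoes them in post-processing, which turns the systematic readout bias into a simple per-term scaling factor; that factor is estimated from calibration circuits and divided out of each expectation value.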
| Method | Total energy (Ha) | Error vs CASCI (mHa) | Improvement |
|---|---|---|---|
| Raw Sampler (IBM Fez) | −7.508 | 354 | — |
| TREX (IBM Fez) | −7.703 | 160 | 2.2x |
| QI Emulator | −7.862 | 0.2 | — |
The energies in this table are total energies, which sit a constant 2.89 Ha above the active-space energies in the earlier table, so the errors are directly comparable. TREX cut the error roughly in half (354 → 160 mHa) but didn’t come close to chemical accuracy. The per-term analysis reveals why: TREX cleanly fixes readout errors on qubits with low gate noise (ZIII improved from −0.867 to −0.999, nearly perfect) but can’t fix gate errors from the CNOT chain (IZII only improved from −0.859 to −0.902).
This makes physical sense. TREX only corrects the measurement step. For H2 with 1 CNOT, readout error was the dominant noise source, so TREX was enough. For LiH with 3 CNOTs, gate errors from the entangling layer dominate — and those require deeper techniques like zero-noise extrapolation (ZNE) or probabilistic error cancellation (PEC).
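The mechanism, and its limit, fit in a single-qubit toy model. This sketch assumes readout error is a classical bit flip with fixed probabilities p01 (0 read as 1) and p10 (1 read as 0), and it computes exact probabilities instead of sampling; as we understand it, Qiskit Runtime’s TREX randomizes twirls over many circuit instances and runs its own calibration circuits. Averaging over the random X flips cancels the asymmetric part of the readout error and leaves a pure scaling factor (1 − p01 − p10), which a calibration run on |0〉 measures and divides out. The function names and error rates are hypothetical.

```python
def prob_read_one(p1, p01, p10):
    """Chance of recording '1' when the qubit is 1 with probability p1.
    p01: a true 0 is read as 1; p10: a true 1 is read as 0."""
    return (1 - p1) * p01 + p1 * (1 - p10)


def measured_z(z_true, p01, p10):
    """<Z> as recorded through the noisy readout, with no mitigation."""
    p1 = (1 - z_true) / 2
    return 1 - 2 * prob_read_one(p1, p01, p10)


def twirled_z(z_true, p01, p10):
    """Half the shots get an X gate before readout and their recorded bit is
    flipped back. The X sends <Z> to -<Z>; undoing the flip negates the result."""
    plain = measured_z(z_true, p01, p10)
    flipped = -measured_z(-z_true, p01, p10)
    return 0.5 * (plain + flipped)  # = (1 - p01 - p10) * z_true


def trex_z(z_true, p01, p10):
    """Divide by the twirled value of a calibration run on |0> (true <Z> = +1)."""
    return twirled_z(z_true, p01, p10) / twirled_z(1.0, p01, p10)


P01, P10 = 0.02, 0.06  # illustrative readout error rates, not IBM Fez calibration data
print(measured_z(-1.0, P01, P10), trex_z(-1.0, P01, P10), trex_z(-0.9, P01, P10))
```

A true 〈Z〉 of −1 reads as about −0.88 raw, and TREX restores −1. But if gate noise has already shrunk the pre-measurement value to −0.9, TREX faithfully returns −0.9: it can only undo what happens at readout.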
The Active Space Limitation
Even with a perfect, noiseless quantum computer, our LiH result wouldn’t match the exact (FCI) energy. The reason is the active space: we only put 2 electrons in 2 orbitals on the quantum computer, while LiH actually has 4 electrons in 6 orbitals. The truncated model captures just 1.3% of the total correlation energy (0.3 out of 20.5 mHa).
Active space energy breakdown
| Method | Energy (Ha) | vs FCI (mHa) |
|---|---|---|
| Hartree-Fock (classical) | −7.8619 | +20.5 |
| CASCI(2,2) / QI emulator | −7.8621 | +20.2 |
| FCI (exact) | −7.8823 | 0.0 |
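The quantum computer perfectly solves the problem it was given: the 2-orbital active-space Hamiltonian. But the problem itself is too small. To reach chemical accuracy against FCI, you’d need a larger active space, up to the full CASCI(4,6), which equals FCI in STO-3G and takes 12 qubits; matching experiment would also require a basis set larger than STO-3G.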
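The cost of growing the active space is easy to tabulate. This sketch counts qubits under Jordan-Wigner (one per spin orbital) and the number of Slater determinants in a CAS(n_electrons, n_orbitals) space, splitting the electrons as evenly as possible between spin up and spin down, which is roughly the size of the problem an exact classical CASCI must handle. The function name is hypothetical, and the count ignores spatial symmetry.

```python
from math import comb


def active_space_size(n_electrons, n_orbitals):
    """(qubits, determinants) for a CAS(n_electrons, n_orbitals) space.

    Qubits: one per spin orbital under Jordan-Wigner. Determinants: ways to
    place the alpha and beta electrons (split as evenly as possible) in the
    active orbitals, ignoring spatial symmetry.
    """
    n_alpha = n_electrons // 2
    n_beta = n_electrons - n_alpha
    qubits = 2 * n_orbitals
    determinants = comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)
    return qubits, determinants


for cas in [(2, 2), (4, 6), (18, 18)]:
    print(cas, active_space_size(*cas))
# (2, 2) (4, 4)
# (4, 6) (12, 225)
# (18, 18) (36, 2363904400)
```

CASCI(2,2) is a 4-determinant problem, and even the full CASCI(4,6) has only 225 determinants, trivially small for a classical computer. The count grows combinatorially and passes two billion by CAS(18,18).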
This is the fundamental tension in near-term quantum chemistry: the molecules that fit on today's hardware are the ones classical computers already solve easily. The advantage comes only at scale, with active spaces too large for exact classical CASCI (roughly 20 or more spatial orbitals, i.e. 40+ qubits under Jordan-Wigner). We're not there yet, but the pipeline is ready.
What We Built
The entire experiment was orchestrated by Claude Code using three MCP servers:
- IBM Quantum MCP — submitted 12 circuits (3 for H2, 9 for LiH) to ibm_fez and ibm_torino, polled for results
- QI Circuits MCP — ran 54 circuits on the qxelarator emulator (45 for H2 PES + 9 for LiH)
- Python (PySCF + OpenFermion) — computed molecular integrals, built Hamiltonians, optimized VQE parameters, generated all circuits
The prompt was “compute the ground state energy of lithium hydride.” From there, the AI handled the chemistry, the qubit mapping, the circuit generation, the job submission, and the energy reconstruction from raw measurement counts. No manual circuit writing, no copy-pasting QASM.
Bottom Line
We showed three things:
- The QI emulator is a reliable quantum chemistry reference. Chemical accuracy on H2 (1.3 mHa) and perfect CASCI reproduction on LiH (0.2 mHa). If your algorithm works here, the algorithm is correct.
- IBM Fez hardware gives qualitatively correct results, but needs error mitigation. 81% state fidelity and the correct dominant state on LiH, but 354 mHa of raw noise. TREX halved this to 160 mHa by fixing readout errors, but the 3-CNOT circuit introduces gate errors that TREX can’t touch. Reaching chemical accuracy on LiH will require deeper mitigation (ZNE or PEC).
- The first-principles pipeline works end-to-end. Geometry → integrals → qubit Hamiltonian → circuits → hardware → energy. Every step is automated. Changing the molecule means changing one line of code.
Next: LiH with ZNE or PEC on IBM Fez to push past TREX’s 160 mHa floor. BeH2 (6 qubits) on the QI emulator. And eventually, LiH on Tuna-9 hardware — 9 qubits is more than enough for CASCI(2,2), and we already know Tuna-9’s readout correction works well for VQE.
All experiment code and data: github.com/dereklomas/haiqu. Analysis scripts: experiments/lih_compare.py, experiments/h2_vqe_compare.py. Interactive Hamiltonian explorer: haiqu.org/hamiltonians.
Sources & References
- PySCF: Python-based simulations of chemistry framework (https://pyscf.org/)
- OpenFermion: quantum chemistry on quantum computers (https://quantumai.google/openfermion)
- Peruzzo et al. 2014, the original VQE paper (https://arxiv.org/abs/1304.3061)
- Cross-platform quantum comparison (/blog/cross-platform-quantum-comparison)
- Error mitigation showdown (/blog/error-mitigation-showdown)
- Quantum MCP servers (/blog/quantum-mcp-servers)
- IBM Quantum (https://quantum.ibm.com/)
- Quantum Inspire (https://www.quantum-inspire.com/)