LiH (lithium hydride) — the molecule we computed from first principles and ran on three quantum platforms.

Experiment | 2026-02-12 | AI x Quantum Research Team

We Computed a Molecule from First Principles and Ran It on Three Quantum Platforms

From molecular geometry to quantum hardware measurements in one automated pipeline. The emulator nailed it. IBM Fez tried its best.

Tags: VQE, LiH, H2, PySCF, OpenFermion, IBM Quantum, Quantum Inspire, quantum chemistry, first principles, CASCI

Most quantum chemistry tutorials start with a Hamiltonian that's already been computed for you. We wanted to see the full pipeline: start from a molecule's geometry, compute the electronic structure from scratch, map it to qubits, generate measurement circuits, and run them on real hardware.

The result: two molecules (H₂ and LiH), three platforms (QI emulator, IBM Fez, IBM Torino), 12 unique circuit designs, and nearly 300,000 total measurements. Here’s what we learned.

The Pipeline

Quantum chemistry on a quantum computer requires four layers of translation:

Molecular integrals — PySCF computes one- and two-electron integrals from the molecular geometry and basis set (STO-3G). For LiH, this means solving the Hartree-Fock equations for 4 electrons in 6 orbitals.
Active space selection — We can't put all orbitals on a quantum computer. CASCI(2,2) selects 2 electrons in 2 active orbitals, freezing the core. This maps to 4 qubits via Jordan-Wigner.
Qubit Hamiltonian — OpenFermion converts the fermionic Hamiltonian to a sum of Pauli operators. H₂ gives 5 terms. LiH gives 27 terms across 9 measurement groups.
Measurement circuits — Each Pauli term requires measuring qubits in specific bases (Z, X, or Y). Terms sharing the same basis requirements are grouped into one circuit. H₂ needs 3 circuits. LiH needs 9.
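The grouping in the fourth step can be sketched in a few lines. This is a minimal illustration, assuming a simple greedy qubit-wise grouping; the function name `group_by_basis` is hypothetical, and the pipeline's actual grouping code may differ. Applied to the five H₂ terms, it produces three groups, one per measurement circuit.

```python
def group_by_basis(terms):
    """Greedily group Pauli strings that can share one measurement basis.

    Two strings are compatible when, on every qubit, they either agree
    or at least one of them is the identity 'I'.
    """
    groups = []  # each entry: [shared_basis_string, [member terms]]
    for term in terms:
        for group in groups:
            basis = group[0]
            if all(t == "I" or b == "I" or t == b for t, b in zip(term, basis)):
                # Fill identity slots of the shared basis with this term's letters
                group[0] = "".join(t if b == "I" else b for t, b in zip(term, basis))
                group[1].append(term)
                break
        else:
            # No compatible group found: this term starts a new circuit
            groups.append([term, [term]])
    return groups


h2_terms = ["ZI", "IZ", "ZZ", "XX", "YY"]
h2_groups = group_by_basis(h2_terms)
for basis, members in h2_groups:
    print(basis, members)
```

A greedy pass like this is not guaranteed to find the smallest number of groups for larger Hamiltonians, but it is enough to show why terms that differ only by identities can share a circuit.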

Every step is classical computation except the final measurement. The quantum computer’s job is narrow but critical: prepare the trial wavefunction and measure it in the right bases. A classical optimizer then adjusts the circuit parameters to minimize the energy — this loop is VQE (Variational Quantum Eigensolver). Everything else — integrals, mapping, energy reconstruction — runs on a laptop.
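The energy reconstruction step is simple enough to show directly. The sketch below assumes bitstrings are written with qubit 0 as the leftmost character (conventions differ between SDKs), and the counts are hypothetical numbers chosen for illustration, not data from our runs. The helper names `pauli_expectation` and `energy` are ours.

```python
def pauli_expectation(counts, term):
    """Estimate <P> from counts measured in the basis matching `term`.

    Each shot contributes +1 or -1 depending on the parity of the bits
    at the positions where `term` is not the identity.
    """
    total_shots = sum(counts.values())
    acc = 0
    for bits, n in counts.items():
        ones = sum(1 for b, p in zip(bits, term) if p != "I" and b == "1")
        acc += n if ones % 2 == 0 else -n
    return acc / total_shots


def energy(constant, coeffs, expectations):
    """Recombine E = c0 + sum_k c_k * <P_k>."""
    return constant + sum(coeffs[t] * expectations[t] for t in coeffs)


# Hypothetical Z-basis counts for a 2-qubit circuit (illustrative only)
z_counts = {"01": 900, "10": 60, "00": 25, "11": 15}
z_expectations = {t: pauli_expectation(z_counts, t) for t in ("ZI", "IZ", "ZZ")}
print(z_expectations)
```

The X- and Y-basis circuits are handled the same way: the basis rotation happens on the device, so the classical side only ever sees Z-basis bitstrings and computes parities.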

H₂: The Warmup (2 Qubits, 1 CNOT)

Molecular hydrogen at R=0.735 Å is the canonical VQE test case. The Hamiltonian has 5 Pauli terms (ZI, IZ, ZZ, XX, YY) plus a constant offset, requiring 3 measurement circuits (one each for the Z, X, and Y bases). The ansatz, the parameterized trial circuit, is a single Ry rotation followed by a CNOT. About as simple as quantum chemistry gets.

The QI emulator achieves chemical accuracy (1.3 mHa error, entirely from finite sampling). IBM Fez gets the right quantum state 89.6% of the time but noise pushes the energy up by 87 mHa. Torino, running the same circuit, is 3.8x worse.

H₂ platform comparison table

Platform	Chip	Energy (Ha)	Error (mHa)	|01⟩ fidelity
QI Emulator	qxelarator	−1.1382	1.3	98.3%
IBM Fez	Heron r2 (156q)	−1.0506	87.0	89.6%
IBM Torino	Heron (133q)	−0.8100	328.0	76.0%

FCI (full configuration interaction) reference: −1.1395 Ha (Hartree, the atomic unit of energy). Chemical accuracy threshold: 1.6 mHa (milliHartree, about 1 kcal/mol) — the point where quantum errors become chemically irrelevant.
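For readers who want to check the units, here is a small sketch of the conversion and the threshold test. The conversion factor is the standard Hartree to kcal/mol value; the helper names are ours.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor
CHEMICAL_ACCURACY_MHA = 1.6         # threshold quoted in the text


def mha_to_kcal_per_mol(mha):
    """Convert milliHartree to kcal/mol."""
    return mha * 1e-3 * HARTREE_TO_KCAL_PER_MOL


def within_chemical_accuracy(energy_ha, reference_ha):
    """True if |E - E_ref| is at or below the chemical accuracy threshold."""
    return abs(energy_ha - reference_ha) * 1e3 <= CHEMICAL_ACCURACY_MHA


FCI_H2 = -1.1395  # Ha, reference from the text
print(round(mha_to_kcal_per_mol(CHEMICAL_ACCURACY_MHA), 3))
print(within_chemical_accuracy(-1.1382, FCI_H2))  # QI emulator
print(within_chemical_accuracy(-1.0506, FCI_H2))  # IBM Fez
```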

H₂ Potential Energy Surface on the QI Emulator

We also ran the full H₂ potential energy surface: 15 bond distances from 0.3 to 3.0 Å, each requiring 3 circuits — 45 circuits total, 184,320 measurements. The emulator traced the exact FCI curve with 10 out of 15 points within chemical accuracy. The remaining 5 points missed by 1.7–3.8 mHa, entirely due to finite-shot statistics.

This confirms that the ansatz and measurement scheme reproduce the exact H₂ energy up to shot noise, so any larger deviation on real hardware comes from device noise rather than from the algorithm.

LiH: The Real Test (4 Qubits, 3 CNOTs)

With H₂ as a sanity check, we moved to the real target. Lithium hydride doubles the qubit count and triples the entangling gates: 4 qubits, a 3-CNOT entangling layer, 27 Pauli terms, and 9 measurement circuits. The Hamiltonian was computed entirely from first principles:

PySCF Hartree-Fock + CASCI(2,2) for LiH at R=1.6 Å
OpenFermion Jordan-Wigner mapping to 4-qubit Pauli operators
Hardware-efficient ansatz: two layers of Ry rotations interleaved with a CNOT chain (q0→q1→q2→q3)
Classical VQE optimization (8 parameters) matched the CASCI energy exactly
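To make the ansatz concrete, here is a tiny pure-Python statevector sketch. It assumes the layer order is Ry layer, CNOT chain, Ry layer (which matches 8 parameters and 3 CNOTs) and writes qubit 0 as the leftmost bit. The parameter values are picked only to show the circuit reaching the |1111⟩ basis state; they are not the optimized VQE parameters, and the function names are ours.

```python
import math

N_QUBITS = 4


def apply_ry(state, qubit, theta):
    """Apply Ry(theta) to one qubit of a real-valued statevector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask = 1 << (N_QUBITS - 1 - qubit)  # qubit 0 is the leftmost bit
    new = list(state)
    for i in range(len(state)):
        if not i & mask:
            a, b = state[i], state[i | mask]
            new[i] = c * a - s * b
            new[i | mask] = s * a + c * b
    return new


def apply_cnot(state, control, target):
    """Swap amplitudes of target=0 and target=1 wherever control=1."""
    cmask = 1 << (N_QUBITS - 1 - control)
    tmask = 1 << (N_QUBITS - 1 - target)
    new = list(state)
    for i in range(len(state)):
        if i & cmask and not i & tmask:
            new[i], new[i | tmask] = state[i | tmask], state[i]
    return new


def ansatz_probabilities(params):
    """Ry layer, CNOT chain q0->q1->q2->q3, Ry layer; returns bitstring probabilities."""
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0  # start in |0000>
    for q in range(N_QUBITS):
        state = apply_ry(state, q, params[q])
    for q in range(N_QUBITS - 1):
        state = apply_cnot(state, q, q + 1)
    for q in range(N_QUBITS):
        state = apply_ry(state, q, params[N_QUBITS + q])
    return {format(i, "04b"): a * a for i, a in enumerate(state)}


demo_params = [0.0] * 4 + [math.pi] * 4  # illustrative, not optimized
probs = ansatz_probabilities(demo_params)
print(max(probs, key=probs.get))
```

Because every gate here is real-valued, the statevector stays real and squaring the amplitudes gives the probabilities directly.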

The 27 Pauli terms group into 9 measurement circuits. One Z-basis circuit handles the 11 diagonal terms (ZZ, ZI, IZ combinations). Four two-qubit rotation circuits handle 12 terms (paired XX, YY, XY). The remaining four circuits each handle a single four-qubit cross-term (YXXY, YYXX, XXYY, XYYX) — these require rotating all four qubits simultaneously.

The Three-Way Comparison

Platform	Chip	Active-space E (Ha)	vs Ideal (mHa)	|1111⟩ fidelity
QI Emulator	qxelarator	−10.7547	0.2	100%
IBM Fez	Heron r2 (156q)	−10.4003	354.3	81.0%

Ideal VQE: −10.7546 Ha. The QI emulator is 0.2 mHa off — pure shot noise. IBM Fez is 354 mHa off — hardware noise depolarizing the quantum state.

Where does the 354 mHa error come from? The Z-basis measurement tells the story: the emulator produces |1111⟩ 100% of the time, while IBM Fez spreads 19% of probability across wrong bitstrings. Every Pauli expectation value is biased toward zero — classic depolarization — adding up to ~300 mHa of systematic energy shift.

Z-basis distribution + per-term Pauli analysis

State	QI Emulator	IBM Fez
|1111⟩ (correct)	100.0%	81.0%
|0111⟩	0%	5.2%
|1110⟩	0%	4.3%
|1011⟩	0%	3.2%
Other (12 states)	0%	6.3%

On a perfect device, each Z-basis term should return exactly ±1.0. On IBM Fez, the six terms below return magnitudes of 0.77–0.89:

Pauli term	Coefficient	QI ⟨P⟩	IBM ⟨P⟩	IBM error
ZIII	+0.617	−1.000	−0.867	+0.133
IZII	+0.617	−1.000	−0.859	+0.141
IIZI	+0.371	−1.000	−0.892	+0.108
IIIZ	+0.371	−1.000	−0.820	+0.180
ZZII	−0.122	+1.000	+0.765	−0.235
IIZZ	−0.084	+1.000	+0.875	−0.126

Each of the 6 terms is off by 0.11–0.24. Multiplied by their Hamiltonian coefficients (0.05–0.62) and summed, that produces ~300 mHa of systematic error. The off-diagonal terms (XX, YY, XY) contribute less because their coefficients are 10–50x smaller.
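The arithmetic behind that estimate can be reproduced from the table. The sketch below multiplies each coefficient by the difference between the IBM Fez and emulator expectation values and sums the result. It uses only the six rows shown, so it is a partial sum rather than the full 27-term energy.

```python
# (term, coefficient in Ha, emulator <P>, IBM Fez <P>), copied from the table
rows = [
    ("ZIII", +0.617, -1.000, -0.867),
    ("IZII", +0.617, -1.000, -0.859),
    ("IIZI", +0.371, -1.000, -0.892),
    ("IIIZ", +0.371, -1.000, -0.820),
    ("ZZII", -0.122, +1.000, +0.765),
    ("IIZZ", -0.084, +1.000, +0.875),
]

# Energy shift per term in mHa: coefficient * (noisy - ideal expectation)
shifts_mha = {term: coeff * (ibm - ideal) * 1e3 for term, coeff, ideal, ibm in rows}
total_shift_mha = sum(shifts_mha.values())

for term, shift in shifts_mha.items():
    print(f"{term}: {shift:+.1f} mHa")
print(f"total: {total_shift_mha:+.1f} mHa")
```

Every shift has the same sign, which is why the errors add up instead of cancelling: the total for these six terms comes to about 315 mHa, with the two largest-coefficient terms (ZIII and IZII) supplying more than half of it.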

How Noise Scales with Circuit Complexity

H₂ vs LiH noise scaling comparison

	H₂ (2 qubits)	LiH (4 qubits)	Scaling
Qubits	2	4	2x
CNOT gates	1	3	3x
Pauli terms	5	27	5.4x
Circuits	3	9	3x
Total shots	12,288	36,864	3x
QI emulator error	1.3 mHa	0.2 mHa	—
IBM Fez error	87 mHa	354 mHa	4.1x

Going from 2 to 4 qubits (and 1 to 3 CNOTs), IBM noise grows 4.1x — faster than the 3x increase in gate count. Each additional CNOT introduces entangling errors and extends the circuit duration, giving decoherence more time to act.

The emulator, by contrast, actually gets better (1.3 → 0.2 mHa). LiH’s prepared state is closer to a computational basis state than H₂’s, so shot noise matters less.
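The link between the prepared state and shot noise follows from the variance of a ±1 measurement. The sketch below assumes 4096 shots per circuit, which is what 12,288 shots over 3 circuits works out to; the helper name is ours.

```python
import math


def pauli_standard_error(expectation, shots):
    """Standard error of a sampled Pauli expectation value.

    Each shot yields +1 or -1, so the per-shot variance is 1 - <P>^2.
    """
    return math.sqrt(max(0.0, 1.0 - expectation ** 2) / shots)


SHOTS_PER_CIRCUIT = 4096  # inferred from 12,288 shots over 3 circuits

for value in (0.0, 0.5, 0.9, 1.0):
    print(value, pauli_standard_error(value, SHOTS_PER_CIRCUIT))
```

When a term's expectation value is exactly ±1, every shot returns the same outcome and the statistical error vanishes. A state that sits on a computational basis state therefore has no shot noise at all on its Z-basis terms, which carry the largest coefficients in the LiH Hamiltonian.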

The implication for scaling: a 6-qubit molecule like BeH₂ would need ~5 CNOTs, and the noise trend suggests IBM errors of 700+ mHa — approaching the Hamiltonian’s entire energy range. Without error mitigation, molecules beyond 4 qubits are likely noise-dominated on current hardware.

TREX Error Mitigation: Does It Help?

For H₂, TREX (Twirled Readout Error eXtinction) achieved chemical accuracy in a single run. Can it rescue LiH?

We re-ran the LiH experiment using IBM’s EstimatorV2 with resilience_level=1, which enables TREX — a readout error correction technique that randomly flips qubits before measurement and inverts the results, averaging out systematic readout bias.

Method	Energy (Ha)	Error vs CASCI (mHa)	Improvement
Raw Sampler (IBM Fez)	−7.508	354	—
TREX (IBM Fez)	−7.703	160	2.2x
QI Emulator	−7.862	0.2	—

TREX cut the error roughly in half (354 → 160 mHa), but didn’t come close to chemical accuracy. The per-term analysis reveals why: TREX fixes readout errors cleanly on qubits with low gate noise (ZIII improved from −0.867 to −0.999, nearly perfect), but can’t fix gate errors from the CNOT chain (IZII only improved from −0.859 to −0.902).

This makes physical sense. TREX only corrects the measurement step. For H₂ with 1 CNOT, readout error was the dominant noise source, so TREX was enough. For LiH with 3 CNOTs, gate errors from the entangling layer dominate — and those require deeper techniques like zero-noise extrapolation (ZNE) or probabilistic error cancellation (PEC).
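The readout-only nature of TREX is easy to see in a toy model. The sketch below simulates a single qubit with asymmetric readout error, applies a random flip before measurement, undoes it in software, and rescales by a calibration run on a known |0⟩ state. It is a simplified illustration of the twirling idea, not IBM's implementation, and the error rates are hypothetical.

```python
import random


def sample_z(p_one, p01, p10, shots, rng, twirl):
    """Return the sampled <Z> for one qubit with asymmetric readout error.

    p_one: ideal probability that the qubit is in |1>
    p01:   probability of reading 1 when the qubit is 0
    p10:   probability of reading 0 when the qubit is 1
    twirl: if True, apply a random X before readout and undo it classically
    """
    total = 0
    for _ in range(shots):
        bit = rng.random() < p_one
        flip = twirl and rng.random() < 0.5
        physical = bit != flip            # state after the optional X gate
        error = rng.random() < (p10 if physical else p01)
        read = physical != error          # noisy readout
        recorded = read != flip           # undo the flip in software
        total += -1 if recorded else 1
    return total / shots


rng = random.Random(7)
P01, P10, SHOTS = 0.02, 0.08, 200_000  # hypothetical readout error rates

raw = sample_z(1.0, P01, P10, SHOTS, rng, twirl=False)
twirled = sample_z(1.0, P01, P10, SHOTS, rng, twirl=True)
calibration = sample_z(0.0, P01, P10, SHOTS, rng, twirl=True)  # known |0> state
mitigated = twirled / calibration

print(f"raw={raw:.3f} twirled={twirled:.3f} mitigated={mitigated:.3f}")
```

Twirling turns the asymmetric error into a uniform scaling of ⟨Z⟩ by 1 − p01 − p10, which the calibration run measures and divides out. Nothing in this model touches errors that occur before the measurement, which is the gap that gate noise from the CNOT chain falls into.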

The Active Space Limitation

Even with a perfect, noiseless quantum computer, our LiH result wouldn’t match the exact (FCI) energy. The reason is the active space: we only put 2 electrons in 2 orbitals on the quantum computer, while LiH actually has 4 electrons in 6 orbitals. The truncated model captures just over 1% of the total correlation energy (about 0.3 out of 20.5 mHa).

Active space energy breakdown

Method	Energy (Ha)	vs FCI (mHa)
Hartree-Fock (classical)	−7.8619	+20.5
CASCI(2,2) / QI emulator	−7.8621	+20.2
FCI (exact)	−7.8823	0.0

The quantum computer perfectly solves the problem it was given — the 2-orbital active space Hamiltonian. But the problem itself is too small. To reach chemical accuracy for LiH, you'd need CASCI(4,6) — 12 qubits — or a larger basis set.
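Both numbers in this section come from short calculations. The sketch below takes the "vs FCI" column of the table to estimate the fraction of correlation energy recovered, and counts qubits assuming Jordan-Wigner with one qubit per spin orbital. The table values are rounded to 0.1 mHa, so the fraction is only good to a few tenths of a percent. The helper names are ours.

```python
# Gaps to FCI in mHa, from the "vs FCI" column of the table
HF_GAP_MHA = 20.5
CASCI_GAP_MHA = 20.2


def correlation_recovered(hf_gap, method_gap):
    """Fraction of the HF-to-FCI gap closed by the method."""
    return (hf_gap - method_gap) / hf_gap


def jordan_wigner_qubits(n_active_orbitals):
    """Jordan-Wigner uses one qubit per spin orbital: two per spatial orbital."""
    return 2 * n_active_orbitals


fraction = correlation_recovered(HF_GAP_MHA, CASCI_GAP_MHA)
print(f"correlation recovered: {fraction:.1%}")
print("CASCI(2,2) qubits:", jordan_wigner_qubits(2))
print("CASCI(4,6) qubits:", jordan_wigner_qubits(6))
```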

This is the fundamental tension in near-term quantum chemistry: the molecules that fit on today's hardware are the ones classical computers already solve easily. The advantage comes at scale — 20+ qubits with active spaces that classical CASCI can't handle. We're not there yet, but the pipeline is ready.

What We Built

The entire experiment was orchestrated by Claude Code using three MCP servers:

IBM Quantum MCP — submitted 12 circuits (3 for H₂, 9 for LiH) to ibm_fez and ibm_torino, polled for results
QI Circuits MCP — ran 54 circuits on the qxelarator emulator (45 for H₂ PES + 9 for LiH)
Python (PySCF + OpenFermion) — computed molecular integrals, built Hamiltonians, optimized VQE parameters, generated all circuits

The prompt was “compute the ground state energy of lithium hydride.” From there, the AI handled the chemistry, the qubit mapping, the circuit generation, the job submission, and the energy reconstruction from raw measurement counts. No manual circuit writing, no copy-pasting QASM.

Bottom Line

We showed three things:

The QI emulator is a reliable quantum chemistry reference. Chemical accuracy on H₂ (1.3 mHa) and near-exact CASCI reproduction on LiH (0.2 mHa). If your algorithm works here, the algorithm is correct.
IBM Fez hardware gives qualitatively correct results, but needs error mitigation. 81% state fidelity and the correct dominant state on LiH, but 354 mHa of raw noise. TREX halved this to 160 mHa by fixing readout errors, but the 3-CNOT circuit introduces gate errors that TREX can’t touch. Reaching chemical accuracy on LiH will require deeper mitigation (ZNE or PEC).
The first-principles pipeline works end-to-end. Geometry → integrals → qubit Hamiltonian → circuits → hardware → energy. Every step is automated. Changing the molecule means changing one line of code.

Next: LiH with ZNE or PEC on IBM Fez to push past TREX’s 160 mHa floor. BeH₂ (6 qubits) on the QI emulator. And eventually, LiH on Tuna-9 hardware — 9 qubits is more than enough for CASCI(2,2), and we already know Tuna-9’s readout correction works well for VQE.

All experiment code and data: github.com/dereklomas/haiqu. Analysis scripts: experiments/lih_compare.py, experiments/h2_vqe_compare.py. Interactive Hamiltonian explorer: haiqu.org/hamiltonians.