Early fault-tolerant simulations of the Hubbard model

Simulation of the Hubbard model is a leading candidate for the first useful applications of a fault-tolerant quantum computer. A recent study of quantum algorithms for early simulations of the Hubbard model [Kivlichan et al 2019 Quantum 4 296] found that the lowest resource costs were achieved by split-operator Trotterization combined with the fast-fermionic Fourier transform (FFFT) on an L × L lattice with length L = 2 k . On lattices with length L ≠ 2 k , Givens rotations can be used instead of the FFFT but lead to considerably higher resource costs. We present a new analytic approach to bounding the simulation error due to Trotterization that provides much tighter bounds for the split-operator FFFT method, leading to 16× improvement in error bounds. Furthermore, we introduce plaquette Trotterization that works on any size lattice and apply our improved error bound analysis to show competitive resource costs. We consider a phase estimation task and show plaquette Trotterization reduces the number of non-Clifford gates by a factor 5.5× to 9× (depending on the parameter regime) over the best previous estimates for 8 × 8 and 16 × 16 lattices and a much larger factor for other lattice sizes not of the form L = 2 k . In conclusion, we find there is a potentially useful application for fault-tolerant quantum computers using around one million Toffoli gates.

Simulation of the Hubbard model is a leading candidate for the first useful applications of a fault-tolerant quantum computer.A recent study of quantum algorithms for early simulations of the Hubbard model [Kivlichan et al. Quantum 4 296 (2019)] found that the lowest resource costs were achieved by split-operator Trotterization combined with the fast-fermionic Fourier transform (FFFT) on an L × L lattice with length L = 2 k .On lattices with length L ̸ = 2 k , Givens rotations can be used instead of the FFFT but lead to considerably higher resource costs.We present a new analytic approach to bounding the simulation error due to Trotterization that provides much tighter bounds for the split-operator FFFT method, leading to 16× improvement in error bounds.Furthermore, we introduce plaquette Trotterization that works on any size lattice and apply our improved error bound analysis to show competitive resource costs.We consider a phase estimation task and show plaquette Trotterization reduces the number of non-Clifford gates by a factor 5.5× to 9× (depending on the parameter regime) over the best previous estimates for 8 × 8 and 16 × 16 lattices and a much larger factor for other lattice sizes not of the form L = 2 k .In conclusion, we find there is a potentially useful application for fault-tolerant quantum computers using around one million Toffoli gates.
Free, non-interacting, fermionic systems can be efficiently solved on a classical computer and occasionally by pen and paper calculation, thereby determining their exact energy spectrum.In contrast, even simple interacting electronic systems can prove intractable.Hubbard proposed a simple model of interacting electrons that describes the physics of real world systems of interest [1] and might elucidate the mechanisms behind high-temperature superconductivity.Nevertheless, it is sufficiently complex that we can not currently simulate the Hubbard model for large lattices at high accuracy.An important quantity is the ground state energy per lattice site in the two dimensional square lattice (TDL).Density matrix renormalization group approximates the TDL with finite cylinders of widths of up to 7 sites and lengths of up to 64 sites [2].This technique has an uncontrolled, systematic error due to finite width.However, using a variety of simulation techniques [2], each with their own source of uncontrolled error, there is agreement up to ±0.5% in the ground state energy density in the weak doping regime.This error is regarded as the aggregated accuracy of current classical methods and achieving better accuracy is extremely challenging.From a complexity theory perspective, the ability to find the ground states of any Hubbard model Hamiltonian [3] would enable a solution of any problem in the complexity class Quantum-Merlin-Arthur (QMA).
The Hubbard model is one of the earliest examples where quantum algorithms have been proposed [4].Recently, the Hubbard model has been identified as a potential application for noisy, near term quantum computers [5][6][7].However, these are heuristic algorithms without performance guarantees, and noise may prohibit them from achieving the required accuracy to be competitive with classical simulations.Building a faulttolerant device is a longer term goal, but offers rigorous performance guarantees and gentle resource scaling with respect to accuracy.Kivlichan et al [8] considered estimation of the ground state energy of the Hubbard TDL using phase estimation [9,10] and second-order Trotter formulae.For instance, for an 8 × 8 Hubbard simulation at 0.5% relative error, they estimated that a transmon-based surface code architecture may require only 62,000 physical qubits and 23 million non-Clifford T gates.While such a device would be much larger than any currently existing, these resource estimates are orders of magnitude smaller than those needed to break RSA [11,12], or even more costly, to perform combinatorial optimisation [13,14].
For this task, Kivlichan et al [8] considered several variants of second-order Trotter [15][16][17][18][19] to approximate the unitary used in phase estimation.The best variant they reported was called split-operator with a fast fermionic Fourier transform [20], abbreviated as SO-FFFT.When investigating quantities such as the energy density, which converge towards some infinite lattice value, it is crucial to have fine control over lattice size to understand the rate of convergence and related finite size effects.Unfortunately, SO-FFFT only works for lattices with length and width of the form L = 2 k for integer k.While Kivlichan et al [8] also explored approaches suitable for any lattice size, the resource costs were around an order of magnitude larger than SO-FFFT.
More modern techniques than Trotterization have dramatically improved the asymptotic scaling with respect to accuracy using linear combinations of unitaries [21] methods such as truncated Taylor series [22][23][24] and qubitization [25][26][27][28].However, for relative errors in the range 0.5% − 0.05%, the total permitted error is quite large so that constant factors are more important than asymptotic scaling.In this regime, it is a game of constant factors, and so far refinements of second-order Trotterization have been more fruitful than development of more sophisticated algorithms.
In this work, we both tighten the resource analysis of previous second-order Trotter approaches including SO-FFFT and introduce a new method that we called plaquette Trotterization.We begin by analysing the Trotter error per step starting from a commutator bound [8,19,29].Exactly evaluating the commutator bound is itself very difficult.Typically, commutator bounds are further loosened into a sum over many nested commutators between local operators (achieved by liberal use of the triangle inequality), which are enumerated computationally or by hand.Instead, we present a tighter, novel approach to evaluating the commutator bound in terms of freefermionic Hamiltonians that can then be efficiently evaluated.This accounts for a large portion of our observed resource savings over the SO-FFFT estimates of Kivlichan et al [8].We further discuss the synthesis of Z-axis rotations into T -gates and the trade-offs between space and time costs due to Hamming weight phasing.We find that for phase estimation of the TDL ground state density, our improvements gives 5.5× to 9× reductions in the number of T -gates when compared against SO-FFFT on the 8 × 8 and 16 × 16 lattices.Plaquette Trotterization can be used at all other lattices sizes where previous resource estimates were significantly higher.
Other variants of Trotter exist that we do not quantitatively study here.Higher order Trotter has asymptotically better scaling than second order Trotter, but worse constant factors.Furthermore, the complexity of evaluating the commutator bound increases rapidly with the Trotter order, so exact performance estimates are difficult, perhaps even intractable.Nevertheless, since we are interested in a pre-asymptotic regime with large relative error, this favours second order Trotter.Randomly compiled Trotter schemes such as qDRIFT [30,31] can eliminate dependence on the number of Hamiltonian terms at the cost of worse scaling with respect to other parameters.This can be especially attractive for N -qubit quantum chemistry Hamiltonians with O(N 4 ) terms [32], but Hubbard model Hamiltonians are very sparse with only O(N ) local terms, and only 2 Hamiltonian terms when grouped into interaction and hopping terms.For such sparse Hamiltonians, qDRIFT will typically perform worse than standard second order Trotter.

I. THE HUBBARD MODEL HAMILTONIAN
We consider Hamiltonians H = H h + H I with hopping terms of the form where σ labels the spin and Hermiticity entails that R i,j = R * j,i and we set R i,i = 0.This part of the Hamiltonian is free fermionic and therefore exactly solvable.Here we quickly review the most salient facts, but refer the reader to App.A for a longer discussion.We can find a unitary U that diagonalizes R, so the eigenvalues of R give the energy of the canonical fermionic modes of the Hamiltonian.Due to R i,i = 0, the spectrum is symmetric about zero and the maximum (minimum) energies of H h correspond to summing over all the positive (negative) energy fermionic modes.This leads to ||H h || = ||R|| 1 where || . . .|| is the operator norm and || . . .|| 1 is the Schatten 1-norm or trace norm.
Before we define our form for H I , we define the more commonly encountered unshifted form where i indexes lattice sites and n = a † a is the number operator and u is the interaction strength.Shifting the chemical potential of the interaction Hamiltonian and adding the identity operator, we obtain the shifted interaction Hamiltonian where N is the total electron number operator Since the chemical potential shift (u/2) N and identity shift (u/4)1 1 both commute with H h and HI , we know the shifted and unshifted Hamiltonians commute and share an energy eigenbasis composed of eigenstates of N .
Given such an eigenstate |ψ⟩ with η electrons (so that N |ψ⟩ = η|ψ⟩), then from (H h + HI )|ψ⟩ = Ẽ|ψ⟩ we know (H h + H I )|ψ⟩ = E|ψ⟩ with E = Ẽ − uη/2 + u/4.Therefore, within an η-electron subspace, shifting the Hamiltonian changes the spectrum by a known additive constant that can be removed by classical processing.We will see that shifting the chemical potential both tightens Trotter error bounds and reduces gate count per Trotter step.Herein, we write our interaction Hamiltonian as where we introduced the shorthand ẑ := 2n − 1 1 and assume throughout that u > 0. Many of our results hold for a general Hubbard Hamiltonian, but we also focus on TDL where R i,j = τ when i and j are nearest neighbours on a square lattice and otherwise R i,j = 0.For TDL, the hard simulation regime corresponds to 4 < u/τ < 12 and 10% − 20% below halffilling [2].

II. IMPROVING THE SPLIT-OPERATOR ALGORITHM
The second order where the second product runs in reverse index ordering.While there is a product of 2m unitaries, this can be reduced to 2(m − 1) by merging the middle two exponentials and the first/last exponentials.Given such a decomposition, the error in the Trotter step is bounded by ϵ ≤ W s 3 where W is a constant established by the well-known commutator bounds [8,19,29] The value of W is highly dependent on our choice of Trotterization decomposition and can be difficult to evaluate.The split-operator (SO) approach uses two terms H 1 = H I and H 2 = H h (or interchanged).However, to be implemented on a quantum computer e isH h requires further decomposition.For any hopping Hamiltonian, it is possible to exactly diagonalize e isH h using a product of O(L 4 ) Givens rotations [33][34][35], and while this is formally efficient, O(L 4 ) is very large in practice.For the special case of a periodic square L×L lattice with L = 2 k for integer k, we can instead use a fast-fermionic Fourier transform [20] (FFFT) to diagonalize e isH h and we refer to this approach as SO-FFFT.It was found by Kivlichan et al. [8] that SO-FFFT achieves gate counts that are orders of magnitude better than using Givens rotations or any other approach that they considered.We improve on the prior analysis of SO-FFFT in two regards and refer to our improved results as SO-FFFT+.First, we reduce the gate complexity of implementing e isH I by exploiting our choice of chemical shift.Using the standard form of H I , each term ni,↑ ni,↓ transforms under Jordan-Wigner to 3 non-trivial Pauli operators, and so e isH I requires 3L 2 Z-axis rotations.Using our shifted Hamiltonian, each ẑi,↑ ẑi,↓ term Jordan-Wigner transforms to a single Pauli operator of the form Z ⊗ Z, so we only need L 2 arbitrary Z-axis rotations to realise e isH I .Some Z-axis rotations are also required to realised e isH h , and our resulting resource savings are summarised in Table.I. Second, we greatly improve the Trotter error bound W SO for split-operator methods.Kivlichan et al. [8] bounded W SO by expanding the commutator bounds in terms of Pauli operators, liberally applying the triangle inequality and computationally counting the number of non-zero terms.However, each application of the triangle inequality dramatically loosens the bound.Whereas, for the TDL we prove that W SO ≤ min[W SO1 , W SO2 ] where where ||H h || = ||R|| 1 can be efficiently computed because H h is free-fermionic.The minimisation over W SO1 and W SO2 is due to the two possible ordering for SO.We give an example of the resulting improvements in Table I, where the reductions average 16×.The proof has two main steps.First, bounding , which is the most technical step.Then combining these bounds to obtain W SO1 and W SO2 is straightforward (see App. 1 for details).Each step of the proof will introduce a lemma that is applicable to any Hubbard Hamiltonian, so the results can be easily extended beyond TDL.
First Lemma and proof sketch.-Ourfirst bound is the especially elegant result that for any Hubbard Hamiltonian We provide a detailed proof in App.C 2 and sketch the main steps here.Using the anti-commutation relations of fermionic operators, we have where B i,j = σ R i,j a † i,σ a j,σ .We will construct a unitary transformation to simplify the first set of terms.We define the unitary V j := (1 1 + iẑ j,↑ ẑj,↓ )/ √ 2 and use anticommutation relations to verify that Furthermore, for k ̸ = i, j the unitary V k commutes with B i,j so we can consider the unitary over all sites and we have Summing over i ̸ = j, gives and so Taking the operator norm, making a single application of the triangle inequality and using unitary invariance of the norm we deduce Eq. (11).
Second Lemma and proof sketch.-Forany Hubbard Hamiltonian we have where T i = j (B i,j + B † i,j )/2.Since T i is free-fermionic we can efficiently evaluate ∥T i ∥ 2 .Furthermore, the commutator of two free-fermionic Hamiltonians, is again free fermionic so each ∥[T i , H h ]∥ is efficiently computable.A detailed proof is given in App.C 3 and here we outline the main steps.
The first step in proving this lemma is an application of the triangle inequality, so that Next we use anti-commutation relations to find Taking the operator norm, applying the triangle inequality, using ∥z i,↑ z i,↓ ∥ = 1, and summing over all indices i gives Eq. ( 16).For the special case of TDL, one finds that whenever L ≥ 4 these take constant values ∥T i ∥ = 4τ and Since there are L 2 values of i, the summation gives which leads to one term in each of Eq. ( 9) and Eq.(10).

III. PLAQUETTE TROTTERIZATION
In addition to improving the analysis of SO-FFFT, we introduce a different second-order Trotter layering that we call plaquette Trotterization or PLAQ for short.A significant shortcoming of SO-FFFT is that the FFFT only works for lattices of size L = 2 k and the FFFT requires a substantial number of non-Cliffords to perform.We partition the edges of the square lattices into two colours, pink and gold, such that each edge belongs to a single plaquette (as shown in Fig. 1) and we partition the Hamiltonian as H h = H p h + H g h where the H p h (H g h ) contains all the interactions corresponding to pink (gold) edges.Such a partition is possible whenever L is even, though PLAQ can be easily modified to also work for odd L by having a small number of plaquettes with only 1 or 2 edges.Setting H 1 = H I , H 2 = H p h and H 3 = H g h and using the general bound of Eq. ( 8) we obtain a bound containing W SO2 of Eq. ( 10) and some additional terms.
|| the additional terms can be combined to obtain Since all the additional terms involved are free-fermionic these can be efficiently computed (for details see App C 3 and Table III therein).Table I shows that this leads to only a slight increase for PLAQ error bounds relative to SO-FFFT+.To implement unitaries of the form e isH p h requires some further decomposition.We note that where is a plaquette Hamiltonian acting on the 4 hopping terms corresponding to the k th plaquette in the pink set and with spin index σ.Each of these individual plaquettes commute so,

IV. ROTATION SYNTHESIS
When L = 2 k , we can compare the gate counts per Trotter step of SO-FFFT+ and PLAQ.The total T -gates plus Pauli rotations for SO-FFFT+ is larger than for PLAQ, and when L = 16 this difference is over a factor 2.7×.However, in a fault-tolerant device arbitrary angle Z-axis rotations require further synthesis, and using standard techniques [37][38][39][40][41] this is typically in the range of 10 − 50 T -gates per Z-axis rotation.Such counting suggests SO-FFFT+ has a lower T -count than PLAQ on applicable lattice sizes.However, if we use a more sophisticated synthesis strategy described below, then PLAQ achieves a lower gate count per Trotter step.
When a collection of Z-axis rotations have equal angle (up to a unimportant sign), they can be more efficiently synthesized using Hamming weight phasing (HWP) [8,42].The basic idea of HWP, is that the rotation ⊗ m j e iθZj acting on m qubits in computational basis state |x⟩ will impart a phase e iθ(2w(x)−m) where w(x) = j x j is the Hamming weight of bit-string x.
Upper bounds on the total Toffoli count (all T -gates catalyzed using Toffoli states) for phase estimation of the TDL with u = 4, τ = 1 and additive error ϵ = 0.0051L 2 (approx.half a percent of the total system energy [8,36]).In (i) we fix L = 8 and compare PLAQ and SO-FFFT+ allowing for a varying number of logical ancilla α used for Hamming weight phasing.In (ii), we fix the number of logical ancilla to L 2 /2 and vary L. Also shown in (ii) are the previous data points for SO-FFFT taken from Table 1 of Kivlichan et al. [8].
Using an ancillary register of size log 2 (m) and computing a binary representation of the Hamming weight of x, so |x⟩ → |x⟩|w(x)⟩ we can then apply ⌈log 2 (m) + 1⌉ Z-axis rotations to the ancillary register in the state |w(x)⟩ in order to acquire the correct phase and then uncompute the Hamming weight calculation.This exponentially reduces the number of Z-axis rotations required, but requires α additional ancilla and α Toffoli gates (equivalently 4α Tgates [43]).Gidney showed [8,42] that when m = 2 k we have α = m − 1 and that α ≤ m − 1 for general m.In the limit of large m, an arbitrary rotation now costs closer to 4 T -gates.We present more details on HWP in App E 2. We remark that by combining the results of Ref. [44] and Ref. [45] one can tighten Gidney's bounds to show the Hamming weight can be computed with the cost determined by α = m − w(m) where w(m) is the Hamming weight of the binary representation of integer m.
PLAQ is especially well suited to exploit HWP because in e isH p I or e isH g I , every Z-axis rotation is of equal angle.TABLE II.Resources estimates using PLAQ to perform phase estimation of the TDL with u/τ = 4 and u/τ = 8 and error of approximately half a percent of the total system energy [8,36]).The total number of logical qubits is NQ = 2L 2 + α + 2 with α = L 2 /2 qubits used for Hamming weight phasing.The +2 refers to 1 qubit used for phase estimation and 1 qubit used for repeat until success synthesis [37].If T gates are performed by catalysis then an additional qubits is needed.Within a fault-tolerant quantum computer, NQ refers to the number of logical qubits.To obtain a full architecturespecific resource estimate one must also include the overhead of quantum error correction, the footprint for any magic state distillation factories, any routing overheads or storage of a Tcatalyst.
In contrast, SO-FFFT+ can only partly exploit HWP because the rotation angles depends on the eigenvalues of R (recall Eq. 1).While the matrix R has some degeneracy that can be exploited, it can not exploit HWP to the same, comprehensive extent as PLAQ.Therefore, given the numbers in Table I, we expect that HWP can lend an advantage to PLAQ over SO-FFFT+ (for L = 2 k lattices) and in the next section we find that this is indeed the case.

V. PHASE ESTIMATION
Next, we consider using the phase estimation algorithm using the same assumptions and parameters used in Kivlichan et al. [8].In particular, we ideally wish to perform phase estimation on the unitary exp(iHt).However, due to Trotter errors we instead perform some other unitary U that approximates exp(iHt).By unitarity of U , we can always find an effective Hamiltonian H eff such that U = exp(iH eff t).Furthermore, from App E of Ref. [8], we know that H eff can be chosen so that ∥H − H eff ∥ ≤ W t 2 (assuming the regime that W t 3 ≪ 1, which we always numerically verify) and where W is the Trotter error constant bounded earlier.Whatever value of W we obtain, we can always reduce the error by decreasing t.However, smaller t entails more bits of precision are needed in phase estimation, leading to more uses of U and higher gate complexity.Indeed, optimising the value of t, one finds that the optimal gate complexity is proportional to √ W (see App. F for details).Assuming preparation of an approximate ground state with fidelity f overlap with the true ground state, then an ideal implementation of phase estimation would with probability f return the ground state energy (up to additive error W t 2 ).How to efficiently prepare states with sufficiently good overlap with the ground state is an important and non-trivial problem that we will not address here.
Our results for PLAQ are presented in Table II and the resource counting methodology is described in more detail in App.F. In Fig. 2, we convert all gates to Toffoli gates and make a comparison between PLAQ and spiltoperator estimates.We choose to work in units of Toffoli gates instead of T -gates since recent, low-overhead magic state distillation factories output Toffoli states [46,47].The Toffoli count for PLAQ can be reproduced from Table II by assuming that 2 T -gates can be performed using 1 Toffoli gate (given access to a catalyst [42,46,48]) so that the total Toffoli count is N TOF + (N T /2).
Fig. (2i ) illustrates the trade-off between gate count and the number of ancilla used for HWP.We see that when few ancilla are available to support HWP, then SO-FFFT+ has lower T -count than PLAQ.However, PLAQ is better able to exploit HWP, so as we increase the available ancillas it outperforms SO-FFFT+.
In Fig. (2ii ) we allocate a constant fraction of all qubits to HWP and vary L, observing comparable performance between PLAQ and SO-FFFT+.Crucially, PLAQ improves on the best previous T -counts of SO-FFFT by a factor 5.5× (at L = 8) and a factor 7.8× (at L = 16) and provides this performance across all even L rather than the sparse data-points provided by the split-operator method.Recall that Givens rotations and fermionic swap networks can operate at any L value, but they have orders of magnitude higher Toffoli counts and would not be visible on the scale of our plots.A curious feature is that the Toffoli count reduces with L, which is a consequence of allowing for a relative error that grows with L 2 .The growth in the allowed error leads to a reduction in the number of Trotter steps needed, which almost exactly matched the O(L 2 ) gate-count per Trotter step.In a fault-tolerant device, a factor r reduction in T -count will lead to a factor r reduction in runtime and consequently also slight reduction in the number of physical qubits per logical qubit.The physical qubit requirements and algorithm runtime for specific architectures is discussed further in the companion paper Ref. [47] VI.CONCLUSIONS The Hubbard model was already identified as an interesting physical system to simulate on an early faulttolerant quantum computer.We have further improved these prospects by tightening error bounds and reducing gate complexity, so that now there is a potential demon-stration of a useful quantum advantage requiring fewer than 1 million Toffoli gates.To the best of our knowledge, this is the lowest Toffoli count amongst rigorously studied quantum algorithms outside the reach of current classical computers.This makes implementing this algorithm a compelling milestone goal for fault-tolerant quantum computers.
Theorem 1 For the Hubbard model on an L × L square lattice with periodic boundary conditions, the second order trotter formaule obey where Note that ||H h || can be efficiently computed and we give some values in Table III.
Compared to the bounds numerically obtained by Kivlichan et al. [8] for the parameter choices u/τ = 4, 8, we find that our bounds are about 16 times better.Note that, technically ours is a bound on a different Trotter sequence because we have shifted the chemical potential of the Hamiltonian.Unless u ≫ τ , the best bound is given by W SO1 .
Proof.-We make use of well known commutator bounds for second order Trotter [8,19] so that and

First lemma for Trotter error bounds
Here we prove the following Lemma Lemma 1 For any Hubbard-type Hamiltonian where H I = (u/4) i ẑi,↑ ẑi,↓ is the interacting Hamiltonian and H h is the hopping part, we have the following bound Furthermore, for the special case of the Hubbard model on an L × L square lattice with periodic boundary conditions, in Table III we provide precomputed values of ∥H h ∥/τ so that Eq.C4 can be easily evaluated.Here we present a proof of the Lemma in full generality.
Throughout we use the notation where Our first step is to compute [H I , H h ].We see that [ẑ k,↑ ẑk,↓ , B i,j ] = 0 when i, j, k are distinct and otherwise they anti-commute (see App. B).Therefore, Nesting the commutator again with H I and we get where we have used that [ẑ i,↑ ẑi,↓ , H I ] = 0 and we replaced [B i,j , H I ] using Eq.(C7).Using that (ẑ j,↑ ẑj,↓ ) 2 = 1 1 and [ẑ i,↑ ẑi,↓ , ẑj,↑ ẑj,↓ ] = 0 we can evaluate the square to get We define the unitary and note that it satisfies Furthermore, for k ̸ = i, j the unitary V k commutes with B i,j so we can consider the unitary over all sites V := n j=1 V j , and we have This enables us to rewrite Eq. (C9) as follows Summing over all i ̸ = j, gives us Taking the operator norm, using the triangle inequality and unitary invariance of the operator norm yields Since H h is a free fermionic Hamiltonian the energy is known (recall Eq. (A9)), so we have where R is the matrix of Hamiltonian coefficients from Eq. ( 1).This completes the proof of the Lemma 1.  Hamiltonians for an L × L square, periodic Hubbard model.This was found by using ∥H h ∥ = τ || R||1 where R is the adjacency matrix for the L × L square, periodic lattice and similarly for the nested commutator.We note that ∥H h ∥/τ ≤ (3/2)L 2 for all L ≥ 4.

Second lemma for Trotter error bounds
Our second crucial lemma states that Lemma 2 For any Hubbard-type Hamiltonian where H I = (u/4) i ẑi,↑ ẑi,↓ is the interacting Hamiltonian and H h is the hopping part, we have the following bound To prove Lem. 2, we use the same abbreviated notation introduced in Eq. (C5) and Eq.(C6) of Sec.C 2. We begin by observing that Taking the norm, using the triangle inequality, gives For a given vertex i, we have that and we have used that for such indices the operators anticommute.Furthermore, since Symmetrizing the above two results, we obtain Using the shorthand which represents all hopping terms that interact with site i, we have Nesting the commutator and using the identity Using, Eq. (C25) we have Taking the operator norm, using the triangle inequality and and substituting into Eq.(C20) completes the proof of Lem. 2 The above lemma applies to all Hubbard type Hamiltonians.In the special case of interest Corollary 1 For the special case of the TDL Hubbard model on an L × L square lattice with periodic boundary conditions, we have and Therefore, we have For the square lattice, the operator T i is a hopping interaction corresponding to a star graph so We know from App.A that ||T i || = ||R star || 1 and one easily finds that ||R star || 1 = 4τ .
To evaluate [T i , B] we note that both operators are free fermionic, therefore the commutator is also a free fermionic operator with coefficients R flower := [R star , R].Since R star acts locally on only 5 electrons, the resulting ]|| 1 will be independent of the lattice size proved L is large enough.Computing this numerically for L = 5, we get Substituting this back into our nested commutator, we have and so for an L by L lattice we have which proves Corollary 1.

Appendix D: PLAQ Trotter error
The previous method requires a unitary that simultaneously diagonalizes all the hopping terms in H h .There are two previously proposed options for this: using Givens rotations or the Fast Fermionic Fourier Transform (FFFT).Given rotations can be used on any size lattice, whereas the FFFT can only be performed on lattices of size L = 2 k with integer k.Whereas, the Givens rotation method typically leads to much higher gate counts.
We introduce plaquette Trotterization or simply PLAQ.It is applicable to any lattice with even L, so is more versatile than the FFFT approach.Furthermore, it provides gates counts that are comparable or superior to those obtained with FFFT, and far lower than using Givens rotations.We split H h into two components H p h and H g h , so that as described in the main text.That is, we partition the edges into pink and gold sets, such that every edge belongs to one set and the partitioning is regular.This can be done straightforwardly on a periodic lattice if L is even.If L is odd, then a similar results can be obtained by slightly breaking the regularity of the partitioning, but we ignore this case for brevity.Already, all even L is a much more fine grained than the SO-FFFT Trotterization scheme [8] that required L = 2 k with integer k.
Given this partitioning, we implement a single Trotter step as e −itH ≈ e −i(t/2)H I e −i(t/2)H p h e −itH g h e −i(t/2)H p h e −i(t/2)H I .(D2) Using well-known second order commutator bounds, we have that the error is bounded by W PLAQ t 3 where Performing the summations we get Noting that H 2 + H 3 = H h and H 1 = H I we can simplify two terms to get Indeed, these first two terms of W PLAQ exactly match the error contributions we calculated earlier, so and represent the excess error from using plaquette Trotterization instead of split-operator Trotterization.By symmetry between the pink and gold sets, both commutators have the same norm and so This is a nested commutator of free fermionic Hamiltonians.Using R p and R g for the Hamiltonian coefficients, we have which can be efficiently computed.
For ||[[R p , R g ], R g ]|| 1 on the periodic, square L × L lattice, we present a precomputed set of values in Table III and note that across this range the values are upper bounded by 10  3 L 2 τ 3 .

Appendix E: Realising plaquette Trotterization
Here we give explicit circuits for realising the steps of the plaquette Trotterization and therefore count the number of non-Clifford gates required.

Resource cost without ancilla
First, we count gates without use of Hamming weight phasing.Given r reps of e itH ≈ e i(t/2)H I e i(t/2)H p h e itH g h e i(t/2)H p h e i(t/2)H I , (E1) we can merge the first and last terms e itH ≈ e i(t/2)H I [e i(t/2)H p h e itH g h e i(t/2)H p h e itH I ] r e −i(t/2)H I , (E2) so for large r only the terms in the square brackets are non-negligible.We will describe costs using a vector ⃗ N = {N TOF , N T , N R } that represent the number of Toffoli gates, T gates and arbitrary single qubit Z rotations.Note that these costs are per Trotter step.There are various conversions between these costs that we discuss later.
For each step, we make 1 implementation of the interaction step e −itH I and 3 implementations of tile steps e −i(t/2)H p h or e −itH g h (the cost of which is equal), so There are L 2 interactions, but because of our choice of chemical potential each interaction corresponds to 1 arbitrary rotation, so it is possible to realise this with cost ⃗ N int = {0, 0, L 2 }.Without our choice of chemical potential, we would have counted three times as many gates.
Next, we consider implementation of e −i(t/2)H p h or e −itH g h .The lattice contains L 2 /4 plaquettes of each colour.Accounting for spin that makes L 2 /2 plaquettes inside one spin sector, so we can write ⃗ N tile = (L 2 /2) ⃗ N plaq .Here ⃗ N plaq represents the cost of realising unitaries of the form exp(iτ K) where K is a plaquette operator where we drop the spin index for brevity and the non-zero sub-block of R plaq is To see this, note that a square is a small periodic loop of size 4. Labeling the fermions going around the loop, we have that i and j interact if those numbers differ by 1 (modulo 4), which leads to R above.We can diagonalize R plaq to find the Givens rotation that diagonalizes exp(iτ K).In general, the cost would be 3 Givens rotations, 4 arbitrary rotations, and the inverse Givens rotations.However, when we diagonalize R plaq we find that it has only two non-trivial eigenvalues, which will dramatically reduce the gate count.That is, Letting F i,j be a fermionic operator such that Then using Note that the qubit implementation of F i,j is non-local when i and j are non-adjacent with respect to the Jordon-Wigner string ordering.However, we can always use fermionic swap f SWAP operations to make modes adjacent, and since fermionic swap can be realised using Clifford operations we neglect this cost.This is valid for the resource costing model used in this paper.However, additional care is required when costing in other settings.Next, we can realise the plaquette time evolution via exp(iτ K) = V e i2τ a † 2 a2 e −i2τ a † 3 a3 V † (E10) = F 3,1 F 2,4 F 2,3 e i2τ a † 2 a2 e −i2τ a † 3 a3 F † 2,3 F † 2,4 F † 3,1 However, we can compile further.Assuming a Jordon-Wigner encoding, we have that F 2,3 is realised by where the cost of synthesis is a significant factor.However, within each factor of the Trotterization, all the arbitrary rotations are diagonalized to Z j rotations by the same angle.Therefore, given access to ancillas we can use Hamming weight phasing to significantly reduce the total number of non-Cliffords.

Using ancilla and Hamming weight phasing
Hamming weight phasing is a gadget that uses ancilla and Toffoli gates to reduce the number of rotations needed.This is a slight refinement of the idea introduce by Gidney [42] and expanded up to in Ref. [8].Previously, it has been noted that Hamming weight phasing requires α ≤ m − 1 and that this will be loose when m ̸ = 2 q for some integer q.
Our proof of the above claim combines two previously established facts.It has been observed that for quantum circuits, the Toffoli and ancilla cost of computing (and then uncomputing) some Boolean function is captured by the classical notion of multiplicative complexity [44].Furthermore, it has been shown [45] that multiplicative complexity of Hamming weight computations is precisely captured by Eq. (E15).Note that in Ref. [45]

FIG. 1 .
FIG.1.The interaction graph for a 4 × 4 Hubbard model on a square periodic lattice.Each site contains 2 fermions, one for each spin direction.For PLAQ-Trotterization, we divide the lattice into disjoint pink and gold plaquettes, such that each edge belongs to only 1 plaquette.

Theorem 2
Using Hamming weight phasing, it is possible to implement m phase gates m j=1 exp(iθZ j ) using k = F loor[Log[2, m] + 1] arbitrary rotations, α Toffoli gates and α clean ancilla, whereα = m − w(m) ≤ m − 1, (E15)where w(m) is the number of 1s in the binary expansion of m.

2 ,
they use the notation H N (m) where we use w(m).The PLAQ Trotter step contains layers of L 2 or L 2 /2 identical rotations.If m is a factor of L 2 /2, we can break up the task in batches that each use Hamming weight phasing to transform the cost model as follows:⃗ N = {0, 12L 2 , 4L 2 } (E16) → {4L 2 α/m, 12L 2 , 4L 2Floor[log 2 (m) + 1]/m} t where α = m − w(m) and we assume access to α logical ancilla.If we convert each Toffoli gate into 4 T gates, we get a total cost of ⃗ N = {0, N T , N R } where In the worst case α = m − 1 and Floor[log 2 (m) + 1] = log 2 (m) + 1 for whichN T = 16(m − 1) m + 12 L 2 , (E18) N R = 4 m [log 2 (m) + 1]L 2 ,Since the N R gates require further synthesis, HWP significantly improves the T -count.However, this must be balanced against cost of the extra logical ancilla.

TABLE I .
[8]orous upperbounds on the error constant W (total error W s3for simulating e iHs ) and non-Clifford (NT is the number of T -gates and NR is the number of arbitrary angle Z-axis rotations) costs of implementing a single Trotter step of the TDL Hubbard model on a L × L lattice with u/τ = 4.We compare previous SO-FFFT results[8]with our improved SO-FFFT+ analysis for the allowable lattice sizes L = 2 k .We also show results for PLAQ, we highlight that it works on other lattice sizes L ̸ = 2 k with some examples.
requires 4L2T -gates and L 2 Z-axis rotations.The full Trotter step uses 2 implementations of e isH p h /2 and one of e isH g h , which totals to 12L 2 T -gates and 3L 2 Z-axis rotations.Adding a cost of L 2 /2 Z-axis rotations for e isH I , implemented just as in SO-FFFT+, we arrive at a total of 12L 2 T -gates and 4L 2 arbitrary Z rotations per Trotter step.
exactly and without further Trotter error.Each plaquette Hamiltonian acts on 4 fermions, which can be implemented with 8 T gates and 2 Z-axis rotations of equal magnitude (see App E for details).Counting L 2 /4 plaquettes of colour pink and 2 spin sectors, e isH p h /2 , H I ], H I ]|| .In App.C 3, we present Lemma 2 and Corollary 1 to obtain a bound on ||[[H I , H h ], H I ]||.Combined these results gives bounds on W SO1 and W SO2 h

TABLE III .
Numerically computed values of normalised hopping Hamiltonian ∥H h ∥/τ and nested commutator of plaquette interaction