Efficient estimation of Pauli observables by derandomization

We consider the problem of jointly estimating expectation values of many Pauli observables, a crucial subroutine in variational quantum algorithms. Starting with randomized measurements, we propose an efficient derandomization procedure that iteratively replaces random single-qubit measurements with fixed Pauli measurements; the resulting deterministic measurement procedure is guaranteed to perform at least as well as the randomized one. In particular, for estimating any $L$ low-weight Pauli observables, a deterministic measurement on only of order $\log(L)$ copies of a quantum state suffices. In some cases, for example when some of the Pauli observables have a high weight, the derandomized procedure is substantially better than the randomized one. Specifically, numerical experiments highlight the advantages of our derandomized protocol over various previous methods for estimating the ground-state energies of small molecules.


I. INTRODUCTION
Noisy Intermediate-Scale Quantum (NISQ) devices are becoming available [39]. Though less powerful than fully error-corrected quantum computers, NISQ devices used as coprocessors might have advantages over classical computers for solving some problems of practical interest. For example, variational algorithms using NISQ hardware have potential applications to chemistry, materials science, and optimization [3, 7, 18-20, 27, 36, 38, 40].
In a typical NISQ variational algorithm, we need to estimate expectation values for a specified set of operators $\{O_1, O_2, \ldots, O_L\}$ in a quantum state $\rho$ that can be prepared repeatedly using a programmable quantum system. To obtain accurate estimates, each operator must be measured many times, and finding a reasonably efficient procedure for extracting the desired information is not easy in general. In this paper, we consider the special case where each $O_j$ is a Pauli operator; this case is of particular interest for near-term applications.
Suppose we have quantum hardware that produces multiple copies of the $n$-qubit state $\rho$. Furthermore, for every copy, we can measure all the qubits independently, choosing at our discretion to measure each qubit in the $X$, $Y$, or $Z$ basis. We are given a list of $L$ $n$-qubit Pauli operators (each one a tensor product of $n$ Pauli matrices), and our task is to estimate the expectation values of all $L$ operators in the state $\rho$, with an error no larger than $\varepsilon$ for each operator. We would like to perform this task using as few copies of $\rho$ as possible.
If all $L$ Pauli operators have relatively low weight (act nontrivially on only a few qubits), there is a simple randomized protocol that achieves our goal quite efficiently: For each of $M$ copies of $\rho$, and for each of the $n$ qubits, we choose uniformly at random to measure $X$, $Y$, or $Z$. Then we can achieve the desired prediction accuracy with high success probability if $M = O(3^w \log(L)/\varepsilon^2)$, assuming that all $L$ operators on our list have weight no larger than $w$ [15, 21]. If the list contains high-weight operators, however, this randomized method is not likely to succeed unless $M$ is very large.
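As a point of reference, the randomized protocol is trivial to implement classically. The following minimal Python sketch (our own illustration; the function name is not taken from the released source code) samples the measurement bases it prescribes:

```python
import random

def random_pauli_measurements(n, M, seed=None):
    """For each of M copies, choose a uniformly random basis from
    {X, Y, Z} for every one of the n qubits."""
    rng = random.Random(seed)
    return [''.join(rng.choice('XYZ') for _ in range(n)) for _ in range(M)]
```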
In this paper, we describe a deterministic protocol for estimating Pauli-operator expectation values that always performs at least as well as the randomized protocol, and performs much better in some cases. This deterministic protocol is constructed by derandomizing the randomized protocol. The key observation is that we can compute a lower bound on the probability that randomized measurements on M copies successfully achieve the desired error ε for every one of our L target Pauli operators. Furthermore, we can compute this lower bound even when the measurement protocol is partially deterministic and partially randomized; that is, when some of the measured single-qubit Pauli operators are fixed, and others are still sampled uniformly from {X, Y, Z}.
Hence, starting with the fully randomized protocol, we can proceed step by step to replace each randomized single-qubit measurement by a deterministic one, taking care in each step to ensure that the new partially randomized protocol, with one additional fixed measurement, has success probability at least as high as that of the preceding protocol. When all measurements have been fixed, we have a fully deterministic protocol. In numerical experiments, we find that this deterministic protocol substantially outperforms randomized protocols [13, 16, 21, 34, 37]. The improvement is especially significant when the list of target observables includes operators with relatively high weight. Further performance gains are possible by executing entangling circuits of (at least) linear depth before measuring [11, 24, 25, 47]; such procedures, however, require deep quantum circuits. In contrast, our protocol requires only single-qubit Pauli measurements, which are more amenable to execution on near-term devices.
We provide some statistical background in Sec. II, explain the randomized measurement protocol in Sec. III, and analyze the derandomization procedure in Sec. IV. Numerical results in Sec. V show that our derandomized protocol improves on previous methods. Sec. VI contains concluding remarks. Further examples and details of proofs are in the appendices.

II. STATISTICAL BACKGROUND
Let $\rho$ be a fixed, but unknown, quantum state on $n$ qubits. We want to accurately predict $L$ expectation values

$$\omega_\ell(\rho) = \mathrm{tr}(O_\ell\, \rho), \quad 1 \leq \ell \leq L, \qquad (1)$$

where each $O_\ell$ is an $n$-qubit Pauli observable, represented by a string $o_\ell \in \{I, X, Y, Z\}^n$. To extract this information, we perform $M$ Pauli measurements $P = [p_1, \ldots, p_M] \in \{X, Y, Z\}^{n \times M}$; each $p_m$ specifies a Pauli basis for every qubit and produces a string of $n$ outcome signs. A measurement $p_m$ hits an observable $o_\ell$ (written $o_\ell \triangleleft p_m$) if the two strings agree on every qubit where $o_\ell$ acts nontrivially; for example, $[X, I] \triangleleft [X, X]$. The hitting count $h(o_\ell; P) = \sum_{m=1}^{M} \mathbb{1}\{o_\ell \triangleleft p_m\}$ records how often this happens. We can approximate each $\omega_\ell(\rho)$ by empirically averaging (appropriately marginalized) measurement outcomes that belong to Pauli measurements that hit $o_\ell$:

$$\hat{\omega}_\ell = \frac{1}{h(o_\ell; P)} \sum_{m:\, o_\ell \triangleleft p_m}\; \prod_{k:\, [o_\ell]_k \neq I} [q_m]_k, \qquad (2)$$

where $[q_m]_k \in \{\pm 1\}$ denotes the outcome for qubit $k$ in the $m$-th measurement. It is easy to check that each $\hat{\omega}_\ell$ exactly reproduces $\omega_\ell(\rho)$ in expectation (provided that $h(o_\ell; P) \geq 1$). Moreover, the probability of a large deviation improves exponentially with the number of hits.

Lemma 1 (confidence bound). Fix an accuracy $\varepsilon \in (0, 1)$, a collection of Pauli observables $O = [o_1, \ldots, o_L]$ and Pauli measurements $P = [p_1, \ldots, p_M]$. Then

$$\Pr\left[\, |\hat{\omega}_\ell - \omega_\ell(\rho)| \geq \varepsilon \ \text{for some}\ \ell \,\right] \leq 2 \sum_{\ell=1}^{L} \exp\!\left(-\tfrac{\varepsilon^2}{2}\, h(o_\ell; P)\right) =: 2\, \mathrm{Conf}_\varepsilon(O; P). \qquad (3)$$
See Appendix B 1 for a detailed derivation. We call the function defined in Eq. (3) the confidence bound. It is a statistically sound summary parameter that checks whether a set of Pauli measurements (P) allows for confidently predicting a collection of Pauli observables (O) up to accuracy ε each.
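These quantities are straightforward to evaluate. The following Python sketch (our own illustration; identifiers such as `hits` and `confidence_bound` are not from the released source code) computes the hitting counts, the empirical means of Eq. (2), and the confidence bound of Eq. (3):

```python
import math

def hits(o, p):
    """True if Pauli measurement p (e.g. 'XZY') hits observable o (e.g. 'XIY'),
    i.e. the strings agree wherever o is not the identity."""
    return all(pk == ok for pk, ok in zip(p, o) if ok != 'I')

def estimate(o, measurements, outcomes):
    """Empirical mean of Eq. (2): average the marginalized (+-1) outcomes of
    all measurements that hit o; requires at least one hit."""
    vals = [math.prod(q[k] for k in range(len(o)) if o[k] != 'I')
            for p, q in zip(measurements, outcomes) if hits(o, p)]
    return sum(vals) / len(vals)

def confidence_bound(observables, measurements, eps):
    """Conf_eps(O; P) of Eq. (3); twice this value upper-bounds the
    probability that any estimate is off by more than eps."""
    return sum(math.exp(-(eps**2 / 2) * sum(hits(o, p) for p in measurements))
               for o in observables)
```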

III. RANDOMIZED PAULI MEASUREMENTS
A Pauli measurement drawn uniformly at random from $\{X, Y, Z\}^n$ hits a fixed observable $o_\ell$ with probability $3^{-w(o_\ell)}$, where the weight $w(o_\ell)$ counts the non-identity labels of $o_\ell$. Averaging the confidence bound (3) over $M$ such independent random measurements yields

$$\mathbb{E}_P\left[\mathrm{Conf}_\varepsilon(O; P)\right] = \sum_{\ell=1}^{L} \left(1 - \nu\, 3^{-w(o_\ell)}\right)^M, \quad \text{where } \nu = 1 - \exp(-\varepsilon^2/2). \qquad (5)$$

In particular, order $\log(L)$ randomized Pauli measurements suffice for estimating any collection of $L$ low-weight Pauli observables. It is instructive to compare this result to other powerful statements about randomized measurements, most notably the "classical shadow" paradigm [21, 37]. For Pauli observables and Pauli measurements, the two approaches are closely related. The estimators (2) are actually simplified variants of the classical shadow protocol (in particular, they do not require median-of-means prediction), and the requirements on $M$ are also comparable. This is no coincidence; information-theoretic lower bounds from [21] assert that there are scenarios where the scaling $M \propto \log(L) \max_\ell 3^{w(o_\ell)}/\varepsilon^2$ is asymptotically optimal and cannot be avoided. Nevertheless, this does not mean that randomized measurements are always a good idea. High-weight observables pose an immediate challenge, because it is extremely unlikely to hit them by chance alone.
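Eq. (5) is also easy to evaluate in code. This sketch (again ours, under the assumption stated above that each random measurement hits $o_\ell$ independently with probability $3^{-w(o_\ell)}$) computes the expected confidence bound for a given budget $M$:

```python
import math

def expected_confidence_bound(observables, M, eps):
    """E_P[Conf_eps(O; P)] over uniformly random P, Eq. (5)."""
    nu = 1 - math.exp(-eps**2 / 2)
    return sum((1 - nu * 3.0**(-sum(c != 'I' for c in o)))**M
               for o in observables)
```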

IV. DERANDOMIZED PAULI MEASUREMENTS
The main result of this work is a procedure for identifying "good" Pauli measurements that allow for accurately predicting many (fixed) Pauli expectation values. This procedure is designed to interpolate between two extremes: (i) completely randomized measurements (good for predicting many local observables) and (ii) completely deterministic measurements that directly measure observables sequentially (good for predicting few global observables).
Note that we can efficiently compute concrete confidence bounds (3), as well as expected confidence bounds averaged over all possible Pauli measurements (5). Combined, these two formulas also allow us to efficiently compute expected confidence bounds for a list of measurements that is partially deterministic and partially randomized. Suppose that $P^\sharp$ subsumes deterministic assignments for the first $(m-1)$ Pauli measurements, as well as concrete choices for the first $k$ Pauli labels of the $m$-th measurement; see Fig. 1 (center). Then

$$\mathbb{E}_P\!\left[ \mathrm{Conf}_\varepsilon(O; P) \,\middle|\, P^\sharp \right] = \sum_{\ell=1}^{L} \exp\!\left( -\tfrac{\varepsilon^2}{2}\, h\!\left(o_\ell; [p^\sharp_1, \ldots, p^\sharp_{m-1}]\right) \right) \left( 1 - \nu\, \mathbb{1}\!\left\{ o_\ell \stackrel{k}{\triangleleft} p^\sharp_m \right\} 3^{-w_{>k}(o_\ell)} \right) \left( 1 - \nu\, 3^{-w(o_\ell)} \right)^{M-m}, \qquad (6)$$

where $o_\ell \stackrel{k}{\triangleleft} p^\sharp_m$ indicates that the first $k$ labels of $p^\sharp_m$ are consistent with $o_\ell$, and $w_{>k}(o_\ell)$ counts the non-identity labels of $o_\ell$ on qubits $k+1, \ldots, n$. This formula allows us to build deterministic measurements one Pauli label at a time.
We start by envisioning a collection of $M$ completely random $n$-qubit Pauli measurements. That is, each Pauli label is random, and Eq. (5) characterizes the expected confidence bound. Now, suppose that we assign the first label of the first measurement to $W \in \{X, Y, Z\}$ while all other labels remain random. Eq. (6) lets us compute the conditional expectation value for each candidate $W$, and choosing the minimizer $W^\sharp$ ensures

$$\mathbb{E}_P\!\left[ \mathrm{Conf}_\varepsilon(O; P) \,\middle|\, P[1,1] = W^\sharp \right] \leq \frac{1}{3} \sum_{W \in \{X, Y, Z\}} \mathbb{E}_P\!\left[ \mathrm{Conf}_\varepsilon(O; P) \,\middle|\, P[1,1] = W \right] = \mathbb{E}_P\!\left[ \mathrm{Conf}_\varepsilon(O; P) \right], \qquad (7)$$

because a minimum can never exceed an average.
Crucially, Eq. (6) allows us to efficiently identify such a minimizing assignment. Doing so replaces an initially random single-qubit measurement setting with a concrete Pauli label that minimizes the conditional expectation value over all remaining (random) assignments. This procedure is known as derandomization [1, 33, 43] and can be iterated. Fig. 1 provides visual guidance, while pseudocode can be found in Algorithm 1. There are a total of $n \times M$ iterations.
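To make the iteration concrete, here is a compact Python sketch of the greedy procedure (our own rendering of Algorithm 1, based on the reconstruction of Eq. (6) above; it is not the authors' released implementation):

```python
import math

def cond_expected_conf(observables, completed, partial, M, eps):
    """Conditional expectation of Eq. (6): `completed` holds fully assigned
    measurements, `partial` the first k labels of the measurement being built,
    and all remaining labels are uniformly random."""
    nu = 1.0 - math.exp(-eps**2 / 2)
    m = len(completed) + 1                    # index of the current measurement
    total = 0.0
    for o in observables:
        w_full = sum(c != 'I' for c in o)     # weight w(o)
        w_rest = sum(c != 'I' for c in o[len(partial):])
        h = sum(all(p[i] == o[i] for i in range(len(o)) if o[i] != 'I')
                for p in completed)           # hitting count so far
        consistent = all(partial[i] == o[i]
                         for i in range(len(partial)) if o[i] != 'I')
        term_now = 1.0 - nu * 3.0**(-w_rest) if consistent else 1.0
        term_future = (1.0 - nu * 3.0**(-w_full))**(M - m)
        total += math.exp(-(eps**2 / 2) * h) * term_now * term_future
    return total

def derandomize(observables, n, M, eps):
    """Greedy derandomization (a sketch of Algorithm 1): fix one Pauli label
    at a time, always picking the candidate with the smallest conditional
    expected confidence bound."""
    completed = []
    for _ in range(M):
        partial = []
        for _ in range(n):
            partial.append(min('XYZ', key=lambda W: cond_expected_conf(
                observables, completed, partial + [W], M, eps)))
        completed.append(partial)
    return [''.join(p) for p in completed]
```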
Step $(k, m)$ is contingent on comparing three conditional expectation values, $\mathbb{E}_P\left[\mathrm{Conf}_\varepsilon(O; P) \mid P^\sharp,\, P[k, m] = W\right]$ for $W \in \{X, Y, Z\}$, and assigning the Pauli label that achieves the smallest score. These update rules are constructed to ensure that (appropriate modifications of) Eq. (7) remain valid throughout the procedure. Combining all of them implies the following rigorous statement about the resulting Pauli measurements $P^\sharp$.
Theorem 2 (Derandomization promise). Algorithm 1 is guaranteed to output Pauli measurements $P^\sharp$ with below-average confidence bound:

$$\mathrm{Conf}_\varepsilon(O; P^\sharp) \leq \mathbb{E}_P\left[\mathrm{Conf}_\varepsilon(O; P)\right].$$

We see that derandomization produces deterministic Pauli measurements that perform at least as favorably as (averages of) randomized measurement protocols. But the actual difference between randomized and derandomized Pauli measurements can be much more pronounced. In the examples we considered, derandomization reduces the measurement budget $M$ by at least an order of magnitude compared to randomized measurements. On the other hand, because Algorithm 1 implements a greedy update procedure, we have no assurance that our derandomized measurement procedure is globally optimal, or even close to optimal.
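Using the sketches above, the guarantee of Theorem 2 can be checked directly on small instances (the observable list below is an arbitrary toy example of ours):

```python
obs = ['XXII', 'IIYY', 'ZZZZ']
M, eps = 20, 0.5
P = derandomize(obs, n=4, M=M, eps=eps)
# holds by construction: each greedy step picks a minimum over choices,
# which can never exceed the average (Theorem 2)
assert confidence_bound(obs, P, eps) <= expected_confidence_bound(obs, M, eps)
```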

V. NUMERICAL EXPERIMENTS
The ability to accurately estimate many Pauli observables is an essential subroutine for variational quantum eigensolvers (VQE) [18, 28, 36, 38, 40]. Randomized Pauli measurements [15, 21], also known as classical shadows in this context, offer a conceptually simple solution that is efficient both in terms of quantum hardware and measurement budget.
Derandomization can and should be viewed as a refinement of the original classical shadows idea. Supported by rigorous theory (Theorem 2), this refinement is only contingent on an efficient classical preprocessing step, namely running Algorithm 1. It does not incur any extra cost in terms of quantum hardware and classical post-processing, but can lead to substantial performance gains. Numerical experiments visualized in Ref. [21, Figure 5] have revealed unconditional improvements of about one order of magnitude for a particular VQE experiment [30] (simulating quantum field theories).
In this section, we present additional numerical studies that support this favorable picture. These address a slight variation of Algorithm 1 that does not require fixing the total measurement budget $M$ in advance. We focus on the electronic structure problem: determining the ground-state energy of molecules with unknown electronic structure. This is one of the most promising VQE applications in quantum chemistry and materials science. Different encoding schemes, most notably Jordan-Wigner (JW) [26], Bravyi-Kitaev (BK) [5] and Parity (P) [5, 42], allow for mapping molecular Hamiltonians to qubit Hamiltonians that correspond to sums of Pauli observables. Several benchmark molecules have been identified whose encoded Hamiltonians are just simple enough for an explicit classical minimization, so that we can compare Pauli estimation techniques with the exact answer. Fig. 2 illustrates one such comparison. We fix a benchmark molecule, BeH$_2$, and a Bravyi-Kitaev (BK) encoding, and plot the ground-state energy approximation error against the number of Pauli measurements. The plot highlights that derandomization outperforms the original classical shadows procedure (randomized Pauli measurements) [21], locally-biased classical shadows [17], and another popular technique known as largest-degree-first (LDF) grouping [16, 44]. The discrepancy between randomized and derandomized Pauli measurements is particularly pronounced.
This favorable picture extends to a variety of other benchmark molecules and other encoding schemes, see Table 3. For a fixed measurement budget, derandomization consistently leads to a smaller estimation error than other state-of-the-art techniques.

VI. CONCLUSION AND OUTLOOK
We consider the problem of predicting many Pauli expectation values from few Pauli measurements. Derandomization [1, 33, 43] provides an efficient procedure that replaces originally randomized single-qubit Pauli measurements by specific Pauli assignments. The resulting Pauli measurements are deterministic, but inherit all advantages of a fully randomized measurement protocol. Furthermore, the derandomization procedure can accurately capture the fine-grained structure of the observables in question. Predicting molecular ground-state energies based on derandomized Pauli measurements scales favorably and improves upon many existing techniques [15, 16, 37, 44]. Source code for an implementation of the proposed procedure is available at [23].
Randomized measurements have also been used to estimate entanglement entropy [6, 21, 41, 46], topological invariants [9, 14], benchmark physical devices [8, 12, 21, 29], and predict outcomes of physical experiments [22]. Derandomization provides a principled approach for adapting randomized measurement procedures to fine-grained structure and is closely related to an algorithmic technique, multiplicative weight update [2], commonly used in machine learning and game theory. So far, we have only considered estimation of Pauli observables, but measurement design via derandomization should apply more broadly. We look forward to extensions of derandomization to other tasks, such as estimating non-Pauli observables and entanglement entropies, as well as improvements to the cost function f(W) in Algorithm 1.

[Fig. 2 caption: Approximation error for the BeH$_2$ ground-state energy in the Bravyi-Kitaev encoding [5] for different measurement schemes. The error for the derandomized shadow is the root-mean-squared error (RMSE) over ten independent runs. The error for the other methods shows the RMSE over infinitely many runs and can be evaluated efficiently using the variance of one experiment [16].]

[Table 3 caption: Average estimation error using 1000 measurements for different molecules, encodings, and measurement schemes. The first column shows the molecule and the corresponding ground-state electronic energy (in Hartree). Abbreviations: derandomized classical shadow (Derand.), locally-biased classical shadow (Local S.), largest-degree-first (LDF) heuristic, and original classical shadow (Shadow) [21].]
Many local Pauli observables.
Many near-term applications of quantum devices rely on repeatedly estimating a large number of low-weight Pauli observables. For example, low-energy eigenstates of a many-body Hamiltonian may be prepared and studied using a variational method, in which the Hamiltonian, a sum of local terms, is measured many times. Using randomized measurements, we can predict many low-weight observables simultaneously at comparatively little cost. It is known that a logarithmic number of randomized Pauli measurements allows for accurately predicting a polynomial number of low-weight observables [21].
This desirable feature provably extends to derandomized measurements. From Theorem 2 and Eq. (5), we infer that the measurement budget $M = 4 \log(2L/\delta) \max_\ell 3^{w(o_\ell)} / \varepsilon^2$ suffices to ensure that Algorithm 1 outputs Pauli measurements $P^\sharp$ that obey $\mathrm{Conf}_\varepsilon(O; P^\sharp) \leq \delta/2$. With Lemma 1, we may convert this into an error bound: empirical averages (2) formed from appropriate measurement outcomes are guaranteed to obey $|\hat{\omega}_\ell - \mathrm{tr}(O_\ell\, \rho)| \leq \varepsilon$ for all $1 \leq \ell \leq L$ with high probability (at least $1 - \delta$). This error bound is roughly on par with the best rigorous result about predicting local Pauli observables from randomized Pauli measurements [15]. But this argument implicitly assumes that $\mathrm{Conf}_\varepsilon(O; P^\sharp)$ (which we can compute) is comparable to $\mathbb{E}_P[\mathrm{Conf}_\varepsilon(O; P)]$ (which is characterized by Eq. (5)). This assumption is extremely pessimistic, because often $\mathrm{Conf}_\varepsilon(O; P^\sharp) \ll \mathbb{E}_P[\mathrm{Conf}_\varepsilon(O; P)]$. If this is the case, derandomized Pauli measurements perform substantially better.
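In code, the sufficient budget is a direct transcription of the formula above (a small helper of ours):

```python
import math

def sufficient_budget(observables, eps, delta):
    """M = 4*log(2L/delta) * max_l 3^{w(o_l)} / eps^2 guarantees
    Conf_eps(O; P#) <= delta/2 for the output of Algorithm 1."""
    L = len(observables)
    w_max = max(sum(c != 'I' for c in o) for o in observables)
    return math.ceil(4 * math.log(2 * L / delta) * 3**w_max / eps**2)
```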

Few global Pauli observables.
We have seen that derandomized measurements never perform worse than randomized measurements. But they can perform much better. This discrepancy is best illustrated with a simple example: design Pauli measurements to predict both a complete $Y$-string ($o_1 = [Y, \ldots, Y]$) and a complete $Z$-string ($o_2 = [Z, \ldots, Z]$). Here, randomized measurements are a terrible idea, because it is exponentially unlikely to hit either string by chance alone.
Contrast this with derandomization. For the very first assignment ($k = 1$, $m = 1$), Algorithm 1 starts by computing three conditional expectations. Comparing them reveals $f(Y) = f(Z) < f(X)$, and the algorithm determines that assigning $X$ is likely a bad idea. The two remaining choices should be equivalent, and the algorithm assigns, say, $P^\sharp[1, 1] = Y$. This initial choice does affect the expected confidence bound associated with the second Pauli label ($k = 2$, $m = 1$): $f(Y) < f(X) = f(Z)$. Taking into account the already assigned first Pauli label, both $X$ and $Z$ become equally unfavorable, and the algorithm sticks to assigning $P^\sharp[2, 1] = Y$. This situation now repeats itself until the first Pauli measurement is completely assigned: $p^\sharp_1 = [Y, \ldots, Y]$. The algorithm has successfully kept track of an entire global Pauli string.
It is now time to assign the first Pauli label of the second Pauli measurement ($k = 1$, $m = 2$). While $X$ is still a bad idea, taking into account that we have already measured $o_1$ once also breaks the symmetry between $Y$ and $Z$ assignments: now $f(Z) < f(Y)$, and the algorithm assigns $P^\sharp[1, 2] = Z$. Iterating this argument, the second measurement becomes a complete $Z$-string, and the final output alternates between complete $Y$-strings and complete $Z$-strings. In words: measure both global observables equally often. Although statistically optimal, this measurement protocol is neither surprising nor particularly interesting. What is encouraging, though, is that Algorithm 1 has (re-)discovered it all by itself.
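Running the earlier greedy sketch on this example reproduces exactly this behavior (small $n$ and $M$ chosen by us for illustration):

```python
n, M = 4, 6
P = derandomize(['Y' * n, 'Z' * n], n=n, M=M, eps=0.9)
print(P)  # alternates: ['YYYY', 'ZZZZ', 'YYYY', 'ZZZZ', ...]
```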

Very many global Pauli observables (non-example):
The derandomization algorithm is not without flaws. The greedy update rule in line 8 of Algorithm 1 can be misguided to produce non-optimal results. This happens, for instance, for a very large collection of global Pauli observables that appears to have favorable structure but actually doesn't. For instance, set $o_1 = [X, \ldots, X]$ and $o_\ell = [Z; \tilde{o}_\ell]$, where $\tilde{o}_\ell \in \{X, Y, Z\}^{n-1}$ ranges through all $3^{n-1}$ possible Pauli strings of size $(n-1)$. There are $L = 3^{n-1} + 1$ target observables, all of which are global and therefore incompatible. However, $3^{n-1}$ of them start with a Pauli-$Z$ label. This imbalance leads the algorithm to believe that assigning $P^\sharp[1, m] = Z$ for all $1 \leq m \leq M$ is always a good idea (provided that $M$ is not much larger than $3^{n-1}$). By doing so, it completely ignores the first target observable, which starts with an $X$ label. But at the same time, it cannot capitalize on this particular decision, because observables $o_2$ to $o_L$ are mutually incompatible. This results in an imbalanced output $P^\sharp$ that treats observables $o_2$ to $o_L$ roughly equally, but completely forgets about $o_1$. Needless to say, the resulting confidence bound will not be minimal either. We emphasize that this highly stylized non-example is not motivated by actual applications. Instead, it is intended to illustrate how greedy update procedures can get stuck in local minima.
Appendix B: Proofs

1. Derivation of the confidence bound

Suppose that $o \in \{I, X, Y, Z\}^n$ is a Pauli string that is hit by a measurement $p \in \{X, Y, Z\}^n$ ($o \triangleleft p$). Then, we can appropriately marginalize the $n$-qubit outcome string $q \in \{\pm 1\}^n$ to reproduce $\omega(\rho) = \mathrm{tr}(O_o\, \rho)$ in expectation:

$$\mathbb{E}\Big[ \prod_{k:\, [o]_k \neq I} [q]_k \Big] = \mathrm{tr}(O_o\, \rho).$$

Consequently, each estimator $\hat{\omega}_\ell$ of Eq. (2) is an empirical average of independent random signs with expectation $\omega_\ell(\rho)$, and such averages obey the concentration inequality

$$\Pr\left[\, |\hat{\omega}_\ell - \omega_\ell(\rho)| \geq \varepsilon \ \text{for some}\ \ell \,\right] \leq 2 \sum_{\ell=1}^{L} \exp\!\left( -\tfrac{\varepsilon^2}{2}\, h(o_\ell; P) \right).$$

Lemma 1 in the main text is an immediate consequence of this concentration inequality.
Proof. The union bound (also known as Boole's inequality) states that the probability associated with a union of events is upper bounded by the sum of individual event probabilities. For the task at hand, it implies

$$\Pr\left[\, |\hat{\omega}_\ell - \omega_\ell| \geq \varepsilon \ \text{for some}\ \ell \,\right] \leq \sum_{\ell=1}^{L} \Pr\left[\, |\hat{\omega}_\ell - \omega_\ell| \geq \varepsilon \,\right]. \qquad (B5)$$

This allows us to treat individual deviation probabilities separately. Fix $1 \leq \ell \leq L$ and note that $\hat{\omega}_\ell$ is an empirical average of $M_\ell = h(o_\ell; P)$ random signs $s_i$ that are mutually independent (they arise from different measurement outcomes). Empirical averages of independent random signs tend to concentrate sharply around their true expectation value $\mathbb{E}[s_i] = \omega_\ell$. Hoeffding's inequality makes this intuition precise and asserts, for any $\varepsilon > 0$,

$$\Pr\left[\, |\hat{\omega}_\ell - \omega_\ell| \geq \varepsilon \,\right] \leq 2 \exp\!\left( -\tfrac{\varepsilon^2}{2}\, M_\ell \right).$$

The claim follows, because such an exponential bound is valid for each term in Eq. (B5). This also includes terms with zero hits ($M_\ell = 0$), because $\Pr[\,|\hat{\omega}_\ell - \omega_\ell| \geq \varepsilon\,] \leq 1 = \exp(-0/2)$.

2. Derivation of Eq. (6)
Note that each hitting count $h(o_\ell; P) = \sum_{m=1}^{M} \mathbb{1}\{o_\ell \triangleleft p_m\}$ is a sum of $M$ indicator functions that can each take binary values. This structure allows us to rewrite the confidence bound (3) as

$$\mathrm{Conf}_\varepsilon(O; P) = \sum_{\ell=1}^{L} \prod_{m=1}^{M} \left( 1 - \nu\, \mathbb{1}\{o_\ell \triangleleft p_m\} \right),$$

where $\nu = 1 - \exp(-\varepsilon^2/2) \in (0, 1)$. Next, note that each remaining indicator function can be further decomposed into a product of more elementary indicator functions:

$$\mathbb{1}\{o_\ell \triangleleft p_m\} = \prod_{k=1}^{n} \left( \mathbb{1}\{[o_\ell]_k = I\} + \mathbb{1}\{[o_\ell]_k = [p_m]_k\} \right).$$

Now, taking the expectation over every Pauli label that has not yet been assigned, each such label matches a fixed non-identity $[o_\ell]_k$ with probability $1/3$, and independence across labels lets the expectation factorize. Collecting the factors for fully assigned, partially assigned and fully random measurements reproduces Eq. (6).

Appendix C: Details regarding numerical experiments
We consider a molecular electronic Hamiltonian that has been encoded into an $n$-qubit system. The Hamiltonian can be written as a sum of Pauli observables:

$$H = \sum_{P} \alpha_P\, P, \quad \alpha_P \in \mathbb{R}, \qquad (C1)$$

where each $P$ is an $n$-qubit Pauli observable. Each molecule is represented by a fermionic Hamiltonian in a minimal STO-3G basis, ranging from 4 to 16 spin-orbitals. The 8-qubit H$_2$ example is represented using a 6-31G basis. The fermionic Hamiltonian is mapped to a qubit Hamiltonian using three different common encodings: Jordan-Wigner (JW) [26], Bravyi-Kitaev (BK) [5] and Parity (P) [5, 42]. The Pauli decomposition considered here has already been featured in many existing works; see [4, 17, 27] for more details. In our numerical experiments, the measurement procedure is applied to the exact ground state $|g\rangle$ of the encoded $n$-qubit Hamiltonian $H$. The ground state is obtained by exact diagonalization using the Lanczos method; see, e.g., [31] for a recent survey. We use the root-mean-squared error (RMSE) to quantify the measurement error. For $M$ independent repetitions of the measurement procedure giving rise to $M$ estimates $\hat{E}_1, \ldots, \hat{E}_M$, the RMSE is given by

$$\mathrm{RMSE} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( \hat{E}_i - E_{\mathrm{GS}} \right)^2 },$$

where $E_{\mathrm{GS}} = \langle g | H | g \rangle$ is the exact ground-state electronic energy. We consider the ground-state electronic energy of the molecule without the static Coulomb repulsion energy between the nuclei. Hence, the total ground-state energy of the molecule is the sum of the ground-state electronic energy and the static Coulomb repulsion energy (Born-Oppenheimer approximation). We do not focus on the static Coulomb repulsion energy because it is not encoded in the molecular electronic Hamiltonian $H$ and is considered to be a fixed value.
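For concreteness, here is how the energy estimate and the RMSE are formed (a small sketch of ours; the dictionary `alphas` mapping Pauli strings to the coefficients $\alpha_P$ of Eq. (C1) is an assumed data structure):

```python
import math

def energy_estimate(alphas, pauli_estimates):
    """E-hat = sum_P alpha_P * omega-hat_P for H = sum_P alpha_P P (Eq. (C1))."""
    return sum(alphas[o] * pauli_estimates[o] for o in alphas)

def rmse(estimates, e_gs):
    """Root-mean-squared error of repeated energy estimates against E_GS."""
    return math.sqrt(sum((e - e_gs)**2 for e in estimates) / len(estimates))
```

We now elaborate on the alternative measurement procedures with which we compare our derandomized procedure.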

1. LDF grouping:
The largest-degree-first (LDF) grouping strategy, along with other heuristics, has been considered and investigated in [45]. The conclusion is that the LDF grouping strategy results in good performance (differing from the best heuristics by at most 10%) and is generally recommended. The measurement error (RMSE) of the LDF grouping strategy can be computed exactly given an exact representation of the ground state $|g\rangle$; see [17] for details.
2. Classical shadow: The measurement procedure measures each qubit in a random Pauli basis ($X$, $Y$, or $Z$). This procedure is known to allow estimation of any $L$ few-body observables from only order $\log(L)$ measurements [10, 15, 21]. However, the performance degrades significantly for many-body observables. Hence, this approach will likely perform less well for molecular Hamiltonians due to the presence of many high-weight Pauli observables.
3. Locally-biased classical shadow: This is an improvement over classical shadows, proposed in [17], designed to overcome disadvantages in estimating the expectation values of many-body observables. The idea is to bias the distribution over the different Pauli bases ($X$, $Y$, or $Z$) for each qubit so as to minimize the variance when measuring the quantum Hamiltonian given in Eq. (C1). Ref. [17] demonstrated that this approach yields similar or better performance compared to LDF grouping and outperforms classical shadows.
In what follows, we provide a detailed description of the cost function used to derandomize the single-qubit Pauli measurements in our numerical experiments. In Algorithm 1, we used the cost function

$$f(W) = \mathbb{E}_P\!\left[ \mathrm{Conf}_\varepsilon(O; P) \,\middle|\, P^\sharp,\, P[k, m] = W \right] = \sum_{\ell=1}^{L} \exp\!\left( -V(o_\ell, P^\sharp) \right).$$

The conditional expectation is given by Eq. (6) and is restated here for convenience in terms of the exponent

$$V(o_\ell, P^\sharp) = \frac{\eta}{2}\, h\!\left(o_\ell; [p^\sharp_1, \ldots, p^\sharp_{m-1}]\right) - \log\!\left( 1 - \nu\, \mathbb{1}\!\left\{ o_\ell \stackrel{k}{\triangleleft} p^\sharp_m \right\} 3^{-w_{>k}(o_\ell)} \right), \qquad (C8)$$

where $\eta, \nu > 0$ are hyperparameters that need to be chosen properly; $\eta$ plays the role of $\varepsilon^2$, and the factor accounting for future fully random measurements is dropped in the budget-free variation considered here. In the numerical experiments, we consider $\eta = 0.9$ and $\nu = 1 - \exp(-\eta/2)$. The larger $V(o_\ell, P^\sharp)$ is, the lower the single-observable cost function $\exp(-V(o_\ell, P^\sharp))$ will be. The following discussion provides an intuitive understanding of the roles of the two terms in $V(o_\ell, P^\sharp)$. When the entire set of $M$ measurements has been decided, $V(o_\ell, P^\sharp)$ consists only of the first term and is proportional to the number of times the observable $o_\ell$ has been measured.
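A direct transcription of this cost function reads as follows (our own sketch, based on the reconstruction of Eq. (C8) above):

```python
import math

def V(o, completed, partial, eta=0.9):
    """Exponent V(o, P#) of Eq. (C8): a term proportional to the hits so far,
    plus a term for the partially assigned current measurement (the
    budget-free variant has no term for future measurements)."""
    nu = 1 - math.exp(-eta / 2)
    h = sum(all(p[i] == o[i] for i in range(len(o)) if o[i] != 'I')
            for p in completed)
    consistent = all(partial[i] == o[i]
                     for i in range(len(partial)) if o[i] != 'I')
    w_rest = sum(c != 'I' for c in o[len(partial):])
    v = (eta / 2) * h
    if consistent:
        v -= math.log(1 - nu * 3.0**(-w_rest))
    return v

def cost(observables, completed, partial, eta=0.9):
    """Cost f(W), evaluated for a partial assignment ending in candidate W."""
    return sum(math.exp(-V(o, completed, partial, eta)) for o in observables)
```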
For quantum chemistry applications, the coefficients of different Pauli observables differ: in Eq. (C1), the Hamiltonian $H$ consists of Pauli observables $P$ with varying coefficients $\alpha_P$. In such a case, one would want to measure each Pauli observable $o_\ell$ a number of times proportional to $|\alpha_{o_\ell}|$ [32]. In order to include the proportionality to $|\alpha_{o_\ell}|$, we consider the following modified cost function that depends on the coefficients $\alpha$:

$$f_\alpha(W) = \sum_{\ell=1}^{L} \exp\!\left( -\frac{V(o_\ell, P^\sharp)}{w_{o_\ell}} \right), \quad \text{with weights } w_{o_\ell} \propto |\alpha_{o_\ell}|.$$

The definition of $V(o_\ell, P^\sharp)$ is given in Eq. (C8). Recall that $V(o_\ell, P^\sharp)$ is proportional to the number of times the observable $o_\ell$ has been measured; hence, the weight factor $w_{o_\ell}$ promotes proportionality of $V(o_\ell, P^\sharp)$ to $w_{o_\ell} \propto |\alpha_{o_\ell}|$. While the cost function is derived by derandomizing the powerful randomized procedure of [21], it is not clear whether it is the optimal cost function. We believe other cost functions that are tailored to the particular application could yield even better performance; we leave such an exploration as a goal for future work.
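A matching sketch of the weighted cost (again our illustration, reusing `V` and `math` from the previous sketch; weights are normalized so that the largest equals one, and all coefficients are assumed nonzero):

```python
def weighted_cost(observables, alphas, completed, partial, eta=0.9):
    """Coefficient-weighted cost: dividing V by w_o ~ |alpha_o| steers the
    greedy step toward measuring large-coefficient observables more often."""
    a_max = max(abs(alphas[o]) for o in observables)
    return sum(math.exp(-V(o, completed, partial, eta) / (abs(alphas[o]) / a_max))
               for o in observables)
```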