Provably efficient machine learning for quantum many-body problems

Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems in physics and chemistry. However, the advantages of ML over more traditional methods have not been firmly established. In this work, we prove that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions, after learning from data obtained by measuring other Hamiltonians in the same quantum phase of matter. In contrast, under widely accepted complexity theory assumptions, classical algorithms that do not learn from data cannot achieve the same guarantee. We also prove that classical ML algorithms can efficiently classify a wide range of quantum phases of matter. Our arguments are based on the concept of a classical shadow, a succinct classical description of a many-body quantum state that can be constructed in feasible quantum experiments and be used to predict many properties of the state. Extensive numerical experiments corroborate our theoretical results in a variety of scenarios, including Rydberg atom systems, 2D random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.

INTRODUCTION: Solving quantum many-body problems, such as finding ground states of quantum systems, has far-reaching consequences for physics, materials science, and chemistry. Classical computers have facilitated many profound advances in science and technology, but they often struggle to solve such problems. Scalable, fault-tolerant quantum computers will be able to solve a broad array of quantum problems but are unlikely to be available for years to come. Meanwhile, how can we best exploit our powerful classical computers to advance our understanding of complex quantum systems? Recently, classical machine learning (ML) techniques have been adapted to investigate problems in quantum many-body physics. So far, these approaches are mostly heuristic, reflecting the general paucity of rigorous theory in ML. Although they have been shown to be effective in some intermediate-size experiments, these methods are generally not backed by convincing theoretical arguments to ensure good performance.
RATIONALE: A central question is whether classical ML algorithms can provably outperform non-ML algorithms in challenging quantum many-body problems. We provide a concrete answer by devising and analyzing classical ML algorithms for predicting the properties of ground states of quantum systems. We prove that these ML algorithms can efficiently and accurately predict ground-state properties of gapped local Hamiltonians, after learning from data obtained by measuring other ground states in the same quantum phase of matter. Furthermore, under a widely accepted complexity-theoretic conjecture, we prove that no efficient classical algorithm that does not learn from data can achieve the same prediction guarantee. By generalizing from experimental data, ML algorithms can solve quantum many-body problems that could not be solved efficiently without access to experimental data. RESULTS: We consider a family of gapped local quantum Hamiltonians, where the Hamiltonian H(x) depends smoothly on m parameters (denoted by x). The ML algorithm learns from a set of training data consisting of sampled values of x, each accompanied by a classical representation of the ground state of H(x). These training data could be obtained from either classical simulations or quantum experiments. During the prediction phase, the ML algorithm predicts a classical representation of ground states for Hamiltonians different from those in the training data; ground-state properties can then be estimated using the predicted classical representation. Specifically, our classical ML algorithm predicts expectation values of products of local observables in the ground state, with a small error when averaged over the value of x. The run time of the algorithm and the amount of training data required both scale polynomially in m and linearly in the size of the quantum system. 
Our proof of this result builds on recent developments in quantum information theory, computational learning theory, and condensed matter theory. Furthermore, under the widely accepted conjecture that nondeterministic polynomial-time (NP)-complete problems cannot be solved in randomized polynomial time, we prove that no polynomial-time classical algorithm that does not learn from data can match the prediction performance achieved by the ML algorithm.
In a related contribution using similar proof techniques, we show that classical ML algorithms can efficiently learn how to classify quantum phases of matter. In this scenario, the training data consist of classical representations of quantum states, where each state carries a label indicating whether it belongs to phase A or phase B. The ML algorithm then predicts the phase label for quantum states that were not encountered during training. The classical ML algorithm not only classifies phases accurately, but also constructs an explicit classifying function. Numerical experiments verify that our proposed ML algorithms work well in a variety of scenarios, including Rydberg atom systems, two-dimensional random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases. CONCLUSION: We have rigorously established that classical ML algorithms, informed by data collected in physical experiments, can effectively address some quantum many-body problems. These rigorous results boost our hopes that classical ML trained on experimental data can solve practical problems in chemistry and materials science that would be too hard to solve using classical processing alone. Our arguments build on the concept of a succinct classical representation of quantum states derived from randomized Pauli measurements. Although some quantum devices lack the local control needed to perform such measurements, we expect that other classical representations could be exploited by classical ML with similarly powerful results. How can we make use of accessible measurement data to predict properties reliably? Answering such questions will expand the reach of near-term quantum platforms. ▪

QUANTUM PHYSICS
Provably efficient machine learning for quantum many-body problems Hsin-Yuan Huang¹*, Richard Kueng², Giacomo Torlai³, Victor V. Albert⁴, John Preskill¹,³ Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems in physics and chemistry. However, the advantages of ML over traditional methods have not been firmly established. In this work, we prove that classical ML algorithms can efficiently predict ground-state properties of gapped Hamiltonians after learning from other Hamiltonians in the same quantum phase of matter. By contrast, under a widely accepted conjecture, classical algorithms that do not learn from data cannot achieve the same guarantee. We also prove that classical ML algorithms can efficiently classify a wide range of quantum phases. Extensive numerical experiments corroborate our theoretical results in a variety of scenarios, including Rydberg atom systems, two-dimensional random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.
Solving quantum many-body problems, such as finding ground states of quantum systems, has far-reaching consequences for physics, materials science, and chemistry. Although classical computers have facilitated many profound advances in science and technology, they often struggle to solve such problems. Powerful methods, such as density functional theory (1, 2), quantum Monte Carlo (3–5), and the density-matrix renormalization group (6, 7), have enabled solutions to certain restricted instances of many-body problems, but many general classes of problems remain outside the reach of even the most advanced classical algorithms.
Scalable, fault-tolerant quantum computers will be able to solve a broad array of quantum problems but are unlikely to be available for years to come. Meanwhile, how can we best exploit our powerful classical computers to advance our understanding of complex quantum systems? Recently, classical machine learning (ML) techniques have been adapted to investigate problems in quantum many-body physics (8, 9) with promising results (10–27). So far, these approaches are mostly heuristic, reflecting the general paucity of rigorous theory in ML. Although they have been shown to be effective in some intermediate-size experiments (28–30), these methods are generally not backed by convincing theoretical arguments to ensure good performance, particularly for problem instances where traditional classical algorithms falter.
In general, simulating quantum many-body physics is hard for classical computers because accurately describing an n-qubit quantum system may require an amount of classical data that is exponential in n. In prior work, this bottleneck has been addressed using classical shadows: succinct classical descriptions of quantum many-body states that can be used to accurately predict a wide range of properties with rigorous performance guarantees (31, 32). Furthermore, this quantum-to-classical conversion technique can be readily implemented in various existing quantum experiments (33–35). Classical shadows create opportunities for addressing quantum problems using classical methods, such as ML. In this paper, we build on the classical shadow formalism and devise polynomial-time classical ML algorithms for quantum many-body problems that are supported by rigorous theory.
We consider two applications of classical ML, indicated in Fig. 1. The first application we examine is learning to predict classical representations of quantum many-body ground states. We consider a family of Hamiltonians, where the Hamiltonian H(x) depends smoothly on m real parameters (denoted by x). The ML algorithm is trained on a set of training data consisting of sampled values of x, each accompanied by the corresponding classical shadow for the ground state ρ(x) of H(x). These training data could be obtained from either classical simulations or quantum experiments. During the prediction phase, the ML algorithm predicts a classical representation of ρ(x) for values of x different from those in the training data. Ground-state properties can then be estimated using the predicted classical representation.
This learning algorithm is efficient, provided that the ground-state properties to be predicted do not vary too rapidly as a function of x. Sufficient upper bounds on the gradient can be derived for any family of gapped, geometrically local Hamiltonians in any finite spatial dimension, as long as the property of interest is the expectation value of a sum of few-body observables. The conclusion is that any such property can be predicted with a small average error, where the amount of training data and the classical computation time are polynomial in m and linear in the system size. Furthermore, we show that classical algorithms that do not learn from data cannot make accurate predictions in polynomial time without violating widely accepted complexity-theoretic conjectures. Together, these results rigorously establish the advantage of ML algorithms with data over those without data (36) in a physically relevant task.
The classical ML algorithm could generalize from training data that are obtained either through quantum experiments or classical simulations; the same rigorous performance guarantees apply in either case. If the training data are obtained from quantum experiments, the rigorous result shows that classical ML can explore and predict properties of new physical systems that are challenging to prepare and measure in the laboratory. Even if the experimentalists only have limited measurement capability, such as being able to measure a specific property of ρ(x), the theorem established in this work immediately implies that a classical ML model can predict that specific property accurately. If the training data are generated classically, it could be more efficient and more accurate to use the ML model to predict properties for new values of the input x rather than doing new simulations, which could be computationally very demanding. Promising insights into quantum many-body physics are already being obtained using classical ML based on classical simulation data (10, 12, 14, 17, 19, 20, 23–25, 37, 38). Our rigorous analysis identifies general conditions that guarantee the success of classical ML models and elucidates the advantages of classical ML models over non-ML algorithms, which do not learn from data. These results enhance the prospects for interpretable ML techniques (38–40) to further shed light on quantum many-body physics.
In the second application we examine, the goal is to classify quantum states of matter into phases (41) in a supervised learning scenario. Suppose that during training we are provided with sample quantum states that carry labels indicating whether each state belongs to phase A or phase B. Our goal is to classify the phase for new quantum states that were not encountered during training. We assume that, during both the training and classification stages, each quantum state is represented by its classical shadow, which could be obtained either from a classical computation or from an experiment on a quantum device. The classical ML model, then, trains on labeled classical shadows and learns to predict labels for new classical shadows.
We assume that the A and B phases can be distinguished by a nonlinear function of marginal density operators of subsystems of constant size. This assumption is reasonable because we expect the phase to be revealed in subsystems that are larger than the correlation length but do not depend on the total system size. We show that if such a function exists, a classical ML model can learn to distinguish the phases using an amount of training data and classical processing that are polynomial in the system size. We do not need to know anything about this nonlinear function in advance, apart from its existence.
Here, we review the classical shadow formalism (31) and use this formalism to derive rigorous guarantees for ML algorithms in predicting ground-state properties and classifying quantum phases of matter. We also describe numerical experiments in a wide range of physical systems to support our theoretical results.

Constructing efficient classical representations of quantum systems
We begin with an overview of the randomized measurement toolbox (31, 32, 42–45), relegating further details to section S1 (46). We approximate an n-qubit quantum state ρ by performing randomized single-qubit Pauli measurements on T copies of ρ. That is, we measure every qubit of the unknown quantum state ρ in a random Pauli basis X, Y, or Z to yield a measurement outcome of ±1. Collapse of the wave function implies that this measurement procedure transforms ρ into a random pure product state |s^(t)⟩ = |s_1^(t)⟩ ⊗ … ⊗ |s_n^(t)⟩, where each |s_i^(t)⟩ is an eigenstate of the selected Pauli matrix. Performing one randomized measurement grants us classical access to one such snapshot. Performing a total of T randomized measurements grants us access to an entire collection S_T(ρ) = {|s_i^(t)⟩ : i ∈ {1, …, n}, t ∈ {1, …, T}}. Each element is a highly structured single-qubit pure state, and there are nT of them in total. So, 3nT bits suffice to store the entire collection in classical memory. The randomized measurements can be performed in actual physical experiments or through classical simulations. The resulting data can then be used to approximate the underlying n-qubit state ρ by the classical shadow

σ_T(ρ) = (1/T) Σ_{t=1}^{T} [3|s_1^(t)⟩⟨s_1^(t)| − I] ⊗ … ⊗ [3|s_n^(t)⟩⟨s_n^(t)| − I]

where I denotes the 2 × 2 identity matrix. This classical shadow representation (31, 32) exactly reproduces the global density matrix in the limit T → ∞, but T = O(const^r log(n)/ε²) already provides an ε-accurate approximation of all reduced r-body density matrices (in trace distance). This, in turn, implies that we can use σ_T(ρ) to predict any function that depends only on reduced-density matrices, such as expectation values of (sums of) local observables and (sums of) entanglement entropies of small subsystems. Classical storage and postprocessing costs also remain tractable in this regime. To summarize, the classical shadow formalism equips us with an efficient quantum-to-classical converter that allows classical machines to efficiently and reliably estimate subsystem properties of any quantum state ρ.
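As a concrete illustration of this conversion, the following sketch (our own; all function names are hypothetical, but the snapshot convention 3|s⟩⟨s| − I matches the classical shadow formalism of (31)) builds single-qubit snapshots from recorded (basis, outcome) pairs and averages them to estimate a local observable:

```python
import numpy as np

# Pauli eigenstates, indexed by (basis, outcome).
# Assumed convention: basis 0 = X, 1 = Y, 2 = Z; outcome 0 -> +1, 1 -> -1.
EIGENSTATES = {
    (0, 0): np.array([1, 1]) / np.sqrt(2),    # |+>
    (0, 1): np.array([1, -1]) / np.sqrt(2),   # |->
    (1, 0): np.array([1, 1j]) / np.sqrt(2),   # |+i>
    (1, 1): np.array([1, -1j]) / np.sqrt(2),  # |-i>
    (2, 0): np.array([1, 0], dtype=complex),  # |0>
    (2, 1): np.array([0, 1], dtype=complex),  # |1>
}
I2 = np.eye(2)

def single_qubit_snapshot(basis, outcome):
    """Inverted-channel snapshot 3|s><s| - I for one qubit."""
    s = EIGENSTATES[(basis, outcome)]
    return 3 * np.outer(s, s.conj()) - I2

def estimate_local_observable(measurements, obs, sites):
    """Average tr(O x tensor product of snapshots) over T shots.

    measurements: list of T shots, each a list of (basis, outcome)
    pairs, one per qubit; obs: 2^r x 2^r matrix acting on `sites`
    (in the given order).
    """
    total = 0.0
    for shot in measurements:
        block = np.array([[1.0]])
        for q in sites:
            block = np.kron(block, single_qubit_snapshot(*shot[q]))
        total += np.real(np.trace(obs @ block))
    return total / len(measurements)
```

Because each snapshot is an unbiased estimator of ρ under the random choice of measurement basis, averaging over shots converges to the true expectation value.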

Predicting ground states of quantum many-body systems
We consider the task of predicting ground-state representations of quantum many-body Hamiltonians in finite spatial dimensions. Suppose that a family of geometrically local Hamiltonians {H(x) : x ∈ [−1, 1]^m} is parameterized by a classical variable x. That is, H(x) smoothly maps a bounded m-dimensional vector x (the parametrization) to a Hermitian matrix of size 2^n × 2^n (an n-qubit Hamiltonian). We do not impose any additional structure on this mapping; in particular, we do not assume knowledge about how the physical Hamiltonian depends on the parametrization. The goal is to learn a model σ̂(x) that can predict properties of the ground state ρ(x) associated with the Hamiltonian H(x). This problem arises in many practical scenarios. Suppose diligent experimental effort has produced experimental data for ground-state properties of various physical systems. We would like to use these data to train an ML model that predicts ground-state representations of hitherto unexplored physical systems.

An ML algorithm with rigorous guarantee
We will prove that a classical ML algorithm can predict classical representations of ground states after training on data belonging to the same quantum phase of matter. Formally, we consider a smooth family of Hamiltonians H(x) with a constant spectral gap. During the training phase of the ML algorithm, many values of x are randomly sampled, and for each sampled x, the classical shadow of the corresponding ground state ρ(x) of H(x) is provided, either by classical simulations or quantum experiments. The full training data of size N are given by {x_ℓ → σ_T(ρ(x_ℓ))}_{ℓ=1}^{N}, where T is the number of randomized measurements used to construct the classical shadow at each value of x_ℓ.
We train classical ML models using the size-N training data, such that when given the input x_ℓ, the ML model can produce a classical representation σ̂(x_ℓ) that approximates σ_T(ρ(x_ℓ)). During prediction, the classical ML model produces σ̂(x) for values of x different from those in the training data. Although σ̂(x) and σ_T(ρ(x_ℓ)) classically represent exponentially large density matrices, the training and prediction can be done efficiently on a classical computer using various existing classical ML models, such as neural networks with large hidden layers (47–50) and kernel methods (51, 52). In particular, the predicted output of the trained classical ML models can be written as an extrapolation of the training data using a learned metric κ(x, x_ℓ):

σ̂(x) = (1/N) Σ_{ℓ=1}^{N} κ(x, x_ℓ) σ_T(ρ(x_ℓ))   (Eq. 2)

For example, prediction using a trained neural network with large hidden layers (46) is equivalent to using the metric κ(x, x_ℓ) given by the neural tangent kernel of the network. To derive a provable guarantee, we consider the simple metric κ(x, x_ℓ) = Σ_{k ∈ ℤ^m, ‖k‖₂ ≤ Λ} cos(π k · (x − x_ℓ)) with cutoff Λ, which we refer to as the ℓ2-Dirichlet kernel. We prove that the prediction will be accurate and efficient if the function f_O(x) = tr(O ρ(x)) does not vary too rapidly when x changes in any direction. Sufficient upper bounds on the gradient magnitude of f_O(x) can be derived using quasi-adiabatic continuation (53, 54).
Under the ℓ2-Dirichlet kernel, the classical ML model is equivalent to learning a truncated Fourier series to approximate the function f_O(x). The parameter Λ is a cutoff for the wave number k that depends on (upper bounds on) the gradient of f_O(x). Using statistical analysis, one can guarantee that E_x |tr(O σ̂(x)) − f_O(x)|² ≤ ε as long as the amount of training data obeys N = m^{O(1/ε)} in the m → ∞ limit. The conclusion is that any such f_O(x) can be predicted with a small constant average error, where the amount of training data and the classical computation time are polynomial in m and at most linear in the system size n. Moreover, the training data need only contain a single classical shadow snapshot at each point x_ℓ in the parameter space (i.e., T = 1). An informal statement of the theorem is given below; we explain the proof strategy in section S5 and provide more details in section S6 (46). We also discuss how one could generalize the proof to long-range interacting systems, electronic Hamiltonians, and other settings, including when one cannot perform classical shadow tomography (31), in section S6.2 (46).
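To make the truncated-Fourier-series picture concrete, here is a minimal sketch (our own illustration, not the paper's exact estimator) that fits a truncated Fourier series over integer wave vectors with ‖k‖₂ ≤ Λ by ridge-regularized least squares and uses it to predict at new parameter values:

```python
import itertools
import numpy as np

def fourier_features(xs, m, cutoff):
    """Features cos(pi k.x), sin(pi k.x) for integer k with ||k||_2 <= cutoff."""
    ks = np.array([k for k in
                   itertools.product(range(-cutoff, cutoff + 1), repeat=m)
                   if np.linalg.norm(k) <= cutoff])   # (K, m)
    phase = np.pi * xs @ ks.T                         # (N, K)
    return np.hstack([np.cos(phase), np.sin(phase)])

def fit_truncated_fourier(xs, ys, cutoff, reg=1e-6):
    """Ridge-regularized least squares over the truncated Fourier basis.

    xs: (N, m) sampled parameters in [-1, 1]^m; ys: (N,) noisy values
    of the target property at those parameters.
    """
    m = xs.shape[1]
    F = fourier_features(xs, m, cutoff)
    w = np.linalg.solve(F.T @ F + reg * np.eye(F.shape[1]), F.T @ ys)
    return lambda xq: fourier_features(np.atleast_2d(xq), m, cutoff) @ w
```

The cutoff plays the role of Λ: a smoother target function f_O(x) admits a smaller cutoff and hence fewer training samples, which is the mechanism behind the sample-complexity bound in the text.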

Theorem 1 (learning to predict groundstate representations; informal)
For any smooth family of Hamiltonians {H(x) : x ∈ [−1, 1]^m} in a finite spatial dimension with a constant spectral gap, the classical ML algorithm can learn to predict a classical representation of the ground state ρ(x) of H(x) that approximates few-body reduced-density matrices up to a constant error ε when averaged over x. The required training data size N and computation time are polynomial in m and linear in the system size n.
Though formally efficient in the sense that N scales polynomially with m for any fixed approximation error ε, the required amount of training data scales badly with ε. This unfortunate scaling is not a shortcoming of the considered ML algorithm but a necessary feature. In section S7 (46), we show that the data size and time complexity cannot be improved further without making stronger assumptions about the class of gapped local Hamiltonians.
However, in cases of practical interest, the Hamiltonian may obey restrictions, such as translational invariance or graph structure, that can be exploited to obtain better results. Incorporating these restrictions can be achieved by using a suitable metric κ(x, x_ℓ), such as one that corresponds to a large-width convolutional neural network (CNN) (48) or a graph neural network (49). Rigorously establishing that neural network-based ML algorithms can achieve improved prediction performance and efficiency for particular classes of Hamiltonians requires further investigation.

Computational hardness for non-ML algorithms
In the following proposition, we show that a classical polynomial-time algorithm that does not learn from data cannot achieve the same guarantee in estimating ground-state properties without violating the widely believed conjecture that nondeterministic polynomial-time (NP)-complete problems cannot be solved in randomized polynomial time. This proposition is a corollary of standard complexity-theoretic results (55,56). See section S8 (46) for the detailed statement and proof.

Proposition 1 (informal)
Consider a randomized polynomial-time classical algorithm A that does not learn from data. Suppose that for any smooth family of two-dimensional (2D) Hamiltonians {H(x) : x ∈ [−1, 1]^m} with a constant spectral gap, A can efficiently compute expectation values of one-body observables in the ground state ρ(x) of H(x) up to a constant error when averaged over x. Then there is a randomized classical algorithm that can solve NP-complete problems in polynomial time.
It is instructive to observe that a classical ML algorithm with access to data can perform tasks that cannot be achieved by classical algorithms that do not have access to data. This phenomenon is studied in (36), where it is shown that the complexity class defined by classical algorithms that can learn from data is strictly larger than the class of classical algorithms that do not learn from data. (The data can be regarded as a restricted form of randomized advice string.) We caution that obtaining the data to train the classical ML model could be challenging. However, if we focus only on data that could be efficiently generated by quantum-mechanical processes, it is still possible that a classical ML algorithm that learns from data could be more powerful than classical computers. In section S8 (46), we present a contrived family of Hamiltonians that establishes this claim based on the (classical) computational hardness of factoring.

Classifying quantum phases of matter
Classifying quantum phases of matter is another important application of ML to physics.
We will consider this classification problem in the case where quantum states are succinctly represented by their classical shadows. For simplicity, we consider the classification of two phases (denoted A and B), but the analysis naturally generalizes to classifying any number of phases.

ML algorithms
We envision training a classical ML algorithm with classical shadows, where each classical shadow carries a label y indicating whether it represents a quantum state ρ from phase A [y(ρ) = 1] or phase B [y(ρ) = −1]. We want to show that a suitably chosen classical ML algorithm can learn to efficiently classify the phase for new classical shadows beyond those encountered during training. Following a strategy standard in learning theory, we consider a classical ML algorithm that maps each classical shadow to a corresponding feature vector in a high-dimensional feature space and then attempts to find a hyperplane that separates feature vectors in the A phase from feature vectors in the B phase. The learning is efficient if the geometry of the feature space is efficiently computable and if the feature map is sufficiently expressive. Thus, our task is to construct a feature map with the desired properties.
In the simpler task of classifying symmetry-breaking phases, there is typically a local order parameter O = Σ_{i=1}^{n} O_i, given as a sum of few-body observables, such that tr(Oρ) > 0 for all ρ in phase A and tr(Oρ) < 0 for all ρ in phase B. Under this criterion, the classification function may be chosen to be y(ρ) = sign[tr(Oρ)]. Hence, classifying symmetry-breaking phases can be achieved by finding a hyperplane that separates the two phases in the high-dimensional feature space that subsumes all r-body reduced-density matrices of the quantum state ρ. The feature vector consisting of all r-body reduced-density matrices of ρ can be accurately reconstructed from the classical shadow representation S_T(ρ) when T is sufficiently large.
Finding a suitable choice of hyperplane in the feature space can be cast as a convex optimization problem known as the soft-margin support vector machine (SVM), discussed in more detail in section S10.1 (46). With a sufficient amount of training data, the hyperplane found by the classical ML model will generalize so that the phase y(ρ) can be predicted accurately for a previously unseen quantum state ρ. The classical ML model is not merely a black box; it also discovers the order parameter (encoded by the hyperplane), guiding physicists toward a deeper understanding of the phase structure.
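As a toy illustration of this step, the following sketch (our own; it uses simple Pegasos-style subgradient descent on the hinge loss rather than the convex solver referenced in section S10.1) trains a soft-margin linear classifier on labeled feature vectors, which in the application above would be reduced-density-matrix entries estimated from classical shadows:

```python
import numpy as np

def train_soft_margin_svm(feats, labels, lam=1e-3, steps=2000, lr=0.1):
    """Subgradient descent on the regularized hinge loss.

    feats: (N, d) feature vectors (e.g. flattened r-body reduced-density
    matrix entries); labels: (N,) phase labels in {+1, -1}.
    """
    n, d = feats.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, steps + 1):
        margins = labels * (feats @ w + b)
        viol = margins < 1                      # margin violators
        grad_w = lam * w - (labels[viol] @ feats[viol]) / n
        grad_b = -np.sum(labels[viol]) / n
        eta = lr / np.sqrt(t)                   # decaying step size
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

def classify(w, b, feats):
    """Predicted phase labels sign(<w, feature> + b)."""
    return np.sign(feats @ w + b)
```

The learned vector w plays the role of the hyperplane in the text: its large-magnitude components indicate which features, and hence which local observables, act as the discovered order parameter.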
For more exotic quantum phases of matter, such as topologically ordered phases, the above classical ML model no longer suffices. The topological phase of a state is invariant under a constant-depth quantum circuit, and a phase containing the product state |0⟩^⊗n is called the trivial phase. Using these notions, we can prove that no observable, not even one that acts on the entire system, can be used to distinguish between two topological phases. The proof, given in section S9 (46), uses the observation that random single-qubit unitaries can confuse any global or local order parameter.

Proposition 2
Consider two distinct topological phases A and B (one of the phases could be the trivial phase). No observable O exists such that tr(Oρ) > 0 for all ρ in phase A and tr(Oρ) ≤ 0 for all ρ in phase B.

Although this proposition implies that no linear function tr(Oρ) can be used to classify topologically ordered phases, it does not exclude nonlinear functions, such as quadratic functions tr(O ρ ⊗ ρ), degree-d polynomials tr(O ρ^⊗d), and more general analytic functions. For example, it is known that the topological entanglement entropy (57, 58), a nonlinear function of ρ, can be used to classify a wide variety of topologically ordered phases. For this purpose, it suffices to consider a subsystem whose size is large compared with the correlation length of the state but is independent of the total size of the system. The correlation length in the ground state of a local Hamiltonian increases when the spectral gap between the ground state and the first excited state becomes smaller (59). On the other hand, a linear function on the full system will fail even when the correlation length is constant.
To learn nonlinear functions, we need a more expressive ML model. For this purpose, we devise a powerful feature map that takes the classical shadow S_T(ρ) of the quantum state ρ to a feature vector that includes arbitrarily large r-body reduced-density matrices, as well as an arbitrarily high-degree polynomial expansion:

ϕ^(shadow)(S_T(ρ)) = ⊕_{d=0}^{D} √(τ^d/d!) [ (1/T) Σ_{t=1}^{T} ⊕_{r=0}^{R} √(γ^r/(r! n^r)) (⊕_{i=1}^{n} σ_i^(t))^⊗r ]^⊗d

where σ_i^(t) = 3|s_i^(t)⟩⟨s_i^(t)| − I and τ, γ > 0 are hyperparameters. The direct sum ⊕_{r=0}^{R} is a concatenation of all r-body reduced-density matrices, and the other direct sum ⊕_{d=0}^{D} subsumes all degree-d polynomial expansions. The computational cost of finding a hyperplane in feature space that separates the training data into two classes is dominated by the cost of computing inner products between feature vectors. The inner product ⟨ϕ^(shadow)(S_T(ρ)), ϕ^(shadow)(S_T(ρ̃))⟩ can be analytically computed by reorganizing the direct sums, writing it as a double series, and wrapping both series into an exponential, which gives (in the limit D, R → ∞)

k^(shadow)(S_T(ρ), S_T(ρ̃)) = exp( (τ/T²) Σ_{t,t'=1}^{T} exp( (γ/n) Σ_{i=1}^{n} tr(σ_i^(t) σ̃_i^(t')) ) )

where S_T(ρ) and S_T(ρ̃) are classical shadow representations of ρ and ρ̃, respectively. The computation time for the inner product is O(nT²), linear in the system size n and quadratic in T, the number of copies of each quantum state that are measured to construct the classical shadow.
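Assuming the single-qubit snapshots 3|s⟩⟨s| − I are stored as 2 × 2 matrices, a double-exponential kernel of this kind can be evaluated directly in O(nT²) time. A sketch (our own naming and data layout):

```python
import numpy as np

def shadow_kernel(snaps_a, snaps_b, tau=1.0, gamma=1.0):
    """Double-exponential inner product of shadow feature vectors.

    snaps_a, snaps_b: arrays of shape (T, n, 2, 2) holding the
    single-qubit snapshots 3|s><s| - I for each shot t and qubit i.
    Cost is O(n T^2): one pass over qubit overlaps per shot pair.
    """
    T, n = snaps_a.shape[:2]
    total = 0.0
    for t in range(T):
        for tp in range(T):
            # sum_i tr(sigma_i^(t) sigma~_i^(t')) for this shot pair
            overlap = np.einsum('iab,iba->', snaps_a[t], snaps_b[tp]).real
            total += np.exp(gamma * overlap / n)
    return np.exp(tau * total / T**2)
```

Each entry of the kernel matrix over training states costs O(nT²), and the resulting Gram matrix can be fed to any kernelized SVM solver to find the classifying hyperplane implicitly.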

Rigorous guarantee
By statistical analysis, we can establish a rigorous guarantee for the classical ML model based on the classifying function ⟨α, ϕ^(shadow)(S_T(ρ))⟩, where α is the trainable vector defining the classifying hyperplane. The result is the following theorem, proven in section S10 (46).

Theorem 2 (classifying quantum phases of matter; informal)
If there is a nonlinear function of few-body reduced-density matrices that classifies phases, then the classical algorithm can learn to classify these phases accurately. The required amount of training data and computation time scale polynomially in system size.
If there is an efficient procedure based on few-body reduced-density matrices for classifying phases, the proposed ML algorithm is guaranteed to find the procedure efficiently. This includes local order parameters for classifying symmetry-breaking phases and topological entanglement entropy in a sufficiently large local region for partially classifying topological phases (57, 58). We expect that, to classify topological phases accurately, the classical ML model will need access to local regions that are sufficiently large compared with the correlation length, and as we approach the phase boundary, the correlation length increases. As a result, the classifying function for topological phases may depend on r-body subsystems with a larger r, and the amount of training data and computation time required would increase accordingly. The classical ML model not only classifies phases accurately but also constructs a classifying function explicitly. Our classical ML model may also be useful for classifying and understanding symmetry-protected topological (SPT) phases. SPT phases are characterized much like topological phases but with the additional constraint that all structures involved (states, Hamiltonians, and quantum circuits) respect a particular symmetry. It is reasonable to expect that an SPT phase can be identified by examining reduced-density matrices on constant-size regions (60–65), where the size of the region is large compared with the correlation length. The existence of classifying functions based on reduced-density matrices has been rigorously established in some cases (66–73). In section S12 (46), we prove that the ML algorithm is guaranteed to efficiently classify a class of gapped spin-1 chains in one dimension.
For more general SPT phases, the ML algorithm should be able to corroborate known classification schemes, determine new and potentially more-compact classifiers, and shed light on interacting SPT phases in two or more dimensions for which complete classification schemes have not yet been firmly established.
The hypothesis of theorem 2, stating that phases can be recognized by inspecting regions of constant size independent of the total system size, is particularly plausible for gapped phases, but it might apply to some gapless phases as well. Our classical ML model would be able to efficiently classify such gapless phases. On the other hand, the contrapositive of theorem 2 asserts that if the classical ML model is not able to distinguish between two distinct gapless phases, then nonlocal data are required to characterize at least one of those phases.
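As a toy illustration of a classifying function built from few-body reduced-density matrices (a local order parameter, as discussed above), the sketch below rescales an average single-site magnetization computed from one-body reduced-density matrices. This is not the paper's construction; the function name and the scale factor are illustrative assumptions chosen so that the two symmetry-broken branches land above +1 and below −1.

```python
import numpy as np

Z = np.diag([1.0, -1.0])  # Pauli-Z

def order_parameter(rdms, scale=3.0):
    """Toy classifying function f(rho): a rescaled average magnetization
    tr(Z rho_j) over one-body reduced-density matrices. The scale factor
    is a hypothetical choice pushing the two branches past +/-1."""
    return scale * float(np.mean([np.trace(Z @ r).real for r in rdms]))

# One-body RDMs of a fully polarized "up" chain and "down" chain.
up = [np.diag([1.0, 0.0])] * 10
down = [np.diag([0.0, 1.0])] * 10
```

Here `order_parameter(up)` returns 3.0 and `order_parameter(down)` returns −3.0, so thresholding at ±1 separates the two branches.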

Numerical experiments
We have conducted numerical experiments assessing the performance of classical ML algorithms in some practical settings. The results demonstrate that our theoretical claims carry over to practice, sometimes with performance even better than our guarantees suggest.

Predicting ground-state properties
For predicting ground states, we consider classical ML models encompassed by Eq. 2. We examine various metrics κ(x, xℓ) equivalent to training neural networks with large hidden layers (47, 50) or training kernel methods (51, 74). We find the best ML model and the hyperparameters using a validation set to minimize root mean square error (RMSE) and report the predictions on a test set. The full details of the models and hyperparameters, as well as their comparisons, are given in sections S4.2 and S4.3 (46).

Rydberg atom chain
Our first example is a system of trapped Rydberg atoms (75,76), a programmable and highly controlled platform for Ising-type quantum simulations (77-82). Following (77), we consider a 1D array of n = 51 atoms, with each atom effectively described as a two-level system composed of a ground state |g⟩ and a highly excited Rydberg state |r⟩. The atomic chain is characterized by a Hamiltonian H(x) (given in Fig. 2A) whose parameters are the laser detuning x1 = Δ/Ω and the interaction range x2 = Rb/a. The phase diagram (Fig. 2B) features a disordered phase and several broken-symmetry phases, stemming from the competition between the detuning and the Rydberg blockade (arising from the repulsive van der Waals interactions).
We trained a classical ML model using 20 randomly chosen values of the parameter x = (x1, x2); these values are indicated by gray circles in Fig. 2B. For each such x, an approximation to the exact ground state was found using the density matrix renormalization group (DMRG) (6) based on the formalism of matrix product states (MPSs) (83). For each MPS, we performed T = 500 randomized Pauli measurements to construct a classical shadow. The classical ML model then predicted classical representations at the testing points in the parameter space, and these predicted classical representations were used to estimate expectation values of local observables at the testing points.
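The classical-shadow step above can be sketched in a few lines. The following is a minimal simulation (not the experimental pipeline) of randomized single-qubit Pauli measurements on a small state vector, together with the standard shadow estimator for a Pauli-string expectation: each snapshot contributes a factor 3·(±1) on sites where the measured basis matches the target Pauli, and the snapshot contributes 0 if any non-identity site was measured in a mismatched basis. All function names are illustrative.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
# Rotations mapping X- and Y-basis measurements onto a Z-basis readout.
ROT = {"Z": np.eye(2), "X": H, "Y": H @ np.diag([1.0, -1.0j])}

def classical_shadow(psi, n, T, rng):
    """T snapshots: a random Pauli basis per qubit plus the outcome bits."""
    snaps = []
    for _ in range(T):
        bases = rng.choice(list("XYZ"), size=n)
        U = np.eye(1)
        for b in bases:
            U = np.kron(U, ROT[b])
        probs = np.abs(U @ psi) ** 2
        out = rng.choice(2 ** n, p=probs / probs.sum())
        bits = [(out >> (n - 1 - j)) & 1 for j in range(n)]
        snaps.append((bases, bits))
    return snaps

def estimate_pauli(snaps, pauli):
    """Shadow estimate of <P> for a Pauli string like 'ZX' ('I' = identity)."""
    vals = []
    for bases, bits in snaps:
        v = 1.0
        for p, b, o in zip(pauli, bases, bits):
            if p != "I":
                v *= 3.0 * (1 - 2 * o) if b == p else 0.0
        vals.append(v)
    return float(np.mean(vals))

# Example: the state |0>|+> has <Z_1> = <X_2> = 1.
rng = np.random.default_rng(0)
psi = np.kron([1.0, 0.0], [1.0, 1.0]) / np.sqrt(2)
snaps = classical_shadow(psi, n=2, T=3000, rng=rng)
z1 = estimate_pauli(snaps, "ZI")
x2 = estimate_pauli(snaps, "IX")
```

With 3000 snapshots, both estimates land close to the exact value 1, up to the statistical fluctuations expected from the shadow variance.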
Predictions for expectation values of Pauli operators Zi and Xi at the testing points are shown in Fig. 2C and were found to agree well with exact values obtained from the DMRG computation of the ground state at the testing points. Additional predictions can be found in section S4.1 (46). Also shown are results from a more-naïve procedure, in which properties are predicted using only the data at the point in the training set that is closest to the testing point. The naïve procedure predicts poorly, illustrating that the considered classical ML model effectively leverages the data from multiple points in the training set.
This example corroborates our expectation that classical machines can learn to efficiently predict ground-state representations. An important caveat is that the rigorous guarantee in theorem 1 applies only when the training points and the testing points are sampled from the same phase, whereas in this example, the training data include values of x from three different phases. Nevertheless, our numerics show that classical machines can still learn to predict well.

2D antiferromagnetic Heisenberg model
Our next example is the 2D antiferromagnetic Heisenberg model. Spin-½ particles (i.e., qubits) occupy sites on a square lattice, and for each pair (ij) of neighboring sites, the Hamiltonian contains a term Jij(XiXj + YiYj + ZiZj), where the couplings {Jij} are uniformly sampled from the interval [0, 2]. The parameter x is a list of all Jij couplings; hence, in this case, the dimension of the parameter space is m = O(n), where n is the number of qubits. The Hamiltonian H(x) on a 5 × 5 lattice is shown in Fig. 3A.
We trained a classical ML model using 90 randomly chosen values of the parameter x = {Jij}. For each such x, the exact ground state was found using DMRG, and we simulated T = 500 randomized Pauli measurements to construct a classical shadow. The classical ML model predicted the classical representation at new values of x, and we used the predicted classical representation to estimate a two-body correlation function, the expectation value of (XiXj + YiYj + ZiZj)/3, for each pair of qubits (ij). In Fig. 3B, the predicted and actual values of the correlation function are displayed for a particular value of x, showing reasonable agreement. Figure 3C shows the prediction performance for all pairs of spins and for variable system sizes. Each red point in the plot represents the RMSE in the correlation function estimated using our predicted classical representation for a particular pair of spins and averaged over sampled values of x. For comparison, each blue point is the RMSE when the correlation function is predicted using the classical shadow obtained by measuring the actual ground state T = 500 times. For most correlation functions, the prediction error achieved by the best classical ML model is comparable to the error achieved by measuring the actual ground state.

Classifying quantum phases of matter
For classifying quantum phases of matter, we consider an unsupervised classical ML model that constructs an infinite-dimensional nonlinear feature vector for each quantum state ρ by applying the map ϕ^(shadow) in Eq. 5 with hyperparameters τ, γ = 1 to the classical shadow S_T(ρ) of the quantum state ρ. We then perform a principal component analysis (PCA) (84) in the infinite-dimensional nonlinear feature space. The low-dimensional subspace found by PCA in the nonlinear feature space corresponds to a nonlinear low-dimensional manifold in the original quantum state space. This method is made efficient by using the shadow kernel k^(shadow) given in Eq. 6 together with the kernel PCA procedure (85). Details are given in sections S4.4 and S4.5 (46).
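The kernel PCA step can be sketched as follows. This is the generic textbook recipe (double-center the kernel matrix, eigendecompose, rescale the eigenvectors), not the shadow-kernel implementation itself; a Gaussian kernel on synthetic points stands in for the shadow kernel of Eq. 6, and all names are illustrative.

```python
import numpy as np

def kernel_pca(K, n_components=2):
    """Coordinates of each point along the top principal components of the
    feature space implied by the kernel matrix K (shape N x N)."""
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one  # double-center the kernel
    evals, evecs = np.linalg.eigh(Kc)
    idx = np.argsort(evals)[::-1][:n_components]
    # Rescale eigenvectors so each row gives a feature-space projection.
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))

# Two synthetic clusters; a Gaussian kernel stands in for the shadow kernel.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
proj = kernel_pca(np.exp(-0.5 * d2), n_components=1)
```

On this toy data, the first principal coordinate cleanly separates the two clusters without any labels, which is the mechanism the unsupervised phase classification relies on.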

Bond-alternating XXZ model
We begin by considering the bond-alternating XXZ model with n = 300 spins. The Hamiltonian is given in Fig. 4A; it encompasses the bond-alternating Heisenberg model (δ = 1) and the bosonic version of the Su-Schrieffer-Heeger model (δ = 0) (86). The phase diagram in Fig. 4B is obtained by evaluating the partial reflection many-body topological invariant (62, 87). There are three distinct phases: trivial, SPT, and symmetry broken.
For each value of J and δ considered, we construct the exact ground state using DMRG and find its classical shadow by performing T = 500 randomized Pauli measurements.
We then consider a 2D principal subspace of the infinite-dimensional nonlinear feature space found by the unsupervised ML based on the shadow kernel, which is visualized in Fig. 4, C and D. We can clearly see that the different phases are well separated in the principal subspace. This shows that even without any phase labels on the training data, the ML model can classify the phases accurately. Hence, when trained with only a small amount of labeled data, the ML model will be able to correctly classify the phases as guaranteed by theorem 2.
Distinguishing a topological phase from a trivial phase
We consider the task of distinguishing the toric code topological phase from the trivial phase in a system of n = 200 qubits. Figure 5A illustrates the sampled topological and trivial states. We generate representatives of the nontrivial topological phase by applying low-depth, geometrically local random quantum circuits to Kitaev's toric code state (88) with code distance 10, and we generate representatives of the trivial phase by applying random circuits to a product state.
Randomized Pauli measurements are performed T = 500 times to convert the states to their classical shadows, and these classical shadows are mapped to feature vectors in the high-dimensional feature space using the feature map ϕ^(shadow). Figure 5B displays a 1D projection of the feature space obtained using the unsupervised classical ML model for various values of the circuit depth, indicating that the phases become harder to distinguish as the circuit depth increases. In Fig. 5C, we show the classification accuracy of the unsupervised classical ML model. We also compare with training convolutional neural networks (CNNs) that use measurement outcomes from the Pauli-6 positive operator-valued measure (POVM) (89) as input to learn an observable for classifying the phases. Because proposition 2 establishes that no observable (even a global one) can classify topological phases, this CNN approach is doomed to fail. On the other hand, if the CNN takes classical shadow representations as input, then it can learn nonlinear functions and successfully classify the phases.

Outlook
We have rigorously established that classical ML algorithms, informed by data collected in physical experiments or using classical calculations, can effectively address some quantum many-body problems. These results boost our hopes that classical ML trained on experimental data can solve practical problems in chemistry and materials science that would be too hard to solve using classical processing alone. Our arguments build on the concept of a classical shadow derived from randomized Pauli measurements. We expect, though, that other succinct classical representations of quantum states could be exploited by classical ML with similarly powerful results. For example, some currently available quantum simulators are highly programmable but lack the local control needed to perform arbitrary single-qubit Pauli measurements. Instead, after preparing a many-body quantum state of interest, one might switch rapidly to a different Hamiltonian and then allow the state to evolve for a short time before performing a computational basis measurement. How can we make use of such measurement data to predict properties reliably? And can we use experimental data that are already routinely available to predict properties of chemical compounds and materials that have not yet been synthesized? Answering such questions will be important goals for future research.

Materials and methods summary
Here, we provide the key ideas for designing ML algorithms to predict ground states and to classify quantum phases of matter. We refer the readers to the supplementary materials (46) for algorithmic details and the proofs of the main theorems.

Predicting ground states
To understand why the ML algorithm works, we begin by considering a simpler task: training an ML model to predict a single ground-state property tr[Oρ], where O is an observable and ρ is the ground state. In this simpler task, the training data are {xℓ → tr[Oρ(xℓ)]} for ℓ = 1, …, N, where xℓ ∈ [−1, 1]^m is a classical description of the Hamiltonian H(xℓ) and ρ(xℓ) is the ground state of H(xℓ). Intuitively, in a quantum phase of matter, the ground-state property tr[Oρ(x)] changes smoothly as a function of the input parameter x. The smoothness condition can be rigorously established as an upper bound on the average magnitude of the gradient of tr[Oρ(x)] using quasi-adiabatic evolution (53, 54), assuming that the spectral gap of H(x) is bounded below by a nonzero constant throughout the parameter space. The upper bound on the average gradient magnitude enables us to design a simple classical ML model, based on an l2-Dirichlet kernel, for generalizing from the training set to a new input x: the prediction is (1/N) Σ_{ℓ=1..N} κ(x, xℓ) tr[Oρ(xℓ)], where κ(x, xℓ) = Σ_{k ∈ Z^m, ‖k‖₂ ≤ Λ} cos[π k · (x − xℓ)] is the l2-Dirichlet kernel with cutoff Λ. Using statistical analysis, we can guarantee that the prediction error is small given a number N of training data polynomial in the number of parameters m.
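The l2-Dirichlet kernel above can be evaluated directly for small m and Λ by enumerating the admissible wave vectors. The sketch below is a brute-force implementation (feasible only for small dimension); the function names are illustrative, and the kernel form follows the reconstruction of the equation above.

```python
import itertools
import numpy as np

def wave_vectors(m, cutoff):
    """All integer k in Z^m with ||k||_2 <= cutoff (brute-force enumeration)."""
    r = int(cutoff)
    return [k for k in itertools.product(range(-r, r + 1), repeat=m)
            if sum(ki * ki for ki in k) <= cutoff ** 2]

def dirichlet_kernel(x, xp, cutoff):
    """kappa(x, x') = sum over admissible k of cos(pi * k . (x - x'))."""
    d = np.asarray(x, float) - np.asarray(xp, float)
    return float(sum(np.cos(np.pi * np.dot(k, d))
                     for k in wave_vectors(len(d), cutoff)))
```

For m = 1 and Λ = 2 the admissible wave vectors are k ∈ {−2, −1, 0, 1, 2}, so κ(x, x) = 5; the number of admissible wave vectors grows rapidly with m and Λ, which is what drives the model-complexity analysis.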
The main idea of the statistical analysis is to bound the model complexity. In particular, the model complexity depends on the number of wave vectors k in the l2-Dirichlet kernel: the more wave vectors we include, the higher the model complexity and the more data the ML model needs to achieve good prediction performance. We show that the number of m-dimensional integer wave vectors with Euclidean norm bounded by Λ is m^{O(Λ²)}, and that we only need to consider Λ of order √(1/ε) to achieve prediction error at most ε. We then generalize this idea to the task of predicting the ground-state representation. We consider training data {xℓ → σ_T(ρ(xℓ))} for ℓ = 1, …, N, where σ_T(ρ(xℓ)) is the classical shadow representation of the quantum state ρ(xℓ) obtained by performing randomized Pauli measurements on the state ρ(xℓ). Following the expression for predicting a fixed property, the predicted ground-state representation is given by σ̂_N(x) = (1/N) Σ_{ℓ=1..N} κ(x, xℓ) σ_T(ρ(xℓ)). Using the properties of classical shadows, we have tr[O σ_T(ρ(xℓ))] ≈ tr[O ρ(xℓ)] for a wide range of observables O. By moving the sum outside of the trace, we can reduce the problem to predicting a fixed ground-state property. Hence, if the classical ML model based on an l2-Dirichlet kernel can predict ground-state properties accurately, then it can predict the ground-state representation accurately.
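The "sum outside the trace" step is just linearity of the trace, and is easy to sanity-check numerically. The sketch below uses random arrays as synthetic stand-ins for the kernel weights, the shadows, and an observable; none of the values are from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dim = 5, 4
kappa = rng.normal(size=N)                # stand-ins for kernel weights
shadows = rng.normal(size=(N, dim, dim))  # stand-ins for classical shadows
O = rng.normal(size=(dim, dim))           # a synthetic observable

# Predict the representation first, then read out O ...
sigma_hat = np.tensordot(kappa, shadows, axes=1) / N
lhs = np.trace(O @ sigma_hat)

# ... or average the per-state readouts with the same kernel weights.
rhs = sum(kappa[l] * np.trace(O @ shadows[l]) for l in range(N)) / N

assert np.isclose(lhs, rhs)  # linearity of the trace
```

The two readouts agree exactly (up to floating-point error), which is why accurate prediction of each fixed property implies accurate prediction of the whole representation.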

Classifying quantum phases of matter
The ML algorithm is based on the support vector machine (SVM) model. The underlying idea of SVM is simple and intuitive. Suppose that we have N data points that form two well-separated clusters. We may try to separate these training clusters with a linear hyperplane. When we get a new data point, we simply check which half-space it belongs to and assign the label accordingly. However, there could be many hyperplanes that separate the two training clusters. SVM chooses the hyperplane that yields the largest margin, i.e., that maximizes the distance from each cluster to the hyperplane. Intuitively, maximizing the margin makes the hyperplane most robust to sampling errors in the training data. Using statistical analysis, one can rigorously show that the larger the margin, the better the generalization performance.
SVM can be enhanced using the kernel trick. When the N data points cannot be separated by a linear hyperplane, we need to separate them with a more complex surface. This is achieved by mapping each data point to a high-dimensional vector space through a nonlinear map and looking for a linear hyperplane in the high-dimensional space. One can perform the training and prediction in the high-dimensional space by computing only inner products between pairs of points in that space. The inner product is often referred to as the kernel function, and this technique of mapping to a much larger space is known as the kernel trick. In many situations, the high-dimensional space is taken to be infinite dimensional. The shadow kernel that we defined in Eq. 6 also corresponds to an infinite-dimensional vector space.
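The kernel trick described above can be sketched with a small kernel SVM. The version below is a simplification (a bias-free SVM trained by projected gradient ascent on its dual, rather than a full solver with an offset term), with an RBF kernel standing in for the shadow kernel; all names and parameters are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix; a stand-in for a problem-specific kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svm_train(K, y, C=10.0, eta=0.01, steps=4000):
    """Bias-free kernel SVM: projected gradient ascent on the dual
    objective sum(a) - 0.5 a^T Q a subject to 0 <= a <= C."""
    Q = (y[:, None] * y[None, :]) * K
    a = np.zeros(len(y))
    for _ in range(steps):
        a = np.clip(a + eta * (1.0 - Q @ a), 0.0, C)
    return a

def svm_predict(K_new_train, a, y):
    return np.sign(K_new_train @ (a * y))

# XOR-style data: linearly inseparable, but separable with the RBF kernel.
rng = np.random.default_rng(0)
base = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], float)
X = np.repeat(base, 5, axis=0) + rng.normal(0, 0.1, (20, 2))
y = np.repeat([1.0, 1.0, -1.0, -1.0], 5)

K = rbf_kernel(X, X)
alpha = svm_train(K, y)
acc = float(np.mean(svm_predict(K, alpha, y) == y))
```

The nonlinear feature space induced by the kernel makes the XOR pattern linearly separable, illustrating why a linear hyperplane in a high-dimensional space can realize a nonlinear classifying function.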
For the task of classifying quantum phases of matter, we assume that there exists a classifying function f(ρ) based on a nonlinear function of the reduced-density matrices of the quantum state. More precisely, we assume that states ρ_A in phase A satisfy f(ρ_A) > 1 and states ρ_B in phase B satisfy f(ρ_B) < −1. This assumption is often satisfied when we focus on states not too close to the phase boundary. We show in the supplementary materials (46) that various SPT phases and topologically ordered phases do satisfy this assumption. Because the shadow kernel corresponds to an inner product in an infinite-dimensional space containing all possible nonlinear combinations of the reduced-density matrices, an SVM based on the shadow kernel is able to learn the classifying function. The amount of data required to learn this classifying function depends on the margin of the hyperplane in the infinite-dimensional space, which can be shown to scale polynomially in system size.

Numerical experiments
For experiments on predicting ground-state properties, we consider the supervised ML algorithm described in Eq. 2. We examine metrics κ(x, xℓ) ∈ R based on the Gaussian kernel, the Dirichlet kernel, and the neural tangent kernel (50). Depending on the training data size and the number of measurements per quantum state, we found that different kernels perform best in different regimes. For classifying quantum phases of matter, we consider an unsupervised ML algorithm, where no labeled training data are provided. The kernel trick described above can also be applied to unsupervised ML algorithms; a standard example is kernel PCA. PCA seeks a direction, known as the principal component, along which the data points are most spread out. If the points are not well separated in any direction, then we can map all points to an infinite-dimensional space. As in the supervised setting, we only need the inner products between pairs of points in the infinite-dimensional space (the kernel function) to find the principal components. Hence, we can also apply the shadow kernel to classify quantum phases of matter in an unsupervised fashion; this is what we considered in the numerical experiments shown in Figs. 4 and 5.