Machine-learned prediction of the electronic fields in a crystal

We propose an approach for exploiting machine learning to approximate electronic fields in crystalline solids subjected to deformation. Strain engineering is emerging as a widely used method for tuning the properties of materials, and this requires repeated density functional theory calculations of the unit cell subjected to strain. Repeated unit cell calculations are also required for multi-resolution studies of defects in crystalline solids. We propose an approach that uses data from such calculations to train a carefully architected machine learning approximation. We demonstrate the approach on magnesium, a promising light-weight structural material: we show that we can predict the energy and electronic fields to the level of chemical accuracy, and even capture lattice instabilities.


Introduction
A number of studies over the recent years have shown that the electronic structure of crystalline solids depends sensitively on the deformation, and therefore straining a lattice from its equilibrium structure can lead to new properties. For example, the perovskite SrTiO 3 which is usually paraelectric becomes ferroelectric when subjected to a lattice strain [1]. Silicon becomes electrically polarized under strain, and the role of strain on various functional materials has been extensively studied [2]. In metals, strain can lead to deformation twinning [3]. Strain-induced martensitic phase transitions are widely observed and exploited in shape-memory alloys and steels [4,5,6]. Finally, strain engineering is emerging as an important tool in 2D materials [7].
Density functional theory (DFT) [8] is a powerful tool to understand the electronic structure of matter, and thus has been instrumental to the understanding, design, and optimization of materials. Examples include the predictions of energy materials [9], the geometric design of polar metals [10], and the screening for high-performance piezoelectrics [11]. Strain-induced phenomena can also be studied using DFT but it requires the repeated electronic structure calculation of a crystalline lattice unit cell subject to various strains. Consequently, a systematic exploration of the strain space can be computationally expensive. In this work, we study if a neural network approximation can assist in this exploration. We are motivated by the success of machine learning, particularly deep neural networks in image recognition [12,13] and natural language processing tasks [14,15]. There is also a growing literature on the use of these methods in materials science [16].
Another motivation for our work comes from the study of defects in crystalline solids that play a critical role in determining mechanical and other properties of various solids: for example, vacancies are critical for creep and dislocations for plasticity. The perturbations caused by these defects decay very slowly, and therefore their direct study requires very large computational domains. Solving the DFT equations is prohibitive on such large computational domains, and a variety of approaches have been proposed (e.g., quantum mechanics/molecular mechanics [17,18,19,20,21] and embedded DFT [22]). Fago et al. [23] introduced a "local" DFT-based quasicontinuum method where the deformation of the atoms is assumed to follow a piece-wise affine deformation, and the energy density of each region is computed using a unit cell DFT calculation. A new approach was introduced by Suryanarayana et al. [24] that solves the DFT equations by introducing a numerical basis that exploits the decay. Specifically, the electronic fields are taken to be a sum of a piece-wise periodic 'predictor' and a slowly decaying 'corrector.' The approach leads to accurate solutions over millions of atoms. It can resolve the core, the far field, as well as the interactions between far-field stress and core of defects including dislocations [25,26]. The implementation of these approaches also requires the repeated solution of the unit cell subject to distortion.
In this paper, we study a deep neural network approximation for the energy and the underlying electronic fields in a unit cell of magnesium subjected to strain. We generate data by repeatedly solving the unit cell problem and use it to train a deep neural network. An important challenge is the representation of the electronic fields: these are elements of infinite dimensional function spaces whereas neural networks typically approximate maps between finite dimensional spaces. Therefore, we use the approach that combines model reduction and neural networks for high-fidelity approximations of maps between function spaces [27]. We show excellent, specifically chemical, accuracy of the trained neural network approximation over a range of strain. Further, the approximation is able to learn the onset of an instability.
Some recent works have focused on approximating electronic structure quantities, mainly electronic density, using machine learning as a means to bypassing DFT calculations. Chandrasekaran et al. [28] used a representation that encodes the atomic arrangement around any grid point and mapped it to electron charge density and local density of states (LDOS) spectrum at the corresponding grid point. This grid point method allows the quantities at each grid point to be evaluated independently and hence in parallelization. It can also be highly dependent on the discretization used and quickly become intractable as the system size grows. Grisafi and co-workers [29] expanded the electronic charge densities of different hydrocarbons as sums of atom-centered basis functions and machine-learned them using symmetry-adapted Gaussian process regression. The use of such localized basis set allows for transferability across different molecular systems, but is not applicable to metals. A work by Brockherde et al. [30] explores the mappings from potential to ground-state density represented in Fourier basis. One interesting finding is that learning energy indirectlyfrom potential to electron density followed by electron density to energy -yields better predictions than the direct map from potential to energy. It is noteworthy that all these methods use the atomic environment typically within some cutoff radius as the input of the learning maps, and this requires careful selection of descriptors. Our work focuses on crystals or solids and uses strain as its input. Its simplicity allows for highly accurate predictions of electronic fields along with the combination of data-based model reduction and neural networks. It also offers a convenient way to relate electronic structure calculations to larger continuum level constitutive response of materials.

Density functional theory
Given atomic positions {R I }, density functional theory seeks to find the total electronic free energy system F({R I }), the electronic charge density ρ(x), and other electronic functions of interest. To do so, we solve the Kohn-Sham equation for energy states E i and orbitals ψ i (x) (ignoring spins for simplicity of presentation and assuming a non-local pseudo-potential in the Kleinman-Bylander form), is the non-local portion of the psuedopotential and depends on the atomic positions {R I }, and V xc , V H , V ext are the exchange-correlation, Hartree, and external potential due to ions given by with valence electron density ρ, charge density b = b(x; {R I }) describing the local part of the pseudopotential, and exchange-correlation density e xc (ρ). It is convenient to write the Hamiltonian H in operator form, and introduce the corresponding (one-point) density operator whose diagonal component gives the electron density where f describes the occupancy and satisfies i f (E i ) = n with n being the total number of electrons in the system. The total electronic free energy of the system may be written as where β = 1/(k B T ), T is the fictitious electronic temperature, and S is the generalized entropy that determines the occupancy f . We label the first term of (4) the band structure energy ( U = Tr(Hγ)) and note that its density is If we view F as a functional on γ, then the Kohn-Sham equation (1) and an equation for f are the Euler-Lagrange equation associated with the variational problem. Note that the Kohn-Sham equation is non-linear because the exchange-correlation and the Hartree potential depend on the electron density, so it is usually solved by fixed point iteration (also known as the self-consistent field approach).
We specifically consider the entropy associated with the Marzari cold smearing with broadening κ = −0.5634 and E f is the Fermi energy or the Lagrange multiplier that enforces the constraint i f (E i ) = n 1 . The volumetric entropy (i.e. entropy per unit volume) is Putting all these together, we view density functional theory as a map where φ = V H + V ext is the Coulomb potential. 1 We have also repeated our work with the entropy of mixing S(f ) = −f log f 2 + (2 − f 2 ) log(1 − f 2 ) that gives f 2 to be the Fermi-Dirac function.

Crystals
A crystal is a periodic arrangement of N atoms described by a unit cell U bounded by three lattice vectors {a, b, c} and N atomic positions or basis vectors R I , I = 1, . . . , N . It is customary to introduce fractional coordinatesR I with respect to the lattice vectors.
Using the Bloch theorem, the electronic orbitals may be written as ψ i,k = exp(ik · x)Ψ i,k (x) where Ψ i,k is periodic and k is a vector in the Brillouin zone associated with U. The formulas above can be naturally extended (see Appendix A) and we obtain ρ(x), φ(x), u(x), s(x), to be periodic functions while the free energy F is now interpreted as energy per unit cell.
We are interested in the deformations of the crystal, so we choose a reference crystal structure with lattice vectors {a 0 , b 0 , c 0 } and atomic coordinates {R 0 I }. We can then describe the deformation (up to rotations) in terms of Now, given any deformed crystal and any set of atomic coordinates, we can find the electronic states by solving the electronic states as described above. Further, we can find the equilibrium states of the atoms {R e I } by solving ∂F ∂R I = 0, I = 1, . . . N − 1. Finally, the electronic quantities are functions defined on the deformed unit cell or a domain that depends on the strain. It is convenient to define them on a fixed domain, so we map them back to the reference lattice with a change of variablesρ( In summary, the deformation behavior of a crystal is described by the map where the electronic states are computed for the deformed crystal with the atoms in their equilibrium positions.

Implementation
The density functional theory calculations to evaluate the map Φ are conducted using the software ABINIT [32]. We use a plane-wave basis set with a kinetic energy cut-off of 24 Ha (Hartree), a Troullier-Martins norm-conserving pseudopotential with local channel l = 1, and local density approximation (LDA) in the Perdew-Wang 92 functional form as the exchange-correlation energy. Cold smearing of magnitude 0.01 Ha is used [31] and the Brillouin zone integration is performed using a 12 × 12 × 12 k-point sampling. Furthermore, the atomic positions are relaxed using the Broyden-Fletcher-Goldfarb-Shanno minimization.

Approach
We seek to learn an approximation for the map Φ for a given material. We first generate data by evaluating the map using DFT and seek to use this data to learn an approximation. However, note that the quantities ρ, φ, u, s are functions and thus elements of infinite-dimensional linear spaces. In practice, these are evaluated on a finite-dimensional discretization, but still we want our approximation to be independent of the particular discretization. Therefore we use an approach by Bhattacharya et al. [27] that combines model reduction with a deep neural net to learn the map Φ. The idea is to use model reduction to find a finite dimensional representation for each function and then use a deep neural net to learn the map. Specifically, we find maps such that Φ ≈ • Φ ml . In this work, we use principal component approximation (PCA) for model reduction p, and a deep neural net for Φ ml .

Magnesium
Magnesium is a hexagonal close-packed (HCP) material. It is the lightest of all structural materials, and of significant interest as a light-weight structural material for bio-medical, automotive, and protective applications [33,34,35]. We consider the four-atom unit cell shown in Figure 1 where a 0 , b 0 , and c 0 directions are the [1010], [0,1, 1, 0], and [0001] directions, respectively, in the HCP crystallographic notation. The atoms in the reference unit cell are located at fractional coordinates {0, 0, 0}, {1/2, 1/2, 0}, {0, 2/3, 1/2}, and {1/2, 1/6, 1/2} in the reference configuration. The a 0b 0 plane perpendicular to the c 0 -axis or (0001) plane is the basal plane. We have observed in our calculations that the basal planes deform uniformly, but slide relative to each other. After eliminating free translation of the unit cell, the fractional coordinates in the deformed configuration can be taken as

Architecture and training
We use PCA dimensions of d ρ = d φ = d u = d s = 50 and the following neural network architecture: (1) a two-layer dense network with hidden layer widths of 500 and the hyperbolic tangent activation function for each of Φ ρ ml ,Φ φ ml ,Φ u ml and Φ s ml ; (2) a three-layer dense network with hidden layer widths of 50, 100, 50, respectively, and the same type of activation function for Φ R ml and Φ F ml . These hyperparameters are selected based on four-fold cross-validation results.
We generate a total of 3000 data, with each input sample D drawn independently from a normal distribution truncated to two standard deviations satisfying λ a , λ b , λ c ∈ [0.9, 1.1] and θ a , θ b , θ c ∈ [84 • , 96 • ]. Such a distribution reflects the fact that smaller deformations are more likely to be encountered in real materials. Out of all the data generated, 2000 of them are reserved for training and validation, while the rest are used for testing.
Using the training data, we first identify the map p via PCA. This is followed by standardizing both the input and output of Φ ml to zero mean and unit variance, before we train the neural network parameters using the Adam optimization algorithm at a training rate of 0.001, a small l2 regularization of 0.0001 on the weights, and a batch size of 128 for a total of 4000 epochs. Subsequently, given any deformation in the testing data, we generate predictions by applying the map • Φ ml and compare them with the true values.

Computational costs
There are two elements to the computational cost. The first is the online cost of evaluation. This takes fractions of a second (0.002 second) on an Intel Skylake (2.1GHz) core compared to 30 minutes on 14 cores for a full DFT evaluation. Thus, learned approximations provide significant savings. The second is the one-time offline cost of generating the data and training. As noted, each data set takes 30 minutes on 14 cores Intel Skylake (2.1GHz) and we generate 2000 data sets for training. This is comparable to a single evaluation in a MacroDFT calculation. However, since each data set is independent, it is is trivially parallelizable. The cost of training is about 5 minutes on a single core.

Results: Electronic fields and energy
A typical result is shown in Figure 2. We observe that our approach is able to capture the main features of the electronic fields, with very small errors. Figure 3 compares the predicted and actual energies (band structure energy, entropic energy, and total free energy). The mean errors are 0.15 mHa, 0.014 mHa, and 0.10 mHa while the maximum errors are 2.4 mHa, 0.13 mHa, and 1.5 mHa for Figure 3 (a), (b), and (c), respectively. Importantly, since there are four atoms in this unit cell, all the errors are significantly smaller than 1.6 mHa/atom (or 1 kcal/mol) which is widely accepted as the accuracy required for chemical accuracy [36].
We now turn to understanding the training and the actual distribution of errors. We introduce a normalized root-mean-square error (NRMSE), defined as root-mean-square error divided by the range of the data (evaluated on a discretized grid of size N d for field quantities) (12) Figure 4 shows how the normalized root-mean-square error (evaluated on a N d = 36 × 64 × 60 grid for field quantities) averaged over the 1000 test samples decreases with an increasing number of training samples. We see that the error has stabilized at 2000 training samples for this set of test samples. The figure also shows the error due to PCA, i.e., the error associated with the model reduction from an infinite dimensional function space to a finite dimensional representation of the electronic fields. We see that the PCA error is about ten times smaller than the overall error. Figure 5 shows how the NRMSE changes with volumetric strain for these 1000 test samples. We notice that the error in electron density, Coulomb potential, and band structure energy density is extremely small (∼0.1%) in almost all cases with a maximum error of ∼8% in isolated cases. Indeed, the five points with the largest error in all these plots all correspond to the same five test data points and are associated with very large distortions in the unit cell that were poorly Notice that the scale used to display the error is significantly smaller than the scale used to display the quantities except for entropy, which is small.
represented in the normal distribution used to sample the training data. The error is larger in volumetric entropy averaging ∼11% mainly because this is a very small number. Finally, the error displacement of the basis atoms is 0.0035 Bohr which is much smaller than the maximum and average atomic displacements of 0.66 Bohr and 0.097 Bohr, respectively. Again, the five points with the largest error here correspond to the same data points with large error in Figure 5.
An intended goal of this work is the use of the electronic fields as pre-conditioners or predictor fields in larger multiscale calculations. To evaluate their efficacy in doing so, we calculate the total free energy in four ways. First, in the direct approach, we learn the map from the deformation to the total energy (Φ F ml ). In the second sum approach, we evaluate the total energy using equation (4) from the learned electron density, band structure energy and entropy fields, and atomic coordinates. Third, in the orbital approach, we use the learned electron density and atomic coordinates to construct the Hamiltonian, find the electronic orbitals, and then use the combination of the learned electron density and atomic coordinates with the computed orbitals (to compute the kinetic energy). Note that in the third approach, we do not use the band structure energy, but effectively recompute it by computing the orbitals. So, the difference between the direct and orbital energies is a measure of the inconsistency between the learned band structure energy and the learned energy density. Finally, in the SCF approach, we perform one SCF iteration starting from the learned electron density and atomic coordinates before computing the energy. Note that we do not directly use the learned electron density, but recompute it using a SCF iteration. So, the difference between orbital and SCF values indicates whether the learned electron density is close to convergence, or to use this electron density as a precursor. Figure 7 shows how closely the free energy predicted from the four approaches match to their true values. The sum approach leads to a very low average error of 0.15 mHa across all 1000 test data with the maximum error standing at 2.4 mHa. These errors are consistent with those of the band structure energy in Figure 3, emphasizing the contribution of this band structure energy to the total energy. The orbital and SCF approaches lead to further reductions, and they even outperform the direct approach. All of these signify that our proposed approach not only learns the energy but also the fields extremely accurately, and thus can be used as predictors in larger calculations.
Finally, we compare our approximation with linear regression in Figure 8. It shows that the error due to linear regression (LR) is significantly higher than that due to our nonlinear approximation using neural networks (NN), thus demonstrating the efficacy of our architecture (NN).

Results: Stresses and instability
In order to further assess the accuracy of the learned electronic fields and energy, and to understand its efficacy in practice, we study the derivative quantities (i.e. stresses) in the crystal subjected to deformation. According to the Cauchy-Born rule [37], the macroscopic deformation gradient in any macroscopic deformation is equal to the matrix F that maps the reference lattice to the current lattice. The corresponding nonlinear strain measure E = 1/2(F T F − I) where I is the identity. The Cauchy or true stress in the material is given by  (12)) marked as ML with training size for the four scalar field quantities. The NRMSE shown in the plot is averaged over all 1000 test samples. The figure also shows the PCA error. where V is the volume of the deformed unit cell and F is the free energy of the unit cell. It is common [38,39] to approximate this using a small distortion approximation where ε = 1/2(F + F T − I) ≈ E is the linear strain measure. We use this expression in our work, though the results can easily be adapted to the nonlinear counterpart. The linear strain is related to the variables we use to describe the lattice (λ a , λ b , λ c , θ a , θ b , θ c ) through the relation and we may obtain ∂F/∂ε ij using the chain rule (see Appendix B for details). We focus specifically on the state of uniaxial stress where σ ii (for i =1, 2 or 3) is non-zero, but all the other components are zero. We therefore prescribe ε ii , and solve (14) to obtain the other   components of strain. The solution to this equation is equivalent to minimizing the energy, and we use gradient descent with the step size chosen according to the Barzilai-Borwein method [40].
The results are shown in Figure 9 for the stress in the x 1 , x 2 , and x 3 directions as well as the corresponding transverse strains 2 . Figure 9(a) compares the ground truth stress to the ML approach, where the free energy is computed using the direct approach described earlier. We see that the ML approach predicts the stress extremely well except for high compression in the x 2 direction where there is an instability. Still, the approach captures the onset of this instability. Figures 9(b-d) show the corresponding transverse strains. We see that this is not linear in any case. Further, there is a dramatic expansion in the x 1 direction (a-axis) as we reach the instability during severe compression in the x 2 direction. Figure 10(a) shows the corresponding electronic fields and positions of the internal atoms. Specifically, given the value of ε 22 , we find the other components of strain and the corresponding deformation variables. We then interrogate our learned maps to find the electronic fields and atomic positions. The figure also compares the learned and ground truth atomic positions, and we see good agreement except at the instability. We observe that in most cases, the atoms in the unit cell lie very near to the dashed lines, thus exhibiting a deformed HCP structure. However the atomic positions change very abruptly under high compression in the x 2 direction. The learned atomic positions capture this abrupt change to some extent.
Together, Figures 9 and 10 show that as the crystal is compressed in the x 2 direction and reaches its instability, it elongates dramatically along the x 1 direction. Consequently, the lengths of a, b, and c edges become similar. This is accompanied by the relative sliding of the basal planes (see Figure 10(a) where the atoms in thex 3 = 0 plane move in the positive x 2 direction whereas the atoms in thex 3 = 1/2 basal plane move in the negative x 2 direction), bringing the atoms on the middle basal plane to the face centers of the 4-atom unit cell. Thus, the structure approaches that of a face-centered cubic (FCC) lattice as shown in Figure 10(b). In other words, we see a HCP to FCC phase transition. This has been observed under high hydrostatic confinement [41], but is generally overshadowed by the (1012) tension twin mode at lower confinements 3 .

Conclusion
In this paper, we have presented a machine learning approximation for the change in the electronic structure as a crystal is deformed. We have demonstrated the approach on magnesium and shown that the machine-learned predictions reach the level of chemical accuracy. In particular, we not only learn energy values accurately, but also predict electronic fields with minimal error. These show that the models can indeed be sufficiently accurate to be useful as predictors or pre-conditioners for large-scale DFT methods. Finally, we have computed derivative quantities such as stresses under specific loading conditions from the learning models and found that they match very well with the ground truth DFT results. The model can even capture the onset of strain-induced phase transformation. All these further indicate another future direction of extending the learning model to one that can extract DFT-informed constitutive relations that can be easily incorporated in continuum level calculations.
Here we restrict k to the irreducible Brillouin zone (IBZ) of U and add appropriate weights w k satisfying k∈IBZ w k = 1. The energy eigenstates have also been normalized such that |ψ i,k (x)| 2 dx = 1 for all i, k. Finally, we consider the total energy expression. The Kohn-Sham ground state energy is where the first term of the expression is the independent-particle kinetic energy, k∈IBZ w k f (E i,k )|ψ i,k | 2 is the valence electronic density, E H is the Hartree energy (or in other words, the classical Coulomb interaction energy of the electron density interacting with itself), and E ion is the Coulomb energy associated with interactions among the ions at positions {R I } which is computed using the Ewald summation method. There is also an additional energy contribution known as E core arising from the fact that the ion is not a point charge. Its exact expression can be found in [42].
Using the Kohn-Sham equation H KS ψ i,k = E i,k ψ i,k with H KS = − 1 2 ∇ 2 + V H + V ext + V xc and orthonormality condition ψ * i,k (x)ψ i,k (x)dx = δ i,i δ k,k , we may rewrite the ground state energy in terms of the band structure energy U analogous to equation (4) as follows: Then, the total Helmholtz free energy of the system is simply
The first and the fourth lines represent the preprocessing steps that remove the mean and variance from the data, where b (0) , W (0) , b (5) , W (5) are the mean and standard deviation of the input D and output F of the model. The second and third lines indicate the dense neural network with three hidden layers, hyperbolic tangent activation function g(x) = tanh(x), and fitted weights b (k) , W (k) with k = 1, 2, 3. We note that we have intentionally chosen an activation function with smooth derivative which is the case of hyperbolic tangent function to allow for easy minimization of energy F via the gradient descent method. The derivatives of F with respect to h (−1) can be computed as follows: