arXiv:2011.13517v1 [physics.comp-ph] 27 Nov 2020

Spectral quadrature for the first principles study of crystal defects:

Application to magnesium

Swarnava Ghosh

and Kaushik Bhattacharya

National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830

Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125

November 30, 2020

Abstract

We present an accurate and efficient finite-difference formulation and parallel implementation of Kohn-Sham Density (Operator) Functional Theory (DFT) for non-periodic systems embedded in a bulk environment. Specifically, we employ non-local pseudopotentials, a local reformulation of electrostatics, a truncation of the spatial Kohn-Sham Hamiltonian, and the Linear Scaling Spectral Quadrature method to solve for the pointwise electronic fields in real space and the non-local component of the atomic force. With these ingredients, we develop a parallel finite-difference framework suitable for distributed-memory computing architectures. Choosing examples from magnesium-aluminum alloys, we first demonstrate the convergence of energies and forces with respect to the spectral quadrature polynomial order and the width of the spatially truncated Hamiltonian. Next, we demonstrate the parallel scaling of our framework, and show that the computation time and memory scale linearly with the number of atoms. Finally, we use the framework to simulate isolated point defects and their interactions in magnesium-aluminum alloys. We find that the binding energies of divacancies, Al solute-vacancy pairs and pairs of Al solute atoms are anisotropic and depend on cell size. Furthermore, the binding is favorable in all three cases.

Keywords: Spectral quadrature, Linear-Scaling, Density Functional Theory, Defects, Magnesium

1 Introduction

Crystal defects, even when present in small concentrations, are crucial in determining macroscopic properties of materials. Vacancies, present in parts per million, are fundamental to creep, spall and radiation aging.

Dislocations, whose density is as small as

−

per atomic row, are the primary carriers of plasticity in metals. Solutes present in parts per hundred are responsible for strengthening by interacting with the motion of dislocations. Further, solutes can aggregate to nucleate a precipitate.

First principles calculations based on Kohn-Sham density functional theory (DFT) [21, 27] have become central to computational materials research, providing fundamental insights into materials properties and behavior. The success of DFT can be attributed to its excellent predictive power and its high accuracy-to-cost ratio compared to other wavefunction-based electronic structure methods [8, 6, 56]. In spite of its success and widespread use, the efficient solution of the Kohn-Sham equations is still computationally daunting, which restricts the range of physical systems that can be investigated. Crystal defects have been particularly challenging because they lead to long-range interactions, which is also the reason why they influence mechanical behavior at small concentrations.

These challenges have led to the development of a number of linear-scaling DFT methods. Many of them assume exponential decay of the off-diagonal components of the density matrix, and truncate them to a finite width. While this is reasonable for insulators (since it requires the existence of a band gap), questions about accuracy remain in the case of metals. An alternative linear-scaling approach, the linear scaling spectral Gauss quadrature (LSSGQ) method, was introduced by Suryanarayana, Bhattacharya and Ortiz [53]. The key idea is to write the density matrix as a (Riemann-Stieltjes) integral over the spectrum (energy states) of the linearized Hamiltonian and then to approximate it using quadratures. In particular, Gauss quadratures allow one to use the Lanczos algorithm to evaluate the diagonal components of the density matrix at $O(1)$ effort at each spatial point, leading to a linear-scaling algorithm when one has local pseudopotentials. Further, the local aspect allows one to introduce variable resolution, where fine resolution is maintained where necessary and sub-grid sampling is used in regions of uniform deformation [53, 40, 41]. However, it is computationally difficult to compute off-diagonal components of the density matrix using Gauss quadratures, and this makes LSSGQ difficult to extend to nonlocal pseudopotentials.

Suryanarayana [52] subsequently showed that LSSGQ may be considered a generalization of the Fermi operator expansion (FOE) method using Lagrange polynomials. That work also proved error estimates for FOE using a polynomial basis, and showed that Gauss quadratures are the most efficient. It further showed that purification methods using Chebyshev polynomials are equivalent to the use of Clenshaw-Curtis quadratures. This understanding has led to a series of efficient algorithms using spectral quadratures [43, 55] and applications to first-principles molecular dynamics [55, 64, 48].

The first goal of this work is to retain the efficiency of LSSGQ while extending it to nonlocal pseudopotentials. We do so by using LSSGQ for computing the electronic state for given atomic positions, and Clenshaw-Curtis quadratures for computing the nonlocal contribution to the forces on the atoms. We also introduce a domain decomposition that enables parallel implementation.

The second goal of this work is to use this algorithm to study defects in magnesium. Magnesium is abundantly available in the earth's crust, is the lightest among all commonly used structural metals, and has among the highest strength-to-weight ratios [23, 35, 39]. Aluminum is a commonly used alloying element, and the relative strength of the AZ class of magnesium alloys can be attributed to the hexagonal close-packed (HCP) structure of the magnesium matrix and the $\beta$ phase $\mathrm{Mg}_{17}\mathrm{Al}_{12}$ precipitates with body-centred cubic structure (space group $I\bar{4}3m$) [35, 23, 39, 33].

We study an isolated vacancy and an isolated substitutional aluminum solute in a magnesium lattice, along with defect pairs: vacancy pairs, solute-vacancy pairs and solute pairs. These pairs play an important role in determining the mechanical behavior and processing of magnesium and its alloys. Vacancy clusters can give rise to prismatic dislocation loops [41, 13] or serve as nuclei for voids, which in turn are important for spall [32, 18, 12]. Such clusters can only form if vacancies can in fact bind. Similarly, aluminum has limited solubility in magnesium and the resulting $\mathrm{Mg}_{17}\mathrm{Al}_{12}$ precipitates play a critical role in strengthening magnesium alloys [39, 23, 33, 34, 14]. This in turn requires both the diffusion of aluminum in a magnesium lattice and an accumulation of aluminum. The diffusion is greatly aided by the formation of solute-vacancy pairs, while the accumulation of aluminum is aided by the binding of aluminum solutes. Finally, vacancy diffusion is important for dislocation climb, a critical mechanism in creep, and the formation of solute-vacancy pairs is again important. Previous DFT studies have shown a solute-vacancy binding energy that is significantly smaller than that experimentally observed, and non-binding of Al solute pairs, raising questions about the mechanism of formation of Al-rich precipitates. We show that the study of these defects requires large computational cells of the type afforded by our algorithm, and suggest that previous contradictory results may have been artifacts of small computational cells.

We introduce our method in Section 2 and describe the numerical implementation in Section 3. We study

the convergence and performance of our implementation in Section 4. We study defects in Section 5, and

close with brief comments in Section 6.

2 Methodology

2.1 Density Operator Formulation of Kohn Sham Density Functional Theory

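We consider a cuboidal domain $\Omega$ with $N_a$ atoms and $N_e$ valence electrons. Let $\mathbf{R} = \{\mathbf{R}_1, \ldots, \mathbf{R}_{N_a}\}$ be the positions of the nuclei with valence charges $\{Z_1, \ldots, Z_{N_a}\}$, respectively. The free energy of this system in Kohn-Sham Density Functional Theory (DFT) [21, 27], expressed in terms of the density operator [36, 61], is

$$\mathcal{F}(\gamma, \mathbf{R}) = 2\,\mathrm{Tr}\left(-\tfrac{1}{2}\nabla^2 \gamma\right) + E_{xc}(\rho) + 2\,\mathrm{Tr}(V_{nl}\gamma) + E_{el}(\rho, \mathbf{R}) - \theta S(\gamma) \tag{1}$$

where $\gamma$ is the density operator, $\mathrm{Tr}(\cdot)$ denotes the trace of an operator, and $\rho(\mathbf{x}) = 2\gamma(\mathbf{x}, \mathbf{x})$ is the electron density.

The first term in (1) is the kinetic energy of non-interacting electrons and $E_{xc}$ is the exchange-correlation energy in the local density approximation (LDA). Specifically, we use the Perdew-Wang parameterization [38] of the correlation energy calculated by Ceperley and Alder [9]. The third term, $2\,\mathrm{Tr}(V_{nl}\gamma)$, is the contribution of the non-local pseudopotential to the free energy. $V_{nl}$ is the non-local pseudopotential operator given by

$$V_{nl}(\mathbf{x}, \mathbf{x}') = \sum_{J} V_{nl}^{J}(\mathbf{x}, \mathbf{x}') = \sum_{J}\sum_{lm} \chi_{Jlm}(\mathbf{x})\,\chi_{Jlm}(\mathbf{x}')$$

where $V_{nl}^{J}$ is the contribution to the non-local pseudopotential operator from the $J$-th atom, and the summation over $J$ is over all nuclei whose non-local projectors $\chi_{Jlm}$ have supports that overlap with the domain $\Omega$. The fourth term is the electrostatic energy, which is expressed within the local reformulation framework [54, 37, 15] as

$$E_{el}(\rho, \mathbf{R}) = \sup_{\phi}\left\{ -\frac{1}{8\pi}\int_{\Omega} |\nabla\phi(\mathbf{x})|^2\,\mathrm{d}\mathbf{x} + \int_{\Omega} \left(\rho(\mathbf{x}) + b(\mathbf{x}, \mathbf{R})\right)\phi(\mathbf{x})\,\mathrm{d}\mathbf{x} \right\} + E_{self}(\mathbf{R}) \tag{2}$$

where $\phi$ denotes the electrostatic potential, $b$ represents the total pseudocharge density of the nuclei and $E_{self}(\mathbf{R})$ is the self energy of the nuclei

$$E_{self}(\mathbf{R}) = -\frac{1}{2}\sum_{J}\int_{\Omega} b_J(\mathbf{x}, \mathbf{R}_J)\,V_J(\mathbf{x}, \mathbf{R}_J)\,\mathrm{d}\mathbf{x} \tag{3}$$

where $b_J$ is the pseudocharge density of the $J$-th nucleus generating the potential $V_J$ (the summation over $J$ is over all nuclei in $\mathbb{R}^3$ whose pseudocharge densities overlap with $\Omega$). The final term in (1) is the electronic entropy arising from the partial occupancies of the electronic states at a finite electronic temperature $\theta$, with

$$S(\gamma) = -2\,\mathrm{Tr}\left[\gamma\log\gamma + (\mathcal{I} - \gamma)\log(\mathcal{I} - \gamma)\right] \tag{4}$$

and $\mathcal{I}$ the identity operator.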

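The ground state in DFT is obtained by minimizing the functional $\mathcal{F}(\gamma, \mathbf{R})$ over all atomic positions $\mathbf{R}$ and all density operators $\gamma$ associated with $N_e$ electrons. It is convenient to nest this minimization problem by first calculating the electronic ground state

$$\mathcal{F}_0(\mathbf{R}) = \inf_{\{\gamma \ \mathrm{s.t.}\ 2\,\mathrm{Tr}(\gamma) = N_e\}} \mathcal{F}(\gamma, \mathbf{R}) \tag{5}$$

and then relaxing over all atomic configurations

$$\mathcal{F}^{*} = \inf_{\mathbf{R}} \mathcal{F}_0(\mathbf{R}) \tag{6}$$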

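The Euler-Lagrange equation of the variational problem (5) is a nonlinear fixed-point problem:

$$\gamma = g(\mathcal{H}, \lambda_f; \theta) = \left(1 + \exp\left(\frac{\mathcal{H} - \lambda_f\mathcal{I}}{\theta}\right)\right)^{-1} \tag{7}$$

where $\theta$ is the electronic smearing, the Fermi energy $\lambda_f$ is the Lagrange multiplier employed to enforce the constraint on the number of electrons, and the Hamiltonian is

$$\mathcal{H} = -\tfrac{1}{2}\nabla^2 + V_{xc} + \phi + V_{nl} \tag{8}$$

with $V_{xc} = \delta E_{xc}/\delta\rho$ the exchange-correlation potential and $\phi$ the solution of the Poisson equation

$$-\frac{1}{4\pi}\nabla^2\phi(\mathbf{x}, \mathbf{R}) = \rho(\mathbf{x}) + b(\mathbf{x}, \mathbf{R}) \tag{9}$$

subject to appropriate boundary conditions. Note that $V_{eff} = V_{xc} + \phi$ is local (a diagonal operator) and hence its action on a function $\eta$ is given by $(V_{eff}\,\eta)(\mathbf{x}) = V_{eff}(\mathbf{x})\,\eta(\mathbf{x})$. The action of the non-local pseudopotential operator $V_{nl}$ is given by

$$(V_{nl}\,\eta)(\mathbf{x}) = \sum_{J}\sum_{lm} \chi_{Jlm}(\mathbf{x})\int_{\Omega} \chi_{Jlm}(\mathbf{x}')\,\eta(\mathbf{x}')\,\mathrm{d}\mathbf{x}' \tag{10}$$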

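After evaluating the electronic ground state, the free energy is calculated using the functional:

$$\mathcal{F}_0(\mathbf{R}) = E_{band} + E_{xc}(\rho) - \int_{\Omega} V_{xc}(\rho(\mathbf{x}))\,\rho(\mathbf{x})\,\mathrm{d}\mathbf{x} + \frac{1}{2}\int_{\Omega} \left(b(\mathbf{x}, \mathbf{R}) - \rho(\mathbf{x})\right)\phi(\mathbf{x}, \mathbf{R})\,\mathrm{d}\mathbf{x} - \theta S + E_{self}(\mathbf{R}) \tag{11}$$

where $E_{band} = 2\,\mathrm{Tr}(\gamma\mathcal{H})$ is the band structure energy. The atomic force on the $J$-th atom is calculated using the expression

$$\mathbf{f}_J = -\frac{\partial \mathcal{F}_0(\mathbf{R})}{\partial \mathbf{R}_J} = \int_{\Omega} \nabla b_J(\mathbf{x}, \mathbf{R}_J)\,\phi(\mathbf{x}, \mathbf{R})\,\mathrm{d}\mathbf{x} - 4\,\mathrm{Tr}\left(V_{nl}^{J}\nabla\gamma\right) \tag{12}$$

where the first term is local (recall that $\rho(\mathbf{x})$ depends only on the local or diagonal components $\rho(\mathbf{x}) = 2\gamma(\mathbf{x}, \mathbf{x})$ of the density operator) while the second term is non-local and requires the off-diagonal terms of the density operator.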

2.2 Linear Scaling Spectral Quadrature

We follow the Spectral Quadrature (SQ) method [40, 41, 53, 52, 43, 55] for solving the DFT problem. The key idea is to express ground state quantities as (Riemann-Stieltjes) integrals over the spectrum of the Hamiltonian. Given the Fermi level $\lambda_f$ and the Hamiltonian $\mathcal{H}$, we can use (7) to write

$$(\eta, \gamma\eta) = \left(\eta, g(\mathcal{H}, \lambda_f; \theta)\,\eta\right) = \int_{\sigma(\mathcal{H})} g(\lambda, \lambda_f; \theta)\,\mathrm{d}\mathcal{E}_{\eta,\eta}(\lambda) \tag{13}$$

for any function $\eta$, where $\sigma(\mathcal{H})$ is the spectrum of $\mathcal{H}$, and $\mathcal{E}_{\eta,\eta}$ is the spectral measure of $\mathcal{H}$ contracted with $\eta$. We use the spectral theorem [46] to obtain the second equality. We can now use quadratures to approximate the integral. In this work we use Gauss quadratures to find the electronic ground state and Clenshaw-Curtis quadratures to find the force on an atom.

Spectral Gauss quadrature

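We follow the linear scaling spectral Gauss quadrature (LSSGQ) method of Suryanarayana, Bhattacharya and Ortiz [53] that exploits the structure of Gauss quadratures to evaluate the electronic ground state. In Gauss quadratures, we approximate any function $f(\lambda)$ in terms of Lagrange polynomials $L_k(\lambda)$:

$$f(\lambda) \approx \sum_{k=1}^{K} f(\lambda_k)\,L_k(\lambda) \tag{14}$$

where $K$ is the degree of the expansion and $\lambda_k$ are the spectral nodes. We can use this expansion to approximate the integral of the function $f$ over the spectrum of $\mathcal{H}$:

$$I[f] = \int_{\sigma(\mathcal{H})} f(\lambda)\,\mathrm{d}\mathcal{E}_{\eta,\eta}(\lambda) \approx \sum_{k=1}^{K} f(\lambda_k)\int_{\sigma(\mathcal{H})} L_k(\lambda)\,\mathrm{d}\mathcal{E}_{\eta,\eta}(\lambda) = \sum_{k=1}^{K} w_k\,f(\lambda_k) \tag{15}$$

where the spectral weight $w_k$ denotes the integral $\int_{\sigma(\mathcal{H})} L_k(\lambda)\,\mathrm{d}\mathcal{E}_{\eta,\eta}(\lambda)$.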

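Now, consider a discretization of the computational domain using a regular finite difference grid with $N_d$ points, and let $\{\eta_p\}_{p=1}^{N_d}$ be a set of orthonormal basis functions associated with this discretization such that $\eta_p$ is compactly supported near the $p$-th grid point. We can then approximate the integrals that make up the electronic ground state (cf. (11)) as

$$E_{band} \approx \sum_{p=1}^{N_d} u_p, \qquad S \approx \sum_{p=1}^{N_d} s_p, \qquad N_e = \sum_{p=1}^{N_d} \rho_p \tag{16}$$

where

$$u_p = 2\int_{\sigma(\mathcal{H})} \lambda\,g(\lambda, \lambda_f; \theta)\,\mathrm{d}\mathcal{E}_{\eta_p,\eta_p}(\lambda) \approx 2\sum_{k=1}^{K} w_k^p\,\lambda_k^p\,g(\lambda_k^p, \lambda_f; \theta) \tag{17}$$

$$s_p = -2\int_{\sigma(\mathcal{H})} s(\lambda, \lambda_f; \theta)\,\mathrm{d}\mathcal{E}_{\eta_p,\eta_p}(\lambda) \approx -2\sum_{k=1}^{K} w_k^p\,s(\lambda_k^p, \lambda_f; \theta) \tag{18}$$

$$\rho_p = 2\int_{\sigma(\mathcal{H})} g(\lambda, \lambda_f; \theta)\,\mathrm{d}\mathcal{E}_{\eta_p,\eta_p}(\lambda) \approx 2\sum_{k=1}^{K} w_k^p\,g(\lambda_k^p, \lambda_f; \theta) \tag{19}$$

Here $g(\lambda, \lambda_f; \theta)$ is the Fermi-Dirac function

$$g(\lambda, \lambda_f; \theta) = \frac{1}{1 + \exp\left(\frac{\lambda - \lambda_f}{\theta}\right)} \tag{20}$$

and $s(\lambda, \lambda_f; \theta)$ is the pointwise contribution to the electronic entropy

$$s(\lambda, \lambda_f; \theta) = g(\lambda, \lambda_f; \theta)\log g(\lambda, \lambda_f; \theta) + \left(1 - g(\lambda, \lambda_f; \theta)\right)\log\left(1 - g(\lambda, \lambda_f; \theta)\right) \tag{21}$$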

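In Gauss quadrature, neither the weights $\{w_k^p\}_{k=1}^{K}$ nor the spectral nodes $\{\lambda_k^p\}_{k=1}^{K}$ are fixed a priori; both are treated as unknowns. An efficient way of evaluating the spectral weights and nodes at any grid point $p$ is by employing the Lanczos iteration

$$b_{k+1}v_{k+1} = (\mathcal{H} - a_{k+1})\,v_k - b_k v_{k-1}, \quad k = 0, \ldots, K-1, \qquad v_{-1} = 0, \quad v_0 = \eta_p, \quad b_0 = 1 \tag{22}$$

where $a_{k+1} = (\mathcal{H}v_k, v_k)$, $k = 0, \ldots, K-1$, and $b_k$ is computed such that $\|v_k\| = 1$, $k = 0, \ldots, K-1$. We denote the Jacobi matrix

$$\mathbf{J}_K = \begin{pmatrix} a_1 & b_1 & & & \\ b_1 & a_2 & b_2 & & \\ & \ddots & \ddots & \ddots & \\ & & b_{K-2} & a_{K-1} & b_{K-1} \\ & & & b_{K-1} & a_K \end{pmatrix} \tag{23}$$

The nodes $\{\lambda_k^p\}_{k=1}^{K}$ are calculated as the eigenvalues of $\mathbf{J}_K$, and the weights $\{w_k^p\}_{k=1}^{K}$ are the squares of the first elements of the normalized eigenvectors of $\mathbf{J}_K$.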
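The link between the Lanczos iteration and Gauss quadrature can be checked numerically. The following is a minimal sketch, not the implementation used in this work: it assumes a hypothetical one-dimensional model Hamiltonian (`toy_hamiltonian`, a second-order finite-difference Laplacian plus a cosine potential) and uses dense Python lists for clarity. It runs $K$ Lanczos steps from a unit vector at one grid point and verifies the defining property of a $K$-point Gauss rule, namely that the moments $(\eta, \mathcal{H}^m\eta)$ are reproduced by the Jacobi matrix for all $m \le 2K-1$. All function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(H, v):
    """Dense matrix-vector product; H is a list of rows."""
    return [dot(row, v) for row in H]

def toy_hamiltonian(n, h):
    """Hypothetical 1D model: -1/2 d^2/dx^2 (3-point stencil) + cosine potential."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = 1.0 / h**2 + 0.3 * math.cos(2.0 * math.pi * i / n)
        if i > 0:
            H[i][i - 1] = -0.5 / h**2
        if i < n - 1:
            H[i][i + 1] = -0.5 / h**2
    return H

def lanczos(H, eta, K):
    """Run K Lanczos steps starting from eta. Returns the diagonal a (length K)
    and off-diagonal b (length K-1) of the K x K Jacobi matrix."""
    norm = math.sqrt(dot(eta, eta))
    v_prev = [0.0] * len(eta)
    v = [x / norm for x in eta]
    a, b = [], []
    b_prev = 0.0  # multiplies v_prev, which is zero on the first step
    for k in range(K):
        w = matvec(H, v)
        a_k = dot(w, v)
        a.append(a_k)
        if k == K - 1:
            break
        w = [wi - a_k * vi - b_prev * pi for wi, vi, pi in zip(w, v, v_prev)]
        b_prev = math.sqrt(dot(w, w))
        b.append(b_prev)
        v_prev, v = v, [wi / b_prev for wi in w]
    return a, b

def jacobi_matvec(a, b, x):
    """Apply the symmetric tridiagonal Jacobi matrix to x."""
    y = [ai * xi for ai, xi in zip(a, x)]
    for i, bi in enumerate(b):
        y[i] += bi * x[i + 1]
        y[i + 1] += bi * x[i]
    return y

def moment(apply_op, start, power):
    """Return (start, A^power start) for the operator apply_op."""
    x = list(start)
    for _ in range(power):
        x = apply_op(x)
    return dot(start, x)

n, K = 30, 4
H = toy_hamiltonian(n, 0.5)
eta = [0.0] * n
eta[n // 2] = 1.0                      # unit vector at the central grid point
a, b = lanczos(H, eta, K)
e1 = [1.0] + [0.0] * (K - 1)
for power in range(2 * K):             # moments 0 .. 2K-1 should agree
    m_H = moment(lambda x: matvec(H, x), eta, power)
    m_J = moment(lambda x: jacobi_matvec(a, b, x), e1, power)
    print(power, m_H, m_J)
```

Because the stencil is banded, each Lanczos vector in this sketch is non-zero only within a few grid points of the starting point, which is the locality that makes the per-point cost independent of system size.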

The key observations of LSSGQ are that (i) the number of quadrature points $K$ is independent of system size and (ii) the vectors $v_k$ remain zero except in a small region around the grid point $p$ during the Lanczos iteration (22). Therefore, the evaluation of the spectral nodes at each grid point is $O(1)$, as is the evaluation of all the electronic quantities of interest (cf. (11) and (16)). This enables us to evaluate the electronic ground state at linear cost.

We further observe that this limited support of the vectors $v_k$ enables the restriction of the Hamiltonian to an appropriate subspace of the real-space computational domain $\Omega$. This enables us to use domain decomposition in our numerical implementation discussed in Section 3.

Spectral Clenshaw-Curtis quadrature

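While the Gauss quadrature provides a very efficient approach to evaluating the electronic ground state since it depends only on the diagonal terms of the density matrix, it is unable to evaluate quantities like the contribution of the non-local pseudopotential to the atomic force that depend on the off-diagonal components. Therefore, we use the spectral Clenshaw-Curtis quadrature [52, 43, 55].

The Hamiltonian is first scaled and shifted such that its spectrum lies in the interval $[-1, 1]$:

$$\hat{\mathcal{H}} = \frac{\mathcal{H} - \chi\mathcal{I}}{\tau} \tag{24}$$

where $\chi = (\lambda_{max} + \lambda_{min})/2$, $\tau = (\lambda_{max} - \lambda_{min})/2$, and $\lambda_{max}$, $\lambda_{min}$ are the maximum and minimum eigenvalues of $\mathcal{H}$, respectively. Given any function $\eta$, we can rewrite (13) using the scaled and shifted Hamiltonian:

$$(\eta, \gamma\eta) = \int_{\sigma(\mathcal{H})} g(\lambda, \lambda_f; \theta)\,\mathrm{d}\mathcal{E}_{\eta,\eta}(\lambda) = \int_{-1}^{1} g(\hat{\lambda}, \hat{\lambda}_f; \hat{\theta})\,\mathrm{d}\hat{\mathcal{E}}_{\eta,\eta}(\hat{\lambda}) \tag{25}$$

where $\hat{\lambda}_f = (\lambda_f - \chi)/\tau$ and $\hat{\theta} = \theta/\tau$ are the scaled Fermi energy and the scaled electronic temperature respectively, and $\hat{\mathcal{E}}_{\eta,\eta}$ is the spectral measure of $\hat{\mathcal{H}}$. In the Clenshaw-Curtis variant of the spectral quadrature, any function $f(\hat{\lambda})$ is expanded in terms of Chebyshev polynomials $T_j(\hat{\lambda})$ as:

$$f(\hat{\lambda}) \approx \sum_{j=0}^{K} f(\hat{\lambda}_j)\,T_j(\hat{\lambda}) \tag{26}$$

where $K$ is the degree of the expansion and $\hat{\lambda}_j$ are the spectral nodes, which are fixed in Clenshaw-Curtis quadrature. The non-local component of the atomic force [43, 55] is then approximated as

$$4\,\mathrm{Tr}\left(V_{nl}^{J}\nabla\gamma\right) \approx 4\sum_{p}\left(\eta_p, V_{nl}^{J}\nabla g(\mathcal{H}, \lambda_f; \theta)\,\eta_p\right) = 4\sum_{p}\left(\eta_p, V_{nl}^{J}\nabla g(\hat{\mathcal{H}}, \hat{\lambda}_f; \hat{\theta})\,\eta_p\right) \approx 4\sum_{p}\sum_{j=0}^{K} c_j\,\eta_p^{*}\,V_{nl}^{J}\nabla t_j^p \tag{27}$$

where $c_j$ are the constants

$$c_j = \frac{2 - \delta_{j0}}{\pi}\int_{-1}^{1} \frac{g(\lambda, \hat{\lambda}_f; \hat{\theta})\,T_j(\lambda)}{\sqrt{1 - \lambda^2}}\,\mathrm{d}\lambda, \tag{28}$$

and $t_j^p = T_j(\hat{\mathcal{H}})\,\eta_p$ are functions which are determined by the three-term recurrence relation for Chebyshev polynomials, starting from $t_0^p = \eta_p$ and $t_1^p = \hat{\mathcal{H}}\eta_p$:

$$t_{j+1}^p = 2\hat{\mathcal{H}}\,t_j^p - t_{j-1}^p \tag{29}$$

Once again, the number of quadrature points is independent of the system size and therefore this evaluation scales linearly.

Figure 1: Flowchart of the Self Consistent Field iteration for solving the DFT problem. (Flowchart steps: atomic positions, pseudocharge, guess electron density and nonlocal pseudopotential; linearized Hamiltonian at each grid point; spectral Gauss quadrature at each grid point; Fermi energy solve; electron density, band structure energy and electronic entropy at each grid point; electrostatic potential at each grid point; effective potential at each grid point; potential mixing; convergence check; on convergence, spectral Clenshaw-Curtis quadrature at each grid point and atomic forces.)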
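The Chebyshev expansion of the Fermi-Dirac function that underlies the Clenshaw-Curtis step can be illustrated in a few lines. The sketch below is a scalar illustration only, under stated assumptions: the spectral bounds, Fermi energy and smearing are hypothetical numbers, the coefficient integral with weight $1/\sqrt{1-\lambda^2}$ is approximated by Gauss-Chebyshev quadrature (a choice made here for convenience), and the expansion is evaluated at scalar arguments rather than applied to an operator. The function names are illustrative.

```python
import math

def fermi_dirac(lam, lam_f, theta):
    """Fermi-Dirac function 1 / (1 + exp((lam - lam_f) / theta)), overflow-guarded."""
    x = (lam - lam_f) / theta
    if x > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def chebyshev_coefficients(f, K, n_quad=None):
    """Coefficients c_0..c_K of f on [-1, 1], using Gauss-Chebyshev quadrature
    for ((2 - delta_j0) / pi) * int f(x) T_j(x) / sqrt(1 - x^2) dx."""
    n = n_quad if n_quad is not None else 4 * (K + 1)
    angles = [math.pi * (i + 0.5) / n for i in range(n)]
    values = [f(math.cos(t)) for t in angles]
    coeffs = []
    for j in range(K + 1):
        s = sum(v * math.cos(j * t) for v, t in zip(values, angles))
        coeffs.append((1.0 if j == 0 else 2.0) * s / n)
    return coeffs

def chebyshev_sum(coeffs, x):
    """Evaluate sum_j c_j T_j(x) with the recurrence T_{j+1} = 2 x T_j - T_{j-1}."""
    t_prev, t_curr = 1.0, x
    total = coeffs[0]
    if len(coeffs) > 1:
        total += coeffs[1] * x
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
        total += c * t_curr
    return total

# Hypothetical spectral bounds, Fermi energy and smearing (Hartree).
lam_min, lam_max = -0.5, 1.5
lam_f, theta = 0.2, 0.2
chi = 0.5 * (lam_max + lam_min)        # shift
tau = 0.5 * (lam_max - lam_min)        # scale
lam_f_hat = (lam_f - chi) / tau        # scaled Fermi energy
theta_hat = theta / tau                # scaled smearing

coeffs = chebyshev_coefficients(lambda x: fermi_dirac(x, lam_f_hat, theta_hat), K=60)
grid = [-1.0 + 0.01 * i for i in range(201)]
worst = max(abs(chebyshev_sum(coeffs, x) - fermi_dirac(x, lam_f_hat, theta_hat))
            for x in grid)
print("max error of degree-60 expansion:", worst)
```

In an operator setting the same recurrence is applied to vectors, with the scalar `x` replaced by the action of the scaled Hamiltonian; smaller smearing requires a higher degree for the same accuracy.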

3 Numerical Implementation

Fig. 1 shows the flowchart of the scheme employed to solve the DFT problem. The non-linear fixed point problem (Eqn. 7) is solved using the self-consistent field (SCF) iteration. Briefly, given a charge density and electrostatic potential, we construct a linearized Hamiltonian, which is used to compute the spectral weights and nodes using the Lanczos iteration. From these, the updated electronic state, including an updated charge density and electrostatic potential, is determined, and the process iterates until convergence. Once the electronic ground state is established, the atomic relaxation for the overall ground state (Eqn. 6) is achieved using the Non-Linear Conjugate Gradient (NLCG) method [49].

We discuss further details of the key components of the implementation below. In this paper, we are

interested in isolated defects. Therefore, we consider a computational domain that is embedded in an infinite

perfect crystal. The method can be extended to other situations including periodic boundary conditions and

isolated clusters surrounded by vacuum.

Discretization

Let $\Omega$ be a cuboidal computational domain of size $L_1 \times L_2 \times L_3$ aligned with the coordinate axes, and let it be discretized using a regular $n_1 \times n_2 \times n_3$ grid so that the grid spacing is $h_i$ in the $i$-th coordinate direction, where $i \in \{1, 2, 3\}$. We index the grid points with $p$ and let $\Omega_d$ be the set of all grid points. We discretize the gradient and Laplace operators using finite differences on this grid [17, 16].

Hamiltonian at each grid point

In each iteration of SCF, we need to determine the spectral nodes and weights at each grid point. As evident from (22) and (29), this requires the calculation of the action of the Hamiltonian on the vectors $v_k$ and $t_j^p$. These vectors are compactly supported in a ball centered at the $p$-th grid point. Therefore, it is sufficient to work with the Hamiltonian $\mathcal{H}_p$ projected onto a smaller subspace around the $p$-th grid point. Specifically, we work with a cuboidal region of side $2R_{cut}$ centered at the $p$-th grid point, which we call the region of influence. This is shown schematically in Fig. 2a. The controllable parameter $R_{cut}$ depends on the order of the quadrature, which in turn depends on material properties and the electronic smearing temperature $\theta$ [43, 55].

For grid points close to the boundary of $\Omega$, the nodal Hamiltonian can extend beyond the computational domain. Since we consider problems where our computational domain is embedded in an infinite crystal, we have to compute the Hamiltonian on an extended region $\Omega'$ of size $(L_1 + 2R_{cut}) \times (L_2 + 2R_{cut}) \times (L_3 + 2R_{cut})$, centered at the origin. The values of the Hamiltonian associated with the annular region $\Omega' \setminus \Omega$ are obtained using precomputed electronic fields ($\rho$ and $\phi$) for the perfect crystal.

Domain decomposition

We use domain decomposition for parallel implementation. The computational domain is partitioned into disjoint domains, $\Omega = \bigcup_{q=1}^{N_p} \Omega^q$, where $\Omega^q$ denotes the domain local to the $q$-th processor, and $N_p$ is the total number of processors. The collection of grid points belonging to the $q$-th processor is denoted by $\Omega_d^q$, where $\Omega_d = \bigcup_{q=1}^{N_p} \Omega_d^q$, and $\Omega_d^q \cap \Omega_d^r = \emptyset$ (null set) for $q \neq r$. In our implementation, we use the DMDA class available through the Portable, Extensible Toolkit for Scientific Computation (PETSc) [3, 4] for mesh management. The communication between processes is handled by Message Passing Interface (MPI) libraries [19, 20].

The region of influence of a grid point $p$ owned by a process $q$ (i.e. $p \in \Omega_d^q$) may extend to the spatial regions owned by neighboring processes. In such a situation, the values of the effective potential $V_{eff}$ from neighboring processes are communicated to the process $q$ using an MPI communicator. In Fig. 2b we schematically illustrate the parallel communication pattern involved in communicating the effective potential $V_{eff}$ from neighboring processes. This reduces the number of MPI related calls otherwise required for matrix vector products.

Spectral weights, nodes and electronic fields

In each SCF iteration, the spectral weights $\{w_k^p\}_{k=1}^{K}$ and nodes $\{\lambda_k^p\}_{k=1}^{K}$ are computed at every grid point $p \in \Omega_d$ from the projected Hamiltonian $\mathcal{H}_p$. Overall, this is computationally the most expensive part of the method. However, the computation is local and involves no MPI related calls. Further, we do not explicitly store the matrices, and their multiplication with a vector is performed in a matrix-free manner. These lead to excellent parallel efficiency.

Once the spectral weights $\{w_k^p\}_{k=1}^{K}$ and spectral nodes $\{\lambda_k^p\}_{k=1}^{K}$ are computed at all the grid points $p \in \Omega_d^q$ for all the processes $q$, we first solve the following equation for the Fermi energy $\lambda_f$:

$$N_e = 2\sum_{q=1}^{N_p}\sum_{p \in \Omega_d^q}\sum_{k=1}^{K} w_k^p\,g(\lambda_k^p, \lambda_f; \theta) \tag{30}$$

We utilize Brent's algorithm [7] for this purpose. The summation across the polynomial degree $k$ and the grid point $p$ is performed locally in each process, and the summation across the processes $q$ is performed with one global MPI communication call.
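The Fermi energy solve in (30) is a one-dimensional root-finding problem with a monotone left-hand side, which can be sketched as follows. This is a serial illustration under stated assumptions: plain bisection is swapped in for Brent's algorithm to keep the example dependency-free, the nodes and weights are hypothetical numbers for two grid points, and the global MPI reduction is replaced by an ordinary Python sum. The function names are illustrative.

```python
import math

def fermi_dirac(lam, lam_f, theta):
    """Fermi-Dirac occupation, guarded against overflow in exp."""
    x = (lam - lam_f) / theta
    if x > 700.0:
        return 0.0
    if x < -700.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(x))

def electron_count(nodes, weights, lam_f, theta):
    """2 * sum over grid points p and quadrature index k of w_k^p g(lambda_k^p)."""
    return 2.0 * sum(w * fermi_dirac(lam, lam_f, theta)
                     for nodes_p, weights_p in zip(nodes, weights)
                     for lam, w in zip(nodes_p, weights_p))

def solve_fermi_energy(nodes, weights, n_electrons, theta, tol=1e-12):
    """Bisection on lam_f; electron_count is increasing in lam_f."""
    lo = min(min(n) for n in nodes) - 40.0 * theta
    hi = max(max(n) for n in nodes) + 40.0 * theta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if electron_count(nodes, weights, mid, theta) < n_electrons:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical spectral nodes (Hartree) and weights at two grid points.
nodes = [[-0.3, 0.1, 0.6], [-0.2, 0.2, 0.7]]
weights = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
theta = 0.1
lam_f = solve_fermi_energy(nodes, weights, n_electrons=1.5, theta=theta)
print("Fermi energy:", lam_f)
print("electron count:", electron_count(nodes, weights, lam_f, theta))
```

In a distributed setting each process would evaluate its own partial sum and a single global reduction would combine them before each bisection or Brent step.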

Figure 2: (a) Region of influence associated with the $p$-th grid point and extended domain. (b) Neighbor communicator in domain decomposition. The orange region is the union of partitioned domains that influence the process $q$.

Once the Fermi energy is calculated, we calculate the point-wise band structure energy $u_p$, the point-wise entropy $s_p$ and the point-wise electron density $\rho_p$, following (17), (18), (19). These are all local.

Electrostatic and effective potential

Once the electron density $\rho_p$ is calculated at the grid points, we calculate the total electrostatic potential $\phi_p$ at the grid points by solving the Poisson equation (9) on $\Omega$, subject to Dirichlet boundary conditions obtained from the perfect crystal outside, using the generalized minimal residual algorithm (GMRES) [47]. Once the electrostatic potential is calculated at every grid point $p \in \Omega_d$, we calculate the effective potential $V_{eff}^p$ at every grid point:

$$V_{eff}^p = V_{xc}(\rho_p) + \phi_p \tag{31}$$

where $V_{xc}(\rho_p)$ is the exchange correlation potential.

The convergence of the SCF iteration is accelerated by employing mixing schemes. In every SCF iteration, we mix the effective potential $V_{eff}$, where we have the option of employing Anderson mixing [2], the Pulay mixing scheme [44], or its periodic [5] and restarted [42] variants.

We check the convergence of SCF iteration by calculating the normalized error in the effective potential.
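The structure of a mixed fixed-point iteration with a normalized convergence check can be sketched as below. This is a toy illustration, not the scheme used in this work: simple linear mixing is swapped in for the Anderson and Pulay variants, the "potential" is a short list of numbers, and the update map (`math.cos` applied entrywise) is a hypothetical stand-in for the map from an input effective potential to an output effective potential.

```python
import math

def scf_linear_mixing(update, v0, alpha=0.3, tol=1e-10, max_iter=500):
    """Iterate v <- (1 - alpha) v + alpha update(v) until the normalized
    residual ||update(v) - v|| / ||update(v)|| drops below tol.
    Returns the converged vector and the number of iterations used."""
    v = list(v0)
    for iteration in range(1, max_iter + 1):
        v_out = update(v)
        residual = math.sqrt(sum((o - i) ** 2 for o, i in zip(v_out, v)))
        norm = math.sqrt(sum(o * o for o in v_out)) or 1.0
        if residual / norm < tol:
            return v, iteration
        v = [(1.0 - alpha) * i + alpha * o for i, o in zip(v, v_out)]
    raise RuntimeError("mixing iteration did not converge")

# Hypothetical stand-in for the potential update: entrywise cosine,
# whose fixed point is x = cos(x).
toy_update = lambda v: [math.cos(x) for x in v]
v_star, n_iter = scf_linear_mixing(toy_update, [0.0, 1.0, 2.0])
print(v_star, n_iter)
```

Anderson and Pulay mixing replace the fixed damping factor `alpha` with a combination of several previous iterates and residuals, which typically reduces the iteration count.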

Free energy

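The free energy is computed once the SCF iteration has converged using a discrete version of Eqn. 11:

$$\mathcal{F}_0(\mathbf{R}) \approx \mathcal{F}_0^h(\mathbf{R}) = \sum_{p \in \Omega_d}\left[u_p + \left\{\left(\varepsilon_{xc}^p - V_{xc}^p\right)\rho_p + \frac{1}{2}\left(b_p - \rho_p\right)\phi_p\right\} - \theta s_p\right] + E_{self}^h \tag{32}$$

where $\varepsilon_{xc}^p$ is the contribution of the exchange correlation energy at the $p$-th grid point, and $E_{self}^h$ is the discrete representation of the self energy (3):

$$E_{self}^h = -\frac{1}{2}\sum_{p \in \Omega_d}\sum_{J} b_J^p\,V_J^p \tag{33}$$

In evaluating them, the sums over the grid points are carried out locally on each MPI process, followed by a global sum across all the MPI processes.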

Atomic forces

The final step is the computation of the atomic forces (Eqn. 12). This has two parts, the contributions of the local and the non-local pseudopotentials. The contribution of the local pseudopotential to the atomic force is calculated as

$$\mathbf{f}_{J,l} = \int_{\Omega} \nabla b_J(\mathbf{x}, \mathbf{R}_J)\,\phi(\mathbf{x}, \mathbf{R})\,\mathrm{d}\mathbf{x} \approx \sum_{p \in \Omega_d} \left(\nabla_h b_J\right)_p\,\phi_p \tag{34}$$

where $\nabla_h$ is the gradient operator in the discrete setting. The summation over the grid points $p$ is local to every process, followed by one summation over the MPI processes.

The non-local contribution to the atomic force is calculated by employing the Clenshaw-Curtis quadrature described in Section 2.2. At each grid point $p \in \Omega_d$, we calculate the discrete Chebyshev vectors $t_j^p$ using the iterative scheme (29), and the Clenshaw-Curtis quadrature weights $c_j^p$ using (28) with the discrete nodal Hamiltonian $\mathcal{H}_p$. The non-local force is given by

$$\mathbf{f}_{J,nl} = -4\,\mathrm{Tr}\left(V_{nl}^{J}\nabla\gamma\right) \approx -4\sum_{p \in \Omega_d} \eta_p^{*}\,V_{nl}^{J}\,\nabla_h\left(\sum_{j=0}^{K} c_j^p\,t_j^p\right) \tag{35}$$

The calculation of the atomic forces scales linearly with the number of atoms.

4 Convergence and Performance

We set the electronic temperature to be $\theta$ = 0.03333 Ha, and use Troullier-Martins non-local pseudopotentials [57]. These yield lattice parameters of $a$ = 6.043 Bohr, $c$ = 9.848 Bohr and a $c/a$ ratio of 1.629 for hexagonal close-packed (HCP) magnesium, and $a$ = 19.58 Bohr for body centered cubic (BCC) $\mathrm{Mg}_{17}\mathrm{Al}_{12}$. These agree with the previously reported values of the lattice constants: $a$ = 5.972 Bohr, $c/a$ = 1.61 (DFT) [10] and $a$ = 6.066 Bohr, $c/a$ = 1.623 (experiment) [26, 28] for Mg, and $a$ = 19.96 Bohr (DFT) [11] and $a$ = 19.653 Bohr (experiment) [63] for the equilibrium lattice constant of the $\mathrm{Mg}_{17}\mathrm{Al}_{12}$ (or MgAl) phase.

Convergence of spectral quadrature

We first verify the convergence with respect to spectral quadrature. We take the same degree $K$ for both the Gauss and the Clenshaw-Curtis quadrature. We study HCP Mg and

BCC MgAl. For each of these systems, we randomly perturb the atoms from the ground state to obtain the

test configuration, and use a mesh of

= 0

Bohr for pure Mg and

= 0

Bohr for MgAl.

Fig. 3a shows the convergence of the energy and Fig. 3b shows the convergence of atomic forces with $K$. The error is computed with respect to a reference that is taken to be the solution for $K = 240$. The decay is not monotone because neither the free energy functional (Eqn. 1) nor the atomic force (Eqn. 12) is variational with respect to $K$. However, it broadly follows an exponential decay (Error $\approx C e^{-\beta K}$). The best fit $\beta$ for the various cases is shown in Table 1. The pure Mg system has a higher rate of convergence than the MgAl system. Furthermore, in both cases, the rates of convergence of the total energy and of the local component of the atomic force are similar, whereas the rate of convergence of the total force, which requires calculating the non-local force component, is smaller by almost a factor of 2. This is because the local component of the atomic force and the energy use only Gauss quadrature, which depends on Lagrange polynomials, whereas the non-local component of the atomic force depends on Clenshaw-Curtis quadrature with Chebyshev polynomials. The former is known to be more accurate than the latter [52].

Figure 3: Convergence of (a) energy and (b) force as a function of degree of spectral quadrature.

Table 1: Rates of convergence $\beta$ of the total energy, local force and the total force with the spectral quadrature polynomial order $K$.

System   Energy   Force (local)   Force (total)
MgAl     0.070    0.075           0.036
Mg       0.109    0.088           0.042
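A rate $\beta$ of the kind reported in Table 1 can be extracted from error data by a linear least-squares fit of $\log(\mathrm{Error})$ against $K$. The sketch below assumes the model Error $\approx C e^{-\beta K}$ with a natural-logarithm base; the data points are synthetic and purely illustrative, not values from this study.

```python
import math

def fit_exponential_rate(degrees, errors):
    """Least-squares fit of log(error) = log(C) - beta * K.
    Returns (C, beta)."""
    ys = [math.log(e) for e in errors]
    n = len(degrees)
    mean_x = sum(degrees) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in degrees)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(degrees, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic, illustrative data generated from C = 0.5, beta = 0.07.
degrees = [20, 40, 60, 80, 100]
errors = [0.5 * math.exp(-0.07 * K) for K in degrees]
C, beta = fit_exponential_rate(degrees, errors)
print("C =", C, "beta =", beta)
```

Since the observed decay is not monotone, a fit of this kind captures only the overall trend.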

Since we require an accuracy of about

−

and

−

Ha/atom in our energy, we need between

and

for Mg, and between

and

for MgAl. Similarly, since we require an accuracy between

−

and

−

Ha/Bohr in the total force, we need between

and

100

for Mg and MgAl. These guide our

further calculations.

Convergence with radius of truncation

A loose upper bound of the size

of Lanczos vectors in (22)

is given by

≤

, where

is the average mesh spacing, and

is the finite difference order. This

suggests a choice of

cut

. Using the choice of parameters used for this study,

cut

≈

288

Bohr.

This would make the calculations extremely expensive. However, we now show that this is not necessary by studying the error as a function of $R_{cut}$.
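The origin of a loose bound of this kind can be seen by counting how far a compactly supported vector can spread under repeated application of a banded operator. The sketch below is a one-dimensional illustration under stated assumptions: the stencil is taken to reach `half_width` grid points on either side (for a central finite-difference stencil of order $n_o$ this would be $n_o/2$, which is an assumption made here), and all band entries are set to one so that no cancellation hides the growth. The numerical values are hypothetical.

```python
def apply_banded(v, half_width):
    """Apply a banded operator with all-ones entries within the band."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            out[i] += v[j]
    return out

def support_radius(v, center):
    """Largest index distance from center at which v is non-zero."""
    return max(abs(i - center) for i, x in enumerate(v) if x != 0.0)

K = 5             # number of operator applications (hypothetical)
half_width = 3    # stencil reach in grid points (hypothetical)
h = 0.6           # mesh spacing in Bohr (hypothetical)
n = 2 * K * half_width + 11
center = n // 2

v = [0.0] * n
v[center] = 1.0
for _ in range(K):
    v = apply_banded(v, half_width)

radius = support_radius(v, center)
print("support radius in grid points:", radius)
print("support radius in Bohr:", radius * h)
```

The support grows by one stencil reach per application, so the worst-case radius is proportional to the product of the polynomial degree, the stencil reach and the mesh spacing. The entries far from the center are typically negligible in practice, which is why a much smaller truncation radius can suffice.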

For the Mg and MgAl systems, we use a Lanczos polynomial degree

and a Chebyshev

polynomial degree

100

, and vary

cut

from

Bohr. Figs. 4a and 4b show the error in energy and atomic force as a function of $R_{cut}$. The reference values of energy and atomic force are calculated using

cut

Bohr. From these figures, we observe that as $R_{cut}$ increases, the errors in energy and forces decrease. We once again observe a generally exponential decay, though it is not monotone (also see [43]). This allows us to treat $R_{cut}$ as a controllable approximation parameter.

Figure 4: Convergence of (a) energy and (b) total atomic force as a function of truncation radius $R_{cut}$ of the nodal Hamiltonian.

Scaling and performance

Next, we turn to the scaling and efficiency of the formulation and parallel

implementation developed in this work. We choose magnesium crystal with one aluminum solute atom,

placed at the center of the computational domain. The simulation parameters used are

Bohr,

cut

Bohr,

and

100

with a desired accuracy of

−

Ha/atom in energy and

−

Ha/Bohr in atomic force.

We first perform a strong scaling study with a

600

atom system, varying the number of cores from

420

. The wall time for each SCF iteration is presented in Fig. 5a. The parallel efficiency on

420

relative to 30

cores is

percent. This data is plotted in terms of speed up in Fig. 5b. Next, we perform a weak scaling

study by selecting systems with sizes ranging from

atoms to

2560

atoms, while increasing the number of

processors from

900

. We choose these such that the number of atoms per MPI process is between two

and three. Fig. 5c shows that the CPU time for one SCF iteration varies linearly with the number of atoms ($\approx O(N_a)$). Fig. 5d shows that the memory required also scales linearly with the number of atoms. Together, these results show excellent scaling of our algorithm. This is due in part to restricting much of the parallel communication to the MPI processes that are neighbors, and keeping the global communication to a minimum.
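The speed-up and parallel efficiency quoted in a strong scaling study follow from wall times by simple ratios. The sketch below uses the usual definitions relative to a reference run; the timings are hypothetical numbers chosen for illustration and are not measurements from this work.

```python
def speedup(t_ref, t):
    """Speed-up of a run taking time t relative to a reference time t_ref."""
    return t_ref / t

def parallel_efficiency(p_ref, t_ref, p, t):
    """Strong-scaling efficiency on p cores relative to p_ref cores."""
    return (t_ref * p_ref) / (t * p)

# Hypothetical wall times (seconds) per SCF iteration.
cores_ref, time_ref = 30, 140.0
cores, time = 420, 12.5
print("speed-up:", speedup(time_ref, time))
print("efficiency:", parallel_efficiency(cores_ref, time_ref, cores, time))
```

An efficiency of one corresponds to the ideal linear scaling line in a speed-up plot.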

We note that the spectral quadrature step accounts for greater than

percent of the total time in each SCF

iteration. The prefactor of spectral quadrature can be significantly reduced by incorporating reduced basis

methods such as Discrete Discontinuous Basis Projection [62]. Further, the number of SCF iterations to

achieve a fixed target SCF error increases with system size in metallic systems due to charge sloshing [24].

The introduction of real space preconditioning schemes [50, 29] is likely to reduce this for large metallic

systems.

Figure 5: Scaling and performance of the framework: (a) strong scaling, (b) speed up, (c) CPU time for one SCF iteration versus number of atoms, (d) memory scaling. The dash dot line is the ideal linear scaling behavior.

5 Defects in magnesium

We now study isolated point defects and defect pairs in magnesium. Of particular interest are the formation energy of isolated defects and the binding energy of defect pairs. Let $E(M, n, m)$ be the energy of a crystal with $M$ solvent atoms, $n$ solute atoms and $m$ vacancies. The formation energy of a defect cluster with $n$ solute atoms and $m$ vacancies is the excess energy of the crystal with the defect cluster over those of perfect crystals of the host and solute:

$$E^f_{n,m} = E(M - n - m, n, m) - \frac{M - n - m}{M}\,E(M, 0, 0) - n E_s \tag{36}$$

where $E_s$ is the energy per atom of the solute in its perfect crystalline state. Note that when we have an isolated solute and no vacancies, the formation energy $E^f_{1,0}$ is referred to as the dilute impurity energy. Further, the binding energy of this cluster is

$$E^b_{n,m} = n E^f_{1,0} + m E^f_{0,1} - E^f_{n,m} \tag{37}$$

Note that the defect is stable when the formation energy is negative, and the defect cluster has favorable binding when the binding energy is positive.
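The bookkeeping behind formation and binding energies can be made concrete with a short sketch. It assumes the formation energy is the energy of the defective cell minus the energy of the same number of host atoms in the perfect crystal and minus the solute atoms in their own perfect crystal, and that the binding energy is the sum of the isolated-defect formation energies minus the cluster formation energy, so that positive values indicate favorable binding. All energies below are hypothetical numbers in arbitrary units, not results from this study.

```python
def formation_energy(e_defect, e_perfect, M, n, m, e_solute):
    """Formation energy of a cluster of n solutes and m vacancies.
    e_defect : energy of the cell with M - n - m host atoms, n solutes, m vacancies
    e_perfect: energy of the perfect host cell with M atoms
    e_solute : energy per atom of the solute in its own perfect crystal"""
    return e_defect - (M - n - m) / M * e_perfect - n * e_solute

def binding_energy(e_f_cluster, n, m, e_f_solute, e_f_vacancy):
    """Binding energy: isolated-defect formation energies minus that of the cluster."""
    return n * e_f_solute + m * e_f_vacancy - e_f_cluster

# Hypothetical energies for a 100-site host cell.
M = 100
e_perfect = -100.0
e_f_vacancy = formation_energy(-98.2, e_perfect, M, n=0, m=1, e_solute=0.0)
e_f_divacancy = formation_energy(-96.5, e_perfect, M, n=0, m=2, e_solute=0.0)
e_b_divacancy = binding_energy(e_f_divacancy, n=0, m=2,
                               e_f_solute=0.0, e_f_vacancy=e_f_vacancy)
print("vacancy formation energy:", e_f_vacancy)
print("divacancy formation energy:", e_f_divacancy)
print("divacancy binding energy:", e_b_divacancy)
```

Because each term is a difference of large total energies, the cell size and the accuracy of each calculation directly limit the accuracy of the resulting binding energy.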

Isolated point defect

We calculate the formation energy of a monovacancy for various computational

domain size from

atoms to

1151

atoms, and the results are shown in Fig. 6a. We observe that the

formation energy strongly depends on cell size, and converges at cell sizes of approximately 1000 atoms to a value of 0.8456 eV.

Figure 6: Vacancy (a,b) and an aluminum solute (c,d) in magnesium. The calculated vacancy formation energy (a) and dilute impurity energy (c) for various computational cell sizes, and the computed electron density along the basal plane for a vacancy (b) and solute (d). The '*' marks in (b,d) indicate the projected positions of the atoms in the basal plane at a height $c/2$ above and below this plane.

This is broadly in agreement with values reported in the literature: calculated values

779

768

eV using local pseudo-potential and a coarse grained approach with

1024

billion atoms [40] and

measured values of

eV [31, 22, 58, 60]. Fig. 6b shows the electron density on the basal plane

in the vicinity of the vacancy. Unsurprisingly, it is depleted at the vacancy. An interesting feature is that the electron density does not display the reflection symmetry of the basal plane (e.g. about the red dashed line). This is because the three-dimensional crystal breaks this symmetry: the atomic planes at a height $c/2$ above and below the basal plane do not share it. This is emphasized by marking the atoms on these upper and lower planes with a '*' in the figure. This observation plays a role in binding.

Next we compute the dilute impurity energy of an aluminum solute atom, and this is shown in Fig. 6c. We again observe that this energy depends on the cell size and converges at a few hundred atoms. The electron density on the basal plane is shown in Fig. 6d. The electron density is elevated near the Al solute and the distribution is more symmetric.