Spectral quadrature for the first principles study of crystal defects:
Application to magnesium
Swarnava Ghosh
a
and Kaushik Bhattacharya
b
a
National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830
b
Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125
November 30, 2020
Abstract
We present an accurate and efficient finite-difference formulation and parallel implementation of
Kohn-Sham Density (Operator) Functional Theory (DFT) for non periodic systems embedded in a bulk
environment. Specifically, employing non-local pseudopotentials, local reformulation of electrostat-
ics, and truncation of the spatial Kohn-Sham Hamiltonian, and the Linear Scaling Spectral Quadrature
method to solve for the pointwise electronic fields in real-space and the non-local component of the
atomic force, we develop a parallel finite difference framework suitable for distributed memory comput-
ing architectures to simulate non-periodic systems embedded in a bulk environment. Choosing examples
from magnesium-aluminum alloys, we first demonstrate the convergence of energies and forces with re-
spect to spectral quadrature polynomial order, and the width of the spatially truncated Hamiltonian. Next,
we demonstrate the parallel scaling of our framework, and show that the computation time and memory
scale linearly with respect to the number of atoms. Next, we use the developed framework to simulate
isolated point defects and their interactions in magnesium-aluminum alloys. Our findings conclude that
the binding energies of divacancies, Al solute-vacancy and two Al solute atoms are anisotropic and are
dependent on cell size. Furthermore, the binding is favorable in all three cases.
Keywords: Spectral quadrature, Linear-Scaling, Density Functional Theory, Defects, Magnesium
1 Introduction
Crystal defects, even when present in small concentrations are crucial in determining macroscopic properties
of materials. Vacancies, present in parts per million are fundamental to creep, spall and radiation aging.
Dislocations, whose density is as small as
10
−
8
per atomic row, are the primary carriers of plasticity in
metals, solutes present in parts per hundred are responsible for strengthening by interacting with the motion
of dislocations, further solutes can aggregate to nucleate a precipitate.
First principles calculations based on Kohn-Sham density functional theory (DFT)[21, 27] have become
central to computational materials research, thereby providing fundamental insights into materials properties
and behavior. The success of DFT can be attributed to its excellent predictive power with low accuracy
to cost ratio compared to other wavefunction based electronic structure methods [8, 6, 56]. In spite of its
success and widespread use, the efficient solution of Kohn-Sham equations is still computationally daunting,
thereby restricting the range of physical systems that can be investigated. In particular, crystal defects have
been particularly challenging because they lead to long-range interactions, the reason why they influence
mechanical behavior at small concentrations.
These challenges have led to the development of a number of linear-scaling DFT methods. However,
many of them assume exponential decay of the off-diagonal components of the density matrix
γ
, and truncate
them to a finite width. However, while this is reasonable for insulators (since it requires the existence of a
band-gap), questions about accuracy remains in the case of metals. An alternative linear-scaling approach
– the linear scaling spectral Gauss quadratures (LSSGQ) – was introduced by Suryanarayana, Bhattacharya
1
arXiv:2011.13517v1 [physics.comp-ph] 27 Nov 2020
and Ortiz [53]. The key idea is to write the density matrix as a (Reimann-Stieltjes) integral over the spectrum
(energy states) of the linearized Hamiltonian and then to approximate it using quadratures. In particular,
Gauss quadratures allows one to use the Lanczos algorithm to evaluate the diagonal components of the
density matrix at
O
(1)
effort at each spatial point leading to a linear scaling algorithm when one has local
pseudopotentials. Further, the local aspect allows one to introduce variable resolution where fine resolution
is maintained where necessary and sub-grid sampling is used in regions of uniform deformation [53, 40, 41].
However, it is computationally difficult to compute off-diagonal components of the density matrix using
Gauss quadratures, and this makes LSSGQ difficult to extend to nonlocal pseudopotentials.
Suryanarayana [52] subsequently showed that LSSGQ may be considered a generalization of the Fermi
operator expansion (FOE) method using Lagrange polynomials. This work also proves error estimates for
FOE using a polynomial basis, and showed that Gauss quadratures were the most efficient. It also shows
that purification methods using Chebyshev polynomials is equivalent to the use of Clenshaw-Curtis quadra-
tures. This understanding has led to a series of efficient algorithms using spectral quadratures [43, 55] and
applications to first-principles molecular dynamics [55, 64, 48].
The first goal of this work is to retain the efficiency of LSSGQ, but also extend it to nonlocal pseu-
dopotentials. We do so by using LSSGQ for computing the electronic states with atomic positions and
Clenshaw-Curtis quadratures for computing the nonlocal contribution of the forces on the atom. We also
introduce a domain decomposition that enables parallel implementation.
The second goal of this work is to use this algorithm to study defects in magnesium. Magnesium is
abundantly available on the earth’s crust, the lightest among all commonly used structural metals, and has
among the highest strength to weight ratio [23, 35, 39]. Aluminum is a commonly used alloying element,
and the relative strength of AZ class of magnesium alloys can be attributed to the hexagonal closed-packed
(HCP) structure of the magnesium matrix and the
β
phase Mg
17
Al
12
precipitates with body-centred cubic
structure (space group
I
̄
43
m
) [35, 23, 39, 33].
We study isolated vacancy and isolated substitutional aluminum solute in a magnesium lattice along
with defect pairs – vacany pairs, solute-vacancy and solute pairs. These pairs play an important role in
determining the mechanical behavior and processing of magnesium and its alloys. Vacancy clusters can
give rise to prismatic dislocation loops[41, 13] or serve as nuclei for voids which in turn are important for
spall [32, 18, 12]. Such clusters can only form if vacancies in fact can bind. Similarly, aluminum has
limited solubility in magnesium and the resulting Mg
17
Al
12
precipitates play a critical role in strengthening
magnesium alloys [39, 23, 33, 34, 14]. This in turn requires both the diffusion of aluminum in a magnesium
lattice and an accumulation of aluminum. The diffusion is greatly aided by the formation of solute-vacancy
pairs while the accumulation of aluminum is aided by the binding of aluminum solutes. Finally, vacancy
diffusion is important for dislocation climb, a critical mechanism in creep, and the formation of solute-
vacancy pairs are again important. Previous DFT studies have shown a solute vacancy binding energy that is
significantly smaller than that experimentally observed, and non-binding of Al solute pairs raising questions
about the mechanism of formation of Al-rich precipitates. We show that the study of these defects require
large computational cells of the type afforded by our algorithm, and suggests that previous contradictory
results may have been artifacts of small computational cells.
We introduce our method in Section 2 and describe the numerical implementation in Section 3. We study
the convergence and performance of our implementation in Section 4. We study defects in Section 5, and
close with brief comments in Section 6.
2
2 Methodology
2.1 Density Operator Formulation of Kohn Sham Density Functional Theory
We consider a cuboidal domain
Ω
with
N
atoms and
N
e
valence electrons. Let
R
=
{
R
1
,
R
2
,...,
R
N
}
be the positions of the nuclei with valence charges
{
Z
1
,Z
2
,...,Z
N
}
, respectively. The free energy of this
system in Kohn-Sham Density Functional Theory (DFT) [21, 27] and expressed in terms of density operator
[36, 61] is
F
(
γ,
R
) = 2 Tr
(
−
1
2
∇
2
γ
)
+
E
xc
(
ρ
) + 2 Tr (
V
γ
) +
E
el
(
ρ,
R
)
−
θS
(
γ
)
,
(1)
where
γ
is the density operator,
Tr(
.
)
denotes the trace of an operator,
ρ
(
x
)
:=
2
γ
(
x
,
x
)
is the electron density.
The first term in (1) is the kinetic energy of non-interacting electrons and
E
xc
is the exchange correlation
energy in the local density approximation (LDA). Specifically we use the Perdew-Wang parameterization
[38] of the correlation energy calculated by Ceperley-Alder [9]. The third term,
2 Tr (
V
γ
)
, is the contribution
of the non-local pseudopotential to the free energy.
V
is the non-local pseudopotential operator given by
V
(
x
,
x
′
) =
∑
J
V
J
(
x
,
x
′
) =
∑
J
∑
lm
c
Jl
χ
Jlm
(
x
,,
R
J
)
χ
Jlm
(
x
′
,
R
J
)
where
V
J
is the contribution to the non-local pseudopotential operator from the
J
th
atom, and the summation
over
J
is over all nuclei whose supports of the non-local projectors
χ
Jlm
overlap with domain
Ω
. The fourth
term is the electrostatic energy which is expressed within the local reformulation framework [54, 37, 15] as
E
el
(
ρ,
R
) = sup
φ
{
−
1
8
π
∫
Ω
|∇
φ
(
x
,
R
)
|
2
d
x
+
∫
Ω
(
ρ
(
x
) +
b
(
x
,
R
))
φ
(
x
,
R
) d
x
}
+
E
self
(
R
)
,
(2)
where
φ
denotes the electrostatic potential,
b
represents the total pseudocharge density of the nuclei and
E
self
(
R
)
is the self energy of the nuclei
E
self
(
R
) =
−
1
2
∑
I
∫
Ω
b
I
(
x
,
R
)
V
I
(
x
,
R
)d
x
,
(3)
where
b
I
is the pseudocharge density of the
I
th
nucleus generating the potential
V
I
(the summation over
I
is over all nuclei in
R
3
whose pseudocharge densities overlap with
Ω
). The final term in (1) is the electronic
entropy arising from the partial occupancies of the electronic states at a finite electronic temperature
θ
with
S
(
γ
) =
−
2
k
B
Tr
[
(
γ
log
γ
+ (
I−
γ
) log(
I−
γ
)
]
(4)
and
I
the identity operator.
The
ground state
in DFT obtained by minimizing the functional
F
(
γ,
R
)
over all atomic positions
R
and all density operators
γ
associated with
N
e
electrons. It is convenient to nest this minimization problem
by first calculating the
electronic ground state
:
ˆ
F
(
R
) =
inf
{
γ
s.t.
2 Tr(
γ
)=
N
e
}
F
(
γ,
R
)
(5)
and then relaxing over all atomic configurations
F
0
= inf
R
ˆ
F
(
R
)
.
(6)
The Euler-Lagrange equation to the variational problem (5) is a nonlinear fixed-point problem:
γ
=
g
(
H
,λ
f
;
θ
) =
(
1 + exp
(
H−
λ
f
I
k
B
θ
))
−
1
(7)
3
where
k
B
θ
is the electronic smearing, the Fermi energy
λ
f
is the Lagrange multiplier employed to enforce
the constraint on the number of electrons, and the Hamiltonian
H
is
H
=
−
1
2
∇
2
+
V
xc
+
φ
+
V
,
(8)
with
V
xc
=
δE
xc
/δρ
the exchange-correlation potential and
φ
the solution of the Poisson equation
−
1
4
π
∇
2
φ
(
x
,
R
) =
ρ
(
x
) +
b
(
x
,
R
)
(9)
subject to appropriate boundary conditions. Note that
V
eff
=
V
xc
+
φ
is local (diagonal operator) and
hence its action on a function
f
is given by
(
V
eff
f
)(
x
) =
V
eff
(
x
)
f
(
x
)
. The action of the non-local
pseudopotential operator
V
is given by
(
V
f
)(
x
) =
∑
J
V
J
f
=
∑
J
∑
lm
c
Jl
χ
Jlm
(
x
)
∫
Ω
χ
Jlm
(
x
′
,
R
J
)
f
(
x
′
)d
x
′
.
(10)
After evaluating the electronic ground state, the free energy is calculated using the functional:
ˆ
F
(
R
) =
U
+
E
xc
(
ρ
)
−
∫
Ω
V
xc
(
ρ
)
ρ
(
x
)d
x
+
1
2
∫
Ω
(
b
(
x
,
R
)
−
ρ
(
x
))
φ
(
x
,
R
)d
x
−
θS
+
E
self
(
R
)
,
(11)
where
U
= 2 Tr(
γ
H
)
is the
band structure energy
. The atomic force on the
J
th
atom is calculated using the
expression
f
J
=
∂
ˆ
F
(
R
)
∂
R
J
=
∫
Ω
∇
b
J
(
x
,
R
J
)
φ
(
x
,
R
)d
x
−
4 Tr (
V
J
∇
γ
)
(12)
where the first term is local (recall that
φ
(
x
,
R
)
depends only on the local or diagonal components
ρ
(
x
) =
γ
(
x
,
x
)
of the density operator) while the second term is non-local and requires the off-diagonal terms of the
density operator.
2.2 Linear Scaling Spectral Quadrature
We follow the Spectral Quadrature (SQ) method [40, 41, 53, 52, 43, 55] for solving the DFT problem. The
key idea is to ground state quantities as (Reimann-Stieltjes) integrals over the spectrum of the Hamiltonian.
Given the Fermi level
λ
F
and the Hamiltonian
H
, we can use (7) to write
(
η,γη
) =
∫
σ
g
(
H
,λ
f
;
θ
)d
μ
η,η
(
λ
) =
∫
σ
g
(
λ,λ
f
;
θ
)d
μ
η,η
(
λ
)
(13)
for any function
η
where
σ
is the spectrum of
H
, and
μ
η,η
is the spectral measure of
H
contracted with
η
.
We use the spectral theorem [46] to obtain the second equality. We can now use quadratures to approximate
the integral. In this work we use Gauss quadratures to find the electronic ground state and Clenshaw-Curtis
quadratures to find the force on an atom.
Spectral Gauss quadrature
We follow the linear scaling spectral Gauss quadrature (LSSGQ) method of
Suryanarayana, Bhattacharya and Ortiz [53] that exploits the structure of Gauss quadratures to evaluate the
electronic ground state. In Gauss quadratures, we approximate any function
f
(
λ
)
in terms of Lagrange
polynomials
P
η
k
(
λ
)
as
f
(
λ
)
≈
K
∑
k
=0
f
(
λ
η
k
)
P
η
k
(
λ
)
(14)
4
where
K
is the degree of the expansion and
λ
η
k
are the spectral nodes. We can use this expansion to approx-
imate the integral of the function
f
over the spectrum of
H
,
I
[
f
] =
∫
σ
f
(
λ
) d
μ
η,η
(
λ
)
≈
K
∑
k
=0
f
(
λ
η
k
)
(
∫
σ
P
η
k
(
λ
)d
μ
η,η
(
λ
)
)
=
K
∑
k
=1
w
η
k
f
(
λ
η
k
)
(15)
where the spectral weight
w
η
k
denotes the integral
∫
σ
P
η
k
(
λ
)d
μ
η,η
(
λ
)
.
Now, consider a discretization of the computational domain using a regular finite difference grid with
N
d
points, and let
{
η
q
}
N
d
q
=1
be a set of orthonormal basis functions associated with this discretization such that
η
q
is compactly supported near the
q
th
grid point. We can then approximate the integrals that make up the
electronic ground state (cf. (11)) as
U
=
N
d
∑
q
=1
u
q
, ,S
=
N
d
∑
q
=1
s
q
, N
e
=
N
d
∑
q
=1
ρ
q
,
(16)
where
u
q
= 2
∫
σ
λg
(
λ,λ
f
;
θ
) d
μ
η
q
,η
q
(
λ
)
≈
K
∑
k
=1
w
η
q
k
λ
η
q
k
g
(
λ
η
q
k
,λ
f
;
θ
)
,
(17)
s
q
=
−
2
k
B
∫
σ
s
(
λ,λ
f
;
θ
) d
μ
η
q
,η
q
(
λ
)
≈−
2
k
B
K
∑
k
=1
w
η
q
k
s
(
λ
η
q
k
,λ
f
;
θ
)
,
(18)
ρ
q
= 2
∫
σ
g
(
λ,λ
f
;
θ
) d
μ
η
q
,η
q
(
λ
)
≈
2
K
∑
k
=1
w
η
q
k
g
(
λ
η
q
k
,λ
f
;
θ
)
,
(19)
g
(
λ,λ
f
;
θ
)
is the Fermi-Dirac function
g
(
λ,λ
f
;
θ
) =
1
1 + exp(
λ
−
λ
f
k
B
θ
)
,
(20)
and
s
(
λ,λ
f
;
θ
)
is the pointwise contribution to the electronic entropy
s
(
λ,λ
f
;
θ
) =
g
(
λ,λ
f
;
θ
) log
g
(
λ,λ
f
;
θ
) + (1
−
g
(
λ,λ
f
;
θ
)) log(1
−
g
(
λ,λ
f
;
θ
))
.
(21)
In Gauss quadrature, the weights
{
w
η
k
}
K
k
=1
are fixed apriori, and the spectral nodes
{
λ
η
k
}
K
k
=1
are treated as
unknowns. An efficient way of evaluating the spectral weights and nodes at any grid point
q
is by employing
the Lanczos iteration
b
q
k
+1
v
q
k
+1
= (
H−
a
q
k
+1
)
v
q
k
−
b
q
k
v
q
k
−
1
, k
= 0
,
1
,...,K
−
1
v
q
−
1
= 0
,v
q
0
=
η
q
,b
q
0
= 1
,
(22)
where
a
q
k
+1
= (
H
v
q
k
,v
q
k
)
,
k
= 0
,
1
,...,K
−
1
, and
b
q
k
is computed such that
||
v
q
k
||
= 1
,k
= 0
,
1
,...,K
−
1
.
We denote the Jacobi matrix
ˆ
J
q
K
as
ˆ
J
q
K
=
a
q
1
b
q
1
b
q
1
a
q
2
b
q
2
.
.
.
.
.
.
.
.
.
b
q
K
−
2
a
q
K
−
1
b
q
K
−
1
b
q
K
−
1
a
q
K
(23)
5
The nodes
{
w
η
q
k
}
K
k
=1
are calculated as the eigenvalues of
ˆ
J
q
K
, and the weights
{
λ
η
q
k
}
K
k
=1
are the squares of
the first elements of the normalized eigenvectors of
ˆ
J
q
K
.
The key observations of LSSGQ are that (i) the number of quadrature points
K
is independent of system
size and (ii) the vectors
v
q
k
remains zero except for a small region around the grid point
q
during Lanczos
iteration (22). Therefore, the evaluation of the spectral nodes at each grid point is
O
(1)
as is the evaluation of
all the electronic quantities of interest (cf. (11) and (16)). This enables us to evaluate the electronic ground
state at linear cost.
We further observe that this limited support of the vectors
v
q
k
enables the restriction of the Hamiltonian
H
to an appropriate subspace of the real-space computational domain
Ω
. This enables us to use domain
decomposition in our numerical implementation discussed in Section. 3.
Spectral Clenshaw-Curtis quadrature
While the Gauss quadrature provides a very efficient approach
to evaluating the electronic ground state since it depends only on the diagonal terms of the density matrix,
it is unable to evaluate quantities like the contribution of the non-local pseudopotential to the atomic force
that depends on the off-diagonal components. Therefore, we use the spectral Clenshaw-Curtis quadrature
[52, 43, 55].
The Hamiltonian is first scaled and shifted such that its spectrum lies in the interval
[
−
1
,
1]
ˆ
H
=
H−
ω
I
τ
(24)
where
ω
= (
λ
max
+
λ
min
)
/
2
,
τ
= (
λ
max
−
λ
min
)
/
2
, and
λ
max
,
λ
min
are the maximum and minimum
eigenvalues of
H
, respectively. Given any function
η
, we can rewrite (13) using the scaled and shifted
Hamiltonian
ˆ
H
:
(
η,γη
) =
∫
σ
g
(
H
,λ
f
;
θ
)d
μ
η,η
(
λ
) =
∫
1
−
1
g
(
ˆ
H
,
ˆ
λ
f
;
ˆ
θ
)d
μ
η,η
(
ˆ
λ
) =
∫
1
−
1
g
(
ˆ
λ,
ˆ
λ
f
;
ˆ
θ
)d
μ
η,η
(
ˆ
λ
)
(25)
where
ˆ
λ
f
= (
λ
f
−
ω
)
/τ
and
ˆ
θ
=
θ/τ
are the scaled Fermi energy and the scaled electronic temperature
respectively. In the Clenshaw-Curtis variant of the spectral quadrature, any function
f
(
ˆ
λ
)
is expanded in
terms of Chebyshev polynomials
T
η
k
(
ˆ
λ
)
as:
f
(
ˆ
λ
)
≈
K
∑
k
=0
f
(
ˆ
λ
η
k
)
T
η
k
(
ˆ
λ
)
(26)
where
K
is the degree of the expansion,
ˆ
λ
η
k
are the spectral nodes which are fixed in Clenshaw-Curtis
quadrature. The non-local component of the atomic force [43, 55] as
4 Tr (
V
J
∇
γ
)
≈
4
N
d
∑
p
=1
(
η
q
,
(
V
J
∇
g
(
H
,λ
f
;
θ
))
η
q
) = 4
N
d
∑
q
=1
(
η
q
,
(
V
J
∇
g
(
ˆ
H
,
ˆ
λ
f
;
ˆ
θ
))
η
q
)
≈
4
N
d
∑
q
=1
K
∑
k
=0
c
q
k
η
∗
q
V
J
∇
v
q
k
(27)
where
c
q
k
are constants
c
q
k
=
2
π
∫
1
−
1
g
(
λ,
ˆ
λ
f
;
ˆ
θ
)
T
k
(
λ
)
√
1
−
λ
2
d
λ ,
(28)
6
Atomic positions
•
Pseudocharge
•
Guess electron density
•
Nonlocal pseudopotential
Linearized
Hamiltonian at
grid point
Spectral Gauss
Quadrature at grid
point
Fermi energy
solve
Electron density, band structure
energy and electronic entropy at
grid point
Electrostatic
potential at
grid point
Effective potential
at grid point
Potential mixing
Convergence?
No
Spectral
Clenshaw
-
Curtis Quadrature at
grid point
Yes
Atomic forces
Self Consistent Field iteration
Figure 1: Flowchart of the Self Consistent Field iteration for solving the DFT problem.
and
v
q
k
+1
=
T
k
(
ˆ
H
)
η
q
are functions which are determined by the three term recurrence relation for Chebyshev
polynomials:
v
q
k
+1
= 2
ˆ
H
v
q
k
−
v
q
k
−
1
(29)
v
q
1
=
ˆ
H
η
q
,v
q
0
=
η
q
.
Once again, the number of quadratures is independent of the system size and therefore this evaluation scales
linearly.
3 Numerical Implementation
Fig. 1 shows the flowchart of the scheme employed to solve the DFT problem. The non-linear fixed point
problem (Eqn. 7) is solved using the self-Consistent field (SCF) iteration. Briefly, given a charge density
and electrostatic potential, we construct a linearized Hamiltonian which is used to compute the spectral
weights nodes using Lanczos iteration from which the updated electronic states including an updated charge
density and electrostatic potential are determined; the process iterates till convergence. Once the electronic
ground state is established, the atomic relaxation for the overall ground state (Eqn. 6) is achieved using the
Non-Linear Conjugate Gradient (NLCG) method [49].
We discuss further details of the key components of the implementation below. In this paper, we are
interested in isolated defects. Therefore, we consider a computational domain that is embedded in an infinite
perfect crystal. The method can be extended to other situations including periodic boundary conditions and
isolated clusters surrounded by vacuum.
Discretization
Let
Ω =
L
1
×
L
2
×
L
3
be a cuboidal computational domain aligned with the coordinate
axis, and let it be discretized using a regular
n
1
×
n
2
×
n
3
grid so that the grid spacing is
h
i
in the
i
th
7
coordinate where
L
i
=
n
i
h
i
. We index the grid points with
q
and let
K
Ω
be the set of all grid points. We
discretize the gradient and Laplace operators using finite differences on this grid [17, 16].
Hamiltonian at each grid point
In each iteration of SCF, we need to determine the spectral nodes and
weights at each grid point. As evident from (22) and (29), this requires the calculation of the action of
Hamiltonian on vectors
v
q
k
. These vectors are compactly supported around a ball centered at the
q
th
grid
point. Therefore, it is sufficient to work with the Hamiltonian
H
q
projected on to a smaller subspace around
the
q
th
grid point. Specifically, we work with a cuboidal region of side
2
R
cut
and centered at the
q
th
grid
point which we call
region of influence
. This is shown schematically in Fig. 2a. The controllable parameter
R
cut
depends on the order of the quadrature which in turn depends on material properties and electronic
smearing temperature
θ
[43, 55].
For grid points close to the edge of
Ω
, the nodal Hamiltonian can extend beyond the computational
domain for grid points which are near the boundary of
Ω
. However, since we consider problems where
our computational domain is embedded in an infinite crystal, we have to compute the Hamiltonian on an
extended region
Ω
′
of size
(
L
1
+ 2
R
cut
)
×
(
L
2
+ 2
R
cut
)
×
(
L
3
+ 2
R
cut
)
, and centered at the origin. The
values of the Hamiltonian associated with the annular
Ω
′
\
Ω
is obtained using precomputed electronic fields
(
ρ
and
φ
) for the perfect crystal.
Domain decomposition
We use domain decomposition for parallel implementation. The computational
domain is partitioned into disjoint domains,
Ω =
N
p
⋃
p
=1
Ω
p
, where
Ω
p
denotes the domain local to the
p
th
processor, and
N
p
is the total number of processors. The collection of grid points belonging to the
p
th
processor is denoted by
K
p
Ω
, where
K
Ω
=
N
p
⋃
p
=1
K
p
Ω
, and
K
p
Ω
⋂
K
q
Ω
=
∅
(null set) for
p
6
=
q
. In our
implementation, we use the DMDA class available through the Portable, Extensible Toolkit for Scientific
Computation (PETSc) [3, 4] for mesh management. The communication between processes is handled by
Message Passing Interface (MPI) libraries [19, 20].
The region of influence of an grid point
q
owned by a process
p
(i.e.
q
∈
K
p
Ω
) may extend to the
spatial regions owned by neighboring processes. In such a situation, the values of the effective potential
V
eff
from neighboring processes are communicated to the process
p
using an MPI communicator. In Fig.
2b we schematically illustrate the parallel communication pattern involved for communicating the effective
potential
V
eff
from neighboring processes. This reduces the number of MPI related calls otherwise required
for matrix vector products.
Spectral weights, nodes and electronic fields
In each SCF iteration, the spectral weights
{
w
q
k
}
K
k
=1
and
nodes
{
λ
q
k
}
K
k
=1
are computed at every grid point
q
∈
K
Ω
from the projected Hamiltonian
H
q
. Overall, this
is computationally the most expensive part of the method. However, the computation is local and with no
MPI related calls. Further, we do not explicitly store the matrices, and their multiplication with a vector is
performed in a matrix-free manner. These lead to excellent parallel efficiency.
Once the spectral weights
{
w
q
k
}
K
k
=1
and spectral nodes
{
λ
q
k
}
K
k
=1
are computed at all the grid points
q
∈
K
p
Ω
for all the processes
p
, we first solve the following equation for the Fermi energy
λ
f
:
N
e
= 2
N
p
∑
p
=1
∑
q
∈
K
p
Ω
K
∑
k
=1
w
q
k
g
(
λ
q
k
,λ
f
;
θ
)
.
(30)
We utilize Brent’s algorithm [7] for this purpose. The summation across the polynomial degree
k
and the
grid point
q
is performed locally in each processes, and the summation across the processes
p
is performed
with one global MPI communication call.
8
(a) region of influence
(b) neighbor communicator
Figure 2: (a) Region of influence associated with the
q
th
grid point and extended domain. (b) Neighbor com-
municator in domain decomposition. The orange region is the union of partitioned domains that influence
the process
p
.
Once the Fermi energy is calculated, we calculate the point-wise band structure energy
u
q
, the point-wise
entropy
s
q
and the point-wise electron density
ρ
q
, following (17), (18), (19). These are all local.
Electrostatic and effective potential
Once the electron density
ρ
is calculated at the grid points, we cal-
culate the total electrostatic potential
φ
at the grid points by solving the Poisson equation (9) on
Ω
subject
to Dirichlet boundary conditions obtained from the perfect crystal outside using the generalized minimal
residual algorithm (GMRES) [47]. Once the electrostatic potential is calculated at every grid point
q
∈
Ω
,
we calculate the effective potential
V
eff
at every grid point
V
q
eff
=
V
xc
(
ρ
q
) +
φ
q
,
(31)
where
V
xc
(
ρ
q
)
is the exchange correlation potential.
The convergence of the SCF iteration is accelerated by employing mixing schemes. In every SCF iter-
ation, we mix the effective potential
V
eff
, where we have the option of employing Anderson mixing [2],
Pulay mixing scheme [44] and its periodic [5] and restarted [42] variants.
We check the convergence of SCF iteration by calculating the normalized error in the effective potential.
Free energy
The free energy is computed once the SCF iteration has converged using a discrete version of
Eqn. 11
ˆ
F
(
R
)
≈
ˆ
F
h
(
R
) =
N
p
∑
p
=1
∑
q
∈
K
p
Ω
[
u
q
+
h
1
h
2
h
3
{
(
ε
q
xc
−
V
q
xc
)
ρ
q
+
1
2
(
b
q
−
ρ
q
)
φ
q
}
−
2
θs
q
]
+
E
h
self
,
(32)
9
where
ε
q
xc
is the contribution of the exchange correlation energy at the
q
th
grid point, and
E
h
self
is the discrete
representation of the self energy (3):
E
h
self
=
−
1
2
(
h
1
h
2
h
3
)
N
p
∑
p
=1
∑
q
∈
K
p
Ω
∑
I
b
q
I
V
q
I
(33)
In evaluating them, the sum over the grid points are carried out locally on each MPI process followed by a
global sum across all the MPI processes
p
.
Atomic forces
The final step is the computation of the atomic forces (Eqn. 12). This has two parts, the
contributions of the local pseudopotential and the non-local pseudopotentials. The contribution of the local
pseudopotential to the atomic force is calculated as
f
J,l
=
∫
Ω
∇
b
J
(
x
,
R
J
)
φ
(
x
,
R
)d
x
≈
h
1
h
2
h
3
N
p
∑
p
=1
∑
q
∈
K
p
Ω
(
∇
h
b
J
)
q
φ
q
(34)
where
∇
h
is the gradient operator in the discrete setting. The summation over the the grid points
q
is local
to every process, followed by one summation over the MPI processes.
The non-local contribution to the atomic force is calculated by employing the Clenshaw-Curtis quadra-
ture described in Section 2.2. At each grid point
q
∈
K
Ω
, we calculate the discrete Chebyshev vectors
v
k
q
using the iterative scheme (29), and the Clenshaw-Curtis quadrature weights
c
k
q
using (28) with the discrete
nodal Hamiltonian
H
q
. The non-local force is given by
f
J,nl
=
−
4 Tr(
V
J
∇
γ
)
≈−
4
N
p
∑
p
=1
∑
q
∈
K
p
Ω
e
T
q
V
q
J
∇
h
(
K
∑
k
=0
c
k
q
v
q
k
)
.
(35)
The calculation of the atomic forces scales linearly with the number of atoms.
4 Convergence and Performance
We set the electronic temperature to be
k
B
θ
= 0.03333 Ha, and use Troullier-Martins non-local pseudopo-
tentials [57]. These yield lattice parameters of
a
= 6.043 Bohr,
c
= 9.848 Bohr and
c/a
ratio of 1.629 for
hexagonal closed packed (HCP) magnesium, and
a
=19.58 Bohr for body centered cubic (BCC) Mg
17
Al
12
.
These agree with the previously reported values of the lattice constants:
a
=5.972 Bohr,
c/a
=1.61 (DFT) [10],
a
=6.066 Bohr,
c/a
=1.623 (experiment)[26, 28] for Mg, and the equilibrium lattice constant of the Mg
17
Al
12
or MgAl phase is
a
= 19.96 Bohr (DFT) [11], and
a
= 19.653 Bohr (experiment) [63] for Mg
17
Al
12
.
Convergence of spectral quadrature
We first verify the convergence with respect to spectral quadrature.
We take the same degree
K
for both the Gauss and the Clenshaw-Curtis quadrature. We study HCP Mg and
BCC MgAl. For each of these systems, we randomly perturb the atoms from the ground state to obtain the
test configuration, and use a mesh of
h
= 0
.
6
Bohr for pure Mg and
h
= 0
.
5
Bohr for MgAl.
Fig. 3a shows the convergence of the energy and Fig. 3b shows the convergence of atomic forces with
K
. The error is computed with respect to the reference that is taken to be the solutions for
K
=
240
. The
decay is not monotone because neither the free energy functional (Eqn. 1) nor the atomic forces (Eqn. 12) is
variational with respect to
K
. However, it broadly follows an exponential decay (Error
≈
Ae
−
βK
). The best
fit
β
for the various cases is shown in Table 1. The pure Mg system has a higher rate of convergence than the
10
20
40
60
80
100 120
10
-8
10
-6
10
-4
10
-2
10
0
(a) energy
20
40
60
80
100 120
10
-7
10
-6
10
-5
10
-4
10
-3
10
-2
(b) force
Figure 3: Convergence of (a) energy and (b) force as a function of degree of spectral quadrature.
Table 1: Rates of convergence
β
of the total energy, local force and the total force with the spectral quadrature
polynomial order
K
.
System
Energy
Force (local)
Force (total)
MgAl
0.070
0.075
0.036
Mg
0.109
0.088
0.042
MgAl system. Furthermore, in both the cases, the rate of convergence in total energy and the local component
of the atomic force is similar, whereas the rate of convergence in the total force, which require calculating
the non-local force component is smaller by almost a factor 2. This is so because the local component of the
atomic force and energy use only Gauss quadrature which depends on Lagrange polynomials whereas the
non-local component of atomic forces depends on Clenshaw Curtis quadrature with Chebyshev polynomials.
The former is known to be more accurate than the latter [52].
Since we require an accuracy of about
1
×
10
−
3
and
1
×
10
−
4
Ha/atom in our energy, we need between
60
and
80
for Mg, and between
60
and
90
for MgAl. Similarly, since we require an accuracy between
1
×
10
−
3
and
1
×
10
−
4
Ha/Bohr in the total force, we need between
50
and
100
for Mg and MgAl. These guide our
further calculations.
Convergence with radius of truncation
A loose upper bound of the size
R
of Lanczos vectors in (22)
is given by
2
R
≤
hn
o
K
, where
h
is the average mesh spacing, and
n
o
is the finite difference order. This
suggests a choice of
2
R
cut
=
hn
o
K
. Using the choice of parameters used for this study,
R
cut
≈
288
Bohr.
This would make the calculations extremely expensive. However, we now show that it is not necessary by
studying the error as a function of
R
cut
.
For the Mg and and MgAl systems, we use a Lanczos polynomial degree
K
L
=
80
and a Chebyshev
polynomial degree
K
C
=
100
, and vary
R
cut
from
4
.
0
to
14
.
0
Bohr. Fig. 4a and 4b shows the error in energy
and atomic force as a function of
R
cut
. The reference values of energy and atomic force is calculated using
an
R
cut
=
16
Bohr. From these figures, we observe that as
R
cut
increases, the error in energy and forces
decrease. We once again observe generally exponential decay, though it is not monotone (also see [43]).
This allows us to treat
R
cut
as a controllable approximation parameter.
11
4
6
8
10
12
14
10
-5
10
-4
10
-3
10
-2
10
-1
(a) energy
4
6
8
10
12
14
10
-5
10
-4
10
-3
10
-2
(b) force
Figure 4: Convergence of (a) energy and (b) total atomic force as a function of truncation radius
R
cut
of the
nodal Hamiltonian.
Scaling and performance
Next, we turn to the scaling and efficiency of the formulation and parallel
implementation developed in this work. We choose magnesium crystal with one aluminum solute atom,
placed at the center of the computational domain. The simulation parameters used are
h
=
0
.
6
Bohr,
R
cut
=
12
.
0
Bohr,
K
L
=
80
and
K
C
=
100
with a desired accuracy of
2
×
10
−
4
Ha/atom in energy and
2
×
10
−
4
Ha/Bohr in atomic force.
We first perform a strong scaling study with a
600
atom system, varying the number of cores from
30
to
420
. The wall times for each SCF iteration is presented in Fig 5a. The parallel efficiency on
420
relative to 30
cores is
92
.
6
percent. This data is plotted in terms of speed up in Fig. 5b. Next, we perform a weak scaling
study by selecting systems with sizes ranging from
64
atoms to
2560
atoms, while increasing the number of
processors from
30
to
900
. We choose these such that the number of atoms per MPI process is between two
and three. Fig 5c shows that the variation of CPU time for one SCF iteration versus the number of atoms is
linear (
≈ O
(
N
)
). Fig. 5d shows that the memory required also scales linearly with respect to the number
of atoms. All of this shows excellent scaling of our algorithm. This is due in part to restricting much of the
parallel communication to the MPI processes that are neighbors, and keeping the global communication to a
minimum.
We note that the spectral quadrature step accounts for greater than
98
percent of the total time in each SCF
iteration. The prefactor of spectral quadrature can be significantly reduced by incorporating reduced basis
methods such as Discrete Discontinious Basis Projection ([62]). Further, the number of SCF iterations to
achieve a fixed target SCF error increases with system size in metallic systems due to charge sloshing [24].
The introduction of real space preconditioning schemes [50, 29] is likely to reduce this for large metallic
systems.
5 Defects in magnesium
We now study isolated point defects and defect pairs in magnesium. Of particular interest are the formation
energy of isolated defects and binding energy of defect pairs. Let
E
(
M,n,m
)
be the energy of a crystal
with
M
solvent atoms,
n
solute atoms and
m
vacancies. The formation energy of a defect cluster with
n
solute atoms and
m
vacancies is the excess energy of the crystal with defect cluster over the those of perfect
12
10
2
10
3
10
1
10
2
10
3
(a) strong scaling
100
200
300
400
500
2
4
6
8
10
12
14
(b) speed up
10
2
10
3
10
1
10
2
10
3
(c) weak scaling
10
2
10
3
10
2
10
3
(d) memory scaling
Figure 5: Scaling and performance of the framework. The dash dot line is the ideal linear scaling behavior.
crystals of the host and solute:
E
f
n,m
=
E
(
M
−
n
−
m,n,m
)
−
M
−
n
−
m
M
E
(
M,
0
,
0)
−
n
̄
E
,
(36)
where
̄
E
is the energy per atom of the solute in its perfect crystalline state. Note that when we have an
isolated solute and no vacancies, the formation energy
E
f
1
,
0
is referred to as dilute impurity energy. Further,
the binding energy of this cluster is
E
b
n,m
=
nE
f
1
,
0
+
mE
f
0
,
1
−
E
f
n,m
.
(37)
Note that the defect is stable when the formation energy is negative, and the defect cluster has favorable
binding when the binding energy is positive.
Isolated point defect
We calculate the formation energy of a monovacancy for various computational
domain size from
63
atoms to
1151
atoms, and the results are shown in Fig. 6a. We observe that the
formation energy strongly depends on cell size, and converges at cell sizes of approximately
1000
atoms
13
10
2
10
3
0.65
0.70
0.75
0.80
0.85
(a) Formation energy of a vacancy
(b) Electron density near a vacancy
10
2
10
3
-4.75
-4.70
-4.65
-4.60
-4.55
-4.50
-4.45
-4.40
-4.35
(c) Dilute impurity energy of Al
(d) Electron density near a Al solute
Figure 6: Vacancy (a,b) and an aluminum solute (c,d) in magnesium. The calculated vacancy formation
energy (a) and dilute impurity energy (c) for various computational cell size and computed electron density
along the basal plane for a vacancy (b) and solute (d). The ‘*’ marks in (b,d) indicate the projected positions
of the atoms in the basal plane at a height
c/
2
above and below this plane.
to
0
.
8456
eV. This is broadly in agreement with values reported in the literature: calculated values
0
.
779
-
0
.
768
eV using local pseudo-potential and a coarse grained approach with
1024
to
1
billion atoms [40] and
measured values of
0
.
58
-
0
.
90
eV [31, 22, 58, 60]. Fig. 6b shows the electron density on the basal plane
in the vicinity of the vacancy. Unsurprisingly, it is depleted at the vacancy. An interesting feature is that the
electron density does not display the reflection symmetry of the basal plane (e.g. about the red dashed line).
This is because the three dimensional crystal breaks this symmetry at the plane at a height
c/
2
above and
below this plane. This is emphasized by indicating the atoms on this upper and lower planes with a ‘*’ in the
figure. This observation plays a role in binding.
Next we compute the dilute impurity energy of an aluminum solute atom, and this is shown in Fig. 6c.
We again observe that this energy depends on the cell size and converges at a few hundred atoms. The
electron density on the basal plane is shown Fig. 6d. The electron density if elevated near the Al solute and
the distribution is more symmetric.
14