of 17
Special Issue Paper
The International Journal of High
Performance Computing Applications
2023, Vol. 37(1) 28
44
© The Author(s) 2022
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/10943420221128233
journals.sagepub.com/home/hpc
#COVIDisAirborne: AI-enabled multiscale
computational microscopy of delta SARS-
CoV-2 in a respiratory aerosol
Abigail Dommer
1,
, Lorenzo Casalino
1,
, Fiona Kearns
1,
, Mia Rosenfeld
1
,
Nicholas Wauer
1
, Surl-Hee Ahn
1
, John Russo
2
,So
fi
a Oliveira
3
, Clare Morris
1
,
Anthony Bogetti
4
, Anda Trifan
5,6
, Alexander Brace
5,7
, Terra Sztain
1,8
, Austin Clyde
5,7
,
Heng Ma
5
, Chakra Chennubhotla
4
, Hyungro Lee
9
, Matteo Turilli
9
, Syma Khalid
10
,
Teresa Tamayo-Mendoza
11
, Matthew Welborn
11
, Anders Christensen
11
,
Daniel GA Smith
11
, Zhuoran Qiao
12
, Sai K Sirumalla
11
, Michael O
Connor
11
,
Frederick Manby
11
, Anima Anandkumar
12,13
, David Hardy
6
, James Phillips
6
,
Abraham Stern
13
, Josh Romero
13
, David Clark
13
, Mitchell Dorrell
14
, Tom Maiden
14
,
Lei Huang
15
, John McCalpin
15
, Christopher Woods
3
, Alan Gray
13
, Matt Williams
3
,
Bryan Barker
16
, Harinda Rajapaksha
16
, Richard Pitts
16
, Tom Gibbs
13
, John Stone
6,13
,
Daniel M. Zuckerman
2
, Adrian J. Mulholland
3
, Thomas Miller III
11,12
, Shantenu Jha
9
,
Arvind Ramanathan
5
, Lillian Chong
4
and Rommie E Amaro
1
Abstract
We seek to completely revise current models of airborne transmissio
n of respiratory viruses by provi
ding never-before-seen atomic-
level views of the SARS-CoV-2 virus within a respiratory aerosol.
Our work dramatically extends the capabilities of multiscale
computational microscopy to address the signi
fi
cant gaps that exist in current experimental m
ethods, which are limited in their ability
to interrogate aerosols at the at
omic/molecular level and thus obscure our understa
nding of airborne transmission. We demonstrate
how our integrated data-driven platform pro
vides a new way of exploring the compositio
n, structure, and dynamics of aerosols and
aerosolized viruses, while driving simulation method developme
nt along several important axes.
We present a series of initial
scienti
fi
c discoveries for the SARS-CoV-2 Delta v
ariant, noting that the full scienti
fi
c impact of this work has yet to be realized.
1
UC San Diego, La Jolla, CA, USA
2
Oregon Health & Science University, Portland, OR, USA
3
University of Bristol, UK
4
University of Pittsburgh, Pittsburgh, PA, USA
5
Argonne National Laboratory, Lemont, IL, USA
6
University of Illinois at Urbana-Champaign, Urbana, IL, USA
7
University of Chicago, Chicago, IL, USA
8
Freie Universitat Berlin
9
Brookhaven National Lab and Rutgers University
10
University of Oxford, UK
11
Entos, Inc., San Diego, CA, USA
12
California Institute of Technology, Pasadena, CA, USA
13
NVIDIA Corp, Santa Clara, CA, USA
14
Pittsburgh Supercomputing Center, Pittsburgh, PA, USA
15
Texas Advanced Computing Center, Austin, TX, USA
16
Oracle for Research, Austin, TX, USA
Joint
fi
rst authors
Corresponding author:
Daniel Zuckerman, Adrian Mulholland, Thomas Miller III, Shantenu Jha, Arvind Ramanathan, Lillian Chong, Rommie E. Amaro.
Email:
ramaro@ucsd.edu
Keywords
molecular dynamics, deep learning, multiscale simulation, weighted ensemble, computational virology, SARS-CoV-2,
aerosols, COVID-19, HPC, AI, GPU, Delta
Justi
fi
cation
We develop a novel HPC-enabled multiscale research
framework to study aerosolized viruses and the full com-
plexity of species that comprise them. We present tech-
nological and methodological advances that bridge time and
length scales from electronic structure through whole
aerosol particle morphology and dynamics.
Performance attributes
Performance attribute
Our submission
Category of achievement
Scalability, Time-to-solution
Type of method used
Explicit, Deep Learning
Results reported on the basis of Whole application including I/O
Precision reported
Mixed Precision
System scale
Measured on full system
Measurement mechanism
Hardware performance
counters
Application timers
Performance Modeling
Overview of the problem.
Respiratory pathogens, such as
SARS-CoV-2 and in
fl
uenza, are the cause of signi
fi
cant
morbidity and mortality worldwide. These respiratory
pathogens are spread by virus-laden aerosols and droplets
that are produced in an infected person, exhaled, and
transported through the environment (
Wang et al., 2021
)
(
Figure 1
). Medical dogma has long focused on droplets as
the main transmission route for respiratory viruses, where
either a person has contact with an infected surface (fomites)
or direct droplet transmission by close contact with an
infected individual. However, as we continue to observe
with SARS-CoV-2, airborne transmission also plays a
signi
fi
cant role in spreading disease. We know this from
various super spreader events, for example, during a choir
rehearsal (
Miller et al., 2021
). Intervention and mitigation
decisions, such as the relative importance of surface
cleaning or whether and when to wear a mask, have un-
fortunately hinged on a weak understanding of aerosol
transmission, to the detriment of public health.
A central challenge to understanding airborne trans-
mission has been the inability of experimental science to
reliably probe the structure and dynamics of viruses once
they are inside respiratory aerosol particles. Single particle
experimental methods have poor resolution for smaller
particles (<1 micron) and are prone to sample destruction
during collection. Airborne viruses are present in low
concentrations in the air and are similarly prone to viral
inactivation during sampling. In addition, studies of the
initial infection event, for example, in the deep lung, are
limited in their ability to provide a detailed understanding of
the myriad of molecular interactions and dynamics taking
place in situ. Altogether, these knowledge gaps hamper our
collective ability to understand mechanisms of infection and
develop novel effective antivirals, as well as prevent us from
developing concrete, science-driven mitigation measures
(e.g., masking and ventilation protocols).
Here, we aim to reconceptualize current models of air-
borne transmission of respiratory viruses by providing
never-before-seen views of viruses within aerosols. Our
approach relies on the use of all-atom molecular dynamics
(MD) simulations as a multiscale
computational micro-
scope.
MD simulations can synthesize multiple types of
biological data (e.g., multiresolution structural datasets,
glycomics, lipidomics, etc.) into cohesive, biologically
accurate
structural models. Once created, we then ap-
proximate the model down to its many atoms, creating
trajectories of its time dependent dynamics under cell-like
(or in this case, aerosol-like) conditions. Critically, MD
simulations are more than just
pretty movies.
MD
equations are solved in a theoretically rigorous manner,
allowing us to compute experimentally testable macro-
scopic observables from time-averaged microscopic prop-
erties. What this means is that we can directly connect MD
simulations with experiments, each validating and pro-
viding testable hypotheses to the other, which is the real
power of the approach. An ongoing challenge to the suc-
cessful application of such methods, however, is the need
for technological and methodological advances that make it
possible to access length scales relevant to the study of
large, biologically complex systems (spanning nanometers
to
one micron in size) and, correspondingly, longer
timescales (microseconds to seconds).
Such challenges and opportunities manifest in the study
of aerosolized viruses. Aerosols are generally de
fi
ned as
being less than 5 microns in diameter, able to
fl
oat in the air
for hours, travel signi
fi
cant distances (i.e., can
fi
ll a room,
like cigarette smoke), and be inhaled. Fine aerosols < 1
micron in size can stay in the air for over 12 h and are
enriched with viral particles (
Fennelly 2020
;
Coleman et al.,
2021
). Our work focuses on these
fi
ner aerosols that travel
deeper into the respiratory tract. Several studies provide the
molecular recipes necessary to reconstitute respiratory
aerosols according to their actual biologically relevant
Dommer et al.
29
composition (
Vejerano and Marr 2018
;
Walker et al., 2021
).
These aerosols can contain lipids, cholesterol, albumin
(protein), various mono- and di-valent salts, mucins, other
surfactants, and water (
Figure 1
). Simulations of aerosolized
viruses embody a novel framework for the study of aerosols:
they will allow us and others to tune different species,
relative humidity, ion concentrations, etc. to match exper-
iments that can directly and indirectly connect to and inform
our simulations, as well as test hypotheses. Some of the
species under study here, for example, mucins, have not yet
been structurally characterized or explored with simulations
and thus the models we generate are expected to have
impact beyond their roles in aerosols.
In addition to varying aerosol composition and size, the
viruses themselves can be modi
fi
ed to re
fl
ect new variants
of concern, where such mutations may affect interactions
with particular species in the aerosol that might affect its
structural dynamics and/or viability. The virion developed
here is the Delta variant (B.1.617.2 lineage) of SARS-CoV-
2(
Figure 2
), which presents a careful integration of multiple
biological datasets: (1) a complete viral envelope with re-
alistic membrane composition, (2) fully glycosylated full-
length spike proteins integrating 3D structural coordinates
from multiple cryoelectron microscopy (cryoEM) studies
(
McCallum et al., 2021
;
Wrapp et al., 2020
;
Walls et al.,
2020
;
Bangaru et al., 2020
) (3) all biologically known
features (post-translational modi
fi
cations, palmitoylation,
etc.), (4) any other known membrane proteins (e.g. the
envelope (E) and membrane (M) proteins), and (5) virion
size and patterning taken directly from cryoelectron to-
mography (cryoET). Each of the individual components of
the virus are built up before being integrated into the
composite virion, and thus represent useful molecular-scale
scienti
fi
c contributions in their own right (
Casalino et al.,
2020
;
Sztain et al., 2021
).
Altogether in this work, we dramatically extend the
capabilities of data-driven, multiscale computational mi-
croscopy to provide a new way of exploring the compo-
sition, structure, and dynamics of respiratory aerosols.
While a seemingly limitless number of putative hypotheses
could result from these investigations, the
fi
rst set of
questions we expect to answer are:
How does the virus exist
within a droplet of the same order of magnitude in size,
without being affected by the air-water interface, which is
known to destroy molecular structure
(
D
Imprima et al.
2019
)
? How does the biochemical composition of the
droplet, including pH, affect the structural dynamics of the
virus? Are there species within the aerosols that
buffer
the viral structure from damage, and are there particular
conditions under which the impact of those species
changes?
Our simulations can also provide speci
fi
c pa-
rameters that can be included in physical models of aerosols,
Figure 1.
Overall schematic depicting the construction and multiscale simulations of Delta SARS-CoV-2 in a respiratory aerosol. (N.B.:
The size of di-valent cations has been increased for visibility.)
30
The International Journal of High Performance Computing Applications 37(1)
which still assume a simple water or water-salt composition
even though it is well known that such models, for example,
using kappa-Kohler theory, break down signi
fi
cantly as the
molecular species diversify (
Petters and Kreidenweis 2007
).
Current state of the art
Current experimental methods are unable to directly in-
terrogate the atomic-level structure and dynamics of viruses
and other molecules within aerosols. Here we showcase
computational microscopy as a powerful tool capable to
overcome these signi
fi
cant experimental limitations. We
present the major elements of our multiscale computational
microscope and how they come together in an integrated
manner to enable the study of aerosols across multiple
scales of resolution. We demonstrate the impact such
methods can bring to bear on scienti
fi
c challenges that until
now have been intractable, and present a series of new
scienti
fi
c discoveries for SARS-CoV-2.
Parallel molecular dynamics
All-atom molecular dynamics simulation has emerged as an
increasingly powerful tool for understanding the molecular
mechanisms underlying biophysical behaviors in complex
systems. Leading simulation engines, NAMD (
Phillips
et al., 2020
), AMBER (
Case et al. [n. d.]
), and GRO-
MACS (
P
́
all et al., 2020
), are broadly useful, with each
providing unique strengths in terms of speci
fi
c methods or
capabilities as required to address a particular biological
question, and in terms of their support for particular HPC
hardware platforms. Within the multiscale computational
microscopy platform developed here, we show how each
of these different codes contributes different elements to
the overall framework, oftentimes utilizing different
computing modalities/architectures, while simultaneously
extending on state-of-the-art for each. Structure building,
simulation preparation, visualization, and post hoc tra-
jectory analysis are performed using VMD on both local
workstations and remote HPC resources, enabling mod-
eling of the molecular systems studied herein (
Humphrey
et al., 1996
;
Stone et al., 2013a
,
b
,
2016b
;
Sener et al.,
2021
). We show how further development of each of these
codes, considered together within the larger-scale collec-
tive framework, enables the study of SARS-CoV-2 in a
wholly novel manner, with extension to numerous other
complex systems and diseases.
AI-enhanced WE simulations
Because the virulence of the Delta variant of SARS-CoV-2
may be partly attributable to spike protein (S) opening, it is
of pressing interest to characterize the mechanism and ki-
netics of the process. Although S-opening in principle can
be studied via conventional MD simulations, in practice the
system complexity and timescales make this wholly in-
tractable. Splitting strategies that periodically replicate
promising MD trajectories, among them the weighted
Figure 2.
Individual protein components of the SARS-CoV-2 Delta virion. The spike is shown with the surface in cyan and with Delta
s
mutated residues and deletion sites highlighted in pink and yellow, respectively. Glycans attached to the spike are shown in blue. The E
protein is shown in yellow and the M-protein is shown in silver and white. Visualized with VMD.
Dommer et al.
31
ensemble (WE) method (
Huber and Kim 1996
;
Zuckerman
and Chong 2017
), have enabled simulations of the spike
opening of WT SARS-CoV-2 (
Sztain et al., 2021
;
Zimmerman et al., 2021
). WE simulations can be orders of
magnitude more ef
fi
cient than conventional MD in gener-
ating pathways and rate constants for rare events (e.g.
protein folding (
Adhikari et al., 2019
) and binding (
Saglam
and Chong 2019
)). The WESTPA software for running WE
(
Zwier et al., 2015
) is well-suited for high-performance
computing with nearly perfect CPU/GPU scaling. The
software is interoperable with any dynamics engine, in-
cluding the GPU-accelerated AMBER dynamics engine
(
Salomon-Ferrer et al., 2013
) that is used here. As shown
below, major upgrades to WESTPA (v. 2.0) have enabled a
dramatic demonstration of spike opening in the Delta
variant (
Figures 5
and
6
) and exponentially improved
analysis of spike-opening kinetics (
Russo et al., 2022
).
The integration of AI techniques with WE can further
enhance the ef
fi
ciency of sampling rare events (
Noe 2020
;
Brace et al., 2021b
;
Casalino et al., 2021
). One frontier area
couples unsupervised linear and non-linear dimensionality
reduction methods to identify collective variables/progress
coordinates in high-dimensional molecular systems
(
Bhowmik et al., 2018
;
Clyde et al., 2021
). Such methods
may be well suited for analyzing the aerosolized virus.
Integrating these approaches with WE simulations is ad-
vantageous in sampling the closed
open transitions in the
Delta S landscape (
Figure 5
) as these unsupervised AI
approaches automatically stratify progress coordinates
(
Figure 5(D)
).
Dynamical non-equilibrium MD
Aerosols rapidly acidify during
fl
ight via reactive uptake of
atmospheric gases, which is likely to impact the opening/
closing of the S protein (
Vejerano and Marr 2018
;
Warwicker 2021
). Here, we describe the extension of dy-
namical non-equilibrium MD (D-NEMD) (
Ciccotti and
Ferrario 2016
) to investigate pH effects on the Delta S.
D-NEMD simulations (
Ciccotti and Ferrario 2016
) are
emerging as a useful technique for identifying allosteric
effects and communication pathways in proteins (
Galdadas
et al., 2021
;
Oliveira et al., 2019
), including recently
identifying effects of linoleic acid in the WT spike (
Oliveira
et al., 2021b
). This approach complements equilibrium MD
simulations, which provide a distribution of con
fi
gurations
as starting points for an ensemble of short non-equilibrium
trajectories under the effect of the external perturbation. The
response of the protein to the perturbation introduced can
then be determined using the Kubo-Onsager relation
(
Oliveira et al., 2021a
;
Ciccotti and Ferrario 2016
) by di-
rectly tracking the change in atomic positions between the
equilibrium and non-equilibrium simulations at equivalent
points in time (
Oliveira et al., 2021a
).
OrbNet
Ca
2+
ions are known to play a key role in mucin aggregation
in epithelial tissues (
Hughes et al., 2019
). Our RAV sim-
ulations would be an ideal case-study to probe such com-
plex interactions between Ca
2+
, mucins, and the SARS-
CoV-2 virion in aerosols. However, Ca
2+
binding energies
can be dif
fi
cult to capture accurately due to electronic
dispersion and polarization, terms which are not typically
modeled in classical mechanical force
fi
elds. Quantum
mechanical (QM) methods are uniquely suited to capture
these subtle interactions. Thus, we set out to estimate the
correlation in Ca
2+
binding energies between
CHARMM36m and quantum mechanical estimates enabled
via AI with OrbNet. Calculation of energies with suf
fi
cient
accuracy in biological systems can, in many cases, be ad-
equately described with density functional theory (DFT).
However, its high cost limits the applicability of DFT in
comparison to
fi
xed charge force
fi
elds. To capture quantum
quality energetics at a fraction of the computational ex-
pense, we employ a novel approach (OrbNet) based on the
featurization of molecules in terms of symmetry-adapted
atomic orbitals and the use of graph neural network methods
for deep learning quantum-mechanical properties (
Qiao
et al., 2020
). Our method outperforms existing methods
in terms of its training ef
fi
ciency and transferable accuracy
across diverse molecular systems, opening a new pathway
for replacing DFT in large-scale scienti
fi
c applications such
as those explored here. (
Christensen et al., 2021
).
Innovations realized
Construction and simulation of SARS-CoV-2 in a respiratory
aerosol.
Our approach to simulating the entire aerosol
follows a composite framework wherein each of the indi-
vidual molecular pieces is re
fi
ned and simulated on its own
before it is incorporated into the composite model. Simu-
lations of each of the components are useful in their own
right, and often serve as the basis for biochemical and
biophysical validation and experiments (
Casalino et al.,
2020
).
Throughout, we refer to the original circulating SARS-
CoV-2 strain as
WT,
whereas all SARS-CoV-2 proteins
constructed in this work represent the Delta variant
(
Figure 2
). All simulated membranes re
fl
ect mammalian
ER-Golgi intermediate compartment (ERGIC) mimetic
lipid compositions. VMD (
Humphrey et al., 1996
;
Stone
et al., 2016a
), psfgen (
Phillips et al., 2005
), and CHARMM-
GUI (
Park et al., 2019
) were used for construction and
parameterization. Topologies and parameters for simula-
tions were taken from CHARMM36m all-atom additive
force
fi
elds (
Guvench et al., 2009
;
Huang and Mackerell
2013
;
Huang et al., 2017
;
Klauda et al., 2010
;
Beglov and
Roux 1994
;
Han et al., 2018
;
Venable et al., 2013
). NAMD
32
The International Journal of High Performance Computing Applications 37(1)
was used to perform MD simulations (
Phillips et al., 2020
),
adopting similar settings and protocols as in (
Casalino et al.,
2020
). All systems underwent solvation, charge neutrali-
zation, minimization, heating, and equilibration prior to
production runs. Refer to
Table 1
for Abbreviations, PBC
dimensions, total number of atoms, and total equilibration
times for each system of interest.
Simulating the SARS-CoV-2 structural proteins.
Fully glyco-
sylated Delta spike (S) structures in open and closed con-
formations were built based on WT constructs from
Casalino et al. (
Casalino et al., 2020
) with the following
mutations: T19R, T95I, G142D, E156G,
Δ
157
158,
L452R, T478K, D614G, P681R, and D950N (
McCallum
et al., 2021
;
Kannan et al., 2021
). Higher resolved regions
were grafted from PDB 7JJI (
Bangaru et al., 2020
). Ad-
ditionally, coordinates of residues 128
167
accounting
for a drastic conformational change seen in the Delta variant
S
graciously made available to us by the Veesler Lab,
were similarly grafted onto our constructs (
McCallum et al.,
2021
). Finally, the S proteins were glycosylated following
work by Casalino et al. (
Casalino et al., 2020
). By incor-
porating the Veesler Lab
s bleeding-edge structure
(
McCallum et al., 2021
) and highly resolved regions from
7JJI (
Bangaru et al., 2020
), our models represent the most
complete and accurate structures of the Delta S to date. The
S proteins were inserted into membrane patches and
equilibrated for 3 × 110 ns. For non-equilibrium and
weighted ensemble simulations, a closed S head (SH,
residues 13
1140) was constructed by removing the stalk
from the full-length closed S structure, then resolvated,
neutralized, minimized, and subsequently passed to WE and
D-NEMD teams. The M-protein was built from a structure
graciously provided by the Feig Lab (paper in prep). The
model was inserted into a membrane patch and equilibrated
for 700 ns. RMSD-based clustering was used to select a
stable starting M-protein conformation. From the equili-
brated and clustered M structure, VMD
s Mutator plugin
(
Humphrey et al., 1996
) was used to incorporate the I82T
mutation onto each M monomer to arrive at the Delta variant
M. To construct the most complete E protein model to-date,
the structure was patched together by resolving incomplete
PDBs 5X29 (
Surya et al., 2018
), 7K3G (
Mandala et al.,
2020
) and 7M4R (
Chai et al., 2021
). To do so, the trans-
membrane domain (residues 8
38) from 7K3G were
aligned to the N-terminal domain (residues 1
7) and resi-
dues 39 to 68 of 5X29 and residues 69 to 75 of 7M4R by
their C
α
atoms. E was then inserted into a membrane patch
and equilibrated for 40 ns.
Constructing the SARS-CoV-2 Delta virion.
The SARS-CoV-2
Delta virion (V) model was constructed following Casalino
et al. (
Casalino et al., 2021
) using CHARMM-GUI (
Lee
et al., 2016
), LipidWrapper (
Durrant and Amaro 2014
), and
Blender (
Blender Online Community 2020
), using a 350
̊
A
lipid bilayer with an equilibrium area per lipid of 63
̊
A
2
and
a 100 nm diameter Blender icospherical surface mesh
(
Turonova et al., 2020
). The resulting lipid membrane was
solvated in a 1100
̊
A
3
waterbox and subjected to four rounds
of equilibration and patching (
Casalino et al., 2021
). 360
M
dimers and 4 E pentamers were then tiled onto the surface,
followed by random placement of 29 full-length S proteins
Table 1.
Summary of all systems constructed in this work. See
Figure 3
for illustration of aerosol construction.
a
systems
b
Abb
c
(
̊
̊
̊
A)
d
N
a
e
(ns)
f
M dimers
M
125 × 125 × 124
164,741
700
f
E pentamers
E
123 × 125 × 102
136,775
41
Spikes
f
(Open)
S
206 × 200 × 410
1,692,444
330
f
(Closed)
S
204 × 202 × 400
1,658,224
330
g
(Closed, head)
SH
172 × 184 × 206
615,593
73
μ
s
Mucins
f
short mucin 1
m
1
123 × 104 × 72
87,076
25
f
short mucin 2
m
2
120 × 101 × 72
82,155
25
f
long mucin 1
m
3
810 × 104 × 115
931,778
23
f
long mucin 2
m
4
904 × 106 × 109
997,029
15
f
long mucin 3
m
5
860 × 111 × 113
1,040,215
18
f
S+m1/m2+ALB
SMA
227 × 229 × 433
2,156,689
840
f
Virion
V
1460 × 1460 × 1460
305,326,834
41
f
Resp.Aero.+Vir
RAV
2834 × 2820 × 2828
1,016,813,441
2.42
Total FLOPS
2.4 ZFLOPS
a
M, E, S, SH, and V models represent SARS-CoV-2 Delta strain.
b
Abbreviations used throughout document.
c
Periodic boundary dimensions.
d
Total number
of atoms.
e
Total aggregate simulation time, including heating and equilibration runs.
f
Simulated with NAMD.
g
Simulated with NAMD, AMBER, and
GROMACS.
Dommer et al.
33
(9 open, 20 closed) according to experimentally observed S
protein density (
Ke et al., 2020
). M and E proteins were
oriented with intravirion C-termini. After solvation in a
1460
̊
A waterbox, the complete V model tallied >305
million atoms (
Table 1
). V was equilibrated for 41 ns prior
to placement in the respiratory aerosol (RA) model. The
equilibrated membrane was 90 nm in diameter and remains
in close structural agreement with the experimental studies
(
Ke et al., 2020
).
Building and simulating the respiratory aerosol.
Respiratory
aerosols contain a complex mixture of chemical and bio-
logical species. We constructed a respiratory aerosol (RA)
fl
uid based on a composition from arti
fi
cial saliva and
surrogate deep lung
fl
uid recipes (
Walker et al., 2021
). This
recipe includes 0.7 m
M
DPPG, 6.5 m
M
DPPC, 0.3 m
M
cholesterol, 1.4 m
M
Ca
2+
, 0.8 m
M
Mg
2+
, and 142 m
M
Na
+
(
Vejerano and Marr 2018
;
Walker et al., 2021
), human
serum albumin (ALB) protein, and a composition of mucins
(
Figure 3
). Mucins are long polymer-like structures that are
decorated by dense, heterogeneous, and complex regions of
O-glycans. This work represents the
fi
rst of its kind as, due
to their complexity, the O-glycosylated regions of mucins
have never before been constructed for molecular simula-
tions. Two short (m
1
,m
2
,
5 nm) and three long (m
3
,m
4
,m
5
55 nm) mucin models were constructed following known
experimental compositions of protein and glycosylation
sequences (
Symmes et al., 2018
;
Hughes et al., 2019
;
Markovetz et al., 2019
;
Thomsson et al., 2005
;
Mariethoz
et al., 2018
) with ROSETTA (
Raveh et al., 2010
) and
CHARMM-GUI Glycan Modeller (
Jo et al., 2011
). Mucin
models (short and long) were solvated, neutralized by
charge matching with Ca
2+
ions, minimized, and equili-
brated for 15
25 ns each (
Table 1
). Human serum albumin
(ALB), which is also found in respiratory aerosols, was
constructed from PDB 1AO6 (
Sugio et al., 1999
). ALB was
solvated, neutralized, minimized, and equilibrated for 7ns.
Equilibrated structures of ALB and the three long mucins
were used in construction of the RAV with m3+m4+m5
added at 6 g/mol and ALB at 4.4 g/mol.
Constructing the respiratory aerosolized virion model
A 100 nm cubic box with the RA
fl
uid recipe speci
fi
ed
above was built with PACKMOL (
Mart
́
ı
nez et al., 2009
),
minimized, equilibrated brie
fl
y on TACC Frontera, then
replicated to form a 300 nm cube. The RA box was then
carved into a 270 nm diameter sphere. To make space for the
placement of V within the RA, a spherical selection with
volume corresponding to that of the V membrane + S crown
(radius 734
̊
A) was deleted from the center of the RA. The
fi
nal equilibrated V model, including surrounding equili-
brated waters and ions (733
̊
A radius), was translated into
the RA. Atom clashes were resolved using a 1.2
̊
A cutoff.
Hydrogen mass repartitioning (
Hopkins et al., 2015
) was
applied to the structure to improve performance. The
simulation box was increased to 2800
̊
A per side to provide
a 100
̊
A vacuum atmospheric buffer. The RAV simulation
was conducted in an NVT ensemble with a 4 fs timestep.
After minimizing, the RAV was heated to 298 K with
0.1 kcal/mol
̊
A
2
restraints on the viral lipid headgroups, then
equilibrated for 1.5 ns. Finally, a cross-section of the RAV
model
including and open S, m1/m2, and ALB (called the
SMA system)
was constructed with PACKMOL to
closely observe atomic scale interactions within the RAV
model (
Figure 4
).
Parameter evaluation with OrbNet
Comparison to quantum methods reveals signi
fi
cant po-
larization effects, and shows that there is opportunity to
improve the accuracy of
fi
xed charge force
fi
elds. For the
large system sizes associated with solvated Ca
2+
-protein
interaction motifs (over 1000 atoms, even in aggressively
truncated systems), conventional quantum mechanics
methods like density functional theory (DFT) are imprac-
tical for analyzing a statistically signi
fi
cant ensemble of
distinct con
fi
gurations (see discussion in Performance
Results). In contrast, OrbNet allows for DFT accuracy with
over 1000-fold speedup, providing a useful method for
benchmarking and re
fi
ning the force
fi
eld simulation pa-
rameters with quantum accuracy (
Christensen et al., 2021
).
To con
fi
rm the accuracy of OrbNet versus DFT (
ω
B97X-D/
def2-TZVP), the inset of
Figure 4(E)
correlates the two
methods for the Ca
2+
-binding energy in a benchmark dataset
Figure 3.
Image of RAV with relative mass ratios of RA molecular
components represented in the colorbar. Water content is
dependent on the relative humidity of the environment and is thus
omitted from the molecular ratios.
34
The International Journal of High Performance Computing Applications 37(1)
of small Ca
2+
-peptide complexes (
Hu et al., 2021
). The
excellent correlation of OrbNet and DFT for the present use
case is clear from the inset
fi
gure; six datapoints were re-
moved from this plot on the basis of a diagnostic applied to
the semi-empirical GFN-xTB solution used for feature
generation of OrbNet (
Christensen et al., 2021
).
Figure 4
presents a comparison of the validated OrbNet
method with the CHARMM36m force
fi
eld for 1800
snapshots taken from the SMA MD simulations. At each
snapshot, a subsystem containing a solvated Ca
2+
-protein
complex was extracted (
Figure 4(E)
), with protein bonds
capped by hydrogens. For both OrbNet and the force
fi
eld,
the Ca
2+
-binding energy was computed and shown in the
correlation plot. Lack of correlation between OrbNet and
the force
fi
eld identi
fi
es important polarization effects,
absent in a
fi
xed charge description. Similarly, the steep
slope of the best-
fi
t line in
Figure 4(E)
re
fl
ects the fact that
some of the con
fi
gurations sampled using MD with the
CHARMM36m force
fi
eld are relatively high in energy
according to the more accurate OrbNet potential. This
approach allows us to test and quantify limitations of
empirical force
fi
elds, such as lack of electronic
polarization.
The practicality of OrbNet for these simulation snapshots
with 1000+ atoms offers a straightforward multiscale
strategy for re
fi
ning the accuracy of the CHARMM36m
force
fi
eld. By optimizing the partial charges and other force
fi
eld parameters, improved correlation with OrbNet for the
subtle Ca
2+
-protein interactions could be achieved, leading
to near-quantum accuracy simulations with improved
con
fi
gurational sampling. The calculations presented here
present a proof-of-concept of this iterative strategy.
AI-WE simulations of delta spike opening
While our previous WE simulations of the WT SARS-CoV-
2 S-opening (
Sztain et al., 2021
) were notable in generating
pathways for a seconds-timescale process of a massive
system, we have made two critical technological ad-
vancements in the WESTPA software that greatly enhance
the ef
fi
ciency and analysis of WE simulations. These ad-
vances enabled striking observations of Delta variant S
opening (
Figures 5
and
6
). First, in contrast to prior manual
bins for controlling trajectory replication, we have devel-
oped automated and adaptive binning that enables more
ef
fi
cient surmounting of large barriers via early identi
fi
-
cation of
bottleneck
regions (
Torrillo et al., 2021
). Sec-
ond, we have parallelized, memory-optimized, and
implemented data streaming for the history-augmented
Markov state model (haMSM) analysis scheme
(
Copperman and Zuckerman 2020
) to enable application to
the TB-scale S-opening datasets. The haMSM approach
Figure 4.
SMA system captured with multiscale modeling from classical MD to AI-enabled quantum mechanics. For all panels: S protein
shown in cyan, S glycans in blue, m
1
/m
2
shown in red, ALB in orange, Ca
2+
in yellow spheres, viral membrane in purple. A) Interactions
between mucins and S facilitated by glycans and Ca
2+
. B) Snapshot from SMA simulations. C) Example Ca
2+
binding site from SMA
simulations (1800 sites, each 1000+ atoms) used for AI-enabled quantum mechanical estimates from OrbNet Sky. D) Quanti
fi
cation of
contacts between S and mucin from SMA simulations. E) OrbNet Sky energies versus CHARMM36m energies for each sub-selected
system, colored by total number of atoms. Performance of OrbNet Sky versus DFT in subplot (
ω
B97x-D3/def-TZVP,
R
2
=0.99, for 17
systems of peptides chelating Ca
2+
(
Hu et al., 2021
)). Visualized with VMD.
Dommer et al.
35
estimates rate constants from simulations that have not yet
reached a steady state (
Suarez et al., 2014
).
Our WE simulations generated >800 atomically detailed,
Delta variant S-opening pathways (
Figures 5(B)
and
6
)of
the receptor binding domain (RBD) switching from a
glycan-shielded
down
to an exposed
up
state using 72
μ
s of total simulation time within 14 days using 192
NVIDIA V100 GPUs at a time on TACC
s Longhorn su-
percomputer. Among these pathways, 83 reach an
open
state that aligns with the structure of the human ACE2-
bound WT S protein (
Benton et al., 2020
) and 18 reach a
dramatically open state (
Figure 6
). Our haMSM analysis of
WT WE simulations successfully provided long-timescale
(steady state) rate constants for S-opening based on highly
transient information (
Figure 5(C)
).
We also leveraged a simple, yet powerful unsupervised
deep learning method called Anharmonic Conformational
Analysis enabled Autoencoders (ANCA-AE)
Clyde et al.
(2021)
to extract conformational states from our long-
timescale WE simulations of Delta spike opening
(
Figures 5(A) and (D)
). ANCA-AE
fi
rst minimizes the
fourth order correlations in atomistic
fl
uctuations from MD
simulation datasets and projects the data onto a low di-
mensional space where one can visualize the anharmonic
conformational
fl
uctuations. These projections are then
input to an autoencoder that further minimizes non-linear
correlations in the atomistic
fl
uctuations to learn an em-
bedding where conformations are automatically clustered
based on their structural and energetic similarity. A visu-
alization of the
fi
rst three dimensions from the latent space
articulates the RBD opening motion from its closed state
(
Figure 5(D)
). It is notable that while other deep learning
techniques need special purpose hardware (such as GPUs),
the ANCA-AE approach can be run with relatively modest
CPU resources and can therefore scale to much larger
systems (e.g., the virion within aerosol) when optimized.
D-NEMD explores pH effects on delta spike
We performed D-NEMD simulations of the SH system with
GROMACS (
Abraham et al., 2015
) using a
Δ
pH=2.0 (from
7.0 to 5.0) as the external perturbation. We ran 3200-ns
equilibrium MD simulations of SH to generate 87 con
fi
g-
urations (29 con
fi
gurations per replicate) that were used as
the starting points for multiple short (10 ns) D-NEMD
trajectories under the effect of the external perturbation
(
Δ
pH=2.0). The effect of a
Δ
pH was modeled by changing
the protonation state of histidines 66, 69, 146, 245, 625,
655, 1064, 1083, 1088, and 1101 (we note that other res-
idues may also become protonated (
Lobo and Warwicker
2021
); the D-NEMD approach can also be applied to ex-
amine those). The structural response of the S to the pH
Figure 5.
Delta variant spike opening from WE simulations, and AI/haMSM analysis. A) The integrated work
fl
ow. B) Snapshots of the
down,
”“
up,
and
open
states for Delta S-opening from a representative pathway generated by WE simulation, which represents
10
5
speedup compared to conventional MD. C) Rate constant estimation with haMSM analysis of WE data (purple lines) signi
fi
cantly
improves direct WE computation (red), by comparison to experimental measurement (black dashed). Varying haMSM estimates result
from different featurizations which will be individually cross-validated. D) The
fi
rst three dimensions of the ANCA-AE embeddings
depict a clear separation between the closed (darker purple) and open (yellow) conformations of the Delta spike. A sub-sampled
landscape is shown here where each sphere represents a conformation from the WE simulations and colored with the root-mean
squared deviations (
̊
A) with respect to the closed state. Visualized with VMD.
36
The International Journal of High Performance Computing Applications 37(1)
decrease was investigated by measuring the difference in the
position for each C
α
atom between the equilibrium and
corresponding D-NEMD simulation at equivalent points in
time (
Oliveira et al., 2021a
), namely after 0, 0.1, 1, 5, and 10
ns of simulation. The D-NEMD simulations reveal that pH
changes, of the type expected in aerosols, affect the dy-
namics of functionally important regions of the spike, with
potential implications for viral behavior (
Figure 7
). As this
approach involves multiple short independent non-
equilibrium trajectories, it is well suited for cloud com-
puting. All D-NEMD simulations were performed using
Oracle Cloud.
How performance was measured
WESTPA.
For the WE simulations of spike opening using
WESTPA, we de
fi
ned the time-to-solution as the total
simulation time required to generate the
fi
rst spike opening
event. Spike opening is essentially impossible to observe
via conventional MD. WESTPA simulations were run
using the AMBER20 dynamics engine and 192 NVIDIA
V100 GPUs at a time on TACC
s Longhorn
supercomputer.
NAMD.
NAMD performance metrics were collected using
hardware performance counters for FLOPs/step measure-
ments, and application-internal timers for overall simulation
rates achieved by production runs including all I/O for
simulation trajectory and checkpoint output. NAMD
FLOPs/step measurements were conducted on TACC
Frontera, by querying hardware performance counters with
the rdmsr utility from Intel msr-tools
1
and the
TACC stats
system programs.
2
For each simulation, FLOP counts were
measured for NAMD simulation runs of two different step
counts. The results of the two simulation lengths were
subtracted to eliminate NAMD startup operations, yielding
an accurate estimate of the marginal FLOPs per step for a
continuing simulation (
Phillips et al., 2002
). Using the
FLOPs/step values computed for each simulation, overall
FLOP rates were computed by dividing the FLOPs/step
value by seconds/step performance data reported by NAMD
internal application timers during production runs.
GROMACS.
GROMACS 2020.4 benchmarking was per-
formed on Oracle Cloud Infrastructure (OCI)
3
compute
shape BM.GPU4.8 consisting of 8×NVIDIA A100 tensor
core GPUs, and 64 AMD Rome CPU cores. The simulation
Figure 6.
WE simulations reveal a dramatic opening of the Delta S (cyan), compared to WT S (white). While further investigation is
needed, this super open state seen in the Delta S may indicate increased capacity for binding to human host-cell receptors.
Dommer et al.
37
used for benchmarking contained 615,563 atoms and was
run for 500,000 steps with 2 fs time steps. The simulations
were run on increasing numbers of GPUs, from 1 to 8, using
eight CPU cores per GPU, running for both the production
(Nose-Hoover) and GPU-accelerated (velocity rescaling)
thermostats. Particle
mesh Ewald (PME) calculations were
pinned to a single GPU, with additional GPUs for multi-
GPU jobs used for particle
particle calculations. Perfor-
mance data (ns/day and average single-precision TFLOPS,
calculated as total number of TFLOPs divided by total job
walltime) were reported by GROMACS itself. Each
simulation was repeated four times and average perfor-
mance
fi
gures reported.
Performance results
Table 2
.
NAMD performance.
NAMD was used to perform all of the
simulations listed in
Table 1
, except for the closed spike
SH
simulations described further below. With the ex-
ception of the aerosol and virion simulation, the other
Figure 7.
D-NEMD simulations reveal changes in key functional regions of the S protein, including the receptor binding domain, as the
result of a pH decrease. Color scale and ribbon thickness indicate the degree of deviation of C
α
atoms from their equilibrium position.
Red spheres indicate the location of positively charged histidines.
Table 2.
MD simulation
fl
oating point ops per timestep.
MD Simulation
Code
Atoms
a
FLOPs/step
Spike, head
AMBER, GROMACS
0.6
M
62.14 GFLOPs/step
Spike
NAMD
1.7
M
43.05 GFLOPs/step
S+m
1
/m
2
+ALB
NAMD
2.1
M
54.86 GFLOPs/step
Resp. Aero.+Vir
NAMD
1B
25.81 TFLOPs/step
a
FLOPs/step data were computed by direct FLOP measurements from hardware performance counters for NAMD simulations, or by using the
application-reported FLOP rates and ns/day simulation performance in the case of GROMACS.
38
The International Journal of High Performance Computing Applications 37(1)
NAMD simulations used conventional protocols and have
performance and parallel scaling characteristics that closely
match the results reported in our previous SARS-CoV-2
research
Casalino et al. (2021)
. NAMD 2.14 scaling per-
formance for the one billion-atom respiratory aerosol and
virion simulation run on ORNL Summit is summarized in
Tables 3
and
4
. A signi
fi
cant performance challenge as-
sociated with the aerosol virion simulation relates to the
roughly 50% reduction in particle density as compared with
a more conventional simulation with a fully populated
periodic cell. The reduced particle density results in large
regions of empty space that nevertheless incur additional
overheads associated with both force calculations and in-
tegration, and creates problems for the standard NAMD
load balancing scheme that estimates the work associated
with the cubic
patches
used for parallel domain decom-
position. The PME electrostatics algorithm and associated
3-D FFT and transpose operations encompass the entire
simulation unit cell and associated patches, requiring in-
volvement in communication and reduction operations
despite the inclusion of empty space. Enabling NAMD
diagnostic output on a 512-node 1B-atom aerosol and virion
simulation revealed that ranks assigned empty regions of the
periodic cell had 66 times the number of
fi
xed-size patches
as ranks assigned dense regions. The initial load estimate for
an empty patch was changed from a
fi
xed 10 atoms to a
runtime parameter with a default of 40 atoms, which re-
duced the patch ratio from 66 to 19 and doubled perfor-
mance on 512 nodes.
WESTPA performance.
Our time to solution for WE simu-
lations of spike opening (to the
up
state) (
Figure 5
) using
the WESTPA software and AMBER20 was 14
μ
s of total
simulation time, which was completed in 4 days using 192
NVIDIA V100 GPUs at a time on TACC
s Longhorn
supercomputer. For reference, conventional MD would
require an expected
5 orders of magnitude more com-
puting. The WESTPA software is highly scalable, with
nearly perfect scaling out to >1000 NVIDIA V100 GPUs
and this scaling is expected to continue until the
fi
lesystem
is saturated. Thus, WESTPA makes optimal use of large
supercomputers and is limited by
fi
lesystem I/O due to the
periodic restarting of trajectories after short time intervals.
AI-enhanced WE simulations.
DeepDriveMD is a framework
to coordinate the concurrent execution of ensemble simu-
lations and drive them using AI models
Brace et al. (2021a)
;
Lee et al. (2019)
. DeepDriveMD has been shown to im-
prove the scienti
fi
c performance of diverse problems:
from-protein folding to conformation of protein-ligand
complexes. We coupled WESTPA to DeepDriveMD,
which is responsible for resource dynamism and concurrent
heterogeneous task execution (ML and AMBER). The
coupled work
fl
ow was executed on 1024 nodes on Summit
(OLCF), and, in spite of the spatio-temporal heterogeneity
of tasks involved, the resource utilization was in the high
90%. Consistent with earlier studies, the coupling of
WESTPA to DeepDriveMD results in a 100x improvement
in the exploration of phase space.
GROMACS performance.
Figure 8
shows GROMACS par-
allelizes well across the eight NVIDIA A100 GPUs
available on each BM.GPU4.8 instance used in the
Cluster
in the Cloud
4
running on OCI. There is a performance drop
for two GPUs due to inef
fi
cient division of the PME and
particle
particle tasks. Methods to address this exist for the
two GPU case
P
́
all et al. (2020)
, but were not adopted as we
were targeting maximum raw performance across all eight
Table 3.
NAMD performance: Respiratory Aerosol + Virion, 1B
atoms, 4 fs timestep w/HMR, and PME every three steps.
Nodes
Summit
Speedup
Ef
fi
ciency
CPU + GPU
256
4.18 ns/day
1
:
100%
512
7.68 ns/day
1.84×
92%
1024
13.64 ns/day
3.27×
81%
2048
23.10 ns/day
5.53×
69%
4096
34.21 ns/day
8.19×
51%
Table 4.
Peak NAMD FLOP rates, ORNL Summit.
NAMD
Simulation
Atoms, B Nodes Sim rate
Performance
Resp. Aero.+Vir 1
4096 34.21 ns/day 2.55 PFLOPS
Figure 8.
GROMACS performance across 1
8 A100 GPUs in ns/
day (thicker, blue lines) and the fraction of maximum theoretical
TFLOPS (thinner, green lines); production setup shown with solid
line, and runs with the GPU-accelerated thermostat in dashed.
Dommer et al.
39
GPUs. Production simulations achieved 27% of the peak
TFLOPS available from the GPUs. Multiple simulations
were run across 10 such compute nodes, enabling the en-
semble to run at an average combined speed of 425
TFLOPS and sampling up to 1
μ
s/day. We note that the
calculations will be able to run 20%
40% faster once the
Nose-Hoover thermostat that is required for the simulation
is ported to run on the GPU. Benchmarking using a velocity
rescaling thermostat that has been ported to GPU shows that
this would enable the simulation to extract 34% of the peak
TFLOPS from the cards, enabling each node to achieve an
average speed of 53.4 TFLOPS, and 125 ns/day. A cluster of
10 nodes would enable GROMACS to run at an average
combined speed of over 0.5 PFLOPs, simulating over
1.2
μ
s/day.
A signi
fi
cant innovation is that this power is available on
demand: Cluster in the Cloud with GPU-optimized GRO-
MACS was provisioned and benchmarked within 1 day of
inception of the project. This was handed to the researcher,
who submitted the simulations. Automatically, up to 10
BM.GPU4.8 compute nodes were provisioned on-demand
based on requests from the Slurm scheduler. These simu-
lations were performed on OCI, using
Cluster in the Cloud
Williams (2021)
to manage automatic scaling.
Cluster in the Cloud was con
fi
gured to dynamically
provision and terminate computing nodes based on the
workload. Simulations were conducted using GROMACS
2020.4 compiled with CUDA support. Multiple simulta-
neous simulations were conducted, with each simulation
utilizing a single BM.GPU4.8 node without multinode
parallelism.
This allowed all production simulations to be completed
within 2 days. The actual compute cost of the project was
less than $6125 USD (on-demand OCI list price). The huge
reduction in
time to science
that low-cost cloud enables
changes the way that researchers can access and use HPC
facilities. In our opinion, such a setup enables
exclusive
on-demand
HPC capabilities for the scienti
fi
c community
for rapid advancement in science.
OrbNet performance.
Prior benchmarking reveals that
OrbNet provides over 1000-fold speedup compared to DFT
(
Christensen et al., 2021
). For the calculations presented
here, the cost of corresponding high quality range-separated
DFT calculations (
ω
B97X-D/def2-TZVP) can be estimated.
In
Figure 4(E)
, we consider system sizes which would
require 14,000
47,000 atomic orbitals for
ω
B97X-D/def2-
TZVP, exceeding the range of typical DFT evaluations.
Estimation of the DFT computational cost of the 1811
con
fi
gurations studied in
Figure 4(E)
suggests a total of
115M core-hours on NERSC Cori Haswell nodes; in
contrast, the OrbNet calculations for the current study re-
quire only 100k core-hours on the same nodes. DFT cost
estimates were based on extrapolation from a dataset of over
1M ChEMBL molecules ranging in size from 40 to 107
atom systems considering only the cubic cost component of
DFT (
Christensen et al., 2021
).
Implications
Our major scienti
fi
c achievements are
1. We showcase an extensible AI-enabled multiscale
computational framework that bridges time and
length scales from electronic structure through
whole aerosol particle morphology and dynamics.
2. We develop all-atom simulations of respiratory
mucins, and use these to understand the structural
basis of interaction with the SARS-CoV-2 spike
protein. This has implications for viral binding in
the deep lung, which is coated with mucins. We
expect the impact of our mucin simulations to be far
reaching, as malfunctions in mucin secretion and
folding have been implicated in progression of
severe diseases such as cancer and cystic
fi
brosis.
3. We present a signi
fi
cantly enhanced all-atom model
and simulation of the SARS-CoV-2 Delta virion,
which includes the hundreds of tiled M-protein
dimers and the E-protein ion channels. This
model can be used as a basis to understand why the
Delta virus is so much more infectious than the WT
or alpha variants.
4. We develop an ultra-large (1 billion+) all-atom
simulation capturing massive chemical and bio-
logical complexity within a respiratory aerosol.
This simulation provides the
fi
rst atomic-level
views of virus-laden aerosols and is already serv-
ing as a basis to develop an untold number of
experimentally testable hypotheses. An immediate
example suggests a mechanism through which
mucins and other species, for example, lipids, which
are present in the aerosol, arrange to protect the
molecular structure of the virus, which otherwise
would be exposed to the air-water interface. This
work also opens the door for developing simulations
of other aerosols, for example, sea spray aerosols,
that are involved in regulating climate.
5. We evidence how changes in pH, which are ex-
pected in the aerosol environment, may alter dy-
namics and allosteric communication pathways in
key functional regions of the Delta spike protein.
6. We characterize atomically detailed pathways for
the spike-opening process of the Delta variant using
WE simulations, revealing a dramatically open
state that may facilitate binding to human host cells.
7. We demonstrate how parallelized haMSM analysis
of WE data can provide physical rate estimates of
spike opening, improving prior estimates by many
40
The International Journal of High Performance Computing Applications 37(1)
orders of magnitude. The pipeline can readily be
applied to the any variant spike protein or other
complex systems of interest.
8. We show how HPC and cloud resources can be used
to signi
fi
cantly drive down time-to-solution for
major scienti
fi
c efforts as well as connect re-
searchers and greatly enable complex collaborative
interactions.
9. We demonstrate how AI coupled to HPC at multiple
levels can result in signi
fi
cantly improved effective
performance, for example, with AI-driven
WESTPA, and extend the reach and domain of
applicability of tools ordinarily restricted to smaller,
less complex systems, for example, with OrbNet.
10. While our work provides a successful use case, it also
exposes weaknesses in the HPC ecosystem in terms of
support for key steps in large/complex computational
science campaigns. We
fi
nd lack of widespread
support for high performance remote visualization
and interactive graphical sessions for system prepa-
ration, debugging, and analy
sis with diverse science
tools to be a limiting factor in such efforts.
Acknowledgements
We thank Prof. Kim Prather for inspiring and informative
discussions about aerosols and for her commitment to convey
the airborne nature of SARS-CoV-2. We thank D. Veesler for
sharing the Delta spike NTD coo
rdinates in advance of pub-
lication. We thank B. Messer, D. Maxwell, and the Oak Ridge
Leadership Computing Facilit
y at Oak Ridge National Labo-
ratory supported by the DOE under Contract DE-AC05-
00OR22725. We thank the Texas Advanced Computing Cen-
ter Frontera team, especially D. S
tanzione and T. Cockerill, and
for compute time made available through a Director
sDis-
cretionary Allocation (NSF OAC-1818253). We thank the
Argonne Leadership Computing Facility supported by the DOE
under DE-AC02-06CH11357. We thank the Pittsburgh Su-
percomputer Center for providing priority queues on Bridges-2
through the XSEDE allocation NSF TG-CHE060063. We thank
N. Kern and J. Lee of the CHARMM-GUI support team for help
converting topologies between NAMD and GROMACS. We
thank J. Copperman, G. Simpson, D. Aristoff, and J. Leung for
valuable discussions and support from NIH grant GM115805.
NAMD and VMD are funded by NIH P41-GM104601. This
work was supported by the NSF Center for Aerosol Impacts on
Chemistry of the Environment (CAICE), National Science
Foundation Center for Chemical Innovation (NSF CHE-
1801971), as well as NIH GM132826, NSF RAPID MCB-
2032054, an award from the RCSA Research Corp., a UC
San Diego Moore
s Cancer Center 2020 SARS-CoV-2 seed
grant, to R.E.A. This work was also supported by Oracle Cloud
credits and related resources provided by the Oracle for Re-
search program. AJM and ASFO receive funding from the
European Research Council (ERC) under the European Union
s
Horizon 2020 research and innovation programme (PRE-
DACTED Advanced Grant, Grant agreement No.: 101021207).
Declaration of con
fl
icting interests
The author(s) declared no potential con
fl
icts of interest with re-
spect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following
fi
nancial support
for the research, authorship, and/or publication of this article: This
work was supported by National Science Foundation (CHE-
1801971); National Science Foundation (MCB- 2032054); Na-
tional Science Foundation (OAC-1818253); National Science
Foundation (TG-CHE060063); U.S. Department of Energy (DE-
AC02-06CH11357); U.S. Department of Energy (DE-AC05-
00OR22725); National Institutes of Health (P41-GM104601);
National Institutes of Health (R01-GM132826).
ORCID iDs
Lorenzo Casalino
https://orcid.org/0000-0003-3581-1148
Mia Rosenfeld
https://orcid.org/0000-0002-8961-8231
Surl-Hee Ahn
https://orcid.org/0000-0002-3422-805X
John Russo
https://orcid.org/0000-0002-2813-6554
So
fi
a Oliveira
https://orcid.org/0000-0001-8753-4950
Clare Morris
https://orcid.org/0000-0002-4314-5387
Alexander Brace
https://orcid.org/0000-0001-9873-9177
Hyungro Lee
https://orcid.org/0000-0002-4221-7094
Zhuoran Qiao
https://orcid.org/0000-0002-5704-7331
Anima Anandkumar
https://orcid.org/0000-0002-6974-6797
James Phillips
https://orcid.org/0000-0002-2296-3591
John McCalpin
https://orcid.org/0000-0002-2535-1355
Christopher Woods
https://orcid.org/0000-0001-6563-9903
Matt Williams
https://orcid.org/0000-0003-2198-1058
Richard Pitts
https://orcid.org/0000-0002-2037-3360
Daniel Zuckerman
https://orcid.org/0000-0001-7662-2031
Adrian Mulholland
https://orcid.org/0000-0003-1015-4567
Arvind Ramanathan
https://orcid.org/0000-0002-1622-5488
Lillian Chong
https://orcid.org/0000-0002-0590-483X
Rommie E Amaro
https://orcid.org/0000-0002-9275-9553
Notes
1.
https://github.com/intel/msr-tools
2.
https://github.com/TACC/tacc_stats
3.
https://www.oracle.com/cloud/
4.
https://cluster-in-the-cloud.readthedocs.io/
References
Abraham MJ, Murtola T, Schulz R, et al. (2015) GROMACS: High
performance molecular simulations through multi-level par-
allelism from laptops to supercomputers.
SoftwareX
1-2:
19
25. DOI:
10.1016/j.softx.2015.06.001
.
Dommer et al.
41