#COVIDisAirborne: AI-enabled multiscale computational microscopy of delta SARS-CoV-2 in a respiratory aerosol

Special Issue Paper

The International Journal of High

Performance Computing Applications

2023, Vol. 37(1) 28

–

Article reuse guidelines:

sagepub.com/journals-permissions

DOI: 10.1177/10943420221128233

journals.sagepub.com/home/hpc

#COVIDisAirborne: AI-enabled multiscale

computational microscopy of delta SARS-

CoV-2 in a respiratory aerosol

Abigail Dommer

†

, Lorenzo Casalino

†



, Fiona Kearns

†

, Mia Rosenfeld



Nicholas Wauer

, Surl-Hee Ahn



, John Russo



,So

fi

a Oliveira



, Clare Morris



Anthony Bogetti

, Anda Trifan

5,6

, Alexander Brace

5,7



, Terra Sztain

1,8

, Austin Clyde

5,7

Heng Ma

, Chakra Chennubhotla

, Hyungro Lee



, Matteo Turilli

, Syma Khalid

Teresa Tamayo-Mendoza

, Matthew Welborn

, Anders Christensen

Daniel GA Smith

, Zhuoran Qiao



, Sai K Sirumalla

, Michael O

’

Connor

Frederick Manby

, Anima Anandkumar

12,13



, David Hardy

, James Phillips



Abraham Stern

, Josh Romero

, David Clark

, Mitchell Dorrell

, Tom Maiden

Lei Huang

, John McCalpin



, Christopher Woods



, Alan Gray

, Matt Williams



Bryan Barker

, Harinda Rajapaksha

, Richard Pitts



, Tom Gibbs

, John Stone

6,13

Daniel M. Zuckerman



, Adrian J. Mulholland



, Thomas Miller III

11,12

, Shantenu Jha

Arvind Ramanathan



, Lillian Chong



and Rommie E Amaro



Abstract

We seek to completely revise current models of airborne transmissio

n of respiratory viruses by provi

ding never-before-seen atomic-

level views of the SARS-CoV-2 virus within a respiratory aerosol.

Our work dramatically extends the capabilities of multiscale

computational microscopy to address the signi

fi

cant gaps that exist in current experimental m

ethods, which are limited in their ability

to interrogate aerosols at the at

omic/molecular level and thus obscure our understa

nding of airborne transmission. We demonstrate

how our integrated data-driven platform pro

vides a new way of exploring the compositio

n, structure, and dynamics of aerosols and

aerosolized viruses, while driving simulation method developme

nt along several important axes.

We present a series of initial

scienti

fi

c discoveries for the SARS-CoV-2 Delta v

ariant, noting that the full scienti

fi

c impact of this work has yet to be realized.

UC San Diego, La Jolla, CA, USA

Oregon Health & Science University, Portland, OR, USA

University of Bristol, UK

University of Pittsburgh, Pittsburgh, PA, USA

Argonne National Laboratory, Lemont, IL, USA

University of Illinois at Urbana-Champaign, Urbana, IL, USA

University of Chicago, Chicago, IL, USA

Freie Universitat Berlin

Brookhaven National Lab and Rutgers University

University of Oxford, UK

Entos, Inc., San Diego, CA, USA

California Institute of Technology, Pasadena, CA, USA

NVIDIA Corp, Santa Clara, CA, USA

Pittsburgh Supercomputing Center, Pittsburgh, PA, USA

Texas Advanced Computing Center, Austin, TX, USA

Oracle for Research, Austin, TX, USA

†

Joint

fi

rst authors

Corresponding author:

Daniel Zuckerman, Adrian Mulholland, Thomas Miller III, Shantenu Jha, Arvind Ramanathan, Lillian Chong, Rommie E. Amaro.

Email:

ramaro@ucsd.edu

Keywords

molecular dynamics, deep learning, multiscale simulation, weighted ensemble, computational virology, SARS-CoV-2,

aerosols, COVID-19, HPC, AI, GPU, Delta

Justi

fi

cation

We develop a novel HPC-enabled multiscale research

framework to study aerosolized viruses and the full com-

plexity of species that comprise them. We present tech-

nological and methodological advances that bridge time and

length scales from electronic structure through whole

aerosol particle morphology and dynamics.

Performance attributes

Performance attribute

Our submission

Category of achievement

Scalability, Time-to-solution

Type of method used

Explicit, Deep Learning

Results reported on the basis of Whole application including I/O

Precision reported

Mixed Precision

System scale

Measured on full system

Measurement mechanism

Hardware performance

counters

Application timers

Performance Modeling

Overview of the problem.

Respiratory pathogens, such as

SARS-CoV-2 and in

fl

uenza, are the cause of signi

fi

cant

morbidity and mortality worldwide. These respiratory

pathogens are spread by virus-laden aerosols and droplets

that are produced in an infected person, exhaled, and

transported through the environment (

Wang et al., 2021

)

(

Figure 1

). Medical dogma has long focused on droplets as

the main transmission route for respiratory viruses, where

either a person has contact with an infected surface (fomites)

or direct droplet transmission by close contact with an

infected individual. However, as we continue to observe

with SARS-CoV-2, airborne transmission also plays a

signi

fi

cant role in spreading disease. We know this from

various super spreader events, for example, during a choir

rehearsal (

Miller et al., 2021

). Intervention and mitigation

decisions, such as the relative importance of surface

cleaning or whether and when to wear a mask, have un-

fortunately hinged on a weak understanding of aerosol

transmission, to the detriment of public health.

A central challenge to understanding airborne trans-

mission has been the inability of experimental science to

reliably probe the structure and dynamics of viruses once

they are inside respiratory aerosol particles. Single particle

experimental methods have poor resolution for smaller

particles (<1 micron) and are prone to sample destruction

during collection. Airborne viruses are present in low

concentrations in the air and are similarly prone to viral

inactivation during sampling. In addition, studies of the

initial infection event, for example, in the deep lung, are

limited in their ability to provide a detailed understanding of

the myriad of molecular interactions and dynamics taking

place in situ. Altogether, these knowledge gaps hamper our

collective ability to understand mechanisms of infection and

develop novel effective antivirals, as well as prevent us from

developing concrete, science-driven mitigation measures

(e.g., masking and ventilation protocols).

Here, we aim to reconceptualize current models of air-

borne transmission of respiratory viruses by providing

never-before-seen views of viruses within aerosols. Our

approach relies on the use of all-atom molecular dynamics

(MD) simulations as a multiscale

“

computational micro-

scope.

”

MD simulations can synthesize multiple types of

biological data (e.g., multiresolution structural datasets,

glycomics, lipidomics, etc.) into cohesive, biologically

“

accurate

”

structural models. Once created, we then ap-

proximate the model down to its many atoms, creating

trajectories of its time dependent dynamics under cell-like

(or in this case, aerosol-like) conditions. Critically, MD

simulations are more than just

“

pretty movies.

”

equations are solved in a theoretically rigorous manner,

allowing us to compute experimentally testable macro-

scopic observables from time-averaged microscopic prop-

erties. What this means is that we can directly connect MD

simulations with experiments, each validating and pro-

viding testable hypotheses to the other, which is the real

power of the approach. An ongoing challenge to the suc-

cessful application of such methods, however, is the need

for technological and methodological advances that make it

possible to access length scales relevant to the study of

large, biologically complex systems (spanning nanometers

∼

one micron in size) and, correspondingly, longer

timescales (microseconds to seconds).

Such challenges and opportunities manifest in the study

of aerosolized viruses. Aerosols are generally de

fi

ned as

being less than 5 microns in diameter, able to

fl

oat in the air

for hours, travel signi

fi

cant distances (i.e., can

fi

ll a room,

like cigarette smoke), and be inhaled. Fine aerosols < 1

micron in size can stay in the air for over 12 h and are

enriched with viral particles (

Fennelly 2020

;

Coleman et al.,

2021

). Our work focuses on these

fi

ner aerosols that travel

deeper into the respiratory tract. Several studies provide the

molecular recipes necessary to reconstitute respiratory

aerosols according to their actual biologically relevant

Dommer et al.

composition (

Vejerano and Marr 2018

;

Walker et al., 2021

These aerosols can contain lipids, cholesterol, albumin

(protein), various mono- and di-valent salts, mucins, other

surfactants, and water (

Figure 1

). Simulations of aerosolized

viruses embody a novel framework for the study of aerosols:

they will allow us and others to tune different species,

relative humidity, ion concentrations, etc. to match exper-

iments that can directly and indirectly connect to and inform

our simulations, as well as test hypotheses. Some of the

species under study here, for example, mucins, have not yet

been structurally characterized or explored with simulations

and thus the models we generate are expected to have

impact beyond their roles in aerosols.

In addition to varying aerosol composition and size, the

viruses themselves can be modi

fi

ed to re

fl

ect new variants

of concern, where such mutations may affect interactions

with particular species in the aerosol that might affect its

structural dynamics and/or viability. The virion developed

here is the Delta variant (B.1.617.2 lineage) of SARS-CoV-

Figure 2

), which presents a careful integration of multiple

biological datasets: (1) a complete viral envelope with re-

alistic membrane composition, (2) fully glycosylated full-

length spike proteins integrating 3D structural coordinates

from multiple cryoelectron microscopy (cryoEM) studies

(

McCallum et al., 2021

;

Wrapp et al., 2020

;

Walls et al.,

2020

;

Bangaru et al., 2020

) (3) all biologically known

features (post-translational modi

fi

cations, palmitoylation,

etc.), (4) any other known membrane proteins (e.g. the

envelope (E) and membrane (M) proteins), and (5) virion

size and patterning taken directly from cryoelectron to-

mography (cryoET). Each of the individual components of

the virus are built up before being integrated into the

composite virion, and thus represent useful molecular-scale

scienti

fi

c contributions in their own right (

Casalino et al.,

2020

;

Sztain et al., 2021

Altogether in this work, we dramatically extend the

capabilities of data-driven, multiscale computational mi-

croscopy to provide a new way of exploring the compo-

sition, structure, and dynamics of respiratory aerosols.

While a seemingly limitless number of putative hypotheses

could result from these investigations, the

fi

rst set of

questions we expect to answer are:

How does the virus exist

within a droplet of the same order of magnitude in size,

without being affected by the air-water interface, which is

known to destroy molecular structure

(

’

Imprima et al.

2019

)

? How does the biochemical composition of the

droplet, including pH, affect the structural dynamics of the

virus? Are there species within the aerosols that

“

buffer

”

the viral structure from damage, and are there particular

conditions under which the impact of those species

changes?

Our simulations can also provide speci

fi

c pa-

rameters that can be included in physical models of aerosols,

Figure 1.

Overall schematic depicting the construction and multiscale simulations of Delta SARS-CoV-2 in a respiratory aerosol. (N.B.:

The size of di-valent cations has been increased for visibility.)

The International Journal of High Performance Computing Applications 37(1)

which still assume a simple water or water-salt composition

even though it is well known that such models, for example,

using kappa-Kohler theory, break down signi

fi

cantly as the

molecular species diversify (

Petters and Kreidenweis 2007

Current state of the art

Current experimental methods are unable to directly in-

terrogate the atomic-level structure and dynamics of viruses

and other molecules within aerosols. Here we showcase

computational microscopy as a powerful tool capable to

overcome these signi

fi

cant experimental limitations. We

present the major elements of our multiscale computational

microscope and how they come together in an integrated

manner to enable the study of aerosols across multiple

scales of resolution. We demonstrate the impact such

methods can bring to bear on scienti

fi

c challenges that until

now have been intractable, and present a series of new

scienti

fi

c discoveries for SARS-CoV-2.

Parallel molecular dynamics

All-atom molecular dynamics simulation has emerged as an

increasingly powerful tool for understanding the molecular

mechanisms underlying biophysical behaviors in complex

systems. Leading simulation engines, NAMD (

Phillips

et al., 2020

), AMBER (

Case et al. [n. d.]

), and GRO-

MACS (

all et al., 2020

), are broadly useful, with each

providing unique strengths in terms of speci

fi

c methods or

capabilities as required to address a particular biological

question, and in terms of their support for particular HPC

hardware platforms. Within the multiscale computational

microscopy platform developed here, we show how each

of these different codes contributes different elements to

the overall framework, oftentimes utilizing different

computing modalities/architectures, while simultaneously

extending on state-of-the-art for each. Structure building,

simulation preparation, visualization, and post hoc tra-

jectory analysis are performed using VMD on both local

workstations and remote HPC resources, enabling mod-

eling of the molecular systems studied herein (

Humphrey

et al., 1996

;

Stone et al., 2013a

2016b

;

Sener et al.,

2021

). We show how further development of each of these

codes, considered together within the larger-scale collec-

tive framework, enables the study of SARS-CoV-2 in a

wholly novel manner, with extension to numerous other

complex systems and diseases.

AI-enhanced WE simulations

Because the virulence of the Delta variant of SARS-CoV-2

may be partly attributable to spike protein (S) opening, it is

of pressing interest to characterize the mechanism and ki-

netics of the process. Although S-opening in principle can

be studied via conventional MD simulations, in practice the

system complexity and timescales make this wholly in-

tractable. Splitting strategies that periodically replicate

promising MD trajectories, among them the weighted

Figure 2.

Individual protein components of the SARS-CoV-2 Delta virion. The spike is shown with the surface in cyan and with Delta

’

mutated residues and deletion sites highlighted in pink and yellow, respectively. Glycans attached to the spike are shown in blue. The E

protein is shown in yellow and the M-protein is shown in silver and white. Visualized with VMD.

Dommer et al.

ensemble (WE) method (

Huber and Kim 1996

;

Zuckerman

and Chong 2017

), have enabled simulations of the spike

opening of WT SARS-CoV-2 (

Sztain et al., 2021

;

Zimmerman et al., 2021

). WE simulations can be orders of

magnitude more ef

fi

cient than conventional MD in gener-

ating pathways and rate constants for rare events (e.g.

protein folding (

Adhikari et al., 2019

) and binding (

Saglam

and Chong 2019

)). The WESTPA software for running WE

(

Zwier et al., 2015

) is well-suited for high-performance

computing with nearly perfect CPU/GPU scaling. The

software is interoperable with any dynamics engine, in-

cluding the GPU-accelerated AMBER dynamics engine

(

Salomon-Ferrer et al., 2013

) that is used here. As shown

below, major upgrades to WESTPA (v. 2.0) have enabled a

dramatic demonstration of spike opening in the Delta

variant (

Figures 5

and

) and exponentially improved

analysis of spike-opening kinetics (

Russo et al., 2022

The integration of AI techniques with WE can further

enhance the ef

fi

ciency of sampling rare events (

Noe 2020

;

Brace et al., 2021b

;

Casalino et al., 2021

). One frontier area

couples unsupervised linear and non-linear dimensionality

reduction methods to identify collective variables/progress

coordinates in high-dimensional molecular systems

(

Bhowmik et al., 2018

;

Clyde et al., 2021

). Such methods

may be well suited for analyzing the aerosolized virus.

Integrating these approaches with WE simulations is ad-

vantageous in sampling the closed

→

open transitions in the

Delta S landscape (

Figure 5

) as these unsupervised AI

approaches automatically stratify progress coordinates

(

Figure 5(D)

Dynamical non-equilibrium MD

Aerosols rapidly acidify during

fl

ight via reactive uptake of

atmospheric gases, which is likely to impact the opening/

closing of the S protein (

Vejerano and Marr 2018

;

Warwicker 2021

). Here, we describe the extension of dy-

namical non-equilibrium MD (D-NEMD) (

Ciccotti and

Ferrario 2016

) to investigate pH effects on the Delta S.

D-NEMD simulations (

Ciccotti and Ferrario 2016

) are

emerging as a useful technique for identifying allosteric

effects and communication pathways in proteins (

Galdadas

et al., 2021

;

Oliveira et al., 2019

), including recently

identifying effects of linoleic acid in the WT spike (

Oliveira

et al., 2021b

). This approach complements equilibrium MD

simulations, which provide a distribution of con

fi

gurations

as starting points for an ensemble of short non-equilibrium

trajectories under the effect of the external perturbation. The

response of the protein to the perturbation introduced can

then be determined using the Kubo-Onsager relation

(

Oliveira et al., 2021a

;

Ciccotti and Ferrario 2016

) by di-

rectly tracking the change in atomic positions between the

equilibrium and non-equilibrium simulations at equivalent

points in time (

Oliveira et al., 2021a

OrbNet

ions are known to play a key role in mucin aggregation

in epithelial tissues (

Hughes et al., 2019

). Our RAV sim-

ulations would be an ideal case-study to probe such com-

plex interactions between Ca

, mucins, and the SARS-

CoV-2 virion in aerosols. However, Ca

binding energies

can be dif

fi

cult to capture accurately due to electronic

dispersion and polarization, terms which are not typically

modeled in classical mechanical force

fi

elds. Quantum

mechanical (QM) methods are uniquely suited to capture

these subtle interactions. Thus, we set out to estimate the

correlation in Ca

binding energies between

CHARMM36m and quantum mechanical estimates enabled

via AI with OrbNet. Calculation of energies with suf

fi

cient

accuracy in biological systems can, in many cases, be ad-

equately described with density functional theory (DFT).

However, its high cost limits the applicability of DFT in

comparison to

fi

xed charge force

fi

elds. To capture quantum

quality energetics at a fraction of the computational ex-

pense, we employ a novel approach (OrbNet) based on the

featurization of molecules in terms of symmetry-adapted

atomic orbitals and the use of graph neural network methods

for deep learning quantum-mechanical properties (

Qiao

et al., 2020

). Our method outperforms existing methods

in terms of its training ef

fi

ciency and transferable accuracy

across diverse molecular systems, opening a new pathway

for replacing DFT in large-scale scienti

fi

c applications such

as those explored here. (

Christensen et al., 2021

Innovations realized

Construction and simulation of SARS-CoV-2 in a respiratory

aerosol.

Our approach to simulating the entire aerosol

follows a composite framework wherein each of the indi-

vidual molecular pieces is re

fi

ned and simulated on its own

before it is incorporated into the composite model. Simu-

lations of each of the components are useful in their own

right, and often serve as the basis for biochemical and

biophysical validation and experiments (

Casalino et al.,

2020

Throughout, we refer to the original circulating SARS-

CoV-2 strain as

“

WT,

”

whereas all SARS-CoV-2 proteins

constructed in this work represent the Delta variant

(

Figure 2

). All simulated membranes re

fl

ect mammalian

ER-Golgi intermediate compartment (ERGIC) mimetic

lipid compositions. VMD (

Humphrey et al., 1996

;

Stone

et al., 2016a

), psfgen (

Phillips et al., 2005

), and CHARMM-

GUI (

Park et al., 2019

) were used for construction and

parameterization. Topologies and parameters for simula-

tions were taken from CHARMM36m all-atom additive

force

fi

elds (

Guvench et al., 2009

;

Huang and Mackerell

2013

;

Huang et al., 2017

;

Klauda et al., 2010

;

Beglov and

Roux 1994

;

Han et al., 2018

;

Venable et al., 2013

). NAMD

The International Journal of High Performance Computing Applications 37(1)

was used to perform MD simulations (

Phillips et al., 2020

adopting similar settings and protocols as in (

Casalino et al.,

2020

). All systems underwent solvation, charge neutrali-

zation, minimization, heating, and equilibration prior to

production runs. Refer to

Table 1

for Abbreviations, PBC

dimensions, total number of atoms, and total equilibration

times for each system of interest.

Simulating the SARS-CoV-2 structural proteins.

Fully glyco-

sylated Delta spike (S) structures in open and closed con-

formations were built based on WT constructs from

Casalino et al. (

Casalino et al., 2020

) with the following

mutations: T19R, T95I, G142D, E156G,

157

–

158,

L452R, T478K, D614G, P681R, and D950N (

McCallum

et al., 2021

;

Kannan et al., 2021

). Higher resolved regions

were grafted from PDB 7JJI (

Bangaru et al., 2020

). Ad-

ditionally, coordinates of residues 128

–

167

—

accounting

for a drastic conformational change seen in the Delta variant

—

graciously made available to us by the Veesler Lab,

were similarly grafted onto our constructs (

McCallum et al.,

2021

). Finally, the S proteins were glycosylated following

work by Casalino et al. (

Casalino et al., 2020

). By incor-

porating the Veesler Lab

’

s bleeding-edge structure

(

McCallum et al., 2021

) and highly resolved regions from

7JJI (

Bangaru et al., 2020

), our models represent the most

complete and accurate structures of the Delta S to date. The

S proteins were inserted into membrane patches and

equilibrated for 3 × 110 ns. For non-equilibrium and

weighted ensemble simulations, a closed S head (SH,

residues 13

–

1140) was constructed by removing the stalk

from the full-length closed S structure, then resolvated,

neutralized, minimized, and subsequently passed to WE and

D-NEMD teams. The M-protein was built from a structure

graciously provided by the Feig Lab (paper in prep). The

model was inserted into a membrane patch and equilibrated

for 700 ns. RMSD-based clustering was used to select a

stable starting M-protein conformation. From the equili-

brated and clustered M structure, VMD

’

s Mutator plugin

(

Humphrey et al., 1996

) was used to incorporate the I82T

mutation onto each M monomer to arrive at the Delta variant

M. To construct the most complete E protein model to-date,

the structure was patched together by resolving incomplete

PDBs 5X29 (

Surya et al., 2018

), 7K3G (

Mandala et al.,

2020

) and 7M4R (

Chai et al., 2021

). To do so, the trans-

membrane domain (residues 8

–

38) from 7K3G were

aligned to the N-terminal domain (residues 1

–

7) and resi-

dues 39 to 68 of 5X29 and residues 69 to 75 of 7M4R by

their C

atoms. E was then inserted into a membrane patch

and equilibrated for 40 ns.

Constructing the SARS-CoV-2 Delta virion.

The SARS-CoV-2

Delta virion (V) model was constructed following Casalino

et al. (

Casalino et al., 2021

) using CHARMM-GUI (

Lee

et al., 2016

), LipidWrapper (

Durrant and Amaro 2014

), and

Blender (

Blender Online Community 2020

), using a 350

̊

lipid bilayer with an equilibrium area per lipid of 63

̊

and

a 100 nm diameter Blender icospherical surface mesh

(

Turonova et al., 2020

). The resulting lipid membrane was

solvated in a 1100

̊

waterbox and subjected to four rounds

of equilibration and patching (

Casalino et al., 2021

). 360

dimers and 4 E pentamers were then tiled onto the surface,

followed by random placement of 29 full-length S proteins

Table 1.

Summary of all systems constructed in this work. See

Figure 3

for illustration of aerosol construction.

systems

Abb

(

̊

A×

̊

A×

̊

(ns)

M dimers

125 × 125 × 124

164,741

700

E pentamers

123 × 125 × 102

136,775

Spikes

(Open)

206 × 200 × 410

1,692,444

330

(Closed)

204 × 202 × 400

1,658,224

330

(Closed, head)

172 × 184 × 206

615,593

Mucins

short mucin 1

123 × 104 × 72

87,076

short mucin 2

120 × 101 × 72

82,155

long mucin 1

810 × 104 × 115

931,778

long mucin 2

904 × 106 × 109

997,029

long mucin 3

860 × 111 × 113

1,040,215

S+m1/m2+ALB

SMA

227 × 229 × 433

2,156,689

840

Virion

1460 × 1460 × 1460

305,326,834

Resp.Aero.+Vir

RAV

2834 × 2820 × 2828

1,016,813,441

2.42

Total FLOPS

2.4 ZFLOPS

M, E, S, SH, and V models represent SARS-CoV-2 Delta strain.

Abbreviations used throughout document.

Periodic boundary dimensions.

Total number

of atoms.

Total aggregate simulation time, including heating and equilibration runs.

Simulated with NAMD.

Simulated with NAMD, AMBER, and

GROMACS.

Dommer et al.

(9 open, 20 closed) according to experimentally observed S

protein density (

Ke et al., 2020

). M and E proteins were

oriented with intravirion C-termini. After solvation in a

1460

̊

A waterbox, the complete V model tallied >305

million atoms (

Table 1

). V was equilibrated for 41 ns prior

to placement in the respiratory aerosol (RA) model. The

equilibrated membrane was 90 nm in diameter and remains

in close structural agreement with the experimental studies

(

Ke et al., 2020

Building and simulating the respiratory aerosol.

Respiratory

aerosols contain a complex mixture of chemical and bio-

logical species. We constructed a respiratory aerosol (RA)

fl

uid based on a composition from arti

fi

cial saliva and

surrogate deep lung

fl

uid recipes (

Walker et al., 2021

). This

recipe includes 0.7 m

DPPG, 6.5 m

DPPC, 0.3 m

cholesterol, 1.4 m

, 0.8 m

, and 142 m

(

Vejerano and Marr 2018

;

Walker et al., 2021

), human

serum albumin (ALB) protein, and a composition of mucins

(

Figure 3

). Mucins are long polymer-like structures that are

decorated by dense, heterogeneous, and complex regions of

O-glycans. This work represents the

fi

rst of its kind as, due

to their complexity, the O-glycosylated regions of mucins

have never before been constructed for molecular simula-

tions. Two short (m

∼

5 nm) and three long (m

∼

55 nm) mucin models were constructed following known

experimental compositions of protein and glycosylation

sequences (

Symmes et al., 2018

;

Hughes et al., 2019

;

Markovetz et al., 2019

;

Thomsson et al., 2005

;

Mariethoz

et al., 2018

) with ROSETTA (

Raveh et al., 2010

) and

CHARMM-GUI Glycan Modeller (

Jo et al., 2011

). Mucin

models (short and long) were solvated, neutralized by

charge matching with Ca

ions, minimized, and equili-

brated for 15

–

25 ns each (

Table 1

). Human serum albumin

(ALB), which is also found in respiratory aerosols, was

constructed from PDB 1AO6 (

Sugio et al., 1999

). ALB was

solvated, neutralized, minimized, and equilibrated for 7ns.

Equilibrated structures of ALB and the three long mucins

were used in construction of the RAV with m3+m4+m5

added at 6 g/mol and ALB at 4.4 g/mol.

Constructing the respiratory aerosolized virion model

A 100 nm cubic box with the RA

fl

uid recipe speci

fi

above was built with PACKMOL (

Mart

ı

nez et al., 2009

minimized, equilibrated brie

fl

y on TACC Frontera, then

replicated to form a 300 nm cube. The RA box was then

carved into a 270 nm diameter sphere. To make space for the

placement of V within the RA, a spherical selection with

volume corresponding to that of the V membrane + S crown

(radius 734

̊

A) was deleted from the center of the RA. The

fi

nal equilibrated V model, including surrounding equili-

brated waters and ions (733

̊

A radius), was translated into

the RA. Atom clashes were resolved using a 1.2

̊

A cutoff.

Hydrogen mass repartitioning (

Hopkins et al., 2015

) was

applied to the structure to improve performance. The

simulation box was increased to 2800

̊

A per side to provide

a 100

̊

A vacuum atmospheric buffer. The RAV simulation

was conducted in an NVT ensemble with a 4 fs timestep.

After minimizing, the RAV was heated to 298 K with

0.1 kcal/mol

̊

restraints on the viral lipid headgroups, then

equilibrated for 1.5 ns. Finally, a cross-section of the RAV

model

—

including and open S, m1/m2, and ALB (called the

SMA system)

—

was constructed with PACKMOL to

closely observe atomic scale interactions within the RAV

model (

Figure 4

Parameter evaluation with OrbNet

Comparison to quantum methods reveals signi

fi

cant po-

larization effects, and shows that there is opportunity to

improve the accuracy of

fi

xed charge force

fi

elds. For the

large system sizes associated with solvated Ca

-protein

interaction motifs (over 1000 atoms, even in aggressively

truncated systems), conventional quantum mechanics

methods like density functional theory (DFT) are imprac-

tical for analyzing a statistically signi

fi

cant ensemble of

distinct con

fi

gurations (see discussion in Performance

Results). In contrast, OrbNet allows for DFT accuracy with

over 1000-fold speedup, providing a useful method for

benchmarking and re

fi

ning the force

fi

eld simulation pa-

rameters with quantum accuracy (

Christensen et al., 2021

To con

fi

rm the accuracy of OrbNet versus DFT (

B97X-D/

def2-TZVP), the inset of

Figure 4(E)

correlates the two

methods for the Ca

-binding energy in a benchmark dataset

Figure 3.

Image of RAV with relative mass ratios of RA molecular

components represented in the colorbar. Water content is

dependent on the relative humidity of the environment and is thus

omitted from the molecular ratios.

The International Journal of High Performance Computing Applications 37(1)

of small Ca

-peptide complexes (

Hu et al., 2021

). The

excellent correlation of OrbNet and DFT for the present use

case is clear from the inset

fi

gure; six datapoints were re-

moved from this plot on the basis of a diagnostic applied to

the semi-empirical GFN-xTB solution used for feature

generation of OrbNet (

Christensen et al., 2021

Figure 4

presents a comparison of the validated OrbNet

method with the CHARMM36m force

fi

eld for 1800

snapshots taken from the SMA MD simulations. At each

snapshot, a subsystem containing a solvated Ca

-protein

complex was extracted (

Figure 4(E)

), with protein bonds

capped by hydrogens. For both OrbNet and the force

fi

eld,

the Ca

-binding energy was computed and shown in the

correlation plot. Lack of correlation between OrbNet and

the force

fi

eld identi

fi

es important polarization effects,

absent in a

fi

xed charge description. Similarly, the steep

slope of the best-

fi

t line in

Figure 4(E)

fl

ects the fact that

some of the con

fi

gurations sampled using MD with the

CHARMM36m force

fi

eld are relatively high in energy

according to the more accurate OrbNet potential. This

approach allows us to test and quantify limitations of

empirical force

fi

elds, such as lack of electronic

polarization.

The practicality of OrbNet for these simulation snapshots

with 1000+ atoms offers a straightforward multiscale

strategy for re

fi

ning the accuracy of the CHARMM36m

force

fi

eld. By optimizing the partial charges and other force

fi

eld parameters, improved correlation with OrbNet for the

subtle Ca

-protein interactions could be achieved, leading

to near-quantum accuracy simulations with improved

con

fi

gurational sampling. The calculations presented here

present a proof-of-concept of this iterative strategy.

AI-WE simulations of delta spike opening

While our previous WE simulations of the WT SARS-CoV-

2 S-opening (

Sztain et al., 2021

) were notable in generating

pathways for a seconds-timescale process of a massive

system, we have made two critical technological ad-

vancements in the WESTPA software that greatly enhance

the ef

fi

ciency and analysis of WE simulations. These ad-

vances enabled striking observations of Delta variant S

opening (

Figures 5

and

). First, in contrast to prior manual

bins for controlling trajectory replication, we have devel-

oped automated and adaptive binning that enables more

fi

cient surmounting of large barriers via early identi

fi

cation of

“

bottleneck

”

regions (

Torrillo et al., 2021

). Sec-

ond, we have parallelized, memory-optimized, and

implemented data streaming for the history-augmented

Markov state model (haMSM) analysis scheme

(

Copperman and Zuckerman 2020

) to enable application to

the TB-scale S-opening datasets. The haMSM approach

Figure 4.

SMA system captured with multiscale modeling from classical MD to AI-enabled quantum mechanics. For all panels: S protein

shown in cyan, S glycans in blue, m

shown in red, ALB in orange, Ca

in yellow spheres, viral membrane in purple. A) Interactions

between mucins and S facilitated by glycans and Ca

. B) Snapshot from SMA simulations. C) Example Ca

binding site from SMA

simulations (1800 sites, each 1000+ atoms) used for AI-enabled quantum mechanical estimates from OrbNet Sky. D) Quanti

fi

cation of

contacts between S and mucin from SMA simulations. E) OrbNet Sky energies versus CHARMM36m energies for each sub-selected

system, colored by total number of atoms. Performance of OrbNet Sky versus DFT in subplot (

B97x-D3/def-TZVP,

=0.99, for 17

systems of peptides chelating Ca

(

Hu et al., 2021

)). Visualized with VMD.

Dommer et al.

estimates rate constants from simulations that have not yet

reached a steady state (

Suarez et al., 2014

Our WE simulations generated >800 atomically detailed,

Delta variant S-opening pathways (

Figures 5(B)

and

)of

the receptor binding domain (RBD) switching from a

glycan-shielded

“

down

”

to an exposed

“

”

state using 72

s of total simulation time within 14 days using 192

NVIDIA V100 GPUs at a time on TACC

’

s Longhorn su-

percomputer. Among these pathways, 83 reach an

“

open

”

state that aligns with the structure of the human ACE2-

bound WT S protein (

Benton et al., 2020

) and 18 reach a

dramatically open state (

Figure 6

). Our haMSM analysis of

WT WE simulations successfully provided long-timescale

(steady state) rate constants for S-opening based on highly

transient information (

Figure 5(C)

We also leveraged a simple, yet powerful unsupervised

deep learning method called Anharmonic Conformational

Analysis enabled Autoencoders (ANCA-AE)

Clyde et al.

(2021)

to extract conformational states from our long-

timescale WE simulations of Delta spike opening

(

Figures 5(A) and (D)

). ANCA-AE

fi

rst minimizes the

fourth order correlations in atomistic

fl

uctuations from MD

simulation datasets and projects the data onto a low di-

mensional space where one can visualize the anharmonic

conformational

fl

uctuations. These projections are then

input to an autoencoder that further minimizes non-linear

correlations in the atomistic

fl

uctuations to learn an em-

bedding where conformations are automatically clustered

based on their structural and energetic similarity. A visu-

alization of the

fi

rst three dimensions from the latent space

articulates the RBD opening motion from its closed state

(

Figure 5(D)

). It is notable that while other deep learning

techniques need special purpose hardware (such as GPUs),

the ANCA-AE approach can be run with relatively modest

CPU resources and can therefore scale to much larger

systems (e.g., the virion within aerosol) when optimized.

D-NEMD explores pH effects on delta spike

We performed D-NEMD simulations of the SH system with

GROMACS (

Abraham et al., 2015

) using a

pH=2.0 (from

7.0 to 5.0) as the external perturbation. We ran 3200-ns

equilibrium MD simulations of SH to generate 87 con

fi

urations (29 con

fi

gurations per replicate) that were used as

the starting points for multiple short (10 ns) D-NEMD

trajectories under the effect of the external perturbation

(

pH=2.0). The effect of a

pH was modeled by changing

the protonation state of histidines 66, 69, 146, 245, 625,

655, 1064, 1083, 1088, and 1101 (we note that other res-

idues may also become protonated (

Lobo and Warwicker

2021

); the D-NEMD approach can also be applied to ex-

amine those). The structural response of the S to the pH

Figure 5.

Delta variant spike opening from WE simulations, and AI/haMSM analysis. A) The integrated work

fl

ow. B) Snapshots of the

“

down,

”“

up,

”

and

“

open

”

states for Delta S-opening from a representative pathway generated by WE simulation, which represents

∼

speedup compared to conventional MD. C) Rate constant estimation with haMSM analysis of WE data (purple lines) signi

fi

cantly

improves direct WE computation (red), by comparison to experimental measurement (black dashed). Varying haMSM estimates result

from different featurizations which will be individually cross-validated. D) The

fi

rst three dimensions of the ANCA-AE embeddings

depict a clear separation between the closed (darker purple) and open (yellow) conformations of the Delta spike. A sub-sampled

landscape is shown here where each sphere represents a conformation from the WE simulations and colored with the root-mean

squared deviations (

̊

A) with respect to the closed state. Visualized with VMD.

The International Journal of High Performance Computing Applications 37(1)

decrease was investigated by measuring the difference in the

position for each C

atom between the equilibrium and

corresponding D-NEMD simulation at equivalent points in

time (

Oliveira et al., 2021a

), namely after 0, 0.1, 1, 5, and 10

ns of simulation. The D-NEMD simulations reveal that pH

changes, of the type expected in aerosols, affect the dy-

namics of functionally important regions of the spike, with

potential implications for viral behavior (

Figure 7

). As this

approach involves multiple short independent non-

equilibrium trajectories, it is well suited for cloud com-

puting. All D-NEMD simulations were performed using

Oracle Cloud.

How performance was measured

WESTPA.

For the WE simulations of spike opening using

WESTPA, we de

fi

ned the time-to-solution as the total

simulation time required to generate the

fi

rst spike opening

event. Spike opening is essentially impossible to observe

via conventional MD. WESTPA simulations were run

using the AMBER20 dynamics engine and 192 NVIDIA

V100 GPUs at a time on TACC

’

s Longhorn

supercomputer.

NAMD.

NAMD performance metrics were collected using

hardware performance counters for FLOPs/step measure-

ments, and application-internal timers for overall simulation

rates achieved by production runs including all I/O for

simulation trajectory and checkpoint output. NAMD

FLOPs/step measurements were conducted on TACC

Frontera, by querying hardware performance counters with

the rdmsr utility from Intel msr-tools

and the

“

TACC stats

”

system programs.

For each simulation, FLOP counts were

measured for NAMD simulation runs of two different step

counts. The results of the two simulation lengths were

subtracted to eliminate NAMD startup operations, yielding

an accurate estimate of the marginal FLOPs per step for a

continuing simulation (

Phillips et al., 2002

). Using the

FLOPs/step values computed for each simulation, overall

FLOP rates were computed by dividing the FLOPs/step

value by seconds/step performance data reported by NAMD

internal application timers during production runs.

GROMACS.

GROMACS 2020.4 benchmarking was per-

formed on Oracle Cloud Infrastructure (OCI)

compute

shape BM.GPU4.8 consisting of 8×NVIDIA A100 tensor

core GPUs, and 64 AMD Rome CPU cores. The simulation

Figure 6.

WE simulations reveal a dramatic opening of the Delta S (cyan), compared to WT S (white). While further investigation is

needed, this super open state seen in the Delta S may indicate increased capacity for binding to human host-cell receptors.

Dommer et al.

used for benchmarking contained 615,563 atoms and was

run for 500,000 steps with 2 fs time steps. The simulations

were run on increasing numbers of GPUs, from 1 to 8, using

eight CPU cores per GPU, running for both the production

(Nose-Hoover) and GPU-accelerated (velocity rescaling)

thermostats. Particle

–

mesh Ewald (PME) calculations were

pinned to a single GPU, with additional GPUs for multi-

GPU jobs used for particle

–

particle calculations. Perfor-

mance data (ns/day and average single-precision TFLOPS,

calculated as total number of TFLOPs divided by total job

walltime) were reported by GROMACS itself. Each

simulation was repeated four times and average perfor-

mance

fi

gures reported.

Performance results

Table 2

NAMD performance.

NAMD was used to perform all of the

simulations listed in

Table 1

, except for the closed spike

“

”

simulations described further below. With the ex-

ception of the aerosol and virion simulation, the other

Figure 7.

D-NEMD simulations reveal changes in key functional regions of the S protein, including the receptor binding domain, as the

result of a pH decrease. Color scale and ribbon thickness indicate the degree of deviation of C

atoms from their equilibrium position.

Red spheres indicate the location of positively charged histidines.

Table 2.

MD simulation

fl

oating point ops per timestep.

MD Simulation

Code

Atoms

FLOPs/step

Spike, head

AMBER, GROMACS

0.6

62.14 GFLOPs/step

Spike

NAMD

1.7

43.05 GFLOPs/step

S+m

+ALB

NAMD

2.1

54.86 GFLOPs/step

Resp. Aero.+Vir

NAMD

25.81 TFLOPs/step

FLOPs/step data were computed by direct FLOP measurements from hardware performance counters for NAMD simulations, or by using the

application-reported FLOP rates and ns/day simulation performance in the case of GROMACS.

The International Journal of High Performance Computing Applications 37(1)

NAMD simulations used conventional protocols and have

performance and parallel scaling characteristics that closely

match the results reported in our previous SARS-CoV-2

research

Casalino et al. (2021)

. NAMD 2.14 scaling per-

formance for the one billion-atom respiratory aerosol and

virion simulation run on ORNL Summit is summarized in

Tables 3

and

. A signi

fi

cant performance challenge as-

sociated with the aerosol virion simulation relates to the

roughly 50% reduction in particle density as compared with

a more conventional simulation with a fully populated

periodic cell. The reduced particle density results in large

regions of empty space that nevertheless incur additional

overheads associated with both force calculations and in-

tegration, and creates problems for the standard NAMD

load balancing scheme that estimates the work associated

with the cubic

“

patches

”

used for parallel domain decom-

position. The PME electrostatics algorithm and associated

3-D FFT and transpose operations encompass the entire

simulation unit cell and associated patches, requiring in-

volvement in communication and reduction operations

despite the inclusion of empty space. Enabling NAMD

diagnostic output on a 512-node 1B-atom aerosol and virion

simulation revealed that ranks assigned empty regions of the

periodic cell had 66 times the number of

fi

xed-size patches

as ranks assigned dense regions. The initial load estimate for

an empty patch was changed from a

fi

xed 10 atoms to a

runtime parameter with a default of 40 atoms, which re-

duced the patch ratio from 66 to 19 and doubled perfor-

mance on 512 nodes.

WESTPA performance.

Our time to solution for WE simu-

lations of spike opening (to the

“

”

state) (

Figure 5

) using

the WESTPA software and AMBER20 was 14

s of total

simulation time, which was completed in 4 days using 192

NVIDIA V100 GPUs at a time on TACC

’

s Longhorn

supercomputer. For reference, conventional MD would

require an expected

∼

5 orders of magnitude more com-

puting. The WESTPA software is highly scalable, with

nearly perfect scaling out to >1000 NVIDIA V100 GPUs

and this scaling is expected to continue until the

fi

lesystem

is saturated. Thus, WESTPA makes optimal use of large

supercomputers and is limited by

fi

lesystem I/O due to the

periodic restarting of trajectories after short time intervals.

AI-enhanced WE simulations.

DeepDriveMD is a framework

to coordinate the concurrent execution of ensemble simu-

lations and drive them using AI models

Brace et al. (2021a)

;

Lee et al. (2019)

. DeepDriveMD has been shown to im-

prove the scienti

fi

c performance of diverse problems:

from-protein folding to conformation of protein-ligand

complexes. We coupled WESTPA to DeepDriveMD,

which is responsible for resource dynamism and concurrent

heterogeneous task execution (ML and AMBER). The

coupled work

fl

ow was executed on 1024 nodes on Summit

(OLCF), and, in spite of the spatio-temporal heterogeneity

of tasks involved, the resource utilization was in the high

90%. Consistent with earlier studies, the coupling of

WESTPA to DeepDriveMD results in a 100x improvement

in the exploration of phase space.

GROMACS performance.

Figure 8

shows GROMACS par-

allelizes well across the eight NVIDIA A100 GPUs

available on each BM.GPU4.8 instance used in the

Cluster

in the Cloud

running on OCI. There is a performance drop

for two GPUs due to inef

fi

cient division of the PME and

particle

–

particle tasks. Methods to address this exist for the

two GPU case

all et al. (2020)

, but were not adopted as we

were targeting maximum raw performance across all eight

Table 3.

NAMD performance: Respiratory Aerosol + Virion, 1B

atoms, 4 fs timestep w/HMR, and PME every three steps.

Nodes

Summit

Speedup

fi

ciency

CPU + GPU

256

4.18 ns/day

∼

:

0×

∼

100%

512

7.68 ns/day

1.84×

92%

1024

13.64 ns/day

3.27×

81%

2048

23.10 ns/day

5.53×

69%

4096

34.21 ns/day

8.19×

51%

Table 4.

Peak NAMD FLOP rates, ORNL Summit.

NAMD

Simulation

Atoms, B Nodes Sim rate

Performance

Resp. Aero.+Vir 1

4096 34.21 ns/day 2.55 PFLOPS

Figure 8.

GROMACS performance across 1

–

8 A100 GPUs in ns/

day (thicker, blue lines) and the fraction of maximum theoretical

TFLOPS (thinner, green lines); production setup shown with solid

line, and runs with the GPU-accelerated thermostat in dashed.

Dommer et al.

GPUs. Production simulations achieved 27% of the peak

TFLOPS available from the GPUs. Multiple simulations

were run across 10 such compute nodes, enabling the en-

semble to run at an average combined speed of 425

TFLOPS and sampling up to 1

s/day. We note that the

calculations will be able to run 20%

–

40% faster once the

Nose-Hoover thermostat that is required for the simulation

is ported to run on the GPU. Benchmarking using a velocity

rescaling thermostat that has been ported to GPU shows that

this would enable the simulation to extract 34% of the peak

TFLOPS from the cards, enabling each node to achieve an

average speed of 53.4 TFLOPS, and 125 ns/day. A cluster of

10 nodes would enable GROMACS to run at an average

combined speed of over 0.5 PFLOPs, simulating over

1.2

s/day.

A signi

fi

cant innovation is that this power is available on

demand: Cluster in the Cloud with GPU-optimized GRO-

MACS was provisioned and benchmarked within 1 day of

inception of the project. This was handed to the researcher,

who submitted the simulations. Automatically, up to 10

BM.GPU4.8 compute nodes were provisioned on-demand

based on requests from the Slurm scheduler. These simu-

lations were performed on OCI, using

Cluster in the Cloud

Williams (2021)

to manage automatic scaling.

Cluster in the Cloud was con

fi

gured to dynamically

provision and terminate computing nodes based on the

workload. Simulations were conducted using GROMACS

2020.4 compiled with CUDA support. Multiple simulta-

neous simulations were conducted, with each simulation

utilizing a single BM.GPU4.8 node without multinode

parallelism.

This allowed all production simulations to be completed

within 2 days. The actual compute cost of the project was

less than $6125 USD (on-demand OCI list price). The huge

reduction in

“

time to science

”

that low-cost cloud enables

changes the way that researchers can access and use HPC

facilities. In our opinion, such a setup enables

“

exclusive

on-demand

”

HPC capabilities for the scienti

fi

c community

for rapid advancement in science.

OrbNet performance.

Prior benchmarking reveals that

OrbNet provides over 1000-fold speedup compared to DFT

(

Christensen et al., 2021

). For the calculations presented

here, the cost of corresponding high quality range-separated

DFT calculations (

B97X-D/def2-TZVP) can be estimated.

Figure 4(E)

, we consider system sizes which would

require 14,000

–

47,000 atomic orbitals for

B97X-D/def2-

TZVP, exceeding the range of typical DFT evaluations.

Estimation of the DFT computational cost of the 1811

con

fi

gurations studied in

Figure 4(E)

suggests a total of

115M core-hours on NERSC Cori Haswell nodes; in

contrast, the OrbNet calculations for the current study re-

quire only 100k core-hours on the same nodes. DFT cost

estimates were based on extrapolation from a dataset of over

1M ChEMBL molecules ranging in size from 40 to 107

atom systems considering only the cubic cost component of

DFT (

Christensen et al., 2021

Implications

Our major scienti

fi

c achievements are

1. We showcase an extensible AI-enabled multiscale

computational framework that bridges time and

length scales from electronic structure through

whole aerosol particle morphology and dynamics.

2. We develop all-atom simulations of respiratory

mucins, and use these to understand the structural

basis of interaction with the SARS-CoV-2 spike

protein. This has implications for viral binding in

the deep lung, which is coated with mucins. We

expect the impact of our mucin simulations to be far

reaching, as malfunctions in mucin secretion and

folding have been implicated in progression of

severe diseases such as cancer and cystic

fi

brosis.

3. We present a signi

fi

cantly enhanced all-atom model

and simulation of the SARS-CoV-2 Delta virion,

which includes the hundreds of tiled M-protein

dimers and the E-protein ion channels. This

model can be used as a basis to understand why the

Delta virus is so much more infectious than the WT

or alpha variants.

4. We develop an ultra-large (1 billion+) all-atom

simulation capturing massive chemical and bio-

logical complexity within a respiratory aerosol.

This simulation provides the

fi

rst atomic-level

views of virus-laden aerosols and is already serv-

ing as a basis to develop an untold number of

experimentally testable hypotheses. An immediate

example suggests a mechanism through which

mucins and other species, for example, lipids, which

are present in the aerosol, arrange to protect the

molecular structure of the virus, which otherwise

would be exposed to the air-water interface. This

work also opens the door for developing simulations

of other aerosols, for example, sea spray aerosols,

that are involved in regulating climate.

5. We evidence how changes in pH, which are ex-

pected in the aerosol environment, may alter dy-

namics and allosteric communication pathways in

key functional regions of the Delta spike protein.

6. We characterize atomically detailed pathways for

the spike-opening process of the Delta variant using

WE simulations, revealing a dramatically open

state that may facilitate binding to human host cells.

7. We demonstrate how parallelized haMSM analysis

of WE data can provide physical rate estimates of

spike opening, improving prior estimates by many

The International Journal of High Performance Computing Applications 37(1)

orders of magnitude. The pipeline can readily be

applied to the any variant spike protein or other

complex systems of interest.

8. We show how HPC and cloud resources can be used

to signi

fi

cantly drive down time-to-solution for

major scienti

fi

c efforts as well as connect re-

searchers and greatly enable complex collaborative

interactions.

9. We demonstrate how AI coupled to HPC at multiple

levels can result in signi

fi

cantly improved effective

performance, for example, with AI-driven

WESTPA, and extend the reach and domain of

applicability of tools ordinarily restricted to smaller,

less complex systems, for example, with OrbNet.

10. While our work provides a successful use case, it also

exposes weaknesses in the HPC ecosystem in terms of

support for key steps in large/complex computational

science campaigns. We

fi

nd lack of widespread

support for high performance remote visualization

and interactive graphical sessions for system prepa-

ration, debugging, and analy

sis with diverse science

tools to be a limiting factor in such efforts.

Acknowledgements

We thank Prof. Kim Prather for inspiring and informative

discussions about aerosols and for her commitment to convey

the airborne nature of SARS-CoV-2. We thank D. Veesler for

sharing the Delta spike NTD coo

rdinates in advance of pub-

lication. We thank B. Messer, D. Maxwell, and the Oak Ridge

Leadership Computing Facilit

y at Oak Ridge National Labo-

ratory supported by the DOE under Contract DE-AC05-

00OR22725. We thank the Texas Advanced Computing Cen-

ter Frontera team, especially D. S

tanzione and T. Cockerill, and

for compute time made available through a Director

’

sDis-

cretionary Allocation (NSF OAC-1818253). We thank the

Argonne Leadership Computing Facility supported by the DOE

under DE-AC02-06CH11357. We thank the Pittsburgh Su-

percomputer Center for providing priority queues on Bridges-2

through the XSEDE allocation NSF TG-CHE060063. We thank

N. Kern and J. Lee of the CHARMM-GUI support team for help

converting topologies between NAMD and GROMACS. We

thank J. Copperman, G. Simpson, D. Aristoff, and J. Leung for

valuable discussions and support from NIH grant GM115805.

NAMD and VMD are funded by NIH P41-GM104601. This

work was supported by the NSF Center for Aerosol Impacts on

Chemistry of the Environment (CAICE), National Science

Foundation Center for Chemical Innovation (NSF CHE-

1801971), as well as NIH GM132826, NSF RAPID MCB-

2032054, an award from the RCSA Research Corp., a UC

San Diego Moore

’

s Cancer Center 2020 SARS-CoV-2 seed

grant, to R.E.A. This work was also supported by Oracle Cloud

credits and related resources provided by the Oracle for Re-

search program. AJM and ASFO receive funding from the

European Research Council (ERC) under the European Union

’

Horizon 2020 research and innovation programme (PRE-

DACTED Advanced Grant, Grant agreement No.: 101021207).

Declaration of con

fl

icting interests

The author(s) declared no potential con

fl

icts of interest with re-

spect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following

fi

nancial support

for the research, authorship, and/or publication of this article: This

work was supported by National Science Foundation (CHE-

1801971); National Science Foundation (MCB- 2032054); Na-

tional Science Foundation (OAC-1818253); National Science

Foundation (TG-CHE060063); U.S. Department of Energy (DE-

AC02-06CH11357); U.S. Department of Energy (DE-AC05-

00OR22725); National Institutes of Health (P41-GM104601);

National Institutes of Health (R01-GM132826).

ORCID iDs

Lorenzo Casalino



https://orcid.org/0000-0003-3581-1148

Mia Rosenfeld



https://orcid.org/0000-0002-8961-8231

Surl-Hee Ahn



https://orcid.org/0000-0002-3422-805X

John Russo



https://orcid.org/0000-0002-2813-6554

fi

a Oliveira



https://orcid.org/0000-0001-8753-4950

Clare Morris



https://orcid.org/0000-0002-4314-5387

Alexander Brace



https://orcid.org/0000-0001-9873-9177

Hyungro Lee



https://orcid.org/0000-0002-4221-7094

Zhuoran Qiao



https://orcid.org/0000-0002-5704-7331

Anima Anandkumar



https://orcid.org/0000-0002-6974-6797

James Phillips



https://orcid.org/0000-0002-2296-3591

John McCalpin



https://orcid.org/0000-0002-2535-1355

Christopher Woods



https://orcid.org/0000-0001-6563-9903

Matt Williams



https://orcid.org/0000-0003-2198-1058

Richard Pitts



https://orcid.org/0000-0002-2037-3360

Daniel Zuckerman



https://orcid.org/0000-0001-7662-2031

Adrian Mulholland



https://orcid.org/0000-0003-1015-4567

Arvind Ramanathan



https://orcid.org/0000-0002-1622-5488

Lillian Chong



https://orcid.org/0000-0002-0590-483X

Rommie E Amaro



https://orcid.org/0000-0002-9275-9553

Notes

https://github.com/intel/msr-tools

https://github.com/TACC/tacc_stats

https://www.oracle.com/cloud/

https://cluster-in-the-cloud.readthedocs.io/

References

Abraham MJ, Murtola T, Schulz R, et al. (2015) GROMACS: High

performance molecular simulations through multi-level par-

allelism from laptops to supercomputers.

SoftwareX

1-2:

–

25. DOI:

10.1016/j.softx.2015.06.001

Dommer et al.