Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action

Intelligent Resolution: Integrating Cryo-EM with AI-driven

Multi-resolution Simulations to Observe the SARS-CoV-2

Replication-Transcription Machinery in Action

Anda Trifan

1

,

2

†

, Defne Gorgun

1

,

2

†

, Zongyi Li

3

†

, Alexander Brace

1

,

4

†

, Maxim Zvyagin

1

†

, Heng Ma

1

†

Austin Clyde

1

,

4

, David Clark

5

, Michael Salim

1

, David J. Hardy

2

, Tom Burnley

6

, Lei Huang

7

, John

McCalpin

7

, Murali Emani

1

, Hyenseung Yoo

1

, Junqi Yin

8

, Aristeidis Tsaris

8

, Vishal Subbiah

9

Tanveer Raza

9

, Jessica Liu

9

, Noah Trebesch

2

, Geo

rey Wells

10

, Venkatesh Mysore

5

, Thomas

Gibbs

5

, James Phillips

1

, S. Chakra Chennubhotla

11

, Ian Foster

1

,

4

, Rick Stevens

1

,

4

, Anima

Anandkumar

3

,

5

⇤

, Venkatram Vishwanath

1

⇤

, John E. Stone

2

⇤

, Emad Tajkhorshid

2

⇤

, Sarah A.

Harris

12

⇤

, Arvind Ramanathan

1

⇤

1

Argonne National Laboratory,

2

University of Illinois Urbana-Champaign,

3

California Institute of Technology,

4

University of Chicago,

5

NVIDIA,

6

Science and Technology Facilities Council,

7

Texas Advanced Computing Center,

8

Oak Ridge National Laboratory,

9

Cerebras Inc.,

10

University College of London,

11

University of Pittsburgh,

12

University of Leeds

†

Joint

rst authors,

⇤

Contact authors: s.a.harris@leeds.ac.uk, anima@caltech.edu, venkat@anl.gov, johns@ks.uiuc.edu,

emad@illinois.edu, ramanathana@anl.gov

ABSTRACT

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)

replication transcription complex (RTC) is a multi-domain protein

responsible for replicating and transcribing the viral mRNA inside

a human cell. Attacking RTC function with pharmaceutical com-

pounds is a pathway to treating COVID-19. Conventional tools,

e.g., cryo-electron microscopy and all-atom molecular dynamics

(AAMD), do not provide su

ciently high resolution or timescale

to capture important dynamics of this molecular machine. Con-

sequently, we develop an innovative work

ow that bridges the

gap between these resolutions, using mesoscale

uctuating

nite

element analysis (FFEA) continuum simulations and a hierarchy

of AI-methods that continually learn and infer features for main-

taining consistency between AAMD and FFEA simulations. We

leverage a multi-site distributed work

ow manager to orchestrate

AI, FFEA, and AAMD jobs, providing optimal resource utilization

across HPC centers. Our study provides unprecedented access to

study the SARS-CoV-2 RTC machinery, while providing general

capability for AI-enabled multi-resolution simulations at scale.

KEYWORDS

multi-resolution simulations, SARS-CoV-2, COVID19, HPC, AI

Permission to make digital or hard copies of all or part of this work for personal or

classroom use is granted without fee provided that copies are not made or distributed

for pro

t or commercial advantage and that copies bear this notice and the full citation

on the

rst page. Copyrights for components of this work owned by others than ACM

must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,

to post on servers or to redistribute to lists, requires prior speci

c permission and/or a

fee. Request permissions from permissions@acm.org.

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

©

2020 Association for Computing Machinery.

ACM ISBN ISBN...$15.00

https://doi.org/

nalDOI

ACM Reference Format:

Anda Trifan

1

,

2

†

, Defne Gorgun

1

,

2

†

, Zongyi Li

3

†

, Alexander Brace

1

,

4

†

, Maxim

Zvyagin

1

†

, Heng Ma

1

†

, Austin Clyde

1

,

4

, David Clark

5

, Michael Salim

1

, David

J. Hardy

2

, Tom Burnley

6

, Lei Huang

7

, John McCalpin

7

, Murali Emani

1

Hyenseung Yoo

1

, Junqi Yin

8

, Aristeidis Tsaris

8

, Vishal Subbiah

9

, Tanveer

Raza

9

, Jessica Liu

9

, Noah Trebesch

2

, Geo

rey Wells

10

, Venkatesh Mysore

5

Thomas Gibbs

5

, James Phillips

1

, S. Chakra Chennubhotla

11

, Ian Foster

1

,

4

Rick Stevens

1

,

4

, Anima Anandkumar

3

,

5

⇤

, Venkatram Vishwanath

1

⇤

, John

E. Stone

2

⇤

, Emad Tajkhorshid

2

⇤

, Sarah A. Harris

12

⇤

, Arvind Ramanathan

1

⇤

2020. Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-

resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription

Machinery in Action. In

Supercomputing ’21: International Conference for

High Performance Computing, Networking, Storage, and Analysis.

ACM, New

York, NY, USA, 14 pages. https://doi.org/

nalDOI

1 JUSTIFICATION

We developed an AI-enabled multi-resolution simulation frame-

work for studying complex biomolecular machines by directly inte-

grating experimental data. Our framework sets high-water marks

for AI-driven multi-resolution simulations and achieving high uti-

lization of resources across diverse supercomputing platforms at

multiple sites.

2 PERFORMANCE ATTRIBUTES

Performance Attribute

Our Submission

Category of achievement

Scalability, Time-to-solution

Type of method used

Explicit, Deep Learning

Results reported on the basis of Whole application including I/O

Precision reported

Mixed Precision

System scale

Measured on full system

Measurement mechanism

Hardware performance counters,

Application timers,

Performance Modeling

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

Trifan et al.

Ensemble Continuum

Simulations (FFEA)

All-atom Ensemble MD Simulations

(NAMD)

Cryo-EM maps

Hierarchical AI Methods for

computational steering

Initial best

guess

Global

conformation

uctuations

Atomistic (re

ned)

Cryo-EM models

•

Atomistically

correct sub-

domain

orientations

•

Interface

potentials

Novel conformational

states

Localized

uctuations

1

2

3

4

5

6

Protein Data

Bank

Individual protein

“parts”

Figure 1: An integrative biology framework for re

ning

low resolution cryo-EM structures with multi-resolution

simulations. (1) Representing the cryo-EM density map as

a continuum visco-elastic solid. (2) Finite element analy-

sis simulations are then used to generate new conforma-

tions. AI techniques identify interesting events in the land-

scape (global conformational changes), while (3) simultane-

ously constraining them with all-atom simulations derived

protein-protein interface potentials. (4) AI methods are also

used to learn local conformational changes across the molec-

ular machine, such that they can be used to (5) re

ne do-

main orientations in the entire biomolecular complex. (6)

The output represents a set of atomistically re

ned ensem-

ble of structures that captures the conformational

uctua-

tions embodied in the cryo-EM data.

3 OVERVIEW OF THE PROBLEM

The novel coronavirus 2019 (COVID-19) pandemic has led to a

massive acceleration in the pace of development in experimental

structural biology. Multiple research teams have invested their

best resources towards the common goal of characterizing viral

components and their biological functions, thereby providing a

sign-post for the future direction of the

eld (Alam and Higgins,

2020, Bárcena et al., 2021, Barrantes, 2021, Kim and Jung, 2021).

Structural biology data, which provides the basis for rational design

of all new medicines against human and animal disease, is now

inherently multi-modal and multi-scale, requiring novel integrative

approaches (AlQuraishi, 2019, Arantes et al., 2020, Jumper et al.,

2021, Minkyung et al., 2021, Muratov et al., 2021, Padhi et al., 2021,

Tunyasuvunakool et al., 2021, Zimmerman et al., 2020).

While the mechanism of human host cell entry and infection

by SARS-CoV-2 via the spike glycoprotein is now relatively well

characterized (Barros et al., 2020, Shang et al., 2020, Sztain et al.,

2021, Zhang et al., 2021), how the SARS-CoV-2 replicates inside the

host cell is still unclear (Romano et al., 2020). The viral-RNA replica-

tion mechanism is complex, involving RNA synthesis, proofreading,

and capping and is mainly carried out by the mini-replication tran-

scription complex plus error correction machinery (mRTC+ECM

from here on referred to as RTC), having to survive against the

human immune response. Cryo-EM techniques and computational

methods have been immensely helpful in elucidating the overall

structural organization of the RTC (Chen et al., 2020, Perry et al.,

2021, Yan et al., 2020, 2021a), but the high intrinsic

exibility, size

and complexity of the nsp arrangement entails that the overall res-

olution of the data is inherently poor. Consequently, the structure

nement work

ows discard 30-40% of collected images from the

existing RTC complex datasets (Chen et al., 2020, Yan et al., 2020,

2021a), leaving signi

cant gaps in our understanding.

Although several studies have focused on disrupting the func-

tion of the individual non-structural proteins (nsps) with small

molecules, key insights into the overall structural organization,

dynamics and function of the RTC machinery are more di

cult to

obtain. This is crucial, because the ability to target protein-protein

interactions between subunits of the RTC complex o

ers far more

possibilities for drug development. Moreover, the arrangement of

individual protein components of the RTC+ECM protein is itself

dynamic during the viral life-cycle. The computational capability to

model these interactions would provide further insight of relevance

to drug development, but is currently impossible without novel

multi-scale models and the work

ows that connect them.

The primary challenge of experimental imaging is elucidating

diverse structural dynamics. This stems from the

averaging

process

of the imaging data: cryo-EM, in particular, and other experimental

techniques capture only the most sampled conformational states

as static, snapshot-like representations, but the intermediates or

transitional states are less represented (Lyumkis, 2019, Merk et al.,

2016). The details of motions within

exible domains can be en-

riched using complimentary tools such as molecular dynamics (MD)

simulations and Bayesian inference techniques (Bowerman et al.,

2017, Bratholm et al., 2015, Cavalli et al., 2007, Grishaev and Llinás,

2005, Scheres, 2012); however, the timescales accessible to these

atomistic simulations can be a limiting factor. In addition, advances

in 4D imaging modalities (Earnest et al., 2017, Engel et al., 2015,

Mahamid et al., 2016, Villa and Lasker, 2014) and the volume of data

generated from such experimental datasets can be overwhelming.

Therefore, in this paper we address the urgent, yet unmet need to

develop scalable computational tools that can aid the improvement

of resolution within cryo-EM datasets through multi-resolution

simulations. In an e

ort to bridge the gap between experimental

and purely all-atom molecular dynamics (AAMD), we leverage

a complementary mesoscale method of representing biophysical

systems, treating biomolecules as visco-elastic continuum solids

using

uctuating

nite element analysis (FFEA) (Oliver et al., 2013)

technique. These continuum-scale lower resolution FFEA simula-

tions provide a generative model for the cryo-EM data. However,

implementing an approach that directly models electron density

information from cryo-EM data requires a radically di

erent way

to model conformational ensembles, one that moves away from

atomistic-resolution towards a

continuum

-representation, where

by the intrinsic resolution of the data can be captured with nodes

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

90°

Van der Waals interaction energies

Figure 2: Hybrid structure of the FFEA mesh superimposed with the all-atom representation. The all-atom structure of the

RTC dimer is shown as a cartoon (blue) and the FFEA tetrahedral mesh structure determined from the experimental cryo-EM

map is shown as a wireframe. The top inset represents a 90

rotation of the RTC dimer capturing the range of protein-protein

interfaces in the machine. A close-up view of the mesh at the interface between the RdRp (nsp12) and nsp13 reveals the surface

interface potentials inferred from AAMD simulations. Each mesh point is painted with the surface interface potentials, with

darker shades of red indicating higher and lighter shades indicating lower interaction energy in the protein-protein interface.

and meshes, allowing experimentally obtained cryo-EM data to

be used directly in simulations, at time and length-scale orders of

magnitude increased over traditional all-atom methodologies.

We present a radically innovative work

ow, providing the op-

portunity to accelerate conformational sampling (both AAMD and

FFEA) with AI, thus holding much promise in providing mechanis-

tic insights into the dynamics of large biomolecular complexes (Fig.

1). We leverage FFEA and AAMD simulations to iteratively

ll in

the gaps of cryo-EM data by re

ning the pairwise atomic/molecular

interactions while being automatically constrained by the degrees

of freedom embodied in the cryo-EM data. This iterative coarse-

graining approach maintains the

delity between experimentally

observed densities with the detailed potentials from rigorous physics-

based AAMD simulations to learn informative

priors

and e

ectively

guide the conformational search and can result in better

ts to

the experimental observations. We demonstrate our iterative ap-

proach in modeling the complex conformational transitions within

the SARS-CoV-2 RTC to observe how the various nsps coordinate

the proofreading process of the viral-RNA. These conformational

changes can aid in understanding and guiding the design of novel

therapeutics such as molnupiravir, which aim to target speci

nsps. (Agostini et al., 2019, Sheahan et al., 2020)

4 CURRENT STATE OF THE ART

4.1 Parallel molecular dynamics

NAMD (Phillips et al., 2005, 2020) has been one of the most utilized

parallel molecular dynamics engines for over two decades, being

cited once every

⇠

70 minutes

. NAMD uses adaptive, asynchro-

nous, message-driven execution based on Charm++ (Kalé et al., 2019,

Kalé and Zheng, 2013), while e

ciently utilizing GPUs (Phillips

et al., 2008). It contains advanced features giving scientists an exten-

sive set of tools to observe and/or bias their systems using state-of-

the-art simulation methodologies such as collective variables (Fiorin

et al., 2013) and molecular dynamics

exible

tting (MDFF) (Tra-

buco et al., 2009, Vant et al., 2020) modules.

VMD and its psfgen plugin were used to build and analyze the

molecular complexes studied herein. Simulation preparation, visu-

alization, and conventional analysis approaches were performed

using VMD, with extensive use of GPU-accelerated remote visu-

alization resources (Humphrey et al., 1996a, Stone et al., 2013a,b,

2016). VMD incorporates features for processing cryo-EM density

maps to prepare and analyze

￿

hybrid

tting simulations as

required in this project (Stone et al., 2014). VMD incorporates a

custom GPU-accelerated ray tracing engine that exploits hardware

accelerated construction of ray tracing BVH acceleration structures,

BVH traversal, and ray-triangle intersections (Sener et al., 2021).

4.2 FFEA: continuum molecular simulations

Recent advances in cryo-EM, and -ET aided scientists to generate

structural and dynamic information of large biomolecules in 3D

volumetric data (Kühlbrandt, 2014). Expanding on this volumetric

shape, FFEA uses a 3D tetrahedral

nite element mesh providing

dynamic insight of the structure alone and in response to interac-

tions with other molecules (Solernou et al., 2018). FFEA is a new

https://www.ks.uiuc.edu/Highlights/?section=2021&highlight=2021-09

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

Trifan et al.

nsp13-1A

nsp14A

nsp10A

nsp14B

nsp10B

nsp9A

nsp12A

nsp8-2A

nsp13-2A

nsp8-1A

Initial Structure

Final Structure

RNA unwinding

Figure 3: The atomistic representation of a functional RTC monomer (A) with the RNA (red), extended to the active site of the

other nsp14-10 unit of the opposite monomer (nsp14B - orange), and the exit site on the helicase of the same monomer (nsp13-

2A). The mechanism of RNA unwinding using non-equilibrium molecular dynamics simulations starting from the initial

structure as captured in 7EGQ (B-left), to the

nal structure (B-right). The RNA backbone is restrained with an

￿

grid

represented in a grey mesh, while using

colvars

to pull the strands to the active sites on the protein. This is an unprecedented

simulation of the mechanism of RNA unwinding which can give signi

cant insight into the RTC function.

physics algorithm for simulation of mesoscale biological structures

obtained from cryo-EM and cryo-ET (Solernou et al., 2018). FFEA

treats biomolecules as continuum visco-elastic solids subject to

thermal noise, chosen so as to satisfy the

uctuation-dissipation

theorem (Oliver et al., 2013). To describe interacting proteins, FFEA

includes short-range van der Waals attractive forces and steric re-

pulsion. Therefore, FFEA uses a unique “top-down” rather than the

more conventional “bottom-up” coarse-graining strategy, by intro-

ducing nanoscale thermal

uctuations into macroscopic continuum

mechanics equations.

FFEA also includes functionality to exert external forces on pro-

teins, to connect proteins together with harmonic springs and to

represent conformational changes between distinct protein confor-

mational states, for example between the pre- and post-powerstroke

states of molecular motors (Richardson et al., 2020). FFEA has been

used to successfully model diverse biological systems, including the

rotary ATPase motor (Richardson et al., 2014), axonemal (Richard-

son et al., 2020) and cytoplasmic (Hanson et al., 2021) dynein motors,

and protein antibodies subjected to external forces (van der Heijden

et al., 2020).

4.3 Multiscale biological simulations

Integrating data across diverse spatial, temporal, and functional

scales has played an important role in understanding the role of

molecular interactions in various diseases (Pak and Voth, 2018,

Tozzini, 2010, Walpole et al., 2013, Zhou, 2014). In order to bridge

information across multiple scales, a number of coarse-graining

approaches have been widely employed. These methods have been

used to study individual proteins from the SARS-CoV-2 proteome (De San-

cho et al., 2020, Garay et al., 2021), as well as more complex systems

such as the entire SARS-CoV-2 virion (Yu et al., 2021). We note that

compared to the state-of-the-art approaches, we choose a contin-

uum representation of molecules, one that is more conducible for

accessing much larger length- and time-scales using FFEA. While

nite element methods are used to model biological systems (e.g.,

growth models, bone growth, etc.) (Dong and Skalak, 1992, Kenn-

away and Coen, 2019, Panagiotopoulou et al., 2017), we note that

our collective approach enables tight integration between all-atom

and continuum representations for large biomolecular complexes

such as the SARS-CoV-2 RTC.

While physics-based approaches for coarse-graining are usually

employed to overcome limitations of the space- and time-scales

accessed by traditional molecular simulations, machine learning

techniques are now being regularly employed to adaptive coarse-

grained simulations from all-atom simulations (Durumeric and

Voth, 2019, Husic et al., 2020, Noé, 2020).

4.4 AI enabled adaptive MD simulations

MD simulation provides atomistic details of molecular interactions

and ensued dynamics. It’s however computationally expensive, and

the sampling e

ciency can be severely hindered, when trapped by

local minima in the free energy landscape. Our group has developed

AI-enabled adaptive MD simulations, namely DeepDriveMD (Lee

et al., 2019), which records MD propagation in latent space via

convolutional variational autoencoder (CVAE) (Bhowmik et al.,

2018), and drives adaptive sampling in the high dimensional con-

formational landscape. The AI inference constantly observes the

simulation runs, prunes stagnant ones that are trapped in local

energy minima, and spawns new ones from less sampled conforma-

tions. We have shown that this approach can accelerate sampling

of rare events (for example, in the SARS-CoV-2 Spike opening sim-

ulations (Casalino et al., 2021), protein folding (Lee et al., 2019))

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

by at least an order of magnitude. When integrated with special-

ized AI-hardware to accelerate the learning, it can provide nearly 4

orders of magnitude speedup (Brace et al., 2021).

4.5 Work

ow infrastructure

Distributed science work

ows must invariably handle data and

control

ow across networks spanning administrative domains.

Challenging barriers to work

ow management frameworks’ adop-

tion in multi-user HPC environments lie in deploying distributed

client/server infrastructures and establishing connectivity among

remote systems. For instance, Fireworks (Jain et al., 2015) and the

RADICAL-Pilot / Ensemble Toolkit (EnTK) (Balasubramanian et al.,

2016, Merzky et al., 2018) are three widely-used WMFs at super-

computing facilities that expose Python APIs to de

ne and submit

directed acyclic graphs (DAGs) of stateful tasks to a database. These

WMFs possess various implementations of a common

pilot job

de-

sign, whereby tasks are e

ciently executed on HPC resources by

a process that synchronizes state with the work

ow database. Be-

cause the database is directly written by user or pilot job clients,

users of these WMF typically deploy and manage their own data-

base servers such as MongoDB servers.

These methods are non-

portable and depend on factors such as whether the compute facility

mandates multi-factor authentication (MFA).

In the context of REST interfaces to HPC resources, e

orts such

as the

Superfacility

concept (Enders et al., 2020) envisions a future

of automated experimental and observational science work

ows

linked to national computing resources through Web APIs. Imple-

mentations such as the NERSC Superfacility API

and the Swiss

National Supercomputing Centre’s FirecREST API (Cruz and Mar-

tinasso, 2019) expose methods to submit and monitor batch jobs,

move data between remote

lesystems, and check system availabil-

ity. However, these facility services alone do not address work

management or high-throughput execution; instead, they provide

web-interfaced abstractions of the underlying facility, analogous to

modern cloud storage and infrastructure services.

funcX (Chard et al., 2020) is an HPC-oriented instantiation of the

function-as-a-service (FaaS) paradigm, where users invoke Python

functions in remote containers via an API web service. funcX end-

points run on the login nodes of target HPC resources where Globus

Auth is used for authentication and user-endpoint association. The

funcX model is tightly focused on Python functions and therefore

advocates decomposing work

ows into functional building blocks.

Each function executes in a containerized worker process running

on one of the HPC compute nodes. This fails to support a general-

ized model of applications such as simulations with executables that

may or may not leverage containers, per-task remote data depen-

dencies, programmable error- and timeout-handling, and

exible

per-task resource requirements (e.g. tasks may occupy a single core

or specify a multi-node, distributed memory job with some number

of GPU accelerators per MPI rank).

Balsam (Salim et al., 2021) provides a uni

ed API for describ-

ing distributed work

ows, and the user site agents manage the

full lifecycle of data and control

ow. The Balsam service-oriented

architecture shifts administrative burdens away from individual

https://www.mongodb.com/

https://api.nersc.gov/api/v1.2/

researchers by routing all user, user agent, and pilot job client in-

teractions through a hosted, multi-tenant web service. As Balsam

execution sites communicate with the central service only as HTTP

clients, deployment involves a simple user-space

pip

package in-

stallation on any platform with a modern Python and outbound

internet access. Due to the ubiquity of HTTP data transport, Balsam

works “out of the box” on supercomputers spanning DOE and NSF

facilities against a cloud-hosted Balsam service.

5 INNOVATIONS REALIZED

Our innovations can be brie

y summarized as follows. (1) We imple-

ment an automated work

ow to seamlessly transition from AAMD

to FFEA via an AI-based approach. By optimizing the performance

of AAMD simulations (Sec. 5.1) we achieve optimal performance

on modern multi-GPU-based computing systems. (2) By clustering

the conformations from AAMD simulations using unsupervised

AI methods (Sec. 5.3), we learn the parameters for constraining

the FFEA simulations based on the implicit

uctuations between

protein-protein interfaces inferred from AAMD simulations. We de-

velop a novel AI-approach, namely Graph neural operators (GNO)

(Sec. 5.3.3) to summarize time-dependent conformational changes

observed from ensemble AAMD simulations. (3) To overcome the

bottleneck of training deep learning models on large volumes of

data generated by AAMD simulations, we scaled their performance

on emerging AI-hardware such as the Cerebras CS-2 accelerator as

well as on the Summit supercomputer.

(4), Finally, to our knowl-

edge, this is the

rst attempt at executing a single coordinated

work

ow across multiple supercomputing facilities. We achieved

this through the Balsam multisite work

ow manager (Sec. 5.4). The

performance gains enabled by these innovations provide a gen-

eralized, multi-scale computational toolkit for exploring dynamic

biomolecular machines.

5.1 NAMD all-atom MD simulations

5.1.1 GPU-resident NAMD 3 for GPU-accelerated MD simulations.

To achieve previously unrealized performance levels for MD simu-

lations on GPU-accelerated HPC platforms, we have developed a

major new “GPU-resident” computing approach, implemented in

NAMD 3. The GPU-resident computing mode in NAMD 3 employs

GPU acceleration for the most common molecular dynamics force

and energy calculations together with time integration and rigid

bond constraints, allowing sequences of timesteps to be simulated

entirely on the GPU without any signi

cant host–GPU data trans-

fers, thereby exploiting a signi

cantly higher fraction of theoretical

peak GPU performance (Phillips et al., 2020). Most recently, we have

further extended the GPU-resident capabilities of NAMD 3 to enable

strong-scaling of a simulation across multiple GPUs, with partic-

ular emphasis on hardware platforms that incorporate NVLink

connectivity among peer GPUs. NVLink facilitates direct load/store

memory access among peered GPUs with low latency and high

bandwidth data exchange performance levels far beyond what is

currently possible with a conventional HPC network interconnect,

or even host-internal PCIe transfers among GPUs.

The multi-GPU simulation algorithm generally mirrors the tech-

niques used in the conventional distributed-memory builds of NAMD,

https://www.olcf.ornl.gov/summit/

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

Trifan et al.

Post-RNA unwinding

RMSF (Å)

0.9

19.0

FFEA mesh

7EGQ

Pre-RNA unwinding

Figure 4: Root-mean squared

uctuations (RMSF) of the SARS-CoV-2 RTC provide insights are helpful in determining the in-

trinsic concerted motions in the RTC that are implicitly encoded in the experimental data. The structure of the RTC is painted

using the RMSF determined from the ensemble AAMD simulations labeled pre-RNA-unwinding and post-RNA-unwinding

states, the cryo-EM structure (PDB id: 7EGQ; where we use the experimental B-factors to approximate the RMSF) and FFEA

simulations. While the FFEA simulations show larger

uctuations at the interface of nsp10-14 as it interacts with the two nsp13

subunits, both the AAMD simulations capture larger

uctuations in the two nsp13 subunits across each monomer. While this

embodies slightly di

erent rearrangements between the FFEA and AAMD simulations, the relative

uctuations across these

subunits are high within the cryo-EM data indicating the dynamic nature of this assembly.

Table 1: Summary of AAMD simulations.

Simulation

Non-equillibrium Sampling Methods

GFLOPs/step

Total sampling (

`

Equilibration (pre-unwinding)

None

24.9

2.107

RNA unwinding

Extrabonds, Distance colvar, MDFF grid

25.9

0.004

RNA post-unwinding

Atom restraint, MDFF

11.04

1.904

RNA post-unwinding

None

11.04

3.916

except that it is designed to leverage the completely uni

ed vir-

tual memory address space of the GPUs to exchange data directly

through

ne-grained memory load/store operations. NVLink sup-

port for direct

ne-grained memory loads/stores in concert with

atomic increment and atomic compare-swap operations permit e

cient implementation of performance-critical multi-GPU reductions

and synchronization of GPUs within algorithm phases. Platforms

providing GPUs with a fully connected topology permit strong

scaling of molecular systems

⇠

1M atoms on up to 8 GPUs with

acceptable e

ciency, limited primarily by poor scalability of the

PME full electrostatics algorithm.

While running a simulation per GPU avoids these scaling chal-

lenges, multi-GPU scaling enables simulation over longer timescales.

For systems of

⇠

1M atoms, the timescale is typically a limiting

factor to study conformational changes that a protein undergoes,

essential in understanding how a complex functions. Performance

considerations dictate that any advanced sampling method imple-

mented for GPU-resident must avoid frequent data transfers to and

from the host. Many of these features are contributed by external

methodological contributors and have not yet been implemented

to run on GPUs. As such, NAMD 2 and CPU-based resources still

play an important role in the work

ow.

5.1.2 VMD full-time interactive ray tracing.

To facilitate creation of

high-

delity visualizations with high interactivity, VMD has been

extended with a new real-time ray tracing (RTRT) rendering mode

that uses full-time interactive progressive re

nement ray tracing

for rendering of the molecular scenes instead of OpenGL rasteriza-

tion and includes ambient occlusion lighting, shadows, high-quality

transparency, and depth of

eld focal blur. The new RTRT rendering

mode provides substantially more visually informative visualiza-

tions of complex molecular scenes as compared with conventional

OpenGL rasterization, by bringing o

ine-quality rendering fea-

tures to the interactive VMD display window for the

rst time.

When the RTRT rendering mode is active, advanced visualization

features that were formerly only usable for o

ine rendering, or

for static structures, can now be applied to simulation trajectories

while maintaining full interactivity.

5.1.3 RTC System Preparation for all-atom simulations with NAMD.

The SARS-CoV-2 RTC is a multi-subunit structure composed of sev-

eral nsps, including the RNA-dependent RNA polymerase (RdRp,

nsp12), Zinc-bound helicase (HEL, nsp13), the RNA-capping en-

zymes such as the nsp14 (N7-methyltransferase/MTase) and nsp16

(O2-MTase), the proofreading enzyme, namely nsp14, and uridylate-

speci

c endoribonuclease activity (NendoU, nsp15)(Fig. 3A) (Chen

et al., 2020, Perry et al., 2021, Yan et al., 2020, 2021a). Together they

participate in the process of reproducing the viral genome (Wu

et al., 2020). This complex plays an important role in the viral life

cycle, and is therefore an important drug target (Gao et al., 2020,

Wang et al., 2020). Revealing the atomistic details of this complex

is essential for understanding how the viral RNA is processed.

We prepared the structure of the Cryo-EM dimeric form of the

complex 7EGQ.pdb (Yan et al., 2021b). Cations have functional sig-

cance for the RTC, therefore the linking HIS and CYS residues

were mutated to HSE/HSD and CYM respectively, to have the cor-

rect ionization state to coordinate the present 3 Mg

2

ions and 26

2

ions. A complementary CYT base was added to the end of two

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

incomplete RNA strands. Missing loop residues ARG ALA ARG

corresponding to residue numbers 314-316 in the nsp13 subunits

were modeled using

tool in OpenMM (Eastman et al.,

2017). The protein subunits were capped with an NTER and CTER

patches. The

package from VMD (Visual Molecular Dy-

namics) (Humphrey et al., 1996b) was used to assemble the system,

which was then solvated using the

SOLVATE

plugin and neutral-

ized with K

and Cl

ions for a concentration of 150 mM KCl using

AUTOIONIZE

. The resulting systems consist of

⇠

1.1 M atoms and

box size of

⇠

168

⇥

220

⇥

312 Å

3

. The systems were energy mini-

mized using steepest descent method for 500 steps, followed by a

3 ns equilibration protocol restraining all but the water molecules

using a constant k = 3 kcal mol

1

2

. The system was further

equilibrated restraining only the protein backbone alpha carbon

atoms and nucleic backbone phosphate atoms with a constant of k

= 0.05 kcal mol

1

2

with NAMD2.14 (Phillips et al., 2005) NVT

ensemble using CHARMM36 force

elds for proteins and nucleic

acids (Vanommeslaeghe et al., 2010).

5.1.4 Grid-steered Molecular Dynamics Simulation of RNA Unwind-

ing.

One of the crucial functions of the RTC complex is to proofread

the RNA, which is a process that requires a major conformational

change of the protein complex and unzipping of the RNA. To mimic

this operation, we performed extensive preliminary studies using an

isolated RNA double-helical structure from the aforementioned pre-

pared structure in order to optimize the combination of grid-steered

molecular dynamics (GMD) (Wells et al., 2007) and collective vari-

ables (colvars) (Fiorin et al., 2013) and their respective optimal force

constants to mimic the corkscrew motion of the RNA unwinding. To

speed up these empirical studies, instead of the all-atom representa-

tion, the protein was represented by a molecular dynamics

exible

tting

￿

grid map exerting a repulsive

gridForce

(Trabuco et al.,

2009). A variety of

colvars

were employed to drive the extension

of the RNA structure, including

distanceZ

to pull on the strands to

the nsp14-10 active site and the exit point on the nsp13 helicase, as

proposed in previous studies (Yan et al., 2021b). We applied a rota-

tional force by using the

orientationAngle

colvar, while also pulling

on the center of mass using

distance

colvar. The

nal conformation

used for the simulations are a hybrid of the initial double-helical

structure of the RNA strands, with the extended single strands

which have reached their target points within the complex (Fig. 3B).

This structure was obtained by building an attractive 3 dimensional

electron density map manually constructed from the backbone of

the initial RNA position with the

nal extended strands obtained

from the non-equilibrium simulations.

The all-atom simulations to mimic this process were started from

the equilibrated structure described in section 5.1.3. We performed

an initial system minimization for 2000 steps and then simulated

the system for 0.05 ns to allow the RNA to settle into the grid. After

this initial equilibration, the process of unwinding was simulated

by pulling on the strands through this grid, which acts as a tunnel,

using the

distance colvars

, with a harmonic force. A soft restraining

force was applied on the backbone of the nucleic acids. To avoid

distortion of the double-helical shape of the RNA during pulling,

the hydrogen bonds formed by the bases were enforced by virtual

springs through the NAMD extraBonds feature, with a force of 20

kcal/mol/Å

2

5.2 FFEA simulations

5.2.1 Meshing and construction of the RTC dimer for FFEA.

Two

representative meshes were constructed which correspond to the

experimental data available; the cryo-EM density map from EMD-

31138, a dimer form of the biologically active RTC, and the atomistic

structure 7EGQ.pdb, which was used in the above NAMD atom-

istic molecular dynamics simulations. In the latter, the monomer

RTC complex was formed from two separate meshes, one from

the cryo-EM density map EMD-23008, corresponding to 7KRO.pdb,

and the other from 7EGQ.pdb for the nsp14-10 complex atomistic

information. These structures were used to generate 4 four inter-

acting meshes; two in each monomer (Fig. 2). Meshes were con-

structed by

rst removing the dust and smoothing the mask using

ChimeraX (Pettersen et al., 2021) . The non-manifold elements

within the surface were corrected with "MeshLab" (Cignoni et al.,

2008), and course-grained with "MeshMixer" (Sommer et al., 2017).

The 3D meshes were then generated by "Gmsh" (Geuzaine and

Remacle, 2009) and made compatible for FFEA simulations with

NetGen (Schöberl, 1997).

Following dust-removal and smoothing in ChimeraX, density for

the single-stranded RNA was clearly visible in EMD-23008. How-

ever, this extremely thin structure is likely to cause instabilities in

FFEA due to the small volumetric elements required to represent it.

Therefore, the surface elements corresponding to RNA density were

deleted, and the resulting surface repaired by

lling with additional

triangular elements in MeshMixer. The physical presence of the

single-stranded RNA was then represented in the FFEA simulations

using a harmonic spring.

5.2.2 Visualizing FFEA trajectories using VMD.

In order to create

informative visualizations that contain multiple simulation modal-

ities VMD was extended with new plugins to enable reading key

FFEA simulation data. The new plugins enable VMD to read FFEA

simulation and modeling data including tetrahedral mesh nodes,

edges, faces, and topology information, as well as information about

surface and interior faces and their properties, springs, and time-

varying mesh node coordinates. The plugin extensions permit VMD

to create visualizations showing the all-atom and FFEA simulations

aligned and superimposed, with physical properties associated with

the atoms, and FFEA nodes and faces available for use in assigning

surface colors for graphical representations of the simulations.

5.3 AI-enabled multi-resolution simulations

We developed a hierarchy of AI methods that would facilitate

embedding the high-dimensional (FFEA and AAMD) simulations

into a low-dimensional manifold capturing both local (e.g., protein

chain/subunit level) and global (e.g., mRTC monomer or dimer level)

uctuations. This required us to carefully consider the balance of

simulation workloads and AI tools. We discuss the AI methods

rst,

and then address the workload balance aspects in Sec. 5.4.

5.3.1 Capturing global conformational transitions with anharmonic

conformational analysis enabled autoencoders (ANCA-AE).

We have

previously developed a hybrid machine learning approach that com-

bines linear dimensionality reduction methods (Ramanathan et al.,

2011) with a non-linear autoencoder. By minimizing the higher

order (linear) correlations in the atomistic

uctuations and then

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint

Supercomputing ’21, November 14–19, 2021, St. Louis, MO

Trifan et al.

RMSD (Å)

Figure 5: AI methods learn to summarize the time-dependent conformational changes in the FFEA and AAMD simulations.

(A) The latent dimensional representation of the FFEA simulations summarized using the ANCA-AE method shows a clear

separation between low-RMSD (deeper blue shades) and high-RMSD (yellow) states. (B) t-SNE visualization of the latent man-

ifold learned using the graph neural operator (GNO) model on the entire RTC, painted by the system’s overall RMSD value.

This model was trained on the entire 7egq system, comprised of 6650 total residues, with a window size of 10 frames. To o

set

the computational cost in the vast size of system, the model was trained on a small subset of the trajectory containing 2000

frames in order to investigate the method’s viability. The GNO has articulated the time-dependent conformational changes at

the shorter timescales. (C) depicts the conformational changes between a low-RMSD and high-RMSD state learned from the

GNO, capturing the relative motions of the nsp13 domains across the dimer interface.

examining non-linear correlations, we can obtain a succinct de-

scription of how global conformational changes are embodied in a

simulation. This method, titled ANCA-AE (Clyde et al., 2021), can

ciently handle large dimensions (e.g., in the case of the mRTC,

the 6,650 C

-atoms lead to approximately 20,000 dimensions) and

can e

ciently run on CPUs. We have shown that ANCA-AE can

identify conformational states that share structural and energetic

similarities, while characterizing transitional points in the high

dimensional landscape. As shown in Fig. 5, we observed (for both

FFEA and AAMD, although we only show results from FFEA sim-

ulations) that ANCA-AE extracts biophysically meaningful latent

coordinates. The ANCA-AE method reveals conformational transi-

tions from the FFEA trajectories, capturing the

uctuations in the

interfaces between the nsp10-nsp14 complex and the rest of the

RTC subcomponents.

5.3.2 Describing localized conformational transitions with convolu-

tional variational autoencoders (CVAE).

We also leveraged a deep

learning algorithm, namely CVAE (Bhowmik et al., 2018) to de-

scribe localized conformational changes. The CVAE provides a

compact representation of the high dimensional conformational

landscape and can also capture intermediate/metastable states from

long timescale MD simulations (Bhowmik et al., 2018). It can also be

combined with outlier detection algorithms such as Density-Based

Spatial Clustering of Applications with Noise (DBSCAN) (Ester

et al., 1996) and the Local outlier factor (LOF) (Breunig et al., 2000)

method to identify time points corresponding to rare transition

events in the simulation trajectories. We note that both ANCA-AE

and CVAE can utilize outlier detection algorithms for this pur-

pose. Since we have previously discussed the CVAE model, we only

present scaling results for this model.

However, CVAE is quadratic in time and space complexity (Casalino

et al., 2021) and can be prohibitive to train for the entire mRTC. This

makes the CVAE ideal to examine protein sub-unit

uctuations,

which can be signi

cantly smaller and more tractable for training.

To optimize its performance, we implemented a distributed data

parallel version of CVAE on the Summit supercomputer as well as

a version within a single Cerebras CS-2 deep learning accelerator.

By keeping all compute and memory resources on-silicon, the CS-2

provides, in a single system, orders of magnitude more compute

power, on-chip memory, memory bandwidth, and communication

bandwidth than traditional small-scale processors. This makes it

particularly well-suited for accelerating ML models like the CVAE.

5.3.3 Graph neural operator (GNO) networks for characterizing

time-dependent conformational changes from AAMD simulations.

Additionally, we also implemented the graph neural operator (GNO)

as a fast surrogate model to simulate and predict the molecule

movements from the AAMD simulation. The applications of deep

learning for MD trajectory analysis, prediction and generation have

recently exploded (Hoseini et al., 2021). The GNO model (Li et al.,

2020) can solve partial di

erential equations (PDEs) by generaliz-

ing

nite-dimensional neural networks to the in

nite-dimensional

operator-learning setting. In this work, we apply the GNO for the

rst time to capture the time dependent conformational changes

governed by the system of equations in a traditional molecular

dynamics simulation. As a

rst step, we show that we can rea-

sonably predict protein backbone conformations up to 5 ps (250

MD simulation timesteps of 2 fs) into the future based on a 50 ps

trajectory. This allows us to extract the time-dependent conforma-

tional changes of the molecule and analyze the connection between

AAMD and FFEA simulation. This approach is capable of achieving

an even longer prediction horizon of 500 ps on smaller test systems,

and will be described in a separate manuscript. As shown in Fig. 5B,

the ability to predict the time-dependent conformational changes in

the entire RTC at short time-scales provides a means to accelerate

sampling. The structure of the latent space following the RMSD

transitions, provides insights into the large conformational changes

captured in 5C.

CC-BY-NC-ND 4.0 International license

available under a

(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint

this version posted October 12, 2021.

;

https://doi.org/10.1101/2021.10.09.463779

doi:

bioRxiv preprint