Intelligent Resolution: Integrating Cryo-EM with AI-driven
Multi-resolution Simulations to Observe the SARS-CoV-2
Replication-Transcription Machinery in Action
Anda Trifan(1,2,†), Defne Gorgun(1,2,†), Zongyi Li(3,†), Alexander Brace(1,4,†),
Maxim Zvyagin(1,†), Heng Ma(1,†), Austin Clyde(1,4), David Clark(5),
Michael Salim(1), David J. Hardy(2), Tom Burnley(6), Lei Huang(7),
John McCalpin(7), Murali Emani(1), Hyenseung Yoo(1), Junqi Yin(8),
Aristeidis Tsaris(8), Vishal Subbiah(9), Tanveer Raza(9), Jessica Liu(9),
Noah Trebesch(2), Geoffrey Wells(10), Venkatesh Mysore(5), Thomas Gibbs(5),
James Phillips(1), S. Chakra Chennubhotla(11), Ian Foster(1,4),
Rick Stevens(1,4), Anima Anandkumar(3,5,*), Venkatram Vishwanath(1,*),
John E. Stone(2,*), Emad Tajkhorshid(2,*), Sarah A. Harris(12,*),
Arvind Ramanathan(1,*)
(1) Argonne National Laboratory, (2) University of Illinois Urbana-Champaign,
(3) California Institute of Technology, (4) University of Chicago, (5) NVIDIA,
(6) Science and Technology Facilities Council, (7) Texas Advanced Computing Center,
(8) Oak Ridge National Laboratory, (9) Cerebras Inc., (10) University College of London,
(11) University of Pittsburgh, (12) University of Leeds
(†) Joint first authors, (*) Contact authors: s.a.harris@leeds.ac.uk, anima@caltech.edu, venkat@anl.gov, johns@ks.uiuc.edu,
emad@illinois.edu, ramanathana@anl.gov
ABSTRACT
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)
replication-transcription complex (RTC) is a multi-domain protein
responsible for replicating and transcribing the viral mRNA inside
a human cell. Attacking RTC function with pharmaceutical compounds
is a pathway to treating COVID-19. Conventional tools,
e.g., cryo-electron microscopy and all-atom molecular dynamics
(AAMD), do not provide sufficiently high resolution or timescale
to capture important dynamics of this molecular machine.
Consequently, we develop an innovative workflow that bridges the
gap between these resolutions, using mesoscale fluctuating finite
element analysis (FFEA) continuum simulations and a hierarchy
of AI methods that continually learn and infer features for
maintaining consistency between AAMD and FFEA simulations. We
leverage a multi-site distributed workflow manager to orchestrate
AI, FFEA, and AAMD jobs, providing optimal resource utilization
across HPC centers. Our study provides unprecedented access to
the SARS-CoV-2 RTC machinery, while providing a general
capability for AI-enabled multi-resolution simulations at scale.
KEYWORDS
multi-resolution simulations, SARS-CoV-2, COVID19, HPC, AI
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
Supercomputing ’21, November 14–19, 2021, St. Louis, MO
© 2020 Association for Computing Machinery.
ACM ISBN ISBN...$15.00
https://doi.org/finalDOI
ACM Reference Format:
Anda Trifan, Defne Gorgun, Zongyi Li, Alexander Brace, Maxim Zvyagin, Heng Ma,
Austin Clyde, David Clark, Michael Salim, David J. Hardy, Tom Burnley, Lei Huang,
John McCalpin, Murali Emani, Hyenseung Yoo, Junqi Yin, Aristeidis Tsaris,
Vishal Subbiah, Tanveer Raza, Jessica Liu, Noah Trebesch, Geoffrey Wells,
Venkatesh Mysore, Thomas Gibbs, James Phillips, S. Chakra Chennubhotla, Ian Foster,
Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, John E. Stone,
Emad Tajkhorshid, Sarah A. Harris, Arvind Ramanathan.
2020. Intelligent Resolution: Integrating Cryo-EM with AI-driven
Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription
Machinery in Action. In Supercomputing ’21: International Conference for
High Performance Computing, Networking, Storage, and Analysis. ACM, New
York, NY, USA, 14 pages. https://doi.org/finalDOI
1 JUSTIFICATION
We developed an AI-enabled multi-resolution simulation framework
for studying complex biomolecular machines by directly integrating
experimental data. Our framework sets high-water marks
for AI-driven multi-resolution simulations and achieves high
utilization of resources across diverse supercomputing platforms at
multiple sites.
2 PERFORMANCE ATTRIBUTES
Performance Attribute               Our Submission
Category of achievement             Scalability, Time-to-solution
Type of method used                 Explicit, Deep Learning
Results reported on the basis of    Whole application including I/O
Precision reported                  Mixed Precision
System scale                        Measured on full system
Measurement mechanism               Hardware performance counters, Application timers, Performance Modeling
bioRxiv preprint doi: https://doi.org/10.1101/2021.10.09.463779; this version posted October 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
[Figure 1 diagram labels: Cryo-EM maps; Protein Data Bank (individual protein “parts”); Ensemble Continuum Simulations (FFEA); All-atom Ensemble MD Simulations (NAMD); Hierarchical AI Methods for computational steering; Initial best guess; Global conformational fluctuations; Localized fluctuations; Atomistically correct sub-domain orientations; Interface potentials; Novel conformational states; Atomistic (refined) Cryo-EM models; step markers 1 to 6]
Figure 1: An integrative biology framework for refining
low resolution cryo-EM structures with multi-resolution
simulations. (1) The cryo-EM density map is represented as
a continuum visco-elastic solid. (2) Finite element analysis
simulations are then used to generate new conformations.
AI techniques identify interesting events in the landscape
(global conformational changes), while (3) simultaneously
constraining them with protein-protein interface potentials
derived from all-atom simulations. (4) AI methods are also
used to learn local conformational changes across the molecular
machine, such that they can be used to (5) refine domain
orientations in the entire biomolecular complex. (6)
The output is an atomistically refined ensemble of structures
that captures the conformational fluctuations
embodied in the cryo-EM data.
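As a rough illustration of step (1), a continuum solid in a finite element setting is usually discretized into a mesh of nodes and elements (Figure 2 of this paper shows a tetrahedral FFEA mesh). The sketch below computes per-element volumes for such a mesh. The array layout, the function name `tetrahedron_volumes`, and the two-element toy mesh are assumptions made for illustration; they are not the FFEA file format or the authors' code.

```python
import numpy as np

def tetrahedron_volumes(nodes, elements):
    """Return the volume of each tetrahedral element.

    nodes:    (n_nodes, 3) array of node coordinates (hypothetical layout).
    elements: (n_elem, 4) array of node indices, one row per tetrahedron.
    Uses V = |det([b - a, c - a, d - a])| / 6 for vertices a, b, c, d.
    """
    nodes = np.asarray(nodes, dtype=float)
    elements = np.asarray(elements, dtype=int)
    a, b, c, d = (nodes[elements[:, i]] for i in range(4))
    edges = np.stack([b - a, c - a, d - a], axis=1)  # shape (n_elem, 3, 3)
    return np.abs(np.linalg.det(edges)) / 6.0

# Toy mesh (illustrative only): two tetrahedra sharing a triangular face.
nodes = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
elements = [[0, 1, 2, 3], [1, 2, 3, 4]]

volumes = tetrahedron_volumes(nodes, elements)
print("element volumes:", volumes)
print("total mesh volume:", volumes.sum())
```

Summing element volumes gives the total volume of the meshed solid, a simple sanity check when a mesh is built from a density map at a chosen threshold.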
3 OVERVIEW OF THE PROBLEM
The coronavirus disease 2019 (COVID-19) pandemic has led to a
massive acceleration in the pace of development in experimental
structural biology. Multiple research teams have invested their
best resources towards the common goal of characterizing viral
components and their biological functions, thereby providing a
sign-post for the future direction of the field (Alam and Higgins,
2020, Bárcena et al., 2021, Barrantes, 2021, Kim and Jung, 2021).
Structural biology data, which provides the basis for rational design
of all new medicines against human and animal disease, is now
inherently multi-modal and multi-scale, requiring novel integrative
approaches (AlQuraishi, 2019, Arantes et al., 2020, Jumper et al.,
2021, Minkyung et al., 2021, Muratov et al., 2021, Padhi et al., 2021,
Tunyasuvunakool et al., 2021, Zimmerman et al., 2020).
While the mechanism of human host cell entry and infection
by SARS-CoV-2 via the spike glycoprotein is now relatively well
characterized (Barros et al., 2020, Shang et al., 2020, Sztain et al.,
2021, Zhang et al., 2021), how SARS-CoV-2 replicates inside the
host cell is still unclear (Romano et al., 2020). The viral RNA
replication mechanism is complex, involving RNA synthesis, proofreading,
and capping. It is mainly carried out by the mini-replication
transcription complex plus error correction machinery (mRTC+ECM,
from here on referred to as RTC), which must also survive the
human immune response. Cryo-EM techniques and computational
methods have been immensely helpful in elucidating the overall
structural organization of the RTC (Chen et al., 2020, Perry et al.,
2021, Yan et al., 2020, 2021a), but the high intrinsic flexibility, size
and complexity of the non-structural protein (nsp) arrangement mean
that the overall resolution of the data is inherently poor.
Consequently, the structure refinement workflows discard 30-40% of
collected images from the existing RTC datasets (Chen et al., 2020,
Yan et al., 2020, 2021a), leaving significant gaps in our understanding.
Although several studies have focused on disrupting the function
of the individual non-structural proteins (nsps) with small
molecules, key insights into the overall structural organization,
dynamics and function of the RTC machinery are more difficult to
obtain. Such insights are crucial, because the ability to target
protein-protein interactions between subunits of the RTC offers far more
possibilities for drug development. Moreover, the arrangement of
individual protein components of the RTC+ECM is itself
dynamic during the viral life-cycle. The computational capability to
model these interactions would provide further insight of relevance
to drug development, but is currently impossible without novel
multi-scale models and the workflows that connect them.
The primary challenge of experimental imaging is elucidating
diverse structural dynamics. This stems from the averaging process
of the imaging data: cryo-EM, in particular, and other experimental
techniques capture only the most sampled conformational states
as static, snapshot-like representations, while intermediate or
transitional states are less well represented (Lyumkis, 2019, Merk et al.,
2016). The details of motions within flexible domains can be
enriched using complementary tools such as molecular dynamics (MD)
simulations and Bayesian inference techniques (Bowerman et al.,
2017, Bratholm et al., 2015, Cavalli et al., 2007, Grishaev and Llinás,
2005, Scheres, 2012); however, the timescales accessible to these
atomistic simulations can be a limiting factor. In addition, advances
in 4D imaging modalities (Earnest et al., 2017, Engel et al., 2015,
Mahamid et al., 2016, Villa and Lasker, 2014) mean that the volume of
data generated from such experiments can be overwhelming.
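The averaging effect described above can be seen in a toy model. The sketch below builds a one-dimensional "density" as the population-weighted average of two conformational states; the state positions, the peak width, and the 90/10 population split are assumptions chosen only for illustration and are not taken from any cryo-EM dataset.

```python
import numpy as np

def gaussian(x, center, width):
    """Unnormalized Gaussian profile standing in for one state's density."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def averaged_density(x, centers, populations, width=1.0):
    """Population-weighted average of per-state densities (toy model)."""
    populations = np.asarray(populations, dtype=float)
    populations = populations / populations.sum()  # normalize the weights
    profiles = np.stack([gaussian(x, c, width) for c in centers])
    return populations @ profiles

# Hypothetical two-state system: a dominant state and a rare intermediate.
x = np.linspace(-5.0, 15.0, 401)
centers = [0.0, 10.0]
populations = [0.9, 0.1]
density = averaged_density(x, centers, populations)

# Read off the averaged signal at the location of each state.
major_peak = density[np.argmin(np.abs(x - centers[0]))]
minor_peak = density[np.argmin(np.abs(x - centers[1]))]
print(f"major-state peak: {major_peak:.3f}")
print(f"minor-state peak: {minor_peak:.3f}")
```

In this toy setting the rare state contributes a peak scaled down by its population, so in noisy data it is easy to lose. That is the sense in which averaged maps under-represent intermediates.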
Therefore, in this paper we address the urgent, yet unmet need to
develop scalable computational tools that can aid the improvement
of resolution within cryo-EM datasets through multi-resolution
simulations. In an effort to bridge the gap between experiment
and purely all-atom molecular dynamics (AAMD), we leverage
a complementary mesoscale method of representing biophysical
systems, treating biomolecules as visco-elastic continuum solids
using the fluctuating finite element analysis (FFEA) (Oliver et al., 2013)
technique. These continuum-scale lower resolution FFEA simulations
provide a generative model for the cryo-EM data. However,
implementing an approach that directly models electron density
information from cryo-EM data requires a radically different way
to model conformational ensembles, one that moves away from
atomistic resolution towards a continuum representation, whereby
the intrinsic resolution of the data can be captured with nodes
[Figure 2 labels: 90°; Van der Waals interaction energies]
Figure 2: Hybrid structure of the FFEA mesh superimposed with the all-atom representation. The all-atom structure of the
RTC dimer is shown as a cartoon (blue) and the FFEA tetrahedral mesh structure determined from the experimental cryo-EM
map is shown as a wireframe. The top inset represents a 90