of 13
Supporting Information:
Exploring PROTAC cooperativity with
coarse-grained alchemical methods
Huanghao Mai,
Matthew H. Zimmer, and Thomas F. Miller III
Division of Chemistry and Chemical Engineering, California Institute of Technology
E-mail: hmmai@caltech.edu
S-1
1 CGMD Forcefield
The complete potential energy function for a ternary complex is
U
(
x
;
b
,
q
) =
U
ENM
(
x
E
) +
U
ENM
(
x
T
) +
U
spring
(
x
P
) +
U
WCA
(
x
P
)
+
U
bind
(
x
P
,
x
T
;
b
) +
U
bind
(
x
P
,
x
E
;
b
) +
U
WCA
(
x
P
,
x
T
) +
U
WCA
(
x
P
,
x
E
)
+
U
WCA
(
x
E
,
x
T
) +
U
elec
(
x
E
,
x
T
;
q
) +
U
LJ
(
x
E
,
x
T
;
ε
LJ
)
(1)
where
x
E
,
x
T
, and
x
P
indicate the coordinates of the E3 ligase, the target protein, and
the PROTAC respectively,
q
represent the charges of protein beads, and
b
are indicators of
whether protein beads are at the binding pocket or not. All PROTAC beads are modeled
with 0 charge and no attraction to the proteins. All parameters and variables are defined
using a length scale of the large bead (
σ
= 0
.
8 nm) and an energy scale of
ε
=
kT
where
k
is the Boltzmann constant and
T
= 310 K.
1.1 Internal energy terms
Interactions within a protein are modeled by an elastic network model (ENM) such that
every pair of beads within distance
R
c
is connected by a harmonic spring:
U
ENM
(
x
) =
X
(
i,j
)
D
k
spring
(∆
x
ij
d
ij
)
2
(2)
where
k
spring
is the spring constant,
d
ij
is the optimal distance between
x
i
and
x
j
, and
D
=
{
(
i,j
)
|
d
ij
< R
c
}
. The optimal distance between a pair of beads is its initial distance in
the experimental structure. Experimental structures used in this work include VHL (PDB:
5T35
1
chain D), BRD4
BD2
(PDB: 5T35
1
chain A), CRBN (PDB: 6BOY
2
chain B), and
BTK (PDB: 6W7O
3
chain A), and Schr ̈odinger Maestro
4
is used to fill in missing atoms
and perform energy minimization before building the CG ENM. Additional details on the
parameterization are described in a separate section below.
S-2
PROTAC is modeled as a linear molecule, where adjacent beads are connected by springs
(
U
spring
(
x
P
)) and non-adjacent beads are subjected to steric repulsions (
U
WCA
(
x
P
)).
1.2 Interaction energy terms
PROTAC-protein interactions consist of binding interactions modeled by springs between
a binding moiety bead in the PROTAC and all beads in the corresponding binding pocket
(
U
bind
(
x
P
,
x
T
;
b
) and
U
bind
(
x
P
,
x
E
;
b
) in eq.(1)) and steric repulsions (
U
WCA
(
x
P
,
x
T
) and
U
WCA
(
x
P
,
x
E
)) between the remaining parts of PROTAC and protein. Steric repulsions
in intra-PROTAC, PROTAC-protein, and inter-protein interactions are all modeled by the
Weeks-Chandler-Andersen (WCA) potential, a shifted and truncated version of Lennard-
Jones (LJ) potential.
Protein-protein interactions are captured by the steric repulsions (
U
WCA
(
x
E
,
x
T
)), and
depending on the modeling purpose, electrostatics (
U
elec
(
x
E
,
x
T
;
q
)) or nonspecific attrac-
tions (
U
LJ
(
x
E
,
x
T
;
ε
LJ
)). The electrostatic interaction is modeled by a Debye-H ̈uckel (DH)
potential. The functional forms and parameterization of both potentials can be found in.
5
When reducing the screening of electrostatics between BRD4
BD2
and VHL, the Debye length
κ
is multiplied by 10. The solvent in our system is treated implicitly. Nonspecific attractions
aimed at broadly including Van der Waals forces and hydrophobic interactions are modeled
by LJ potentials. The strength of the attraction is kept under that of electrostatic interac-
tions (Fig. S1). The well depth of LJ,
ε
LJ
, is currently set to be the same for all pairs of
beads for nonspecific attraction. For future efforts, minor modifications to the formula
6
and
parameterization of
ε
LJ
to depend on the Wimley-White hydrophobicity scale, for example,
can capture more sequence-specific interactions such as the hydrophobic effects.
1.3 Parameterization of ENM
ENM is a model that represents the tertiary structure of a protein by connecting every pair
of protein beads within a certain distance cutoff
R
c
by a Hookean spring of spring constant
S-3
Interaction potential
(
"#
)
distance
(
%
)
Figure S1: The strengths of various interaction potentials are plotted over the distance
between protein beads. The two vertical dashed grey lines bound the distance between 1
and 2
σ
. The electrostatic potentials (DH) are plotted for beads with +1 and +1 charges or
+1 and -1 charges.
k
spring
. Despite the simplicity of its parameterization, slow modes in ENM can capture bio-
logically significant conformational changes.
7,8
This structure-based model can also be used
in combination with other physics-driven forcefields to model macromolecular complexes.
Protein-protein associations and viral capsid assembly have both been successfully modeled
by using Elnedyn, an ENM at the resolution of 1 residue per bead,
9
on top of the MAR-
TINI CG forcefield. By fitting to atomistic simulations, Elnedyn preserves both structural
properties and dynamics within each protein subunit for the CG simulations.
We follow a similar protocol and fit our CG ENM parameters in eq.(2) to Elnedyn
simulations results. Three proteins – IKZF1
ZF2
(PDB: 6H0F
10
chain C), BRD4
BD1
(PDB:
6BOY
2
chain C), and CRBN (PDB: 6BOY
2
chain B) – are chosen for the fitting to represent
the range of protein sizes based on the publicly available crystal structures of PROTAC-
mediated ternary complexes. Elnedyn is supported as an option in the MARTINI 2 CG
forcefield,
9
and we use the default parameters to generate Elnedyn simulations of these
proteins with GROMACS version 5.0.7. Two equilibration stages were run, first at 1 fs
S-4
timestep for 50 ps, and then at 10 fs timestep for 1 ns. Then, only the dynamics stage
was used for fitting, which was run at 10 fs timestep for 40 ns.
Four metrics are used to
examine how well a particular combination of
k
spring
and
R
c
captures information in Elnedyn
simulations: the difference of time-averaged root-mean-squared-deviation (∆RMSD), bead-
averaged root-mean-squared-fluctuation (∆RMSF), Kullback–Leibler (KL) divergence of the
RMSD distributions, and the root-mean-squared inner product of the principal components
(RMSIP) of the trajectories.
Within a single metric, we usually observe a degeneracy within a certain region of
k
spring
and
R
c
values (Fig. S2), and this was also observed in Elnedyn fitting to atomistic simula-
tions.
9
This is because increasing either
k
spring
or
R
c
can increase the stiffness of a protein
and, therefore, can compensate for each other to some extent. Nevertheless, despite the de-
generacy, given the wide range of protein sizes, there is no single combination of
k
spring
and
R
c
values that works best for all three proteins. We chose
k
spring
= 100
ε/σ
2
and
R
c
= 2
.
0
σ
as they are near the optimal degeneracy region under most metrics and consistent with the
values of Elnedyn parameters (
k
spring
= 124
.
25
ε/σ
2
and
R
c
= 1
.
125
σ
). This combination
of
k
spring
and
R
c
was selected without a global optimization function that combines all four
metrics, and should be subjected to finer tuning if a specific system is of interest.
2 Analysis of alchemical free energy calculations
We perform various checks to address two common concerns in alchemical simulations: 1) are
there sufficient intermediate states along the alchemical reaction pathway, and 2) are there
sufficient samples from each state for accurate free energy calculations. The BTK-PROTAC
(10)-CRBN complex is used as an example for the analysis below.
We first validate that there are sufficient intermediate states for a converged estimation
of ∆
G
ternary(WCA)
. The convergence of free energy calculations depends on the overlap of
the phase space, i.e. the distribution of sampled conformations, between neighboring states.
S-5
∆RMSIP
∆RMSF
∆RMSD
KL_RMSD
1000
500
200
100
50
20
#
.
(
%
)
1.9
2.1
2.3
21.5
10.8
0.1
0.35
0.18
0
0. 28
0.14
0
0.87
0.74
0
'
/0'$%1
(
(
/
%
2
)
CRBN
(124)
BRD4BD1
(42)
IKZF
(11)
Figure S2: Fitting results of ENM parameters arranged by proteins (rows) and evaluation
metrics (columns). Numbers in parenthesis next to protein names are the number of CG
beads. For each plot, blue regions indicate
k
spring
and
R
c
values that result in good fitting, and
red regions indicate significant differences between our simulations and Elnedyn simulations.
Each column shares the same colorbar range. In general, the boxed regions around
k
spring
=
100
ε/σ
2
and
R
c
= 2
.
0
σ
has good fitting.
S-6
Substantial overlap is achieved when the neighboring states are similar, which requires a fine
spacing of the coupling parameter values. In practice, distributions of quantities such as ∆
U
and
∂U/∂λ
that are directly involved in free energy estimations are often treated as proxies
for the high-dimensional phase space.
11
The similarity between distributions is quantified
by KL divergence, where 0 indicates identical distributions and
1 suggests concerning
differences. Based on this metric, all neighboring states have substantial overlap, as the
Kullback–Leibler (KL) divergence values of ∆
U
and of
∂U/∂λ
distributions both stay below
1 (Fig. S3a).
Bennett’s overlapping histogram
12
provides another qualitative test for the overlap of ∆
U
distributions. The difference between
g
λ
i
+1
(∆
U
λ
i
i
+1
) =
P
λ
i
(∆
U
λ
i
i
+1
) + (1
C
) ∆
U
λ
i
i
+1
and
g
λ
i
(∆
U
λ
i
i
+1
) =
P
λ
i
+1
(∆
U
λ
i
i
+1
)
C
U
λ
i
i
+1
is plotted over ∆
U
λ
i
i
+1
values, where
C
is an arbitrary constant between 0 and 1 and
P
λ
i
(∆
U
λ
i
i
+1
) and
P
λ
i
(∆
U
λ
i
i
+1
) are the
distributions of ∆
U
λ
i
i
+1
obtained by sampling from neighboring alchemical states
λ
i
and
λ
i
+1
respectively. Continuous oscillations of
g
λ
i
+1
(∆
U
λ
i
i
+1
)
g
λ
i
(∆
U
λ
i
i
+1
) around the
estimated ∆
G
λ
i
i
+1
over a range of ∆
U
λ
i
i
+1
values suggests good overlap (Fig. S3b).
13
For
states of higher
λ
LJ
values, higher energetic penalty of steric repulsions prevents sampling
over a wide range of ∆
U
values, but the KL divergence and visualization of the distributions
(Fig. S3a,c) both indicate the quality of the overlap.
Next, we examine sampling within each state. For each state, a simulation needs to be
post-processed to discard the initial unequilibrated part and then subsampled to obtain de-
correlated data for accurate uncertainty quantification of the free energy estimation. Thus,
the length of the simulations is dictated by the equilibration time, autocorrelation time,
and the number of de-correlated samples needed for converged estimations. We examine the
values of ∆
U
,
∂U/∂λ
, and other collective variables over the simulation time, which typically
equilibrate after 0.9 s (Fig. S4a). To find out the decorrelation time, we discard the initial
0.9 s of simulations and plot the autocorrelation functions of these variables over different
time lags up to half of the simulation time to ensure that the autocorrelation is calculated
S-7
(a)
KL
divergence
1.0
0.8
0.6
0.4
0.2
0.0
!
!"
0
0.005
0.01
0.02
0.03
0.05
0.07
0.1
0.15
0.2
0.3
0.5
0.7
1
(b)
,
$
,
$
.
$$'
)*
(c)
-
(
)*
)
0
1
3
2
3.0
2.5
2.0
1.5
1.0
0.0
0.5
,
$
.
(
,
)
)*
0.5
1.5
2.5
1.5
0.5
0.0
-
0.5
-
1.0
2.5
2.0
1.0
,
$
,
$
.
$$'
)*
1.6
1.2
0.8
0.4
0.0
0
1
3
2
4
,
$
.
(
,
)
)*
0
2
6
4
8
7
6
4
2
0
3
1
5
Figure S3: Phase space overlap in calculating ∆
G
ternary(WCA)
for BTK-CRBN in Fig. 2.
(a)
Overlap of ∆
U
and
∂U/∂λ
distributions between adjacent states are quantified by the KL
divergence.
(b)
Example Bennett’s overlapping plots for
λ
LJ
= 0
,
0
.
005 states (left) and
λ
LJ
= 0
.
7
,
1 states (right). The grey bands represent ∆
G
λ
i
i
+1
±
1 std estimated using BAR.
(c)
Example distributions of ∆
U
i,i
+1
are shown with Gaussian smoothing (red and blue solid
curves) for better visualization.
S-8
from a sufficient number of samples. The autocorrelation times all plummet to 0 before 0.63
s (Fig. S4b). Both equilibration time and decorrelation time are longer for simulations in
lower value of
λ
LJ
states that retain more memory of previously sampled configurations due
to lower energetic costs. Currently, the equilibration and autocorrelation cutoffs depend on
each system. For convenience, we used the same cutoffs for all
λ
states. In the future, this
can be customized for each state to maximize the number of samples, especially from states
of high
λ
values that requires less equilibration and decorrelation time (Fig. S4b).
(a)
(b)
#
$%
3.0
2.5
2.0
1.5
1.0
0.5
0.0
#
#
,
#
.
##&
from
state
0
0.0
1.0
2.0
3.0
4.0
5.0
6.0
Time
(s)
0.5
0.0
-
0.5
-
1.0
-
1.5
#
#
.
'
,
(
from
state
1
Time
(s)
0.0
1.0
2.0
3.0
4.0
5.0
6.0
Autocorrelation
1.0
0.8
0.6
0.4
0.2
0.0
-
0.2
Time
(s)
0.0
1.0
2.0
3.0
4.0
1.0
0.8
0.6
0.4
0.2
0.0
Time
(s)
0.0
1.0
2.0
3.0
4.0
Figure S4: Detecting equilibration and autocorrelation time in calculating ∆
G
ternary(WCA)
for BTK-CRBN in Fig. 2.
(a)
U
λ
i
i
+1
over simulation time and
(b)
the autocorrelation
of ∆
U
λ
i
i
+1
from
λ
LJ
= 0 (left) and
λ
LJ
= 1 (right). The red curves and the shaded regions
represent the average value
±
1 standard deviation based on 64 independent trajectories.
The vertical dashed lines in this example mark 0.9 s in (a) and 0.63 s in (b). The horizontal
dotted lines in (b) mark the 0 autocorrelation value.
S-9