jp2c05795_si

Supporting Information:

Exploring PROTAC cooperativity with

coarse-grained alchemical methods

Huanghao Mai,

∗

Matthew H. Zimmer, and Thomas F. Miller III

Division of Chemistry and Chemical Engineering, California Institute of Technology

E-mail: hmmai@caltech.edu

S-1

1 CGMD Forcefield

The complete potential energy function for a ternary complex is

(

;

) =

ENM

(

) +

ENM

(

) +

spring

(

) +

WCA

(

)

bind

(

;

) +

bind

(

;

) +

WCA

(

) +

WCA

(

)

WCA

(

) +

elec

(

;

) +

(

;

)

(1)

where

, and

indicate the coordinates of the E3 ligase, the target protein, and

the PROTAC respectively,

represent the charges of protein beads, and

are indicators of

whether protein beads are at the binding pocket or not. All PROTAC beads are modeled

with 0 charge and no attraction to the proteins. All parameters and variables are defined

using a length scale of the large bead (

= 0

8 nm) and an energy scale of

where

is the Boltzmann constant and

= 310 K.

1.1 Internal energy terms

Interactions within a protein are modeled by an elastic network model (ENM) such that

every pair of beads within distance

is connected by a harmonic spring:

ENM

(

) =

X

(

i,j

)

∈

spring

(∆

−

)

(2)

where

spring

is the spring constant,

is the optimal distance between

and

, and

{

(

i,j

)

< R

}

. The optimal distance between a pair of beads is its initial distance in

the experimental structure. Experimental structures used in this work include VHL (PDB:

5T35

chain D), BRD4

BD2

(PDB: 5T35

chain A), CRBN (PDB: 6BOY

chain B), and

BTK (PDB: 6W7O

chain A), and Schr ̈odinger Maestro

is used to fill in missing atoms

and perform energy minimization before building the CG ENM. Additional details on the

parameterization are described in a separate section below.

S-2

PROTAC is modeled as a linear molecule, where adjacent beads are connected by springs

(

spring

(

)) and non-adjacent beads are subjected to steric repulsions (

WCA

(

)).

1.2 Interaction energy terms

PROTAC-protein interactions consist of binding interactions modeled by springs between

a binding moiety bead in the PROTAC and all beads in the corresponding binding pocket

(

bind

(

;

) and

bind

(

;

) in eq.(1)) and steric repulsions (

WCA

(

) and

WCA

(

)) between the remaining parts of PROTAC and protein. Steric repulsions

in intra-PROTAC, PROTAC-protein, and inter-protein interactions are all modeled by the

Weeks-Chandler-Andersen (WCA) potential, a shifted and truncated version of Lennard-

Jones (LJ) potential.

Protein-protein interactions are captured by the steric repulsions (

WCA

(

)), and

depending on the modeling purpose, electrostatics (

elec

(

;

)) or nonspecific attrac-

tions (

(

;

)). The electrostatic interaction is modeled by a Debye-H ̈uckel (DH)

potential. The functional forms and parameterization of both potentials can be found in.

When reducing the screening of electrostatics between BRD4

BD2

and VHL, the Debye length

is multiplied by 10. The solvent in our system is treated implicitly. Nonspecific attractions

aimed at broadly including Van der Waals forces and hydrophobic interactions are modeled

by LJ potentials. The strength of the attraction is kept under that of electrostatic interac-

tions (Fig. S1). The well depth of LJ,

, is currently set to be the same for all pairs of

beads for nonspecific attraction. For future efforts, minor modifications to the formula

and

parameterization of

to depend on the Wimley-White hydrophobicity scale, for example,

can capture more sequence-specific interactions such as the hydrophobic effects.

1.3 Parameterization of ENM

ENM is a model that represents the tertiary structure of a protein by connecting every pair

of protein beads within a certain distance cutoff

by a Hookean spring of spring constant

S-3

Interaction potential

(

)

distance

(

)

Figure S1: The strengths of various interaction potentials are plotted over the distance

between protein beads. The two vertical dashed grey lines bound the distance between 1

and 2

. The electrostatic potentials (DH) are plotted for beads with +1 and +1 charges or

+1 and -1 charges.

spring

. Despite the simplicity of its parameterization, slow modes in ENM can capture bio-

logically significant conformational changes.

7,8

This structure-based model can also be used

in combination with other physics-driven forcefields to model macromolecular complexes.

Protein-protein associations and viral capsid assembly have both been successfully modeled

by using Elnedyn, an ENM at the resolution of 1 residue per bead,

on top of the MAR-

TINI CG forcefield. By fitting to atomistic simulations, Elnedyn preserves both structural

properties and dynamics within each protein subunit for the CG simulations.

We follow a similar protocol and fit our CG ENM parameters in eq.(2) to Elnedyn

simulations results. Three proteins – IKZF1

ZF2

(PDB: 6H0F

chain C), BRD4

BD1

(PDB:

6BOY

chain C), and CRBN (PDB: 6BOY

chain B) – are chosen for the fitting to represent

the range of protein sizes based on the publicly available crystal structures of PROTAC-

mediated ternary complexes. Elnedyn is supported as an option in the MARTINI 2 CG

forcefield,

and we use the default parameters to generate Elnedyn simulations of these

proteins with GROMACS version 5.0.7. Two equilibration stages were run, first at 1 fs

S-4

timestep for 50 ps, and then at 10 fs timestep for 1 ns. Then, only the dynamics stage

was used for fitting, which was run at 10 fs timestep for 40 ns.

Four metrics are used to

examine how well a particular combination of

spring

and

captures information in Elnedyn

simulations: the difference of time-averaged root-mean-squared-deviation (∆RMSD), bead-

averaged root-mean-squared-fluctuation (∆RMSF), Kullback–Leibler (KL) divergence of the

RMSD distributions, and the root-mean-squared inner product of the principal components

(RMSIP) of the trajectories.

Within a single metric, we usually observe a degeneracy within a certain region of

spring

and

values (Fig. S2), and this was also observed in Elnedyn fitting to atomistic simula-

tions.

This is because increasing either

spring

can increase the stiffness of a protein

and, therefore, can compensate for each other to some extent. Nevertheless, despite the de-

generacy, given the wide range of protein sizes, there is no single combination of

spring

and

values that works best for all three proteins. We chose

spring

= 100

ε/σ

and

= 2

as they are near the optimal degeneracy region under most metrics and consistent with the

values of Elnedyn parameters (

spring

= 124

ε/σ

and

= 1

125

). This combination

spring

and

was selected without a global optimization function that combines all four

metrics, and should be subjected to finer tuning if a specific system is of interest.

2 Analysis of alchemical free energy calculations

We perform various checks to address two common concerns in alchemical simulations: 1) are

there sufficient intermediate states along the alchemical reaction pathway, and 2) are there

sufficient samples from each state for accurate free energy calculations. The BTK-PROTAC

(10)-CRBN complex is used as an example for the analysis below.

We first validate that there are sufficient intermediate states for a converged estimation

of ∆

ternary(WCA)

. The convergence of free energy calculations depends on the overlap of

the phase space, i.e. the distribution of sampled conformations, between neighboring states.

S-5

∆RMSIP

∆RMSF

∆RMSD

KL_RMSD

1000

500

200

100

(

)

1.9

2.1

2.3

21.5

10.8

0.1

0.35

0.18

0. 28

0.14

0.87

0.74

/0'$%1

(

)

CRBN

(124)

BRD4BD1

(42)

IKZF

(11)

Figure S2: Fitting results of ENM parameters arranged by proteins (rows) and evaluation

metrics (columns). Numbers in parenthesis next to protein names are the number of CG

beads. For each plot, blue regions indicate

spring

and

values that result in good fitting, and

red regions indicate significant differences between our simulations and Elnedyn simulations.

Each column shares the same colorbar range. In general, the boxed regions around

spring

100

ε/σ

and

= 2

has good fitting.

S-6

Substantial overlap is achieved when the neighboring states are similar, which requires a fine

spacing of the coupling parameter values. In practice, distributions of quantities such as ∆

and

∂U/∂λ

that are directly involved in free energy estimations are often treated as proxies

for the high-dimensional phase space.

The similarity between distributions is quantified

by KL divergence, where 0 indicates identical distributions and

≫

1 suggests concerning

differences. Based on this metric, all neighboring states have substantial overlap, as the

Kullback–Leibler (KL) divergence values of ∆

and of

∂U/∂λ

distributions both stay below

1 (Fig. S3a).

Bennett’s overlapping histogram

provides another qualitative test for the overlap of ∆

distributions. The difference between

i

(∆

i

,λ

i

) =

i

(∆

i

,λ

i

) + (1

−

) ∆

i

,λ

i

and

i

(∆

i

,λ

i

) =

i

(∆

i

,λ

i

)

−

∆

i

,λ

i

is plotted over ∆

i

,λ

i

values, where

is an arbitrary constant between 0 and 1 and

i

(∆

i

,λ

i

) and

i

(∆

i

,λ

i

) are the

distributions of ∆

i

,λ

i

obtained by sampling from neighboring alchemical states

and

respectively. Continuous oscillations of

i

(∆

i

,λ

i

)

−

i

(∆

i

,λ

i

) around the

estimated ∆

i

,λ

i

over a range of ∆

i

,λ

i

values suggests good overlap (Fig. S3b).

For

states of higher

values, higher energetic penalty of steric repulsions prevents sampling

over a wide range of ∆

values, but the KL divergence and visualization of the distributions

(Fig. S3a,c) both indicate the quality of the overlap.

Next, we examine sampling within each state. For each state, a simulation needs to be

post-processed to discard the initial unequilibrated part and then subsampled to obtain de-

correlated data for accurate uncertainty quantification of the free energy estimation. Thus,

the length of the simulations is dictated by the equilibration time, autocorrelation time,

and the number of de-correlated samples needed for converged estimations. We examine the

values of ∆

∂U/∂λ

, and other collective variables over the simulation time, which typically

equilibrate after 0.9 s (Fig. S4a). To find out the decorrelation time, we discard the initial

0.9 s of simulations and plot the autocorrelation functions of these variables over different

time lags up to half of the simulation time to ensure that the autocorrelation is calculated

S-7

(a)

divergence

1.0

0.8

0.6

0.4

0.2

0.0

0.005

0.01

0.02

0.03

0.05

0.07

0.1

0.15

0.2

0.3

0.5

0.7

(b)

∆

$$'

(c)

∆

(

)

3.0

2.5

2.0

1.5

1.0

0.0

0.5

∆

(

)

0.5

1.5

2.5

1.5

0.5

0.0

0.5

1.0

2.5

2.0

1.0

∆

$$'

1.6

1.2

0.8

0.4

0.0

∆

(

)

Figure S3: Phase space overlap in calculating ∆

ternary(WCA)

for BTK-CRBN in Fig. 2.

(a)

Overlap of ∆

and

∂U/∂λ

distributions between adjacent states are quantified by the KL

divergence.

(b)

Example Bennett’s overlapping plots for

= 0

005 states (left) and

= 0

1 states (right). The grey bands represent ∆

i

,λ

i

1 std estimated using BAR.

(c)

Example distributions of ∆

i,i

are shown with Gaussian smoothing (red and blue solid

curves) for better visualization.

S-8

from a sufficient number of samples. The autocorrelation times all plummet to 0 before 0.63

s (Fig. S4b). Both equilibration time and decorrelation time are longer for simulations in

lower value of

states that retain more memory of previously sampled configurations due

to lower energetic costs. Currently, the equilibration and autocorrelation cutoffs depend on

each system. For convenience, we used the same cutoffs for all

states. In the future, this

can be customized for each state to maximize the number of samples, especially from states

of high

values that requires less equilibration and decorrelation time (Fig. S4b).

(a)

(b)

∆

3.0

2.5

2.0

1.5

1.0

0.5

0.0

∆

##&

from

state

0.0

1.0

2.0

3.0

4.0

5.0

6.0

Time

(s)

0.5

0.0

0.5

1.0

1.5

∆

(

from

state

Time

(s)

0.0

1.0

2.0

3.0

4.0

5.0

6.0

Autocorrelation

1.0

0.8

0.6

0.4

0.2

0.0

0.2

Time

(s)

0.0

1.0

2.0

3.0

4.0

1.0

0.8

0.6

0.4

0.2

0.0

Time

(s)

0.0

1.0

2.0

3.0

4.0

Figure S4: Detecting equilibration and autocorrelation time in calculating ∆

ternary(WCA)

for BTK-CRBN in Fig. 2.

(a)

∆

i

,λ

i

over simulation time and

(b)

the autocorrelation

of ∆

i

,λ

i

from

= 0 (left) and

= 1 (right). The red curves and the shaded regions

represent the average value

1 standard deviation based on 64 independent trajectories.

The vertical dashed lines in this example mark 0.9 s in (a) and 0.63 s in (b). The horizontal

dotted lines in (b) mark the 0 autocorrelation value.

S-9