Single position substitution of hairpin pyrrole-imidazole polyamides imparts distinct DNA-binding profiles across the human genome
Paul B. Finn¹¶#a, Devesh Bhimsaria²¶, Asfa Ali³, Asuka Eguchi⁴, Aseem Z. Ansari⁵, Peter B. Dervan¹*
¹ Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
² Bio Informaticals, Jaipur, Rajasthan, India
³ Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
⁴ Department of Microbiology and Immunology, Stanford University, Stanford, California, United States of America
⁵ Department of Chemical Biology & Therapeutics, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America
#a Current Address: Department of Bioengineering, Stanford University, Stanford, California, United States of America
* Corresponding author
E-mail: dervan@caltech.edu (PBD)
¶ These authors contributed equally to this work.
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.13.249730; this version posted August 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
ABSTRACT
Regulating desired loci in the genome with sequence-specific DNA-binding molecules is a major goal for the development of precision medicine. Pyrrole-imidazole (Py-Im) polyamides are synthetic molecules that can be rationally designed to target specific DNA sequences to both disrupt and recruit transcriptional machinery. While in vitro binding has been extensively studied, in vivo effects are often difficult to predict using current models of DNA binding. Determining the impact of genomic architecture and the local chromatin landscape on polyamide-DNA sequence specificity remains an unresolved question that impedes their effective deployment in vivo. In this report we identified polyamide-DNA interaction sites across the entire genome by covalently crosslinking and capturing these events in the nuclei of human LNCaP cells. This method, termed COSMIC-seq, confirms the ability of hairpin polyamides, with similar architectures but differing at a single ring position, to retain in vitro specificities and display distinct genome-wide binding profiles. These results underpin the development of Py-Im polyamides as DNA-targeting molecules that mediate their regulatory or remedial functions at desired genomic loci.
INTRODUCTION
Regulating genomic architecture and activity with sequence-specific synthetic DNA-binding molecules is a long-standing goal at the interface of chemistry, biology and medicine. Small molecules that selectively target desired genomic loci could be harnessed to regulate critical gene networks. The greatest success in designing small molecules with programmable DNA-binding specificity has been with pyrrole-imidazole (Py-Im) polyamides [1–8]. Py-Im polyamides are synthetic DNA-binding oligomers with high sequence specificity and affinity [7]. An oligomer comprising a modular set of aromatic pyrrole and imidazole amino acids, linked in series by a central aliphatic γ-aminobutyric acid (GABA) ‘turn’ unit, folds into a hairpin structure in the minor groove of DNA and affords binding affinities and specificities comparable to those of natural transcription factors [3,7]. Sequence specificity is programmed through side-by-side pairs of the Py and Im subunits that “read” the steric and hydrogen bonding patterns presented by the edges of the four Watson-Crick base pairs on the floor of the minor groove [5]. DNase I footprinting titrations and other in vitro methods have extensively characterized the binding affinity and specificity of these molecules [3,6,7,9]. An Im/Py pair binds G•C, a Py/Im pair binds C•G, and a Py/Py pair binds both A•T and T•A (denoted as W) [1,2]. Py-Im polyamide binding in the minor groove induces allosteric changes to DNA, widening the minor groove and narrowing the major groove [10–12]. Polyamide-DNA binding is sufficient to disrupt protein-DNA interfaces, including DNA interactions made by transcription factors and the transcriptional machinery [13–15]. Additionally, polyamides can function as sequence-specific synthetic cofactors through allosteric DNA modulation to enhance the assembly of protein-DNA complexes [12]. Py-Im polyamides are cell permeable, localize to the nucleus in live cells and are non-genotoxic [16–18], failing to activate the canonical DNA damage response or significantly alter cell cycle distribution [19].
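As an illustration only, the pairing rules stated above can be written as a small lookup table. The following Python sketch is not part of the reported work: the function name `pairs_to_motif` and the example ring-pair list are hypothetical, and the code covers only the three pairings named in the text, ignoring any contribution from the turn or tail of a hairpin.

```python
# Pairing rules as stated in the text. Each side-by-side ring pair,
# written in the same order as in the text (e.g. Im/Py), specifies one
# base pair; the value is the base read along one strand.
PAIRING_RULES = {
    ("Im", "Py"): "G",  # Im/Py binds G•C
    ("Py", "Im"): "C",  # Py/Im binds C•G
    ("Py", "Py"): "W",  # Py/Py binds A•T or T•A (W)
}


def pairs_to_motif(ring_pairs):
    """Translate an ordered list of ring pairs into an IUPAC motif string."""
    return "".join(PAIRING_RULES[pair] for pair in ring_pairs)


# Hypothetical four-pair arrangement, chosen only to exercise the table.
example_pairs = [("Im", "Py"), ("Py", "Py"), ("Py", "Py"), ("Py", "Im")]
print(pairs_to_motif(example_pairs))  # GWWC
```

Replacing the second pair with Im/Py in this hypothetical arrangement changes the output from GWWC to GGWC, which shows how a single ring substitution alters one position of the predicted site.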
The identification of new mechanistic insights into Py-Im polyamide activity has underlined the importance of mapping polyamide binding to chromatin [15,18,19]. Polyamide binding in the more complex cellular environment presents a formidable challenge since chromatin DNA has varying degrees of accessibility. Sequence-specific access by Py-Im polyamides to the nucleosome core particle (NCP) has been demonstrated in vitro and with X-ray crystal structures of NCP•polyamide complexes [20–22]. However, the extent to which chromatin states influence polyamide binding to its cognate sites remains a long-standing question. The lack of clarity on the parameters that govern genome-wide binding of polyamides greatly impedes the deployment of this class of molecules to regulate cell fate-defining and disease-causing gene networks in vivo.
We report here the genome-wide binding profiles of two Py-Im polyamides, 1 and 2, of identical architecture (8-ring hairpin) that differ at a single aromatic ring position, determined in cellular nuclei using COSMIC-seq (‘crosslinking of small molecules for isolation of chromatin with next-generation sequencing’) (Fig 1) [23,24]. COSMIC-seq employs a tripartite conjugate composed of the DNA-binding ligand attached to a biotin affinity handle and a psoralen photocrosslinker. Genome-wide binding of these tripartite molecules is captured by photo-induced crosslinking followed by biotin-enabled enrichment and unbiased next-generation sequencing (NGS) of the conjugated genomic loci [23,24]. The ability to induce rapid crosslinking at the desired time point distinguishes COSMIC-seq from the continuous and uncontrolled alkylation-dependent DNA conjugations that have been used to query genome-wide binding of polyamides [25]. COSMIC-seq also differs from Chem-seq approaches that use ligands for protein complexes that are associated with the genome [26]. Previously, COSMIC-seq was utilized to assess genome-wide binding of two structurally distinct Py-Im polyamides (hairpin vs linear) that code for very different sequences [24]. An 8-ring hairpin Py-Im polyamide (TpPyPyIm-γ-PyImPyPy-β-Dp) binds 6 bp of DNA (5’-WTWCGW-3’) [27], whereas a linear polyamide (ImPy-β-ImPy-β-Im-β-Dp) binds 9 bp of purine-rich DNA (5’-AAGAAGAAG-3’) [28–31]. While such a dramatic difference in target sequence composition leads to distinct genome-wide binding profiles, we wondered how a more challenging single position change (CH to N:) within one ring of an 8-ring hairpin would affect genomic occupancy. In this study we applied COSMIC-seq to determine if two polyamides of identical size and architecture, hairpins 1 and 2, which code for 6 base pair sites differing at one base pair position (5’-WGWWCW-3’ and 5’-WGGWCW-3’, respectively), can display distinct genomic binding occupancy on chromatin. These experiments provide a more stringent test of the genome-wide binding properties of hairpin polyamides in a chromatin environment for application as precision-targeting molecules.
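To make the single base pair difference between the two target sites concrete, the sketch below scans a DNA string for each motif on both strands. It is illustrative only and is not the analysis pipeline used in this study; the helper names and the toy sequence are hypothetical. One point it brings out is that 5’-WGWWCW-3’ is its own reverse complement, whereas 5’-WGGWCW-3’ is not, so the second motif must also be searched as 5’-WGWCCW-3’.

```python
import re

# IUPAC codes needed for the two motifs; W stands for A or T.
IUPAC_TO_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}
COMPLEMENT = str.maketrans("ACGTW", "TGCAW")


def reverse_complement(motif):
    """Reverse complement of a motif written with A, C, G, T and W."""
    return motif.translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif):
    """Return sorted 0-based start positions (on the given strand) where
    the motif or its reverse complement matches, including overlaps."""
    hits = set()
    for variant in {motif, reverse_complement(motif)}:
        body = "".join(IUPAC_TO_REGEX[base] for base in variant)
        pattern = re.compile("(?=" + body + ")")  # lookahead keeps overlaps
        hits.update(m.start() for m in pattern.finditer(sequence.upper()))
    return sorted(hits)


# Hypothetical toy sequence containing one site of each kind.
toy_sequence = "CCAGTACTCCTGGTCACCTGACCACC"
print(find_sites(toy_sequence, "WGWWCW"))  # [2]
print(find_sites(toy_sequence, "WGGWCW"))  # [10, 18]
```

In the toy sequence the two motifs hit disjoint positions, because the third position of the site must be A/T for one motif and G for the other.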
Fig 1. Trifunctional Py-Im polyamide conjugates 1 and 2. (A) Chemical structure of hairpin Py-Im polyamides 1 and 2, which differ by one atom, shown in red, and (B) the corresponding predicted target sequences based on the pairing rules. Py-Im polyamide 1 targets the DNA sequence 5’-WGWWCW-3’ and Py-Im polyamide 2 targets 5’-WGGWCW-3’. Open and filled circles represent N-methylpyrrole (Py) and N-methylimidazole (Im), respectively. The N-acetylated (R)-γ-aminobutyric acid turn residue is shown as a semicircle, and psoralen and biotin are denoted by P and B, respectively.
MATERIALS AND METHODS
Materials
Chemicals and solvents were purchased from standard chemical suppliers and used without further purification. (R)-2,4-Fmoc-Dab(Boc)-OH (α-amino-GABA turn) was purchased from Peptides International. Monomers were synthesized as previously described [32]. Kaiser oxime resin (100-200 mesh) and benzotriazole-1-yl-oxy-trispyrrolidinophosphonium hexafluorophosphate (PyBOP) were purchased from Novabiochem. 2-Chlorotrityl chloride resin was purchased from Aapptec. Preparative HPLC purification was performed on an Agilent 1200 Series instrument equipped with a Phenomenex Gemini preparative column (250 x 21.2 mm, 5 μm) with the mobile phase consisting of a gradient of acetonitrile (CH₃CN) in 0.1% aqueous trifluoroacetic acid (TFA). Polyamide concentrations were measured by UV/Vis spectroscopy in distilled and deionized water (ddH₂O) with a molar extinction coefficient of 8650 M⁻¹ cm⁻¹ at 310 nm for each N-methylpyrrole (Py) and N-methylimidazole (Im) and 11,800 M⁻¹ cm⁻¹ for the psoralen/biotin derivative 3 [33,34]. Analytical HPLC analysis was conducted on a Beckman Gold instrument equipped with a Phenomenex Gemini analytical column (250 x 4.6 mm, 5 μm), a diode array detector, and the mobile phase consisting of a gradient of acetonitrile in 0.1% aqueous TFA. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry was performed on an Autoflex MALDI TOF/TOF (Bruker) using α-cyano-4-hydroxycinnamic acid matrix. Oligonucleotides were purchased from Integrated DNA Technologies Inc. All sequencing samples were processed as single read (50 bp) sequencing runs at the California Institute of Technology Millard and Muriel Jacobs Genetics and Genomics Laboratory on an Illumina HiSeq 2500 Genome Analyzer.
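The concentration measurement described above follows the Beer-Lambert law, c = A / (ε·l). The sketch below is a worked example under stated assumptions, not the authors' procedure: it assumes that the per-ring and psoralen/biotin extinction coefficients are simply additive, a 1 cm path length, and a hypothetical absorbance reading. The function names are likewise hypothetical.

```python
# Extinction coefficients quoted in the text (M^-1 cm^-1, 310 nm).
EPSILON_PER_RING = 8650          # each Py or Im ring
EPSILON_PSORALEN_BIOTIN = 11800  # psoralen/biotin derivative 3


def extinction_coefficient(n_rings, has_psoralen_biotin=False):
    """Sum the per-ring contributions; additivity is an assumption."""
    epsilon = n_rings * EPSILON_PER_RING
    if has_psoralen_biotin:
        epsilon += EPSILON_PSORALEN_BIOTIN
    return epsilon


def concentration_molar(absorbance, epsilon, path_length_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return absorbance / (epsilon * path_length_cm)


# 8-ring hairpin carrying the psoralen/biotin moiety: 8 * 8650 + 11800.
epsilon_conjugate = extinction_coefficient(8, has_psoralen_biotin=True)
absorbance_310 = 0.81  # hypothetical reading, not a measured value
print(epsilon_conjugate)  # 81000
print(concentration_molar(absorbance_310, epsilon_conjugate))
```

With these assumed inputs the concentration comes out at about 10 μM; a real measurement would use the recorded absorbance and cuvette path length.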
Chemical synthesis
Polyamides 1A and 2A were synthesized on solid support (Kaiser oxime resin, 100-200 mesh) using microwave-assisted PyBOP coupling conditions with N-methylpyrrole (Py) and N-methylimidazole (Im) amino acid monomers and dimers (5a & 5b) as previously described (S1A Fig) [34]. Polyamides were cleaved from the resin with neat 3,3′-diamino-N-methyldipropylamine (60 °C, 5 min, μW), precipitated with diethyl ether at -20 °C, re-dissolved in 20-30% (v/v) CH₃CN/H₂O (0.1% TFA), and purified by reverse-phase preparative HPLC. Fractions that showed clean polyamide without contaminants were frozen in liquid nitrogen and lyophilized to dryness as a white-yellow solid. The identity and purity were confirmed by MALDI-TOF mass spectrometry and analytical HPLC. The observed mass for 1A (C₅₉H₇₅N₂₂O₁₀) is 1251.78 (calculated 1251.60) and for 2A (C₅₈H₇₄N₂₃O₁₀) is 1252.75 (calculated 1252.60).
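The calculated masses quoted above can be reproduced by summing monoisotopic atomic masses over the reported formulas. The sketch below is an illustrative check rather than the authors' calculation; it assumes the formulas as written describe the detected ions and neglects the electron mass. The atomic masses are standard monoisotopic values for ¹²C, ¹H, ¹⁴N and ¹⁶O, and the function and variable names are hypothetical.

```python
# Monoisotopic atomic masses in Da (most abundant isotope of each element).
MONOISOTOPIC_MASS = {
    "C": 12.000000,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
}


def monoisotopic_mass(formula):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())


# Formulas and observed masses as reported in the text.
reported = {
    "1A": ({"C": 59, "H": 75, "N": 22, "O": 10}, 1251.78),
    "2A": ({"C": 58, "H": 74, "N": 23, "O": 10}, 1252.75),
}

for name, (formula, observed) in reported.items():
    calculated = monoisotopic_mass(formula)
    # Print the calculated mass and the observed-minus-calculated difference.
    print(f"{name}: calculated {calculated:.2f}, difference {observed - calculated:+.2f} Da")
```

The calculated values round to 1251.60 and 1252.60, matching the text, and both observed masses lie within 0.2 Da of them.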