RESEARCH ARTICLE
Single position substitution of hairpin pyrrole-imidazole polyamides imparts distinct DNA-binding profiles across the human genome
Paul B. Finn 1☯¤*, Devesh Bhimsaria 2☯, Asfa Ali 3, Asuka Eguchi 4, Aseem Z. Ansari 5, Peter B. Dervan 1*
1 Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America, 2 Bio Informaticals, Jaipur, Rajasthan, India, 3 Department of Molecular Genetics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America, 4 Department of Microbiology and Immunology, Stanford University, Stanford, California, United States of America, 5 Department of Chemical Biology & Therapeutics, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
☯ These authors contributed equally to this work.
¤ Current address: Department of Bioengineering, Stanford University, Stanford, California, United States of America
* pbfinn@stanford.edu (PBF); dervan@caltech.edu (PBD)
Abstract
Pyrrole–imidazole (Py–Im) polyamides are synthetic molecules that can be rationally designed to target specific DNA sequences to both disrupt and recruit transcriptional machinery. While in vitro binding has been extensively studied, in vivo effects are often difficult to predict using current models of DNA binding. Determining the impact of genomic architecture and the local chromatin landscape on polyamide–DNA sequence specificity remains an unresolved question that impedes their effective deployment in vivo. In this report we identified polyamide–DNA interaction sites across the entire genome by covalently crosslinking and capturing these events in the nuclei of human LNCaP cells. This technique confirms the ability of two eight-ring hairpin polyamides, with similar architectures but differing at a single ring position (Py to Im), to retain in vitro specificities and display distinct genome-wide binding profiles.
PLOS ONE | https://doi.org/10.1371/journal.pone.0243905 December 22, 2020

OPEN ACCESS

Citation: Finn PB, Bhimsaria D, Ali A, Eguchi A, Ansari AZ, Dervan PB (2020) Single position substitution of hairpin pyrrole-imidazole polyamides imparts distinct DNA-binding profiles across the human genome. PLoS ONE 15(12): e0243905. https://doi.org/10.1371/journal.pone.0243905

Editor: Hodaka Fujii, Hirosaki University Graduate School of Medicine, JAPAN

Received: September 4, 2020
Accepted: December 1, 2020
Published: December 22, 2020

Copyright: © 2020 Finn et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The data reported in this paper have been deposited in the NCBI Gene Expression Omnibus (accession no. GSE149367).

Funding: Bio Informaticals provided support in the form of salaries for author [DB], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section. Bio Informaticals provides

Introduction

Regulating genomic architecture and activity with sequence-specific synthetic DNA binding molecules is a long-standing goal at the interface of chemistry, biology and medicine. Small molecules that selectively target desired genomic loci could be harnessed to regulate critical gene networks. The greatest success in designing small molecules with programmable DNA-binding specificity has been with pyrrole-imidazole (Py-Im) polyamides [1–8]. Pyrrole-imidazole (Py-Im) polyamides are synthetic DNA-binding oligomers with high sequence specificity and affinity [7]. An oligomer, comprising a modular set of aromatic pyrrole and imidazole amino acids linked in series by a central aliphatic γ-aminobutyric acid (GABA) ‘turn’ unit, folds into a hairpin structure in the minor groove of DNA and affords binding affinities and specificities comparable to natural transcription factors [3, 7].
Sequence specificity is programmed through side-by-side pairs of the Py and Im subunits that “read” the steric and hydrogen bonding patterns presented by the edges of the four Watson-Crick base pairs on the floor of the minor groove [5]. DNase I footprinting titrations and other in vitro methods have extensively characterized the binding affinity and specificity of these molecules [3, 6, 7, 9]. An Im/Py pair binds G•C, a Py/Im pair binds C•G, and Py/Py pairs bind both A•T and T•A (denoted as W) [1, 2].
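The pairing rules lend themselves to a small lookup table. The following Python sketch is illustrative only: the function name `target_site` and the two ring-pair lists are assumptions inferred from the target sequences quoted in this article (5’-WGWWCW-3’ and 5’-WGGWCW-3’), and the flanking W positions are supplied as a parameter rather than derived from the turn and tail chemistry.

```python
# Ring pair (first ring / second ring) -> preferred base on the targeted strand.
PAIRING_RULES = {
    ("Im", "Py"): "G",  # Im/Py reads G•C
    ("Py", "Im"): "C",  # Py/Im reads C•G
    ("Py", "Py"): "W",  # Py/Py reads A•T or T•A, written W
}


def target_site(ring_pairs, flank="W"):
    """Translate an ordered list of ring pairs into a 5'->3' target site.

    `flank` is added on both sides of the core; using W here is an
    assumption that matches the 6-bp sites quoted in the text.
    """
    core = "".join(PAIRING_RULES[pair] for pair in ring_pairs)
    return f"5'-{flank}{core}{flank}-3'"


# Hypothetical ring-pair listings for the two 8-ring hairpins (4 pairs each).
# They differ only at the second pair: Py/Py in 1 versus Im/Py in 2.
polyamide_1 = [("Im", "Py"), ("Py", "Py"), ("Py", "Py"), ("Py", "Im")]
polyamide_2 = [("Im", "Py"), ("Im", "Py"), ("Py", "Py"), ("Py", "Im")]

if __name__ == "__main__":
    print(target_site(polyamide_1))
    print(target_site(polyamide_2))
```

Keeping the rules in a dictionary makes the single-position substitution visible as a one-entry change in the ring-pair list.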
Py-Im polyamide binding in the minor groove induces allosteric changes to DNA, widening the minor groove and narrowing the major groove [10–12]. Polyamide-DNA binding is sufficient to disrupt protein-DNA interfaces, including DNA interactions made by transcription factors and the transcriptional machinery [13–15]. Additionally, polyamides can function as sequence-specific synthetic cofactors through allosteric DNA modulation to enhance the assembly of protein-DNA complexes [12]. Py-Im polyamides are cell permeable, localize to the nucleus in live cells and are non-genotoxic [16–18], failing to activate the canonical DNA damage response or significantly alter cell cycle distribution [19].
The identification of new mechanistic insights into Py-Im polyamide activity has underlined the importance of mapping polyamide binding to chromatin [15, 18, 19]. Polyamide binding in the more complex cellular environment presents a formidable challenge since chromatin DNA has varying degrees of accessibility. Sequence-specific access by Py-Im polyamides to the nucleosome core particle (NCP) has been demonstrated in vitro and with x-ray crystal structures of NCP•polyamide complexes [20–22]. However, the extent to which chromatin states influence polyamide binding to its cognate sites remains a long-standing question. The lack of clarity on the parameters that govern genome-wide binding of polyamides greatly impedes the deployment of this class of molecules to regulate cell fate-defining and disease-causing gene networks in vivo.
We report here the genome-wide binding profiles of two Py-Im polyamides, 1 and 2, of identical architecture (8-ring hairpin) that differ at a single aromatic ring position, in cellular nuclei using COSMIC-seq (crosslinking of small molecules for isolation of chromatin with next-generation sequencing), Fig 1 [23, 24]. COSMIC-seq employs a tripartite conjugate composed of the DNA-binding ligand attached to a biotin affinity handle and a psoralen photocrosslinker. Genome-wide binding of these tripartite molecules is captured by photo-induced crosslinking followed by biotin-enabled enrichment and unbiased next-generation sequencing (NGS) of the conjugated genomic loci [23, 24]. The ability to induce rapid crosslinking at the desired time point distinguishes COSMIC-seq from continuous and uncontrolled alkylation-dependent DNA conjugations that have been used to query genome-wide binding of polyamides [25, 26]. COSMIC-seq also differs from Chem-seq approaches that use ligands for protein complexes that are associated with the genome [27].
Previously, COSMIC-seq was utilized to assess genome-wide binding of two structurally distinct Py-Im polyamides (hairpin vs linear) that code for very different sequences [24]. An 8-ring hairpin Py-Im polyamide (TpPyPyIm-γ-PyImPyPy-β-Dp) binds 6 bp of DNA (5’-WTWCGW-3’) [28], whereas a linear polyamide (ImPy-β-ImPy-β-Im-β-Dp) binds 9 bp of purine-rich DNA (5’-AAGAAGAAG-3’) [29–32]. While such a dramatic difference in target sequence composition leads to distinct genome-wide binding profiles, we wondered how a more challenging single position change (CH to N:) within one ring of an 8-ring hairpin would affect genomic occupancy. In this study we applied COSMIC-seq to determine if two polyamides of identical size and architecture, hairpins 1 and 2, which code for 6 base pair sites differing at one base pair position (5’-WGWWCW-3’ and 5’-WGGWCW-3’, respectively), can display distinct genomic binding occupancy on chromatin. These experiments provide a more stringent test of genome-wide binding properties of hairpin polyamides in a chromatin environment for application as precision-targeting molecules.
genomic data analysis service, with a particular focus on Cognate Site Identification (CSI) and COSMIC-seq.

Competing interests: DB is a sole proprietor of Bio Informaticals (see www.bioinformaticals.com). This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Materials and methods
Materials
Chemicals and solvents were purchased from standard chemical suppliers and used without further purification. (R)-2,4-Fmoc-Dab(Boc)-OH (α-amino-GABA turn) was purchased from Peptides International. Monomers were synthesized as previously described [33]. Kaiser oxime resin (100–200 mesh) and benzotriazole-1-yl-oxy-trispyrrolidinophosphonium hexafluorophosphate (PyBOP) were purchased from Novabiochem. 2-Chlorotrityl chloride resin was purchased from Aapptec. Preparative HPLC purification was performed on an Agilent 1200 Series instrument equipped with a Phenomenex Gemini preparative column (250 x 21.2 mm, 5 μm) with the mobile phase consisting of a gradient of acetonitrile (CH₃CN) in 0.1% aqueous trifluoroacetic acid (TFA).
Fig 1. Trifunctional Py-Im polyamide conjugates 1 and 2. (A) Chemical structure of hairpin Py-Im polyamides 1 and 2, which differ at a single position, shown in red, and (B) the corresponding predicted target sequences based on the pairing rules. Py-Im polyamide 1 targets the DNA sequence 5’-WGWWCW-3’ and Py-Im polyamide 2 targets 5’-WGGWCW-3’. Open and filled circles represent N-methylpyrrole (Py) and N-methylimidazole (Im), respectively (W is A or T, and A, C, G and T are DNA nucleotides). The N-acetylated (R)-γ-aminobutyric (NHAc) acid turn residue is shown as a semicircle, and psoralen and biotin are denoted by P and B, respectively.
https://doi.org/10.1371/journal.pone.0243905.g001

Polyamide concentrations were measured by UV/Vis spectroscopy in distilled and deionized water (ddH₂O) with a molar extinction coefficient of 8650 M⁻¹ cm⁻¹ at 310 nm for each N-methylpyrrole (Py) and N-methylimidazole (Im) and 11,800 M⁻¹ cm⁻¹ for the psoralen/biotin derivative 3 [34, 35].
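As a worked illustration of how these coefficients translate into a concentration, the sketch below applies the Beer-Lambert law (A = ε·c·l). It assumes that the per-ring coefficients are additive and that the psoralen/biotin contribution is simply added for the conjugates; the function names, the 1 cm path length and the example absorbance are hypothetical and not taken from the article.

```python
EPSILON_PER_RING = 8650          # M^-1 cm^-1 at 310 nm, per Py or Im ring
EPSILON_PSORALEN_BIOTIN = 11800  # M^-1 cm^-1, psoralen/biotin derivative 3


def molar_extinction(n_rings, has_psoralen_biotin=False):
    """Estimate the molar extinction coefficient, assuming additivity."""
    epsilon = n_rings * EPSILON_PER_RING
    if has_psoralen_biotin:
        epsilon += EPSILON_PSORALEN_BIOTIN
    return epsilon


def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    return absorbance / (epsilon * path_cm)


if __name__ == "__main__":
    eps_parent = molar_extinction(8)            # 8-ring hairpin such as 1A
    eps_conjugate = molar_extinction(8, True)   # conjugate such as 1
    a310 = 0.81                                 # hypothetical reading
    print(eps_parent, eps_conjugate)
    print(f"{concentration_molar(a310, eps_conjugate) * 1e6:.1f} uM")
```

Under these assumptions an 8-ring hairpin has ε = 69,200 M⁻¹ cm⁻¹ and its conjugate 81,000 M⁻¹ cm⁻¹, so a reading of 0.81 in a 1 cm cuvette would correspond to roughly 10 μM conjugate.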
Analytical HPLC analysis was conducted on a Beckman Gold instrument equipped with a Phenomenex Gemini analytical column (250 x 4.6 mm, 5 μm), a diode array detector, and the mobile phase consisting of a gradient of acetonitrile in 0.1% aqueous TFA. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry was performed on an Autoflex MALDI TOF/TOF (Bruker) using α-cyano-4-hydroxycinnamic acid matrix. Oligonucleotides were purchased from Integrated DNA Technologies Inc. All sequencing samples were processed as single read (50 bp) sequencing runs at the California Institute of Technology Millard and Muriel Jacobs Genetics and Genomics Laboratory on an Illumina HiSeq 2500 Genome Analyzer.
Chemical synthesis

Polyamides 1A and 2A were synthesized on solid support (Kaiser oxime resin, 100–200 mesh), using microwave-assisted PyBOP coupling conditions with N-methylpyrrole (Py), N-methylimidazole (Im) amino acid monomers and dimers (5a and 5b) as previously described, S1A Fig [35]. Polyamides were cleaved from resin with neat 3,3′-diamino-N-methyldipropylamine (60 °C, 5 min, μW), precipitated with diethyl ether at -20 °C, re-dissolved in 20–30% (v/v) CH₃CN/H₂O (0.1% TFA), and purified by reverse-phase preparative HPLC. Fractions that showed clean polyamide without contaminants were frozen in liquid nitrogen and lyophilized to dryness as a white-yellow solid. The identity and purity were confirmed by MALDI-TOF mass spectrometry and analytical HPLC. The observed mass for 1A (C₅₉H₇₅N₂₂O₁₀) is 1251.78 (calculated 1251.60) and for 2A (C₅₈H₇₄N₂₃O₁₀) is 1252.75 (calculated 1252.60).
The psoralen-biotin peptide 3 was synthesized by manual Fmoc solid-phase synthesis on 2-chlorotrityl chloride resin by standard procedures, S1B Fig [23]. Coupling and deprotection were performed at room temperature for 1 h and 15 min, respectively. Briefly, Fmoc-protected amino acids or polyethylene glycol (PEG) linkers were activated with HATU and HOAt in the presence of N,N-diisopropylethylamine (DIPEA) in dimethylformamide (DMF) (or DMSO/DMF), and deprotection of the Fmoc group was achieved with 20% piperidine in DMF. Cleavage from resin was achieved with a solution of 95% (v/v) TFA, 2.5% (v/v) H₂O, and 2.5% (v/v) triisopropylsilane. The product was purified by reverse-phase preparative HPLC, lyophilized to dryness as a white powder and protected from light. The identity and purity were confirmed by MALDI-TOF mass spectrometry and analytical HPLC. The observed mass for 3 (C₄₃H₆₁N₆O₁₅S) is 933.36 (calculated 933.39).
Polyamide-peptide conjugates 1 and 2 were synthesized by solution phase peptide coupling conditions and protected from light, S1C Fig. Peptide acid 3 (1 equiv.) was pre-activated for 5 min at room temperature with a solution of HATU/HOAt/DIPEA (3:3:6 equiv.) in DMF. Polyamide 1A or 2A was added (1–1.5 equiv.), and the coupling was allowed to proceed for 30–60 minutes until all of 3 was consumed as determined by analytical HPLC. The polyamide-peptide conjugates were purified by reverse-phase HPLC and lyophilized to dryness. The identity and purity were confirmed by MALDI-TOF mass spectrometry and analytical HPLC. The observed mass for 1 (C₁₀₂H₁₃₃N₂₈O₂₄S) is 2166.26 (calculated 2165.98) and for 2 (C₁₀₁H₁₃₂N₂₉O₂₄S) is 2167.23 (calculated 2166.97).
Cognate site identification

High-throughput cognate binding sites were identified for the polyamides 1 and 2 using the SELEX method [36]. A DNA library with a central randomized 20-bp region flanked by constant sequences (~10¹² possible sequences, Integrated DNA Technologies) was used for PCR amplification. Polyamide conjugates 1 and 2 at a range of concentrations (5 nM and 50 nM) were added to 100 nM of DNA library in binding buffer [1× PBS (pH 7.6), 50 ng/μL poly(dI-dC)] and incubated for 1 h at room temperature. Enrichment of the compound-DNA complexes was performed using streptavidin-coated magnetic beads (Dynabeads, Invitrogen) following the manufacturer’s protocol. To remove unbound DNA, three washes were done after the capture, with 100 μL ice-cold binding buffer. Beads were resuspended in PCR master mix (EconoTaq PLUS 2× Master Mix, Lucigen), and the DNA was amplified for 15 cycles and purified (QIAGEN). Three rounds of selection were performed (DNA was quantified by absorbance at 260 nm before each round of binding). An additional round of PCR was performed after completion of three rounds of selection, to incorporate Illumina sequencing adapters and a unique 6-bp barcode for multiplexing. The starting library was also barcoded and sequenced. Samples were sequenced on an Illumina HiSeq 2500 at the Millard and Muriel Jacobs Genetics and Genomics Laboratory at the California Institute of Technology.
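The quoted library complexity of ~10¹² follows directly from the length of the randomized region. The short Python sketch below reproduces that arithmetic; the variable names are illustrative, and the 8-mer figures anticipate the k-mer size used in the CSI data analysis.

```python
N_BASES = 4            # A, C, G, T
RANDOM_REGION_BP = 20  # length of the randomized region in the library
K = 8                  # k-mer size used in the downstream analysis

# Every position is independent, so the number of sequences is 4^20.
library_size = N_BASES ** RANDOM_REGION_BP

# Number of distinct 8-mers, and of 8-mer windows in one 20-bp read.
distinct_kmers = N_BASES ** K
windows_per_read = RANDOM_REGION_BP - K + 1

if __name__ == "__main__":
    print(f"{library_size:.2e} possible 20-bp sequences")
    print(f"{distinct_kmers} distinct 8-mers, {windows_per_read} windows per read")
```

Because a single read contributes 13 overlapping 8-mers, about a million reads per barcode sample each of the 65,536 possible 8-mers many times over, even though the 20-bp sequence space itself is sampled very sparsely.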
Cell culture conditions

LNCaP cells were maintained in RPMI 1640 (Invitrogen) with 10% Fetal Bovine Serum (FBS, Irvine Scientific) at 37 °C under 5% CO₂. LNCaP cells were purchased from ATCC (Manassas, VA, USA).
Crosslinking of small molecules for isolation of chromatin with next-generation sequencing

COSMIC-seq was performed in LNCaP nuclei, as previously described [23]. LNCaP cells (~2.5 x 10⁷) were washed twice with cold PBS then resuspended in cold lysis buffer (RSB + 0.1% IGEPAL CA-630, 2.5 x 10⁷ cells/250 μL), incubated on ice for 5 min then centrifuged immediately at 130 × g for 10 min at 4 °C. Nuclei were resuspended in binding buffer [10 mM Tris HCl (pH 8.0), 5 mM MgCl₂, 1 mM DTT, 0.3 M KCl, 0.1 M PMSF, 0.1 M benzamidine, 0.1 M pepstatin A, 10% glycerol] and treated with psoralen-biotin conjugated polyamide 1 or 2 (4 μM, 0.1% DMSO final concentration) for 1 h at 4 °C in the dark. Nuclei were irradiated for 30 min with a UV lamp (2.4 μW/cm²; CalSun) through a Pyrex filter, centrifuged at 500 × g and re-suspended in COSMIC buffer [20 mM Tris·Cl (pH 8.1), 2 mM EDTA, 150 mM NaCl, 1 mM PMSF, 1 mM benzamidine, 1.5 μM pepstatin, 1% Triton X-100, 0.1% SDS]. Samples were sonicated at 3 °C for 36 min with a cycle of 10 s ON and 10 s OFF, at HIGH setting (Bioruptor Plus, Diagenode). Samples were centrifuged for 10 min at 12,000 × g, and 10% of the sample was saved as input DNA and stored at -80 °C until reversal of cross-linking.
The rest of the sample was used for the affinity purification (AP). Streptavidin-coated magnetic beads (100 μL per sample, Dynabeads MyOne C1) were washed in COSMIC buffer and incubated with AP samples for 16 h at 4 °C. All washes were performed at room temperature unless otherwise noted. For 1 and 2, 1A and 2A were added (5 μM), respectively, in the washes. Samples were washed twice with COSMIC buffer (once 12 h and once 4 h). Samples were then washed once with washing buffer 1 [10 mM Tris·Cl (pH 8.0), 1 mM EDTA, 3% (v/v) SDS], once with washing buffer 2 [10 mM Tris·Cl (pH 8.0), 250 mM LiCl, 1 mM EDTA, 0.5% Nonidet P-40, 1% sodium deoxycholate], twice with freshly prepared washing buffer 3 [4 M urea, 10 mM Tris·Cl (pH 7.5), 1 mM EDTA, 0.1% Nonidet P-40], and twice with TE buffer [10 mM Tris·Cl (pH 8.0), 1 mM EDTA]. Samples were re-suspended in TE and labelled as AP DNA. Input and AP samples were re-suspended in cross-link reversal buffer [10 mM Tris (pH 7.6), 0.4 mM EDTA, 100 mM KOH]. Crosslinks were reversed, and DNA was eluted from beads at the same time by heating samples for 30 min at 90 °C. Input and AP samples were neutralized with 6N HCl, and incubated first with RNase A (0.2 μg/μL) for 1 h at 37 °C and then with Proteinase K (0.2 μg/μL) for 1 h at 55 °C. Samples were purified with the MinElute PCR Purification Kit (Qiagen).
CSI data analysis

The reads from Illumina sequencing were de-multiplexed using the 6 bp barcodes and then truncated to include only the 20 bp random portion of the library. On average, 1,031,000 reads per barcode were obtained. The occurrence of every k-mer (8-mer) was counted using a sliding window of size k. To correct for experimental biases and biases in the initial DNA library, a standardized enrichment score was calculated by normalizing the counts of every k-mer from the enriched CSI data (rounds 1, 2 or 3) to the expected number of counts in the library, estimated with a fifth-order Markov model derived from the processed library (carried through the same number of SELEX enrichment rounds as the polyamide samples, but without the polyamide) [37, 38].
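The counting and normalization steps can be sketched in a few lines of Python. This is a simplified illustration, not the published pipeline: it replaces the fifth-order Markov model with a direct frequency ratio against the mock-selected control library, adds a pseudocount to avoid division by zero, and does not merge reverse complements. The function names `count_kmers` and `enrichment` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k=8):
    """Count every k-mer in each read using a sliding window of size k."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


def enrichment(selected, control, pseudocount=1.0):
    """Ratio of k-mer frequency in the selected pool to the control pool.

    Simplification: the control frequency stands in for the expected
    count that the article derives from a fifth-order Markov model.
    """
    kmers = set(selected) | set(control)
    sel_total = sum(selected.values()) + pseudocount * len(kmers)
    ctl_total = sum(control.values()) + pseudocount * len(kmers)
    scores = {}
    for kmer in kmers:
        sel_freq = (selected.get(kmer, 0) + pseudocount) / sel_total
        ctl_freq = (control.get(kmer, 0) + pseudocount) / ctl_total
        scores[kmer] = sel_freq / ctl_freq
    return scores


if __name__ == "__main__":
    # Toy reads; real input would be the 20-bp randomized regions.
    selected = count_kmers(["TTAGAACATTGCAGAACATC", "CCAGAACATGGTTAGAACAT"])
    control = count_kmers(["ACGTTGCATGCAAGCTTGCA", "GGCATCGATCGTAGCTAGCT"])
    top = sorted(enrichment(selected, control).items(), key=lambda kv: -kv[1])
    print(top[:3])
```

A ratio above 1 indicates that a k-mer is over-represented after selection relative to the control; the published score additionally standardizes these values.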
The most enriched 8 bp sub-sequences were used to derive position weight matrix (PWM) motifs using MEME [39, 40]. Data files for mapped 20 bp reads and normalized 8 bp sequences are available online (https://ansarilab.biochem.wisc.edu/computation.html).
Sequence logos

PWMs were derived from the 50 most enriched 8-mer sequences (ranked by enrichment) for each polyamide, using MEME [39, 40]. The following parameters were used as inputs to the meme command (http://meme-suite.org/doc/meme.html): -dna -mod anr -nmotifs 10 -minw 6 -maxw 8 -time 7200 -maxsize 60000 -revcomp
Specificity and energy landscapes

Specificity and Energy Landscapes (SELs) display high-throughput protein-DNA binding data (DNA–protein interactome or DPI) in the form of concentric rings [41–43]. The organization of data in SEL is detailed in S3 Fig. SELs were generated from 8-mer enrichment files using the target sequence for the corresponding polyamide, 1 (5’-WGWWCW-3’) and 2 (5’-WGGWCW-3’), as the seed motif. The software for generating SELs is made available online (https://ansarilab.biochem.wisc.edu/computation.html).
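To make the ring assignment concrete, the Python sketch below places an 8-mer in a ring according to its minimum number of mismatches to a degenerate 6-mer seed, taken over every offset at which the seed fits. It is a minimal illustration that assumes rings are indexed by mismatch count and considers only the given strand; it is not the SEL software referenced above, and the names `mismatches` and `ring_index` are hypothetical.

```python
# IUPAC codes needed for the seed motifs in this article (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT"}


def mismatches(window, motif):
    """Count positions where the window does not satisfy the motif."""
    return sum(base not in IUPAC[code] for base, code in zip(window, motif))


def ring_index(kmer, seed):
    """Minimum mismatch count between the seed and any window of the k-mer.

    Ring 0 holds k-mers containing an exact match to the seed.
    """
    width = len(seed)
    return min(
        mismatches(kmer[start:start + width], seed)
        for start in range(len(kmer) - width + 1)
    )


if __name__ == "__main__":
    for kmer in ("AAGAACAT", "AAGGACAT", "CCCCCCCC"):
        print(kmer, ring_index(kmer, "WGWWCW"), ring_index(kmer, "WGGWCW"))
```

In this toy example AAGAACAT falls in ring 0 for the seed of polyamide 1 and ring 1 for the seed of polyamide 2, while AAGGACAT shows the opposite assignment, mirroring the single-base difference between the two target sites.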
Genomescapes: Scoring in vivo bound sites with in vitro data

Genomescapes are generated by assigning in vitro CSI intensities (enrichment values) to genomic regions. To generate CSI Genomescapes, a sliding k-mer window was used to score genomic regions, and the scores were then plotted as a bar plot [41, 43].
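A genomescape can be pictured as a per-position lookup of in vitro scores. The sketch below is a minimal Python illustration that assumes the CSI enrichment values are held in a dictionary keyed by 8-mer; the function name `genomescape`, the default score for unseen k-mers and the toy values are all hypothetical.

```python
def genomescape(sequence, kmer_scores, k=8, default=0.0):
    """Assign an in vitro score to each k-mer window along a sequence.

    Returns one value per window start, i.e. len(sequence) - k + 1 values,
    which can then be drawn as a bar plot.
    """
    sequence = sequence.upper()
    return [
        kmer_scores.get(sequence[start:start + k], default)
        for start in range(len(sequence) - k + 1)
    ]


if __name__ == "__main__":
    # Toy enrichment table; real values come from the CSI analysis.
    scores = {"AAGAACAT": 5.0, "AGAACATG": 2.5}
    print(genomescape("caagaacatgc", scores))
```

Windows whose k-mer is absent from the table receive the default score, which is a design choice of this sketch rather than something specified in the article.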
Summation of sites model

The summation of sites (SOS) model was used to predict DNA binding of polyamides in the human genome, hg19 [23, 24, 44]. The SOS score was obtained by summing (or averaging) all k-mer in vitro binding intensities (enrichment) obtained using a sliding k-mer window across a genomic region. Data are displayed using genomic regions of 420 bp for SOS [23]. For SOS predicted genomic loci, the whole human genome (hg19) was divided into 420 bp fragments, each overlapping the next by half (210 bp). These fragments were then sorted by the predicted binding to polyamide 1 and 2 using the SOS model. The top 1000 predicted peaks obtained were used as final predicted peaks for further analysis.
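The tiling and ranking procedure can be summarized in a short Python sketch. It assumes a dictionary of 8-mer scores and an in-memory sequence string, and it sums rather than averages; the function names `sos_score`, `tile` and `top_fragments` are hypothetical, and a genome-scale run would need to process chromosomes in chunks.

```python
def sos_score(fragment, kmer_scores, k=8):
    """Sum the in vitro scores of every k-mer window in a fragment."""
    return sum(
        kmer_scores.get(fragment[start:start + k], 0.0)
        for start in range(len(fragment) - k + 1)
    )


def tile(length, window=420, step=210):
    """Half-overlapping (start, end) windows that fit within the sequence."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def top_fragments(sequence, kmer_scores, n=1000, window=420, step=210, k=8):
    """Rank tiled fragments by SOS score and keep the n best."""
    scored = [
        (sos_score(sequence[start:end], kmer_scores, k), start, end)
        for start, end in tile(len(sequence), window, step)
    ]
    return sorted(scored, reverse=True)[:n]


if __name__ == "__main__":
    # Toy sequence with a single scored site at position 500.
    seq = "C" * 500 + "AAGAACAT" + "C" * 492
    print(top_fragments(seq, {"AAGAACAT": 2.0}, n=2))
```

Because adjacent fragments overlap by half, a site that falls near a fragment boundary is still fully contained in a neighboring fragment, which is why the toy site is reported in two windows.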
COSMIC-seq data analysis

Sequencing reads were mapped to the human genome (hg19) with Bowtie (--best -m 1) to yield unique alignments. Bound regions/peaks were identified with SPP [24, 44]. The data have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE149367).
Plotting tag density data, genomescape and SOS as heatmaps

To display data, multiple heatmaps are shown using two types of genomic regions: the top 1000 COSMIC peaks, and the top 1000 genomic loci predicted to be the best binders using the SOS model. These regions were scored using COSMIC-seq tag density for AP over a 10 Kbp region surrounding the peak (HOMER annotatePeaks.pl command with the -hist argument set to 25 bp), SOS scores for the 10 Kbp region surrounding the peak, and genomescapes for 1 Kbp [45]. Different coloring scales were used to display heatmaps by using a multiplication factor of 10x for tag density and 100x for genomescapes and SOS scoring.
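The binning behind one heatmap row can be illustrated with a few lines of Python. This sketch only shows the idea of counting tags in 25 bp bins across a 10 Kbp window centered on a peak; it does not reproduce HOMER's normalization or strand handling, and the function name `tag_density` and the example coordinates are hypothetical.

```python
def tag_density(tag_positions, center, region=10_000, bin_size=25):
    """Histogram of tag positions in fixed bins around a peak center.

    Returns region // bin_size counts (400 for the defaults); tags that
    fall outside the window are ignored.
    """
    window_start = center - region // 2
    bins = [0] * (region // bin_size)
    for position in tag_positions:
        offset = position - window_start
        if 0 <= offset < region:
            bins[offset // bin_size] += 1
    return bins


if __name__ == "__main__":
    row = tag_density([99_990, 100_000, 100_010, 250_000], center=100_000)
    print(len(row), sum(row), row[199:202])
```

Stacking one such row per peak, sorted by peak rank, gives the matrix that is rendered as a heatmap; averaging the rows column by column gives the summary profile plotted above it.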
Results
Polyamide design

COSMIC-seq was performed on two structurally identical hairpin polyamides, differing at a single position (X = CH vs N:) on the second aromatic amino acid ring-pair, Fig 1A. A single CH to N: position substitution changes the ring pair from a Py/Py to an Im/Py, which shifts the preference from an A•T or T•A base pair to a G•C base pair, based on previously determined pairing rules, Fig 1B [1, 2, 46]. Py-Im polyamide 1, designed to target the consensus androgen response element (ARE) half-site 5’-WGWWCW-3’, has been shown to regulate androgen receptor (AR) and glucocorticoid receptor (GR) driven gene expression in cell culture and suppress tumor growth in vivo [14, 18, 47]. Py-Im polyamide 2, designed to target the estrogen response element (ERE) consensus half site 5’-WGGWCW-3’, was shown to affect estrogen receptor-alpha (ERα)-driven gene expression in vitro and in vivo [48]. In this study, each hairpin Py-Im polyamide is conjugated at the C-terminus with a psoralen and a biotin for enrichment, connected via a linker (~36 Å extended) capable of sampling pyrimidines proximal to the polyamide-binding site that are suitable for 2 + 2 photocycloaddition [23, 24]. Because the psoralen moiety crosslinks proximal pyrimidines (T in particular), we anticipate subtle bias in the data, a contextual flanking sequence nuance adjacent to the core binding sites of each polyamide. Py-Im polyamides were synthesized by Boc solid-phase synthesis, cleaved from resin and conjugated to the psoralen-biotin moiety 3 (S1 Fig).
Different sequence specificities conferred by a single position substitution. To comprehensively map in vitro binding characteristics of hairpin polyamide-conjugates 1 and 2, we performed solution-based Cognate Site Identifier (CSI) analysis, Fig 2 [6, 41, 42, 49]. Sequence specificity data were determined with next generation sequencing (NGS) by solution-based enrichment methods (SELEX-seq) to assess polyamide-DNA binding [50]. These methods provide a comprehensive characterization of polyamide-DNA binding through the sampling of a large sequence space (a dsDNA library bearing all ~10¹² sequence permutations of a 20-bp site) using affinity purification coupled with massively parallel sequencing [36, 42]. This platform allows rapid, quantitative identification of the full spectrum of polyamide binding sites of up to 20 bp in size, correlates well with solution-phase and microarray platforms, and has been used to guide the refinement of general polyamide design principles [9, 24, 32, 41, 42].
Py-Im polyamides 1 and 2 were incubated with a duplex oligonucleotide library containing a randomized 20-mer region, and the bound and unbound sequences were separated via affinity purification by streptavidin-coated magnetic beads, Fig 2A. Following each round of enrichment, sequences were PCR amplified, purified, multiplexed, and subjected to massively parallel sequencing analysis. Computational analysis was applied to enriched sequences to obtain binding site intensity values corresponding to all 8-mer DNA sequences, see Methods [38]. Polyamide-DNA binding motifs for each round of enrichment were identified by position weight matrices (PWMs) using the top 50 enriched 8-mer sequences and displayed as sequence logos, Fig 2B and S2 Fig [39, 40]. The highest information content for each polyamide is found at a binding site width of six, verifying the binding site size expected when 1 and 2 are bound in a fully ring-paired, hairpin configuration. The motifs generated are indicative of polyamide-DNA binding consistent with the Py-Im pairing rules for both 1 and 2, targeting the sequences 5’-WGWWCW-3’ and 5’-WGGWCW-3’, respectively. A clear difference in the sequence preference at the third position, corresponding to the CH to N: position substitution of the second ring pair (Py/Py vs Im/Py), was detected. Additionally, both polyamides show subtle differences in binding preference at the fourth and sixth positions, revealing sensitivity of binding energetics to changes in sequence context.

Fig 2. Detection of in vitro DNA binding of polyamides 1 and 2 via Cognate Site Identification (CSI) by SELEX-seq. (A) Overview of CSI by SELEX-seq workflow. A randomized 20 bp DNA library is incubated with biotinylated polyamide, DNA is enriched by streptavidin-coated magnetic beads, PCR amplified and sequenced by NGS to obtain k-mers (CSI enrichment) representing polyamide-DNA binding. Enrichment is displayed as a histogram plot and high binding sequences are represented as a position weight matrix (PWM) logo. Specificity and energy landscapes (SELs) are created to visualize the full spectrum of DNA binding across all sequence permutations of an 8-mer binding site. (B) PWM logos for polyamides 1 (top) and 2 (bottom). (C) Scatterplot comparison of in vitro DNA binding for 1 vs 2. CSI enrichment for 8-mers is plotted for sequences containing 5’-WGWWCW-3’ (red) and 5’-WGGWCW-3’ (green). (D) Comprehensive SELs for 1 (top) and 2 (bottom) using 5’-WGWWCW-3’ and 5’-WGGWCW-3’ as seed motif, respectively, where W = A or T.
https://doi.org/10.1371/journal.pone.0243905.g002
Scatter plot comparison analysis of all enriched 8-mer sequences indicates a preference for the consensus motif: polyamide 1 prefers WGWWCW over WGGWCW, whereas polyamide 2 prefers WGGWCW over WGWWCW, Fig 2C and S3 Fig. These results demonstrate that a single position (CH to N:) modification of the aromatic amino acid ring of the polyamide core structure imparts a significant change in the global in vitro DNA sequence preferences and confirm that the C-terminus modification does not have a significant impact on the specificity of the hairpin polyamides. While these results may seem obvious based on the pairing rules, this experiment was important to confirm that polyamide-conjugates 1 and 2 retain preference for cognate sequences.
While PWM-based motifs summarize sequence preferences of DNA-binding molecules, they compress related sequences into a consensus motif, masking the impact of flanking sequences and local microstructure as well as underestimating the affinity spectrum of cognate sites contained within a given DNA-polyamide interactome (DPI). Sequence specificity landscapes (SSLs) can optimize the cognate site motif(s) and thereby uncover major binding motifs to visualize the effects of flanking sequences [41]. To better visualize the full spectrum of DNA binding and compare the individual interactomes of each polyamide, we developed specificity and energy landscapes (SELs) for 1 and 2, Fig 2D [41–43]. SELs present the enriched binding sequences as concentric rings, organized by a “seed motif”, with the zero-mismatch ring (central ring) having an exact match for the seed motif, S4 Fig. The PWM-based motif is used as a seed and the entire DPI is displayed in concentric rings as sequences deviate from the seed motif. Consecutive rings represent 0, 1, 2, ..., n mismatches from the seed motif. SELs plotted for the complete set of enrichment data using a 6-mer seed motif, WGWWCW (1) and WGGWCW (2), show a clear preference of both polyamides for 6-mer seed motifs (central ring). The dramatic drop-off in affinity for sequences that deviate from the preferred 8-mer site (outer rings) underscores the exquisite sequence specificity of hairpin polyamides, Fig 2D and S5 Fig [41–43].
It is important to note that low-affinity sequences, which are sequentially depleted in SELEX-based approaches, are critical to develop accurate binding site models across the genome [23, 38, 41, 43]. Indeed, we observed both a concentration and selection effect, with each sequential round of SELEX steadily enriching high-affinity sites with a concomitant decrease in correlation with genome-wide binding profiles (S1 File). For these reasons, the DNA-polyamide interactome from round 1 (with no successive rounds of SELEX) with a final polyamide concentration of 50 nM was used for further analysis.
COSMIC-seq to map genome-wide binding profiles

We utilized COSMIC-seq to map the genome-wide binding targets of 1 and 2 in LNCaP nuclei to determine if Py-Im polyamides could maintain their preferred differential binding specificity in a biochemically active complex chromatin environment, Fig 3A [24]. Isolated nuclei retain native chromatin states and are widely used to examine chromatin structure and accessibility [51–53]. Briefly, isolated nuclei from human LNCaP cells were treated in biological duplicate with 1 or 2 (4 μM) at 4 °C for 1 h and cross-linked to DNA by 365 nm UV irradiation.
DNA was sheared by sonication, polyamide-DNA complexes were captured by streptavidin-coated magnetic beads, cross-links were reversed, and enriched DNA was sequenced. COSMIC-seq reads were mapped using the Bowtie algorithm and bound peaks were identified using the standard peak-calling algorithms endorsed by the ENCODE consortium, see Methods [24, 44, 54]. Sequence/Read tag density of the top 1000 identified peaks across replicates of polyamides was compared over a 10 Kbp region centered at bound COSMIC-seq loci, and shown as a heat map, Fig 3B and 3C.
Fig 3. Genome-wide DNA binding of polyamides 1 and 2 by COSMIC-seq. (A) Overview of COSMIC-seq in LNCaP cells: nuclei are treated with polyamides 1 and 2 and cross-linked to DNA with UV irradiation (365 nm). Cross-linked genomic DNA is enriched and analyzed by NGS. (B, C) Heat maps reveal selective enrichment of polyamides 1 and 2. Tag density of each polyamide is shown for the top 1,000 loci for 1 (B) and 2 (C). Data are displayed as sequence read tag density heatmaps (bottom) and averaged bar plots (top) for the top 1000 predicted peaks, mapped on a 10 Kbp window.
https://doi.org/10.1371/journal.pone.0243905.g003

Both polyamides 1 and 2 show a strong correlation between bound peaks of replicates, while a consistent non-correlation is observed when comparing the top 1000 identified peaks of polyamide 1 to those of polyamide 2. A high COSMIC enrichment signal for sites bound by polyamide 1 is observed for replicate treatments (Fig 3B, left); however, no significant enrichment is observed when compared to polyamide 2 (Fig 3B, right). A similar