Nature Genetics | Volume 56 | December 2024 | 2827–2841
https://doi.org/10.1038/s41588-024-02000-5
Technical Report
ChIP-DIP maps binding of hundreds of
proteins to DNA simultaneously and
identifies diverse gene regulatory elements
Andrew A. Perez¹,⁶, Isabel N. Goronzy¹,²,³,⁶, Mario R. Blanco¹, Benjamin T. Yeh¹,²,
Jimmy K. Guo¹,⁴, Carolina S. Lopes⁵, Olivia Ettlin¹, Alex Burr¹ & Mitchell Guttman¹
Gene expression is controlled by dynamic localization of thousands of
regulatory proteins to precise genomic regions. Understanding this cell
type-specific process has been a longstanding goal yet remains challenging
because DNA–protein mapping methods generally study one protein at a
time. Here, to address this, we developed chromatin immunoprecipitation
done in parallel (ChIP-DIP) to generate genome-wide maps of hundreds
of diverse regulatory proteins in a single experiment. ChIP-DIP produces
highly accurate maps within large pools (>160 proteins) for all classes
of DNA-associated proteins, including modified histones, chromatin
regulators and transcription factors and across multiple conditions
simultaneously. First, we used ChIP-DIP to measure temporal chromatin
dynamics in primary dendritic cells following LPS stimulation. Next, we
explored quantitative combinations of histone modifications that define
distinct classes of regulatory elements and characterized their functional
activity in human and mouse cell lines. Overall, ChIP-DIP generates
context-specific protein localization maps at consortium scale within any
molecular biology laboratory and experimental system.
Although every cell in the body inherits the same genomic DNA
sequence, distinct cell types express different genes to enable specific
functions. Cell type-specific gene regulation involves the coordinated
activity of thousands of regulatory proteins that localize at precise
DNA regions to activate, repress and quantitatively control transcription
levels. Genomic DNA is organized around nucleosomes¹, which contain histone
proteins that undergo extensive post-translational modifications²,³ and
together define cell type-specific chromatin states. Chromatin state is
controlled by regulators that directly read, write and erase specific
histone modifications²,⁴ as well as control nucleosome positioning and DNA
accessibility⁵,⁶. This determines which genomic regions are accessible for
binding by sequence-specific transcription factors (TFs)⁷, enzymes that
transcribe DNA into RNA (RNA polymerases)⁸ and other general and specific
regulatory proteins that promote or suppress transcriptional initiation⁹,¹⁰.
Conversely, recruitment of these regulatory proteins to specific DNA
regions, along with transcriptional changes, can facilitate changes in
chromatin state and DNA accessibility⁵,¹¹.
Understanding how regulatory protein binding leads to cell
type-specific gene expression has been a central goal of molecular
biology for decades². Over the past 20 years, important technical
advances have enabled genome-wide mapping of regulatory proteins
Received: 11 December 2023
Accepted: 21 October 2024
Published online: 25 November 2024
¹Division of Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA.
²David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
³Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA.
⁴Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
⁵Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA.
⁶These authors contributed equally: Andrew A. Perez, Isabel N. Goronzy.
e-mail: mguttman@caltech.edu
sets of different antibody–bead–oligonucleotide conjugates to create
an antibody–bead pool, (3) performing ChIP, (4) barcoding
chromatin–antibody–bead–oligonucleotide conjugates via split-and-pool
ligation³⁸–⁴⁰ and (5) sequencing DNA and computationally matching
split-and-pool barcodes that are shared between genomic DNA and
the antibody–oligonucleotide. We define all unique reads containing
the same split-pool barcode as a cluster and combine reads from all
clusters corresponding to the same antibody to generate a localization
map for each protein. The output of ChIP-DIP is analogous to the data
generated by ChIP–seq; however, instead of a single map, ChIP-DIP
generates a map for each antibody used (Fig. 1b).
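The cluster-based assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `(barcode, kind, value)` read layout, the function names and the `min_fraction` purity threshold are all hypothetical and were introduced here only to make the logic concrete.

```python
from collections import Counter, defaultdict


def build_clusters(reads):
    """Group reads that share a split-and-pool barcode into clusters.

    `reads` is an iterable of (barcode, kind, value) tuples (assumed layout):
    kind "dna" carries a genomic position, kind "oligo" carries an antibody ID.
    """
    clusters = defaultdict(lambda: {"dna": set(), "oligo": []})
    for barcode, kind, value in reads:
        if kind == "dna":
            clusters[barcode]["dna"].add(value)  # a set keeps unique reads only
        else:
            clusters[barcode]["oligo"].append(value)
    return clusters


def assign_clusters(clusters, min_fraction=0.8):
    """Pool the DNA reads of all clusters that map to the same antibody.

    A cluster is skipped if it has no oligo reads or if its most common
    antibody ID makes up less than `min_fraction` (hypothetical cutoff)
    of its oligo reads.
    """
    maps = defaultdict(list)
    for cluster in clusters.values():
        if not cluster["oligo"]:
            continue
        antibody, count = Counter(cluster["oligo"]).most_common(1)[0]
        if count / len(cluster["oligo"]) >= min_fraction:
            maps[antibody].extend(sorted(cluster["dna"]))
    return dict(maps)


# Toy reads (made up for illustration).
reads = [
    ("bc1", "dna", ("chr12", 53650100)),
    ("bc1", "dna", ("chr12", 53650100)),  # duplicate read, counted once
    ("bc1", "oligo", "CTCF"),
    ("bc2", "dna", ("chr12", 54000000)),
    ("bc2", "oligo", "H3K4me3"),
    ("bc2", "oligo", "H3K4me3"),
    ("bc3", "dna", ("chr1", 1000)),       # no oligo read: left unassigned
    ("bc4", "dna", ("chr2", 5)),
    ("bc4", "oligo", "CTCF"),
    ("bc4", "oligo", "RNAP II"),          # mixed cluster: left unassigned
]
maps = assign_clusters(build_clusters(reads))
print(maps)
```

Each key of `maps` plays the role of one per-antibody localization map; a real analysis would then convert the pooled positions into coverage tracks.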
To ensure that chromatin–antibody–bead–oligonucleotide conjugates
remain intact throughout the ChIP-DIP procedure, we designed a
series of experiments to measure dissociation between oligonucleotide
and bead, antibody and bead, or antibody and chromatin (Extended
Data Fig. 1b and Supplementary Note 1).
(1) Oligonucleotide–bead dissociation. We found that most clusters
    (>95%) contained only a single oligonucleotide type (Extended Data
    Fig. 1c), indicating that oligonucleotide movement between beads is rare.
(2) Antibody–bead dissociation. We found that beads that were not coupled
    to any antibodies were associated with little chromatin (<0.5%;
    Extended Data Fig. 1d), indicating that antibody movement between
    beads is rare.
(3) Antibody–chromatin dissociation. We purified human and mouse chromatin
    using differentially labeled beads, mixed them together and observed
    minimal levels of chromatin assigned to the bead type of the incorrect
    species (4–6%; Extended Data Fig. 1e), indicating that the vast
    majority of antibody–chromatin interactions (>88–92%) remain intact
    throughout the ChIP-DIP procedure.
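The text does not spell out how the >88–92% figure follows from the 4–6% cross-species measurement. One reading that reproduces these bounds, offered here purely as an assumption, is that dissociated chromatin rebinds beads of either species with equal probability, so only half of all swaps are visible as cross-species assignments and the total swap rate is about twice the observed rate. The function name and the `n_species` parameter below are hypothetical.

```python
def intact_fraction(observed_cross_species, n_species=2):
    """Estimate the fraction of antibody-chromatin interactions left intact.

    Assumption (not stated in the source): a dissociated fragment rebinds a
    bead of any species with equal probability, so only (n - 1) / n of all
    swaps are detectable as cross-species assignments.
    """
    if not 0 <= observed_cross_species <= 1:
        raise ValueError("observed_cross_species must be a fraction in [0, 1]")
    total_swap_rate = observed_cross_species * n_species / (n_species - 1)
    return 1 - total_swap_rate


for observed in (0.04, 0.06):
    print(f"{observed:.0%} cross-species -> {intact_fraction(observed):.0%} intact")
```

Under this assumption, the reported 4% and 6% cross-species rates correspond to 92% and 88% intact interactions, matching the range quoted in the text.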
Together, these results demonstrate that chromatin–antibody–
bead–oligonucleotide conjugates remain intact throughout the
ChIP-DIP procedure, enabling accurate multiplexed protein–DNA
assignment (we discuss additional technical validations of ChIP-DIP
in Supplementary Note 2 and the related Supplementary Figs. 1–3).
ChIP-DIP maps protein–DNA interactions in diverse pools
To test whether ChIP-DIP can accurately map genome-wide protein
localization, we performed ChIP-DIP in human K562 cells using four
well-studied proteins: (1) the CTCF sequence-specific DNA binding
protein that binds to insulator sequences⁴¹, (2) the histone H3 lysine
4 (H3K4) trimethylation (H3K4me3) modification that localizes at
the promoters of active genes¹⁴,⁴², (3) the RNA polymerase (RNAP) II
enzyme that transcribes RNA⁴³ and (4) the histone H3 lysine 27 (H3K27)
trimethylation (H3K27me3) modification that accumulates over
broad genomic regions that are associated with Polycomb-mediated
transcriptional repression¹⁴,⁴² (Supplementary Table 1). We observed
and histone modifications (for example, ChIP followed by sequencing
(ChIP–seq))¹²–¹⁵, improved binding site resolution (ChIP-exo)¹⁶,¹⁷,
increased sample throughput (for example, through automation
and/or sample pooling)¹⁸,¹⁹ and enabled mapping within limited numbers
of cells (for example, cleavage under targets and release using
nuclease (CUT&RUN) and cleavage under targets and tagmentation
(CUT&Tag))²⁰–²². Yet, while these innovations have uncovered critical
insights into gene regulation, most work by studying a single protein
at a time. The few exceptions are multiplexed versions of CUT&Tag,
which can measure up to three proteins in a single experiment²³.
However, these approaches are not readily scalable to larger numbers of
proteins²³–²⁵ and are primarily limited to mapping modified histones
and other highly abundant proteins but not most TFs and chromatin
regulators²⁶. In contrast to CUT&Tag methods, CUT&RUN can map
many TFs and regulatory proteins, but it is not amenable to multiplexed
mapping of more than one protein at a time²⁷. Due to the large number
of distinct regulatory proteins involved and the cell type-specific nature
of their interactions, constructing a comprehensive map of regulatory
factors to dissect gene regulation remains a challenge using existing
approaches. Initial attempts to overcome this led to the formation
of various international consortia that generated reference maps of
hundreds of proteins within a small number of cell types (ENCODE²⁸,
PsychENCODE²⁹, ImmGen³⁰, etc.). Although these efforts have provided
many critical insights³¹–³³, it is not possible to study cell type-specific
regulation using maps generated from reference cell lines because
protein binding maps and gene expression programs are intrinsically
cell type specific³⁴–³⁶. To date, most mammalian cell types, model
organisms and experimental models remain uncharacterized because
generating additional cell type-specific regulatory maps using current
approaches requires thousands of individual experiments for
each cell type. Accordingly, there is a clear need for a highly scalable,
multiplexed protein profiling method that can increase throughput
of protein mapping by orders of magnitude and profile the diverse
categories of DNA-associated proteins, including classes that have been
traditionally easier to map (for example, modified histones) and those
that have been more challenging (for example, TFs)³⁷. Such a method
would allow any laboratory to generate comprehensive maps for any
cell type of interest in a rapid and cost-effective manner and would
enable exploration of key questions that is not currently possible.
Results
Chromatin immunoprecipitation done in parallel enables
multiplexed mapping of DNA-associated proteins
To enable highly multiplexed, genome-wide mapping of hundreds
of DNA-associated proteins in a single experiment, we developed
chromatin immunoprecipitation done in parallel (ChIP-DIP) (Fig. 1a,
Supplementary Notes 1 and 2 and related Extended Data Fig. 1, and
Supplementary Figs. 1–3). ChIP-DIP works by (1) using a rapid, modular
approach to couple individual antibodies to beads containing a
unique oligonucleotide tag (Extended Data Fig. 1a), (2) combining
Fig. 1 | ChIP-DIP is a highly multiplexed method for mapping proteins to
genomic DNA. a, Schematic of the ChIP-DIP method. (1) Beads are coupled with
an antibody and labeled with the associated oligonucleotide (oligo) tag (antibody
ID). (2) Sets of antibody–bead–oligonucleotide conjugates are then mixed
(antibody–bead pool) and used to perform ChIP. (3) Multiple rounds of
split-and-pool barcoding are performed to identify molecules associated with each
chromatin–antibody–bead–oligonucleotide conjugate. (4) DNA is sequenced,
and genomic DNA and antibody (Ab)–oligonucleotide containing the same
split-and-pool barcode are grouped into a cluster, which are used to assign genomic
DNA regions to their linked antibodies. (5) All DNA reads from all clusters
corresponding to the same antibody are used to generate protein localization
maps. b, Protein localization maps over a specific human genomic region (hg38,
chromosome (chr)12:53,649,999–54,650,000) for four protein targets: CTCF,
H3K4me3, RNAP II and H3K27me3. Left, protein localization generated by
ChIP-DIP in K562 cells. Top track shows read coverage before protein assignment,
and the bottom four tracks correspond to read coverage after assignment to
individual proteins. Right, ChIP–seq data generated by ENCODE in K562 cells
for these same four proteins are shown for the same region. To enable direct
comparison of scales between datasets, we normalized the scale to coverage per
million aligned reads. Scale is shown from zero to maximum coverage within
each region. c, Comparison of ChIP-DIP and ChIP–seq maps over specific regions
corresponding to magnified views of the larger region shown in b. The locations
presented are demarcated by colored bars above the gene track in b. Scale shown
is like that in b. d, Genome-wide comparison (density plots of signal correlation)
between the localization of each individual protein measured by ChIP-DIP (x axis)
or ChIP–seq (y axis). Points are measured genome wide across 10-kb windows
(CTCF, H3K27me3) or all promoter intervals (H3K4me3, RNAP II).
localization patterns that are highly comparable at specific genomic
sites (Fig. 1b,c) and strongly correlated genome wide (r = 0.837–0.956;
Fig. 1d) to ChIP–seq profiles generated by the ENCODE consortium²⁸,³¹,⁴⁴
(Supplementary Table 2).
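A genome-wide comparison of this kind can be sketched as binning reads into fixed windows, scaling to coverage per million aligned reads and computing a Pearson correlation between the two tracks. The sketch below is illustrative only: the function names and toy positions are assumptions, it handles a single chromosome, and it correlates the normalized values directly, whereas the text here does not state whether the reported r values were computed on raw or log-scaled signal.

```python
import numpy as np


def window_counts(positions, chrom_length, window=10_000):
    """Count reads in consecutive fixed-size windows along one chromosome."""
    n_windows = -(-chrom_length // window)  # ceiling division
    idx = np.asarray(positions, dtype=np.int64) // window
    return np.bincount(idx, minlength=n_windows).astype(float)


def cpm(counts, total_reads):
    """Scale window counts to coverage per million aligned reads."""
    return counts * 1e6 / total_reads


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally sized tracks."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy read start positions (made up) for one protein measured two ways.
chip_dip = window_counts([100, 150, 20_500, 20_900, 20_950], chrom_length=40_000)
chip_seq = window_counts([120, 20_100, 20_200, 30_000], chrom_length=40_000)

# In a real analysis total_reads would be the genome-wide aligned read count.
r = pearson_r(cpm(chip_dip, chip_dip.sum()), cpm(chip_seq, chip_seq.sum()))
print(f"r = {r:.3f}")
```

Because Pearson correlation is unchanged by rescaling either track, the per-million normalization matters for comparing track heights, as in the browser views, rather than for r itself.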
Because there are many hundreds of regulatory proteins, we
explored whether ChIP-DIP could generate maps for large pools of
distinct proteins. We considered two possibilities that might limit the
scale of ChIP-DIP. (1) As the size of each pool increases, the background
levels of immunoprecipitated chromatin might increase and obscure
our ability to generate high-quality binding maps for individual proteins
(‘pool size’). (2) If multiple proteins bind to similar DNA regions,
this might deplete the associated chromatin and preclude our ability
to accurately map each protein. In this way, the exact composition of
the antibody pool used might impact the maps obtained for an individual
protein (‘pool composition’). To explore these possibilities, we
analyzed the genome-wide profiles of the same four proteins (CTCF,
[Fig. 1 graphic. a, Workflow schematic: protein tagging with antibody ID oligo, antibody–bead pool, ChIP on cross-linked, sonicated chromatin, >5 rounds of split-pool tagging in a 96-well plate, sequence + assign, analyze. b,c, Genome browser tracks for CTCF, H3K4me3, RNAP II and H3K27me3 from ChIP-DIP and ChIP–seq (ENCODE) over chr12:53,649,999–54,650,000 and magnified regions 1–4. d, Density plots of log10(signal), ChIP-DIP versus ChIP–seq: CTCF, r = 0.843; H3K4me3, r = 0.956; RNAP II, r = 0.898; H3K27me3, r = 0.837.]
[Fig. 2 graphic. a, Schematic of antibody–bead pools of different sizes (1, 10, 35, 50 and 52 antibodies). b, Heatmap of Pearson correlation coefficients (–1.00 to 1.00) for H3K4me3, H3K27me3, CTCF and RNAP II across pool sizes. c, H3K4me3 browser tracks across pool sizes. d, CTCF browser tracks for pools of 10 and 52 antibodies (Ab1, Ab2). e, Schematic of cell input: 45M cells (~1.3M cells/target), 5M cells (~0.14M cells/target), 500k cells (~14K cells/target) and 50k cells (~1,400 cells/target). f, Heatmap of Pearson correlation coefficients (–1.00 to 1.00) for the same four targets across cell inputs (45M, 5M, 500k, 50k). g, H3K4me3 browser tracks across cell inputs. h, CTCF browser tracks across cell inputs.]
Fig. 2 | ChIP-DIP accurately maps known protein–DNA interactions across
a range of multiplexed protein numbers, protein compositions and cell
numbers. a, Schematic of the experimental design to test the scalability of
antibody–bead pool size and composition. b, Correlation heatmap for protein
localization maps of 4 proteins (CTCF, H3K4me3, RNAP II and H3K27me3)
generated using antibody pools of 5 different sizes (1, 10, 35, 50 and 52 antibodies
per pool) and compositions. Correlations were calculated over the set of regions
corresponding to the union of all peaks called for any of the four targets in the
K562 ten-antibody experiment and were calculated using the
background-corrected ChIP-DIP signal for each sample (Methods). Pool sizes are listed along
the top and left axes. Replicate proteins in the same pool indicate that a different
antibody was used for that protein. Some proteins were not included in every
pool. c, Comparison of H3K4me3 localization over a specific genomic region
(hg38, chr19:45,345,500–46,045,500) when measured within various antibody
pool sizes and compositions. Scale is normalized to coverage per million aligned
reads. d, Comparison of CTCF localization over a specific genomic region
(hg38, chr19:40,349,999–41,050,000) when measured within a pool of 10
antibodies containing a single CTCF-targeting antibody (top) or within a pool
of 52 antibodies containing 2 different CTCF-targeting antibodies (bottom).
Scale is normalized to coverage per million aligned reads. e, Schematic of the
experimental design to test the amount of cell input required for ChIP-DIP. k,
thousand; M, million. f, Correlation heatmap for protein localization maps of
four targets (CTCF, H3K4me3, RNAP II and H3K27me3) generated using various
amounts of input cell lysate. Correlations were calculated over the same set
of regions as b and using the background-corrected ChIP-DIP signal for each
sample (Methods). Amounts of input cell lysate are listed along the top and left
axes. g, Comparison of H3K4me3 localization over a specific genomic region
(hg38, chr13:40,600,000–42,300,000) when measured using various amounts
of input cell lysate. Scale is normalized to coverage per million aligned reads.
h, Comparison of CTCF localization over a specific genomic region (hg38,
chr12:53,664,000–53,764,000) when measured using various amounts of input
cell lysate. Scale is normalized to coverage per million aligned reads.
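The correlation heatmaps in b and f rest on two steps: taking the union of peak regions across targets and correlating per-region signal between every pair of samples. A minimal sketch of both steps follows. It assumes a single chromosome and signal vectors that have already been background corrected (that correction is described in Methods and is not reproduced here); the function names, sample labels and signal values are made up for illustration.

```python
import numpy as np


def union_intervals(peak_sets):
    """Merge overlapping (start, end) peaks from several targets into one set."""
    merged = []
    for start, end in sorted(iv for peaks in peak_sets for iv in peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the open interval
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def correlation_matrix(signal_by_sample):
    """Pairwise Pearson correlations between samples, in insertion order."""
    names = list(signal_by_sample)
    matrix = np.corrcoef(np.vstack([signal_by_sample[n] for n in names]))
    return names, matrix


# Peaks called for two targets (toy coordinates on one chromosome).
regions = union_intervals([[(1, 5), (10, 20)], [(4, 8), (30, 40)]])

# One signal value per region of interest, per sample (toy numbers).
signals = {
    "CTCF, pool of 10": [5.0, 0.0, 1.0, 0.0],
    "CTCF, pool of 52": [4.0, 0.0, 2.0, 0.0],
    "H3K27me3, pool of 10": [0.0, 3.0, 0.0, 4.0],
}
names, matrix = correlation_matrix(signals)
print(regions)
print(np.round(matrix, 2))
```

In such a matrix, the same target measured in different pools should give values near 1, while unrelated targets should not, which is the pattern the heatmaps are designed to show.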