Characterization of human transcription factor function and patterns of gene regulation in HepG2 cells
Abstract
Transcription factors (TFs) are trans-acting proteins that bind cis-regulatory elements (CREs) in DNA to control gene expression. Here, we analyzed the genomic localization profiles of 529 sequence-specific TFs and 151 cofactors and chromatin regulators in the human cancer cell line HepG2, for a total of 680 proteins broadly termed DNA-associated proteins (DAPs). We used this deep collection to model each TF's impact on gene expression and identified a cohort of 26 candidate transcriptional repressors. We examined high occupancy target (HOT) sites in the context of three-dimensional genome organization and showed biased motif placement in distal-promoter connections involving HOT sites. We also found a substantial number of closed chromatin regions bound by multiple DAPs and explored their properties, finding that a MAFF/MAFK TF pair correlates with transcriptional repression. Altogether, these analyses provide novel insights into the regulatory logic of the genome of the human cell line HepG2 and show the usefulness of large genomic analyses for elucidating individual TF functions.
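HOT sites and multiply bound closed-chromatin regions both rest on counting how many distinct DAPs are bound at a given region. The snippet below is a minimal sketch of that counting step only. It assumes peaks are supplied as half-open (chrom, start, end) intervals. The function names (`dap_occupancy`, `flag_hot`), the toy coordinates, and the `min_daps` cutoff are hypothetical illustrations and do not reflect the peak sets, thresholds, or pipeline used in this study.

```python
# Hypothetical interval type: (chromosome, start, end), half-open like BED coordinates.
Interval = tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals on the same chromosome overlap if each starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def dap_occupancy(regions: list[Interval],
                  peaks_by_dap: dict[str, list[Interval]]) -> dict[Interval, int]:
    """Count, for each region, the distinct DAPs with at least one overlapping peak."""
    counts: dict[Interval, int] = {}
    for region in regions:
        # Each DAP contributes at most 1, no matter how many of its peaks overlap.
        counts[region] = sum(
            1 for peaks in peaks_by_dap.values()
            if any(overlaps(region, peak) for peak in peaks)
        )
    return counts


def flag_hot(counts: dict[Interval, int], min_daps: int) -> list[Interval]:
    """Return regions bound by at least `min_daps` DAPs (the cutoff is an assumed parameter)."""
    return [region for region, n in counts.items() if n >= min_daps]


# Toy example with made-up coordinates.
regions = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_by_dap = {
    "FOXA1": [("chr1", 150, 250)],
    "HNF4A": [("chr1", 120, 180)],
    "MAFF": [("chr1", 550, 560)],
}
counts = dap_occupancy(regions, peaks_by_dap)
print(counts)                # {('chr1', 100, 200): 2, ('chr1', 500, 600): 1}
print(flag_hot(counts, 2))   # [('chr1', 100, 200)]
```

The nested loop is quadratic and serves only as an illustration. A genome-wide count over hundreds of peak sets would normally rely on sorted intervals or an interval tree instead.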
Acknowledgement
We thank Jesse Engreitz for providing ABC model predictions (Fulco et al. 2019) specific to HepG2 data for use in this paper, and we thank Jill Moore and Zhiping Weng for providing the v4 cCRE annotations. We additionally thank Sara Cooper, Greg Cooper, and Nick Cochran for helpful conversations and valuable feedback. We also thank the ENCODE Consortium for providing v4 cCRE calls for our analyses and for access to ChIP-seq and ATAC-seq data. Special thanks go to Cathleen Shaw for assistance in producing figures (Fig. 3C was made with BioRender). This work was funded by National Institutes of Health (NIH) grant UM1HG009411 to R.M.M. and E.M.M., by funds from the HudsonAlpha Institute for Biotechnology, and by NIH UM1HG009443, the Bren Chair, and the Caltech Merkin Institute to B.J.W.
Data Availability
All raw and processed MPRA and ChIP-seq sequencing data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE235360 and GSE235477, respectively. Code is provided as Supplemental Code, and both code and relevant data for the creation of plots are available at GitHub (https://github.com/bmoyers/Moyers_et_al_2023_HepG2_TF/).
Conflict of Interest
The authors declare no competing interests.
Additional details
- ISSN: 1549-5469
- PMCID: PMC10760452
- National Institutes of Health: UM1HG009411
- National Institutes of Health: UM1HG009443
- HudsonAlpha Institute for Biotechnology
- California Institute of Technology: Richard N. Merkin Institute for Translational Medicine
- California Institute of Technology: Bren Professor of Molecular Biology
- Caltech groups: Division of Biology and Biological Engineering, Richard N. Merkin Institute for Translational Research