CaltechAUTHORS
  A Caltech Library Service

Shape-based peak identification for ChIP-Seq

Hower, Valerie and Evans, Steven N. and Pachter, Lior (2011) Shape-based peak identification for ChIP-Seq. BMC Bioinformatics, 12 . Art. No. 15. ISSN 1471-2105. PMCID PMC3032669. doi:10.1186/1471-2105-12-15. https://resolver.caltech.edu/CaltechAUTHORS:20170306-111327579

PDF - Published Version (1MB). Creative Commons Attribution.
PDF (Authors’ original file for figure 1) - Supplemental Material (38kB). Creative Commons Attribution.
PDF (Authors’ original file for figure 2) - Supplemental Material (164kB). Creative Commons Attribution.
PDF (Authors’ original file for figure 3) - Supplemental Material (268kB). Creative Commons Attribution.
PDF (Authors’ original file for figure 4) - Supplemental Material (125kB). Creative Commons Attribution.
PDF (Authors’ original file for figure 5) - Supplemental Material (164kB). Creative Commons Attribution.
PDF (Authors’ original file for figure 6) - Supplemental Material (164kB). Creative Commons Attribution.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20170306-111327579

Abstract

Background: The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive survey of binding. A number of algorithms have been developed to call "peaks" representing bound regions from mapped reads. Most current algorithms incorporate multiple heuristics, and despite much work it remains difficult to accurately determine individual peaks corresponding to distinct binding events.

Results: Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is statistically sound and robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We validate our approach using previously published data and show that it can discover previously missed regions.

Conclusions: The difficulty in accurately calling peaks for ChIP-Seq data is partly due to the difficulty in defining peaks, and we demonstrate a novel method that improves on the accuracy of previous methods in resolving peaks. Our introduction of a robust statistical test based on ideas from topological data analysis is also novel. Our methods are implemented in a program called T-PIC (Tree shape Peak Identification for ChIP-Seq), which is available at http://bio.math.berkeley.edu/tpic/.
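The abstract describes T-PIC only at a high level. As a rough illustration of the persistence idea it borrows from topological data analysis (this is not T-PIC's actual algorithm, and it does not implement the paper's tree-based statistical test), the Python sketch below computes per-base coverage from mapped reads, assumed here to be given as half-open (start, end) intervals, and scores each local maximum of the coverage profile by its 0-dimensional persistence: how far coverage must drop before that summit merges with a taller one. The function names coverage, peak_persistence and call_peaks, and the fixed min_persistence cutoff, are hypothetical choices made for illustration; the paper assesses significance with a statistical test rather than a fixed threshold.

```python
from typing import List, Tuple

Peak = Tuple[int, int, int]  # (summit position, summit height, persistence)


def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-position read depth from half-open intervals [start, end)."""
    diff = [0] * (length + 1)
    for start, end in reads:
        s = min(max(start, 0), length)  # clip reads to the region
        e = min(max(end, 0), length)
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    cov, depth = [], 0
    for i in range(length):
        depth += diff[i]  # running sum of the difference array
        cov.append(depth)
    return cov


def peak_persistence(cov: List[int]) -> List[Peak]:
    """0-dimensional persistence of the superlevel sets of a 1-D profile."""
    n = len(cov)
    if n == 0:
        return []
    # Sweep positions from highest to lowest coverage; ties broken by position.
    order = sorted(range(n), key=lambda k: (-cov[k], k))
    rank = [0] * n
    for r, k in enumerate(order):
        rank[k] = r
    parent = list(range(n))  # union-find forest; each root is its component's summit
    active = [False] * n
    peaks: List[Peak] = []

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in order:
        active[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component whose summit appeared first survives;
                # the younger summit dies at the current level cov[i].
                elder, younger = (ri, rj) if rank[ri] < rank[rj] else (rj, ri)
                p = cov[younger] - cov[i]
                if p > 0:  # skip zero-persistence merges (slopes and plateaus)
                    peaks.append((younger, cov[younger], p))
                parent[younger] = elder
    # The highest summit never merges; this sketch measures it against the profile minimum.
    top = find(order[0])
    if cov[top] - min(cov) > 0:
        peaks.append((top, cov[top], cov[top] - min(cov)))
    peaks.sort(key=lambda t: (-t[2], t[0]))
    return peaks


def call_peaks(cov: List[int], min_persistence: int) -> List[Peak]:
    """Keep summits whose persistence reaches a fixed (hypothetical) cutoff."""
    return [pk for pk in peak_persistence(cov) if pk[2] >= min_persistence]
```

For example, on the profile [0, 2, 5, 3, 4, 1, 0, 6, 6, 2, 0], peak_persistence reports summits at positions 7, 2 and 4 with persistences 6, 5 and 1. A cutoff of 2 keeps only the first two, treating the small bump at position 4 as a shoulder of the peak at position 2 rather than a separate binding event, which is the kind of shape-based distinction the abstract motivates.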


Item Type: Article
Related URLs:
URL | URL Type | Description
https://dx.doi.org/10.1186/1471-2105-12-15 | DOI | Article
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-15 | Publisher | Article
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032669/ | PubMed Central | Article
https://arxiv.org/abs/1005.0793 | arXiv | Discussion Paper
ORCID:
Author | ORCID
Pachter, Lior | 0000-0002-9164-6231
Additional Information: © 2011 Hower et al; licensee BioMed Central Ltd. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Received: 3 June 2010. Accepted: 12 January 2011. Published: 12 January 2011. SNE is supported in part by NSF grant DMS-0907630 and VH is funded by NSF fellowship DMS-0902723. Authors' contributions: LP proposed the problem of using the shape of a putative peak to determine binding sites in ChIP-Seq. SNE developed the probability theory. VH explored ideas from topological data analysis, implemented the algorithm, and analyzed the ChIP-Seq data. VH, SNE and LP worked together to develop the peak calling algorithm, and all contributed to writing the manuscript. All authors read and approved the final manuscript.
Funders:
Funding Agency | Grant Number
NSF | DMS-0907630
NSF Graduate Research Fellowship | DMS-0902723
PubMed Central ID: PMC3032669
DOI: 10.1186/1471-2105-12-15
Record Number: CaltechAUTHORS:20170306-111327579
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20170306-111327579
Official Citation: Hower V, Evans SN, Pachter L. Shape-based peak identification for ChIP-Seq. BMC Bioinformatics. 2011;12:15. doi:10.1186/1471-2105-12-15.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 74785
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 06 Mar 2017 20:45
Last Modified: 11 Nov 2021 05:29
