CaltechAUTHORS
  A Caltech Library Service

RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition

Mao, Shunfu and Pachter, Lior and Tse, David and Kannan, Sreeram (2020) RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition. PLoS ONE, 15 (6). Art. No. e0232946. ISSN 1932-6203. PMCID PMC7266320. https://resolver.caltech.edu/CaltechAUTHORS:20200602-124021279

PDF - Published Version (Creative Commons Attribution, 1549Kb)
PDF (S1 File. Compare Shannon with StringTie) - Supplemental Material (Creative Commons Attribution, 112Kb)
PDF (S2 File. RefShannon algorithm details) - Supplemental Material (Creative Commons Attribution, 266Kb)
PDF (S3 File. Additional comparisons among different assemblers) - Supplemental Material (Creative Commons Attribution, 114Kb)
PDF (S4 File. Parameter setting for different assemblers) - Supplemental Material (Creative Commons Attribution, 44Kb)
PDF (S5 File. Different thresholds on sensitivity and false positive) - Supplemental Material (Creative Commons Attribution, 223Kb)
PDF (S6 File. Comparison of assembly performance (ROC) using different aligners) - Supplemental Material (Creative Commons Attribution, 113Kb)
PDF (S7 File. Compare memory and time consumption of RefShannon to other assemblers) - Supplemental Material (Creative Commons Attribution, 112Kb)


Abstract

High-throughput sequencing of RNA (RNA-Seq) has become a staple of modern molecular biology, with applications not only in quantifying gene expression but also in isoform-level analysis of RNA transcripts. To enable such isoform-level analysis, a transcriptome assembly algorithm stitches the observed short reads together into the corresponding transcripts. This task is complicated by alternative splicing, a mechanism by which the same gene may generate multiple distinct RNA transcripts. We develop a novel genome-guided transcriptome assembler, RefShannon, that exploits the varying abundances of the different transcripts to enable accurate reconstruction of the transcripts. Our evaluation shows that RefShannon improves sensitivity by up to 22% at a given specificity in comparison with other state-of-the-art assemblers. RefShannon is written in Python and is available from GitHub (https://github.com/shunfumao/RefShannon).
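As a rough illustration of how differing transcript abundances can help separate isoforms, the sketch below decomposes edge flows on a toy splice graph into a few weighted paths. It is not RefShannon's algorithm (that is described in the S2 File); it is a generic greedy "widest path" heuristic for flow decomposition, and the graph, the exon names, and the function names `widest_path` and `greedy_decompose` are hypothetical and chosen only for this example. It assumes integer edge flows that satisfy flow conservation and a node list given in topological order.

```python
# Toy sketch only: a generic greedy flow decomposition, NOT RefShannon's method.
# Nodes stand for exons; each edge flow stands for read coverage on a splice junction.

def widest_path(flow, source, sink, order):
    """Return (path, width) for the source-to-sink path whose smallest edge
    flow (its bottleneck) is largest, or (None, 0) if no path remains."""
    best = {node: 0 for node in order}  # best bottleneck found so far per node
    best[source] = float("inf")
    prev = {}
    for u in order:  # `order` is assumed to be a topological order of the graph
        if best[u] == 0:
            continue  # node not reachable through positive-flow edges
        for (a, b), f in flow.items():
            if a == u and f > 0:
                width = min(best[u], f)
                if width > best[b]:
                    best[b] = width
                    prev[b] = u
    if best[sink] == 0:
        return None, 0
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[sink]


def greedy_decompose(flow, source, sink, order):
    """Repeatedly peel off the widest path until no flow is left."""
    remaining = dict(flow)  # work on a copy so the caller's flows are untouched
    paths = []
    while True:
        path, width = widest_path(remaining, source, sink, order)
        if path is None:
            break
        for u, v in zip(path, path[1:]):
            remaining[(u, v)] -= width
        paths.append((path, width))
    return paths


# Hypothetical gene with two isoforms that share exons E1, E2 and E5.
toy_flow = {
    ("E1", "E2"): 10,
    ("E2", "E3"): 7,
    ("E2", "E4"): 3,
    ("E3", "E5"): 7,
    ("E4", "E5"): 3,
}
toy_order = ["E1", "E2", "E3", "E4", "E5"]

for path, abundance in greedy_decompose(toy_flow, "E1", "E5", toy_order):
    print(" -> ".join(path), "abundance", abundance)
```

On this toy input the heuristic recovers one path through E3 with weight 7 and one through E4 with weight 3. Because the two isoforms have different abundances, the flows on the alternative exons identify which path carries which weight. A greedy heuristic like this does not in general yield the sparsest decomposition.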


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1371/journal.pone.0232946 | DOI | Article
https://doi.org/10.1371/journal.pone.0232946.s001 | DOI | S1 File
https://doi.org/10.1371/journal.pone.0232946.s002 | DOI | S2 File
https://doi.org/10.1371/journal.pone.0232946.s003 | DOI | S3 File
https://doi.org/10.1371/journal.pone.0232946.s004 | DOI | S4 File
https://doi.org/10.1371/journal.pone.0232946.s005 | DOI | S5 File
https://doi.org/10.1371/journal.pone.0232946.s006 | DOI | S6 File
https://doi.org/10.1371/journal.pone.0232946.s007 | DOI | S7 File
http://www.ncbi.nlm.nih.gov/pmc/articles/pmc7266320/ | PubMed Central | Article
ORCID:
Author | ORCID
Mao, Shunfu | 0000-0002-8203-0507
Pachter, Lior | 0000-0002-9164-6231
Additional Information: © 2020 Mao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Received: October 18, 2019; Accepted: April 24, 2020; Published: June 2, 2020.
The authors would like to thank Joseph Hui and Kayvon Mazooji for their support at the initial stage of the project.
Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.
This project is funded by NIH award 1R01HG008164, NSF CCF-1651236, and NSF CIF-1703403.
The authors have declared that no competing interests exist.
Author Contributions: Conceptualization: Lior Pachter, David Tse, Sreeram Kannan. Data curation: Sreeram Kannan. Formal analysis: Shunfu Mao. Funding acquisition: Lior Pachter, David Tse, Sreeram Kannan. Investigation: Shunfu Mao, Lior Pachter, David Tse, Sreeram Kannan. Methodology: Shunfu Mao, Lior Pachter, David Tse, Sreeram Kannan. Project administration: Lior Pachter, David Tse, Sreeram Kannan. Software: Shunfu Mao, Sreeram Kannan. Supervision: Sreeram Kannan. Validation: Shunfu Mao. Visualization: Shunfu Mao. Writing – original draft: Shunfu Mao. Writing – review & editing: Shunfu Mao, Sreeram Kannan.
Funders:
Funding Agency | Grant Number
NIH | 1R01HG008164
NSF | CCF-1651236
NSF | CIF-1703403
Issue or Number: 6
PubMed Central ID: PMC7266320
Record Number: CaltechAUTHORS:20200602-124021279
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20200602-124021279
Official Citation: Mao S, Pachter L, Tse D, Kannan S (2020) RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition. PLoS ONE 15(6): e0232946. https://doi.org/10.1371/journal.pone.0232946
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 103638
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 02 Jun 2020 20:02
Last Modified: 06 Jul 2020 23:10
