CaltechAUTHORS
  A Caltech Library Service

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Trapnell, Cole and Williams, Brian A. and Pertea, Geo and Mortazavi, Ali and Kwan, Gordon and van Baren, Marijke J. and Salzberg, Steven L. and Wold, Barbara J. and Pachter, Lior (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 28 (5). pp. 511-515. ISSN 1087-0156. PMCID PMC3146043. https://resolver.caltech.edu/CaltechAUTHORS:20100601-111602154

PDF - Accepted Version (976Kb)
MS Excel (Genes with complex isoform expression dynamics in C2C12 myogenesis) - Supplemental Material (80Kb)
Archive (ZIP) (Supplementary Data) - Supplemental Material (5Mb)
PDF (Supplementary Tables 1–3, Supplementary Figs. 1–11 and Supplementary Methods) - Supplemental Material (2058Kb)

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20100601-111602154

Abstract

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.


Item Type: Article
Related URLs:
URL | URL Type | Description
http://dx.doi.org/10.1038/nbt.1621 | DOI | Article
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3146043/ | PubMed Central | Article
http://rdcu.be/pQoe | Publisher | Free ReadCube access
ORCID:
Author | ORCID
Mortazavi, Ali | 0000-0002-4259-6362
Wold, Barbara J. | 0000-0003-3235-8130
Pachter, Lior | 0000-0002-9164-6231
Alternate Title: Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms
Additional Information: © 2010 Nature Publishing Group. Received 02 February 2010; Accepted 22 March 2010; Published online 02 May 2010. This work was supported in part by the US National Institutes of Health (NIH) grants R01-LM006845 and ENCODE U54-HG004576, as well as the Beckman Foundation, the Bren Foundation, the Moore Foundation (Cell Center Program) and the Miller Research Institute. We thank I. Antosechken and L. Schaeffer of the Caltech Jacobs Genome Center for DNA sequencing, and D. Trout, B. King and H. Amrhein for data pipeline and database design, operation and display. We are grateful to R. K. Bradley, K. Datchev, I. Hallgrímsdóttir, J. Landolin, B. Langmead, A. Roberts, M. Schatz and D. Sturgill for helpful discussions. Author Contributions: C.T. and L.P. developed the mathematics and statistics and designed the algorithms; B.A.W. and G.K. performed the RNA-Seq and B.A.W. designed and executed experimental validations; C.T. implemented Cufflinks and Cuffdiff; G.P. implemented Cuffcompare; M.J.v.B. and A.M. tested the software; C.T., G.P. and A.M. performed the analysis; L.P., A.M. and B.J.W. conceived the project; C.T., L.P., A.M., B.J.W. and S.L.S. wrote the manuscript. Software availability: TopHat (http://tophat.cbcb.umd.edu) is freely available as source code. It takes a reference genome (as a Bowtie (ref. 29) index) and RNA-Seq reads as FASTA or FASTQ and produces alignments in SAM (ref. 30) format. TopHat is distributed under the Artistic License and runs on Linux and Mac OS X. The Cufflinks assembler and abundance estimation algorithms (http://cufflinks.cbcb.umd.edu/) are open-source C++ programs and are freely available in both source and binary. The package includes the assembler along with utilities to structurally compare Cufflinks output between samples (Cuffcompare) and to perform differential expression testing (Cuffdiff). Cufflinks is distributed under the Boost License and runs on Linux and Mac OS X. The source code for Cufflinks version 0.8.0 is provided in Supplementary Data 3. The authors declare no competing financial interests.
Funders:
Funding Agency | Grant Number
NIH | R01-LM006845
NIH | ENCODE U54-HG004576
Arnold and Mabel Beckman Foundation | UNSPECIFIED
Bren Foundation | UNSPECIFIED
Gordon and Betty Moore Foundation | UNSPECIFIED
Miller Institute for Basic Research in Science | UNSPECIFIED
Issue or Number: 5
PubMed Central ID: PMC3146043
Record Number: CaltechAUTHORS:20100601-111602154
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20100601-111602154
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 18505
Collection: CaltechAUTHORS
Deposited By: Jason Perez
Deposited On: 28 Jun 2010 16:28
Last Modified: 29 Oct 2019 23:11
