Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Creators: Trapnell, Cole; Williams, Brian A.; Pertea, Geo; Mortazavi, Ali; Kwan, Gordon; van Baren, Marijke J.; Salzberg, Steven L.; Wold, Barbara J.; Pachter, Lior

Abstract

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

Additional Information

© 2010 Nature Publishing Group. Received 02 February 2010; Accepted 22 March 2010; Published online 02 May 2010.

This work was supported in part by the US National Institutes of Health (NIH) grants R01-LM006845 and ENCODE U54-HG004576, as well as the Beckman Foundation, the Bren Foundation, the Moore Foundation (Cell Center Program) and the Miller Research Institute. We thank I. Antosechken and L. Schaeffer of the Caltech Jacobs Genome Center for DNA sequencing, and D. Trout, B. King and H. Amrhein for data pipeline and database design, operation and display. We are grateful to R. K. Bradley, K. Datchev, I. Hallgrímsdóttir, J. Landolin, B. Langmead, A. Roberts, M. Schatz and D. Sturgill for helpful discussions.

Author Contributions: C.T. and L.P. developed the mathematics and statistics and designed the algorithms; B.A.W. and G.K. performed the RNA-Seq and B.A.W. designed and executed experimental validations; C.T. implemented Cufflinks and Cuffdiff; G.P. implemented Cuffcompare; M.J.v.B. and A.M. tested the software; C.T., G.P. and A.M. performed the analysis; L.P., A.M. and B.J.W. conceived the project; C.T., L.P., A.M., B.J.W. and S.L.S. wrote the manuscript.

Software availability. TopHat (http://tophat.cbcb.umd.edu) is freely available as source code. It takes a reference genome (as a Bowtie index; ref. 29) and RNA-Seq reads as FASTA or FASTQ and produces alignments in SAM format (ref. 30). TopHat is distributed under the Artistic License and runs on Linux and Mac OS X. The Cufflinks assembler and abundance estimation algorithms (http://cufflinks.cbcb.umd.edu/) are open-source C++ programs and are freely available in both source and binary. The package includes the assembler along with utilities to structurally compare Cufflinks output between samples (Cuffcompare) and to perform differential expression testing (Cuffdiff). Cufflinks is distributed under the Boost License and runs on Linux and Mac OS X. The source code for Cufflinks version 0.8.0 is provided in Supplementary Data 3.

The authors declare no competing financial interests.

Attached Files

Accepted Version - nihms190938.pdf

Supplemental Material - nbt.1621-S1.pdf

Supplemental Material - nbt.1621-S2.xls

Supplemental Material - nbt.1621-S3.zip

Files (9.1 MB)

Name	MD5	Size
nbt.1621-S1.pdf	effe196ce2216bebe92996a126648783	2.1 MB
nbt.1621-S3.zip	ab58e5ae9197c9d2939ebd06a8effb89	5.9 MB
nihms190938.pdf	677da3db451c014dd7dfe22c51a1d518	999.9 kB
nbt.1621-S2.xls	5c2dd68f1fa19490f54dae70488f3f81	82.4 kB
