
TopHat: discovering splice junctions with RNA-Seq

Trapnell, Cole and Pachter, Lior and Salzberg, Steven L. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25 (9). pp. 1105-1111. ISSN 1367-4803. PMCID PMC2672628. doi:10.1093/bioinformatics/btp120.

PDF - Published Version
Creative Commons Attribution Non-commercial.


Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

Availability: TopHat is free, open-source software available from

Item Type: Article
ORCID: Pachter, Lior (0000-0002-9164-6231)
Additional Information: © 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. We thank Adam Phillippy, Geo Pertea, Ben Langmead, Kasper Hansen, Angela Brooks and Ali Mortazavi for helpful technical discussions. We thank Diane Trout, Ali Mortazavi, Brian Williams, Kenneth McCue, Lorian Schaeffer and Barbara Wold for making their data available for our case study. Funding: National Institutes of Health (R01-LM06845, R01-GM083873 to S.L.S.); National Science Foundation (CCF 0347992 to L.P.). Conflict of Interest: none declared.
Issue or Number: 9
PubMed Central ID: PMC2672628
Record Number: CaltechAUTHORS:20170306-141357019
Official Citation: Cole Trapnell, Lior Pachter, Steven L. Salzberg; TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009; 25 (9): 1105-1111. doi: 10.1093/bioinformatics/btp120
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 74802
Deposited By: George Porter
Deposited On:06 Mar 2017 23:00
Last Modified:11 Nov 2021 05:29
