Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

Roberts, Michael and Zimin, Aleksey V. and Hayes, Wayne and Hunt, Brian R. and Ustun, Cevat and White, James R. and Havlak, Paul and Yorke, James (2008) Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps. PLoS ONE, 3 (3). e1836. ISSN 1932-6203. PMCID PMC2266800. doi:10.1371/journal.pone.0001836. https://resolver.caltech.edu/CaltechAUTHORS:ROBplosone08

PDF - Published Version (210kB). Creative Commons Attribution.

Abstract

The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
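The "seven times better" figure can be checked from the numbers quoted in the abstract. The following is a minimal sketch, assuming that "sequence missed" means 100% minus the reported coverage of finished sequence; the function name `quality` and its parameters are illustrative and do not come from the paper.

```python
def quality(errors_per_10k: float, coverage_pct: float) -> float:
    """Quality score as described in the abstract: the inverse of
    (error rate x sequence missed). Units cancel when two scores are compared."""
    missed_pct = 100.0 - coverage_pct  # assumption: missed = 100% - coverage
    return 1.0 / (errors_per_10k * missed_pct)

# Figures quoted in the abstract for the 21 finished BACs.
atlas = quality(errors_per_10k=4.5, coverage_pct=93.4)  # Nov. 2002 Atlas assembly
umd = quality(errors_per_10k=1.1, coverage_pct=96.3)    # assembly using reliable overlaps

ratio = umd / atlas
print(f"Relative quality: {ratio:.1f}x")
```

Under this reading, the products are 4.5 × 6.6 = 29.7 for Atlas and 1.1 × 3.7 = 4.07 for the new assembly, a ratio of roughly 7.3, which is consistent with the abstract's rounded claim.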


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1371/journal.pone.0001836 | DOI | Article
http://www.ncbi.nlm.nih.gov/pmc/articles/pmc2266800/ | PubMed Central | Article
Additional Information: © 2008 Roberts et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Received: October 15, 2007; Accepted: February 9, 2008; Published: March 19, 2008. We thank Phil Green for providing a copy of the Phrap software. Author Contributions: Conceived and designed the experiments: MR AZ WH BH JY CU. Performed the experiments: MR AZ WH JW CU. Analyzed the data: MR PH AZ WH JW CU. Contributed reagents/materials/analysis tools: PH. Wrote the paper: AZ WH JY. Other: Headed the project: JY. PI on the grant: JY. Funding: This work was supported under NSF grant DMS0616585 and under NIH grant 1R01HG0294501. Competing interests: The authors have declared that no competing interests exist.
Issue or Number: 3
PubMed Central ID: PMC2266800
DOI: 10.1371/journal.pone.0001836
Record Number: CaltechAUTHORS:ROBplosone08
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:ROBplosone08
Official Citation: Roberts M, Zimin AV, Hayes W, Hunt BR, Ustun C, et al. (2008) Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps. PLoS ONE 3(3): e1836. doi:10.1371/journal.pone.0001836
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 10401
Collection: CaltechAUTHORS
Deposited By: Archive Administrator
Deposited On: 02 May 2008
Last Modified: 08 Nov 2021 21:07
