CaltechAUTHORS
  A Caltech Library Service

SWALO: scaffolding with assembly likelihood optimization

Rahman, Atif and Pachter, Lior (2021) SWALO: scaffolding with assembly likelihood optimization. Nucleic Acids Research, 49 (20). Art. No. e117. ISSN 0305-1048. doi:10.1093/nar/gkab717. https://resolver.caltech.edu/CaltechAUTHORS:20210930-221100053

PDF - Published Version (1MB)
Creative Commons Attribution Non-commercial.

PDF - Submitted Version (1MB)
Creative Commons Attribution Non-commercial.

PDF (Supplementary data) - Supplemental Material (687kB)
Creative Commons Attribution Non-commercial.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210930-221100053

Abstract

Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second-generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called SWALO with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more correct joins than, or a similar number of correct joins to, other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. SWALO is freely available for download at https://atifrahman.github.io/SWALO/.
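
The joining rule described in the abstract can be illustrated with a minimal sketch. This is not SWALO's implementation: the function name `choose_join`, the `min_margin` threshold, and the requirement that a gain be positive are all assumptions made for illustration, and the log-likelihood gains are taken as given rather than computed from a generative model.

```python
from typing import Dict, Hashable, Optional


def choose_join(gains: Dict[Hashable, float],
                min_margin: float = 2.0) -> Optional[Hashable]:
    """Pick a partner for one contig end, or None if the choice is ambiguous.

    gains maps each candidate partner to the (hypothetical) increase in
    assembly log-likelihood obtained by joining it to the current contig.
    min_margin is an assumed threshold, not a value from the paper.
    """
    # Assumption: only joins that increase the likelihood are considered.
    positive = {c: g for c, g in gains.items() if g > 0}
    if not positive:
        return None

    ranked = sorted(positive.items(), key=lambda kv: kv[1], reverse=True)

    # A single remaining candidate is treated as an unambiguous join.
    if len(ranked) == 1:
        return ranked[0][0]

    # Otherwise the best gain must clearly exceed the runner-up.
    (best, best_gain), (_, second_gain) = ranked[0], ranked[1]
    return best if best_gain - second_gain >= min_margin else None
```

With these assumed values, `choose_join({"B": 9.0, "C": 1.0})` selects `"B"`, while `choose_join({"B": 5.0, "C": 4.5})` returns `None` because the two candidates are too close to separate.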


Item Type: Article
Related URLs:
https://doi.org/10.1093/nar/gkab717 (URL Type: DOI; Description: Article)
https://doi.org/10.1101/081786 (URL Type: DOI; Description: Discussion Paper)
https://atifrahman.github.io/SWALO/ (URL Type: Related Item; Description: Code)
ORCID:
Rahman, Atif: 0000-0003-1805-3971
Pachter, Lior: 0000-0002-9164-6231
Additional Information: © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. Received: 22 December 2020; Revision received: 16 June 2021; Accepted: 16 August 2021; Published: 20 August 2021. We thank Dan Rokhsar, Páll Melsted, Harold Pimentel, Shannon McCurdy and Nicolas Bray for helpful conversations during the development of SWALO. Funding: NIH [R01 HG006129 to L.P., in part]; Fulbright Science & Technology Fellowship [15093630 to A.R., in part]. Funding for open access charge: NIH [R01 HG006129]. Conflict of interest statement: None declared.
Funders:
NIH (Grant Number: R01 HG006129)
Fulbright Foundation (Grant Number: 15093630)
Issue or Number: 20
DOI: 10.1093/nar/gkab717
Record Number: CaltechAUTHORS:20210930-221100053
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210930-221100053
Official Citation: Atif Rahman, Lior Pachter, SWALO: scaffolding with assembly likelihood optimization, Nucleic Acids Research, Volume 49, Issue 20, 18 November 2021, Page e117, https://doi.org/10.1093/nar/gkab717
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 111143
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 04 Oct 2021 20:48
Last Modified: 18 Nov 2021 22:53
