CaltechAUTHORS
  A Caltech Library Service

SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model

Alexandersson, Marina and Cawley, Simon and Pachter, Lior (2003) SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Genome Research, 13 (3). pp. 496-502. ISSN 1088-9051. PMCID PMC430255. doi:10.1101/gr.424203. https://resolver.caltech.edu/CaltechAUTHORS:20170308-154151410

PDF - Published Version
Creative Commons Attribution Non-commercial.

237kB

Abstract

Comparative-based gene recognition is driven by the principle that conserved regions between related organisms are more likely than divergent regions to be coding. We describe a probabilistic framework for gene structure and alignment that can be used to simultaneously find both the gene structure and alignment of two syntenic genomic regions. A key feature of the method is the ability to enhance gene predictions by finding the best alignment between two syntenic sequences, while at the same time finding biologically meaningful alignments that preserve the correspondence between coding exons. Our probabilistic framework is the generalized pair hidden Markov model, a hybrid of (1) generalized hidden Markov models, which have been used previously for gene finding, and (2) pair hidden Markov models, which have applications to sequence alignment. We have built a gene finding and alignment program called SLAM, which aligns and identifies complete exon/intron structures of genes in two related but unannotated sequences of DNA. SLAM is able to reliably predict gene structures for any suitably related pair of organisms, most notably with fewer false-positive predictions compared to previous methods (examples are provided for Homo sapiens/Mus musculus and Plasmodium falciparum/Plasmodium vivax comparisons). Accuracy is obtained by distinguishing conserved noncoding sequence (CNS) from conserved coding sequence. CNS annotation is a novel feature of SLAM and may be useful for the annotation of UTRs, regulatory elements, and other noncoding features.
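
The pair hidden Markov model component mentioned in the abstract can be illustrated with a minimal sketch. The code below is not SLAM and not the authors' generalized model (which also handles variable-length gene features such as exons); it is only a textbook-style three-state pair HMM, with a Viterbi pass that recovers the most probable alignment of two short DNA strings. The function name `pair_hmm_viterbi` and the parameters `p_match`, `delta`, and `epsilon` are hypothetical and chosen purely for illustration.

```python
import math

NEG_INF = float("-inf")


def pair_hmm_viterbi(x, y, p_match=0.9, delta=0.1, epsilon=0.3):
    """Most probable alignment of x and y under a toy three-state pair HMM.

    States: M emits an aligned pair, X emits a base of x against a gap,
    Y emits a base of y against a gap. All parameter values are assumptions.
    """
    log = math.log
    # Emission log-probabilities.
    e_same = log(p_match / 4)           # one of 4 identical pairs
    e_diff = log((1 - p_match) / 12)    # one of 12 mismatching pairs
    e_gap = log(0.25)                   # uniform base emitted against a gap
    # Transition log-probabilities (no direct X <-> Y transitions).
    t_mm, t_mg = log(1 - 2 * delta), log(delta)
    t_gg, t_gm = log(epsilon), log(1 - epsilon)

    n, m = len(x), len(y)
    # V[s][i][j]: best log-probability of emitting x[:i], y[:j], ending in s.
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # by convention the path starts in M

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = e_same if x[i - 1] == y[j - 1] else e_diff
                score, prev = max((V["M"][i - 1][j - 1] + t_mm, "M"),
                                  (V["X"][i - 1][j - 1] + t_gm, "X"),
                                  (V["Y"][i - 1][j - 1] + t_gm, "Y"))
                V["M"][i][j], back["M"][i][j] = score + emit, prev
            if i > 0:
                score, prev = max((V["M"][i - 1][j] + t_mg, "M"),
                                  (V["X"][i - 1][j] + t_gg, "X"))
                V["X"][i][j], back["X"][i][j] = score + e_gap, prev
            if j > 0:
                score, prev = max((V["M"][i][j - 1] + t_mg, "M"),
                                  (V["Y"][i][j - 1] + t_gg, "Y"))
                V["Y"][i][j], back["Y"][i][j] = score + e_gap, prev

    # Trace back from the best final state to the origin.
    state = max("MXY", key=lambda s: V[s][n][m])
    best = V[state][n][m]
    i, j, ax, ay = n, m, [], []
    while i > 0 or j > 0:
        prev = back[state][i][j]
        if state == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "X":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        state = prev
    return "".join(reversed(ax)), "".join(reversed(ay)), best
```

Calling `pair_hmm_viterbi("ACGT", "ACT")` returns the two gapped strings and the log-probability of the best path. A generalized pair HMM of the kind the abstract describes would replace the single-base emissions here with states that emit whole pairs of segments, which is what allows gene structure and alignment to be scored together.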


Item Type: Article
Related URLs:
URL | URL Type | Description
http://dx.doi.org/10.1101/gr.424203 | DOI | Article
http://genome.cshlp.org/content/13/3/496.abstract | Publisher | Article
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC430255/ | PubMed Central | Article
ORCID:
Author | ORCID
Pachter, Lior | 0000-0002-9164-6231
Additional Information: © 2003 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License, http://creativecommons.org/licenses/by-nc/4.0/). Received May 13, 2002. Accepted December 3, 2002. We thank Terry Speed and David Kulp for helpful suggestions and support, and James Harley Gorrell for technical computing advice. Marina Alexandersson was supported by STINT, the Swedish Foundation for International Cooperation in Research and Higher Education. This work was partially supported by NIH grant R01 HG02362-01. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.
Funders:
Funding Agency | Grant Number
Swedish Foundation for International Cooperation in Research and Higher Education (STINT) | UNSPECIFIED
NIH | R01-HG02362-01
Issue or Number: 3
PubMed Central ID: PMC430255
DOI: 10.1101/gr.424203
Record Number: CaltechAUTHORS:20170308-154151410
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20170308-154151410
Official Citation: SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Marina Alexandersson, Simon Cawley, and Lior Pachter. Genome Res. March 1, 2003 13: 496-502; Published in Advance February 12, 2003, doi:10.1101/gr.424203
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 74938
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 09 Mar 2017 16:01
Last Modified: 15 Nov 2021 16:29
