HMM sampling and applications to gene finding and alternative splicing

Cawley, Simon L. and Pachter, Lior (2003) HMM sampling and applications to gene finding and alternative splicing. Bioinformatics, 19 (Suppl 2). ii36-ii41. ISSN 1367-4803. doi:10.1093/bioinformatics/btg1057.

The standard method of applying hidden Markov models to biological problems is to find a Viterbi (maximal weight) path through the HMM graph. The Viterbi algorithm reduces the problem of finding the most likely hidden state sequence that explains given observations to a dynamic programming problem for corresponding directed acyclic graphs. For example, in the gene finding application, the HMM is used to find the most likely underlying gene structure given a DNA sequence. In this note we discuss the applications of sampling methods for HMMs. The standard sampling algorithm for HMMs is a variant of the common forward-backward and backtrack algorithms, and has already been applied in the context of Gibbs sampling methods. Nevertheless, the practice of sampling state paths from HMMs does not seem to have been widely adopted, and important applications have been overlooked. We show how sampling can be used for finding alternative splicings for genes, including alternative splicings that are conserved between genes from related organisms. We also show how sampling from the posterior distribution is a natural way to compute probabilities for predicted exons and gene structures being correct under the assumed model. Finally, we describe a new memory-efficient sampling algorithm for certain classes of HMMs which provides a practical sampling alternative to the Hirschberg algorithm for optimal alignment. The ideas presented have applications not only to gene finding and HMMs but more generally to stochastic context-free grammars and RNA structure prediction.
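
The sampling idea summarized in the abstract can be illustrated with a minimal Python sketch: fill the forward table, then backtrack stochastically instead of taking the maximum at each step. This is not the authors' implementation. The two-state toy model (state 0 as "noncoding", state 1 as "coding"), the parameters `INIT`, `TRANS`, and `EMIT`, and the function names are all assumptions made for illustration. The sketch omits scaling, so it is only suitable for short sequences where underflow is not a concern.

```python
import random

def forward(obs, init, trans, emit):
    """Forward table: f[t][k] = P(obs[0..t], state at t = k)."""
    n = len(init)
    f = [[init[k] * emit[k][obs[0]] for k in range(n)]]
    for t in range(1, len(obs)):
        prev = f[-1]
        f.append([
            emit[k][obs[t]] * sum(prev[j] * trans[j][k] for j in range(n))
            for k in range(n)
        ])
    return f

def sample_path(obs, init, trans, emit, rng):
    """Draw one state path from P(path | obs) by stochastic backtracking."""
    f = forward(obs, init, trans, emit)
    n = len(init)
    # The last state is drawn in proportion to the final forward column.
    state = rng.choices(range(n), weights=f[-1])[0]
    path = [state]
    for t in range(len(obs) - 2, -1, -1):
        # Weight of predecessor j, given the successor state already chosen.
        weights = [f[t][j] * trans[j][state] for j in range(n)]
        state = rng.choices(range(n), weights=weights)[0]
        path.append(state)
    path.reverse()
    return path

def state_frequency(obs, init, trans, emit, state, n_samples, rng):
    """Estimate P(state at t | obs) for every t from sampled paths."""
    counts = [0] * len(obs)
    for _ in range(n_samples):
        for t, s in enumerate(sample_path(obs, init, trans, emit, rng)):
            counts[t] += (s == state)
    return [c / n_samples for c in counts]

# Hypothetical toy model: symbols 0..3 stand for A, C, G, T.
INIT = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.25, 0.25, 0.25, 0.25], [0.1, 0.4, 0.4, 0.1]]

if __name__ == "__main__":
    rng = random.Random(1)
    obs = [0, 1, 2, 2, 1, 3, 0]
    print(sample_path(obs, INIT, TRANS, EMIT, rng))
    print(state_frequency(obs, INIT, TRANS, EMIT, 1, 1000, rng))
```

Each call to `sample_path` returns a full parse, so distinct high-probability parses (for example, alternative gene structures) show up as recurring samples, and the fraction of samples containing a given feature estimates its posterior probability under the assumed model.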

Item Type:Article
ORCID:Pachter, Lior (0000-0002-9164-6231)
Additional Information:© 2003 Oxford University Press. Received on March 17, 2003; accepted on June 9, 2003. The authors would like to thank Michael Siani-Rose for useful discussions. This work was partially supported by NIH grant R01-HG02362-01.
Subject Keywords:suboptimal parses, sampling, hidden Markov model, conserved alternative splicing
Issue or Number:Suppl 2
Record Number:CaltechAUTHORS:20170308-151105303
Official Citation:Simon L. Cawley, Lior Pachter; HMM sampling and applications to gene finding and alternative splicing. Bioinformatics 2003; 19 (suppl_2): ii36-ii41. doi: 10.1093/bioinformatics/btg1057
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:74929
Deposited By: George Porter
Deposited On:09 Mar 2017 15:28
Last Modified:15 Nov 2021 16:29
