Speeding up HMM decoding and training by exploiting sequence repetitions

Lifshits, Yury and Mozes, Shay and Weimann, Oren and Ziv-Ukelson, Michal (2009) Speeding up HMM decoding and training by exploiting sequence repetitions. Algorithmica, 54 (3). pp. 379-399. ISSN 0178-4617 http://resolver.caltech.edu/CaltechAUTHORS:20090731-105215144

Abstract

We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi’s decoding and training algorithms (IEEE Trans. Inform. Theory IT-13:260–269, 1967), as well as to the forward-backward and Baum-Welch (Inequalities 3:1–8, 1972) algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. Initially, we show how to exploit repetitions of all sufficiently small substrings (this is similar to the Four Russians method). Then, we describe four algorithms based alternatively on run length encoding (RLE), Lempel-Ziv (LZ78) parsing, grammar-based compression (SLP), and byte pair encoding (BPE). Compared to Viterbi’s algorithm, we achieve speedups of Θ(log n) using the Four Russians method, Ω(r/log r) using RLE, Ω(log n/k) using LZ78, Ω(r/k) using SLP, and Ω(r) using BPE, where k is the number of hidden states, n is the length of the observed sequence and r is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. We also discuss a parallel implementation of our algorithms.
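
As an illustration of the run length encoding case mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the Viterbi recurrence is written as a sequence of (max, +) matrix-vector products in log space, so that a run of one repeated symbol can be handled by raising that symbol's matrix to a power with repeated squaring. All function names, the matrix layout (`log_trans[j][i]` for a move from state j to state i), and the convention that a transition precedes every emission are assumptions made for this example.

```python
import math
from itertools import groupby


def maxplus_matmul(A, B):
    """(max, +) matrix product: C[i][j] = max over l of A[i][l] + B[l][j]."""
    k = len(A)
    return [[max(A[i][l] + B[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]


def maxplus_matvec(A, v):
    """(max, +) matrix-vector product: w[i] = max over j of A[i][j] + v[j]."""
    return [max(A[i][j] + v[j] for j in range(len(v))) for i in range(len(A))]


def symbol_matrix(log_trans, log_emit, sym):
    """M[i][j] = log P(move j -> i) + log P(state i emits sym)."""
    k = len(log_trans)
    return [[log_trans[j][i] + log_emit[i][sym] for j in range(k)]
            for i in range(k)]


def maxplus_power(M, e):
    """(max, +) power of M for e >= 1, computed by repeated squaring."""
    result, base = None, M
    while True:
        if e & 1:
            result = base if result is None else maxplus_matmul(base, result)
        e >>= 1
        if e == 0:
            return result
        base = maxplus_matmul(base, base)


def viterbi_score(obs, log_init, log_trans, log_emit):
    """Log-probability of the best state path, one symbol at a time."""
    v = list(log_init)
    for sym in obs:
        v = maxplus_matvec(symbol_matrix(log_trans, log_emit, sym), v)
    return max(v)


def viterbi_score_rle(obs, log_init, log_trans, log_emit):
    """Same score, but each run of equal symbols is applied in one step."""
    v = list(log_init)
    for sym, run in groupby(obs):
        length = sum(1 for _ in run)
        M = symbol_matrix(log_trans, log_emit, sym)
        v = maxplus_matvec(maxplus_power(M, length), v)
    return max(v)


if __name__ == "__main__":
    # A made-up two-state, two-symbol model, used only to exercise the code.
    log_init = [math.log(0.6), math.log(0.4)]
    log_trans = [[math.log(0.7), math.log(0.3)],
                 [math.log(0.4), math.log(0.6)]]
    log_emit = [[math.log(0.9), math.log(0.1)],
                [math.log(0.2), math.log(0.8)]]
    obs = [0] * 9 + [1] * 5 + [0] * 3
    print(viterbi_score(obs, log_init, log_trans, log_emit))
    print(viterbi_score_rle(obs, log_init, log_trans, log_emit))
```

In this sketch a run of length m costs O(k^3 log m) instead of O(k^2 m), so it only pays off when runs are long relative to the number of states k. It computes the score of the best path only; recovering the path itself would need additional bookkeeping that is not shown here.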


Item Type: Article
Additional Information: © 2009 Springer. Received: 10 June 2007. Accepted: 5 November 2007. Published online: 28 November 2007. A preliminary version of this paper appeared in Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 4–15, 2007. Y. Lifshits’ research was supported by the Center for the Mathematics of Information and the Lee Center for Advanced Networking.
Funders:
Funding Agency | Grant Number
Center for the Mathematics of Information, Caltech | UNSPECIFIED
Lee Center for Advanced Networking, Caltech | UNSPECIFIED
Subject Keywords: HMM; Viterbi; Dynamic programming; Compression
Record Number: CaltechAUTHORS:20090731-105215144
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20090731-105215144
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 14757
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 20 Aug 2009 21:29
Last Modified: 26 Dec 2012 11:07