Hidden Markov models of biological primary sequence information
Abstract
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.
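One plausible reading of the O(KN^2) figure: aligning a single sequence of length N to a model with roughly N states and a bounded number of transitions per state costs O(N^2) by dynamic programming, and this is repeated once for each of the K sequences. The following is a minimal sketch of that per-sequence step, using the standard Viterbi algorithm on a toy model. The state names (`M1`, `I1`, `M2`), the two-letter alphabet, and all probabilities are illustrative assumptions, not the architecture or parameters used in the paper.

```python
import math

NEG_INF = float("-inf")

def _log(p):
    # Log-probability, with log(0) mapped to -inf instead of raising.
    return math.log(p) if p > 0 else NEG_INF

def viterbi(seq, states, start, trans, emit):
    """Return (best_state_path, log_probability) for a non-empty seq.

    start: {state: prob}
    trans: {state: {next_state: prob}}  (sparse: only allowed transitions)
    emit:  {state: {symbol: prob}}
    Work per symbol is proportional to the number of listed transitions.
    """
    # score[s] = best log-probability of any path that ends in state s
    score = {s: _log(start.get(s, 0)) + _log(emit[s].get(seq[0], 0))
             for s in states}
    back = []  # back[i][t] = predecessor of t at step i + 1
    for sym in seq[1:]:
        new_score = {s: NEG_INF for s in states}
        ptr = {s: None for s in states}
        for s in states:
            if score[s] == NEG_INF:
                continue  # state s is unreachable at this step
            for t, p in trans.get(s, {}).items():
                cand = score[s] + _log(p) + _log(emit[t].get(sym, 0))
                if cand > new_score[t]:
                    new_score[t] = cand
                    ptr[t] = s
        back.append(ptr)
        score = new_score
    last = max(states, key=lambda s: score[s])
    if score[last] == NEG_INF:
        raise ValueError("sequence has zero probability under the model")
    # Follow the back-pointers from the best final state.
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical toy model: two match-like states and one insert-like state.
STATES = ["M1", "I1", "M2"]
START = {"M1": 1.0}
TRANS = {"M1": {"I1": 0.2, "M2": 0.8}, "I1": {"I1": 0.5, "M2": 0.5}}
EMIT = {"M1": {"A": 0.9, "C": 0.1},
        "I1": {"A": 0.5, "C": 0.5},
        "M2": {"A": 0.1, "C": 0.9}}

path, logp = viterbi("ACC", STATES, START, TRANS, EMIT)
print(path)  # ['M1', 'I1', 'M2']
```

Running the same routine over every sequence in a family and lining up the residues assigned to the same match-like state is, in outline, how a multiple alignment can be read off a trained model; the training of the transition and emission parameters is not shown here.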
Additional Information
© 1994 National Academy of Sciences. Communicated by Leroy Hood, October 12, 1993 (received for review January 14, 1993).
Attached Files
Published - PNAS-1994-Baldi-1059-63.pdf
Additional details
- Eprint ID: 52490
- DOI: 10.1073/pnas.91.3.1059
- Resolver ID: CaltechAUTHORS:20141209-084332388
- PMCID: PMC521453
- Created: 2014-12-09 (created from EPrint's datestamp field)
- Updated: 2021-11-10 (created from EPrint's last_modified field)