Peterson, Eric L. and Kondev, Jané and Theriot, Julie A. and Phillips, Rob (2009) Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment. Bioinformatics, 25 (11). pp. 1356-1362. ISSN 1367-4803 http://resolver.caltech.edu/CaltechAUTHORS:20090727-143233526
Abstract
Motivation: Many proteins with vastly dissimilar sequences are found to share a common fold, as evidenced in the wealth of structures now available in the Protein Data Bank. One idea that has found success in various applications is the concept of a reduced amino acid alphabet, wherein similar amino acids are clustered together. Given the structural similarity exhibited by many apparently dissimilar sequences, we undertook this study looking for improvements in fold recognition by comparing protein sequences written in a reduced alphabet.

Results: We tested over 150 of the amino acid clustering schemes proposed in the literature with all-versus-all pairwise sequence alignments of sequences in the Distance matrix alignment (DALI) database. We combined several metrics from information retrieval popular in the literature (mean precision, area under the Receiver Operating Characteristic curve, and recall at a fixed error rate) and found that, in contrast to previous work, reduced alphabets in many cases outperform full alphabets. We find that reduced alphabets can perform at a level comparable to full alphabets in correct pairwise alignment of sequences and can show increased sensitivity to pairs of sequences with structural similarity but low sequence identity. Based on these results, we hypothesize that reduced alphabets may also show performance gains with more sophisticated methods such as profile and pattern searches.
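As a minimal sketch of what "writing a sequence in a reduced alphabet" means, the Python snippet below clusters the 20 amino acids into six groups and compares two short, already-aligned sequences before and after reduction. The grouping, the function names and the example sequences are assumptions made for illustration only; they are not one of the clustering schemes evaluated in the paper, and the simple percent-identity measure stands in for the pairwise alignments and substitution matrices used in the study.

```python
# Hypothetical six-group clustering of the 20 amino acids (illustrative only;
# not taken from the paper or its supplementary material).
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every residue to a representative letter: the first member of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two made-up sequences with no identical residues at any position.
seq1 = "MKVLD"
seq2 = "LRIIE"

full_identity = percent_identity(seq1, seq2)
reduced_identity = percent_identity(reduce_sequence(seq1), reduce_sequence(seq2))
print(full_identity, reduced_identity)
```

In this toy case the two sequences share no residues in the full alphabet but become identical once similar residues are merged, which is the kind of low-identity similarity a reduced alphabet is meant to expose.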
| Item Type: | Article |
|---|---|
| Additional Information: | © The Author (2009). Published by Oxford University Press. A table of results as well as the substitution matrices and residue groupings from this study can be downloaded from http://www.rpgroup.caltech.edu/publications/supplements/alphabets. Funding: National Institutes of Health (Director’s Pioneer Award to RP); Department of Homeland Security (graduate fellowship to EP); National Science Foundation (DMR-0403997 to JK); Research Corporation (Cottrell Scholar to JK). The authors would like to thank Ralf Bundschuh, John Chodera, Ken Dill, Alexander Grosberg, Liisa Holm, Chris Myers, Eugene Shakhnovich, John Spouge, Peter Swain, Ned Wingreen, Chris Wiggins and Jasmine Zhou for helpful discussions and suggestions. |
| Record Number: | CaltechAUTHORS:20090727-143233526 |
| Persistent URL: | http://resolver.caltech.edu/CaltechAUTHORS:20090727-143233526 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 14679 |
| Collection: | CaltechAUTHORS |
| Deposited By: | Tony Diaz |
| Deposited On: | 28 Jul 2009 16:47 |
| Last Modified: | 26 Dec 2012 11:06 |