
A dictionary based approach for gene annotation

Pachter, Lior and Batzoglou, Serafim and Spitkovsky, Valentin I. and Beebee, William S. and Lander, Eric S. and Berger, Bonnie and Kleitman, Daniel J. (1999) A dictionary based approach for gene annotation. In: Proceedings of the third annual international conference on Computational molecular biology (RECOMB '99). ACM, New York, NY, pp. 285-294. ISBN 1-58113-069-4.

Full text is not posted in this repository. Consult Related URLs below.

This paper describes a fast and fully automated dictionary based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. These dictionaries support O(1)-time lookups of tuples (4-tuples for the OWL database and 11-tuples for the dbEST database). These tuples can be used to rapidly find, at every position in an input sequence, the longest match to the database sequences. Such matches provide very useful information for locating common segments between exons and alternative splice sites, and they supply frequency data on long tuples for statistical purposes. These dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction. For instance, using the OWL protein database on a benchmark test set of 130 genes, and after removing sequences from the database with exact amino acid homology to genes in our test set, we find 88% of coding nucleotides, and 99% of our predictions of coding nucleotides are correct. Also, 81% of coding exons are predicted exactly, while 82% of our predictions of exons agree exactly with the published annotation of their genes.
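
The tuple lookup and longest-match search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Python hash table as the dictionary (average-case O(1) lookup), uses nucleotide strings for both database and query, and the names `build_dictionary` and `longest_matches` are hypothetical. Each k-tuple found in the dictionary acts as a seed that is extended character by character to find the longest exact match starting at that query position.

```python
from collections import defaultdict


def build_dictionary(sequences, k):
    """Map every k-tuple to the (sequence index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def longest_matches(query, sequences, index, k):
    """For each query position, return the length of the longest exact match
    starting there against any database sequence (0 if no k-tuple seed hits)."""
    result = []
    for q in range(len(query)):
        best = 0
        seed = query[q:q + k]
        if len(seed) == k:
            # Hash lookup of the seed, then extend each hit to the right.
            for seq_id, pos in index.get(seed, ()):
                seq = sequences[seq_id]
                length = k
                while (q + length < len(query)
                       and pos + length < len(seq)
                       and query[q + length] == seq[pos + length]):
                    length += 1
                best = max(best, length)
        result.append(best)
    return result


# Toy data, made up for illustration only.
database = ["ACGTACGGA", "TTACGTAA"]
index = build_dictionary(database, k=4)
print(longest_matches("GACGTACT", database, index, k=4))
```

In this sketch a position whose k-tuple is absent from the dictionary reports 0, so matches shorter than k are not detected; the abstract's tuple lengths (4 for OWL, 11 for dbEST) would play the role of `k`.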

Item Type: Book Section
Related URLs:
ORCID:
Pachter, Lior: 0000-0002-9164-6231
Lander, Eric S.: 0000-0003-2662-4631
Additional Information: © 1999 ACM. We thank Eric Banks, Ben Cooke, John Dunagan, Nick Feamster, Aram Harrow, Ben Ho, Julia Lipman, Theo Tonchev, Tina Tyan and Bill Wallis for helping in countless ways with the implementation of the ideas outlined in this paper. This project has been supported by Merck. Pachter has been supported in part by an NIH training grant and a Program in Mathematics and Molecular Biology graduate fellowship.
Funding Agency: Grant Number
NIH Predoctoral Fellowship: UNSPECIFIED
Program in Mathematics and Molecular Biology: UNSPECIFIED
Record Number: CaltechAUTHORS:20170309-113403506
Persistent URL:
Official Citation: Lior Pachter, Serafim Batzoglou, Valentin I. Spitkovsky, William S. Beebee, Jr., Eric S. Lander, Bonnie Berger, and Daniel J. Kleitman. 1999. A dictionary based approach for gene annotation. In Proceedings of the third annual international conference on Computational molecular biology (RECOMB '99). ACM, New York, NY, USA, 285-294. DOI=
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 74982
Deposited By: George Porter
Deposited On: 13 Mar 2017 15:58
Last Modified: 15 Nov 2021 16:29
