A Caltech Library Service

Almost all human genes resulted from ancient duplication

Britten, Roy (2006) Almost all human genes resulted from ancient duplication. Proceedings of the National Academy of Sciences of the United States of America, 103 (50). pp. 19027-19032. ISSN 0027-8424. PMCID PMC1748171. doi:10.1073/pnas.0510007103.

Results of protein sequence comparison under an open criterion show a very large number of relationships that have, up to now, gone unreported. The relationships suggest many ancient events of gene duplication. It is well known that gene duplication has been a major process in the evolution of genomes. A collection of human genes of known function has been examined for a history of gene duplications, detected by amino acid sequence similarity using BLASTp with an expectation of two or less (the open criterion). Because the collection of genes in build 35 includes sets of transcript variants, all genes of known function were collected and only the longest transcript variant of each was included, yielding a 13,298-member library called KGMV (for known genes maximum variant). When matches of all lengths are accepted, >97% of human genes show significant matches to each other. Many match a large number of other, different proteins, showing that most genes are made up of parts of many others as a result of ancient duplication events. To support the use of the open criterion, all members of the KGMV library were twice replaced with random protein sequences of the same length and average composition, and all were compared with each other with BLASTp at an expectation of two or less. The set of matches averaged 0.35% of that observed for the KGMV set of proteins.
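The two library-construction steps described in the abstract (keeping the longest transcript variant per gene, and building a randomized control library of matching lengths and composition) can be illustrated with a minimal Python sketch. This is not the author's code; the record notes that the original processing used Perl programs. The function names `longest_variants` and `randomized_library` and the input layout are hypothetical, and the sketch assumes that "average composition" means amino acid frequencies pooled over the whole library, which the abstract does not specify.

```python
import random
from collections import Counter


def longest_variants(variants_by_gene):
    """Keep only the longest protein sequence for each gene.

    variants_by_gene: dict mapping a gene name to a list of variant
    sequences (hypothetical input layout, assumed for illustration).
    """
    return {gene: max(seqs, key=len) for gene, seqs in variants_by_gene.items()}


def average_composition(library):
    """Pool residue counts over the whole library and return frequencies.

    Assumption: composition is averaged across the library, not per protein.
    """
    counts = Counter()
    for seq in library.values():
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def randomized_library(library, seed=0):
    """Replace each sequence with a random one of the same length,
    drawing residues according to the pooled library composition."""
    rng = random.Random(seed)
    comp = average_composition(library)
    residues = sorted(comp)
    weights = [comp[aa] for aa in residues]
    return {
        name: "".join(rng.choices(residues, weights=weights, k=len(seq)))
        for name, seq in library.items()
    }


if __name__ == "__main__":
    toy = {"geneA": ["MKT", "MKTAYIAKQR"], "geneB": ["MSSHEG"]}
    kgmv_like = longest_variants(toy)
    control = randomized_library(kgmv_like, seed=1)
    for name in kgmv_like:
        print(name, len(kgmv_like[name]), len(control[name]))
```

Both the real and the randomized library would then be compared all-against-all at the same expectation threshold. In current BLAST+ releases that threshold is set with the `-evalue` option of `blastp`; the exact tool version and settings used in the paper are not given in this record.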

Item Type: Article
Additional Information: © 2006 by The National Academy of Sciences of the USA. Contributed by Roy Britten, November 18, 2005. John Williams carried out much of the data processing and wrote necessary Perl programs, Eric Davidson’s laboratory supplied support, and Dixie Mager made crucial criticism of an earlier version. Author contributions: R.J.B. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper. The author declares no conflict of interest.
Subject Keywords: open criterion, protein, relationships, sequence
Issue or Number: 50
PubMed Central ID: PMC1748171
Record Number: CaltechAUTHORS:BRIpnas06
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 9865
Deposited By: Archive Administrator
Deposited On: 25 Mar 2008
Last Modified: 01 Jun 2023 23:04