The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences

White, Stephen H. and Jacobs, Russell E. (1993) The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. Journal of Molecular Evolution, 36 (1). pp. 79-95. ISSN 0022-2844.

Full text is not posted in this repository.

We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (r_o) of Mood (1940, Ann. Math. Stat. 11, 367–392). The probability density of r_o for a collection of random sequences has mean=0 and variance=1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong α-helix propensity show a strong tendency to cluster whereas those with β-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic “patterns” that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
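
The binary encoding and run test described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes the standard large-sample (Wald-Wolfowitz) normal approximation to the run test, and the function names `binary_encode`, `count_runs`, and `runs_z_score` are hypothetical. The sign convention of the paper's r_o may differ; in this sketch a negative score means fewer runs than expected by chance, i.e. clustering of the residues of interest.

```python
import math


def binary_encode(sequence, residues_of_interest):
    """Map a sequence to 1 (residue is in the chosen set) or 0 (all others)."""
    return [1 if aa in residues_of_interest else 0 for aa in sequence]


def count_runs(bits):
    """Count maximal blocks of identical consecutive symbols."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def runs_z_score(bits):
    """Standardized run count under the hypothesis of random ordering.

    Assumption: standard Wald-Wolfowitz mean and variance, not necessarily
    the exact r_o definition used in the paper.
    """
    n1 = sum(bits)          # number of residues of interest
    n0 = len(bits) - n1     # number of all other residues
    n = n0 + n1
    if n1 == 0 or n0 == 0:
        raise ValueError("sequence must contain both symbol types")
    mean = 2.0 * n1 * n0 / n + 1.0
    var = 2.0 * n1 * n0 * (2.0 * n1 * n0 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for a run test")
    return (count_runs(bits) - mean) / math.sqrt(var)
```

For example, `runs_z_score(binary_encode(seq, {"M"}))` would score how methionine is distributed along a sequence string `seq`. The normal approximation is only reasonable when both symbol types occur often enough, so short sequences or very rare residues call for caution.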

Item Type:Article
ORCID:Jacobs, Russell E., 0000-0002-1382-8486
Additional Information:© 1993 Springer. Received November 4, 1991 / Revised June 27, 1992. We are grateful for the advice and criticisms of the "Wednesday Afternoon Group" consisting of Prof. Howard Tucker, Prof. Mark Finkelstein, Mr. Norbert Schumacher, and Mr. Les Vernon of the Mathematics Department. We thank the referees for their helpful comments and criticisms. The research was supported by the National Science Foundation (DMB-8807431).
Subject Keywords:Protein evolution; Protein sequence analysis; Random protein sequences; Run test; Protein folding; Rhodopsin; Bacteriorhodopsin
Issue or Number:1
Record Number:CaltechAUTHORS:20151230-152925048
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:63272
Deposited By: George Porter
Deposited On:29 Jan 2016 21:05
Last Modified:03 Oct 2019 09:26
