A Caltech Library Service

Subtree power analysis finds optimal species for comparative genomics

McAuliffe, Jon D. and Jordan, Michael I. and Pachter, Lior (2005) Subtree power analysis finds optimal species for comparative genomics. Proceedings of the National Academy of Sciences of the United States of America, 102 (22). pp. 7900-7905. ISSN 0027-8424. PMCID PMC1142384. doi:10.1073/pnas.0502790102.

PDF - Published Version
See Usage Policy.

PDF (Fig. 3. The 21-species phylogenetic tree estimate used in the empirical power analysis. Numbers are maximum-likelihood estimates of expected mutation counts, under the conserved regime, based on our sequence alignments) - Supplemental Material
See Usage Policy.

PDF - Submitted Version
See Usage Policy.




Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements. We introduce a statistical framework for optimal species subset selection, based on maximizing power to detect conserved sites. Analysis of a phylogenetic star topology shows theoretically that the optimal species subset is not in general the most evolutionarily diverged subset. We then demonstrate this finding empirically in a study of vertebrate species. Our results suggest that marsupials are prime sequencing candidates.
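The abstract's central idea is to choose the species subset that maximizes power to detect conserved sites. That idea can be illustrated with a toy calculation. The sketch below is not the authors' method, and every modeling choice in it is an assumption made for illustration. It assumes a star topology with a known root base and Jukes-Cantor substitution. It assumes a conserved regime that scales every branch length by a hypothetical factor `rho`, and it reduces each leaf to a single match/mismatch indicator. Power is computed for a non-randomized Neyman-Pearson likelihood-ratio test at level `alpha`. The helper names `p_diff`, `power`, and `best_subset` are hypothetical.

```python
from itertools import combinations, product
from math import exp
from typing import Sequence, Tuple


def p_diff(t: float) -> float:
    """Jukes-Cantor probability that a leaf base differs from the root base
    after a branch of length t (expected substitutions per site, t > 0)."""
    return 0.75 * (1.0 - exp(-4.0 * t / 3.0))


def power(branch_lengths: Sequence[float], rho: float, alpha: float) -> float:
    """Power of a non-randomized Neyman-Pearson test of H0 (neutral) against
    H1 (conserved: every branch scaled by the hypothetical factor rho) with
    size <= alpha. Each leaf is summarized by one indicator: 1 if it differs
    from the root base, 0 otherwise."""
    outcomes = []
    for pattern in product((0, 1), repeat=len(branch_lengths)):
        p0 = p1 = 1.0  # probability of this pattern under H0 and under H1
        for t, differs in zip(branch_lengths, pattern):
            q0, q1 = p_diff(t), p_diff(rho * t)
            p0 *= q0 if differs else 1.0 - q0
            p1 *= q1 if differs else 1.0 - q1
        outcomes.append((p1 / p0, p0, p1))
    # Build the rejection region from the highest likelihood ratio down,
    # stopping before its probability under H0 would exceed alpha.
    outcomes.sort(key=lambda o: o[0], reverse=True)
    size = detect_prob = 0.0
    for _, p0, p1 in outcomes:
        if size + p0 > alpha:
            break
        size += p0
        detect_prob += p1
    return detect_prob


def best_subset(branch_lengths: Sequence[float], m: int,
                rho: float, alpha: float) -> Tuple[int, ...]:
    """Exhaustively search all size-m leaf subsets for the most powerful one."""
    return max(
        combinations(range(len(branch_lengths)), m),
        key=lambda subset: power([branch_lengths[i] for i in subset], rho, alpha),
    )


if __name__ == "__main__":
    lengths = [0.05, 0.2, 0.5, 1.0, 3.0]  # hypothetical star-tree branch lengths
    chosen = best_subset(lengths, 3, rho=0.3, alpha=0.2)
    print(chosen, power([lengths[i] for i in chosen], 0.3, 0.2))
```

In this toy model, a very long branch carries almost no information. Under both hypotheses the mismatch probability saturates near 3/4, so the leaf's likelihood ratio is close to 1. This gives one simple way to see how the most-diverged subset can fail to be the most powerful one. Exhaustive search over subsets is only practical for a handful of species.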

Item Type: Article
Related URLs:
URL | URL Type | Description
Central Article Information Paper
ORCID:
Pachter, Lior | 0000-0002-9164-6231
Additional Information: © 2005 The National Academy of Sciences. Communicated by Peter J. Bickel, University of California, Berkeley, CA, April 6, 2005 (received for review December 13, 2004). We thank Peter Bickel and Adam Siepel for helpful comments. M.I.J. was supported by National Institutes of Health Grant R33-HG003070. L.P. was supported by National Institutes of Health Grant R01-HG2362-3, a Sloan Foundation Research Fellowship, and National Science Foundation Career Award CCF-0347992. Author contributions: J.D.M., M.I.J., and L.P. designed research; J.D.M., M.I.J., and L.P. performed research; J.D.M., M.I.J., and L.P. contributed new reagents/analytic tools; J.D.M. analyzed data; and J.D.M. wrote the paper.
Funding Agency | Grant Number
Alfred P. Sloan Foundation | UNSPECIFIED
Subject Keywords: hypothesis testing; likelihood ratio; sequence analysis
Issue or Number: 22
PubMed Central ID: PMC1142384
Record Number: CaltechAUTHORS:20170307-090057219
Persistent URL:
Official Citation: Jon D. McAuliffe, Michael I. Jordan, and Lior Pachter. Subtree power analysis and species selection for comparative genomics. PNAS 2005 102 (22) 7900-7905; published ahead of print May 23, 2005, doi:10.1073/pnas.0502790102
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 74831
Deposited By: Tony Diaz
Deposited On: 07 Mar 2017 17:46
Last Modified: 11 Nov 2021 05:30
