
Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study

Levin, Tera C. and Glazer, Andrew M. and Pachter, Lior and Brem, Rachel B. and Eisen, Michael B. (2010) Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study. PLOS ONE, 5 (7). Art. No. e11645. ISSN 1932-6203. PMCID PMC2912228. doi:10.1371/journal.pone.0011645.

PDF - Published Version
Creative Commons Attribution.




Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time-consuming. Here we introduce an alternative strategy: a “synthetic association study” in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms (SNPs) between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. Of these polymorphic gene models, 226 (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 331 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms. However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes.
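
The core idea of the abstract (random allele combinations, a predicted phenotype per synthetic genome, then a search for SNPs that track the phenotype perfectly) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the 0/1 allele encoding, and the binary "gene model variant" phenotype are all assumptions made for illustration, and a real study would obtain the phenotype by running a gene predictor on each synthetic genome.

```python
import random


def make_synthetic_genomes(n_snps, n_genomes, seed=0):
    """Return n_genomes genotypes; each is a list of 0/1 alleles,
    one per SNP, chosen independently and uniformly at random.
    (Hypothetical encoding: 0 = strain A allele, 1 = strain B allele.)"""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_snps)]
            for _ in range(n_genomes)]


def perfectly_correlated_snps(genotypes, phenotype):
    """Return indices of SNPs whose allele matches (or is exactly
    opposite to) a binary phenotype in every synthetic genome."""
    hits = []
    for j in range(len(genotypes[0])):
        column = [g[j] for g in genotypes]
        same = all(a == p for a, p in zip(column, phenotype))
        opposite = all(a != p for a, p in zip(column, phenotype))
        if same or opposite:
            hits.append(j)
    return hits
```

In this toy setting, a phenotype determined by one SNP is recovered as a single perfect hit, while a phenotype that depends on two SNPs jointly (a simple form of epistasis) yields no single perfectly correlated SNP and would need a multi-SNP analysis.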

Item Type:Article
ORCID:Pachter, Lior (0000-0002-9164-6231); Eisen, Michael B. (0000-0002-7528-738X)
Additional Information:© 2010 Levin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Received: February 25, 2010; Accepted: June 8, 2010; Published: July 29, 2010. Author Contributions: Conceived and designed the experiments: TL AMG MBE. Performed the experiments: TL AMG. Analyzed the data: TL AMG. Contributed reagents/materials/analysis tools: TL AMG. Wrote the paper: TL AMG RB MBE. Supervised the research: MBE LP RB. The authors have no support or funding to report. Competing interests: MBE is a member of the PLoS Board of Directors. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
Issue or Number:7
PubMed Central ID:PMC2912228
Record Number:CaltechAUTHORS:20170306-123004736
Official Citation:Levin TC, Glazer AM, Pachter L, Brem RB, Eisen MB (2010) Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study. PLoS ONE 5(7): e11645. doi:10.1371/journal.pone.0011645
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:74790
Deposited By: George Porter
Deposited On:06 Mar 2017 21:18
Last Modified:11 Nov 2021 05:29
