Viral Population Estimation Using Pyrosequencing

Eriksson, Nicholas and Pachter, Lior and Mitsuya, Yumi and Rhee, Soo-Yon and Wang, Chunlin and Gharizadeh, Baback and Ronaghi, Mostafa and Shafer, Robert W. and Beerenwinkel, Niko (2008) Viral Population Estimation Using Pyrosequencing. PLoS Computational Biology, 4 (5). Art. No. e1000074. ISSN 1553-7358. PMCID PMC2323617. doi:10.1371/journal.pcbi.1000074.

PDF - Published Version
Creative Commons Attribution.

PDF (Chain decompositions. Figure S1.) - Supplemental Material
Creative Commons Attribution.

PDF (Haplotype reconstruction. Figure S2.) - Supplemental Material
Creative Commons Attribution.

PDF (Error correction. Figure S3.) - Supplemental Material
Creative Commons Attribution.

PDF (Size of the read graph cover. Figure S4.) - Supplemental Material
Creative Commons Attribution.

PDF - Submitted Version
See Usage Policy.


The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation–maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
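The frequency-estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified mixture model in which reads are already error-corrected, every read is consistent with at least one candidate haplotype, and read position and length are ignored. The function name `em_haplotype_frequencies` and the 0/1 matrix `compat` are hypothetical names introduced for this example.

```python
def em_haplotype_frequencies(compat, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies with a basic EM iteration.

    compat[r][h] is 1 if read r is consistent with haplotype h, else 0
    (a hypothetical input format assumed for this sketch).
    """
    n_reads = len(compat)
    n_haps = len(compat[0])
    freqs = [1.0 / n_haps] * n_haps  # start from a uniform guess

    for _ in range(n_iter):
        expected_counts = [0.0] * n_haps
        # E-step: share each read among the haplotypes it is consistent
        # with, in proportion to the current frequency estimates.
        for row in compat:
            weights = [f * c for f, c in zip(freqs, row)]
            total = sum(weights)
            if total == 0.0:
                raise ValueError("read is incompatible with every haplotype")
            for h, w in enumerate(weights):
                expected_counts[h] += w / total
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [c / n_reads for c in expected_counts]
        delta = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Toy data: 6 reads match only haplotype 0, 2 match only haplotype 1,
# and 2 are ambiguous (consistent with both).
compat = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 2
freqs = em_haplotype_frequencies(compat)
```

In this toy input the ambiguous reads are split according to the current estimates, so the iteration settles where the first frequency satisfies p = (6 + 2p) / 10, that is, p = 0.75.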

Item Type:Article
ORCID:Pachter, Lior (0000-0002-9164-6231)
Additional Information:© 2008 Eriksson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Received: July 2, 2007; Accepted: March 27, 2008; Published: May 9, 2008. N. Eriksson and L. Pachter were partially supported by the NSF (grants DMS-0603448 and CCF-0347992, respectively). N. Beerenwinkel was funded by a grant from the Bill and Melinda Gates Foundation through the Grand Challenges in Global Health Initiative. The NSF has played no role in any part of this work. The authors have declared that no competing interests exist. Author Contributions. Performed the experiments: YM SR CW BG MR RS. Analyzed the data: NE LP NB. Wrote the paper: NE LP NB.
Funding Agency:Bill and Melinda Gates Foundation
Grant Number:UNSPECIFIED
Issue or Number:5
PubMed Central ID:PMC2323617
Record Number:CaltechAUTHORS:20170306-133352205
Persistent URL:
Official Citation:Eriksson N, Pachter L, Mitsuya Y, Rhee S-Y, Wang C, Gharizadeh B, et al. (2008) Viral Population Estimation Using Pyrosequencing. PLoS Comput Biol 4(5): e1000074. doi:10.1371/journal.pcbi.1000074
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:74797
Deposited On:06 Mar 2017 22:27
Last Modified:11 Nov 2021 05:29
