
Genome Sequencing of Sewage Detects Regionally Prevalent SARS-CoV-2 Variants

Crits-Christoph, Alexander and Kantor, Rose S. and Olm, Matthew R. and Whitney, Oscar N. and Al-Shayeb, Basem and Lou, Yue Clare and Flamholz, Avi and Kennedy, Lauren C. and Greenwald, Hannah and Hinkle, Adrian and Hetzel, Jonathan and Spitzer, Sara and Koble, Jeffery and Tan, Asako and Hyde, Fred and Schroth, Gary and Kuersten, Scott and Banfield, Jillian F. and Nelson, Kara L. (2021) Genome Sequencing of Sewage Detects Regionally Prevalent SARS-CoV-2 Variants. mBio, 12 (1). Art. No. e02703-20. ISSN 2150-7511. PMCID PMC7845645. doi:10.1128/mBio.02703-20.

PDF - Published Version
Creative Commons Attribution.

PDF - Submitted Version
Creative Commons Attribution Non-commercial No Derivatives.

MS Excel (Table S1) - Supplemental Material
Creative Commons Attribution.

MS Excel (Table S2) - Supplemental Material
Creative Commons Attribution.

MS Excel (Table S3) - Supplemental Material
Creative Commons Attribution.

MS Excel (Table S4) - Supplemental Material
Creative Commons Attribution.



Viral genome sequencing has guided our understanding of the spread and extent of genetic diversity of SARS-CoV-2 during the COVID-19 pandemic. SARS-CoV-2 viral genomes are usually sequenced from nasopharyngeal swabs of individual patients to track viral spread. Recently, RT-qPCR of municipal wastewater has been used to quantify the abundance of SARS-CoV-2 in several regions globally. However, metatranscriptomic sequencing of wastewater can be used to profile the viral genetic diversity across infected communities. Here, we sequenced RNA directly from sewage collected by municipal utility districts in the San Francisco Bay Area to generate complete and nearly complete SARS-CoV-2 genomes. The major consensus SARS-CoV-2 genotypes detected in the sewage were identical to clinical genomes from the region. Using a pipeline for single nucleotide variant calling in a metagenomic context, we characterized minor SARS-CoV-2 alleles in the wastewater and detected viral genotypes which were also found within clinical genomes throughout California. Observed wastewater variants were more similar to local California patient-derived genotypes than they were to those from other regions within the United States or globally. Additional variants detected in wastewater have only been identified in genomes from patients sampled outside California, indicating that wastewater sequencing can provide evidence for recent introductions of viral lineages before they are detected by local clinical sequencing. These results demonstrate that epidemiological surveillance through wastewater sequencing can aid in tracking exact viral strains in an epidemic context.

Item Type:Article
Author | ORCID
Kantor, Rose S. | 0000-0002-5402-8979
Olm, Matthew R. | 0000-0001-5540-350X
Whitney, Oscar N. | 0000-0002-4858-2615
Al-Shayeb, Basem | 0000-0002-3120-3201
Flamholz, Avi | 0000-0002-9278-5479
Kennedy, Lauren C. | 0000-0002-4451-2361
Schroth, Gary | 0000-0002-3055-056X
Additional Information:© 2021 Crits-Christoph et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license. Received 21 September 2020; Accepted 15 December 2020; Published 19 January 2021. We gratefully acknowledge the originating and submitting laboratories of SARS-CoV-2 genomes in the GISAID EpiCoV database that were used for our comparisons to clinical samples, and in particular the Innovative Genomics Institute SARS-CoV-2 Sequencing Group for Alameda County genomes. We also gratefully acknowledge Vinson Fan for assistance with RT-qPCR and the laboratory of Robert Tjian for sharing materials. Funding was provided to K.L.N. and J.F.B. by a Rapid Research Response grant from the Innovative Genomics Institute (IGI) and a seed grant from the Center for Information Technology Research in the Interest of Society (CITRIS) at UC Berkeley. Data availability: Sequencing data for this project has been released under NCBI BioProject ID PRJNA661613. Processed data, reproducible code, and workflows for the analyses performed are available at
Funding Agency | Grant Number
Innovative Genomics Institute (IGI) | UNSPECIFIED
University of California, Berkeley | UNSPECIFIED
Subject Keywords:coronavirus, environmental microbiology, genomics, metagenomics
Issue or Number:1
PubMed Central ID:PMC7845645
Record Number:CaltechAUTHORS:20201119-100731171
Persistent URL:
Official Citation:Crits-Christoph A, Kantor RS, Olm MR, Whitney ON, Al-Shayeb B, Lou YC, Flamholz A, Kennedy LC, Greenwald H, Hinkle A, Hetzel J, Spitzer S, Koble J, Tan A, Hyde F, Schroth G, Kuersten S, Banfield JF, Nelson KL. 2021. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. mBio 12:e02703-20.
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:106736
Deposited By: Tony Diaz
Deposited On:19 Nov 2020 19:06
Last Modified:01 Jun 2023 23:33
