Optimizing campus-wide COVID-19 test notifications with interpretable wastewater time-series features using machine learning models
Abstract
During the COVID-19 pandemic, wastewater surveillance of the SARS-CoV-2 virus has been shown to be effective for population surveillance at scales from the county level down to the building level. At the University of California, San Diego (UCSD), daily high-resolution wastewater surveillance conducted at the building level is being used to identify potential undiagnosed infections and to trigger notification of residents and responsive testing, but the optimal determinants for notifications are unknown. To fill this gap, we propose a pipeline for processing the data and identifying features of a series of wastewater test results that can predict the presence of COVID-19 in residences associated with the test sites. Using time series of wastewater results and individual testing results during periods of routine asymptomatic testing among UCSD students from November 2020 to November 2021, we develop hierarchical classification/decision tree models to select the most informative wastewater features (patterns of results) that predict individual infections. We find that the best predictor of positive individual-level tests in residence buildings is whether the wastewater samples were positive on at least 3 of the past 7 days. We also demonstrate that the tree models outperform a wide range of other statistical and machine learning models in predicting individual COVID-19 infections while preserving interpretability. The results of this study have been used to refine campus-wide guidelines and email notification systems that alert residents to potential infections.
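The headline feature in the abstract, wastewater positive on at least 3 of the past 7 days, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (their code is in the GitHub repository listed under Code Availability). The function name `positive_in_window`, the choice to count the current day as part of the 7-day window, and the example data are all assumptions made for illustration.

```python
from typing import List


def positive_in_window(results: List[bool], window: int = 7, threshold: int = 3) -> List[bool]:
    """Flag each day on which at least `threshold` of the most recent
    `window` daily wastewater results were positive.

    Assumption: the window includes the current day, and early days
    simply use however many results are available so far.
    """
    flags = []
    for day in range(len(results)):
        start = max(0, day - window + 1)      # first day inside the window
        recent = results[start:day + 1]       # results in the window
        flags.append(sum(recent) >= threshold)
    return flags


# Hypothetical daily results for one building (True = positive sample).
daily = [False, True, False, True, False, False, True,
         False, False, False, False, False, False]
flags = positive_in_window(daily)

# Days (0-indexed) on which the 3-of-7 rule would be met.
alert_days = [day for day, flagged in enumerate(flags) if flagged]
print(alert_days)
```

In a setting like the one described, a binary feature of this kind could serve as one input to a decision tree; how missing samples are handled and how the window is aligned with testing dates would need to follow the original analysis.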
Copyright and License
© The Author(s) 2023. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Contributions
Author Contributions: Study design: T.L., J.Z., A.S., N.M., V.D.G., R.K., and R.S.; Data analysis: T.L., J.Z., N.M., and V.D.G.; Data visualization: T.L., S.K., and J.Z.; Manuscript writing: T.L., J.Z., N.M., and V.D.G. Manuscript revisions: all authors.
Data Availability
All raw wastewater sequencing data are available via the NCBI Sequence Read Archive under the BioProject ID PRJNA819090. Consensus sequences from clinical and wastewater surveillance are all available on GISAID. Spike-in sequencing data are available via Google Cloud (https://console.cloud.google.com/storage/browser/search-reference_data). The UCSD campus dashboard can be accessed at https://returntolearn.ucsd.edu/dashboard/. The SEARCH genomic surveillance dashboard is available at https://searchcovid.info/dashboards/sequencing-statistics/. The wastewater time series features are available to researchers for non-commercial use upon request.
Code Availability
The code for all analyses in this manuscript is hosted publicly in a GitHub repository (https://github.com/tuolin123/Wastewater_UCSD).
Ethics
The Institutional Review Board (IRB) of the University of California, San Diego provided approval for human subject protection oversight of the data obtained by the EXCITE laboratory for the campus clinical samples. Informed consent was obtained from all participants included in the study, the appropriate institutional forms have been archived, and all sample identifiers were de-identified. The wastewater component of this project was discussed with our IRB and was not deemed to be human subject research, as it did not record personally identifiable information. All methods were carried out in accordance with relevant guidelines and regulations.
Conflict of Interest
The authors declare no competing interests. R.K.'s current conflicts of interest are: Gencirq (stock and SAB member), DayTwo (consultant and SAB member), Cybele (stock and consultant), Biomesense (stock, consultant, SAB member), Micronoma (stock, SAB member, co-founder), and Biota (stock, co-founder).
Files
Name | Size |
---|---|
md5:b56666a39ad0af9867241e0ca84dca69 | 1.4 MB |
md5:df2802836f8388c8adceea2f600b8f26 | 568.9 kB |
Additional details
- PMCID: PMC10673837
- UCSD Chancellor's Office Fund
- Accepted: 2023-11-24 (published online)
- Caltech groups: Division of Geological and Planetary Sciences, COVID-19