Published September 3, 2024
Journal Article Open

Machine learning models can identify individuals based on a resident oral bacteriophage family

Abstract

Metagenomic studies have revolutionized the study of novel phages. However, these studies trade depth of coverage for breadth. We show that the targeted sequencing of a small region of a phage terminase family can provide sufficient sequence diversity to serve as an individual-specific barcode or a "phageprint", defined as the relative abundance profile of the variants within a terminase family. By collecting ~700 oral samples from ~100 individuals living on multiple continents, we found a consistent trend wherein each individual harbors one or two dominant variants that coexist with numerous low-abundance variants. By tracking phageprints over the span of a month across ten individuals, we observed that phageprints were generally stable, and found instances of concordant temporal fluctuations of variants shared between partners. To quantify these patterns further, we built machine learning models that, with high precision and recall, distinguished individuals even when we eliminated the most abundant variants and further downsampled phageprints to 2% of the remaining variants. Except between partners, phageprints are dissimilar between individuals, and neither country of residence, genetics, diet, nor cohabitation seems to play a role in the relatedness of phageprints across individuals. By sampling from six different oral sites, we were able to study the impact of millimeters to a few centimeters of separation on an individual's phageprint and found that such limited spatial separation results in site-specific phageprints.
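As an illustration of the definition in the abstract, a phageprint can be sketched as a normalized vector of variant read counts. The Python below is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy variant labels (v1 to v4), the random subsetting used for downsampling, and the choice of Bray-Curtis dissimilarity for comparing profiles are all hypothetical and are not taken from the article.

```python
import random


def phageprint(counts):
    """Convert raw read counts per variant into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {variant: n / total for variant, n in counts.items()}


def drop_dominant(counts, n_top=1):
    """Remove the n_top most abundant variants (assumed reading of
    'eliminated the most abundant variants')."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {variant: counts[variant] for variant in ranked[n_top:]}


def downsample_variants(counts, fraction, seed=0):
    """Keep a random subset of variants, at least one (assumed reading of
    'downsampled to 2% of the remaining variants')."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(counts) * fraction))
    kept = rng.sample(sorted(counts), n_keep)
    return {variant: counts[variant] for variant in kept}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two profiles that each sum to 1."""
    variants = set(p) | set(q)
    shared = sum(min(p.get(v, 0.0), q.get(v, 0.0)) for v in variants)
    return 1.0 - shared


# Toy counts for two hypothetical individuals, each with one dominant variant.
person_a = {"v1": 80, "v2": 15, "v3": 5}
person_b = {"v1": 5, "v2": 15, "v4": 80}

print(phageprint(person_a))
print(bray_curtis(phageprint(person_a), phageprint(person_b)))
print(phageprint(drop_dominant(person_a)))
```

In this toy case the two profiles share little abundance mass, so their dissimilarity is high; removing the dominant variant and renormalizing leaves a profile built only from the low-abundance variants.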

Copyright and License

© 2024 Mahmoudabadi, Homyk, Catching, Mahmoudabadi, Foley, Tadmor and Phillips. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Acknowledgement

We are grateful to members of the Phillips Lab and the Boundaries of Life Initiative for helpful discussions.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was supported by the National Science Foundation (Graduate Research Fellowship; DGE-1144469), the John Templeton Foundation (Boundaries of Life Initiative; 51250), the National Institutes of Health (Maximizing Investigators' Research Award; RFA-GM-17-002), and the National Institutes of Health (Exceptional Unconventional Research Enabling Knowledge Acceleration; R01-GM098465).

Contributions

GM: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. KH: Methodology, Validation, Writing – review & editing. AC: Methodology, Validation, Writing – review & editing. AM: Methodology, Validation, Writing – review & editing. HF: Methodology, Validation, Writing – review & editing. AT: Conceptualization, Writing – review & editing. RP: Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing – review & editing.

Data Availability

The original contributions presented in the study are publicly available. This data can be found here: https://github.com/gitamahm/phageprint.

Conflict of Interest

Author KH was employed by Genentech Inc.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Ethics

The studies involving humans were approved by the California Institute of Technology IRB. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Supplemental Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/frmbi.2024.1408203/full#supplementary-material

Files

frmbi-2-1408203.pdf
Total size: 20.4 MB
md5:9b43553154df38ba82a5325ea880f9f1 (5.4 MB)
md5:607cf6c1a7447c5b775e7ccb85e83a0e (15.0 MB)

Additional details

Created: October 10, 2024
Modified: October 10, 2024