Published September 4, 2024 | Online First
Journal Article Open

Annotation-free prediction of microbial dioxygen utilization

  • 1. California Institute of Technology
  • 2. University of California, San Diego

Abstract

Aerobes require dioxygen (O2) to grow; anaerobes do not. However, nearly all microbes—aerobes, anaerobes, and facultative organisms alike—express enzymes whose substrates include O2, if only for detoxification. This presents a challenge when trying to assess which organisms are aerobic from genomic data alone. This challenge can be overcome by noting that O2 utilization has wide-ranging effects on microbes: aerobes typically have larger genomes encoding distinctive O2-utilizing enzymes, for example. These effects permit high-quality prediction of O2 utilization from annotated genome sequences, with several models displaying ≈80% accuracy on a ternary classification task for which blind guessing is only 33% accurate. Since genome annotation is compute-intensive and relies on many assumptions, we asked if annotation-free methods also perform well. We discovered that simple and efficient models based entirely on genomic sequence content—e.g., triplets of amino acids—perform as well as intensive annotation-based classifiers, enabling rapid processing of genomes. We further show that amino acid trimers are useful because they encode information about protein composition and phylogeny. To showcase the utility of rapid prediction, we estimated the prevalence of aerobes and anaerobes in diverse natural environments cataloged in the Earth Microbiome Project. Focusing on a well-studied O2 gradient in the Black Sea, we found quantitative correspondence between local chemistry (O2:sulfide concentration ratio) and the composition of microbial communities. We, therefore, suggest that statistical methods like ours might be used to estimate, or “sense,” pivotal features of the chemical environment using DNA sequencing data.
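As an illustration of the kind of sequence-content feature the abstract mentions (triplets of amino acids), the following is a minimal sketch of counting amino acid trimers and turning them into a fixed-length frequency vector. It is not the authors' pipeline: the function names `trimer_counts` and `trimer_frequencies` are hypothetical, the sketch assumes protein sequences are already available as plain strings, and skipping trimers that contain non-standard residue codes is an assumption made here for simplicity.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_VALID = set(AMINO_ACIDS)


def trimer_counts(proteins):
    """Count overlapping amino acid trimers across a list of protein strings.

    Trimers containing any character outside the 20 standard residues
    (e.g. 'X' or '*') are skipped; this is an assumption of the sketch.
    """
    counts = Counter()
    for seq in proteins:
        seq = seq.upper()
        for i in range(len(seq) - 2):
            trimer = seq[i:i + 3]
            if all(residue in _VALID for residue in trimer):
                counts[trimer] += 1
    return counts


def trimer_frequencies(proteins):
    """Return a length-8000 list of trimer frequencies in a fixed order.

    The order is that of itertools.product over AMINO_ACIDS, so vectors
    from different genomes are directly comparable.
    """
    counts = trimer_counts(proteins)
    total = sum(counts.values())
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[trimer] / total for trimer in vocabulary]


if __name__ == "__main__":
    # Toy input, not real data.
    example = ["MKVLA", "MKVAA"]
    print(trimer_counts(example).most_common(3))
    print(len(trimer_frequencies(example)))
```

A vector like this could serve as input to any standard classifier; which model and preprocessing the authors actually used is documented in the article and source code, not here.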

Copyright and License

© 2024 Flamholz et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

Acknowledgement

The authors thank T. P. Barnum, D. Dar, J. Jabłońska, R. Murali, and J. Leadbetter for valuable discussions.
A.I.F. was supported by the Jane Coffin Childs Memorial Fund for Medical Research. J.E.G. was supported by the Gordon and Betty Moore Foundation as a Physics of Living Systems Fellow through grant number GBMF4513 and by NASA’s Interdisciplinary Consortia for Astrobiology Research (grant number 80NSSC23K1357). P.A.R. was supported through the Schmidt Scholars in Software Engineering program at Caltech. A.J. acknowledges support from the Howard Hughes Medical Institute as a Hanna Gray Fellow (Grant #GT16787) and from the National Institutes of Health through the UCSD FIRST program. W.W.F. acknowledges support from the Resnick Sustainability Institute, the Caltech Center for Evolutionary Sciences, and NSF NNA grant 2127442. This research was also sponsored by the Army Research Office and was accomplished under Cooperative Agreement Number W911NF-22-2-0210 to D.K.N. This research was supported in part by the National Science Foundation under Grant No. NSF PHY-1748958 to the Kavli Institute for Theoretical Physics.

Supplemental Material

Supplemental figures and tables: msystems.00763-24-s0001.pdf

Data Availability

Source code is available at github.com/flamholz/annotation_free_dioxygen_utilization.
A provided script automates retrieval of data from the figshare repository at https://figshare.com/articles/dataset/Annotation-free_prediction_of_microbial_dioxygen_utilization/26065345

Contributions

Avi I. Flamholz and Joshua E. Goldford contributed equally to this article. The author order was decided alphabetically.

Files

flamholz-et-al-2024-annotation-free-prediction-of-microbial-dioxygen-utilization.pdf (2.3 MB)

Additional details

Created: October 14, 2024
Modified: October 14, 2024