A Caltech Library Service

Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity

Shaffer, Justin P. and Nothias, Louis-Félix and Thompson, Luke R. and Sanders, Jon G. and Salido, Rodolfo A. and Couvillion, Sneha P. and Brejnrod, Asker D. and Lejzerowicz, Franck and Haiminen, Niina and Huang, Shi and Lutz, Holly L. and Zhu, Qiyun and Martino, Cameron and Morton, James T. and Karthikeyan, Smruthi and Nothias-Esposito, Mélissa and Dührkop, Kai and Böcker, Sebastian and Kim, Hyun Woo and Aksenov, Alexander A. and Bittremieux, Wout and Minich, Jeremiah J. and Marotz, Clarisse and Bryant, MacKenzie M. and Sanders, Karenina and Schwartz, Tara and Humphrey, Greg and Vásquez-Baeza, Yoshiki and Tripathi, Anupriya and Parida, Laxmi and Carrieri, Anna Paola and Beck, Kristen L. and Das, Promi and González, Antonio and McDonald, Daniel and Ladau, Joshua and Karst, Søren M. and Albertsen, Mads and Ackermann, Gail and DeReus, Jeff and Thomas, Torsten and Petras, Daniel and Shade, Ashley and Stegen, James and Song, Se Jin and Metz, Thomas O. and Swafford, Austin D. and Dorrestein, Pieter C. and Jansson, Janet K. and Gilbert, Jack A. and Knight, Rob and Angenant, Lars T. and Berry, Alison M. and Bittleston, Leonora S. and Bowen, Jennifer L. and Chavarría, Max and Cowan, Don A. and Distel, Dan and Girguis, Peter R. and Huerta-Cepas, Jaime and Jensen, Paul R. and Jiang, Lingjing and King, Gary M. and Lavrinienko, Anton and MacRae-Crerar, Aurora and Makhalanyane, Thulani P. and Mappes, Tapio and Marzinelli, Ezequiel M. and Mayer, Gregory and McMahon, Katherine D. and Metcalf, Jessica L. and Miyake, Sou and Mousseau, Timothy A. and Murillo-Cruz, Catalina and Myrold, David and Palenik, Brian and Pinto-Tomás, Adrián A. and Porazinska, Dorota L. and Ramond, Jean-Baptiste and Rohwer, Forest and RoyChowdhury, Taniya and Sandin, Stuart A. and Schmidt, Steven K. and Seedorf, Henning and Shade, Ashley and Stegen, James and Stewart, Frank J. and Tait, Karen and Thomas, Torsten and Tucker, Yael and U’Ren, Jana M. and Watts, Phillip C. and Webster, Nicole S. and Zaneveld, Jesse R. and Zhang, Shan (2022) Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nature Microbiology, 7 (12). pp. 2128-2150. ISSN 2058-5276. PMCID PMC9712116. doi:10.1038/s41564-022-01266-x.

Full text is not posted in this repository. Consult Related URLs below.

Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography mass spectrometry). We used standardized protocols and analytical methods to characterize microbial communities, focusing on relationships and co-occurrences of microbially related metabolites and microbial taxa across environments, thus allowing us to explore diversity at extraordinary scale. In addition to a reference database for metagenomic and metabolomic data, we provide a framework for incorporating additional studies, enabling the expansion of existing knowledge in the form of an evolving community resource. We demonstrate the utility of this database by testing the hypothesis that every microbe and metabolite is everywhere but the environment selects. Our results show that metabolite diversity exhibits turnover and nestedness related to both microbial communities and the environment, whereas the relative abundances of microbially related metabolites vary and co-occur with specific microbial consortia in a habitat-specific manner. We additionally show the power of certain chemistry, in particular terpenoids, in distinguishing Earth’s environments (for example, terrestrial plant surfaces and soils, freshwater and marine animal stool), as well as that of certain microbes including Conexibacter woesei (terrestrial soils), Haloquadratum walsbyi (marine deposits) and Pantoea dispersa (terrestrial plant detritus). 
This Resource provides insight into the taxa and metabolites within microbial communities from diverse habitats across Earth, informing both microbial and chemical ecology, and provides a foundation and methods for multi-omics microbiome studies of hosts and the environment.

Item Type: Article
Related URLs:
URL | URL Type | Description
 | Central | Article
Author | ORCID
Shaffer, Justin P. | 0000-0002-9371-6336
Nothias, Louis-Félix | 0000-0001-6711-6719
Thompson, Luke R. | 0000-0002-3911-1280
Sanders, Jon G. | 0000-0001-6077-4014
Couvillion, Sneha P. | 0000-0003-0307-9343
Haiminen, Niina | 0000-0002-8663-1019
Lutz, Holly L. | 0000-0001-6454-809X
Zhu, Qiyun | 0000-0002-3568-6271
Martino, Cameron | 0000-0001-9334-1258
Morton, James T. | 0000-0003-3189-2681
Karthikeyan, Smruthi | 0000-0001-6226-4536
Böcker, Sebastian | 0000-0002-9304-8091
Bittremieux, Wout | 0000-0002-3105-1359
Parida, Laxmi | 0000-0002-7872-5074
Carrieri, Anna Paola | 0000-0003-2349-1896
Beck, Kristen L. | 0000-0002-4603-0235
McDonald, Daniel | 0000-0003-0876-9060
Albertsen, Mads | 0000-0002-6151-190X
Ackermann, Gail | 0000-0002-3901-4931
Thomas, Torsten | 0000-0001-9557-3001
Petras, Daniel | 0000-0002-6561-3022
Shade, Ashley | 0000-0002-7189-3067
Metz, Thomas O. | 0000-0001-6049-3968
Swafford, Austin D. | 0000-0001-5655-8300
Dorrestein, Pieter C. | 0000-0002-3003-1030
Jansson, Janet K. | 0000-0002-5487-4315
Gilbert, Jack A. | 0000-0001-7920-7001
Knight, Rob | 0000-0002-0975-9019
Girguis, Peter R. | 0000-0002-3599-8160
Additional Information: We thank G. Milivenvsky, A. Møller, I. Chizhevsky, S. Kirieiev, A. Nosovsky and M. Ivanenko for logistic support with fieldwork in Ukraine; L. Goldasich and J. Toronczak for assistance with sample processing for sequencing; and M. Fedarko, R. Diner, E. Wood-Charlson, S. Nayfach, D. Udwary and E. Eloe-Fadrosh for reviewing the manuscript.
This work was supported in part by the Samuel Freeman Charitable Trust, US National Institutes of Health (NIH) (awards 1RF1-AG058942-01, 1DP1AT010885, R01HL140976, R01DK102932, R01HL134887, U19AG063744 and U01AI124316 to R.K.), US Department of Agriculture – National Institute of Food and Agriculture (USDA-NIFA) (award 2019-67013-29137 to R.K.), the US National Science Foundation (NSF) - Center for Aerosol Impacts on Chemistry of the Environment, Crohn’s & Colitis Foundation Award (CCFA) (award 675191 to R.K.), US Department of Energy - Office of Science - Office of Biological and Environmental Research - Environmental System Science Program, Semiconductor Research Corporation and Defense Advanced Research Projects Agency (SRC/DARPA) (award GI18518 to R.K.), Department of Defense (award W81XWH-17-1-0589 to R.K.), the Office of Naval Research (ONR) (award N00014-15-1-2809 to R.K.), the Emerald Foundation (award 3022 to R.K.), IBM Research AI through the AI Horizons Network, and the Center for Microbiome Innovation. J.P.S. was supported by NIH/NIGMS IRACDA K12 GM068524. L.-F.N. was supported by the NIH (award R01-GM107550). A.D.B. was supported by the Danish Council for Independent Research (DFF) (award 9058-00025B). W.B. was supported by the Research Foundation – Flanders (12W0418N). K.D. and S.B. were supported by Deutsche Forschungsgemeinschaft (BO 1910/20 and 1910/23). P.C.D. was supported by the Gordon and Betty Moore Foundation (award GBMF7622) and the NIH (award R01-GM107550).
Metabolomics analyses at Pacific Northwest National Laboratory (PNNL) were supported by the Laboratory Directed Research and Development program via the Microbiomes in Transition Initiative and performed in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the US Office of Biological and Environmental Research and located at PNNL. This contribution originates in part from the River Corridor Scientific Focus Area project at PNNL. PNNL is a multiprogram national laboratory operated by Battelle for the Department of Energy (DOE) under contract DE-AC05-76RLO 1830, as well as work supported by COMPASS-FME, a multi-institutional project supported by the US DOE, Office of Science, Biological and Environmental Research as part of the Environmental System Science Program. We thank Eppendorf, Illumina and Integrated DNA Technologies for in-kind support at various phases of the project.
Contributions: The EMP500 Consortium collected and provided samples. J.A.G., J.K.J. and R.K. conceived the idea for the project. P.C.D. and R.K. designed the multi-omics component of the project and provided project oversight. J.P.S. managed the project, performed preliminary data exploration, coordinated data analysis, analysed data and provided data interpretation. L.-F.N. coordinated and performed LC–MS/MS analysis, and the processing, annotation and interpretation of LC–MS/MS data. M.N.-E. performed sample preparation and extraction before LC–MS/MS analysis. L.R.T. designed the multi-omics component of the project, solicited sample collection, curated sample metadata, processed samples, performed preliminary data exploration and provided project oversight. J.G.S. designed the multi-omics component, managed the project, developed protocols and tools, coordinated and performed sequencing, and performed preliminary exploration of sequence data. R.A.S. developed protocols, and coordinated and performed sequencing. S.P.C. and T.O.M. coordinated and performed GC–MS sample processing and provided interpretation of GC–MS data. A.D.B. conceived the idea for the paper, performed preliminary data exploration, analysed data and provided data interpretation. S.H. performed machine-learning analyses. F.L. performed co-occurrence analysis, multinomial regression analyses and correlations with co-occurrence data. H.L.L. performed multinomial regression analyses. Q.Z. developed tools and provided interpretation of shotgun metagenomics data. C. Martino and J.T.M. provided oversight and interpretation of RPCA, multinomial regression and co-occurrence analyses. S.K. performed preliminary exploration of shotgun metagenomics data. K.D., S.B. and H.W.K. contributed to the annotation of LC–MS/MS data. A.A.A. processed GC–MS data. W.B. provided oversight for machine-learning analyses. C. Marotz processed samples for sequencing. Y.V.B. performed preliminary data exploration and provided oversight for machine-learning analysis. A.T. and D.P. performed preliminary data exploration. J.L. provided oversight and interpretation of nestedness analyses. L.P., A.P.C., N.H. and K.L.B. performed preliminary exploration of shotgun metagenomic data and performed machine-learning analyses. P.D. performed preliminary exploration of shotgun metagenomics data. A.G. developed tools, provided interpretation of shotgun metagenomics data and analysed shotgun metagenomics data. G.H. coordinated short-read amplicon and shotgun metagenomics sequencing. M.M.B. and K.S. performed short-read amplicon and shotgun metagenomics sequencing. T.S. assisted with DNA extraction. D.M. coordinated long-read amplicon sequencing, analysed shotgun metagenomics data and provided interpretation of the data. S.M.K. and M.A. coordinated and performed long-read amplicon sequencing and long-read sequence data analysis. J.J.M. collected samples, coordinated field logistics, developed protocols, and performed short-read amplicon and shotgun metagenomics sequencing. S.J.S. collected samples, coordinated field logistics and provided interpretation of the data. G.A. curated sample metadata and organized sequence data. J.D. processed sequence data. A.D.S. provided project oversight and data interpretation. T.T., A.S. and J.S. collected samples, coordinated field logistics and provided interpretation of the data. J.P.S. wrote the manuscript, with contributions from all authors.
Data availability. The mass spectrometry method and data (.RAW and .mzML) were deposited on the MassIVE public repository and are available under the dataset accession number MSV000083475. The processing files were also added to the deposition (updates/2019-08-21_lfnothias_7cc0af40/other/1908_EMPv2_INN/). GNPS molecular networking job is available at and was also performed in analogue mode The DEREPLICATOR jobs can be accessed at and The SIRIUS results are available on the GitHub repository (emp/data/metabolomics/FBMN/SIRIUS). The notebooks for metabolomics data preparation and microbially related molecules establishment are available at Amplicon and shotgun metagenomic sequence data were submitted to the European Nucleotide Archive under Project PRJEB42019 ( Raw and demultiplexed amplicon and shotgun sequence data, the feature-table for full-length rRNA operon analysis, feature-tables for LC–MS/MS classical molecular networking and feature-based molecular networking, and the feature-table for GC–MS molecular networking data are available for download and analysis through Qiita at (study: 13114). The GreenGenes database for 16S rRNA can be accessed at The SILVA 138 database for 16S and 18S rRNA can be accessed at The UNITE 9 database for fungal ITS sequences can be accessed at The Web of Life database can be accessed at The Rep200 database can be accessed at The Natural Products Atlas database can be accessed at The MIBiG database can be accessed at
Code availability. Complete protocols for laboratory and computational workflows for both metagenomics and metabolomics data for use by the broader community are available in GitHub (
Competing interests. S.B. and K.D. are co-founders of Bright Giant GmbH, which implements some of the tools used for metabolite annotation here (that is, SIRIUS, CSI-FingerID+CANOPUS). The remaining authors declare no competing interests.
Peer review. Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work.
Funding Agency | Grant Number
Samuel Freeman Charitable Trust | UNSPECIFIED
Crohn’s and Colitis Foundation of America | 675191
Semiconductor Research Corporation | UNSPECIFIED
Defense Advanced Research Projects Agency (DARPA) | GI18518
Department of Defense | W81XWH-17-1-0589
Office of Naval Research (ONR) | N00014-15-1-2809
Emerald Foundation | 3022
Center for Microbiome Innovation | UNSPECIFIED
NIH | K12 GM068524
Danish Council for Independent Research | 9058-00025B
Fonds Wetenschappelijk Onderzoek (FWO) | 12W0418N
Deutsche Forschungsgemeinschaft (DFG) | BO 1910/20
Deutsche Forschungsgemeinschaft (DFG) | BO 1910/23
Gordon and Betty Moore Foundation | GBMF7622
Pacific Northwest National Laboratory | UNSPECIFIED
Department of Energy (DOE) | DE-AC05-76RL01830
Issue or Number: 12
PubMed Central ID: PMC9712116
Record Number: CaltechAUTHORS:20221215-540386000.18
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 118353
Deposited By: George Porter
Deposited On: 16 Dec 2022 23:43
Last Modified: 16 Dec 2022 23:43
