Tracking materials science data lineage to manage millions of materials experiments and analyses

Soedarmadji, Edwin and Stein, Helge S. and Suram, Santosh K. and Guevarra, Dan and Gregoire, John M. (2019) Tracking materials science data lineage to manage millions of materials experiments and analyses. npj Computational Materials, 5. Art. No. 79. ISSN 2057-3960. doi:10.1038/s41524-019-0216-x.

PDF - Published Version
Creative Commons Attribution.

PDF - Supplemental Material
Creative Commons Attribution.

MS Excel (Plate table) - Supplemental Material
Creative Commons Attribution.

Plain Text (Table script) - Supplemental Material
Creative Commons Attribution.


In an era of rapid advancement of algorithms that extract knowledge from data, data and metadata management are increasingly critical to research success. In materials science, there are few examples of experimental databases that contain many different types of information, and compared with other disciplines, the database sizes are relatively small. Underlying these issues are the challenges in managing and linking data across disparate synthesis and characterization experiments, which we address with the development of a lightweight data management framework that is generally applicable for experimental science and beyond. Five years of managing experiments with this system has yielded the Materials Experiment and Analysis Database (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments, as well as the analysis and distillation of that data into property and performance metrics via software in an accompanying open source repository. The unprecedented quantity and diversity of experimental data are searchable by experiment and analysis attributes generated by both researchers and data processing software. The search web interface allows users to visualize their search results and download zipped packages of data with full annotations of their lineage. The enormity of the data provides substantial challenges and opportunities for incorporating data science in the physical sciences, and MEAD’s data and algorithm management framework will foster increased incorporation of automation and autonomous discovery in materials and chemistry research.

Item Type: Article
ORCID:
Author | ORCID
Stein, Helge S. | 0000-0002-3461-0232
Suram, Santosh K. | 0000-0001-8170-2685
Guevarra, Dan | 0000-0002-9592-3195
Gregoire, John M. | 0000-0002-2863-5265
Additional Information: © 2019 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit
Received 22 December 2018; Accepted 26 June 2019; Published 26 July 2019.
Data availability: The data sets generated during and/or analyzed during the current study are available in the HTE-JCAP repository, or Summary tables of plates and compositions are available at
Acknowledgements: This study and the acquisition of all data is based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the US Department of Energy (Award No. DE-SC0004993). The development of database export algorithms was also supported by a grant from the Toyota Research Institute through the Accelerated Materials Design and Discovery program. Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515.
Author Contributions: E.S. designed, developed, and maintained the IT infrastructure, data management system, database, searchable index, and the web UI that runs MEAD. E.S., J.G., S.K.S., and D.G. designed data management protocols. J.G., S.K.S., D.G., and H.S. designed, developed, and verified data analysis algorithms. The paper was written by E.S. and J.G. with contributions from H.S.
Competing interests: The authors declare no competing interests.
Funding Agency | Grant Number
Department of Energy (DOE) | DE-SC0004993
Toyota Research Institute | UNSPECIFIED
Department of Energy (DOE) | DE-AC02-76SF00515
Record Number: CaltechAUTHORS:20190726-101520787
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 97441
Deposited By: Tony Diaz
Deposited On: 26 Jul 2019 17:27
Last Modified: 01 Jun 2023 22:53
