
ESAMP: Event-Sourced Architecture for Materials Provenance Management and Application to Accelerated Materials Discovery

Statt, Michael J. and Rohr, Brian A. and Brown, Kris and Guevarra, Dan and Hummelshøj, Jens and Hung, Linda and Anapolsky, Abraham and Gregoire, John M. and Suram, Santosh K. (2021) ESAMP: Event-Sourced Architecture for Materials Provenance Management and Application to Accelerated Materials Discovery. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution Non-commercial No Derivatives.

PDF - Supplemental Material
Creative Commons Attribution Non-commercial No Derivatives.


While the vision of accelerating materials discovery using data-driven methods is well-founded, its practical realization has been throttled by challenges in data generation, ingestion, and materials state-aware machine learning. High-throughput experiments and automated computational workflows are addressing the challenge of data generation, and capitalizing on these emerging data resources requires ingesting the data into an architecture that captures the complex provenance of experiments and simulations. In this manuscript, we describe an event-sourced architecture for materials provenance (ESAMP) that encodes the sequence of, and interrelationships among, events occurring in a simulation or experiment. We use this architecture to ingest a large and varied dataset (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments performed using various modalities, such as serial, parallel, and multimodal experimentation. Our data architecture tracks the evolution of a material’s state, enabling a demonstration of how state-equivalency rules can be used to generate datasets that significantly enhance data-driven materials discovery. Specifically, using state-equivalency rules and parameters associated with state-changing processes, in addition to the typically used composition data, we demonstrate a marked reduction of uncertainty in the prediction of overpotential for oxygen evolution reaction (OER) catalysts. Finally, we discuss the importance of the ESAMP architecture in enabling several aspects of accelerated materials discovery, such as dynamic workflow design, generation of knowledge graphs, and efficient integration of theory and experiment.
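
To make the idea of event sourcing and state equivalency more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a hypothetical schema in which every process applied to a sample is stored as an immutable event, and a toy equivalency rule under which two samples are in the same state when their state-changing process histories match. All names (`ProcessEvent`, `EventLog`, `state_key`), the process types, and the parameter values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProcessEvent:
    """One immutable event in a sample's history (hypothetical schema)."""
    sample_id: str
    timestamp: int                                 # ordering key for events
    process_type: str                              # e.g. "anneal", "xrd"
    parameters: Tuple[Tuple[str, float], ...] = ()
    changes_state: bool = True                     # assumption: flagged per event


class EventLog:
    """Append-only log: events are added, never edited or deleted."""

    def __init__(self) -> None:
        self._events: List[ProcessEvent] = []

    def append(self, event: ProcessEvent) -> None:
        self._events.append(event)

    def history(self, sample_id: str) -> List[ProcessEvent]:
        """Return all events for one sample in chronological order."""
        own = [e for e in self._events if e.sample_id == sample_id]
        return sorted(own, key=lambda e: e.timestamp)


def state_key(history: List[ProcessEvent], before: int) -> tuple:
    """Toy state-equivalency rule (an assumption, not the paper's rule):
    the state at time `before` is the ordered list of earlier
    state-changing processes together with their parameters."""
    return tuple(
        (e.process_type, e.parameters)
        for e in history
        if e.changes_state and e.timestamp < before
    )


# Illustrative, made-up events for two samples.
log = EventLog()
log.append(ProcessEvent("s1", 1, "anneal", (("temperature_C", 400.0),)))
log.append(ProcessEvent("s1", 2, "xrd", (), changes_state=False))
log.append(ProcessEvent("s1", 3, "electrochemistry", (("pH", 13.0),)))
log.append(ProcessEvent("s2", 1, "anneal", (("temperature_C", 400.0),)))

# s1 just before its electrochemistry step and s2 after annealing share a
# state key, because the XRD measurement is flagged as not state-changing.
same_state = state_key(log.history("s1"), 3) == state_key(log.history("s2"), 2)
```

Because the log is append-only, the state of a sample at any point can be recomputed from its history, and measurements taken on equivalent states can be grouped into one training set. In this sketch the equivalency rule is a single function, so a different rule could be swapped in without touching the stored events.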

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
Paper
Item | Supplementary weblinks
ORCID:
Guevarra, Dan | 0000-0002-9592-3195
Gregoire, John M. | 0000-0002-2863-5265
Suram, Santosh K. | 0000-0001-8170-2685
Additional Information: The content is available under CC BY NC ND 4.0 License. Jun 09, 2021 Version 1. The development and implementation of the architecture were supported by the Toyota Research Institute through the Accelerated Materials Design and Discovery program. Generation of all experimental data was supported by the Joint Center for Artificial Photosynthesis, a US Department of Energy (DOE) Energy Innovation Hub, supported through the Office of Science of the DOE under Award Number DE-SC0004993. The development of the catalyst discovery use case was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award DE-SC0020383. The authors thank Dr. Edwin Soedarmadji for stewardship of MEAD and all members of the JCAP High Throughput Experimentation group for the generation of the data. The authors thank Daniel Schweigert for providing insights into standard database management practices. All the authors declare no competing interest.
Funding Agency | Grant Number
Toyota Research Institute | UNSPECIFIED
Department of Energy (DOE) | DE-SC0004993
Department of Energy (DOE) | DE-SC0020383
Subject Keywords: Data Architecture; machine-learning model; Accelerated Discovery
Record Number: CaltechAUTHORS:20210629-214641737
Persistent URL:
Official Citation: Statt M, Rohr BA, Brown KS, Guevarra D, Hummelshøj JS, Hung L, et al. ESAMP: Event-Sourced Architecture for Materials Provenance Management and Application to Accelerated Materials Discovery. ChemRxiv. Cambridge: Cambridge Open Engage; 2021; This content is a preprint and has not been peer-reviewed.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 109659
Deposited By: Tony Diaz
Deposited On: 29 Jun 2021 22:29
Last Modified: 16 Nov 2021 19:37
