The Materials Experiment Knowledge Graph
Materials knowledge is inherently hierarchical. While high-level descriptors such as composition and structure are valuable for contextualizing materials data, the data must ultimately be considered in the context of its low-level acquisition details. Graph databases offer an opportunity to represent hierarchical relationships among data, organizing semantic relationships into a knowledge graph. Herein, we establish a knowledge graph of materials experiments whose construction encodes the complete provenance of each material sample and its associated experimental data and metadata. Additional relationships among materials and experiments further encode knowledge and facilitate data exploration. We illustrate the Materials Experiment Knowledge Graph (MEKG) using several use cases, demonstrating the value of modern graph databases for the enterprise of data-driven materials science.
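As an illustration of how provenance can be encoded in a graph, the following minimal sketch stores a sample's processing history as labeled edges and walks backward from a data node to recover its lineage. It uses only the Python standard library rather than a graph database, and every node identifier, edge label, and function name here is a hypothetical placeholder, not the MEKG schema.

```python
from collections import defaultdict, deque

# Hypothetical (source, label, target) edges; NOT the MEKG schema.
# Each edge points from an upstream node to the node that follows from it.
EDGES = [
    ("sample:S1", "UNDERWENT", "process:anneal"),
    ("process:anneal", "PRODUCED", "sample:S1-annealed"),
    ("sample:S1-annealed", "UNDERWENT", "process:xrd_scan"),
    ("process:xrd_scan", "PRODUCED", "data:xrd_pattern"),
]


def build_reverse_index(edges):
    """Map each node to the list of (label, upstream node) pairs feeding it."""
    parents = defaultdict(list)
    for source, label, target in edges:
        parents[target].append((label, source))
    return parents


def provenance(node, parents):
    """Return all upstream nodes of `node`, nearest first (breadth-first)."""
    seen = {node}
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for _label, source in parents.get(current, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


PARENTS = build_reverse_index(EDGES)
LINEAGE = provenance("data:xrd_pattern", PARENTS)
for step in LINEAGE:
    print(step)
```

In a graph database, the same backward traversal would typically be expressed as a path query rather than an explicit loop; the sketch only shows why storing acquisition steps as edges makes the full history of a measurement recoverable from the data node itself.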
The content is available under a CC BY 4.0 License.

Acknowledgments. This material is primarily based on work performed by the Liquid Sunlight Alliance, which is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Fuels from Sunlight Hub under Award DE-SC0021266. Development of the graph database schema was supported by Toyota Research Institute. Much of the underlying data was generated by research in the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the U.S. Department of Energy (Award No. DE-SC0004993). Storage for MEAD is provided by the Open Storage Network via XSEDE allocation INI210004.

Author Contributions. M.J.S., B.A.R., D.G., S.K.S., and J.M.G. designed the MEKG and the use cases. M.J.S. and B.A.R. implemented MEKG with assistance from D.G. and J.M.G. J.B. and D.G. implemented the design of experiments use case.

Data Availability. The MPS SQL database from which MEKG is built and the three sub-databases are available at https://data.caltech.edu/records/aeffy-dcr62 (doi: 10.22002/aeffy-dcr62). The MEKG neo4j database is available at https://data.caltech.edu/records/h88fq-dk449 (doi: 10.22002/h88fq-dk449).

Code Availability. The code for the query time use cases and MEKG migration from MPS is available at https://github.com/modelyst/mekgmigrations. The code for the design of experiments and hypothesis evaluation use cases is available at https://data.caltech.edu/records/m4mpa-4mt17 (doi: 10.22002/m4mpa-4mt17).

Conflicts of Interest. Modelyst LLC implements custom data management systems in a professional context.