Analyzing machine learning models to accelerate generation of fundamental materials insights

Umehara, Mitsutaro and Stein, Helge S. and Guevarra, Dan and Newhouse, Paul F. and Boyd, David A. and Gregoire, John M. (2019) Analyzing machine learning models to accelerate generation of fundamental materials insights. npj Computational Materials, 5. Art. No. 34. ISSN 2057-3960. doi:10.1038/s41524-019-0172-5.

PDF - Published Version
Creative Commons Attribution.

PDF - Supplemental Material
Creative Commons Attribution.

MS Excel (Complete dataset used for model training) - Supplemental Material
Creative Commons Attribution.


Machine learning for materials science envisions the acceleration of basic science research through automated identification of key data relationships to augment human interpretation and gain scientific understanding. A primary role of scientists is extraction of fundamental knowledge from data, and we demonstrate that this extraction can be accelerated using neural networks via analysis of the trained data model itself rather than its application as a prediction tool. Convolutional neural networks excel at modeling complex data relationships in multi-dimensional parameter spaces, such as that mapped by a combinatorial materials science experiment. Measuring a performance metric in a given materials space provides direct information about (locally) optimal materials but not the underlying materials science that gives rise to the variation in performance. By building a model that predicts performance (in this case photoelectrochemical power generation of a solar fuels photoanode) from materials parameters (in this case composition and Raman signal), subsequent analysis of gradients in the trained model reveals key data relationships that are not readily identified by human inspection or traditional statistical analyses. Human interpretation of these key relationships produces the desired fundamental understanding, demonstrating a framework in which machine learning accelerates data interpretation by leveraging the expertise of the human scientist. We also demonstrate the use of neural network gradient analysis to automate prediction of the directions in parameter space, such as the addition of specific alloying elements, that may increase performance by moving beyond the confines of existing data.
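
The gradient analysis described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical two-unit network with made-up weights, not the convolutional network from the paper; the three input features are generic stand-ins for materials parameters; and central finite differences are used in place of whatever gradient computation the authors employed. The sketch only shows the general idea of differentiating a trained predictor with respect to its inputs and reading off which input direction is predicted to raise performance.

```python
import math

# Hypothetical weights standing in for an already trained model.
# These values are invented for illustration and are not from the paper.
W_HIDDEN = [[0.8, -0.5, 0.3], [-0.2, 0.9, 0.4]]
B_HIDDEN = [0.1, -0.3]
W_OUT = [1.5, -0.7]


def toy_model(x):
    """Map three input features to a scalar 'performance' prediction."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    return sum(w * h for w, h in zip(W_OUT, hidden))


def input_gradient(model, x, eps=1e-5):
    """Estimate d(model)/d(x_i) for each input using central differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((model(up) - model(down)) / (2 * eps))
    return grad


# Hypothetical sample points in the input (parameter) space.
samples = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]]

# One gradient vector per sample: how the prediction responds locally
# to a small change in each input feature.
gradients = [input_gradient(toy_model, s) for s in samples]

# Averaging over samples gives a crude summary of which feature the
# model associates with increasing performance across the data.
n_features = len(samples[0])
mean_gradient = [
    sum(g[i] for g in gradients) / len(gradients) for i in range(n_features)
]
best_feature = max(range(n_features), key=lambda i: mean_gradient[i])

print("mean gradient per feature:", mean_gradient)
print("feature with the largest mean gradient:", best_feature)
```

In this toy setting a consistently positive gradient for a feature suggests that increasing it is predicted to improve performance, and a consistently negative gradient suggests the opposite. A gradient whose sign changes across samples points to a relationship that depends on the other inputs, which is the kind of pattern left to human interpretation.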

Item Type: Article
ORCID:
Author | ORCID
Umehara, Mitsutaro | 0000-0001-8665-0028
Stein, Helge S. | 0000-0002-3461-0232
Guevarra, Dan | 0000-0002-9592-3195
Newhouse, Paul F. | 0000-0003-2032-3010
Gregoire, John M. | 0000-0002-2863-5265
Additional Information:
© 2019 The Author(s). Open Access - This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit
Received 22 October 2018. Accepted 06 February 2019. Published 08 March 2019.
This study is based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the U.S. Department of Energy (Award No. DE-SC0004993). Development of the algorithm for automating the model interpretation (J.M.G. and H.S.S.) was funded by Toyota Research Institute through the Accelerated Materials Design and Discovery program.
Author Contributions: M.U. performed model training and gradient analysis. H.S.S. and D.G. assisted with design of the model and comparisons to other techniques. P.F.N., D.G. and D.A.B. performed all experiments. M.U., H.S.S., D.G. and J.M.G. interpreted model outputs and created data visualization schemes. J.M.G. created the algorithm for automated relationship identification with assistance from M.U. and H.S.S. M.U., H.S.S. and J.M.G. were the primary authors of the manuscript.
Code availability: The authors declare that the code used to perform the analysis is provided at
The authors declare no competing interests.
Funding Agency | Grant Number
Department of Energy (DOE) | DE-SC0004993
Toyota Research Institute | UNSPECIFIED
Record Number: CaltechAUTHORS:20190308-084331413
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 93644
Deposited By: George Porter
Deposited On: 08 Mar 2019 22:01
Last Modified: 16 Nov 2021 16:59
