CaltechAUTHORS
  A Caltech Library Service

Benchmarking the Acceleration of Materials Discovery by Sequential Learning

Rohr, Brian and Stein, Helge S. and Guevarra, Dan and Wang, Yu and Haber, Joel A. and Aykol, Muratahan and Suram, Santosh K. and Gregoire, John M. (2020) Benchmarking the Acceleration of Materials Discovery by Sequential Learning. Chemical Science, 11 (10). pp. 2696-2706. ISSN 2041-6520. https://resolver.caltech.edu/CaltechAUTHORS:20200110-151145517

PDF - Published Version. Creative Commons Attribution. 1300Kb
PDF - Supplemental Material. Creative Commons Attribution. 1854Kb
PDF - Submitted Version. Creative Commons Attribution Non-commercial No Derivatives. 7Mb

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20200110-151145517

Abstract

Sequential learning (SL) strategies, i.e., iteratively updating a machine learning model to guide experiments, have been proposed to significantly accelerate materials discovery and research. Applications to computational datasets and a handful of optimization experiments have demonstrated the promise of SL, motivating a quantitative evaluation of its ability to accelerate materials discovery, specifically in the case of physical experiments. The benchmarking effort in the present work quantifies the performance of SL algorithms with respect to a breadth of research goals: discovery of any “good” material, discovery of all “good” materials, and discovery of a model that accurately predicts the performance of new materials. To benchmark the effectiveness of different machine learning models against these goals, we use datasets in which the performance of all materials in the search space is known from high-throughput synthesis and electrochemistry experiments. Each dataset contains all pseudo-quaternary metal oxide combinations from a set of six elements (a chemical space); the performance metric chosen is the electrocatalytic activity (overpotential) for the oxygen evolution reaction (OER). A diverse set of SL schemes is tested on four chemical spaces, each containing 2121 catalysts. The presented work suggests that research can be accelerated by up to a factor of 20 compared to random acquisition in specific scenarios. The results also show that certain choices of SL models are ill-suited for a given research goal, resulting in substantial deceleration compared to random acquisition methods. The results provide quantitative guidance on how to tune an SL strategy for a given research goal and demonstrate the need for a new generation of materials-aware SL algorithms to further accelerate materials discovery.
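The figure of 2121 catalysts per chemical space can be checked with a short enumeration. The sketch below is illustrative only and rests on one assumption that this record does not state: that compositions are sampled on a grid of 10 atomic percent steps. Under that assumption, counting every composition of six elements in which at most four elements are present (unary through quaternary) gives 2121, which matches the abstract. The function names `compositions` and `pseudo_quaternary_space` are hypothetical and are not taken from the authors' code.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def pseudo_quaternary_space(n_elements=6, n_steps=10, max_present=4):
    """List grid compositions with at most `max_present` nonzero elements.

    Assumption (not stated in the record): each element fraction is a
    multiple of 1/n_steps, i.e. 10 at% steps when n_steps is 10.
    """
    return [
        c
        for c in compositions(n_steps, n_elements)
        if sum(1 for x in c if x > 0) <= max_present
    ]


space = pseudo_quaternary_space()
print(len(space))  # 2121
```

The same total follows from a closed-form count: 6 unary + 135 binary + 720 ternary + 1260 quaternary compositions = 2121.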


Item Type:Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1039/c9sc05999g | DOI | Article
http://www.rsc.org/suppdata/c9/sc/c9sc05999g/c9sc05999g1.pdf | Publisher | Supplementary Information
https://chemrxiv.org/articles/Benchmarking_the_Acceleration_of_Materials_Discovery_by_Sequential_Learning/11303606 | Organization | Discussion Paper
ORCID:
Author | ORCID
Stein, Helge S. | 0000-0002-3461-0232
Guevarra, Dan | 0000-0002-9592-3195
Wang, Yu | 0000-0003-3589-9274
Haber, Joel A. | 0000-0001-7847-5506
Aykol, Muratahan | 0000-0001-6433-7217
Suram, Santosh K. | 0000-0001-8170-2685
Gregoire, John M. | 0000-0002-2863-5265
Additional Information:© 2020 The Royal Society of Chemistry. This Open Access Article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Received 27th November 2019, Accepted 27th January 2020, First published on 29th January 2020. This work was funded by Toyota Research Institute through the Accelerated Materials Design and Discovery program (machine learning and simulation of sequential learning); by the Joint Center for Artificial Photosynthesis, a US Department of Energy (DOE) Energy Innovation Hub, supported through the Office of Science of the DOE under Award Number DE-SC0004993 (data acquisition); and by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Award DE-SC0020383 (data curation and establishment of baselines). Data and code availability: All catalyst data is visualized in Fig. 2 or the ESI† and is available for interactive visualization and download at http://data.matr.io/ACE-I. Source code for benchmarking sequential learning runs against random sample selection and demonstrating the sequential learning is available at https://github.com/SantoshSuram-TRI/ACE-I. The compilation of data is available in that repository and also at https://data.caltech.edu/records/1345 (DOI: 10.22002/D1.1345). Conflicts of interest: B. R., H. S., S. S. and J. G. filed a provisional patent application on active learning enabled experimental catalyst materials discovery: US app. no. 62/837,379. The remaining authors declare no competing interests.
Group:JCAP
Funders:
Funding Agency | Grant Number
Toyota Research Institute | UNSPECIFIED
Joint Center for Artificial Photosynthesis (JCAP) | UNSPECIFIED
Department of Energy (DOE) | DE-SC0004993
Department of Energy (DOE) | DE-SC0020383
Subject Keywords:active learning; autonomous science; oxygen evolution reaction
Issue or Number:10
Record Number:CaltechAUTHORS:20200110-151145517
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:20200110-151145517
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:100643
Collection:CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On:11 Jan 2020 00:49
Last Modified:02 Apr 2020 14:59
