A Caltech Library Service

Synthesis, optical imaging, and absorption spectroscopy data for 179072 metal oxides

Stein, Helge S. and Soedarmadji, Edwin and Newhouse, Paul F. and Guevarra, Dan and Gregoire, John M. (2019) Synthesis, optical imaging, and absorption spectroscopy data for 179072 metal oxides. Scientific Data, 6. Art. No. 9. ISSN 2052-4463. PMCID PMC6437643. doi:10.1038/s41597-019-0019-4.

PDF - Published Version
Creative Commons Attribution.

Archive (ZIP) (ISA-Tab metadata file) - Supplemental Material
Creative Commons Attribution.


Optical absorption spectroscopy is an important materials characterization technique for applications such as solar energy generation. This data descriptor describes the largest publicly available curated materials science dataset to date (December 2018) of near-infrared to near-ultraviolet (UV-Vis) light absorbance, composition, and processing properties of metal oxides. By supplying the complete synthesis and processing history of each of the 179072 samples, which span 99965 unique compositions, we believe the dataset will enable the community to develop predictive models for materials, such as models that predict optical properties from composition and processing, and ultimately serve as a benchmark dataset for the continued integration of machine learning into materials science. The dataset is also a resource for identifying the compositions and synthesis conditions that attain specific optical properties.
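Because the 179072 samples span only 99965 compositions, many compositions appear more than once with different processing histories. The Python sketch below is illustrative only: it shows one way to represent a sample and to group samples by composition, for example before a train/test split so that one composition does not land in both sets. The `SampleRecord` fields, the helper names, and the three-decimal rounding tolerance are all hypothetical assumptions. They do not describe the layout of the published h5 container or the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CompositionKey = Tuple[Tuple[str, float], ...]


@dataclass
class SampleRecord:
    """Hypothetical per-sample record; field names are illustrative only."""
    sample_id: int
    composition: Dict[str, float]  # element symbol -> fraction
    processing: Dict[str, str] = field(default_factory=dict)  # processing history
    absorbance: List[float] = field(default_factory=list)  # UV-Vis spectrum values


def composition_key(comp: Dict[str, float], ndigits: int = 3) -> CompositionKey:
    """Hashable key: elements sorted, fractions rounded, zero fractions dropped."""
    rounded = ((el, round(x, ndigits)) for el, x in comp.items())
    return tuple(sorted((el, x) for el, x in rounded if x != 0))


def group_by_composition(samples: List[SampleRecord]) -> Dict[CompositionKey, List[int]]:
    """Map each unique composition to the ids of all samples that share it."""
    groups: Dict[CompositionKey, List[int]] = defaultdict(list)
    for s in samples:
        groups[composition_key(s.composition)].append(s.sample_id)
    return dict(groups)
```

Splitting on the keys of `group_by_composition` rather than on individual samples keeps every processing variant of a composition on the same side of the split. How the authors themselves define a "unique composition" is not specified here.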

Item Type: Article
Author | ORCID
Stein, Helge S. | 0000-0002-3461-0232
Newhouse, Paul F. | 0000-0003-2032-3010
Guevarra, Dan | 0000-0002-9592-3195
Gregoire, John M. | 0000-0002-2863-5265
Additional Information: © The Author(s) 2019. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

Received 09 October 2018; Accepted 11 February 2019; Published 27 March 2019; Issue Date 01 December 2019.

This study is based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the U.S. Department of Energy (Award No. DE-SC0004993). We thank Kevin Kan for processing materials libraries.

Author Contributions: H.S.S. and J.M.G. conceived the project and wrote the majority of the code and the manuscript. E.S. maintained the database backend and generated composition information. P.F.N. synthesized libraries and collected spectra. D.G. curated processing information and helped generate the h5 container. J.M.G. supervised the research project.

Code Availability: Custom code for handling the dataset is available at
This Python code enables users to easily download the dataset and to pull specific or random images with their accompanying spectra, as well as processing and composition data. The code is intended to make the dataset easy to explore and to provide templates for use in machine learning models. The code requires Python 3.6.4 or higher with the following packages: h5py >= 2.7.1, numpy >= 1.15.2, tqdm >= 4.23.0.

Competing Interests: The authors declare no competing interests.
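The Code Availability note lists minimum versions for Python and three packages. Below is a minimal, standard-library-only sketch for checking an environment against those minimums. The helper names (`parse_version`, `version_at_least`, `installed_versions`, `unmet_requirements`) are hypothetical and are not part of the authors' code. The version parser is a deliberate simplification: it keeps only the leading digits of each dotted component, so it ignores pre-release ordering (it treats `1.16.0rc1` as `1.16.0`).

```python
import re
from typing import Dict, Iterable, List, Tuple

# Minimum versions as stated in the Code Availability note.
MIN_PYTHON = "3.6.4"
MIN_PACKAGES = {"h5py": "2.7.1", "numpy": "1.15.2", "tqdm": "4.23.0"}


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '1.15.2' into (1, 15, 2), keeping only leading digits of each part.

    Simplification: suffixes such as 'rc1' are ignored, so '1.16.0rc1'
    parses as (1, 16, 0).
    """
    nums = []
    for part in v.split("."):
        m = re.match(r"\d+", part)
        if m is None:  # stop at the first non-numeric component
            break
        nums.append(int(m.group()))
    return tuple(nums)


def version_at_least(installed: str, required: str) -> bool:
    """Compare versions after zero-padding, so '2.7.1' equals '2.7.1.0'."""
    a, b = parse_version(installed), parse_version(required)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up installed distribution versions (importlib.metadata needs Python 3.8+)."""
    from importlib import metadata
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # missing packages are reported by unmet_requirements
    return found


def unmet_requirements(installed: Dict[str, str], python_version: str) -> List[str]:
    """Return a human-readable list of stated requirements that are not satisfied."""
    problems = []
    if not version_at_least(python_version, MIN_PYTHON):
        problems.append(f"Python {python_version} < {MIN_PYTHON}")
    for pkg, req in MIN_PACKAGES.items():
        have = installed.get(pkg)
        if have is None:
            problems.append(f"{pkg} not installed (need >= {req})")
        elif not version_at_least(have, req):
            problems.append(f"{pkg} {have} < {req}")
    return problems


if __name__ == "__main__":
    import platform
    report = unmet_requirements(installed_versions(MIN_PACKAGES), platform.python_version())
    print("\n".join(report) if report else "all stated requirements met")
```

Keeping `unmet_requirements` a pure function of plain strings makes it easy to test without touching the real environment. Only `installed_versions` queries the interpreter, and it relies on `importlib.metadata`, which is newer than the stated Python 3.6.4 minimum.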
Funding Agency | Grant Number
Department of Energy (DOE) | DE-SC0004993
Subject Keywords: Imaging techniques; Photocatalysis
PubMed Central ID: PMC6437643
Record Number: CaltechAUTHORS:20190327-100527853
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 94207
Deposited By: Tony Diaz
Deposited On: 27 Mar 2019 17:54
Last Modified: 01 Mar 2022 17:54
