A Caltech Library Service

Multi-component background learning automates signal detection for spectroscopic data

Ament, Sebastian E. and Stein, Helge S. and Guevarra, Dan and Zhou, Lan and Haber, Joel A. and Boyd, David A. and Umehara, Mitsutaro and Gregoire, John M. and Gomes, Carla P. (2019) Multi-component background learning automates signal detection for spectroscopic data. npj Computational Materials, 5. Art. No. 77. ISSN 2057-3960. doi:10.1038/s41524-019-0213-0.

PDF - Published Version
Creative Commons Attribution.




Automated experimentation has yielded data acquisition rates that exceed human processing capabilities. Artificial Intelligence offers new possibilities for automating data interpretation to generate large, high-quality datasets. Background subtraction is a long-standing challenge, particularly in settings where multiple sources of the background signal coexist, and automatic extraction of signals of interest from measured signals accelerates data interpretation. Herein, we present an unsupervised probabilistic learning approach that analyzes large data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is demonstrated on X-ray diffraction and Raman spectroscopy data and is suitable for any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals, since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets, a transformative capability with many applications in the physical sciences and beyond.
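To make the abstract's idea concrete (learn nonnegative background components, learn a noise level, and assign each data point a probability of carrying a positive signal), here is a minimal sketch under stated assumptions. It is not the published MCBL algorithm: it swaps in a plain weighted nonnegative matrix factorization (NMF) for the backgrounds, Gaussian noise with a learned standard deviation `sigma`, and a half-normal distribution with a fixed, user-chosen scale `tau` for the positive signal. The names `learn_backgrounds`, `_weighted_nmf`, `_signal_posterior`, `tau`, `n_outer` and `n_inner` are all hypothetical.

```python
import numpy as np


def _signal_posterior(R, sigma, pi, tau, eps=1e-12):
    """P(signal | residual) under an assumed two-component mixture."""
    # Component 1, "background + noise only": residual ~ N(0, sigma^2).
    p_noise = np.exp(-0.5 * (R / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # Component 2, "background + positive signal": residual ~ half-normal(tau),
    # supported on r >= 0 only (the signal is a positive addition).
    p_sig = np.where(
        R >= 0, 2.0 * np.exp(-0.5 * (R / tau) ** 2) / (tau * np.sqrt(2 * np.pi)), 0.0
    )
    return pi * p_sig / (pi * p_sig + (1.0 - pi) * p_noise + eps)


def _weighted_nmf(Y, A, B, W, iters, eps=1e-12):
    """Multiplicative updates for min sum(W * (Y - A @ B)**2) with A, B >= 0."""
    for _ in range(iters):
        A = A * ((W * Y) @ B.T) / ((W * (A @ B)) @ B.T + eps)
        B = B * (A.T @ (W * Y)) / (A.T @ (W * (A @ B)) + eps)
    return A, B


def learn_backgrounds(Y, k=2, tau=1.0, n_init=300, n_outer=30, n_inner=50, seed=0):
    """Hypothetical sketch. Y is (measurements x channels).

    Returns background activations A (m x k), background shapes B (k x n),
    and P (m x n), the probability that each entry holds a positive signal.
    """
    Y = np.maximum(np.asarray(Y, dtype=float), 0.0)  # NMF updates need Y >= 0
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.1, 1.0, size=(Y.shape[0], k))
    B = rng.uniform(0.1, 1.0, size=(k, Y.shape[1]))
    # Unweighted warm start: every entry is treated as background at first.
    A, B = _weighted_nmf(Y, A, B, np.ones_like(Y), n_init)
    sigma = (Y - A @ B).std()  # crude initial noise level, inflated by signals
    pi = 0.1                   # initial prior fraction of signal-bearing entries
    for _ in range(n_outer):
        P = _signal_posterior(Y - A @ B, sigma, pi, tau)
        W = 1.0 - P            # likely-signal entries barely shape the backgrounds
        A, B = _weighted_nmf(Y, A, B, W, n_inner)
        R = Y - A @ B
        sigma = np.sqrt(np.sum(W * R ** 2) / np.sum(W)) + 1e-12  # noise level
        pi = float(np.clip(P.mean(), 1e-6, 0.5))                 # signal prior
    return A, B, _signal_posterior(Y - A @ B, sigma, pi, tau)
```

The alternation (score residuals, down-weight probable signal, refit the backgrounds, re-estimate `sigma` and the prior) is an EM-style heuristic chosen for brevity; the paper's probabilistic matrix factorization model may differ in its distributions, priors, and optimization.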

Item Type: Article
Author | ORCID
Stein, Helge S. | 0000-0002-3461-0232
Guevarra, Dan | 0000-0002-9592-3195
Zhou, Lan | 0000-0002-7052-266X
Haber, Joel A. | 0000-0001-7847-5506
Umehara, Mitsutaro | 0000-0001-8665-0028
Gregoire, John M. | 0000-0002-2863-5265
Additional Information: © 2019 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit
Received 18 February 2019; Accepted 26 June 2019; Published 19 July 2019.
The development of the MCBL (multi-component background learning) algorithm, the inkjet printing synthesis, and the Raman measurements were supported by an Accelerated Materials Design and Discovery grant from the Toyota Research Institute. Initial design of the algorithm and data procurement were supported by the NSF Expedition award for Computational Sustainability CCF-1522054 and by Army Research Office (ARO) award W911-NF-14-1-0498. The implementation of the algorithm for automated, unsupervised operation was supported by MURI/AFOSR grant FA9550. Compute infrastructure was provided by NSF award CNS-0832782 and by ARO DURIP award W911NF-17-1-0187. The sputter deposition and XRD measurements were supported through the Office of Science of the U.S. Department of Energy under Award No. DE-SC0004993. The authors thank Edwin Soedarmadji for assistance with data management.
Data availability: The datasets analyzed during the current study are available in the Caltech Data repository: XRD at, and Raman at.
Code availability: The code pertaining to the current study will be available at
Author Contributions: C.G. and J.G. identified the problem to be solved. S.A. and C.G. conceptualized the model. S.A. developed the mathematical framework, designed the algorithm, and implemented it. J.G., H.S. and D.G. inspected the results. S.A., D.G. and J.G. created visualizations of the results. L.Z. performed materials synthesis and data acquisition for the XRD data. J.H. synthesized materials for the Raman measurements. D.B. and M.U. acquired and provided the Raman data. S.A., J.G., C.G., H.S. and D.G. wrote the paper.
Competing interests: The authors declare no competing interests.
Funding Agency | Grant Number
Toyota Research Institute | UNSPECIFIED
Army Research Office (ARO) | W911-NF-14-1-0498
Air Force Office of Scientific Research (AFOSR) | FA9550
Army Research Office (ARO) | W911NF-17-1-0187
Department of Energy (DOE) | DE-SC0004993
Record Number: CaltechAUTHORS:20190719-095139134
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 97288
Deposited By: Tony Diaz
Deposited On: 19 Jul 2019 17:11
Last Modified: 16 Nov 2021 17:30
