
Where is Waldo (and his friends)? A comparison of anomaly detection algorithms for time-domain astronomy

Martínez-Galarza, Juan Rafael and Bianco, Federica and Crake, Dennis and Tirumala, Kushal and Mahabal, Ashish A. and Graham, Matthew J. and Giles, Daniel (2020) Where is Waldo (and his friends)? A comparison of anomaly detection algorithms for time-domain astronomy. (Unpublished)


Our understanding of the Universe has progressed through deliberate, targeted studies of known phenomena, like the supernova campaigns that enabled the discovery of the accelerated expansion of the Universe, as much as through serendipitous, unexpected discoveries. The discovery of the Jovian moons and of interstellar objects like 1I/'Oumuamua forced us to rethink the framework through which we explain the Universe and to develop new theories. Recent surveys, like the Catalina Real-Time Transient Survey and the Zwicky Transient Facility, and upcoming ones, like the Rubin Legacy Survey of Space and Time, explore the parameter space of astrophysical transients at all time scales, from hours to years, and offer the opportunity to discover new, unexpected phenomena. In this paper, we investigate strategies to identify novel objects and to contextualize them within large time-series data sets, in order to facilitate the discovery of new objects and new classes of objects, and the physical interpretation of their anomalous nature. We compare tree-based and manifold-learning algorithms for anomaly detection as applied to a data set of light curves from the Kepler observatory that includes the bona fide anomalous Boyajian's star. We assess the impact of pre-processing and feature engineering schemes, and we investigate the astrophysical nature of the objects that our models identify as anomalous by augmenting the Kepler data with Gaia color and luminosity information. We find that using multiple models in combination is a promising strategy not only to identify novel time series but also to find objects that share phenomenological and astrophysical characteristics with them, which facilitates the interpretation of their anomalous characteristics.
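
To make the idea of tree-based anomaly detection concrete, the following is a minimal sketch of one well-known detector of this family, the isolation forest, written in plain Python. It is an illustration only: this record does not state which specific tree-based algorithms the paper compares, and the sketch is not the authors' code. The two-column synthetic "features", the planted point at (10, 10), and all function names (c_factor, build_tree, path_length, anomaly_scores) are assumptions made for the example. Real light-curve features would replace the toy data.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split on along this feature
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Depth at which a point reaches a leaf, corrected for the leaf's size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["cut"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one score per point; values closer to 1 are more anomalous.

    Assumes at least two input points.
    """
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for p in points:
        mean_len = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_len / norm))
    return scores


# Synthetic stand-in for a feature table (hypothetical, not Kepler data):
# 200 "ordinary" objects plus one planted outlier at index 200.
data_rng = random.Random(42)
features = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(200)]
features.append((10.0, 10.0))

scores = anomaly_scores(features)
most_anomalous = max(range(len(features)), key=scores.__getitem__)
print("index of highest-scoring object:", most_anomalous)
```

Points that a random sequence of splits isolates after only a few cuts receive scores near 1, which is why the planted outlier ranks first here. A manifold-learning method would instead embed the same feature table in a low-dimensional space, where the neighbours of a flagged object can be inspected; that step is not shown in this sketch.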

Item Type: Report or Paper (Discussion Paper)
ORCID:
Martínez-Galarza, Juan Rafael: 0000-0002-5069-0324
Bianco, Federica: 0000-0003-1953-8727
Mahabal, Ashish A.: 0000-0003-2242-0244
Graham, Matthew J.: 0000-0002-3168-0139
Giles, Daniel: 0000-0002-8723-1797
Additional Information: We would like to thank the organizers and participants of the Detecting the Unexpected workshop that took place at STScI in 2017. The ideas for this work came from a hack during that workshop and have also produced other papers. In particular, we thank Lucianne Walkowicz for a continuous exchange of ideas and for proposing the original hack. We also thank Dalya Baron for useful insight into the use of the URF method. We thank the original hackers' team, which included Kelle Cruz and Umaa Rebbapragada. In carrying out this research we have used the scikit-learn Python package. This paper includes data collected by the Kepler mission and obtained from the MAST data archive at the Space Telescope Science Institute (STScI). Funding for the Kepler mission is provided by the NASA Science Mission Directorate. STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555.
Subject Keywords: methods: data analysis, methods: statistical, stars: flare, stars: peculiar (except chemically peculiar)
Record Number: CaltechAUTHORS:20210205-093247639
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 107936
Deposited By: George Porter
Deposited On: 05 Feb 2021 22:52
Last Modified: 05 Feb 2021 22:52
