
Real-time data mining of massive data streams from synoptic sky surveys

Djorgovski, S. G. and Graham, M. J. and Donalek, C. and Mahabal, A. A. and Drake, A. J. and Turmon, M. and Fuchs, T. (2016) Real-time data mining of massive data streams from synoptic sky surveys. Future Generation Computer Systems, 59. pp. 95-104. ISSN 0167-739X. doi:10.1016/j.future.2015.10.013.





The nature of scientific and technological data collection is evolving rapidly: data volumes and rates grow exponentially, with increasing complexity and information content, and there has been a transition from static data sets to data streams that must be analyzed in real time. Interesting or anomalous phenomena must be quickly characterized and followed up with additional measurements via optimal deployment of limited assets. Modern astronomy presents a variety of such phenomena in the form of transient events in digital synoptic sky surveys, including cosmic explosions (supernovae, gamma-ray bursts), relativistic phenomena (black hole formation, jets), and potentially hazardous asteroids. We have been developing a set of machine learning tools to detect, classify, and plan a response to transient events for astronomy applications, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed. The ability to respond rapidly to the potentially most interesting events is a key bottleneck that limits the scientific returns from current and anticipated synoptic sky surveys. Similar challenges arise in other contexts, from environmental monitoring using sensor networks to autonomous spacecraft systems. Given the exponential growth of data rates and the need for a time-critical response, a fully automated and robust approach is required. We describe the results obtained to date and possible future developments.

Item Type:Article
ORCID:
Djorgovski, S. G.: 0000-0002-0603-3087
Graham, M. J.: 0000-0002-3168-0139
Mahabal, A. A.: 0000-0003-2242-0244
Turmon, M.: 0000-0002-6463-063X
Additional Information:© 2016 Elsevier B.V. Received 16 March 2015; Received in revised form 3 October 2015; Accepted 19 October 2015. This work was supported in part by the NASA grant 08-AISR08-0085, the NSF grants AST-0909182, IIS-1118041, and AST-1313422, by the W. M. Keck Institute for Space Studies at Caltech (KISS), and by the US Virtual Astronomical Observatory, itself supported by the NSF grant AST-0834235. Some of this work was assisted by the Caltech students Nihar Sharma, Yutong Chen, Alex Ball, Victor Duan, Allison Maker, and others, supported by the Caltech SURF program. We thank numerous collaborators and colleagues, especially within the CRTS survey team, and the worldwide Virtual Observatory and astroinformatics community, for stimulating discussions.
Group:Keck Institute for Space Studies
Funding Agency: Grant Number
NSF: AST-0909182
Keck Institute for Space Studies (KISS): UNSPECIFIED
NSF: AST-0834235
Caltech Summer Undergraduate Research Fellowship (SURF): UNSPECIFIED
Subject Keywords:Sky surveys; Massive data streams; Machine learning; Bayesian methods; Automated decision making
Record Number:CaltechAUTHORS:20160202-141416022
Persistent URL:
Official Citation:S.G. Djorgovski, M.J. Graham, C. Donalek, A.A. Mahabal, A.J. Drake, M. Turmon, T. Fuchs, Real-time data mining of massive data streams from synoptic sky surveys, Future Generation Computer Systems, Volume 59, June 2016, Pages 95-104, ISSN 0167-739X.
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:64171
Deposited By: Colette Connor
Deposited On:02 Feb 2016 22:37
Last Modified:10 Nov 2021 23:26
