
Robust Assessment of Clustering Methods for Fast Radio Transient Candidates

Aggarwal, Kshitij and Burke-Spolaor, Sarah and Law, Casey J. and Bower, Geoffrey C. and Butler, Bryan J. and Demorest, Paul B. and Lazio, T. Joseph W. and Linford, Justin and Sydnor, Jessica and Anna-Thomas, Reshma (2021) Robust Assessment of Clustering Methods for Fast Radio Transient Candidates. Astrophysical Journal, 914 (1). Art. No. 53. ISSN 0004-637X. https://resolver.caltech.edu/CaltechAUTHORS:20210626-183435301

Abstract

Fast radio transient search algorithms identify signals of interest by iterating over a set of matched filters and applying a detection threshold to each. These filters are defined by properties of the transient such as time and dispersion. A real transient can trigger hundreds of search trials, each of which has to be post-processed for visualization and classification. In this paper, we explore a range of unsupervised clustering algorithms for grouping these redundant candidate detections. We demonstrate this for Realfast, the commensal fast-transient search system at the Karl G. Jansky Very Large Array. We use four features for clustering: sky position (l, m), time, and dispersion measure (DM). We develop a custom performance metric that rewards grouping the candidates into a small number of pure clusters, i.e., clusters containing only astrophysical or only noise candidates. We then use this metric to compare eight different clustering algorithms. We show that using sky location along with DM and time improves clustering performance by ~10% compared with traditional clustering based on DM and time alone. Therefore, positional information should be used during clustering whenever it is available. We conduct several tests to compare the performance of the clustering algorithms and their generalizability to other transient data sets, and we propose a strategy for choosing an algorithm. Our performance metric and clustering strategy can be easily extended to different single-pulse search pipelines and to other applications, both within and outside astronomy.
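The idea that sky position can separate candidates that overlap in DM and time can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the candidate values are invented, the per-feature scales and linking length are arbitrary, the friends-of-friends grouping is a stand-in rather than one of the eight algorithms compared in the paper, and plain cluster purity is a simpler quantity than the paper's custom performance metric (which also favors a small number of clusters).

```python
from collections import Counter
from math import sqrt

# Hypothetical candidates: (l, m, time, DM, true label). All values are invented.
CANDIDATES = [
    (0.010, 0.020, 1.000, 560.0, "astro"),
    (0.010, 0.021, 1.001, 562.0, "astro"),
    (0.011, 0.020, 1.002, 558.0, "astro"),
    (-0.300, 0.250, 1.001, 561.0, "noise"),  # same time and DM, different position
    (-0.301, 0.251, 1.000, 559.0, "noise"),
    (0.200, -0.100, 7.500, 120.0, "noise"),
]

# Assumed scales that make l, m, time, and DM comparable (not from the paper).
SCALES = (0.01, 0.01, 0.01, 10.0)


def distance(a, b, features):
    """Scaled Euclidean distance over the selected feature indices."""
    return sqrt(sum(((a[i] - b[i]) / SCALES[i]) ** 2 for i in features))


def friends_of_friends(points, features, link=2.0):
    """Group points that are connected by chains of neighbors closer than `link`."""
    labels = [-1] * len(points)
    cluster = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = cluster
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and distance(points[i], points[j], features) <= link:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


def purity(labels, truth):
    """Fraction of candidates that carry the majority label of their cluster."""
    majority_total = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(labels)


truth = [c[4] for c in CANDIDATES]

# Feature indices: 0 = l, 1 = m, 2 = time, 3 = DM.
labels_dm_time = friends_of_friends(CANDIDATES, features=(2, 3))
labels_all = friends_of_friends(CANDIDATES, features=(0, 1, 2, 3))

purity_dm_time = purity(labels_dm_time, truth)
purity_all = purity(labels_all, truth)

print("DM/time only:", labels_dm_time, round(purity_dm_time, 3))
print("l, m, DM, time:", labels_all, round(purity_all, 3))
```

In this toy set, clustering on DM and time alone merges the astrophysical candidates with noise candidates that occur at the same time and DM, while adding (l, m) splits them into pure clusters. The size of the gain here is a property of the invented data and does not reproduce the ~10% figure reported above.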


Item Type:Article
Related URLs:
URL                                       URL Type  Description
https://doi.org/10.3847/1538-4357/abf92b  DOI       Article
https://arxiv.org/abs/2104.07046          arXiv     Discussion Paper
ORCID:
Author                ORCID
Aggarwal, Kshitij     0000-0002-2059-0525
Burke-Spolaor, Sarah  0000-0003-4052-7838
Law, Casey J.         0000-0002-4119-9963
Bower, Geoffrey C.    0000-0003-4056-9982
Butler, Bryan J.      0000-0002-5344-820X
Demorest, Paul B.     0000-0002-6664-965X
Lazio, T. Joseph W.   0000-0002-3873-5497
Linford, Justin       0000-0002-3873-5497
Sydnor, Jessica       0000-0002-3360-9299
Anna-Thomas, Reshma   0000-0001-8057-0633
Additional Information:© 2021. The American Astronomical Society. Received 2021 March 23; revised 2021 April 13; accepted 2021 April 16; published 2021 June 15. K.A. would like to thank Shalabh Singh for useful discussions regarding the performance metric. K.A. and S.B.S. acknowledge support from NSF grant AAG-1714897. S.B.S. is a CIFAR Azrieli Global Scholar in the Gravity and the Extreme Universe Program. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The NANOGrav project receives support from National Science Foundation (NSF) Physics Frontiers Center award number 1430284. The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc. Facility: EVLA. Software: NumPy (Harris et al. 2020), Matplotlib (Hunter 2007), Pandas (Pandas Development Team 2020; McKinney 2010), scikit-learn (Pedregosa et al. 2011; Buitinck et al. 2013), HDBSCAN (Campello et al. 2015), rfpipe (Law 2017).
Funders:
Funding Agency                                    Grant Number
NSF                                               AAG-1714897
Canadian Institute for Advanced Research (CIFAR)  UNSPECIFIED
NASA/JPL/Caltech                                  UNSPECIFIED
NSF                                               PHY-1430284
Subject Keywords:Clustering; Random Forests; Radio transient sources; Radio interferometry; Extragalactic radio sources; Radio bursts; Very Large Array
Issue or Number:1
Classification Code:Unified Astronomy Thesaurus concepts: Clustering (1908); Random Forests (1935); Radio transient sources (2008); Radio interferometry (1346); Extragalactic radio sources (508); Radio bursts (1339); Very Large Array (1766)
Record Number:CaltechAUTHORS:20210626-183435301
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:20210626-183435301
Official Citation:Kshitij Aggarwal et al 2021 ApJ 914 53
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:109591
Collection:CaltechAUTHORS
Deposited By: George Porter
Deposited On:28 Jun 2021 15:14
Last Modified:28 Jun 2021 15:14