Robust Assessment of Clustering Methods for Fast Radio Transient Candidates
Fast radio transient search algorithms identify signals of interest by iterating over a set of matched filters and applying a detection threshold to each. These filters are parameterized by properties of the transient, such as time and dispersion. A real transient can trigger hundreds of search trials, each of which must be post-processed for visualization and classification. In this paper, we explore a range of unsupervised clustering algorithms for grouping these redundant candidate detections. We demonstrate this for Realfast, the commensal fast-transient search system at the Karl G. Jansky Very Large Array. We use four features for clustering: sky position (l, m), time, and dispersion measure (DM). We develop a custom performance metric that ensures the candidates are grouped into a small number of pure clusters, i.e., clusters containing either astrophysical or noise candidates, but not both. We then use this metric to compare eight clustering algorithms. We show that using sky position along with DM and time improves clustering performance by ~10% compared to traditional clustering based on DM and time alone. Therefore, positional information should be used during clustering whenever it can be made available. We conduct several tests to compare the performance of the clustering algorithms and their generalizability to other transient data sets, and we propose a strategy for choosing an algorithm. Our performance metric and clustering strategy can be readily extended to other single-pulse search pipelines and to other applications, both within and beyond astronomy.
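To make the clustering step concrete, the sketch below groups candidates on the four features named above (l, m, time, DM) using DBSCAN, a standard density-based algorithm (HDBSCAN, credited in the software list, is its hierarchical extension). It is a minimal, standard-library illustration under stated assumptions, not the Realfast implementation or the paper's chosen algorithm. The per-feature scales in `SCALES`, the helper names `scaled_distance`, `dbscan`, and `purity`, and the toy candidates are all hypothetical. The `purity` helper only scores whether each cluster holds a single class and reports the cluster count; the paper's custom metric also favors a small number of clusters, and its exact form is not reproduced here.

```python
import math

# Hypothetical scales that make (l, m, time, DM) comparable before computing
# distances; illustrative assumptions, not values used by Realfast.
SCALES = {"l": 1e-3, "m": 1e-3, "time": 0.01, "dm": 5.0}


def scaled_distance(a, b, scales=SCALES):
    """Euclidean distance after dividing each feature difference by its scale."""
    return math.sqrt(sum(((a[k] - b[k]) / s) ** 2 for k, s in scales.items()))


def dbscan(points, eps=1.0, min_samples=2):
    """Minimal O(n^2) DBSCAN: returns one label per point, -1 for noise.

    A point's neighborhood includes the point itself, so min_samples counts it.
    """
    n = len(points)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if scaled_distance(points[i], points[j]) <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            nbrs_j = [k for k in range(n) if scaled_distance(points[j], points[k]) <= eps]
            if len(nbrs_j) >= min_samples:
                queue.extend(nbrs_j)  # j is a core point: grow the cluster through it
    return labels


def purity(labels, truth):
    """Return (fraction of candidates in single-class clusters, number of clusters).

    Unclustered candidates (label -1) count as singleton clusters, which are
    trivially pure. This is a toy score, not the paper's performance metric.
    """
    groups = {}
    for idx, (lab, t) in enumerate(zip(labels, truth)):
        key = ("singleton", idx) if lab == -1 else lab
        groups.setdefault(key, []).append(t)
    pure = sum(len(ts) for ts in groups.values() if len(set(ts)) == 1)
    return pure / len(labels), len(groups)


# Toy candidates: one astrophysical burst detected in three trials, a pair of
# noise triggers, and one isolated noise trigger (all values hypothetical).
candidates = [
    {"l": 0.0, "m": 0.0, "time": 1.000, "dm": 100.0},
    {"l": 0.0005, "m": 0.0, "time": 1.002, "dm": 102.0},
    {"l": 0.0, "m": 0.0005, "time": 1.004, "dm": 98.0},
    {"l": 0.05, "m": -0.03, "time": 7.500, "dm": 560.0},
    {"l": 0.0505, "m": -0.03, "time": 7.503, "dm": 563.0},
    {"l": -0.2, "m": 0.1, "time": 3.0, "dm": 50.0},
]
truth = ["astro", "astro", "astro", "noise", "noise", "noise"]

if __name__ == "__main__":
    labels = dbscan(candidates, eps=1.0, min_samples=2)
    print(labels)                  # [0, 0, 0, 1, 1, -1]
    print(purity(labels, truth))   # (1.0, 3)
```

Dropping `"l"` and `"m"` from `SCALES` reproduces a DM/time-only clustering, which is one simple way to compare the two feature sets on labeled data; in practice one would use the scikit-learn or `hdbscan` implementations rather than this quadratic-time sketch.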
© 2021. The American Astronomical Society. Received 2021 March 23; revised 2021 April 13; accepted 2021 April 16; published 2021 June 15.

K.A. would like to thank Shalabh Singh for useful discussions regarding the performance metric. K.A. and S.B.S. acknowledge support from NSF grant AAG-1714897. S.B.S. is a CIFAR Azrieli Global Scholar in the Gravity and the Extreme Universe Program. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The NANOGrav project receives support from National Science Foundation (NSF) Physics Frontiers Center award number 1430284. The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.

Facility: EVLA.

Software: NumPy (Harris et al. 2020), Matplotlib (Hunter 2007), Pandas (Pandas Development Team 2020; McKinney 2010), scikit-learn (Pedregosa et al. 2011; Buitinck et al. 2013), HDBSCAN (Campello et al. 2015), rfpipe (Law 2017).
Submitted - 2104.07046.pdf
Published - Aggarwal_2021_ApJ_914_53.pdf