A search for radio emission from double-neutron star merger GW190425 using Apertif

Detection of the electromagnetic emission from coalescing binary neutron stars (BNS) is important for understanding the merger and afterglow. We present a search for a radio counterpart to the gravitational-wave source GW190425, a BNS merger, using Apertif on the Westerbork Synthesis Radio Telescope (WSRT). We observe a field of high probability in the associated localisation region for 3 epochs at 68, 90 and 109 days post merger. We identify all sources that exhibit flux variations consistent with the expected afterglow emission of GW190425. We also look for possible transients. These are sources which are only present in one epoch. In addition, we quantify our ability to search for radio afterglows in fourth and future observing runs of the gravitational-wave detector network using Monte Carlo simulations. We found 25 afterglow candidates based on their variability. None of these could be associated with a possible host galaxy at the luminosity distance of GW190425. We also found 55 transient afterglow candidates that were only detected in one epoch. All turned out to be image artefacts. In the fourth observing run, we predict that up to three afterglows will be detectable by Apertif. While we did not find a source related to the afterglow emission of GW190425, the search validates our methods for future searches of radio afterglows.


Introduction
The Advanced LIGO detector network started its first observing run in 2015 and was joined by the Advanced Virgo detector for the second run (Aasi et al. 2015;Acernese et al. 2014). These runs yielded ten detections of binary black hole (BBH) mergers and one binary neutron star (BNS) merger (LIGO Scientific Collaboration and Virgo Collaboration et al. 2019). The successful detection of electromagnetic (EM) emission associated with the sole BNS merger, GW170817 (LIGO Scientific Collaboration and Virgo Collaboration et al. 2017), included the dynamical ejecta of a kilonova (KN; see e.g., Chornock et al. 2017;Coulter et al. 2017 ), the short gamma-ray burst GRB170817A (e.g., Abbott et al. 2017) , and the afterglow of the jet interacting with the interstellar environment (e.g., Alexander et al. 2017;Haggard et al. 2017;Hallinan et al. 2017). Together these studies brought forth an exciting new chapter in multi-messenger astronomy. In the first half of the third observing run (O3), the Advanced LIGO and Virgo detector network detected 39 candidate compact binary coalescences (Abbott et al. 2020c). The first detection of a BNS candidate in this run, LIGO/Virgo S190425z, later confirmed as GW190425 (Abbott et al. 2020a), occurred on April 25 2019. It was observed solely by the Advanced LIGO Livingston detector at a distance of 159 +69 −72 Mpc. The Advanced Virgo detector was also operational but did not detect the merger. The localisation capabilities of just a single detector are poor and, as such, the LAL Inference (Veitch et al. 2015)  partly in Fig. 1, has a 90% credible localisation region on the sky of 7461 deg 2 . Nonetheless, follow-up campaigns using optical and/or infrared facilities were performed directly after the merger. Campaigns such as GROWTH (Coughlin et al. 2019) and GRANDMA (Gendre et al. 2019) aimed to find coincident EM emission from a KN or short gamma-ray burst (SGRB) counterpart. While these efforts covered a large region of the probability map, the searches did not lead to identifying a viable source of the GW emission.
In the absence of an optical or infrared counterpart, radio emission may provide the only way to localise the source at later times (Hotokezaka et al. 2016). In order to maximise the probability of detecting coincident radio emission, a relatively large field-of-view and high sensitivity are necessary. Here, we present a follow-up of GW190425 with the new Apertif phased array feeds (PAFs; Oosterloo et al. 2010; Adams & van Leeuwen 2019) on the Westerbork Synthesis Radio Telescope (WSRT). While the fields-of-view of these PAFs, at 9.5 deg 2 (van Leeuwen et al. 2021), are one of the largest in the world, they are still dwarfed by the error region. As the detection of an afterglow has the potential to significantly further our understanding of such mergers, we performed the search against these odds. In Section 2 we discuss our observations and the data reduction methods. We examine the flux consistency between our three observations in Section 3. In Section 4 we describe the search for radio transients in our observations. We make predictions for the amount of radio afterglows Apertif will detect in the fourth and future gravitational-wave detector network observing runs in Section 5, and conclude in Section 6.

Observations & Data Reduction
We observed a target field with Apertif which was chosen to cover a 9.5 deg 2 region of high probability in the localisation skymap. This field centred on coordinates = 16 h 25 m 12 s , = +17 • 47 24 (J2000) was observed three times for 12 h at Δ = 68, 90, 109 d post merger. We will refer to these observations as Epoch 1, 2 and 3, respectively. Table 1 gives a summary of the observations. For epoch 2, all twelve Westerbork antennas with Apertif PAFs (on the fixed, equidistant Radio Telescopes RT2..RT9 plus the movable dishes RTA..RTD) were operational. For epoch 1 antenna RTD was not operational, and for epoch 3 antenna RT2 was not in use. Furthermore, data from antenna RTC was not included in the processing of epoch 1 as this dish suffered from delay issues.
The field is covered by 40 partially overlapping beams formed from the PAFs with a FWHM of∼ 35 arcminutes each on the sky. The data has a centre frequency of 1280 MHz with 300 MHz of bandwidth being recorded. The flux and bandpass calibrator 3C 147 was either observed before (first and third epoch) or after (second epoch) the observation for 3 mins for each of the 40 Apertif beams. The data was imaged for each beam separately using Apercal, the Apertif imaging pipeline (Adebahr et al. 2021), version 2.5. Due to high levels of Radio Frequency Interference (RFI) in the lower part of the band, only the upper 150 MHz is processed. As the signal is expected to be broad band, the frequency coverage itself is not overly important, but reducing the bandwidth by a factor of two results in an increase of the image noise, and thus the detection threshold, by a factor of √ 2 (Adams et al. 2021) As part of the pipeline, automatic flagging of RFI is performed using AOFlagger (Offringa et al. 2010(Offringa et al. , 2012. This resulted in ∼ 14%, 21% and 12% of the processed data per beam being flagged, on average, due to RFI for epoch 1, 2 and 3, respectively. Additionally, the first 12 mins of observation for epoch 2 were flagged manually as the dishes were not properly settled on source. Including this data resulted in artefacts in the continuum image, while flagging it only had a minor impact on the overall sensitivity. For epoch 1 and 3, all reduction steps including crosscalibration, direction-independent self calibration, imaging and cleaning, were done using standard pipeline parameters (Adebahr et al. 2021). In particular, the intervals of the self calibration solutions, used to update the solutions obtained from the calibrator observations, were derived automatically. For epoch 2, however, solution intervals of 30s, shorter than derived by the pipeline, were necessary to suppress notable phase artefacts in the cleaned continuum images.
The final data product per beam produced by Apercal is a multi-frequency Stokes I image created over the full processed frequency range. A mosaicking script then combines the beam continuum images using a linear algorithm. An inverse-squared weighting, based on preliminary primary beam response models of Apertif, is used to adjust for decreasing sensitivity away from the phase centre. Additionally, different parts of the observation were flagged per beam, affecting the resolution. To ensure a consistent resolution across the mosaic, the images are convolved to the common highest possible resolution before making the mosaic. For both epoch 1 and 3, one beam failed internal data quality checks implemented in the pipeline and was not used when creating the mosaics. For epoch 2, nine beams failed internal data quality checks and were not used in the mosaic. As a consequence, epoch 2 has a slightly reduced FOV compared to the other epochs. We show the mosaic of the third epoch in Fig.  2. The beam size of epoch 2 is 50 arcsec × 12 arcsec, with major axis position angle PA= −0.1 • ; consistent with that at epoch 3, of 48 arcsec × 11 arcsec (PA= −0.6 • ). The beam size of epoch 1, 90 arcsec × 23 arcsec (PA= 0.6 • ) , is considerably worse. We there lacked the longest baselines, as data from antennas RTC and RTD could not be used. The best rms noise sensitivity reached is ∼ 70 Jy, 60 Jy and 50 Jy, for epoch 1, 2 and 3, respectively. This sensitivity decreases closer to the edges of the mosaics or around bright sources due to direction-dependent errors. We inspect the stability of the source fluxes in the next section.

Flux consistency between epochs
The radio sky is relatively quiet compared to other wavelengths, with an estimated transient rate of less than 0.37 deg −2 at 1.4 GHz for sources with a peak flux density greater than 0.21 mJy (Mooley et al. 2013). Furthermore, most sources should not exhibit any variation in their flux besides measurement errors due to noise (see, e.g., Sarbadhicary et al. 2020 and references therein). Because the observations were taken during the Apertif commissioning phase, it is important to test how stable the measured source fluxes actually are between the epochs. To this extent, the mosaics of each epoch were analysed using the source finder PyBDSF (which is also used in the Apercal pipeline; Mohan & Rafferty 2015) and the integrated fluxes for unresolved sources with a positional match within 10 arcsec were compared. We used the local noise map computed by PyBDSF and selected sources 5 above the local rms noise level. Surrounding pixels above the 3 level were also used for source fitting. We constrained the shape of the Gaussian fit to the restoring beam.
The flux scale is consistent between all epochs, which is within the usual range of flux uncertainties for imaging observations. We recover a median of the ratio in fluxes of 1.00, 0.97 and 0.96 between epoch 1 and 2, epoch 1 and 3, and epoch 2 and 3, respectively. To quantify the scatter in the fluxes, for each source we define: where Δ , is the absolute value of the relative difference in integrated flux and int, and int, are the integrated fluxes in epoch and .
The scatter in the radio fluxes is higher than expected, with the median of Δ 1,2 , Δ 1,3 and Δ 2,3 being 13.7%, 10.6% and 8.9%, respectively. We plot this statistic as a function of integrated flux in Fig. 3 for each combination of epochs. The scatter does not exhibit a strong flux dependence beyond what could be expected in terms of rms noise errors. We suspect that the quality of the phase calibration is the main source of this scatter. We discuss The scatter is larger than what could be expected in terms of rms noise error with median values of Δ 1,2 , Δ 1,3 and Δ 2,3 of 13.7%, 10.6% and 8.9%, respectively. We suspect that the main cause of the scatter is the quality of the phase calibration.
these errors, other possible causes, and the implications in our search for the afterglow in the next section.

Radio variables and transients search
We made use of the LOFAR Transients Pipeline (TraP; Swinbank et al. 2015) to detect variable sources or transients in our observations. TraP identifies sources over multiple epochs and constructs light curves for each source. From these light curves, statistics are calculated which can be used to quantify the variability of sources. For sources which are only identified in one epoch, a light curve cannot be constructed. Instead, using TraP we assigned these sources a candidate transient classification based on their signal-to-noise-ratio (SNR) properties. For a detailed explanation of TraP and the associated variability statistics we refer the reader to Swinbank et al. 2015.
We set up TraP such that pixels with a flux count 5 above the local rms noise level are identified as a source and surrounding pixels with a flux count 3 above the noise become a part of that same source. The Gaussian fit that is then performed on the source to measure its flux density is constrained to have the same shape as the restoring beam. The dimensionless de Ruiter radius (de Ruiter et al. 1977), which is used to associate sources between epochs based on their angular separation, is set to the default value of 5.68. Furthermore, matching sources between epochs should have an angular separation of less than one semimajor axis of the restoring beam.
For sources with flux density measurements in two or three epochs, the reduced chi-squared statistic and the fractional variability statistic are calculated by TraP from the resulting light curve. A Gaussian function is fit to the distribution of both statistics in logarithmic space with mean and standard deviation , and , , respectively. Sources with both and significantly higher than the mean values of their distributions are identified as candidate variable sources. Thresholds for both metrics to separate such variables from stable sources can be chosen somewhat freely. Different thresholds do influence recall and precision rates of the produced set of candidate variables considerably (Rowlinson et al. 2019). We followed Dobie et al. (2019) and specified a source as candidate variable if both > + 1.5 and > + 1 .

Expected afterglow emission of GW190425
To search for the radio afterglow of GW190425, we aim to select the potential candidates identified by TraP that have a flux evolution consistent with the expected afterglow emission. We based the criteria for emission partly on the light curve of the sole radio counterpart detected to date: that of GW170817. That event peaked at Δ = 174 +9 −6 d (Mooley et al. 2018a), significantly later than our last epoch at Δ = 109 d post merger. In population studies (Duque et al. 2019), however, a significant fraction of BNS-merger afterglows peak closer to our first or second epoch. Given this range, peak time is not a criterion for candidate selection per se. Even so, these jet dominated afterglows should display only a single rise and decline in flux (Hotokezaka et al. 2016;Dobie et al. 2019), as observed in GW170817 (Mooley et al. 2018a,b;Ghirlanda et al. 2019). We therefore excluded sources with both a decrease in flux density from epoch 1 to 2 and an increase in flux density from epoch 2 to 3.
If the afterglow emission of GW190425 exhibits the same flux evolution as GW170817, the associated radio source should be selected by TraP as an outlier in flux variability compared to the general population of radio sources. The scatter in fluxes mentioned in Section 3, however, might hinder our ability to detect the afterglow emission. We can estimate the expected increase in flux density using the light curve of GW170817 (Mooley et al. 2018a Table 2. Overview of the five candidate variables found in our observations with flux measurements consistent with the expected afterglow emission of GW190425. After cross matching these candidates with the GLADE catalogue (Dálya et al. 2018), no possible host galaxies were found within 3 of the estimated luminosity distance of GW190425. None of these candidates are thus a source of the afterglow emission. We suspect that for these candidates their variability is either caused by direction-dependent errors from a nearby bright source (row two to five), or insufficient phase calibration (first row).
of our epoch 1 and 2 and by ∼ 17% between the time of our epoch 2 and 3. These increases are still larger than the median scatter in our observations. Some models predict even stronger changes in flux over the timescale of our observations (Hotokezaka et al. 2016). Thus, we feel confident that a radio source of the afterglow of GW190425 should still be picked up as an outlier in variability. Even so, the significance of such a detection will be low. We show the results of our search next.

Candidate variables
We obtained the following fit to the distributions of and in logarithmic space: = −0.97, = 0.38 and = 0.16, = 1.07. This corresponds to thresholds of > 59 and > 0.25. Using these thresholds, we recovered 30 candidate variables in our observations. From the 30 candidate variables, we selected those sources with flux measurements consistent with the expected afterglow emission. This resulted in 25 candidate variables possibly associated with GW190425 in our observations. After further manual inspection, 20 candidates in regions of increased noise, e.g., close to the edges of the mosaics, were found to have unreliable flux measurements and were discarded.
The five remaining candidates were cross matched with the GLADE catalogue (Dálya et al. 2018) within a 30 arcsec radius (corresponding to 20 kpc offset) to look for possible host galaxies. No galaxies were found within 3 of the estimated luminosity distance 159 +69 −72 Mpc (90% credible intervals) of GW190425 (Abbott et al. 2020a). We conclude that none of these five candidates are a source of the afterglow emission of GW190425. Possible other origins of their variability are discussed in Section 4.5. For completeness, we list their properties in Table 2.

Candidate transients
Sources with only one flux density measurement can appear in either one of the three epochs to be consistent with the expected afterglow emission. TraP found 291 such sources in our observations, all in the third epoch. Upon manual inspection, most of these sources do seem to be present in previous epochs. We suspect that they were not identified as a source due to their shape not matching the restoring beam. We checked this hypothesis by running TraP without the previously mentioned restoring beam constraint. This led to substantially more sources being properly recognised in the first two epochs and not identified as a new source in the third epoch. However, many more image artefacts were now incorrectly being identified as a new source instead. We thus opted to continue with the restoring beam constraint in place.
To filter out most sources which were either in an area with high noise or had a shape inconsistent with the restoring beam in previous epochs, we made use of their SNR calculated by TraP in the best noise region of the epoch with the previous lowest rms value (Swinbank et al. 2015). If the SNR in the best noise region crossed our detection threshold plus some extra margin , the source was determined to be a candidate transient. What margin to choose depends both on the quality of the observations and desired recall and precision rates of the set of possible transients (see Rowlinson et al. 2019 for an in depth discussion). Setting = 34 (Rowlinson et al. 2019) worked well to cut out most easily identifiable false positives in our candidate transient set. This narrowed down our search to 55 transient candidates. After manual inspection, all candidates were determined to result from either direction-dependent errors around bright sources or noise artefacts at the edges of the mosaics. Thus, we identified no transients in our observations as a source of the afterglow emission of GW190425.

Discussion
The ideal observing campaign for detecting a radio counterpart to GW190425 in our field of view involves a considerable number of repeat visits, and an image quality approaching the theoretical sensitivity limit. The study we present here did not yet meet these specifications. Below will describe the factors that limited our ability to detect the afterglow.
For sources with flux density measurements in only a few epochs, the reduced chi-squared statistic can vary appreciably over these epochs (Rowlinson et al. 2019). In general, the three epochs we observed might not be enough for most sources to robustly determine this measure for the stability of their light curve. Furthermore, this stability is also impacted by the general systematic variability mentioned in Section 3. Outliers in variability are less apparent when the scatter in flux between observations is already substantial for most sources. Thus, the distribution of in log-space becomes wider, which is apparent in the large value of the standard deviation from the Gaussian fit to . For reference, Fig A.1 in appendix A shows a histogram of the 2-dimensional distribution of and with the large spread in clearly visible.
We presume the source of the general systematic variability in our observations to be related to the data reduction. We deem the main source of error to be the quality of the phase calibration. If the phases are not sufficiently calibrated, the emission from point sources is not actually point-like but partially spread out. This can affect the flux that is measured in the source-finding procedures. Besides the resulting scatter in fluxes between the epochs, these calibration errors were also apparent when using TraP to find transients. Many sources were not properly recognised across the epochs due to their inconsistent shape. This produced many false positives in our candidate transient set. Uncertainties in the observations for the primary beam models could also increase the scatter in the flux measurements. Two possible complications are, for example, strong RFI during these observations and the malfunctioning of individual elements. Strong RFI could interfere with the measurements of the primary beam response. The frequency range used in our primary beam model, however, is normally relatively RFI free. We thus do not expect this to be a big influence. Some antennas might have a deformed primary beam shape because of malfunctioning individual elements. This may not be taken into account in the global primary beam model used which is averaged across all antennas. The above, and other, factors can change on timescales shorter than the time between our epochs.
The influence of such variations in the mosaics would be partially compensated at low levels of the primary beam response. Here, the beams overlap which suppresses model errors. Strong changes in the inner part of the beam would not be compensated as much. For epoch 1 and 2, the same set of primary beam models, closest in observation time to these epochs, were used. Epoch 3 was corrected with a different set of models. To measure any fluctuations between the two sets of models, we made a mosaic of the primary beam weights for each set. Between the two mosaics, the difference is at most ∼ 4% at the centre of the beams, with variations often below 1% at the edges. These fluctuations are thus too small to be the main source of the systematic variability in flux.
Sources which are near very bright sources are additionally affected by their direction-dependent errors. These errors gave rise to a non-Gaussian distribution of the noise in parts of our images which impeded accurate flux measurements. We suspect that for four of the five candidates in Section 4.3 their variability is due to these errors. They are relatively close to the same bright source and their relative change in flux between each epoch is similar, see row two to five in Table 2. The other candidate, first row in Table 2, also has a similar light curve evolution but is not particularly close to the other candidates. The shape of this source deviates from the restoring beam in epoch 1, likely due to the insufficient phase calibration, and presumably does not have its flux accurately measured in this epoch. As the flux is constant within the errors between epoch 2 and 3, we believe that the variability of this source, evident in the large values for and the fractional variability , is a result of the calibration error in epoch 1. The characteristic rings resulting from the direction dependent errors also triggered a lot of false positives in the candidate transients set. Reducing these errors is an active area of development for Apercal. In Section 5.1, we detail a manual reprocessing of a few Apertif beams to improve the image quality.
In summary, a number of aspects of our search can be improved for future follow-up campaigns. Their impact would foremost be to reduce the number of false-positive candidates. Any sufficiently bright transients in our observing field would still have been detected.

Improvements in Apertif Imaging
The success of future afterglow searches with Apertif is partly contingent on the achievable image quality. The first Apertif survey data release (Adams et al. 2021) has shown that noise levels down to 30 Jy and good image quality are regularly attained using automatic Apercal pipeline processing. While the automatic processing did not reach a similar quality for our observations, future reprocessing of the data could potentially yield improvements with updated versions of Apercal .
To investigate this prospect, we reprocessed a few centre Apertif beams as a demonstration, using WSClean (Offringa et al. 2014) andDPPP (van Diepen et al. 2018). First, a further crosscalibration was applied to the data using a mask derived from the NVSS survey (Condon et al. 1998). This was followed by two cycles of direction-independent calibration and imaging in a standard way on minute solution intervals. Then the sources with strong artefacts were identified in the images, and the corresponding directions were stored. Finally, additional directiondependent calibration was performed in these directions to solve Work is ongoing to improve the self calibration of the phases in Apercal and implement direction-dependent calibration.
for both amplitude and phase variations on an hour solution interval.
We show a cut-out of the continuum image for beam "00" of epoch 2 in Fig. 4. The image on the left was made with Apercal, while the image on the right was manually reprocessed as described. Compared to Apercal, the sidelobes of the bright source in the centre are better suppressed in the reprocessed image. Furthermore, the rings from the direction-dependent errors around this source are nearly gone. Noise levels are improved reaching ∼ 45 -50 Jy in the centre of the reprocessed image. For epoch 3, the manual reprocessing of beam "00" gives similar improvements, while for epoch 1 the difference with Apercal is less pronounced.
For bright sources with integrated fluxes above a few mJy, the fluxes were consistent within 5% in the reprocessed images of epoch 2 and 3 for beam "00". Compared to the ∼ 10% scatter in the Apercal images for such sources, this is a noticeable improvement which points to the source shapes being more consistent between the two epochs. It is difficult, however, to measure the overall scatter in flux based on a single beam as the number of faint sources is limited. To measure if the scatter decreased for faint sources too, we did an initial comparison in flux between mosaics of seven reprocessed centre beams for epochs 2 and 3. No major changes in the scatter were present in the integrated flux between the epochs compared to the results of Section 3. Of the few reprocessed beams, beam "00" had the most obvious improvements in image quality. We thus did not find a universal improvement in image quality across beams. We reason improvements are unlikely when running the reprocessed data through TraP to search for variable sources. Still, the direction-dependent calibration does reduce artefacts. This will lead to fewer false positive transients in our future afterglow searches.
We conclude that more observational epochs, further characterisation and understanding of the Apertif system, and improvements in the pipeline will greatly benefit our search for radio counterparts to BNS mergers. In the next section, we will describe the prospects for finding these counterparts in the fourth and future observing runs of the gravitational-wave detector network.

Expanding GW network of detectors
The fourth observing run (O4) of the gravitational-wave detector network is expected to commence in 2022, although the early suspension of O3 has made the start date and the anticipated detector sensitivities more uncertain. Here, we will use the numbers for sensitivities outlined by the KAGRA Collaboration, LIGO Scientific Collaboration and Virgo Collaboration (Abbott et al. 2020b). Four detectors are planned to be operational for 1 year in O4: the two Advanced LIGO (aLIGO) detectors at design sensitivity, Advanced Virgo (AdV) in its Phase 1 upgrade and the newest addition, KAGRA (Somiya 2012;The KAGRA Collaboration et al. 2013). During this year, the aLIGO detectors, AdV and KAGRA will have an anticipated BNS range of 160-190 Mpc, 90-120 Mpc and 25-130 Mpc, respectively. Importantly, from the 10 +52 −10 BNS events that are expected in total, 38-44% of these events are predicted to have a 90% credible region on the sky smaller than 20 deg 2 . This can be covered with just three Apertif pointings. While the localisation region on sky will likely be drastically reduced, the increase in BNS range will mean that more events will be detected at larger luminosity distances in O4. This will decrease our ability to detect the radio afterglow due to the inverse square relation between the radio flux and the luminosity distance. As a point of reference, we show a fit to the radio light curve of the afterglow of GW170817 (Mooley et al. 2018a) in Fig. 5 in blue with the best achieved 3 Apertif sensitivity indicated in black. We would have been able to detect the emission if Apertif was operational at the time but this would not have been possible if the merger happened much further out. Just one observation would still yield a rather marginal detection. It is thus essential to regularly sample the light curve across multiple epochs to obtain a robust detection with physical information. Especially the peak and the subsequent decay of the light curve should be monitored carefully. Certainly, the decay rate gives crucial information about the physical processes of the BNS merger (see Mooley et al. 2018a and references therein). If we had observed GW170817 at the same days post merger as our observations for GW190425, orange stripes in Fig. 5, we would have detected the afterglow in the last two observations.
To quantify our ability to detect radio counterparts to BNS events in O4 and beyond, we follow the population study in Duque et al. (2019) and include Apertif. We will summarise both the criterion for radio detection and gravitational-wave detection of the merger in the next section. We refer to their work for a detailed explanation of the afterglow model and the distribution of BNS population parameters.

Forecasts for radio detections of binary neutron star merger afterglows
It is assumed that the peak of the BNS afterglow emission is dominated by a relativistic core jet with the peak flux scaling as (Nakar et al. 2002): where iso,c is the core jet isotropic equivalent kinetic energy, is the jet opening angle, is the external density, e , B and are shock microphysics parameters, is the wavelength, is the distance and is the viewing angle. The afterglow was assumed to be detectable if the peak flux exceeded the sensitivity of the radio telescope which we set at 90 Jy for Apertif observing at = 1.4 Ghz. As a comparison, we also included four radio arrays at sensitivities listed in Duque et al. (2019) observing at = 3 Ghz: the Karl Jansky Very Large Array (VLA) at = 15 Jy, the Square Kilometer Array (Fender et al. 2015) in phase 1 (SKA1) at = 3 Jy, and SKA in phase 2 (SKA2) and Next Generation VLA (ngVLA, Selina et al. 2018) both at = 0.3 Jy. The BNS mergers detected and localised in GWs were determined using the following single detector criterion:

√︂
1 + 6 cos 2 + cos 4 8 >¯, where¯= √︃ 2 5 , with = 2.26 the horizon of the second most sensitive instrument in the detector network, in this case aLIGO Hanford. Using this criterion, it was presumed that if aLIGO Hanford detected the GW, it was also detected by aLIGO Livingston (the most sensitive instrument in the network). We also assumed that this led to a sufficiently small localisation region for radio follow-up to be possible with one or a few telescope pointings. We note that this is a relatively simple criterion which might underestimate the size of the localisation region inferred solely through GW detection.
For the Monte Carlo simulation we used the fiducial population model from Duque et al. (2019, their Section 4.1). This uses the following parameter distributions: a broken power-law distribution for iso,c , motivated by the gamma-ray luminosity function of cosmological short gamma-ray bursts (Beniamini et al. 2016;Duque et al. 2019), with a density of probability: where is normalised to unity, min = 10 50 erg and max = 10 53 erg. Furthermore, 1 = 0.53, 2 = 3.4 and b = 2×10 52 erg were used, adopted from Ghirlanda et al. (2016). For both and B a log-normal distribution was used with central value = 10 −3 and standard deviation = 0.75. Similar values have been reported when fitting the afterglow of GW170817 (see, e.g., Hallinan et al. (2017)). The distribution was restricted to [10 −4 , 10 −2 ] for B . The rest of the jet parameters were fixed at typical values of = 0.1 rad, e = 0.1, and = 2.2. The binaries were distributed uniformly in volume and cos .
We simulated a sufficient amount of binaries within the skyaveraged horizon distance¯for convergence. In Fig. 6, we plot the ratio of events detected both in GWs and in radio through their afterglow emission ( joint ) versus those only detected in GWs ( GW ). We repeated the simulation multiple times to cover a range in beyond the expected detector sensitivity of O4.

Monte Carlo simulation
Our results from the Monte Carlo simulation for the observatories besides Apertif are nearly identical to Duque et al. (2019), confirming our methods. Future observatories such as SKA1, SKA2 or ngVLA, which are not yet operational during O4, could be sensitive enough to facilitate population level studies of BNS afterglows. They pose great promise in future observing runs. If they were operational during O4 already, SKA1 would detect 34% of the well localised events with SKA2 detecting almost two-thirds of the events at 64% coverage.
From the telescopes that will be operational during O4, Apertif will be able to detect roughly a third of the afterglows that SKA1 would detect. This is ∼ 11% of the well localised GW events. The VLA will be able to detect ∼ 19% of the same events. While Apertif is certainly less sensitive than the VLA, the criterion of Eq. 3 does not take into account the field-of-view of the different arrays. In this regard, Apertif has an advantage over the VLA. Hence, for events with bigger localisation regions it could be that Eq. 3 is not met, impeding a VLA observation of the afterglow, whereas Apertif might still be able to make a detection. Furthermore, even if Eq. 3 is met, the localisation region may still only be tractable for follow-up by large field-of-view telescopes. Previous GW detections by only two detectors have shown large localisation regions in the past (Fig. 5 of Abbott et al. 2020b) but we note that optical/infrared follow up could also be crucial for localising such events.
Half of the BNS events in O4 will likely have a localisation small enough to be followed up with Apertif. The median 90% credible localisation region is just 33 deg 2 (Abbott et al. 2020b), or about four Apertif pointings if the shape matches. From the expected BNS events in O4 and the results presented above, it follows that up to three afterglows will be detectable by Apertif. In the optimistic scenario that all events will have either a small localisation region or are localised through, e.g, optical/infrared follow up, this number doubles. Moreover, for the afterglows which have not been observed in optical or infrared wavelengths, for example, for the large fraction that happen in the day-time sky, wide field radio telescopes such as Apertif may be the only way to detect the electromagnetic signal at all.
As a final point, we emphasise the need for regular radio follow up of the BNS merger. The radio criterion used in these simulations only looks at the detectability of the afterglow. This does not necessarily translate into an actual detection if the emission flux density is close to the sensitivity limit of the telescope (this is also discussed in Duque et al. 2019). The forecasts given in this section should therefore be seen as an upper limit. Even so, multiple marginal detections would still have significant scientific potential and should be actively pursued.

Conclusion
In this work, we have described the first follow-up of a BNS merger with the new Apertif PAFs installed on the WSRT. We covered an 9.5 deg 2 region of high probability in the localisation skymap of the first O3 BNS event, GW190425, over three observational epochs. While we only observed a small fraction of the approximate 7500 deg 2 90% credible localisation region, a possible counterpart could have been of high scientific significance. We identified five candidate counterparts based on their flux variability in our observations. As we found no associated host galaxies for either of the sources at a luminosity distance consistent with GW190425, we ruled them all out. We also looked for possible counterparts in transient sources with only one flux measurement. Our initial analysis found 55 such candidate transients. These were all determined to be imaging artefacts after manual inspection. Although we did not find a radio afterglow counterpart to GW190425, this search helped develop the pipeline and methods which will be instrumental for future searches of radio afterglows with Apertif. Furthermore, the average sensitivity that will be achieved in future observations for finding BNS afterglows will increase with further characterisation and commissioning of Apertif.
We also made predictions for our ability to detect BNS merger afterglows in the fourth observing run of the GW detector network. We extended simulations from Duque et al. (2019) by including Apertif and estimate that up to three afterglows will be detectable by the telescope. While the sensitivity of Apertif is lower than other radio telescopes, it has a significant advantage in terms of field-of-view. We caution that considerable uncertainties remain in both the GW localisation array as well as the BNS population distribution.