PhaseLink: A Deep Learning Approach to Seismic Phase
Association
Zachary E. Ross (1), Yisong Yue (2), Men-Andrin Meier (1), Egill Hauksson (1), Thomas H. Heaton (1)
(1) Seismological Laboratory, California Institute of Technology, Pasadena, CA 91125
(2) Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125
Abstract
Seismic phase association is a fundamental task in seismology that pertains to link-
ing together phase detections on different sensors that originate from a common
earthquake. It is widely employed to detect earthquakes on permanent and tempo-
rary seismic networks, and underlies most seismicity catalogs produced around the
world. This task can be challenging because the number of sources is unknown,
events frequently overlap in time, or can occur simultaneously in different parts
of a network. We present PhaseLink, a framework based on recent advances in
deep learning for grid-free earthquake phase association. Our approach learns to
link phases together that share a common origin, and is trained entirely on tens of
millions of synthetic sequences of P- and S-wave arrival times generated using a
simple 1D velocity model. Our approach is simple to implement for any tectonic
regime, suitable for real-time processing, and can naturally incorporate errors in
arrival time picks. Rather than tuning a set of ad hoc hyperparameters to improve
performance, PhaseLink can be improved by simply adding examples of problem-
atic cases to the training dataset. We demonstrate the state-of-the-art performance
of PhaseLink on a challenging recent sequence from southern California, and
synthesized sequences from Japan designed to test the point at which the method
fails. These tests show that PhaseLink can precisely associate P- and S-picks to
events that are separated by 12 seconds in origin time. This approach is expected
to improve the resolution of seismicity catalogs, add stability to real-time seismic
monitoring, and streamline automated processing of large seismic datasets.
1 Introduction
When an earthquake is detected on different stations of a seismic network, it is often desirable to link
the observed seismic phases to the earthquake that caused them. Historically, this task was performed
by expert seismic analysts who would visually examine the data from different stations, identify
seismic phases, and group them together (cf. Figure 1). As the modern digital era began, seismic
networks started to accumulate data in real-time, and it became necessary to develop computer
algorithms to automatically process the data.
With the development of the landmark STA/LTA algorithm in seismology [1, 2], it became possible
to detect earthquakes automatically for the first time. This simple method uses the ratio of two
moving averages to identify impulsive transient signals and has become the de facto standard for
earthquake detection around the world. One major shortcoming of the method is that it will not
only identify earthquakes when present, but also any other types of impulsive transient signals that
arXiv:1809.02880v1 [cs.LG] 8 Sep 2018
[Figure 1 panels. Axes: Latitude (0.0 to 1.0), Time (sec), and Distance (km); arrival curves are labeled P-waves and S-waves.]
Figure 1: Cartoon example of a phase association scenario. Left panel shows the discrete set of
picks for the entire network. The number of events is unknown. Right panel shows the output after
association and location. Picks colored black are not linked to an event, while colored picks share a
common origin.
seismometers record. This led to the development of the first phase association algorithms, which
examine combinations of triggers on different stations to see whether any set has arrival time patterns
consistent with those of earthquakes [12]. The association process therefore evolved from one of
simply grouping seismic phases together, to being ultimately responsible for deciding whether an
earthquake occurred.
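
To make the STA/LTA idea concrete, the following is a minimal sketch of a ratio-of-moving-averages trigger. It is an illustration under our own assumptions (trailing windows over squared amplitude, a fixed threshold, and the hypothetical names `sta_lta` and `trigger_samples`), not the specific implementation of [1, 2] or of any operational network.

```python
def sta_lta(signal, n_sta, n_lta):
    """Return the STA/LTA ratio for each sample of `signal`.

    The short-term average (STA) and long-term average (LTA) are trailing
    moving averages of the squared amplitude. Samples that do not yet have
    a full long window behind them get a ratio of 0.0.
    """
    energy = [x * x for x in signal]
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = sum(energy[i - n_sta + 1 : i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1 : i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def trigger_samples(ratios, threshold):
    """Indices where the ratio crosses `threshold` from below."""
    return [
        i
        for i in range(1, len(ratios))
        if ratios[i] >= threshold and ratios[i - 1] < threshold
    ]


# Low-amplitude background followed by an impulsive arrival at sample 60.
trace = [0.1 if i % 2 else -0.1 for i in range(60)] + [1.0, -1.0] * 10
ratios = sta_lta(trace, n_sta=4, n_lta=40)
triggers = trigger_samples(ratios, threshold=3.0)
print(triggers)
```

Any impulsive transient raises the ratio in the same way, whether or not it is an earthquake, which is why a separate association step is needed.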
To date, algorithms for phase association all operate using the same fundamental principle. The
region of interest is gridded and for each node therein, tentative phase detections within the network
are examined to see whether some subset back-projects to a coherent origin. This means that a grid
search must be conducted continuously for all new picks that are made. Typically grid associators
require extensive tuning of a large number of sensitive hyperparameters, and have numerous ad hoc
rules to stabilize potential problems that can arise. Over the years, they have become increasingly
sophisticated, with modern variants incorporating Bayesian estimates of pick uncertainties [17] or
multi-scale detection capabilities.
Today, seismologists strive to identify increasingly smaller events that are often at or below the
noise level. Resolving this level of detail requires not only increasing phase detection sensitivity, but
dealing with the dramatically larger volume of information to be processed in a reliable and rational
manner. In particular, since smaller events occur ever more frequently, and therefore are more closely
spaced in time, moving forward requires technology that can easily handle the most complicated
scenarios encountered at the present.
In recent years, deep learning has become state of the art in numerous domains of artificial intelligence
[15], including natural language processing [28], computer vision [14], and speech recognition [3],
and is focused on learning generalized representations of extremely large datasets. Deep learning has
been recently used in seismology, and has already shown considerable promise in performing various
tasks including similarity-based earthquake detection and localization [18], generalized seismic phase
detection [21], phase picking [35], first-motion polarity determination [20], detection of events in
laboratory experiments [33], seismic image sharpening [16], and predicting aftershock spatial patterns
[6].
In this paper, we present PhaseLink, which is a deep learning approach for grid-free earthquake phase
association. Our approach is built upon Recurrent Neural Networks (RNNs), which are designed to
learn temporal and contextual relationships in sequential data. We show how to design a training
objective that enables the trained RNN to accurately associate phase detections coming from multiple
temporally overlapping earthquakes. Another attractive feature of our approach is that it is trained
entirely from synthesized data using simple 1D velocity models.¹ Thus, our approach is easily
applicable to any tectonic regime by simply training on the synthesized data from the appropriate
model, and can also naturally incorporate errors in arrival time picks. The full source code will be
publicly available via the Southern California Earthquake Data Center <scedc.caltech.edu>.
2 Background on Recurrent Neural Networks
Artificial neural networks are systems that can discover complex non-linear relationships between
variables. Fundamentally, they successively transform a set of input values through matrix
multiplication and non-linear activation functions into one or more output variables of interest [8]. The
outputs can be either continuous (regression) or discrete (classification). In supervised learning, the
parameters which characterize the non-linear mapping are learned by minimizing the prediction error
of the model against the ground truth. The standard type of neural network is today referred to as
a fully-connected neural network because each neuron is fully connected to each previous input.
Fully-connected networks are excellent at many classification and regression tasks, but have trouble
discovering structure in sequential datasets because they lack feedback mechanisms that can enable
information to propagate between successive elements of a sequence.
These shortcomings were addressed by the development of the recurrent neural network (RNN) [11].
RNNs allow for information to be passed between successive elements through the use of an internal
memory state. This state is dynamically modulated by gates that control what information is retained
along the way, and the parameters governing the gates themselves are learned through the training
process. The outputs of RNNs are very flexible, and could be a single valued output given an input
sequence, or a sequence of outputs. To date, RNNs have been applied to a variety of settings, including
language translation [28], speech synthesis [30], speech recognition [3], image captioning [34], and
many others.
The most commonly employed variant of the RNN is the long short-term memory (LSTM) [10]
network. These networks have three gates that control the flow of information, and are useful because
they are less susceptible to training issues related to diminishing propagation of information over
long sequences. In recent years, another variant called the gated recurrent unit (GRU) [5] has become
popular because it has only two gates instead of three, resulting in fewer parameters and faster
training. These types of RNNs are considered state of the art for many problems including speech
recognition and language translation.
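
As an illustration of the two-gate structure, the sketch below writes out a single GRU update in plain Python. The weight layout, the helper names (`gru_step`, `zero_params`) and the blending convention `(1 - z) * h + z * c` are assumptions made for this example; published formulations and software libraries differ in which side of the blend the update gate multiplies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU update for input vector x and previous hidden state h.

    params maps "z" (update gate), "r" (reset gate) and "c" (candidate
    state) to a tuple (W, U, b): input weights, recurrent weights, bias.
    """
    def affine(name, hidden):
        W, U, b = params[name]
        wx, uh = matvec(W, x), matvec(U, hidden)
        return [a + c + bias for a, c, bias in zip(wx, uh, b)]

    z = [sigmoid(a) for a in affine("z", h)]          # update gate
    r = [sigmoid(a) for a in affine("r", h)]          # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]       # reset gate scales the old state
    c = [math.tanh(a) for a in affine("c", gated_h)]  # candidate state
    # Blend the old state and the candidate according to the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def zero_params(n_in, n_hidden):
    """All-zero weights, useful only for checking the arithmetic."""
    def block():
        W = [[0.0] * n_in for _ in range(n_hidden)]
        U = [[0.0] * n_hidden for _ in range(n_hidden)]
        return W, U, [0.0] * n_hidden
    return {"z": block(), "r": block(), "c": block()}


# With zero weights both gates equal 0.5 and the candidate is 0,
# so the hidden state is simply halved.
h_next = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], zero_params(3, 2))
print(h_next)
```

An LSTM adds a third gate and a separate cell state on top of this pattern, which is where the additional parameters come from.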
Over the years, numerous improvements have been made to these basic types of RNN layers, and
one such important development was the bidirectional RNN layer [23]. This layer uses two RNNs
running in opposite directions so that information from both directions of the sequence is available to
make predictions. A common example where this is useful is word prediction, where if a word in the
middle of a sentence is missing, it is generally desirable to use the contextual information from the
entire sentence to make a prediction, rather than just the words leading up to the missing one.
3 Summary of Datasets Used
In this study, we use synthetically generated seismicity and phase sequences, and real phase data for
the 2016 Borrego Springs sequence [19], which occurred in southern California during 2016-06-01
to 2016-06-30. During this period, 1708 earthquakes were identified
by the Southern California Seismic Network (SCSN), and 73,353 phases were picked by seismic
analysts. The data are publicly available from the Southern California Earthquake Data Center
<scedc.caltech.edu> (last accessed August 2018). No waveform data were used. We also used synthetically
generated seismicity using the station configuration of the Hi-net seismic network in Japan.

¹ This paradigm is generically known as “sim-to-real” in the machine learning community [27, 24, 29, 7, 32].
4 PhaseLink Framework
The PhaseLink approach is designed to solve the phase association problem: given a sequence of
$N$ picks, determine how many earthquakes (if any) occurred, and which of the $N$ picks belong to
each respective earthquake. Fundamentally, one can think of phase association as a (supervised)
clustering problem of assigning picks to the earthquakes that generated them. In contrast to conventional
clustering, there is a specific temporal structure to our prediction task, and also the number of clusters
is unknown a priori. For instance, having multiple overlapping earthquakes implies detecting picks
coming from different “clusters”.
Figure 2 depicts the PhaseLink approach, which can be conceptually described in the following steps:
1. We are given an input stream of picks. Each pick has as attributes the location (latitude &
longitude) of the station that detected the pick, the time stamp, and phase type (Fig. 2, step 1).
2. The input pick stream is processed into a sequence of overlapping fixed-length sequential
prediction tasks (Fig. 2, step 2). In particular, the prediction task is whether each pick in
the input sequence belongs to the same earthquake that generated the first (root) pick in
the sequence, i.e., a sequential binary classification problem. We solve this fixed-length
prediction task using RNNs (Section 4.1), and we train the RNNs using synthetic data
(Section 4.3).
3. The overlapping predictions are then aggregated into a single set of clustering results of
picks to earthquakes (Section 4.2; Fig. 2, step 3).
By decomposing the problem in this way, PhaseLink can, in principle, handle any number of
overlapping clusters. Conceptually, the reduced prediction task is based around a reference point and
classifies a temporal neighborhood of points as belonging to the same cluster as the reference point.
A somewhat similar idea was proposed in supervised clustering approaches that utilize must-link and
cannot-link constraints [31, 4], although those approaches are more geared towards learning a metric
space rather than directly solving the clustering problem. Furthermore, our PhaseLink approach can
exploit a natural temporal locality structure to further constrain the prediction task. Another benefit of
directly considering the co-clustering prediction problem is that we can tolerate false picks (those that
do not belong to any cluster/earthquake). A final benefit of this decomposition is that PhaseLink can
utilize off-the-shelf RNN implementations, which leads to significantly reduced system engineering
overhead.
4.1 RNN Architecture
We designed a deep RNN consisting of stacked bidirectional GRU layers (Table 1). The network
takes as input fixed-length sequences of picks, and outputs a sequence of identical length (Fig. 2, step
2). The output sequence is binary valued, with a value of 1 indicating that a given pick belongs to the
same event as the root pick (the first pick of the sequence, $Y_0$), and a value of 0 indicating that the
two picks are unrelated. A final sigmoid activation function is applied separately to each hidden state
output.
We apply the network to a sliding window of picks by incrementing over the entire sequence, shifting
the window by one pick at a time. For the remainder of this paper, we refer to a fixed length sliding
window of $n_p$ picks as a sub-sequence. Here we use $n_p = 500$. After predictions have been made for
a sub-sequence we drop the root pick and take the next pick as the root for the new sub-sequence.
For each root pick we obtain a set of binary predictions about which of the following picks in the
sub-sequence are related to the root.
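
The sliding-window procedure can be sketched as follows. The function name `link_predictions` and the `predict` callable are hypothetical; in particular, `toy_predict` is a simple time-gap rule standing in for the trained RNN so that the example runs on its own.

```python
def link_predictions(picks, predict, n_p=500):
    """Return one row of 0/1 labels per root pick.

    `picks` must be sorted in time. Row i covers picks i, i+1, ... (at most
    n_p of them), and entry j of that row says whether pick i + j is
    predicted to share an event with root pick i.
    """
    rows = []
    for root in range(len(picks)):
        window = picks[root : root + n_p]  # shift the window by one pick
        rows.append(predict(window))
    return rows


def toy_predict(window, max_gap=2.0):
    """Stand-in for the RNN: link picks within max_gap seconds of the root."""
    t_root = window[0]
    return [1 if t - t_root <= max_gap else 0 for t in window]


pick_times = [0.0, 0.5, 1.0, 10.0, 10.2]
rows = link_predictions(pick_times, toy_predict, n_p=3)
for row in rows:
    print(row)
```

Each row is shorter near the end of the stream because fewer picks remain after the root; the first entry of every row is 1 because the root is always linked to itself.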
Each of the picks in a sub-sequence is characterized by five input features, resulting in an input
feature set with dimensions ($n_p$, 5). The first two features are the latitude and longitude coordinates
of the station that the pick was made on, which are both normalized to be in the range [0, 1] such that
[Figure 2 panels: (1) Collect picks over a seismic network. (2) In moving window, predict which picks are from same event as root; a moving RNN window with 500 picks takes station latitude, station longitude, time (sec), and phase type (P or S) for each pick and outputs a 0/1 label per pick. (3) Aggregate predictions for all windows by backward aggregation over the prediction matrix; x = false picks; A, B, C, ... = real picks. (4) Pick sequence is fully associated, shown as Time versus Distance (km).]
Figure 2: Overview of PhaseLink algorithm. A sliding window of picks is iteratively presented to a
RNN, which outputs a binary sequence of equal length for each window. These output sequences
indicate which picks (if any) are from the same event as the first pick in the window. Each pick in
the sequence has five features: latitude, longitude, arrival time, phase type, and a binary padding
indicator. The results from all windows are then aggregated to determine distinct clusters of picks
(earthquakes detected).
the range spans the full dimensions of the seismic network. The third feature is the time of the pick,
which is defined relative to the root pick within the sub-sequence. Here, we normalize the time values
by a pre-defined maximum allowed value for picks to be included in a sub-sequence, which is chosen
to be 120 seconds. This value is somewhat arbitrarily chosen, but ends up being not too important.
The normalization ensures that this feature does not bias the training process. We discard any picks
within the sub-sequence whose relative times exceed 120 s, and pad the remainder of the feature window with
zeros. The value of 500 picks is chosen loosely to correspond to the maximum number of picks
that we expect to have within any 120 s window, which could vary depending on the problem. The
penultimate feature is a binary value indicating the phase type, where a value of 0 means a P-wave,
and a value of 1 means an S-wave. Lastly, we have another binary indicator variable for whether a
given pick is a zero-padded placeholder.

Table 1: Model architecture
Layer               Neurons   Activation
Bidirectional GRU   200       sigmoid/tanh
Bidirectional GRU   200       sigmoid/tanh
Dense               1         sigmoid
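
The five input features can be assembled as in the sketch below. The function name, the tuple layout of a pick, the example coordinate bounds, and the choice of 1 as the flag value for padded rows are all assumptions made for illustration; the text specifies only that the padding indicator is binary.

```python
def encode_sub_sequence(picks, lat_range, lon_range, n_p=500, max_dt=120.0):
    """Build the (n_p, 5) feature rows for one sub-sequence.

    `picks` is a time-sorted list of (lat, lon, time_s, phase) tuples whose
    first element is the root pick; phase is "P" or "S". Each row holds
    [lat_norm, lon_norm, t_norm, is_s_wave, is_padding].
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    t_root = picks[0][2]
    rows = []
    for lat, lon, time_s, phase in picks[:n_p]:
        dt = time_s - t_root
        if dt > max_dt:
            break  # picks are sorted, so every later pick is also too late
        rows.append([
            (lat - lat_min) / (lat_max - lat_min),
            (lon - lon_min) / (lon_max - lon_min),
            dt / max_dt,
            1.0 if phase == "S" else 0.0,
            0.0,
        ])
    # Zero-pad to n_p rows and flag the padded rows in the last column.
    while len(rows) < n_p:
        rows.append([0.0, 0.0, 0.0, 0.0, 1.0])
    return rows


example_picks = [
    (33.0, -116.0, 10.0, "P"),   # root pick
    (34.0, -118.0, 70.0, "S"),   # 60 s after the root
    (35.0, -117.0, 140.0, "P"),  # 130 s after the root, so it is dropped
]
features = encode_sub_sequence(
    example_picks, lat_range=(32.0, 37.0), lon_range=(-121.0, -115.0), n_p=4
)
for row in features:
    print(row)
```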
4.2 Aggregating Predictions
We now describe the final stage of PhaseLink, where the link predictions from each sub-sequence
are aggregated to formally detect earthquakes. The output of the RNN is a prediction matrix that
describes the link between each pick of a sub-sequence and its root pick (Figure 2, step 3). In order
to assign picks to individual events, rather than to sub-sequence root picks, we cluster linked picks by
incrementing backwards over the prediction matrix. This is performed as follows:
1. For each sub-sequence, a cluster nucleates if at least $n_{\mathrm{nuc}}$ picks have predicted labels of 1.
2. Once a cluster has nucleated, the set intersection is separately determined between it and
every existing cluster.
3. The existing cluster with the most picks in common is identified, and if this number is
greater than $n_{\mathrm{merge}}$, the two clusters are merged.
4. After performing these steps for all sub-sequences, each remaining cluster is retained if the
cluster size is at least $n_{\min}$ picks.
In this paper, we use $n_{\mathrm{nuc}} = 8$, which was chosen to maximize the detection performance for the
datasets used herein; varying this hyperparameter in the range 4-8 leads to relatively little change in
performance on these datasets. Since the root pick is always linked to itself, the largest possible value
of $n_{\mathrm{merge}}$ is $n_{\mathrm{nuc}} - 1$. Here, we use this maximum value, $n_{\mathrm{merge}} = 7$. Since merging clusters is
performed by identifying the existing cluster with the most picks in common, there is the possibility
that a pick could end up in two separate clusters; however, in our testing, this is extremely uncommon.
$n_{\min}$ is the most sensitive of the three hyperparameters, and its effect on the performance is examined
in detail in the next section.
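
The aggregation steps can be sketched as below. This is one possible reading of the procedure rather than the released implementation: the function name `aggregate`, the representation of clusters as sets of pick indices, and the handling of ties are assumptions, and the merge test follows the wording above literally (strictly more than $n_{\mathrm{merge}}$ shared picks).

```python
def aggregate(pred_rows, n_nuc, n_merge, n_min):
    """Cluster picks from per-root link predictions.

    pred_rows[i][j] is the 0/1 prediction that pick i + j belongs to the
    same event as root pick i. Returns a list of sets of pick indices.
    """
    clusters = []
    for root in range(len(pred_rows) - 1, -1, -1):  # step backwards over roots
        linked = {root + j for j, label in enumerate(pred_rows[root]) if label == 1}
        if len(linked) < n_nuc:
            continue  # too few linked picks to nucleate a cluster
        best, best_overlap = None, 0
        for cluster in clusters:
            overlap = len(linked & cluster)
            if overlap > best_overlap:
                best, best_overlap = cluster, overlap
        if best is not None and best_overlap > n_merge:
            best |= linked  # merge into the most similar existing cluster
        else:
            clusters.append(linked)
    return [c for c in clusters if len(c) >= n_min]


# Toy stream of nine picks from two interleaved events plus one false pick.
events = ["A", "A", "B", "A", "B", "A", "B", "B", None]
pred_rows = [
    [1 if j == i or (events[i] is not None and events[j] == events[i]) else 0
     for j in range(i, len(events))]
    for i in range(len(events))
]
clusters = aggregate(pred_rows, n_nuc=3, n_merge=2, n_min=3)
print([sorted(c) for c in clusters])
```

Small thresholds are used here only because the toy stream is short; the values reported in the text are $n_{\mathrm{nuc}} = 8$ and $n_{\mathrm{merge}} = 7$.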
After applying the aforementioned steps, the PhaseLink algorithm is completed and the sequence
is fully associated. We note that no hypocenters have been determined during this process for any
events, whereas other associators jointly solve for a location as part of the detection process, which is
a more challenging problem. Thus far, we have only discussed the method and how it is to be used;
in the next section, we describe a scheme for generating the training data for the RNN.
4.3 Sim-to-Real Training
The RNN described in Section 4.1 could, in principle, be trained with real seismic phase data.
However, even in the most seismically active regions, the available data may barely be enough to
effectively train such a deep network. Since the network only requires phase arrival times and station
geometries we can instead generate synthetic training datasets of arbitrary size. The key intuition
here is that the supervised clustering problem solved by PhaseLink need not require fully realistic
earthquake data to train an accurate predictor.
4.3.1 Synthetic Data Generation
We develop a simple scheme for generating large datasets of synthetic pick sub-sequences using a 1D
layered model. The goal is for the neural network to learn the essential physics of wave propagation
from the synthetic data, so that this knowledge can be directly applied to real data. To do so we define
a set of rules from which random pick sub-sequences are generated. We use uniform distributions for
all random quantities and denote this distribution as $U$. The rules to generate a single sub-sequence
realization are as follows:
1. The initial number of events is chosen from $U[0, 20]$.
2. A random hypocenter is initially assigned to all events. The latitude and longitude are each
drawn from $U[0, 1]$, while the depth is drawn from $U[0, 25]$ km.
3. At a probability of 10%, each event is then separately reassigned a new hypocenter to
produce events that overlap in time, but originate in different parts of the network.
4. The first event is assigned an origin time from $U[-60, 60]$ s, enabling the possibility of
the event to have an origin time before the sub-sequence starts.
5. The origin times for all subsequent events are chosen such that the time between consecutive
events is $U[3, 20]$ s.
6. The maximum source-receiver distance for each event is drawn from $U[20, 100]$ km.
7. Arrival times are calculated with a 1D model for all source-receiver combinations within the
chosen maximum distance.
8. Picks are randomly discarded with Pr = 0.5 to add variability to the station distribution.
9. Arrival time errors are added to each pick and drawn from $U[-0.5, 0.5]$ s.
10. A random number of false picks drawn from $U[0, 500]$ are randomly distributed across
the network, with origin times drawn from $U[0, 120]$ s.
11. Picks outside of the time window of $[0, 120]$ s are removed.
12. All remaining picks are sorted in time and the first 500 are retained. If fewer than 500 exist,
the feature matrix is padded accordingly.
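
The twelve rules can be sketched in code as follows. Several simplifications are assumptions of this example and not part of the method: constant P and S velocities with straight rays replace the 1D layered model, the normalized region is given an arbitrary width `REGION_KM`, the distance cutoff is applied to epicentral distance, and the origin-time and pick-error ranges are taken to be symmetric about zero. Padding to 500 rows is left to the feature-encoding step.

```python
import math
import random

V_P, V_S = 6.0, 3.5   # km/s; constant velocities stand in for the 1D model
REGION_KM = 400.0     # assumed width of the normalized [0, 1] region


def travel_time(src, sta, velocity):
    """Epicentral distance (km) and straight-ray travel time (s)."""
    dx = (src[0] - sta[0]) * REGION_KM
    dy = (src[1] - sta[1]) * REGION_KM
    dist_km = math.hypot(dx, dy)
    return dist_km, math.sqrt(dist_km ** 2 + src[2] ** 2) / velocity


def random_hypocenter(rng):
    return (rng.random(), rng.random(), rng.uniform(0.0, 25.0))


def make_sub_sequence(stations, rng, n_p=500, t_max=120.0):
    """Return up to n_p picks as (time, lat, lon, phase, event_id) tuples.

    event_id is None for false picks. `stations` is a list of (lat, lon)
    pairs already normalized to [0, 1].
    """
    n_events = rng.randint(0, 20)                          # rule 1
    shared = random_hypocenter(rng)                        # rule 2
    origin = rng.uniform(-60.0, 60.0)                      # rule 4
    picks = []
    for event_id in range(n_events):
        src = random_hypocenter(rng) if rng.random() < 0.1 else shared  # rule 3
        if event_id > 0:
            origin += rng.uniform(3.0, 20.0)               # rule 5
        max_dist = rng.uniform(20.0, 100.0)                # rule 6
        for sta in stations:
            for phase, vel in (("P", V_P), ("S", V_S)):
                dist_km, tt = travel_time(src, sta, vel)   # rule 7
                if dist_km > max_dist or rng.random() < 0.5:  # rule 8
                    continue
                t = origin + tt + rng.uniform(-0.5, 0.5)   # rule 9
                picks.append((t, sta[0], sta[1], phase, event_id))
    for _ in range(rng.randint(0, 500)):                   # rule 10
        lat, lon = rng.choice(stations)
        picks.append((rng.uniform(0.0, t_max), lat, lon, rng.choice("PS"), None))
    picks = [p for p in picks if 0.0 <= p[0] <= t_max]     # rule 11
    picks.sort(key=lambda p: p[0])                         # rule 12
    return picks[:n_p]


stations = [(i / 9, j / 9) for i in range(10) for j in range(10)]
sample = make_sub_sequence(stations, random.Random(0))
print(len(sample), "picks in this realization")
```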
The ability to generate synthetic training data for the model to learn from has several important
advantages. First, since a simple 1D layered velocity model is used to calculate arrival times, the
method is easily applied to most tectonic regimes with relatively little knowledge required about the
velocity structure. Second, errors can be directly added to the synthetic arrivals such that the model
learns to deal with them in a rational way when examining real data. Third, an effectively unlimited amount of
training data can be generated, which can help prevent overfitting and regularization issues during the
training process.
We produce two separate training datasets in this study. The first is for southern California, using
the exact station distribution from the 811 past and present stations of the SCSN (Figure 3). The 1D
velocity model used is from Hadley and Kanamori [9]. The second is for southwestern Japan, using
88 stations of the Hi-net seismic network (Figure 4). For the Japanese dataset, we use the 1D model
of Shibutani et al. [26].
Examples of two sub-sequence realizations are shown in Figure 5. The labels of each sub-sequence,
$Y_i$, depend on whether the first pick, $Y_0$, is associated to an earthquake or not. If it is, then it and
all picks associated with the event are given a label of 1, while all other picks are given a label of
0. Otherwise, $Y_0$ is the only non-zero value. We then repeat all of these steps 12 million times to
generate a total of (up to) 6 billion picks.
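
The labeling rule can be written compactly as in the sketch below. The function name and the use of `None` to mark false picks are assumptions for this example, as is labeling padded positions 0.

```python
def make_labels(event_ids, n_p=500):
    """Labels Y_i for one sub-sequence.

    event_ids[i] is the event that produced pick i, or None for a false
    pick; event_ids[0] belongs to the root pick. Positions past the last
    pick (padding) are labeled 0.
    """
    labels = [0] * n_p
    labels[0] = 1  # the root pick is always linked to itself
    root_event = event_ids[0]
    if root_event is not None:
        for i, event in enumerate(event_ids[:n_p]):
            if event == root_event:
                labels[i] = 1
    return labels


print(make_labels([2, None, 2, 3, 2], n_p=6))   # root belongs to event 2
print(make_labels([None, 1, 1, None], n_p=4))   # root is a false pick

# Upper bound on the number of picks: 12 million sub-sequences of 500 picks.
total_picks = 12_000_000 * 500
print(total_picks)
```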
4.3.2 Training the RNN
Given the generated datasets, we can train the RNN using any off-the-shelf machine learning package.
The 12 million sub-sequences are designed to represent a wide variety of possible phase arrival time
scenarios from all over southern California and Japan. From here, we can train the RNN to link
phases together. We randomly split our 12 million sub-sequences into training (75%) and validation
(25%) sets. We use a cross-entropy loss function and three NVIDIA GTX 1060 GPUs to train the
model in mini-batches of 96 using the Adam optimization algorithm [13]. On the synthetic validation
data, the model achieves classification performance of 99.92%. The training and validation loss
histories are shown in Figure 6.
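
For reference, a binary cross-entropy between the 0/1 labels and the sigmoid outputs of one sub-sequence can be computed as sketched below. Averaging over positions and clipping probabilities with a small `eps` are assumptions of this example; the text states only that a cross-entropy loss is used.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the positions of one sub-sequence."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])    # equals ln 2
confident = binary_cross_entropy([1, 0], [0.99, 0.01])   # much smaller
print(uninformed, confident)

# The 75% / 25% split of 12 million sub-sequences.
n_train, n_val = 12_000_000 * 3 // 4, 12_000_000 // 4
print(n_train, n_val)
```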
5 Results
In this section, we examine the performance of PhaseLink under a variety of scenarios. All of the
tests are conducted in a controlled manner, such that ground truth is known for every single pick. This
enables a detailed assessment of the performance at the individual phase level for real sequences of
[Figure 3 map axes: longitude −121° to −115°, latitude 32° to 37°.]
Figure 3: Map of southern California and the SCSN station distribution. The entire region shown
defines the boundaries for generating training data.
picks, as well as sequences designed to test the point at which the method breaks down. It furthermore
allows for a rigorous direct comparison of PhaseLink with existing grid association methods.
5.1 Application to 2016 Borrego Springs sequence
We apply PhaseLink to the 2016 Borrego Springs sequence [19], which occurred in the San Jacinto
fault zone in southern California. For a controlled testing environment, we reconstruct the actual
sequence of 73,353 picks (see Data section), which correspond to 1708 earthquakes that are listed in
the SCSN catalog. Then we add in an equivalent number of false picks (73,353) uniformly distributed
over the seismic network in time, i.e., picks that do not belong to an earthquake. This results in 50%
of the picks in the sequence being false, but the effect is not uniform over time since the number of
events (and therefore the number of picks) decreases with time after the mainshock. This has the
overall effect of mimicking a real seismic network, where the system is dominated by real picks
during a swarm, and later dominated by false picks the rest of the time.
First, we examine the performance of the neural network alone, without the clustering step included
(Table 2). Precision is defined as the ratio of true positives to the true positives plus false positives.
Recall is defined as the ratio of true positives to the true positives plus false negatives. The high
precision for false picks (i.e. with true label = 0) shows that it is relatively rare for the network to
assign a false pick label to real picks. Similarly, the high recall suggests that false picks are rarely
assigned label 1. For the real picks (true label = 1), the high precision implies that the network rarely
incorporates false picks into sequences of real picks. The lower recall, on the other hand, implies that
it quite often discards real picks as false. Since the number of unrelated picks is much larger than that
of related picks, these mis-classifications affect the real pick recall much more than the false pick
precision.
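
The per-class precision and recall defined above can be computed as in this short sketch; the function name and the toy label vectors are illustrative only and are unrelated to the values in Table 2.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class label (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Four picks linked to the root (label 1) and six unrelated picks (label 0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
linked_scores = precision_recall(y_true, y_pred, label=1)
unrelated_scores = precision_recall(y_true, y_pred, label=0)
print(linked_scores, unrelated_scores)
```

With one error in each direction, both classes lose the same number of picks, but the smaller class (label 1) sees the larger drop in its scores, which mirrors the asymmetry discussed above.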
However, this performance only considers association relative to the root picks of individual
sub-sequences. As we will demonstrate below, the clustering scheme described in Section 4.2 successfully
recovers many of these missed picks, since a pick only needs to be associated correctly in one of the
sub-sequences it appears in.

[Figure 4 map axes: longitude 132° to 136°, latitude 34° to 36°.]
Figure 4: Map of southwestern Japan and station distribution (green triangles). The region used for
generating synthetic events is indicated by the solid black line.

Table 2: RNN performance on validation dataset (individual phases)
True label   Precision   Recall   # samples
0            0.99        0.99     71116302
1            0.98        0.96     2236698
Figure 7 demonstrates the outstanding performance of the complete algorithm in the context of
detecting earthquakes, rather than individual phases. This precision-recall curve illustrates the
inherent trade-off when the minimum number of picks per cluster, $n_{\min}$, is varied. To determine
whether the $k$th cluster of picks, $A_k$, corresponds to a successful event detection, we define the
Jaccard precision between it and all clusters of picks in the ground truth, $B_i$,

$$J^p_k = \max_{i=1}^{c} \frac{|A_k \cap B_i|}{|A_k \cup B_i|}. \tag{1}$$

Here, $c$ is the total number of events in the ground truth. If $J^p_k \geq 0.5$, we consider the detection
successful. This means that at least 50% of the picks in the predicted cluster are common with a
single event in the ground truth.
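
A sketch of the event-level detection test follows, assuming the Jaccard index is the usual intersection-over-union of pick sets and that a score of exactly 0.5 counts as a detection. The function names are hypothetical.

```python
def jaccard(a, b):
    """Intersection over union of two sets of pick indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_precision(cluster, truth_events):
    """Best Jaccard score between one predicted cluster and any true event."""
    return max((jaccard(cluster, b) for b in truth_events), default=0.0)


def is_detection(cluster, truth_events, threshold=0.5):
    return jaccard_precision(cluster, truth_events) >= threshold


truth_events = [{1, 2, 3, 4}, {5, 6, 7}]
good_cluster = {1, 2, 3, 9}    # shares 3 picks with the first event
poor_cluster = {5, 10, 11}     # shares 1 pick with the second event
print(jaccard_precision(good_cluster, truth_events), is_detection(good_cluster, truth_events))
print(jaccard_precision(poor_cluster, truth_events), is_detection(poor_cluster, truth_events))
```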
For all values of $n_{\min}$, the precision is >0.996. However, raising $n_{\min}$ decreases the recall from 0.956
to ultimately 0.891, because real clusters that have fewer than $n_{\min}$ picks get discarded. High values
of $n_{\min}$ decrease the algorithm’s ability to detect and associate weakly recorded small earthquakes for
which only small numbers of phase detections are available. However, at least for this test sequence,
it appears that
n
min
8
is sufficient to get excellent association performance from the algorithm.
These performance numbers are in terms of event declarations, but it is also possible to evaluate the
performance of phase associations. To do this, we calculate the mean value of $J^p$ over the $d$
detected events,

$$\bar{J}^p = \frac{1}{d} \sum_{k=1}^{d} J^p_k. \tag{2}$$
[Figure 5 panels: Time (sec), 0 to 120, versus Distance (km), 0 to 400. Panel annotations: “All of these picks are linked to root” and “Root is unrelated to other picks”.]
Figure 5: Examples of synthetic training sub-sequences. Red picks are linked to the root, while black
picks are unrelated to the root.
We further define the Jaccard recall as

$$J^r_i = \max_{k=1}^{d} \frac{|A_k \cap B_i|}{|A_k \cup B_i|}, \tag{3}$$

and calculate the mean value of $J^r$ for the dataset:

$$\bar{J}^r = \frac{1}{c} \sum_{i=1}^{c} J^r_i. \tag{4}$$
Together, $\bar{J}^p$ and $\bar{J}^r$ represent the precision and recall at an individual phase level, rather than an
event level. These values are also shown in Figure 7 for the same range of $n_{\min}$ values, indicating
that not only is the method detecting events well, but that it also reliably associates phases.
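
The two phase-level averages can be computed together, as in the sketch below. It again assumes an intersection-over-union Jaccard index, and the function name `mean_jaccard_scores` is hypothetical.

```python
def mean_jaccard_scores(detected, truth):
    """Return (mean Jaccard precision, mean Jaccard recall).

    `detected` holds the predicted clusters A_k and `truth` the true events
    B_i, each as a set of pick indices. Precision averages the best match
    of every detected cluster; recall averages the best match of every
    true event.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    j_p = [max((jaccard(a, b) for b in truth), default=0.0) for a in detected]
    j_r = [max((jaccard(a, b) for a in detected), default=0.0) for b in truth]
    mean_p = sum(j_p) / len(j_p) if j_p else 0.0
    mean_r = sum(j_r) / len(j_r) if j_r else 0.0
    return mean_p, mean_r


detected = [{1, 2, 3}]              # one cluster, missing pick 4
truth = [{1, 2, 3, 4}, {5, 6}]      # the second event was never detected
scores = mean_jaccard_scores(detected, truth)
print(scores)
```

In this toy case the missed event lowers only the recall average, since it contributes a score of zero to the sum over true events but does not appear among the detected clusters.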
In developing a new method, it is also important to benchmark its performance against that of
existing methods. Here, we compare the performance of PhaseLink against the grid associator,
dbgrassoc, from the Antelope Environmental Monitoring Software package (BRTT Inc.). dbgrassoc
is currently used by real-time seismic networks around the world, as well as researchers working with
previously collected datasets in an offline mode. The program uses a pre-defined travel time grid that
is set up over the region in which earthquakes are to be detected. There are a number of sensitive
hyperparameters that control the detection process including the minimum number of picks, whether
S-waves are to be included and how to deal with them, travel time residual limits, and the clustering
time window. Once an event is detected, it also re-examines previous detections to see if the new
event should be merged with another, or extra phases can be added in. For this comparison, we use
the exact settings employed by a detection study in the San Jacinto fault zone [22], which are very
similar to those used internally for real-time operation by the Anza seismic network. This ensures
that dbgrassoc is correctly calibrated and that the comparison is fair.
We applied dbgrassoc to the same sequence of picks as used with PhaseLink, and the results are shown
in Figure 7. At an event level, dbgrassoc and PhaseLink have nearly identical precision (>0.996), but
the recall for PhaseLink is significantly higher (0.956 vs 0.919). When considering phase association
performance, rather than event detection performance, dbgrassoc has slightly higher precision (0.976
vs 0.9718). However, PhaseLink has a much higher recall of 0.955, whereas dbgrassoc has a value of