Article
https://doi.org/10.1038/s41467-023-43355-3
Seismic arrival-time picking on distributed
acoustic sensing data using semi-supervised
learning
Weiqiang Zhu¹,², Ettore Biondi¹, Jiaxuan Li¹, Jiuxun Yin¹, Zachary E. Ross¹ & Zhongwen Zhan¹
Distributed Acoustic Sensing (DAS) is an emerging technology for earthquake monitoring and subsurface imaging. However, its distinct characteristics, such as unknown ground coupling and high noise level, pose challenges to signal processing. Existing machine learning models optimized for conventional seismic data struggle with DAS data due to its ultra-dense spatial sampling and limited manual labels. We introduce a semi-supervised learning approach to address the phase-picking task of DAS data. We use the pre-trained PhaseNet model to generate noisy labels of P/S arrivals in DAS data and apply the Gaussian mixture model phase association (GaMMA) method to refine these noisy labels and build training datasets. We develop PhaseNet-DAS, a deep learning model designed to process 2D spatio-temporal DAS data to achieve accurate phase picking and efficient earthquake detection. Our study demonstrates a method to develop deep learning models for DAS data, unlocking the potential of integrating DAS in enhancing earthquake monitoring.
Received: 17 February 2023
Accepted: 8 November 2023
¹Seismological Laboratory, Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA. ²Berkeley Seismological Laboratory, Department of Earth and Planetary Science, University of California, Berkeley, CA, USA. e-mail: zhuwq@berkeley.edu
Nature Communications | (2023) 14:8192
Distributed acoustic sensing (DAS) is a rapidly developing technology that can turn a fiber-optic cable of up to one hundred kilometers into an ultra-dense array of seismic sensors spaced only a few meters apart. DAS uses an interrogator unit to send laser pulses into an optical fiber and measure the Rayleigh back-scattering from the internal natural flaws of the optical fiber. By measuring the tiny phase changes between repeated pulses, DAS can infer the longitudinal strain or strain rate over time along a fiber-optic cable [1–3]. Previous studies have demonstrated that DAS can effectively record seismic waves [4–9]. Compared with traditional forms of seismic acquisition, DAS has several potential advantages in earthquake monitoring. It provides unprecedented channel spacing of meters compared with tens-of-kilometers spacing of seismic networks. DAS can also take advantage of dark fibers (i.e., unused strands of telecommunication fiber) at a potentially low cost. Furthermore, DAS is suitable for deployment and maintenance in challenging environments, such as boreholes, offshore locations, and glaciers. New DAS interrogator units are becoming capable of longer sensing ranges at a lower cost with the development of high-speed Internet infrastructure [1]. Thus, DAS is a promising technology for improved earthquake monitoring and is under active research. However, applying DAS to routine earthquake monitoring tasks remains challenging due to the lack of effective algorithms for detecting earthquakes and picking phase arrivals, coupled with the high data volume generated by thousands of channels. The ultra-high spatial resolution of fiber-optic sensing is a significant advantage compared to seismic networks but also presents a challenge for traditional data processing algorithms designed for single- or three-component seismometers. For example, the commonly used STA/LTA (short-term averaging over long-term averaging) method [10] is ineffective for DAS because DAS recordings are much noisier than dedicated seismometer data due to factors such as cable-ground coupling and sensitivity to anthropogenic noise. STA/LTA operates on a single DAS trace and therefore does not effectively utilize the dense spatial sampling provided by DAS. Template matching is another effective earthquake detection method, particularly for detecting tiny earthquake signals [11–14]. However, the requirement of existing templates and high computational demands limit its applicability for routine earthquake monitoring [15].
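To make the single-trace nature of STA/LTA concrete, the following is a minimal sketch of a classic energy-ratio detector on one synthetic trace. The function name `sta_lta`, the window lengths, the trigger threshold of 4, and the synthetic signal are all illustrative assumptions, not settings from this study.

```python
import numpy as np

def sta_lta(trace, n_sta, n_lta):
    """STA/LTA ratio on one trace: mean energy in a short window divided
    by the mean energy in the long window that immediately precedes it."""
    energy = np.asarray(trace, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(len(energy))
    for i in range(n_lta + n_sta, len(energy) + 1):
        sta = (csum[i] - csum[i - n_sta]) / n_sta
        lta = (csum[i - n_sta] - csum[i - n_sta - n_lta]) / n_lta
        ratio[i - 1] = sta / max(lta, 1e-12)  # guard against division by zero
    return ratio

# Synthetic example (hypothetical): 30 s of unit-variance noise at 100 Hz
# with a 2-s, 5-Hz burst starting at sample 1500.
rng = np.random.default_rng(0)
fs = 100
trace = rng.normal(0.0, 1.0, 30 * fs)
trace[1500:1700] += 8.0 * np.sin(2 * np.pi * 5 * np.arange(200) / fs)

ratio = sta_lta(trace, n_sta=50, n_lta=500)
trigger = int(np.argmax(ratio > 4.0))  # first sample above the threshold
print("trigger sample:", trigger)
```

The detector sees only one channel at a time, so a local noise burst of similar amplitude on a single DAS channel would trigger it just as readily as an earthquake arrival.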
Deep learning, especially deep neural networks, is currently the state-of-the-art machine learning algorithm for many tasks, such as image classification, object detection, speech recognition, machine translation, text/image generation, and medical image segmentation [16]. Deep learning is also widely used in earthquake detection [17–22] for studying dense earthquake sequences [23–28] and routine monitoring of seismicity [29–33]. Compared to the STA/LTA method, deep learning is more sensitive to weak signals of small earthquakes and more robust to noisy spikes that cause false positives for STA/LTA. Compared to the template matching method, deep learning generalizes similarity-based search without requiring precise seismic templates and is significantly faster. Neural network models automatically learn to extract common features of earthquake signals from large training datasets and are able to generalize to earthquakes outside the training samples. For example, the PhaseNet model, which is a deep neural network model trained using earthquakes in Northern California, performs well when applied to tectonic [24,25], induced [23,26], and volcanic earthquakes [34,35] globally.
One critical factor in the success of deep learning in earthquake detection and phase picking is the availability of many phase arrival-time measurements manually labeled by human analysts over the past few decades. For example, Ross et al. [18] collected ~1.5 million pairs of P and S picks from the Southern California Seismic Network. Zhu and Beroza [19] employed ~700k P and S picks from the Northern California Seismic Network. Michelini et al. [36] built a benchmark dataset of ~1.2 million seismic waveforms from the Italian National Seismic Network. Zhao et al. [37] formed a benchmark dataset of ~2.3 million seismic waveforms from the China Earthquake Networks. Mousavi et al. [38] created a global benchmark dataset (STEAD) of ~1.2 million seismic waveforms. Several other benchmark datasets have also been developed for training deep learning models [39,40]. Although many DAS datasets have been collected [41] and more continue to be collected, most of these datasets have not yet been analyzed by human analysts. Manually labeling a large DAS dataset can be costly and time-consuming. As a result, there are limited applications of deep learning for DAS data. Most works focus on earthquake detection using a small dataset [42–44]. Accurately picking phase arrivals is an unsolved challenge for DAS data, hindering its applications to earthquake monitoring.
There have been a number of approaches proposed to train deep learning models with little or no manual labeling, such as data augmentation [45], simulating synthetic data [46–48], fine-tuning and transfer learning [49,50], self-supervised learning [51], and unsupervised learning [52,53]. However, those methods have not proven effective in picking phase arrival time on DAS data. One challenge is the difference in the mathematical structures between seismic data and DAS data, i.e., ultra-dense DAS arrays and sparse seismic networks, which complicates implementation of model fine-tuning or transfer learning. Additionally, phase arrival-time picking requires high temporal accuracy, which is difficult to achieve through self-supervised or unsupervised learning without accurate manual picks. Semi-supervised learning provides an alternative approach, which is designed for problems with limited labeled data and abundant unlabeled data [54,55]. There are several ways to utilize a large amount of unlabeled data as weak supervision to improve model training. One example is the Noisy Student method [54], which consists of three main steps: (1) training a teacher model on labeled samples, (2) using the teacher to generate pseudo labels on unlabeled samples, and (3) training a student model on the combination of labeled and pseudo-labeled data. Thus, the Noisy Student method can leverage a substantial amount of unlabeled data to improve model accuracy and robustness.
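The three steps listed above can be sketched on toy data. This is only an illustration of the teacher/pseudo-label/student loop under simplifying assumptions: the "model" is a hypothetical nearest-centroid classifier on 2D points, the data are synthetic, and the noise injection that gives Noisy Student its name is omitted.

```python
import numpy as np

def fit_centroids(x, y):
    """Train a toy two-class model: one centroid per class."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, x):
    """Label each sample with the index of its nearest centroid."""
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)

def sample(n):
    """Synthetic data: two Gaussian clusters centred at (-2,-2) and (2,2)."""
    y = np.arange(n) % 2
    x = rng.normal(0.0, 1.0, (n, 2)) + np.where(y[:, None] == 1, 2.0, -2.0)
    return x, y

x_lab, y_lab = sample(10)      # few labeled samples
x_unl, _ = sample(2000)        # many unlabeled samples (labels discarded)
x_test, y_test = sample(1000)  # held-out evaluation set

# (1) Train the teacher on the labeled samples only.
teacher = fit_centroids(x_lab, y_lab)
# (2) Use the teacher to pseudo-label the unlabeled samples.
pseudo = predict(teacher, x_unl)
# (3) Train the student on labeled plus pseudo-labeled samples.
student = fit_centroids(np.concatenate([x_lab, x_unl]),
                        np.concatenate([y_lab, pseudo]))

teacher_acc = float(np.mean(predict(teacher, x_test) == y_test))
student_acc = float(np.mean(predict(student, x_test) == y_test))
print(f"teacher accuracy {teacher_acc:.3f}, student accuracy {student_acc:.3f}")
```

Repeating steps (2) and (3) with the student as the new teacher gives further iterations of the same loop.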
In this work, we present a semi-supervised learning approach for training a deep learning model to pick seismic phase arrivals in DAS data without needing manual labels. Despite the differences in data modalities between DAS data (i.e., spatio-temporal) and seismic data (i.e., time series), the recorded seismic waveforms exhibit similar characteristics. Based on this connection, we investigate using semi-supervised learning to transfer the knowledge learned by PhaseNet for picking P and S phase arrivals from seismic data to DAS data. We develop a new neural network model, PhaseNet-DAS, that utilizes spatial and temporal information to consistently pick seismic phase arrivals across hundreds of DAS channels. We borrow a similar idea of pseudo labeling [56] to generate pseudo labels of P and S arrival picks in DAS data in order to train deep learning models using unlabeled DAS data. We extend the semi-supervised learning method to bridge two data modalities of 1D seismic waveforms and 2D DAS recordings so that we can combine the advantages of the abundant manual labels of seismic data and the large volume of DAS data. We demonstrate the semi-supervised learning approach by training two models. The PhaseNet-DAS v1 is trained using pseudo labels generated by PhaseNet to transfer phase picking capability from seismic data to DAS data. The PhaseNet-DAS v2 is trained using pseudo labels generated by PhaseNet-DAS v1 to further improve model performance similar to the Noisy Student method. Unless specified otherwise, we default to using the PhaseNet-DAS v2 model for evaluation in the following sections. We test our method using DAS arrays in Long Valley and Ridgecrest, CA, and evaluate the performance of PhaseNet-DAS in terms of number of phase picks, phase association rate, phase arrival time resolution, and earthquake detection and location.
Results
Phase picking performance
One challenge in picking phase arrivals in DAS data is the presence of strong background noise, as fiber-optic cables are often installed along roads or in urban environments and DAS is highly sensitive to surface waves. The waveforms of traffic signals bear a certain resemblance to earthquake signals with sharp emergence of first arrivals and strong surface waves, which leads to many false detections by the pre-trained PhaseNet model. Traffic signals are usually locally visible over short distances of a few kilometers without clear body waves. In contrast, earthquake signals tend to be much stronger and recorded by an entire DAS array with both body and surface waves present. PhaseNet-DAS uses both spatial and temporal information across multiple channels of a DAS array, making it more robust to traffic noise. Figure 1 shows four examples of earthquake signals that can be observed in sections of the DAS array. Due to strong background noise, we can see that PhaseNet detects many false P and S arrivals. However, PhaseNet-DAS predictions have fewer false detections and are consistent across channels with reduced variation in the picked arrival times. We apply both models to all events of four DAS cables and compare the number of associated picks, since picks that can be successfully associated are more indicative of true positives. After applying the phase associator GaMMA [57], the rates of associated phase picks increase from 59%–69% for PhaseNet to 89%–92% for PhaseNet-DAS (Table S1).
In addition to traffic noise, other factors such as poor ground coupling and instrumental noise make the signal-to-noise ratio (SNR) of DAS data generally lower than that of seismic data. The low SNR makes it challenging to detect and pick phase arrivals on DAS data. The PhaseNet model pre-trained on seismic data can detect high SNR events, but struggles with low SNR events in DAS data (Fig. 2). After re-training using semi-supervised learning on DAS data, the PhaseNet-DAS model significantly improves detections of low SNR events. PhaseNet-DAS v1 detects 2–5 times more events than PhaseNet across four DAS cables, and PhaseNet-DAS v2 enhances detection sensitivity by an additional 25–50% compared to PhaseNet-DAS v1 (Fig. 2). Moreover, the number of phase picks per event also significantly increases for both high and low SNR events after re-training (Fig. S1). This demonstrates that the PhaseNet-DAS model, which is designed to use coherent spatial information, can effectively detect weaker earthquake signals recorded by DAS and pick P and S arrivals on more DAS channels than the PhaseNet model, which is designed for 3-component seismic waveforms.
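The SNR used in Fig. 2 is described in its caption as computed from two 5-s windows before and after the theoretical P arrival. A minimal sketch of one such measure follows; the exact formula is not stated in the text, so the mean-power ratio in decibels, the function name `snr_db`, and the synthetic trace are assumptions made for illustration.

```python
import numpy as np

def snr_db(trace, fs, t_arrival, win=5.0):
    """SNR in dB from the mean power in `win` seconds after the arrival
    divided by the mean power in `win` seconds before it (assumed form)."""
    trace = np.asarray(trace, dtype=float)
    n = int(round(win * fs))
    i = int(round(t_arrival * fs))
    if i - n < 0 or i + n > len(trace):
        raise ValueError("windows extend beyond the trace")
    noise = trace[i - n:i]
    signal = trace[i:i + n]
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

# Synthetic single-channel example (hypothetical): unit-variance noise with
# a 3-Hz sinusoid of amplitude 10 arriving at t = 10 s.
rng = np.random.default_rng(0)
fs = 100
t = np.arange(30 * fs) / fs
trace = rng.normal(0.0, 1.0, len(t))
trace[10 * fs:] += 10.0 * np.sin(2 * np.pi * 3 * t[10 * fs:])

value = snr_db(trace, fs, t_arrival=10.0)
print(f"SNR = {value:.1f} dB")
```

For a DAS array, one option would be to apply the same function channel by channel and summarize the values, for example with a median; that aggregation is likewise an assumption.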
Fig. 1 | Examples of noisy picks predicted by PhaseNet and improved picks predicted by PhaseNet-DAS. a–d Four examples with different signal-to-noise ratios. Each sub-panel shows (i) DAS recordings of 30 s and 5000 channels, (ii) the PhaseNet picks, and (iii) the PhaseNet-DAS picks.
The noisy condition of DAS recording could also impact the temporal precision of picked phase arrival-times for both manual labeling and automatic algorithms. Because we lack manual labels of P and S arrivals as benchmarks, we evaluate the temporal accuracy of PhaseNet-DAS's picks indirectly. First, we compared the automatically picked phase arrival-times with the theoretical phase arrival-times using a 1D velocity model [58]. For events within ~100 km, the automatic picks have small time residuals within 2 s, while the time residuals increase with epicenter distances (Fig. S2). This discrepancy arises not from imprecise automatic picks, but from differences between the true 3D velocity model and the 1D velocity model we used. Then, we conducted a more precise analysis of the automatically picked phase arrival-times by comparing the differential arrival-times between two events with those measured using waveform cross-correlation. Waveform cross-correlation is commonly used for earthquake detection (known as template matching or match filtering) [11–14], measuring differential travel-time [59–62] and relative polarity [63]. Cross-correlation achieves a high temporal resolution of the waveform sampling rate or super-resolution using interpolation techniques. We cut a 4-s time window around the arrival picked by PhaseNet-DAS, applied a band-pass filter between 1 Hz and 10 Hz, and calculated the cross-correlation between event pairs. The differential time was determined from the peak of the cross-correlation profile. Because DAS waveforms are usually much noisier than seismic waveforms and have low cross-correlation coefficients, we further improved the robustness of differential time measurements using multi-channel cross-correlation [64,65] to accurately extract the peaks across multiple cross-correlation profiles. We selected 2539 event pairs and ~9 million differential time measurements for both P and S waves as the reference to evaluate the temporal accuracy of PhaseNet-DAS picks. Figure 3 shows the statistics of these two differential time measurements. If we assume the differential time measurements by waveform cross-correlation are the ground truth, the errors of differential time measurements by PhaseNet-DAS have a mean of 0.001 s and a standard deviation of 0.06 s for P waves and a mean of 0.005 s and a standard deviation of 0.25 s for S waves. For comparison, the absolute arrival-time errors of the pre-trained PhaseNet model compared with manual picks have a mean of 0.002 s and a standard deviation of 0.05 s for P waves and a mean of 0.003 s and a standard deviation of 0.08 s for S waves [19]. Although the differential time errors and absolute arrival-time errors cannot be directly compared, the similar scales of these errors demonstrate that we can effectively transfer the high picking accuracy of the pre-trained PhaseNet model to DAS data.
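The single-channel part of the cross-correlation measurement described above (4-s window, 1–10 Hz band-pass, peak of the cross-correlation profile) can be sketched as follows. This is an illustrative sketch under stated assumptions: the band-pass is a crude FFT mask rather than a designed filter, the function names and synthetic wavelets are hypothetical, and the multi-channel step [64,65] is not reproduced.

```python
import numpy as np

def bandpass_fft(x, fs, fmin, fmax):
    """Zero-phase band-pass by zeroing FFT coefficients outside the band
    (a simple stand-in for a properly designed filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < fmin) | (freqs > fmax)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def differential_time(w1, w2, fs, fmin=1.0, fmax=10.0):
    """Delay (s) of w2 relative to w1 from the cross-correlation peak;
    resolution is one sample because no interpolation is applied."""
    a = bandpass_fft(w1 - np.mean(w1), fs, fmin, fmax)
    b = bandpass_fft(w2 - np.mean(w2), fs, fmin, fmax)
    cc = np.correlate(b, a, mode="full")
    lags = np.arange(-(len(a) - 1), len(b))  # lag in samples for each cc entry
    return lags[np.argmax(cc)] / fs

# Two synthetic 4-s windows at 100 Hz: the same 5-Hz wavelet, with the
# second window delayed by 0.07 s, plus weak independent noise.
rng = np.random.default_rng(0)
fs = 100
t = np.arange(4 * fs) / fs

def wavelet(t0):
    return np.exp(-((t - t0) ** 2) / (2 * 0.15 ** 2)) * np.cos(2 * np.pi * 5 * (t - t0))

w1 = wavelet(2.00) + rng.normal(0.0, 0.02, len(t))
w2 = wavelet(2.07) + rng.normal(0.0, 0.02, len(t))

dt = differential_time(w1, w2, fs)
print(f"differential time: {dt:.2f} s")
```

Comparing such a cross-correlation delay with the difference of the two picked arrival times for the same event pair and channel gives one sample of the error distribution discussed above.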
Applications to earthquake monitoring
The experiments above demonstrate that PhaseNet-DAS can effectively detect and pick P- and S-phase arrivals with few false positives, high sensitivity, and precise temporal accuracy. These automatic phase arrival-time measurements can be applied to many seismic studies such as earthquake monitoring and seismic tomography. Here, we further applied PhaseNet-DAS to earthquake monitoring. Following a similar workflow of earthquake detection using seismic networks [66], we applied PhaseNet-DAS to DAS data of 11,241 earthquakes in the earthquake catalogs of Northern California Seismic Network, Southern California Seismic Network, and Nevada Seismic Network within 5 degrees from two Long Valley DAS arrays (Fig. 5). These events were filtered based on an approximate scaling relation determined by Yin et al. [67]. Because of the different sensor coverage of seismic networks and DAS cables, and because seismic signals from distant, small-magnitude events are expected to be too weak to be detected by DAS, the absolute numbers of earthquakes in the standard catalogs and those detected by DAS cannot be directly compared. To evaluate the improvements from semi-supervised learning, we compared the magnitude and distance distributions of earthquakes detected by three models, PhaseNet, PhaseNet-DAS v1, and PhaseNet-DAS v2, in Fig. 4 and Fig. S3. PhaseNet-DAS significantly improves detection of both small magnitude events near the DAS array and large magnitude events at greater distances. We also plotted the approximate locations of these detected earthquakes determined by phase association (Fig. 5 and Fig. S4). The locations of events within the Long Valley caldera, which are close to the DAS array, can be well-constrained using these automatic arrival-time measurements, while the earthquake locations become less constrained with increasing epicentral distances due to the limited azimuthal coverage of a single DAS array (Fig. S5). The physical limitation in azimuth and distance coverage could be addressed by combining seismic networks, deploying additional DAS arrays, or designing specific fiber geometries in future research.
Fig. 2 | SNR distributions of detected events across four DAS arrays. a Mammoth north, b Mammoth south, c Ridgecrest north, d Ridgecrest south. The locations of the four DAS arrays are shown in Fig. 8. The SNR is calculated using two 5-s windows before and after the theoretical P wave arrival time. The PhaseNet-DAS v1 and v2 models are from the first and second iterations of the semi-supervised learning procedures illustrated in Fig. 6.
Lastly, we evaluated PhaseNet-DAS on continuous data to demonstrate its potential applications in large-scale data mining and real-time earthquake monitoring. We applied PhaseNet-DAS to 180 h of continuous data from 2020/11/17 to 2020/11/25 using a 5000-channel × 200-s window sampled at 100 Hz without overlap. As PhaseNet-DAS is a fully convolutional network (Fig. 7) and the convolution operator is independent of input data size, it can be directly applied to various time lengths and channel numbers subject to the memory limitations of computational servers. The picked phase arrivals were associated using GaMMA in the same manner as above. Fig. S6 shows the detected and associated picks from three models: PhaseNet, PhaseNet-DAS v1, and PhaseNet-DAS v2. The results from these models show a good consistency, while PhaseNet-DAS proves more effective in detecting several times more picks. To assess the potential for false positive events, we compared the associated earthquakes with events in standard earthquake catalogs. The histograms of temporal earthquake frequency in Fig. S7 reveal a good correlation between events detected by the DAS array and seismic networks. In particular, for events within ~0.5 degree of the DAS cable (Fig. S7c), we can observe that earthquake frequencies vary from over 80 events to no events per 6-hour period. Given that the background noise generally does not change dramatically from day to day, this indicates that these detections are less likely to be false detections from noise sources such as traffic. In addition to the high correlation with the standard catalog, PhaseNet-DAS can detect 2–3 times more events using DAS alone, demonstrating the potential of combining fiber-optic networks to enhance the earthquake monitoring capability of conventional seismic networks. The entire processing time of the continuous DAS data (180 hours and 10,000 channels, 1.8 million channel-hours) was ~3.5 h using 8 GPUs (NVIDIA Tesla V100). The model prediction of PhaseNet-DAS is fast considering the substantial size of DAS data. Since the phase-picking task can be embarrassingly parallelized by segmenting DAS data into windows, the model prediction can be