The Gutenberg Algorithm: Evolutionary Bayesian Magnitude Estimates for Earthquake Early Warning with a Filter Bank
by M.-A. Meier, T. Heaton, and J. Clinton
Bulletin of the Seismological Society of America, Vol. 105, No. 5, pp. 2774-2786, October 2015, doi: 10.1785/0120150098
Abstract

Earthquake early warning (EEW) is a race against time. In particular, at proximal sites to the epicenter (typically the most heavily affected sites), strong ground motion starts shortly after the P-wave onset. For these sites, regional-type EEW systems that wait until data from several stations are available before issuing a warning and that require fixed data windows following a trigger are not fast enough. Single-station algorithms, on the other hand, have high uncertainties that compromise their usefulness. In this article, we propose that uncertainties of the earliest warning messages can be reduced substantially if the broadband frequency information of seismic signals is fully exploited. We present a novel probabilistic algorithm for estimating EEW magnitudes. The Gutenberg algorithm uses a filter bank for a time-frequency analysis of the real-time signals and estimates the posterior probabilities of both magnitude and source-station distance directly from the observed frequency content. It starts off as a single-station algorithm and then naturally evolves into a regional-type algorithm as more data become available. Using an extensive near-source waveform data set, we demonstrate that the Gutenberg parameter estimates reach the estimation accuracy and precision of existing regional-type EEW systems with only 3 s of data from a single station. The magnitude estimates, however, saturate at a threshold magnitude that depends on the available signal length that is used for the estimation, suggesting that current EEW magnitude estimates (1) are observational rather than predictive and (2) have to be considered minimum estimates, depending on the amount of available data.
Introduction

Modern seismic networks with real-time data transmission streams allow the recognition of ongoing seismic events as soon as the seismic P phases arrive at the stations nearest to the epicenter. Earthquake early warning (EEW) systems use the frequency content and absolute amplitudes of such waveforms to estimate event magnitudes and locations in near real time to warn more distant sites of imminent strong ground motions (Heaton, 1985; Allen and Kanamori, 2003).

Over the past decade, several damaging earthquakes have occurred in areas with operational EEW systems, revealing both the great potential and the shortcomings of current systems. For areas where seismic hazard predominantly comes from shallow crustal seismicity, one particular insight from these real-world examples is that, in most cases, the earliest warning messages are the most important ones. By the time essential updates are available, strong shaking has already begun at sites that will experience high ground-motion intensities as the event unfolds. This is because events in the magnitude range M < 7 can produce severe but localized damage: the high-intensity ground motions are confined to relatively small zones around the epicenter. At the strongly affected proximal sites, the high-amplitude ground motions from S waves arrive soon after the P waves that are used to detect and characterize the event; consequently, there is only a short time window for computing and issuing timely warnings.

Only in the rare cases of very large events, in which there are large areas that are far away from the epicenter but close to the finite rupture zone, will there be long warning times, allowing full use of the estimation updates. On the other hand, for the more frequent M < 7 events, regional EEW approaches, which use a fixed number of stations (typically 4 or 6) and fixed time windows (typically 3-4 s) (e.g., Cua and Heaton, 2007; Satriano et al., 2011; Kuyuk et al., 2013), may be too slow in many scenarios. Kuyuk and Allen (2013) analyzed the blind zone of a hypothetical system for California that requires 4 s of data from at least one station and a P-wave detection at four stations to trigger. Including telemetry and processing delays of 4 s, the resulting blind zone radii were 10-20 km in metropolitan San Francisco, 20-30 km in metropolitan Los Angeles, and >30 km in less densely populated areas, depending mainly on network station density. However, according to the intensity prediction equations of Allen et al. (2012), it takes an
M > 6.5 earthquake to produce strong intensities (VI on the modified Mercalli intensity scale) beyond a rupture distance of 30 km. This shows that, in most cases, sites with intensities greater than or equal to VI would only get a timely warning in the case of an M > 6.5 event. Using a similar system in Italy, Picozzi et al. (2015) conclude that positive warning times at sites with intensity ground motions larger than VII on the European Macroseismic Scale are only possible for events with M_w > 6.5.
Onsite-type approaches (e.g., Wu et al., 2007; Böse, Hauksson, Solanki, Kanamori, and Heaton, 2009; Böse, Hauksson, Solanki, Kanamori, Wu, and Heaton, 2009), on the other hand, can provide faster warning information, but they may not be accurate enough for many EEW applications. An optimal EEW algorithm, however, would not belong to either of these two categories. Instead, it would start characterizing an event as soon as a first station triggers and then assimilate all incoming information in a systematic way and thus naturally evolve into a regional system (Kanamori, 2005). The uncertainties of the earliest earthquake characterizations, when little data are available, are likely very high, but they accurately represent the state of knowledge at a given instant.

A primary objective of an EEW algorithm, therefore, should be to attain the lowest possible uncertainty levels as soon as possible following the event detection. The usefulness of the system is then largely dependent on the level of these early uncertainties. In this study, we suggest that these early parameter uncertainties can be lowered substantially with respect to existing algorithms if the available waveform information is more thoroughly exploited (Simons et al., 2006). We introduce a new magnitude estimation algorithm that is tailored to provide optimal estimates for the earliest of warning messages, when only small amounts of data are available. The Gutenberg algorithm (GBA) performs a real-time time-frequency analysis of the available waveforms via a minimum-phase-delay filter bank and, for each triggered station, estimates Bayesian posterior probabilities for magnitude and source-station distance combinations in an empirical way. Its probabilistic formulation offers a natural way to assimilate newly available data (from the same or from newly triggered stations), as well as external parameter constraints such as distance constraints from real-time epicenter locations or prior information. Furthermore, the algorithm quantifies the uncertainties of magnitude and source-station distance (including their interdependence) in a systematic way.

We first present an extensive near-source waveform data set that we have compiled to train and test the algorithm. We then explain the real-time time-frequency analysis via the filter bank and show how the algorithm uses the real-time frequency information for the magnitude estimation. This is followed by an analysis of the resulting time-dependent magnitude estimates and their uncertainties. We then discuss what the algorithm performance implies for the possibilities and the limitations of EEW at proximal sites.
Waveform Data Set for Algorithm Training and Testing

Our parameter inference method is empirical. We therefore have compiled an extensive waveform data set that we use as training and test data. The goal was to best quantify the dependency of the evolving peak ground-motion amplitudes on the EEW-relevant parameters, magnitude M and hypocentral distance R. For an EEW algorithm, the focus is to collate the maximum available information for near-source ground motions of large events. Because we also want to effectively distinguish these ground motions from other events, we include data from smaller events and from more distant recordings. Furthermore, the algorithm is designed for shallow crustal seismicity, as is predominant in California, Turkey, or Switzerland; we therefore only include shallow seismicity data. We strove to use maximally objective data selection criteria.
Japan

From the strong-motion networks K-Net and KiK-net, we use all records of events with M_JMA ≥ 5.0 and with hypocentral distances <100 km. To exclude subduction zone events, we excluded events with a focal depth >25 km and all records of events for which the nearest record has a hypocentral distance of >40 km. Only records from surface stations were used. The resulting number of three-component records is 8095. Japan Meteorological Agency (JMA) magnitudes are used.
Southern California

From the Southern California Seismic Network (SCSN) data archive, we collect all high-rate broadband recordings (seed channel names HH[ENZ]) and high-rate strong-motion recordings (seed channel names H[NLG][ENZ]) with hypocentral distances <100 km of events with catalog magnitudes M ≥ 4 from the time period 1 January 1990-8 August 2014 and M ≥ 3 from the time period 1 January 2001-18 August 2014. Furthermore, we include all records with hypocentral distances <50 km and M ≥ 2 from the time period 1 January 2012-18 August 2014. A total number of 55,454 three-component records was processed from southern California. In addition to the waveforms, we have also downloaded phase picks where available. We use the preferred magnitudes from the SCSN catalog, which are M_L and, for larger events, M_w.
Next Generation Attenuation-West 1

From the Next Generation Attenuation-West 1 (NGA-West 1) strong-motion data set (Chiou and Youngs, 2008), we use all traces we are certain include the P-wave onset (911 out of a total of 3551 three-component recordings). For all of the used records we use M_w magnitudes.
We merge all these records into a common data format and preprocess them with a causal fourth-order Butterworth high-pass filter with a corner frequency of 0.075 Hz. We integrate accelerograms to velocity and then run all traces through a short-term average/long-term average P-phase picker (e.g., Baer and Kradolfer, 1987). We subsequently optimize the obtained picks with the SBPx algorithm (see Data and Resources), which maximizes a ratio of integrated weighted amplitudes before and after a number of candidate picks. We perform several automated quality checks to assure high data quality. In particular, we discard the following records: (1) records with signal-to-noise ratios <20 dB, (2) records without all three components, (3) records with peak ground velocity (PGV) that deviates by >3σ from the ground-motion prediction equations (GMPEs) of Cua (2005), (4) records with sampling rates <100 samples/s, and (5) records that might be clipped based on the maximum reported counts. In this study, we have treated the waveforms from all regions equally. The resulting waveform data set consists of 64,460 records, with three-component traces each, from a total of 5852 events (Fig. 1). A summary of the data set is given in Table 1.
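For concreteness, the following minimal Python sketch shows one way such a preprocessing chain could be implemented; it is an illustration rather than the authors' code, the sampling rate and STA/LTA window lengths are assumed, the picker is a generic stand-in for the Baer and Kradolfer (1987) approach, and the SBPx pick refinement is omitted.

```python
import numpy as np
from scipy import signal

FS = 100.0  # assumed sampling rate [samples/s]; slower records are discarded

def preprocess(acc, fs=FS):
    """Causal 0.075 Hz high-pass, then integrate acceleration to velocity."""
    b, a = signal.butter(4, 0.075, btype="highpass", fs=fs)  # fourth order
    acc_hp = signal.lfilter(b, a, acc)      # causal (no zero-phase filtering)
    return np.cumsum(acc_hp) / fs           # simple numerical integration

def sta_lta(x, fs=FS, sta=0.5, lta=10.0):
    """Generic causal STA/LTA characteristic function (illustrative only)."""
    cf = x ** 2
    w_s, w_l = int(sta * fs), int(lta * fs)
    sta_avg = signal.lfilter(np.ones(w_s) / w_s, [1.0], cf)  # short-term mean
    lta_avg = signal.lfilter(np.ones(w_l) / w_l, [1.0], cf)  # long-term mean
    return sta_avg / np.maximum(lta_avg, 1e-20)  # guard against division by 0
```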
Filter Bank

We then pass all 193,380 (= 3 × 64,460) traces of the data set through a filter bank. The filter bank consists of nine separate one-octave-wide causal fourth-order Butterworth band-pass filters with corner frequencies between 0.09375 and 48 Hz (Fig. 2a). The filters are applied in the time domain and can be run in real time on a sample-by-sample basis. The output y_k of a Butterworth filter h(a_n, b_n) with input waveform x_k is a linear combination of the N + 1 previous input and the N previous output samples:

y_k = \sum_{n=0}^{N} a_n x_{k-n} - \sum_{n=1}^{N} b_n y_{k-n},   (1)

in which k is the sample index, a_n and b_n are the filter coefficients, and N is the filter order, which is N = 4 in this case.

Because Butterworth filters are minimum-phase-delay filters, they are optimal for EEW purposes. Low frequencies are delayed more than high frequencies (Fig. 2b), but this relative delay corresponds to the minimum that is possible in any causal operation. This way, the information from each frequency band becomes available at the earliest possible point in time.
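Such a filter bank is straightforward to reproduce with standard signal-processing tools. The sketch below (a minimal illustration, not the authors' implementation) constructs the nine one-octave passbands by doubling the corner frequencies upward from 0.09375 Hz and applies each filter causally; SciPy's sosfilt evaluates cascaded recursions of the form of equation (1).

```python
import numpy as np
from scipy import signal

FS = 100.0  # assumed sampling rate [samples/s]

# Nine one-octave passbands: 0.09375-0.1875 Hz, ..., 24-48 Hz
CORNERS = [(0.09375 * 2 ** j, 0.1875 * 2 ** j) for j in range(9)]

def filter_bank(velocity, fs=FS):
    """Apply the nine causal one-octave Butterworth band-passes to a trace."""
    bands = []
    for f_lo, f_hi in CORNERS:
        # N=2 yields a fourth-order band-pass (SciPy doubles the order for
        # band-pass designs). Second-order sections keep the lowest bands
        # numerically stable; each section is a recursion of equation (1).
        sos = signal.butter(2, (f_lo, f_hi), btype="bandpass",
                            fs=fs, output="sos")
        bands.append(signal.sosfilt(sos, velocity))  # causal application
    return np.array(bands)  # shape (9, n_samples)
```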
For each velocity waveform trace of the data set, we obtain nine band-pass-filtered seismograms, on each of which we measure peak absolute amplitudes as a function of time since the P-wave onset. We term these maximum absolute amplitudes narrowband peak ground velocities, PGV^nb, as PGV^nb_j(t), in which subscript j denotes the filter band (j = 1, ..., 9). For each trace, we update the PGV^nb_j(t) in 0.5 s intervals following the trigger and save two PGV^nb_j(t) values per station per time step: one for the combined horizontal components (arithmetic mean) and one for the vertical component. The respective horizontal peak values for each individual component need not occur on the same sample k.
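This bookkeeping can be maintained incrementally; a minimal sketch, assuming a 0.5 s update grid measured from the pick sample:

```python
import numpy as np

def pgv_nb(bands, pick_idx, fs=100.0, dt_update=0.5):
    """Running narrowband peak amplitudes PGV^nb_j(t) after the P pick.

    bands: array of shape (9, n_samples), filter-bank output of one trace.
    Returns an array of shape (9, n_steps), sampled every dt_update seconds.
    """
    post = np.abs(bands[:, pick_idx:])
    running_peak = np.maximum.accumulate(post, axis=1)  # peak-so-far per band
    step = int(dt_update * fs)
    return running_peak[:, step - 1::step]

# Horizontal combination: arithmetic mean of the two components' own peaks
# (the per-component peaks need not occur on the same sample):
# pgv_h = 0.5 * (pgv_nb(bands_e, pick) + pgv_nb(bands_n, pick))
```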
Real-Time Parameter Inference

Based on a set of real-time observations PGV^nb_j(t) from some individual target waveform, we then want to jointly estimate the most probable parameter values for magnitude M and source-station distance R.
Figure 1. Catalog magnitudes and hypocentral distances of the 64,460 three-component records from a total of 5852 events (K-Net and KiK-net, NGA-West 1, and SCSN; axes: hypocentral distance in km versus catalog magnitude).
Table 1
Number of Three-Component Records from Japanese Strong-Motion Data (K-Net and KiK-net), Next Generation Attenuation-West 1 (NGA-West 1), and the Southern California Seismic Network (SCSN) in Various Distance and Magnitude Ranges

              K-Net and KiK-net     NGA-West 1          SCSN
              R<25 km   R≥25 km   R<25 km   R≥25 km   R<25 km   R≥25 km
7 ≤ M < 8          16       276        11        94         0         5
6 ≤ M < 7          94      1484        15       556         0         2
5 ≤ M < 6         433      5792        16       152        87       444
4 ≤ M < 5           0         0        12        55       569      5551
3 ≤ M < 4           0         0         0         0      2995    31,120
2 ≤ M < 3           0         0         0         0      4569    10,112
We do this with a memory-based empirical approach. We term target trace the waveform for which we want to estimate the parameters; for each target trace, we use the PGV^nb_j(t) values of all other traces in the data set as training data, under exclusion of all traces that belong to the same event as the target trace.
Bayesian Problem Statement

Let p(M, R | PGV^nb_j(t)) denote the probability that an earthquake of magnitude M at distance R was the source of an observed vector of narrowband peak ground velocities, PGV^nb_j(t); this probability density function (PDF) is usually called the Bayesian posterior. Bayes' theorem tells us that the joint posterior PDF for the two parameters is given by

p(M, R \mid \mathrm{PGV}^{nb}_j(t)) = \frac{p(\mathrm{PGV}^{nb}_j(t) \mid M, R)\, p(M, R)}{p(\mathrm{PGV}^{nb}_j(t))} \propto p(\mathrm{PGV}^{nb}_j(t) \mid M, R)\, p(M, R),   (2)

in which p(PGV^nb_j(t) | M, R) is the probability of the observed PGV^nb_j(t), given the magnitude and the distance; this PDF is usually called the Bayesian likelihood. p(M, R) is a probability model for seismicity as a function of magnitude and distance; this PDF is usually called the Bayesian prior. For the case of a single station observing seismicity with epicentral probabilities that are uniformly distributed in space and with magnitudes that are described by the Gutenberg-Richter relation (Cua, 2005):
p(M, R) = p(M)\, p(R) \propto R \cdot 10^{-bM}.   (3)
The R term in equation (3) simply reflects the fact that the area of a ring of width dR grows as R dR. Equations (2) and (3) tell us that, given the observed ground-motion data PGV^nb_j(t), the most probable magnitude and distance (the posterior) is different from the magnitude and distance of an event that produces motions that best match PGV^nb_j(t) (the likelihood).
To clarify, let us assume a case in which the location of an event is already known and that all we want to do is determine the magnitude. Suppose we recorded PGV^nb_j(t), and this is the expected ground motion for an event having magnitude M(t); that is, assume the observed PGV^nb_j(t) maximizes the likelihood p(PGV^nb_j(t) | M(t)). For example, suppose we measured an amplitude of 100 mm/s and that, at this distance, this is the amplitude expected to be produced by an M 6.0 event, according to the likelihood function. Given that we have just recorded this 100 mm/s data, equations (2) and (3) tell us that the most probable magnitude is <6.0, say 5.6. That is because there is scatter in the relationship between magnitude and shaking amplitude, and smaller events are far more frequent than larger ones. In our imaginary case, if we measured 100 mm/s, then it is more likely that it is a larger than expected ground motion from one of several M 5.6 events than it is an average record from an M 6.0 that occurs less frequently.
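This shift can be reproduced with a one-dimensional toy computation. In the sketch below, the amplitude scatter of σ = 0.4 magnitude units and the b-value of 1 are assumed, illustrative numbers (neither is specified in the text); with them, the MAP magnitude lands at M_gmpe − bσ² ln 10 ≈ 5.63, close to the 5.6 quoted above.

```python
import numpy as np

# Toy posterior for a known location: a Gaussian likelihood over M times a
# Gutenberg-Richter prior. sigma and b are assumed, illustrative values.
m = np.linspace(4.0, 8.0, 4001)
m_gmpe = 6.0   # magnitude whose expected amplitude matches the observation
sigma = 0.4    # scatter of log amplitude, in magnitude units (assumed)
b = 1.0        # Gutenberg-Richter b-value (assumed)

likelihood = np.exp(-0.5 * ((m - m_gmpe) / sigma) ** 2)
prior = 10.0 ** (-b * m)                  # smaller events are more frequent
posterior = likelihood * prior

print(m[np.argmax(posterior)])  # ~5.63 = 6.0 - b * sigma**2 * ln(10)
```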
The distance prior and the magnitude prior work in opposite directions. The magnitude prior tells us that, given the shaking, the most probable magnitude is less than is predicted by a GMPE, because small earthquakes are more numerous. The distance prior, on the other hand, implies that, given a recorded ground motion, the event is likely to be more distant than would be derived from a GMPE (just simple geometry); and, therefore, the magnitude is likely to be higher than what is expected from a GMPE.
In this study, we estimate the probabilities of the parameters M and R in an empirical way, based on a data set in which the data are, to first order, naturally distributed according to the prior distributions in equation (3) (Table 1). As we will see in the following section, these empirical probability estimates directly correspond to Bayesian posterior probabilities rather than Bayesian likelihoods.
Figure 2. (a) Frequency and (b) impulse responses of the nine filter bank passbands (one-octave bands from 0.09375-0.1875 Hz to 24-48 Hz).

Magnitude Estimation

We illustrate the parameter inference method in Figure 3, using a near-source record of the 2014 M_w 6.0 South Napa
earthquake (hypocentral distance 16 km) as an example target trace. We detect the P arrival (pick) for the target trace with a real-time picking algorithm. At any given time t after the pick, we observe PGV^nb_target,j(t) (Fig. 3a). We then scan the training data set to identify the n traces that at time t had the most similar PGV^nb(t) values (Fig. 3b); n is a free parameter, and we use n = 30. We define similarity as the squared difference between the log amplitudes of target and training traces, summed over the j = 1, ..., f frequency bands:
\Delta \mathrm{PGV}^{nb}_i(t) = \sum_{j=1}^{f} \left[ \log_{10} \mathrm{PGV}^{nb}_{\mathrm{target},j}(t) - \log_{10} \mathrm{PGV}^{nb}_{i,j}(t) \right]^2,   (4)
in which i is the trace index of the training data set. We then use the magnitudes M_tr and hypocentral distances R_tr of the identified most-similar training traces (Fig. 3c) to construct the probability function for the EEW parameters of the target record. Note that we identify the n most similar traces for vertical and horizontal traces separately; M_tr and R_tr are hence vectors of length 2n. We approximate the distribution of the resulting set {M_tr, R_tr} with a coarsely discretized bivariate Gaussian density function p(M, log10(R) | PGV^nb_target,j(t)) = N(μ, Σ), in which μ = [⟨M_tr⟩, ⟨log10(R_tr)⟩]^T are the mean magnitudes and logarithmic distances of the set and Σ is the corresponding covariance matrix (Fig. 3d).
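In code, the search of equation (4) and the Gaussian fit reduce to a nearest-neighbor query in log-amplitude space followed by a sample mean and covariance. The sketch below uses illustrative array names (not the authors' implementation) and treats one component; in the full scheme the vertical and horizontal neighbor sets are stacked before fitting.

```python
import numpy as np

def most_similar(pgv_target, pgv_train, m_train, r_train, n=30):
    """Find the n training traces most similar to the target (equation 4).

    pgv_target: (f,) narrowband peak amplitudes of the target at time t
    pgv_train:  (n_traces, f) amplitudes of all training traces at time t
    Traces of the target's own event are assumed already excluded.
    """
    delta = np.sum((np.log10(pgv_target) - np.log10(pgv_train)) ** 2, axis=1)
    idx = np.argsort(delta)[:n]
    return m_train[idx], r_train[idx]

def fit_bivariate_gaussian(m_tr, r_tr):
    """Mean and covariance of the (M, log10 R) set of most-similar traces."""
    samples = np.vstack([m_tr, np.log10(r_tr)])  # shape (2, 2n) once the
    mu = samples.mean(axis=1)                    # two components are stacked
    cov = np.cov(samples)
    return mu, cov
```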
We use the resulting p(M, log10(R) | PGV^nb(t)) as the empirical PDF of the EEW parameters of the target record and maximize it to find the maximum a posteriori estimates M̂_map and R̂_map. Furthermore, we compute univariate marginal probability functions p_m(M, t) and p_m(log10(R), t) by numerically integrating out either the distance or the magnitude parameter from the bivariate PDF. The maxima of the bivariate and of the marginal PDFs are equivalent.
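A coarsely discretized version of this maximization and marginalization step could look as follows; the grid bounds and spacing are assumed for illustration.

```python
import numpy as np

def map_and_marginals(mu, cov):
    """MAP estimates and marginal PDFs from the fitted bivariate Gaussian."""
    m = np.linspace(2.0, 8.0, 61)       # magnitude grid (assumed bounds)
    logr = np.linspace(0.0, 2.0, 41)    # log10 of distance, 1-100 km
    mm, rr = np.meshgrid(m, logr, indexing="ij")
    d = np.stack([mm - mu[0], rr - mu[1]], axis=-1)
    inv = np.linalg.inv(cov)
    # Unnormalized Gaussian density; normalization cancels in the argmax
    pdf = np.exp(-0.5 * np.einsum("...i,ij,...j", d, inv, d))
    i, j = np.unravel_index(np.argmax(pdf), pdf.shape)
    m_map, r_map = m[i], 10.0 ** logr[j]
    p_m = pdf.sum(axis=1)               # integrate out the distance
    p_r = pdf.sum(axis=0)               # integrate out the magnitude
    return m_map, r_map, p_m / p_m.sum(), p_r / p_r.sum()
```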
In summary, we are asking the question: Given a recorded PGV^nb_j(t), what is the most frequent (M, R) combination in a catalog of records with similar PGV^nb_j(t) values?
Figure 3. Parameter estimation scheme with the example of the vertical component of a near-source record (16 km hypocentral distance) of the 2014 M_w 6.0 South Napa earthquake, using 1 s of data: (a) three of the nine filtered velocity waveforms and peak amplitudes (PGV^nb_j, red crosses) since the P-wave onset; the red and blue lines give the start and end of the used time window. (b) PGV^nb_j of the South Napa record in all nine frequency bands (red crosses) and PGV^nb_j values of the 30 most similar records in the data set (thin lines). (c) Magnitudes and hypocentral distances of those most-similar traces (black circles), earthquake catalog parameter values (blue lines), and most probable parameter estimates (maximum a posteriori [MAP]) obtained using only PGV^nb values (red diamond) and using an additional distance constraint (red star). (d) Bivariate relative log-posterior probability density function (PDF) fitted to the parameter values in (c). (e,f) Marginal distribution functions for magnitude and hypocentral distance, respectively, and MAP estimates (red stars). The dashed red line in (f) shows the additional distance constraint that was constructed as described in the Enhancing Magnitude Estimates with Additional Parameter Constraints section.
If the catalog contained data that were distributed uniformly in magnitude and distance (i.e., p(M, R) = constant), then, according to equation (2), our procedure would reveal the likelihood function. That is, in this unrealistic case, p(M, R | PGV^nb_j(t)) ∝ p(PGV^nb_j(t) | M, R). Our procedure would then be analogous to determining a GMPE for our filter bank values. However, because we applied only minimum selection criteria for compiling the data set, such as maximum distance and minimum magnitude cut-offs, the data are approximately represented according to their natural distribution; as a consequence, the priors in equation (3) are an integral part of the estimation procedure. In terms of the example from the previous section, in which the 100 mm/s observed ground-motion velocity was more likely from an above-average record of one of the several M 5.6 events than from an average record of an M 6.0 event: when we search the data set for the n most similar records, we are more likely to find those above-average M 5.6 records because there are more of them in the data set than M 6.0 records. Likewise, records with large source-station distances are naturally overrepresented in the data set relative to records with short distances, in agreement with equation (3). Our empirical parameter estimation procedure therefore implicitly considers both the magnitude and the distance prior distributions and directly returns the desired posterior distribution function.
Combining Magnitude Estimates from Multiple Stations

Combining magnitude estimates from multiple stations is straightforward because of the probabilistic formulation of the algorithm: the joint marginal magnitude probability function is obtained by multiplying the single-station marginal probabilities:

p^k_m(M, t) \propto \prod_{s=1}^{k} p^s_m(M, t),   (5)

in which p^s_m(M, t) is the marginal magnitude probability function of the sth out of k stations. Stations that have longer snippets of available data because they are closer to the epicenter will tend to have narrower marginal probability functions and therefore have a stronger influence on the joint maximum a posteriori estimate M̂_map.
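Numerically, the product in equation (5) is best accumulated in the log domain to avoid underflow; a minimal sketch, assuming all stations report their marginals on a common magnitude grid:

```python
import numpy as np

def combine_stations(marginals):
    """Joint marginal magnitude PDF from per-station marginals (equation 5).

    marginals: array of shape (k, n_m); each row is one station's p_m(M, t)
    evaluated on a common magnitude grid.
    """
    log_joint = np.sum(np.log(np.maximum(marginals, 1e-300)), axis=0)
    log_joint -= log_joint.max()      # stabilize before exponentiating
    joint = np.exp(log_joint)
    return joint / joint.sum()        # renormalize on the grid
```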
Enhancing Magnitude Estimates with Additional Parameter Constraints

The obtained magnitude estimates M̂_map are entirely based on the transient frequency content of the waveforms and do not require a hypocenter location. However, if other independent constraints on either of the two parameters are available, they can be combined with the estimates in much the same way that we combine estimates from multiple stations (equation 5). There are numerous candidate constraints that could be used in this sense: parameter estimates from other EEW algorithms that run in parallel, distance estimates from a real-time location algorithm (e.g., based on the concept of not-yet-arrived data; Zhou, 1994; Cua and Heaton, 2007; Satriano et al., 2008), S-P onset times t_{S-P}, the waveform-based distance estimation method of Odaka et al. (2003), or distance priors from seismicity observations (e.g., Kagan and Knopoff, 1981; Meier et al., 2014) and from proximity to mapped faults. Because distance and magnitude uncertainties are inherently coupled, any constraint that reduces the distance uncertainty will also reduce magnitude uncertainties. A schematic illustration of the parameter inference framework with multiple stations, additional independent constraints, and prior information is shown in Figure 4.
In a full EEW system, increasingly well-constrained distance estimates would become available over time through a real-time event location algorithm, helping to reduce magnitude uncertainties. For this study, however, we neither computed such additional distance constraints nor relocated the processed events. Instead, we simulate the distance constraints and show what effect they have on the magnitude estimations. All results from this study are shown with and without using such additional constraints.
To imitate realistic distance constraints, we construct a distance PDF p(log10(R)) by perturbing the catalog hypocentral distance of the target record with a perturbation ΔR that is randomly sampled from a normal distribution. The width of the distribution depends on the number of triggered stations: for magnitude estimates for which fewer than three triggered stations are available, we use σ_R = 20 km. If three or more stations are used for the estimation, we set σ_R = 10 km. The distance PDF p(R) is then constructed as p(R) = N(μ_R, σ_R²), in which μ_R = R_catalog + ΔR is the perturbed center of the distribution. We transform this function p(R) into the log domain to obtain p(log10(R)), multiply it with the bivariate parameter PDF (Fig. 3d), and use the resulting updated bivariate PDF to estimate M̂_map(t) via the marginal magnitude PDF. The chosen values for σ_R of 20 and 10 km represent conservative estimates; they were chosen to avoid artificially increasing the algorithm performance.
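A sketch of this simulated constraint is shown below. Evaluating the Gaussian at R = 10^(log10 R) grid nodes is one plausible reading of the log-domain transform (a Jacobian term is omitted here), and the array layout follows the earlier grid sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_constraint(r_catalog, n_stations, logr_grid):
    """Simulated distance PDF on a log10(R) grid, as described above."""
    sigma_r = 20.0 if n_stations < 3 else 10.0    # km
    mu_r = r_catalog + rng.normal(0.0, sigma_r)   # perturbed center
    r = 10.0 ** logr_grid                         # back to linear distance
    p = np.exp(-0.5 * ((r - mu_r) / sigma_r) ** 2)
    return p / p.sum()

# The constraint multiplies the bivariate (M, log10 R) PDF row-wise:
# pdf_updated = pdf * distance_constraint(r_cat, k, logr_grid)[None, :]
```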
Performance of the Real-Time Parameter Inference

To evaluate the performance of the algorithm at different stages of an ongoing event, we define the time-dependent magnitude residual M_rsd(t) = M_catalog − M̂_map(t), in which M̂_map(t) is the time-dependent maximum a posteriori magnitude estimate and M_catalog is the reported catalog magnitude. In the following section, we analyze these residuals and how they change as increasing amounts of data become available over time.
Parameter Inference with Data from 1 Station

The first parameter estimation can be made as soon as the first station has recorded 0.5 s of data since the onset of the direct P phase. The left column of Figure 5 (a,c,e, and g) shows M_rsd(t) from single-station parameter estimates for all 64,460
records of the data set as a function of catalog magnitude for t = 0.5, 1, 3, and 10 s after the P onset. For the entire data set, the mean residual is −0.03 ± 0.51 with 1 s of data from one station, indicating that information from the first second of a single record is diagnostic of the event magnitude. However, there is a clear magnitude saturation for events with M > 6, and records with M < 3 have a bias toward magnitude overestimation. As the record length increases, the distribution of M_rsd becomes only slightly narrower (−0.01 ± 0.47 at 3 s), and the underestimation of large events persists. The overestimation at low magnitudes is produced by a set of records with very short hypocentral distances. For such records, the
Figure 4. Schematic illustration of how the parameter estimates of the Gutenberg algorithm (GBA) can be embedded in a full probabilistic earthquake early warning system. The gray arrows represent information flow. (a) For each triggered station, the GBA produces a magnitude and a distance PDF. (b) The magnitude PDFs can be combined to obtain the joint marginal posterior magnitude PDF. (c) Before an independent hypocenter location estimate (star) is available, the GBA distance PDFs may be used for a rough early location estimate. As soon as an early standard location estimate is available, the corresponding source-station distances can in turn be used to update the bivariate parameter PDF and thereby improve the magnitude estimates. At any stage, prior information (e.g., based on fault proximity models [d]) may be used to improve the estimates.
algorithm tends to overestimate the distance and, as a consequence, overestimates the magnitude. Although the overall scatter is relatively low, there are substantial numbers of records with high absolute residuals of >1 magnitude unit. A similar behavior has also been reported for the single-station onsite algorithm (Kanamori, 2005). Such misclassifications could translate into false alerts in an operational EEW system.
The saturation magnitude increases with increasing record length. The saturation reflects the fact that the PGV^nb values of events with M > 6.5 at, say, 3 s are not significantly higher than those of an M 6.5 event. It is only when much longer time windows are available that the algorithm can accurately classify M 7+ records. For events with M > 7.5, even 10 s are not long enough, suggesting that the rupture process is still ongoing for events of such high magnitudes.
If we assume that additional distance constraints are available (compare with the Enhancing Magnitude Estimates with Additional Parameter Constraints section), the bias issues exhibited by the estimates solely based on PGV^nb(t) are reduced (right column in Fig. 5b,d,f,h), and the overall distribution becomes narrower (−0.05 ± 0.41 with 3 s of data); here we used σ_R = 20 km for simulating the independent distance constraint.
Parameter Inference with Data from 2 to n Stations

In densely instrumented areas, multiple stations typically record the P-wave onset before the first station has recorded more than 4 s (Kuyuk and Allen, 2013). A majority of the estimates in Figure 5 are therefore, in fact, not relevant for EEW, because by the time they become available, more proximal stations will already have triggered the estimations. In Figure 6, we present the M_rsd(t) as they would be computed in a real-time EEW system, using only the most proximal records for each earthquake. At first, only the station nearest to the epicenter provides data until, over time, data from additional stations become available and influence the estimates.
Figure 5. Magnitude residuals from single-station estimates for all 64,460 records of the data set using 0.5, 1, 3, and 10 s of data based on PGV^nb(t) values, without additional distance constraints (left column) and with additional distance constraints (right column). The color represents the catalog hypocentral distance for each record. Because of the large number of data points, many points are overprinted. The white lines give the 16th and the 84th percentiles (±1σ) evaluated in 0.2 magnitude unit bins to give a more concise impression of the distribution.
We compute M_rsd(t) at the instant when the first, second, third, and tenth stations have recorded 1 s of data following the P onset. From the more proximal stations, we use whatever data are available at that instant. For the multistation estimates, we use equation (5) to obtain the joint marginal posterior magnitude distribution and M̂_map(t).
Figure 6 reveals that with the most proximal records alone, only a small number of events are strongly misclassified. Using 1 s of data from the second station (Fig. 6c), 4.6% of events have absolute magnitude residuals |M_rsd| > 1, a majority of which are events with catalog magnitudes M_catalog < 3. Furthermore, combining two stations slightly reduces the scatter, to −0.28 ± 0.43 magnitude units (using 1 s on the second station). Adding the independent distance constraints for both stations decreases the scatter of M_rsd to −0.18 ± 0.36. Although the average bias was negligible for all single-station estimates (Fig. 5), the bias for the proximal records lies between ∼−0.1 and ∼−0.35 when small numbers of stations are used. This is caused by the biased M < 3 events, which have a higher relative weight in this statistic than in Figure 5, because the numerous (unbiased) far-source records of the higher-magnitude events are not included here. For the records with M ≥ 3, the mean residuals are 0.1 and 0.07 with data from the second and third stations, respectively.
As soon as data from two stations are available, the distance estimates R̂_map obtained from maximizing the marginal distance PDFs could be used to roughly constrain the epicenter. Because the PGV^nb(t) values are independent of the P-onset times, the four observations (two PGV^nb-based distance estimates and two P-onset times) might be enough for an early location estimate. However, this exercise has not yet been conducted and would go beyond the scope of this article.
Once three or more stations have triggered, the EEW parameters M and R become increasingly better constrained. Furthermore, at this stage, enough P-phase arrival times are available to make rough estimations of the epicenter using an equal differential time norm location scheme (Zhou, 1994; Satriano et al., 2008).
Figure 6. Magnitude residuals for all 5852 events of the data set when the first, second, third, and tenth stations have recorded 1 s since the P-wave onset, based solely on the observed PGV^nb values (left column) and using an additional distance constraint (right column). Histograms show the distribution of residuals over the entire magnitude range. The color indicates the amount of data that was available on the first station at the time of parameter estimation. The white lines give the 16th and the 84th percentiles in 0.2 magnitude unit bins. Events are not shown if their total number of records is smaller than the number of records used in a subfigure. For example, there are only three records of the M_w 7.9 Denali, Alaska, earthquake, so this event no longer appears in subplots (g) and (h).
If we include the additional distance constraint using σ_R = 10 km (right column of Fig. 6b,d,f,h), the residual distribution is very narrow (−0.08 ± 0.29 with 1 s of data from the third station), the overestimation of small-magnitude events with near-source recordings becomes negligible, and the number of strongly misclassified events amounts to only 0.6% of cases. The underestimation of large magnitudes, however, persists even when the location is well constrained.
Discussion

To provide EEW alerts at sites proximal to the epicenter, alerts need to be based on less data from fewer stations than what can be used for alerts at more distant target sites. However, using less data is expected to translate into larger parameter uncertainties. The Gutenberg magnitude estimation algorithm minimizes these uncertainties by maximally exploiting the available data, and it systematically quantifies them.

The algorithm is based on the principle that ground-motion amplitudes at different frequencies are, to first order, governed by the event magnitude and the source-station distance, which are the two key parameters needed for EEW. Consequently, observed amplitudes can in turn be used to infer these parameters. In any individual frequency band, there is a trade-off between the two EEW parameters, in that an observed amplitude can be caused by a large event at a large distance or by a comparatively smaller event at a smaller distance. The GBA resolves this trade-off by jointly inferring the two parameters from multiple frequency bands: high (relative) amplitudes at high frequencies are only observed if the source-station distance is short; high (relative) amplitudes at low frequencies are only observed if the magnitude is high. By requiring similarity of amplitudes in all frequency bands (equation 4), the algorithm finds the unique distance-magnitude combination that is in agreement with the observations across all frequencies. We chose the name Gutenberg algorithm because the practice of inferring earthquake locations and magnitudes from different frequency bands of a single record resembles the habit of the late Beno Gutenberg, who would use numerous recording instruments at the same site in Pasadena, each of which was sensitive to a specific frequency range, to analyze teleseismic events from around the globe. The algorithm relies only on the frequency content of the real-time waveforms. It does not require additional information such as an epicenter location estimate. Because of its probabilistic formulation, however, any form of independent constraint on event magnitudes and source-station distances is readily incorporated. The probabilistic formulation furthermore facilitates the natural evolution from an onsite- to a regional-type algorithm. Event characterizations are started as soon as the first station has triggered and are then perpetually updated as more information becomes available. The continuous uncertainty quantifications that come with every parameter estimate allow end users to decide whether or not a certain damage mitigation action should be initiated.
Although this parameter inference scheme requires more computational effort than a τ_p^max- or P_d-based inversion, it is feasible in real time. The processing of 1 s of a single waveform through the filter bank takes on average 5 × 10^-4 s on a MacBook Pro laptop with a 2.5 GHz Core i7 processor. With a training data set of ∼190,000 preprocessed training waveforms, the parameter estimation with two horizontal and one vertical target traces takes 0.02 s on the same processor; that is, updates at a rate of 0.5 s are possible.
The performance of the algorithm depends mainly on the amount of data available at a given time and on the magnitude of the target event. Over the entire magnitude range, the parameter estimates reach higher precision at earlier times than existing algorithms, although for most algorithms only little information on estimation uncertainties is typically published. Brown et al. (2011) provide detailed uncertainty estimations and report similar residual distributions for the single-station estimates (−0.1 ± 0.5 using 3 s of data from the first station, versus −0.01 ± 0.47 from the GBA). When information from multiple stations is combined, the residuals of Brown et al. (2011) are not reduced substantially (e.g., −0.1 ± 0.5 using 3 s of data from the first four stations), whereas the corresponding Gutenberg estimation residuals reduce to −0.04 ± 0.25 with the same amount of data (not shown). However, the studies are not directly comparable because they used different data sets.
For all estimations, a distinct magnitude saturation is observed, in agreement with what has been reported by Rydelek et al. (2007). The onset of magnitude saturation depends on the time-window length that was used for the estimation. For windows <4 s, saturation starts at M ∼6.5. Higher magnitudes are estimated accurately only when longer time windows are used. We interpret this as an indication that amplitude-based EEW magnitude estimates are not in fact predictive, but observational. A magnitude can be recognized (and accurately estimated) if the maximum of the source time function has already been reached and is mapped into the portion of the seismogram that is used to make the estimation. This is an important observation in the context of an ongoing debate on whether earthquake rupture is a deterministic process (Ellsworth and Beroza, 1995; Olson and Allen, 2005; Colombelli et al., 2014) or not (Rydelek and Horiuchi, 2006; Rydelek et al., 2007). The results from this study support the nondeterministic case. This has important consequences for EEW in that, depending on the length of the time windows used, high magnitude estimates have to be considered minimum estimates until more data become available. The upside is that whether or not an estimate is a minimum estimate can be evaluated from the lengths of the available data snippets.
Furthermore, for events with M above the saturation magnitude of ∼6.5, EEW algorithms are faced with another issue: the finiteness of the source mechanism as well as the complexities of slip distributions become important for