DeepGEM: Generalized Expectation-Maximization
for Blind Inversion
Angela F. Gao
1
Jorge C. Castellanos
2
Yisong Yue
1
Zachary E. Ross
2
Katherine L. Bouman
1
1
Computing and Mathematical Sciences, California Institute of Technology
2
Caltech Seismological Laboratory, California Institute of Technology
{afgao, jcastellanos, yyue, zross, klbouman} @ caltech.edu
Abstract
Typically, inversion algorithms assume that a forward model, which relates a
source to its resulting measurements, is known and fixed. Using collected indirect
measurements and the forward model, the goal becomes to recover the source.
When the forward model is unknown, or imperfect, artifacts due to model mis-
match occur in the recovery of the source. In this paper, we study the problem
of blind inversion: solving an inverse problem with unknown or imperfect knowl-
edge of the forward model parameters. We propose DeepGEM, a variational
Expectation-Maximization (EM) framework that can be used to solve for the un-
known parameters of the forward model in an unsupervised manner. DeepGEM
makes use of a normalizing flow generative network to efficiently capture complex
posterior distributions, which leads to more accurate evaluation of the source’s
posterior distribution used in EM. We showcase the effectiveness of our DeepGEM
approach by achieving strong performance on the challenging problem of blind
seismic tomography, where we significantly outperform the standard method used
in seismology. We also demonstrate the generality of DeepGEM by applying it to
a simple case of blind deconvolution.
1 Introduction
Physics-based inversion methods typically recover an unknown source from indirect measurements
by assuming that the source and measurements are related via a known forward model [
15
,
7
,
30
]. For
example, non-blind deconvolution algorithms often assume that a measured blurry image is related to
its true sharp image via a known spatially-invariant blur kernel [
9
]; and traditional seismic inversion
methods assume that the spatially-varying velocity of the Earth’s interior is known
a priori
when
solving for an earthquake’s hypocenter [
30
]. However, these “known” forward models are generally
idealized and ignore intricacies of the systems that are either hard to capture or simply unknown.
Inversion algorithms with forward
model mismatch
result in biased reconstructions. For instance, bias
is regularly seen in non-blind deconvolution results, where reconstruction artifacts are often present
due to the use of an incorrect blur kernel [9].
In order to reduce the effects of model mismatch, in this paper we tackle the problem of
blind
inversion
: solving an inverse problem without knowledge of the underlying forward model parameters.
When considering how learning can help, a natural first idea might be to learn a direct map from
measurements to the desired source via supervised learning. However, such a “model-free” approach
is generally not practical, due to the lack of available ground truth training data. For example,
the blur kernel caused by handheld camera shake cannot be reproduced to get a training set of
35th Conference on Neural Information Processing Systems (NeurIPS 2021), .
sharp-blurry pairs to train a deconvolution approach. In seismic tomography, synthetic earthquakes
cannot be placed densely throughout the interior of the Earth to measure the ground response as a
function of hypocenter location. An alternative approach, which we adopt, is to develop unsupervised
methods that treat the true forward model as something that is unobserved and must be inferred.
An additional consideration is that we must solve for the true forward model from a single dataset,
without knowledge of the true source that produces the measurements
In this paper, we propose Deep Generalized Expectation-Maximization (DeepGEM) for solving
blind inverse problems. Using the indirect measurements as input, DeepGEM jointly estimates
the source and forward model that together produce the observed measurements. DeepGEM is a
variational inference based framework that makes use of deep learning machinery to easily capture and
optimize complex probabilistic distributions that cannot be easily integrated in analytic Expectation-
Maximization (EM) solutions. Our proposed framework is generic and can be applied to blind
inversion problems described by differentiable forward models. In Section 4 we showcase the
effectiveness of our DeepGEM approach on the challenging problem of blind seismic tomography,
where we significantly outperform methods used in seismology. We also demonstrate the generality
of DeepGEM by applying it to blind deconvolution in Section 5.
True Earth Velocity
Travel Time
Measurements
(true source
locs
.)
Baseline
Velocity Recon.
(recovered source
locs
.)
DeepGEM
Velocity Recon.
(recovered source
locs
.)
Velocity km/s
seconds
Velocity km/s
Velocity km/s
(a)
(b)
(c)
(d)
5 km
Figure 1:
DeepGEM applied to blind seismic tomography.
(a) A simulated cross section of the
Earth’s interior (velocity structure), along with the locations of receivers on the surface (red triangles)
that collect measurements. (b) The time it takes for a wave traveling from a source below the surface
to reach the specified receiver is visualized for each location in the region of interest. The overlaid
dots represent the true locations of simulated earthquakes and indicate the measured travel times
that constrain optimization. (c) The subsurface velocity reconstruction obtained using a baseline
approach optimized with the help of a seismologist. Note that the bright anomaly is missing from this
reconstruction. Overlaid dots represent the inferred earthquake locations. (d) DeepGEM reconstructed
subsurface velocity and inferred earthquake locations. Note that DeepGEM is able to accurately
recover the gradient of the velocity field as well as partially recover the central anomaly.
2 Background and related work
The joint optimization of a forward model with source recovery is a very challenging ill-posed
problem, leading to many possible solutions that are hard to disambiguate. For example, in blind
deconvolution the blurry image observed can be equivalently explained by a sharp source image
convolved with an extended kernel or a blurry source image convolved with a impulse kernel; to
prefer one solution over another, additional information, such as image priors, must be considered.
Previous work on inversion in poorly characterized systems focused on limited contexts, such as
spatially-invariant blind deconvolution [
21
,
22
,
14
,
10
] and CT with a simple rotational error [
35
].
These methods tend to be highly specialized to each application domain, and cannot be easily
generalized. In contrast, we propose a flexible model-based Bayesian framework that can be applied
across multiple differentiable blind inversion problems.
2.1 Model-based Bayesian inversion
In model-based inversion, unobserved sources
x
and observed measurements
y
are related through
a forward model:
y
=
f
(
x
)
. When the parameters of the underlying model are unknown, one can
parameterize the forward model as
f
θ
(
x
)
and then solve for the true model parameters
θ
∗
. A common
approach is to solve a
maximum a posteriori
(MAP) objective: either MAP
θ,x
or MAP
θ
.
2
MAP
θ,x
solves for the optimal point estimate of the pair
{
ˆ
θ,
ˆ
x
}
that maximizes a joint objective:
{
ˆ
θ,
ˆ
x
}
= arg max
θ,x
[log
p
(
θ,x
|
y
)] = arg max
θ,x
[log
p
(
y
|
θ,x
) + log
p
(
θ
) + log
p
(
x
)]
.
(1)
In practice, formal probabilistic definitions of
p
(
θ
)
and
p
(
x
)
are often unknown and replaced with
regularization terms (e.g., total variation, sparsity) [
22
,
21
]. Although
MAP
θ,x
provides a straightfor-
ward approach to solve for
θ
, the energy landscape of Eq. 1 is typically poorly behaved due to the
ill-posed nature of the problem [
22
]. As a result, optimization is likely to get stuck in a (bad) local
minimum.
MAP
θ
attempts to smooth the energy landscape by reducing the number of parameters that must be
optimized. This is done by solving for parameters of the forward model,
θ
, that perform best under
the full volume of possible
x
interpretations:
ˆ
θ
= arg max
θ
[log
p
(
θ
|
y
)] = arg max
θ
[
log
(
∫
x
p
(
y
|
θ,x
)
p
(
x
)
dx
)
+ log
p
(
θ
)
]
.
(2)
Since this marginalization integral is often intractable, Expectation-Maximization (EM) algorithms
have long been used for solving
MAP
θ
efficiently [
11
]. EM is an iterative algorithm that alternates
between: “E”-Step) calculating the posterior of
x
conditioned on the current estimated forward model
parameters
θ
(
t
−
1)
; and M-Step) updating
θ
to maximize the expected value of the log likelihood:
θ
(
t
)
= arg max
θ
[
E
x
∼
p
(
x
|
y,θ
(
t
−
1)
)
[log
p
(
y
|
θ,x
)] + log
p
(
θ
)
]
.
(3)
The advantage of
MAP
θ
over
MAP
θ,x
optimization for the blind deconvolution problem was de-
scribed in [
22
] and its success demonstrated via EM optimization in [
21
]. However, it is important
to note that evaluating the expectation in of Eq.
(3)
over complex distributions is often intractable.
For instance, the authors of [
21
] were forced to restrict the posterior distribution to a Gaussian distri-
bution. Alternatively, stochastic EM methods [
5
,
8
] bypasses the need to evaluate the expectation
directly, approximating it by sampling the distribution. In this paper, we solve
MAP
θ
using complex
distributions parameterized by deep neural networks.
Forward model parameterization
Model-based inversion requires that the parametric form of
f
θ
(
x
)
is well matched with the true forward model, which is not always known. Alternatively, neural
networks can be used to approximate the forward model, where
θ
represents the network weights.
This setup is flexible, in that it can be used to approximate arbitrarily complex forward models, with
the downside that it is often not interpretable when the parameters have no physical meaning. In
this work, we develop and make use of interpretable, physically-motivated deep neural networks to
parameterize
f
θ
(
·
)
for the problems of blind seismic tomography and blind deconvolution.
2.2 Deep variational distributions
We are interested in solving blind inverse problems using the EM algorithm to optimize the
MAP
θ
objective. Optimizing in this fashion requires the use of the posterior distribution
p
(
x
|
y,θ
(
t
−
1)
)
in
evaluating Eq. 3. As inverse problems are often ill-posed, we expect that the posterior distribution of
source
x
conditioned on the forward model parameters
θ
is likely to be complex and perhaps even
multi-modal. Therefore, it is important to be able to parameterize a flexible family of distributions to
best estimate this conditional posterior.
Deep Probabilistic Imaging (DPI) [
32
] uses a normalizing flow-based generative model,
G
φ
(
·
)
, to
solve for the uncertainty of an unknown source
x
given a fixed forward model
f
(
x
)
and measurements
y
. DPI solves the variational objective:
ˆ
φ
= arg min
φ
KL
(
q
φ
(
x
)
||
p
(
x
|
y
)) = arg min
φ
E
x
∼
q
φ
(
x
)
[
−
log
p
(
y
|
x
)
−
log
p
(
x
) + log
q
φ
(
x
)]
,
(4)
where
q
φ
(
x
)
is the implicit distribution defined by
G
φ
(
z
)
for
z
∈R
|
x
|
∼N
(0
,
1)
1
, and
log
p
(
y
|
x
)
∝
||
y
−
f
(
x
)
||
2
+
c
when there exists
i.i.d.
Gaussian noise on the measurements,
y
. After inference,
q
φ
(
x
)
can be efficiently sampled by evaluating
G
φ
(
z
)
for a Gaussian sample
z
.
The DPI variational objective is equivalent to the Variational Autoencoder [
17
,
28
] objective, except
with a fixed decoder,
f
(
x
)
. In practice, vanilla VAEs constrain the posterior to be a Gaussian
1
|
x
|
is the dimension of
x
3
distribution, relying on the reparameterization trick for tractable optimization. Alternatively, DPI
uses flow-based networks to efficiently sample and directly compute
q
φ
(
x
)
[
12
,
13
,
18
]. DPI’s use
of a flow-based network allows for complicated and multi-modal posterior distributions constrained
only by the space of possible invertible network architectures. Our proposed DeepGEM approach
utilizes similar tools to model flexible distributions over
x
, while simultaneously learning the forward
model parameters
θ
.
3 Methods
We propose a deep variational EM approach (DeepGEM) that optimizes the
MAP
θ
objective in Eq. 2
to recover the parameters of a forward model
f
θ
(
x
)
using only a single static set of measurements
y
.
Once learned, the updated forward model can then be used to estimate the posterior distribution of
the unknown source,
x
. DeepGEM iterates between two stages that are inspired by the standard EM
algorithm: (1) an “E”-step that learns a variational distribution,
q
φ
(
t
)
(
x
)
, to approximate the posterior
distribution of
x
given the current forward model parameters
θ
(
t
−
1)
, and (2) an M-step (refer to
Eq. 3) that solves for
θ
(
t
)
that maximizes the expected value of the log likelihood function of
θ
, with
respect to the posterior distribution
q
φ
(
t
)
(
x
)
estimated in the prior “E”-step. Each step is alternated
and solved to convergence. Since this is an EM algorithm (up to the variational approximation), our
method inherits properties of EM, including convergence to a local minimum of Eq. 2.
3.1 “Expectation” step (“E”-step)
The goal of DeepGEM’s “E”-step is to solve for the posterior distribution
p
(
x
|
y,θ
(
t
−
1)
)
that facilitates
optimizing Eq. 3. Because this posterior distribution can be very complex, and even multi-modal, we
propose to use a flexible variational approach to learn the parameters
φ
of an approximate posterior
distribution
q
φ
(
x
)
. The variational distribution
q
φ
(
x
)
can then be used to evaluate Eq. 3 via efficient
sampling.
Using DPI we solve for a flexible variational distribution,
q
φ
(
x
)
that well approximates the posterior
distribution
p
(
x
|
y,θ
(
t
−
1)
)
. DPI uses a normalizing flow network,
G
φ
(
z
)
, with input
z
∈
R
|
x
|
, where
x
=
G
φ
(
z
)
∼
q
φ
(
x
)
when
z
∼N
(0
,
1
)
. Normalizing flow networks allow for exact computation of
the log-likelihood
log
q
φ
(
x
)
needed to solve
φ
(
t
)
= arg min
φ
KL
(
q
φ
(
x
)
||
p
(
x
|
y,θ
(
t
−
1)
))
≈
arg min
φ
1
N
N
∑
n
=1
[
−
log
p
(
y
|
θ
(
t
−
1)
,x
n
)
−
log
p
(
x
n
) + log
q
φ
(
x
n
)]
for
x
n
=
G
φ
(
z
n
)
, z
n
∼N
(0
,
1
)
,
(5)
(as derived from Eq. 4) for a batch size of
N
where
log
p
(
x
)
is a prior on the source and
log
p
(
y
|
x,θ
(
t
)
)
is the data likelihood. When assuming the measurements
y
experience
i.i.d
additive Gaussian noise
with standard deviation
σ
y
,
log
p
(
y
|
θ
(
t
)
,x
n
) =
1
2
σ
2
y
‖
y
−
f
θ
(
t
)
(
x
n
))
‖
2
+
c
.
3.2 Maximization step (M-step)
The goal of DeepGEM’s M-step is to use the parameterized approximate posterior distribution,
q
φ
(
t
)
(
x
)
, from the prior “E”-step to update
θ
, the parameters of the unknown forward model
f
θ
(
·
)
.
This is achieved by sampling from the learned normalizing flow network,
G
φ
(
t
)
(
·
)
, to stochastically
solve Eq. 3 :
θ
(
t
)
≈
arg max
θ
[
1
N
N
∑
n
=1
[log
p
(
y
|
θ,x
n
)] + log
p
(
θ
)
]
for
x
n
=
G
φ
(
t
)
(
z
n
)
, z
n
∼N
(0
,
1
)
,
(6)
where
p
(
θ
)
is a prior on the forward model. This prior can be used to encourage the forward model
parameters to remain close to an initial model
̃
θ
by defining
log
p
(
θ
)
∝||
θ
−
̃
θ
||
2
+
c
.
4
4 Case study: blind seismic tomography
Two fundamental seismic inverse problems are spatially localizing an earthquake’s hypocenter (also
referred to as source localization) and tomographic reconstruction of the Earth’s subsurface [
30
].
These problems are interconnected: source localization relies on knowing how fast waves propagate
through different regions of the Earth’s interior, referred to as the Earth’s
velocity
structure. However,
in standard seismological practice, due to difficulties in solving these problems jointly, they are
generally treated separately: source localization is performed initially using oversimplified velocity
models [
30
], and then the tomography problem is performed by taking those best-fitting hypocenters
as ground truth [
26
]. This approach typically results in the need to over-regularize the inverse
problem by smoothing out high-frequency information [
2
], and can only be improved by carefully
incorporating other forms of information such as waveform-derived quantities [
16
,
23
,
4
,
24
] or
by performing costly experiments such as controlled explosions. In contrast, we demonstrate our
DeepGEM approach on blind seismic tomography, solving for the subsurface velocity (parameterized
by
θ
) when the source hypocenters,
x
, are unknown. Measurements,
y
, used to constrain the inverse
problem are the time it takes for the first wave to propagate from its source to a receiver on the Earth’s
surface, referred to as a
travel time
measurement (refer to Fig. 1).
4.1 Seismic tomography background
Physics of earthquake source localization:
The earthquake source location, also called the hypocen-
ter, is the location where the earthquake nucleates [
30
]. The source location can be triangulated using
travel times from multiple receivers. However, when there are very few receivers (
<
3
in 3D,
<
2
in
2D), the source localization is ill-posed and there exists multiple equally optimal solutions.
Physics of travel time tomography:
Travel time tomography is a method for reconstructing the
velocity structure using the absolute arrival times of earthquake waves from the earthquake to the
receiver [
30
]. Given perfect knowledge of the earthquake locations
x
and receivers
r
at every
position in the ground, the exact velocity can be computed by solving the Eikonal equation
V
(
r
) =
‖∇
r
T
(
x,r
)
‖
−
1
2
, where
T
(
x,r
)
is the travel time from an earthquake to a receiver. There exists an
analytical solution to the Eikonal equation; however, seismologists often use simplifications, such as
straight ray tomography, for efficiency [2].
Deep Learning for travel time tomography:
EikoNet [
31
] implicitly solves the Eikonal equation
[
33
] by learning a mapping from a source-receiver pair
(
x,r
)
to its associated travel time, learning
θ
such that
f
θ
(
x,r
)
≈
T
(
x,r
)
where
T
(
x,r
)
is the true travel time. The velocity structure can then be
extracted from the learned EikoNet by solving the Eikonal equation through automatic differentiation.
In particular, the computed velocity
V
(
s
)
at location
s
is
‖∇
s
f
θ
(
x,s
)
‖
−
1
2
. Note that this computed
velocity depends on the source location
x
. Ideally, the velocity should be invariant to the source
location; however, in practice, this is only true when EikoNet is trained with densely-sampled
(
x,r
)
pairs. Additionally, the travel time from a source to a receiver,
T
(
x,r
)
, should be identical to the
travel time from a receiver to a source,
T
(
r,x
)
, but remains unconstrained by EikoNet.
Homogenous
Gradient
Layer
DeepGEM
ℒ
Θ
Prior Models
Receiver(s)
True source
Prior mean of source
Posterior conditioned on Θ
Earthquake Source Localization Visualizations
(a)
(b)
10 km
Figure 2: (Left) A visualization of the homogeneous, gradient, and layer models used in
p
(
θ
)
’s
L
θ
term. (Right) Visualizations used to describe source configurations and earthquake source posteriors.
(a) Visualizations of the true and initialized source locations are plotted as stars (true) connected to
circles, which indicate the expected source locations according to
p
(
x
)
. Note that the expected source
locations deviate significantly from the true locations. (b) Visualizations of the learned posterior
distribution,
q
φ
(
x
|
y,θ
)
, for each source are plotted as colored histograms and overlaid with stars (of
the same color) indicating the true source location.
5
DeepGEM with Uncertain Source Locs.
No
ℒ
Θ
Prior
Homogenous
Gradient
Layer
Baselines
MAP
Θ, x
w/ Gradient Prior
Straight Ray
w/ Gradient
Init.
Known
Source
Locs
.
w/ No
ℒ
Θ
Prior
9 sources
25 sources
49 sources
True & Prior
Source
Locs
.
Figure 3:
DeepGEM reconstructions significantly outperform baselines, and improve with
more sources and better
L
θ
prior models.
Each row corresponds to the reconstructions obtained
from a single noisy observation of travel times, where the true Earth velocity is shown in Fig. 1(a).
Results shown are simulated using 20 surface receivers and a varying number of sources (9, 25,
and 49) in a uniform grid. Columns 3-6 show DeepGEM results obtained using different
L
θ
priors.
Note that results improve as the
L
θ
prior becomes closer to the true velocity structure, and as the
number of sources increases. As a reference, column 2 shows the velocity reconstruction obtained
using DeepGEM under fixed, perfectly known source locations. Columns 7-8 show results obtained
by the baseline approaches. The velocity reconstruction MSE is included in the top right of each
reconstruction. DeepGEM substantially outperforms both straight ray and
MAP
θ,x
baselines. Note
that DeepGEM also sometimes outperforms the second column, where the source locations are
known; this is due to the fact that DeepGEM is sometimes able to absorb measurement error due to
its overparameterization. See the supplemental material for more examples.
4.2 DeepGEM setup for blind seismic tomography
For blind seismic tomography, we parameterize the forward model using a modification of EikoNet,
f
θ
(
x,r
)
, with unknown source location
x
and known receiver location
r
as inputs and
y
≈
T
(
x,r
)
,
the absolute travel time between
x
and
r
, as output. In order to solve Eqs. 5 and 6 for an updated
forward model, we must define priors
p
(
x
)
and
p
(
θ
)
. The prior over source locations,
p
(
x
)
, is often
well defined, typically a Gaussian distribution
N
( ̄
x,σ
x
)
with a standard deviation of
σ
x
∼
2
km and
mean
̄
x
.
We construct a prior over the forward model,
p
(
θ
)
, that encourages EikoNet to learn a velocity close
to
̃
V
(
s
)
. Additionally, as discussed above, there are constraints specific to seismic tomography that
are not explicitly enforced through EikoNet’s architecture: (1) velocity reconstruction invariance
with respect to the source location, and (2) travel time symmetry between sources and receivers. We
augment the prior
p
(
θ
)
to include these constraints, implemented as soft constraints:
log
p
(
θ
) =
λ
θ
L
θ
︷
︸︸
︷
∑
r
∈R
,
s
∈S
||
̃
V
(
s
)
−
V
r
(
s
)
||
2
+
λ
V
L
V
︷
︸︸
︷
∑
r
i
,r
j
∈R
,
s
∈S
∥
∥
V
r
i
(
s
)
−
V
r
j
(
s
)
∥
∥
2
+
λ
T
L
T
︷
︸︸
︷
∑
r
∈R
,
s
∈S
‖
T
(
s,r
)
−
T
(
r,s
)
‖
2
.
(7)
The velocity constraint is represented through
L
V
and travel time constraint through
L
T
, with
corresponding hyperparameters
λ
V
and
λ
T
.
S
is a set of points sampled uniformly from the region
of interest,
R
is the set of all receiver locations, and
V
r
(
s
) =
||∇
s
f
θ
(
r,s
)
||
−
1
2
.
Implementation details:
In our experiments we define a realistic prior over unknown source
locations as
p
(
x
) =
N
( ̄
x,σ
x
)
, where
̄
x
∼N
(
x,σ
x
)
and
σ
x
= 2
km. We assume the measurements
y
are sampled from a Gaussian distribution with mean
̃
y
– true travel times computed using the
package
eikonalfm
[
29
,
33
] – and a realistic standard deviation of
σ
y
= 0
.
2
seconds. To simulate
real world passive tomography, we assume receivers are located only along the surface (the top edge
of the 2D image) of the region of interest, which is
20
km
×
20
km, and sources can be anywhere
within the region of interest. Before DeepGEM optimization, we first initialize
θ
using maximum
likelihood estimation assuming
̄
x
is the true source location.
6
Source
Locs
. 1
Source
Locs
. 2
Source
Locs
. 3
Source
Locs
. 4
Source
Locs
. 5
Known
Source
Locs
. w/
Gradien
t
ℒ
Θ
Prior
Uncertain
Source
Locs
. w/
Gradient
ℒ
Θ
Prior
Source
Locs
.
Posterior
Source
Count
Velocity
Error
(km/s)
Source
Loc. Error
(m)
9
0.52
±
0.063
62.88
±
21.77
25
0.42
±
0.043
36.62
±
4.93
49
0.44
±
0.181
33.78
±
4.56
100
0.27
±
0.021
30.38
±
1.43
Table 1: Velocity and source
localization error on random
source configurations
Figure 4:
DeepGEM consistently recovers prominent features across various earthquake
source configurations.
Row 2 shows DeepGEM results obtained using different random source
configurations, where the true Earth velocity is shown in Fig. 1(a). As a reference, row 1 shows the
velocity reconstruction obtained using DeepGEM under fixed, perfectly known source locations. As
can be seen by the table, both velocity and localization error decrease with an increasing number
of sources in the region of interest. Each mean and standard deviation is computed using 5 random
realizations of true source configurations.
The posterior distribution of the source locations,
x
, is estimated using a Real-NVP network
G
φ
(
·
)
with 16 affine coupling layers. An updated EikoNet (described in the supplemental material) has been
modified to parameterize
f
θ
(
x
)
. This EikoNet is pretrained with samples from the prior
p
(
x
)
as input
and the simulated travel time measurements as output. We use Adam as the optimizer [
36
] with a
batch size of 32 and an E-step learning rate of 1e-3 and M-step learning rate of 5e-5. Hyperparameters
(
λ
T
,λ
V
,λ
θ
)
were empirically chosen by inspecting the loss on a grid search over hyperparameters.
Results presented have been run with 10 EM iterations, each with 800 “E”-step epochs and 2000
M-step epochs. Each DeepGEM model takes
∼
6 hours on a NVIDIA Quatro RTX 5000.
4.3 Results
Comparison to Baseline Approaches:
We compare results from DeepGEM to results obtained
using a baseline run by a seismologist. This iterative baseline alternates between source inversion
and straight ray tomography, and is the standard approach used for blind tomography. Further detail
on this baseline is provided in the supplemental material. The gradient model shown in Fig. 2 is used
to perform the initial source inversion for this baseline; nonetheless, we find that the solution quickly
diverges. Therefore, we rely on the expertise of the domain expert to decide when to terminate
the optimization. As seen in Fig. 3, DeepGEM consistently outperforms this human-in-the-loop
baseline across all source configurations. In Fig. 3 we also compare with a
MAP
θ,x
solution.
MAP
θ,x
is consistently outperformed by the DeepGEM
MAP
θ
approach across all source configurations.
Additional results highlighting the sensitivity of
MAP
θ,x
to hyperparameters and initialization are
shown in the supplemental material.
Sensitivity to
L
θ
Prior Choice:
We evaluate DeepGEM’s recovery of the true velocity structure
shown in Fig. 1 using one of three different
L
θ
priors shown in Fig. 2: homogeneous, gradient, and
layer, as well as
L
θ
= 0
. The homogeneous model takes on value of 6.419 km/s, the average velocity
value of the true velocity structure. The gradient captures the increasing velocity as depth increases,
and the layer model represents the true model without the added anomaly. As shown in Fig. 3, the
mean squared error tends to decrease with the gradient and layer
L
θ
prior models, which are closer to
the true velocity structure.
Sensitivity to Source Configuration:
We evaluate results with sources that are both uniformly
and randomly spaced throughout the region of interest. Improved performance is expected when
the number of sources is increased and/or when sources are well distributed. To better understand
DeepGEM’s performance we introduce an ablation, where the velocity structure is learned by training
f
θ
(
·
)
with access to perfect source locations
x
(i.e.,
p
(
x
)
is a delta function). As is shown in Fig 3,
even when training with true source locations, the anomaly is not well resolved until
∼
49 sources.
7