SIAM/ASA J. Uncertainty Quantification
Vol. 11, No. 2, pp. 480--513
© 2023 Society for Industrial and Applied Mathematics and American Statistical Association
Convergence Rates for Learning Linear Operators from Noisy Data*

Maarten V. de Hoop†, Nikola B. Kovachki‡, Nicholas H. Nelsen§, and Andrew M. Stuart§
Abstract.
This paper studies the learning of linear operators between infinite-dimensional Hilbert spaces. The
training data comprises pairs of random input vectors in a Hilbert space and their noisy images under
an unknown self-adjoint linear operator. Assuming that the operator is diagonalizable in a known
basis, this work solves the equivalent inverse problem of estimating the operator's eigenvalues given
the data. Adopting a Bayesian approach, the theoretical analysis establishes posterior contraction
rates in the infinite data limit with Gaussian priors that are not directly linked to the forward
map of the inverse problem. The main results also include learning-theoretic generalization error
guarantees for a wide range of distribution shifts. These convergence rates quantify the effects of data
smoothness and true eigenvalue decay or growth, for compact or unbounded operators, respectively,
on sample complexity. Numerical evidence supports the theory in diagonal and nondiagonal settings.
Key words. operator learning, linear inverse problems, Bayesian inference, posterior consistency, statistical learning theory, distribution shift

MSC codes. 62G20, 62C10, 68T05, 47A62

DOI. 10.1137/21M1442942
1. Introduction. The supervised learning of operators between Hilbert spaces provides a natural framework for the acceleration of scientific computation and discovery. This framework can lead to fast surrogate models that approximate expensive existing models or to the discovery of new models that are consistent with observed data when no first principles model exists. To develop some of the fundamental principles of operator learning, this paper concerns (Bayesian) nonparametric linear regression under random design. Let $H$ be a real infinite-dimensional Hilbert space and $L$ be an unknown---possibly unbounded and in general densely defined on $H$---self-adjoint linear operator from its domain in $H$ into $H$ itself. We study the following linear operator learning problem.
*Received by the editors August 30, 2021; accepted for publication (in revised form) November 1, 2022; published electronically May 11, 2023. https://doi.org/10.1137/21M1442942
Funding: The first author is supported by the Simons Foundation under the MATH + X program, U.S. Department of Energy, Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under grant DE-SC0020345, the National Science Foundation (NSF) under grant DMS-1815143, and the corporate members of the Geo-Mathematical Imaging Group at Rice University. The third author is supported by the NSF Graduate Research Fellowship Program under grant DGE-1745301. The fourth author is supported by NSF (grant DMS-1818977) and AFOSR (MURI award FA9550-20-1-0358---Machine Learning and Physics-Based Modeling and Simulation). The second, third, and fourth authors are supported by NSF (grant AGS-1835860) and ONR (grant N00014-19-1-2408).
†Simons Chair in Computational and Applied Mathematics and Earth Science, Rice University, Houston, TX 77005 USA (mdehoop@rice.edu).
‡NVIDIA AI, NVIDIA, Santa Clara, CA 95051 USA (nkovachki@nvidia.com).
§Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125 USA (nnelsen@caltech.edu, astuart@caltech.edu).
Main Problem. Let $\{x_n\} \subset H$ be random design vectors and $\{\xi_n\}$ be noise vectors. Given the training data pairs $\{(x_n, y_n)\}_{n=1}^{N}$ with sample size $N \in \mathbb{N}$, where

(1.1)    $y_n = L x_n + \gamma \xi_n$  for  $n \in \{1, \ldots, N\}$  and  $\gamma > 0$,

find an estimator $L^{(N)}$ of $L$ that is accurate when evaluated outside of the samples $\{x_n\}$.
The estimation of $L$ from the data (1.1) is generally an ill-posed linear inverse problem [31]. In principle, the chosen reconstruction procedure should be consistent: the estimator $L^{(N)}$ converges to the true $L$ as $N \rightarrow \infty$. The rate of this convergence is equivalent to the sample complexity of the estimator, which determines the efficiency of statistical estimation. The sample complexity $N(\varepsilon) \in \mathbb{N}$ is the number of samples required for the estimator to achieve an error less than a fixed tolerance $\varepsilon > 0$. It quantifies the difficulty of Main Problem.
In modern scientific machine learning problems where operator learning is used, the demand on data from different operator learning architectures often outpaces the availability of computational or experimental resources needed to generate the data. Ideally, theoretical analysis of sample complexity should reveal guidelines for how to reduce the requisite data volume. To that end, the broad purpose of this paper is to provide an answer to the question: what factors can reduce sample size requirements for linear operator learning?
Our goal is not to develop a practical procedure to regress linear operators between infinite-
dimensional vector spaces. Various methods already exist for that purpose, including those
based on (functional) principal component analysis (PCA) [13, 26, 39]. Instead, we aim to
strengthen the rather sparse but slowly growing theoretical foundations of operator learning.
We overview our approach to solve Main Problem in subsection 1.1. We summarize one of our main convergence results in subsection 1.2. In subsection 1.3, we illustrate examples to which our theory applies. Subsection 1.4 surveys work related to ours. The primary contributions of this paper and its organization are given in subsections 1.5 and 1.6, respectively.
1.1. Key ideas.
In this subsection, we communicate the key ideas of our methodology at
an informal level and distinguish our approach from similar ones in the literature.
1.1.1. Operator learning as an inverse problem. We cast Main Problem as a Bayesian inverse problem with a linear operator as the unknown object to be inferred from data. Suppose the input training data $\{x_n\}$ from (1.1) are independent and identically distributed (i.i.d.) according to a (potentially unknown) centered probability measure $\nu$ on $H$ with a finite second moment. Let $\Lambda \colon H \rightarrow H$ be the covariance operator of $\nu$ with orthonormal eigenbasis $\{\phi_k\}$. Let the $\{\xi_n\}$ be i.i.d. $\mathcal{N}(0, \mathrm{Id}_H)$ Gaussian white noise processes independent of $\{x_n\}$.
Writing $Y = (y_1, \ldots, y_N)$, $X = (x_1, \ldots, x_N)$, and $\Xi = (\xi_1, \ldots, \xi_N)$ yields the concatenated data model

(1.2)    $Y = K_X L + \gamma \Xi$.
The forward operator of this inverse problem is $K_X \colon T \mapsto (T x_1, \ldots, T x_N)$. Under a Gaussian prior $L \sim \mathcal{N}(0, \Sigma)$, the solution is the Gaussian posterior $L \mid (X, Y)$. For a fixed orthonormal basis $\{\varphi_j\}$ of $H$, it will be convenient to identify (1.2) with the countable inverse problem

(1.3)    $y_{jn} = \sum_{k=1}^{\infty} x_{kn} L_{jk} + \gamma \xi_{jn}$  for  $j \in \mathbb{N}$  and  $n \in \{1, \ldots, N\}$,

where $\xi_{jn} \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, 1)$, $x_{kn} = \langle \phi_k, x_n \rangle_H$, and $L_{jk} = \langle \varphi_j, L \phi_k \rangle_H$. See subsection 2.2.2 for details.
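To make the coordinate representation (1.3) concrete, the following sketch (not from the paper) simulates a finite truncation of the model: design coefficients $x_{kn}$ are drawn from a Gaussian measure whose covariance eigenvalues decay like $k^{-2\alpha}$ (an illustrative Matérn-like choice, consistent with the condition (1.7) appearing later), a hypothetical truncated operator matrix $L_{jk}$ is applied, and white noise is added. All numerical values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative truncation level, sample size, and parameters (assumptions).
J, N = 64, 200          # retained basis coefficients, number of training pairs
alpha, gamma = 2.0, 0.1
k = np.arange(1, J + 1)

# Design coefficients x_{kn} = <phi_k, x_n>_H for x_n ~ N(0, Lambda) with
# <phi_k, Lambda phi_k>_H = k^{-2 alpha} (truncated Karhunen-Loeve expansion).
X = k[:, None] ** (-alpha) * rng.standard_normal((J, N))

# A hypothetical truncated operator matrix L_{jk} = <varphi_j, L phi_k>_H:
# a compact, self-adjoint (symmetric tridiagonal) example for illustration.
L = np.diag(k ** (-2.0)) + 0.05 * (np.eye(J, k=1) + np.eye(J, k=-1))

# Noisy outputs y_{jn} = sum_k L_{jk} x_{kn} + gamma xi_{jn}, as in (1.3).
Y = L @ X + gamma * rng.standard_normal((J, N))
print(X.shape, Y.shape)  # (J, N) coefficient arrays for inputs and outputs
```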
1.1.2. Comparison to nonparametric inverse problems. In contrast, most theoretical studies of Bayesian inverse problems concern the estimation of a vector $f \in H_1$ from data

(1.4)    $Y' = K f + N^{-1/2} \xi$,

where $\xi \sim \mathcal{N}(0, \mathrm{Id}_{H_2})$ and $K \colon H_1 \rightarrow H_2$ is a known bounded linear operator between Hilbert spaces $H_1$ and $H_2$. This is a signal in the white noise model. Under a prior on $f$, the asymptotic behavior of the posterior $f \mid Y'$ as the noise tends to zero ($N \rightarrow \infty$) is of primary interest. Many analyses of (1.4) consider the singular value decomposition (SVD) of $K$ [6, 8, 23, 44, 65]. Projecting $f$ into its coordinates $\{f_k\}$ in the basis of right singular vectors $\{\phi'_k\}$ of $K$ and writing $\{Y'_j\}$ for observations of the stochastic process $Y'$ on the basis of left singular vectors of $K$ yields

(1.5)    $Y'_j = \kappa_j f_j + N^{-1/2} \xi_j$  for  $j \in \mathbb{N}$,

where the $\{\xi_j\}$ are i.i.d. $\mathcal{N}(0, 1)$ and $\{\kappa_j\}$ are the singular values of $K$. Obtaining a sequence space model of this form is always possible if $K$ is a compact operator [23, sect. 1.2].
Some notable differences between the traditional inverse problem (1.4) and the operator learning inverse problem (1.2) are evident. Equation (1.2) is directly tied to (functional) regression, while (1.4) is not. The unknown $f$ is a vector, while $L$ is an unknown operator. A more major distinction is that $K$ in (1.4) is deterministic and arbitrary, while $K_X$ in (1.2) is a random forward map defined by point evaluations. Their sequence space representations also differ. Equation (1.5) is diagonal with a singly indexed unknown $\{f_j\}$, while (1.3) is nondiagonal (because the SVD of $K_X$ was not invoked) with a doubly indexed unknown $\{L_{jk}\}$. Thus, our work deviates significantly from existing studies.
1.1.3. Diagonalization leads to eigenvalue learning. The technical core of this paper concerns the sequence space representation (1.3) of Main Problem in the ideal setting that a diagonalization of $L$ is known.

Assumption 1.1 (diagonalizing eigenbasis given for $L$). The unknown linear operator $L$ from Main Problem is diagonalized in the known orthonormal basis $\{\varphi_j\}_{j \in \mathbb{N}} \subset H$.
Under this assumption and denoting the eigenvalues of $L$ by $\{l_j\}$, (1.3) simplifies to

(1.6)    $y_{jn} = \langle \varphi_j, x_n \rangle_H \, l_j + \gamma \xi_{jn}$  for  $j \in \mathbb{N}$  and  $n \in \{1, \ldots, N\}$.

In general, the random coefficient $\langle \varphi_j, x_n \rangle_H$ depends on every $\{x_{kn}\}_{k \in \mathbb{N}}$ from (1.3). To summarize, under Assumption 1.1 we obtain a white noise sequence space regression model with correlated random coefficients. Inference of the full operator is reduced to only that of its eigenvalue sequence. Equation (1.6) is at the heart of our analysis of linear operator learning. The convergence results we establish for this model may also be of independent interest.
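For intuition about how (1.6) is used, the following sketch (not from the paper) works out the coordinatewise Gaussian posterior that this model induces when the eigenvalues carry a product prior $\bigotimes_j \mathcal{N}(0, \sigma_j^2)$ of the form used later in Theorem 1.3. The closed-form mean and variance below are the standard conjugate Gaussian formulas for this likelihood, conditioned on the design; the paper's precise posterior characterization and contraction analysis appear in later sections. All parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative truncation, sample size, and parameters (assumptions).
J, N = 64, 500
alpha, p, gamma = 2.0, 1.5, 0.1
j = np.arange(1, J + 1)

# True eigenvalues l_j^dagger of a hypothetical compact self-adjoint operator.
l_true = j ** (-2.0)

# Design coefficients g_{jn} = <varphi_j, x_n>_H for Gaussian inputs whose
# covariance eigenvalues decay like j^{-2 alpha}, and data from (1.6):
# y_{jn} = g_{jn} l_j^dagger + gamma xi_{jn}.
G = j[:, None] ** (-alpha) * rng.standard_normal((J, N))
Y = G * l_true[:, None] + gamma * rng.standard_normal((J, N))

# Product Gaussian prior l_j ~ N(0, sigma_j^2) with sigma_j^2 = j^{-2p}.
sigma2 = j ** (-2.0 * p)

# Conjugate Gaussian posterior for each eigenvalue, conditioned on the design:
#   variance v_j = (1/sigma_j^2 + sum_n g_{jn}^2 / gamma^2)^{-1},
#   mean     m_j = v_j * (sum_n g_{jn} y_{jn}) / gamma^2.
v_post = 1.0 / (1.0 / sigma2 + (G ** 2).sum(axis=1) / gamma ** 2)
m_post = v_post * (G * Y).sum(axis=1) / gamma ** 2

# Posterior expected squared error: a truncated analogue of the left-hand
# side of (1.8) with alpha' = 0.
print(np.sum((l_true - m_post) ** 2 + v_post))
```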
Our proof techniques in this diagonal setting closely follow those in the paper [44], which
studies posterior contraction for (1.5) in a simultaneously diagonalizable Bayesian setting.
However, our work exhibits some crucial differences with [44] which we now summarize.
(D1) (forward map) The coefficients $\{\langle \varphi_j, x_n \rangle_H\}$ in our problem (1.6) are random variables (r.v.s), while in [44] the singular values $\{\kappa_j\}$ in (1.5) are fixed by $K$. Also, the law of $\{\langle \varphi_j, x_n \rangle_H\}$ may not be known in practice; only the samples $\{x_n\}$ may be given.
(D2) (link condition) Unlike in [44], our prior covariance operator $\Sigma$ is not linked to the SVD of the forward map $K_X$. That is, we do not assume simultaneous diagonalizability.
(D3) (prior support) The Gaussian prior we induce on $\{l_j\}$ is supported on a (potentially) much larger sequence space in the scale $\mathcal{H}^s$ (relative to $\{\varphi_j\}$ with $s \in \mathbb{R}$),^1 instead of just the space $\ell^2(\mathbb{N}; \mathbb{R})$ (relative to $\{\phi'_k\}$) charged by the prior on $\{f_j\}$ in [44].
(D4) (reconstruction norm) Solution convergence for (1.6) is in $\mathcal{H}^{-s}$ norms relative to $\{\varphi_j\}$, while only the $\ell^2(\mathbb{N}; \mathbb{R})$ norm relative to $\{\phi'_k\}$ (i.e., the $H_1$ norm) is considered in [44].
These differences deserve further elaboration.
Item (D1): If $x_n \in H$ almost surely (a.s.), then $\langle \varphi_j, x_n \rangle_H \rightarrow 0$ a.s. as $j \rightarrow \infty$ in (1.6), just as $\kappa_j \rightarrow 0$ if $K$ in (1.4) is compact. However, we later observe that our $K_X$ is not compact.
Item (D2): The authors in [44] assume that the eigenbasis of the prior covariance of $f$ is precisely $\{\phi'_k\}$, the right singular vectors of $K$ in (1.4). This direct link condition between the prior and $K$ ensures that the implied prior (and posterior) on $\{f_j\}$ is an infinite product measure. Our analysis of (1.6) still induces an infinite product prior on $\{l_j\}$ without using the SVD of the forward operator $K_X$. Instead, we make mild assumptions that only weakly link $K_X$ to the prior covariance operator $\Sigma$. See (1.7) for a relevant smoothness condition.
Item (D3): The reason we work with a sequence prior having support on sets larger than $\ell^2$ is to include unbounded operators (with eigenvalues $|l_j| \rightarrow \infty$ as $j \rightarrow \infty$) in the analysis.^2
Item (D4): Only the $H_1$ estimation error is considered in [44] because the unknown quantity is a vector $f \in H_1$. Since our unknown is an operator, we also consider the prediction error [21] on new test inputs (see subsection 2.2.5). This relates to the $\mathcal{H}^{-s}$ norms in (D4).
1.2. Main result. Here and in the following we assume that there is some fixed ground truth operator that underlies the observed output data.

Assumption 1.2 (true linear operator). The data $Y$, observed as $\{y_{jn}\}$ in (1.6), are generated according to (1.2) for a fixed self-adjoint linear operator $L = L^\dagger$ with eigenvalues $\{l_j^\dagger\}$.

Under (1.6), we study the performance of the posterior $\{l_j\} \mid (X, Y)$ (and related point estimators) for estimating the true $\{l_j^\dagger\}$ in the limit of infinite data. The following concrete theorem is representative of more general convergence results established later in the paper.
Theorem 1.3 (asymptotic convergence rate with Gaussian design). Suppose Assumptions 1.1 and 1.2 hold with $\{l_j^\dagger\} \in \mathcal{H}^s$ for some $s \in \mathbb{R}$. Let $\nu = \mathcal{N}(0, \Lambda)$ be a Gaussian measure satisfying

(1.7)    $c_1^{-1} j^{-2\alpha} \leq \langle \varphi_j, \Lambda \varphi_j \rangle_H \leq c_1 j^{-2\alpha}$  for all sufficiently large $j \in \mathbb{N}$

for some $c_1 \geq 1$ and $\alpha > 1/2$. Let $\bigotimes_{j=1}^{\infty} \mathcal{N}(0, \sigma_j^2)$ be the prior on $\{l_j\}$ in (1.6) with variances $\{\sigma_j^2\}$ satisfying $c_2^{-1} j^{-2p} \leq \sigma_j^2 \leq c_2 j^{-2p}$ for all sufficiently large $j \in \mathbb{N}$ for some $c_2 \geq 1$ and $p \in \mathbb{R}$. Denote by $P^{D_N}$ the posterior distribution for $\{l_j\}$ arising from the observed data $D_N := (X, Y)$. Fix $\alpha' \in [0, \alpha + 1/2)$. If $\min\{\alpha, \alpha'\} + \min\{p - 1/2, s\} > 0$, then there exists a constant $C > 0$, independent of the sample size $N$, such that

(1.8)    $\mathbb{E}^{D_N}\, \mathbb{E}^{\{l_i^{(N)}\}_{i=1}^{\infty} \sim P^{D_N}} \sum_{j=1}^{\infty} j^{-2\alpha'} \bigl| l_j^\dagger - l_j^{(N)} \bigr|^2 \leq C N^{-(\alpha' + \min\{p - 1/2,\, s\})/(\alpha + p)}$

for all sufficiently large $N$. The first expectation in (1.8) is over the joint law of $D_N$.

^1 The Sobolev-like sequence Hilbert spaces $\mathcal{H}^s = \mathcal{H}^s(\mathbb{N}; \mathbb{R})$ are defined for $s \in \mathbb{R}$ by $\mathcal{H}^s(\mathbb{N}; \mathbb{R}) := \{ v \colon \mathbb{N} \rightarrow \mathbb{R} \mid \sum_{j=1}^{\infty} j^{2s} |v_j|^2 < \infty \}$. They are equipped with the natural $\{j^s\}$-weighted $\ell^2(\mathbb{N}; \mathbb{R})$ inner product and norm. We will usually interpret these spaces as defining a smoothness scale [38, sect. 2] of vectors relative to the orthonormal basis $\{\varphi_j\}$ of $H$.
^2 Note, however, that unbounded operators with continuous spectra [25] are beyond the scope of this paper.

[Figure 1 (plots omitted): Fundamental principles of linear operator learning. The theoretical convergence rate exponents (from Theorem 3.5) corresponding to unbounded ($-\Delta$, $s < -2.5$), bounded ($\mathrm{Id}$, $s < -1/2$), and compact ($(-\Delta)^{-1}$, $s < 1.5$) true operators are displayed (see principle (P1) and subsection 4.1). With $p = s + 1/2$, Figure 1(a) ($\alpha' = 4.5$; varying $\alpha$ and $s$) and Figure 1(b) ($\alpha = 4.5$; varying $\alpha'$ and $s$) illustrate the effects that varying input training data and test data smoothness have on convergence rates, respectively (principles (P2) and (P3)). Figure 1(c) (varying $p$, $\alpha$, and $s$) shows that learning the unbounded "inverse map" $-\Delta$ (with $\alpha = \alpha' = 4.5$) is always harder than learning the compact "forward map" $(-\Delta)^{-1}$ (with $\alpha = \alpha' = 2.5$) as the shift $z = p - s - 1/2$ in prior regularity is varied (subsection 1.4).]
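To connect the scale $\mathcal{H}^s$ from footnote 1 with the three model operators in the Figure 1 caption, the sketch below (not from the paper) computes truncated $\mathcal{H}^s$ norms of eigenvalue sequences that grow like $j^2$, stay constant, or decay like $j^{-2}$; the one-dimensional ordering, the truncation level, and the particular values of $s$ (chosen just inside the ranges $s < -2.5$, $s < -1/2$, and $s < 1.5$ quoted in the caption) are assumptions of this illustration.

```python
import numpy as np

def sobolev_norm(v: np.ndarray, s: float) -> float:
    """Truncated H^s(N; R) norm (sum_j j^{2s} |v_j|^2)^{1/2}; cf. footnote 1."""
    j = np.arange(1, v.size + 1, dtype=float)
    return float(np.sqrt(np.sum(j ** (2.0 * s) * np.abs(v) ** 2)))

# Eigenvalue sequences with growth j^2 (unbounded, -Laplacian-like in 1-D),
# constant 1 (bounded, identity), and decay j^{-2} (compact, inverse-Laplacian-
# like). Their H^s norms are finite only for s < -2.5, s < -1/2, and s < 1.5,
# respectively; above these thresholds the truncated norms diverge as the
# truncation level grows.
j = np.arange(1.0, 100_001.0)
sequences = {"unbounded": j ** 2, "bounded": np.ones_like(j), "compact": j ** (-2)}

for name, l in sequences.items():
    norms = [sobolev_norm(l, s) for s in (-2.6, -0.6, 1.4)]
    print(name, [f"{x:.3e}" for x in norms])
```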
Equation (1.8) shows that, on average, posterior sample eigenvalue estimates converge in $\mathcal{H}^{-\alpha'}$ to the true eigenvalues of $L^\dagger$ in the infinite data limit. The hypothesis (1.7), which controls the regularity of the data $\{x_n\}$, is immediately satisfied if, e.g., $\Lambda$ is a Matérn-like covariance operator with eigenvectors $\{\varphi_j\}$. Theorem 1.3, whose proof is in Appendix A, is a consequence of Theorem 3.3, which is valid for a much larger class of input data measures.

Nonetheless, Theorem 1.3 nearly tells the whole story. The convergence rate exponent in (1.8) shows that the regularity of the ground truth, data, and prior each have an influence on sample complexity. Figure 1 summarizes this complex relationship. The figure, and this paper more generally, reveals three fundamental principles of (linear) operator learning:
(P1) (smoothness of outputs) The ground truth operator becomes statistically more efficient to learn whenever the smoothness of its (noise-free) outputs increases. Moreover, as the degree of smoothing of the operator increases, sample complexity improves.
(P2) (smoothness of inputs) Decreasing the smoothness of input training data improves sample complexity (in norms that do not depend on the training distribution itself).^3
(P3) (distribution shift) As the smoothness of samples from the input test distribution increases, average out-of-distribution prediction error improves.

^3 If the norm used to measure error depends on the training data distribution, this may no longer be true. For example, in-distribution error (train and test on the same distribution) would correspond in Theorem 1.3 to setting $\alpha' = \alpha$ (see subsection 2.2.5). In this case, increasing $\alpha$ would improve sample complexity.
Below, we discuss how the principles (P1) to (P3) manifest in Theorem 1.3 and Figure 1.
Item (P1): In Theorem 1.3, the left side of (1.8) is equivalent to the expected prediction error over some input test distribution (see subsection 2.2.5 for details). Increasing $\alpha'$ increases the regularity of test samples. Assuming for simplicity that $s = p - 1/2$, the convergence rate in (1.8) is $N^{-(\alpha' + s)/(\alpha + s + 1/2)}$ as $N \rightarrow \infty$. Thus, besides large $\alpha'$, it is beneficial to have large regularity exponents $\alpha' + s$ of the operator's evaluation on sampled test inputs or large regularity exponents $s$ of the true operator's eigenvalues. Indeed, Figures 1(a) to 1(c) suggest that unbounded operators (whose eigenvalues grow without bound) are more difficult to learn than bounded (eigenvalues remain bounded) or compact ones (eigenvalues decay to zero).
Item (P2): Training inputs with low smoothness are favorable. This is quantified in Theorem 1.3 by decreasing $\alpha$, which means that the $\{x_n\}$ become "rougher" (Figure 1(a)).
Item (P3): Figure 1(b) illustrates that increasing $\alpha'$ in Theorem 1.3 improves the error.
We reinforce Items (P1) to (P3) throughout the rest of the paper.
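As a numerical companion to Figure 1 and the principles (P1) to (P3), the sketch below (not from the paper) evaluates the convergence rate exponent $(\alpha' + \min\{p - 1/2, s\})/(\alpha + p)$ from (1.8) for the three operators in the figure caption, using the caption's choices $p = s + 1/2$ and $\alpha = \alpha' = 4.5$. The specific values of $s$, taken just inside the admissible ranges $s < -2.5$, $s < -1/2$, and $s < 1.5$, are assumptions of this illustration.

```python
def rate_exponent(alpha: float, alpha_prime: float, s: float, p: float) -> float:
    """Exponent r such that the bound in (1.8) decays like N^{-r}, valid when
    alpha' is in [0, alpha + 1/2) and min(alpha, alpha') + min(p - 1/2, s) > 0."""
    assert 0.0 <= alpha_prime < alpha + 0.5, "alpha' outside [0, alpha + 1/2)"
    assert min(alpha, alpha_prime) + min(p - 0.5, s) > 0.0, "outside the theorem's regime"
    return (alpha_prime + min(p - 0.5, s)) / (alpha + p)

# Caption choices p = s + 1/2 and alpha = alpha' = 4.5; the s values below are
# illustrative picks just inside s < -2.5 (unbounded), s < -1/2 (bounded), and
# s < 1.5 (compact). Rougher true eigenvalue sequences (smaller s) give a
# smaller exponent, i.e., a slower rate, consistent with principle (P1).
for name, s in [("unbounded", -2.6), ("bounded", -0.6), ("compact", 1.4)]:
    print(f"{name:9s}  rate exponent = {rate_exponent(4.5, 4.5, s, s + 0.5):.3f}")
```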
1.3. Examples.
Although quite a strong assumption, the known diagonalization from
Assumption 1.1 is still realizable in practice. For instance, there may be prior knowledge
that the data covariance operator commutes with the true operator (and hence shares the
same eigenbasis) or that the true operator obeys known physical principles (e.g., commutes
with translation or rotation operators). Regarding the latter, in [62] the authors infer the
eigenvalues of a differential operator closure for an advection-diffusion model from indirect
observations. As in [72], the operator could be known up to some uncertain parameter.
This is the case for several smoothing forward operators that define commonly studied linear
inverse problems, including the severely ill-posed inverse boundary problem for the Helmholtz
equation with unknown wavenumber parameter [8, sect. 5] or the inverse heat equation with
unknown scalar diffusivity parameter [72, sect. 6.1]. In both references, the eigenbases are
already known. Thus, our learning theory applies to these uncertain operators: taking $s$ and $p$ large enough in (1.8) yields prediction error rates of convergence as close to $N^{-1}$ as desired. More concretely, the theory in this paper may be applied directly to the following examples.
1.3.1. Blind deconvolution. Periodic deconvolution on the $d$-dimensional torus $\mathbb{T}^d$ is a linear inverse problem that arises frequently in the imaging sciences. The goal is to recover a periodic signal $f \colon \mathbb{T}^d \rightarrow \mathbb{C}$ from noisy measurements

    $y = \mu \ast f + \eta$,  where  $\mu \ast f := \int_{\mathbb{T}^d} f(\cdot - t)\, \mu(dt)$  and  $\eta$ is noise,

of its convolution with a filter $\mu$. The filter may be identified with a periodic signal or more generally with a signed measure [72, sect. 6.2]. However, $\mu$ is sometimes unknown; this leads to blind or semiblind deconvolution. One path forward is to first estimate the smoothing operator $K_\mu \colon f \mapsto \mu \ast f$ from many random $(f, y)$ pairs under the given model. By the known translation-invariance of the problem, $K_\mu$ is diagonalized in the complex Fourier basis. Inference is then reduced to estimating the Fourier coefficients $\{\mu_j\}$ of $\mu$, which are the eigenvalues of $K_\mu$. Since $\{\mu_j\} \in \mathcal{H}^s$ for some $s \in \mathbb{R}$, Theorem 1.3 provides a convergence rate.
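To illustrate how this example fits the eigenvalue-learning framework, the following sketch (an assumption-laden toy on a one-dimensional periodic grid, not the paper's Bayesian procedure) generates random $(f, y)$ pairs, passes to the discrete Fourier basis in which the convolution operator $K_\mu$ is diagonal, and estimates each Fourier coefficient of the filter by least squares across the $N$ pairs; the filter, grid size, and noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative one-dimensional periodic grid and sample size (assumptions).
M, N = 128, 400          # grid points on the torus, number of (f, y) pairs
gamma = 0.05             # noise level

# A hypothetical smooth periodic filter mu; the DFT of mu gives the eigenvalues
# of the (discrete, circular) convolution operator K_mu : f -> mu * f.
t = np.linspace(0.0, 1.0, M, endpoint=False)
mu = np.exp(-50.0 * np.minimum(t, 1.0 - t) ** 2)
mu_hat_true = np.fft.fft(mu)

# Random periodic inputs f_n and noisy circular convolutions y_n = mu * f_n + eta_n.
F = rng.standard_normal((N, M))
Y = np.real(np.fft.ifft(np.fft.fft(F, axis=1) * mu_hat_true[None, :], axis=1))
Y = Y + gamma * rng.standard_normal((N, M))

# In the Fourier basis the problem is diagonal: yhat_{jn} = muhat_j fhat_{jn} + noise,
# so each eigenvalue muhat_j is estimated by least squares across the N pairs.
F_hat = np.fft.fft(F, axis=1)
Y_hat = np.fft.fft(Y, axis=1)
mu_hat_est = (np.conj(F_hat) * Y_hat).sum(axis=0) / (np.abs(F_hat) ** 2).sum(axis=0)

print(np.max(np.abs(mu_hat_est - mu_hat_true)))  # worst-case error over frequencies
```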
1.3.2. Radial EIT. Electrical impedance tomography (EIT) is a noninvasive imaging procedure that is used in medical, industrial, and geophysical applications [57]. Abstractly, EIT