Supplementary information for:
Kernel Learning for Robust Dynamic Mode Decomposition: Linear and Nonlinear Disambiguation Optimization (LANDO)
by Peter J. Baddoo, Benjamin Herrmann, Beverley J. McKeon, and Steven L. Brunton
A. Pseudocodes
Here we present pseudocode for the dictionary learning and batch learning methods described in the main body of the paper.
Algorithm 1 Sparse ALD dictionary learning with Cholesky updates. The operation count for each step is included on the right.
Inputs: data matrix $X$, kernel $k$, sparsification tolerance $\nu$
Output: the sparse dictionary $\tilde{X}$
  Optional: randomly permute the columns of $X$
  for $t = 1 \to m$ do
    Select new sample $x_t$
    Compute $\tilde{k}_{t-1}$ with (3.11)    $O(n\tilde{m}_t)$
    Compute $\pi_t$ with backsubstitution (3.10)    $O(\tilde{m}_t^2)$
    Compute $\delta_t$ using (3.9)    $O(\tilde{m}_t)$
    if $\delta_t \le \nu$ (almost linearly dependent) then
      Maintain the dictionary: $D_t = D_{t-1}$    $O(\tilde{m}_t^2)$
    else if $\delta_t > \nu$ (not almost linearly dependent) then
      Update the dictionary: $D_t = D_{t-1} \cup \{x_t\}$
      Update the Cholesky factor $C_t$ using (3.13)    $O(\tilde{m}_t^2)$
    end if
  end for
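For concreteness, a minimal NumPy sketch of algorithm 1 is given below. The function name, the explicit dictionary list, and the use of scipy.linalg.solve_triangular are our own illustrative choices; references (3.9)–(3.13) of the main text correspond to the ALD statistic $\delta_t$, the backsubstitution for $\pi_t$, and the rank-one Cholesky update.

```python
import numpy as np
from scipy.linalg import solve_triangular

def ald_dictionary(X, kernel, nu):
    """Sparse ALD dictionary learning with Cholesky updates (cf. algorithm 1).

    X      : (n, m) data matrix, one sample per column
    kernel : callable; kernel(A, B) returns the kernel matrix k(A, B)
    nu     : sparsification tolerance
    """
    D = [X[:, [0]]]                                  # dictionary samples
    C = np.sqrt(kernel(D[0], D[0]))                  # 1x1 Cholesky factor of the kernel matrix
    for t in range(1, X.shape[1]):
        xt = X[:, [t]]
        Xd = np.hstack(D)
        kt = kernel(Xd, xt)                          # kernel vector against the dictionary
        # pi_t via two triangular solves with the Cholesky factor
        z = solve_triangular(C, kt, lower=True)
        pi = solve_triangular(C.T, z, lower=False)
        delta = (kernel(xt, xt) - kt.T @ pi).item()  # ALD test statistic
        if delta > nu:                               # not almost linearly dependent
            D.append(xt)                             # grow the dictionary
            # rank-one update of the Cholesky factor
            c22 = np.sqrt(max(delta, 1e-14))
            C = np.block([[C, np.zeros((C.shape[0], 1))],
                          [z.T, c22]])
    return np.hstack(D)
```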
Algorithm 2 Learning the model and analysing the linear component
Inputs: data matrices $X$ and $Y$, kernel $k$, dictionary tolerance $\nu$
Outputs: model $f$, constant $c$, linear component $L$, nonlinear component $N$, eigenvectors, and eigenvalues
  Build the dictionary $\tilde{X}$ according to algorithm 1
  Solve $\operatorname{argmin}_{\tilde{W}} \| Y - \tilde{W}\, k(\tilde{X}, X) \|_F$ (for example, $\tilde{W} = Y\, k(\tilde{X}, X)^{\dagger}$)
  Define $S$ according to (4.7)
  Form the model as $f(x) = \tilde{W}\, k(\tilde{X}, x)$
  Form $c$, $L$, and $N$ according to section 4(a) and a choice of base state $\bar{x}$
  Compute the eigendecomposition of $\hat{L}$ according to lemma 1
  Form the eigenvectors and eigenvalues according to (4.8)
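Once the dictionary is fixed, the regression step of algorithm 2 reduces to a single least-squares solve. The sketch below is a minimal illustration in the same spirit (ald_dictionary is the helper sketched after algorithm 1 and the quadratic kernel is an arbitrary example); the linear analysis of section 4(a) is omitted here.

```python
import numpy as np

def lando_batch(X, Y, kernel, nu):
    """Batch regression in the spirit of algorithm 2 (linear analysis omitted).

    Returns the dictionary X~, the weights W~, and the model f(x) = W~ k(X~, x).
    """
    Xd = ald_dictionary(X, kernel, nu)       # algorithm 1 (sketched above)
    K = kernel(Xd, X)                        # m~ x m kernel matrix k(X~, X)
    W = Y @ np.linalg.pinv(K)                # W~ = Y k(X~, X)^+
    def model(x):
        # evaluate f(x) on a single column vector or a matrix of columns
        return W @ kernel(Xd, np.reshape(x, (Xd.shape[0], -1)))
    return Xd, W, model

# Example kernel choice (illustrative only): quadratic kernel k(u, v) = (1 + u^T v)^2
quad_kernel = lambda A, B: (1.0 + A.T @ B) ** 2
```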
B. Glossary of terms
Table 1 provides the nomenclature used throughout the paper. We have attempted to maintain consistency with DMD [15, 24], SINDy [4], and KRLS [7] where possible, although several changes were made to unify the notation. Importantly, the features are denoted by $\phi$ in this work, whereas they are denoted by $\theta$ in SINDy. Similarly, the kernel weights are denoted by $W$ in this work, whereas they are denoted by $\alpha$ in KRLS.
Table 1: A summary of terms used in the paper.

Symbol               | Type                                      | Meaning
---------------------|-------------------------------------------|------------------------------------------------------------
$n$                  | number                                    | State dimension
$m$                  | number                                    | Total number of samples
$N$                  | number                                    | Dimension of nonlinear feature space
$x$                  | $n$-vector                                | State vector
$y$                  | $n$-vector                                | Model output (e.g. $y = \dot{x}$ or $y_k = x_{k+1}$)
$X$                  | $n \times m$ matrix                       | Right data matrix
$Y$                  | $n \times m$ matrix                       | Left data matrix
$F(x)$               | function                                  | Underlying system
$f(x)$               | function                                  | Approximate learned model
$c$                  | $n$-vector                                | Constant shift in model
$L$                  | $n \times n$ matrix                       | Linear component of true operator
$N$                  | function                                  | Purely nonlinear component of operator
$A$                  | $n \times n$ matrix                       | DMD best-fit operator
$k(\cdot,\cdot)$     | function                                  | Kernel function
$\phi(\cdot)$        | function                                  | Nonlinear (implicit) feature space
$t$                  | continuous variable                       | Time
$t$                  | discrete index                            | Snapshot number
$\nu$                | number                                    | Dictionary sparsification parameter
$\tilde{\ }$ (tilde) | accent                                    | Indicates quantity is connected to the dictionary
$\tilde{m}_t$        | number                                    | Number of samples in dictionary at time $t$
$\tilde{K}_t$        | $\tilde{m}_t \times \tilde{m}_t$ matrix   | Kernel matrix of dictionary elements at time $t$
$\tilde{C}_t$        | $\tilde{m}_t \times \tilde{m}_t$ matrix   | Cholesky decomposition of $\tilde{K}_t$
$\tilde{m}$          | number                                    | Final number of samples in the dictionary
$\Pi_t$              | $\tilde{m}_t \times t$ matrix             | Matrix that approximately maps samples before time $t$ onto the dictionary
$\hat{\ }$ (hat)     | accent                                    | Indicates quantity is projected onto POD subspace
$\bar{x}$            | $n$-vector                                | Base state (e.g. statistical mean or equilibrium solution)
C. Comparison to related data-driven methods
In this section we provide further details of related data-driven methods. The comparison is
summarised in figure 10.
(i) Sparse identification of nonlinear dynamics
The SINDy algorithm [4] was developed based on the observation that many complex dynamical systems may be expressed as systems of differential equations with only a few terms, so that they are sparse in the feature space $\phi(x)$. Thus, it is possible to solve for an expansion of the dynamics in (2.3) with only a few nonzero entries in the coefficient matrix $\Xi$, corresponding to the active terms in $\phi(x)$ that are present in the dynamics. Solving for the sparse set of coefficients, and therefore the dynamical system, is achieved through the following optimization:
$$\operatorname*{argmin}_{\Xi}\ \| Y - \Xi\,\phi(X) \|_F + \lambda \| \Xi \|_0. \tag{A 1}$$
The $\| \cdot \|_0$ term is not convex, although there are several relaxations that yield accurate sparse models. The SINDy algorithm has also been extended to include partially known physics [17],
such as conservation laws and symmetries, dramatically improving the ability to learn accurate
models with less data. It is also possible with SINDy to disambiguate the linear and nonlinear
model contributions, enabling linear stability analyses, even for strongly nonlinear systems.
However, the feature library $\phi(x)$ scales poorly with the state dimension $n$
, so SINDy is typically
only applied to relatively low-dimensional systems. A recent tensor extension to SINDy [9]
provides the ability to handle much larger libraries, which is very promising. In the present work,
we use kernel representations to obtain tractable implicit models that may be queried to extract
structure, such as the disambiguated linear terms.
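As an illustration of one such relaxation of the $\ell_0$ problem (A 1), the sketch below implements sequentially thresholded least squares, the scheme used in [4]; the threshold and iteration count are arbitrary illustrative values, and the feature library is assumed to be precomputed.

```python
import numpy as np

def sindy_stlsq(Theta, Y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: one relaxation of (A 1).

    Theta : (N, m) feature library phi(X) evaluated on the data
    Y     : (n, m) outputs (e.g. time derivatives)
    Returns Xi of shape (n, N) such that Y is approximately Xi @ Theta.
    """
    Xi = Y @ np.linalg.pinv(Theta)               # initial dense least-squares fit
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold           # prune coefficients below the threshold
        Xi[small] = 0.0
        for i in range(Y.shape[0]):              # refit the surviving terms row by row
            keep = ~small[i]
            if keep.any():
                Xi[i, keep] = Y[i] @ np.linalg.pinv(Theta[keep])
    return Xi
```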
(ii) Extended DMD
The extended DMD [26] was developed to improve the approximation of the Koopman operator by augmenting the DMD vector $x$ with nonlinear functions of the state, similar to the feature vector $\phi(x)$ above. However, instead of modeling $x_{k+1}$ as a function of $\phi(x_k)$, as in SINDy, eDMD models the evolution of $\phi(x_{k+1})$, which results in a much larger regression problem:
$$\operatorname*{argmin}_{A}\ \| \phi(Y) - A\,\phi(X) \|_F. \tag{A 2}$$
This approach was then kernelized [27] to make the algorithm computationally tractable.
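For a small explicit feature library, the regression (A 2) can be formed directly; the sketch below uses a hypothetical second-order monomial library purely for illustration, whereas the kernelised formulation of [27] avoids ever constructing $\phi(X)$ explicitly.

```python
import numpy as np

def monomials_2(X):
    """Feature map phi: each column x maps to [1, x, x_i x_j for i <= j]."""
    n, m = X.shape
    rows = [np.ones((1, m)), X]
    for i in range(n):
        for j in range(i, n):
            rows.append((X[i] * X[j])[None, :])
    return np.vstack(rows)

def edmd(X, Y, features=monomials_2):
    """Best-fit operator minimising ||phi(Y) - A phi(X)||_F, cf. (A 2)."""
    PX, PY = features(X), features(Y)
    return PY @ np.linalg.pinv(PX)
```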
(a) Connection to exact dynamic mode decomposition
We now elucidate the connection between the present work and the exact dynamic mode decomposition of [24], which was introduced in section 2(a). In particular, we demonstrate that exact DMD can be viewed as a special case of the present work when there is no sparsity promotion and the kernel is linear. The linear kernel is $k(u, v) = u^{\mathsf T} v$, so the implicit feature space is simply $\phi(x) = x$. As such, the full model is the linear map $f(x) = Lx$, where $L$ is given by (4.7) as
$$L = \tilde{W}\tilde{X}^{\mathsf T}. \tag{A 3}$$
In exact DMD there is no sparsity promotion, so the dictionary used in our algorithm is full: $\Pi = I$ and $\tilde{X} = X$. Moreover, $\tilde{W}$ is given by (3.3), so
$$L = Y\, k(X, X)^{\dagger} X^{\mathsf T}. \tag{A 4}$$
Expanding the kernel yields
$$L = Y \big(X^{\mathsf T} X\big)^{\dagger} X^{\mathsf T} = Y X^{\dagger}\big(X^{\dagger}\big)^{\mathsf T} X^{\mathsf T} = Y X^{\dagger}, \tag{A 5}$$
which is identical to the linear operator $A$ from (2.11) defined by exact DMD. In (A 5) we used the identities for the Moore–Penrose pseudoinverse $\big(M^{\mathsf T} M\big)^{\dagger} = M^{\dagger}\big(M^{\dagger}\big)^{\mathsf T}$ and $M^{\dagger}\big(M^{\dagger}\big)^{\mathsf T} M^{\mathsf T} = M^{\dagger}$, which hold for any matrix $M$.
Similarly, the eigenmodes computed by exact DMD are equivalent to those defined in lemma
1
in the special case of a linear kernel without sparsity promotion.
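The reduction (A 3)–(A 5) is straightforward to verify numerically. The following check, on random test matrices of our own choosing, confirms that the linear-kernel operator $Y\,k(X,X)^{\dagger}X^{\mathsf T}$ coincides with the exact DMD operator $YX^{\dagger}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200
X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))

K = X.T @ X                              # linear kernel: k(X, X) = X^T X
L_kernel = Y @ np.linalg.pinv(K) @ X.T   # (A 4): L = Y k(X, X)^+ X^T
A_dmd = Y @ np.linalg.pinv(X)            # exact DMD operator from (2.11)

assert np.allclose(L_kernel, A_dmd)      # (A 5): the two operators agree
```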
Figure 10: A comparison of methods for model discovery, including DMD [21, 23], extended/kernel DMD [26, 27], SINDy [4], and the proposed LANDO framework.
D. Online learning variant
In this section we derive the rank-one update equations used in the online regression algorithm. The derivation is equivalent to that presented in [7], except that we consider vector-valued outputs and do not apply the inverted kernel matrix explicitly. A summary of the procedure may be found in the pseudocode in algorithm 3.
We define
$$Y_t = \begin{bmatrix} y_1 & y_2 & \cdots & y_t \end{bmatrix}.$$
In the feature space, we may express all the samples up to time $t$ as
$$\Phi_t = \tilde{\Phi}_t \Pi_t + \Phi_t^{\mathrm{res}}, \tag{A 1}$$
where
$$\Pi_t = \begin{bmatrix} \pi_1 & \pi_2 & \cdots & \pi_t \end{bmatrix} \in \mathbb{R}^{\tilde{m}_t \times t} \tag{A 2}$$
maps the $\tilde{m}_t$ dictionary elements into the $t$ feature vectors with small residual error $\Phi_t^{\mathrm{res}}$. By causality, the lower triangular elements of $\Pi_t$ are zero. Only the online version of the algorithm (see appendix D) uses $\Pi_t$ explicitly, and it only requires $\pi_t$ at time $t$. Thus, $\pi_t$ can be overwritten at each iteration to save memory.
The minimisation problem, without regularisation, at time $t$ is
$$\operatorname*{argmin}_{\tilde{W}_t} \big\| Y_t - \tilde{W}_t\, k(\tilde{X}_t, X_t) \big\|_F^2 = \operatorname*{argmin}_{\tilde{W}_t} \big\| Y_t - \tilde{W}_t\, \tilde{\Phi}_t^{\mathsf T} \Phi_t \big\|_F^2. \tag{A 3}$$
The representation (A 1) allows us to approximate the above as
$$\operatorname*{argmin}_{\tilde{W}_t} \big\| Y_t - \tilde{W}_t\, \tilde{\Phi}_t^{\mathsf T} \tilde{\Phi}_t \Pi_t \big\|_F^2 = \operatorname*{argmin}_{\tilde{W}_t} \big\| Y_t - \tilde{W}_t \tilde{K}_t \Pi_t \big\|_F^2. \tag{A 4}$$
The minimiser of the above is
$$\tilde{W}_t = Y_t \big(\tilde{K}_t \Pi_t\big)^{\dagger} = Y_t \Pi_t^{\dagger} \tilde{K}_t^{-1} = Y_t \Pi_t^{\mathsf T}\big(\Pi_t \Pi_t^{\mathsf T}\big)^{-1}\tilde{K}_t^{-1}. \tag{A 5}$$
In the above, we have used the fact that, by construction, $\tilde{K}_t$ has full column rank and $\Pi_t$ has full row rank.
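The factorisation of the pseudoinverse in (A 5) relies on exactly these rank conditions; a quick numerical sanity check on random matrices of our own choosing is:

```python
import numpy as np

rng = np.random.default_rng(2)
m_tilde, t, n = 4, 12, 3
Pi = rng.standard_normal((m_tilde, t))        # full row rank (generically)
B = rng.standard_normal((m_tilde, m_tilde))
K = B @ B.T + np.eye(m_tilde)                 # symmetric positive-definite kernel matrix
Y = rng.standard_normal((n, t))

W_direct = Y @ np.linalg.pinv(K @ Pi)         # Y_t (K~_t Pi_t)^+
W_split = Y @ Pi.T @ np.linalg.inv(Pi @ Pi.T) @ np.linalg.inv(K)

assert np.allclose(W_direct, W_split)         # the factored form in (A 5) agrees
```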
When a new sample is considered, it falls into one of two cases, as outlined in section 3(a): either the sample is almost linearly dependent on the current dictionary elements, or it is not. The updating equations are different in each case and we derive them below. In what follows, it is convenient to define
$$P_t = \big(\Pi_t \Pi_t^{\mathsf T}\big)^{-1} \tag{A 6}$$
and
$$h_t = \frac{\pi_t^{\mathsf T} P_{t-1}}{1 + \pi_t^{\mathsf T} P_{t-1} \pi_t}. \tag{A 7}$$
Case I: Almost linearly dependent
If the new sample is almost linearly dependent on the dictionary elements, then the dictionary is not updated: $D_t = D_{t-1}$. Since the dictionary does not change, neither does the kernel matrix, so $\tilde{K}_t = \tilde{K}_{t-1}$. The update rule for $\Pi_t$ is simply
$$\Pi_t = \begin{bmatrix} \Pi_{t-1} & \pi_t \end{bmatrix}. \tag{A 8}$$
Thus, $\Pi_t \Pi_t^{\mathsf T} = \Pi_{t-1}\Pi_{t-1}^{\mathsf T} + \pi_t \pi_t^{\mathsf T}$, which corresponds to a rank-1 update. Accordingly, the matrix inversion lemma says that the update rule for $P_t$ is
$$P_t = P_{t-1} - \frac{P_{t-1}\pi_t \pi_t^{\mathsf T} P_{t-1}}{1 + \pi_t^{\mathsf T} P_{t-1}\pi_t} = P_{t-1} - P_{t-1}\pi_t h_t. \tag{A 9}$$
We may now define the update rule for $\tilde{W}_t$. Since
$$Y_t \Pi_t^{\mathsf T} = Y_{t-1}\Pi_{t-1}^{\mathsf T} + y_t \pi_t^{\mathsf T}, \tag{A 10}$$
applying (A 9) to (A 5) produces
$$\tilde{W}_t = Y_t \Pi_t^{\mathsf T} P_t \tilde{K}_t^{-1} = \big(Y_{t-1}\Pi_{t-1}^{\mathsf T} + y_t\pi_t^{\mathsf T}\big)\big(P_{t-1} - P_{t-1}\pi_t h_t\big)\tilde{K}_t^{-1}. \tag{A 11}$$
Expanding the brackets yields
$$\tilde{W}_t = \big(Y_{t-1}\Pi_{t-1}^{\mathsf T}P_{t-1} - Y_{t-1}\Pi_{t-1}^{\mathsf T}P_{t-1}\pi_t h_t + y_t\pi_t^{\mathsf T}P_t\big)\tilde{K}_t^{-1}. \tag{A 12}$$
(A 12)
Since we are not adding an element to the dictionary, the kernel matrix and its inverse remain the
same:
̃
K
1
t
=
̃
K
1
t
1
. Additionally, from the regression in the previous iteration we have
̃
W
t
1
=
Y
t
1
t
1
P
t
1
̃
K
1
t
1
. Thus, (
A 12
) can be expressed as
̃
W
t
=
̃
W
t
1
+
y
t
t
P
t
̃
W
t
1
̃
K
t
1
t
h
t
̃
K
1
t
.
(A 13)
Finally, on use of $h_t = \pi_t^{\mathsf T} P_t$ and (3.10), the update rule for $\tilde{W}_t$ is
$$\tilde{W}_t = \tilde{W}_{t-1} + \big(y_t - \tilde{W}_{t-1}\tilde{k}_{t-1}\big)h_t\tilde{K}_t^{-1}. \tag{A 14}$$
As discussed in section 3(a), the product $h_t\tilde{K}_t^{-1}$ should be computed with two backsubstitutions with the Cholesky factor $C_t$.
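A compact sketch of the Case I updates (A 7), (A 9), and (A 14) is given below; the variable names are our own, and the two backsubstitutions are performed with scipy.linalg.cho_solve applied to the Cholesky factor $C_t$.

```python
import numpy as np
from scipy.linalg import cho_solve

def update_ald_case(W, P, C, k_tilde, pi, y):
    """Case I (almost linearly dependent): dictionary and kernel matrix unchanged.

    W       : (n, m~) weight matrix W~_{t-1}
    P       : (m~, m~) matrix P_{t-1} = (Pi_{t-1} Pi_{t-1}^T)^{-1}
    C       : (m~, m~) lower Cholesky factor of K~_t = K~_{t-1}
    k_tilde : (m~, 1) kernel vector k~_{t-1}
    pi      : (m~, 1) vector pi_t from (3.10)
    y       : (n, 1) new output sample y_t
    """
    h = (pi.T @ P) / (1.0 + pi.T @ P @ pi)    # (A 7), a row vector
    P_new = P - P @ pi @ h                    # (A 9), rank-one update
    # (A 14): the product h_t K~_t^{-1} comes from two backsubstitutions
    hKinv = cho_solve((C, True), h.T).T
    W_new = W + (y - W @ k_tilde) @ hKinv
    return W_new, P_new
```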
Algorithm 3 Online learning algorithm. The operation count for each step is included on the right.
Inputs: data matrices $X$ and $Y$, kernel $k$, dictionary tolerance $\nu$
Outputs: model $f$, constant $c$, linear component $L$, nonlinear component $N$, eigenvectors, and eigenvalues
  Optional: randomly permute the columns of $X$ and $Y$ with the same permutation
  for $t = 1 \to m$ do
    Select new sample pair $(x_t, y_t)$
    Compute $\delta_t$ according to algorithm 1
    Compute $\pi_t$ using (3.10)    $O(\tilde{m}_t^2)$
    if $\delta_t \le \nu$ (almost linearly dependent) then
      Maintain the dictionary: $D_t = D_{t-1}$
      Compute $h_t$ using (A 7)    $O(\tilde{m}_t^2)$
      Update $\tilde{W}_t$ using (A 14)    $O(n\tilde{m}_t)$
      Update $P_t$ using (A 9)    $O(\tilde{m}_t^2)$
    else if $\delta_t > \nu$ (not almost linearly dependent) then
      Update the dictionary: $D_t = D_{t-1} \cup \{x_t\}$
      Update the Cholesky factor $C_t$ using (3.13)    $O(\tilde{m}_t^2)$
      Update $P_t$ using (A 15)    $O(\tilde{m}_t)$
      Update $\tilde{W}_t$ using (A 17)    $O(n\tilde{m}_t)$
      Form the model $f$ according to (3.4)
    end if
  end for
  Define $S$ according to (4.7)
  Form $L$, $c$ and $N$ according to section 4(a)
  Compute the eigendecomposition of $\hat{L}$ according to lemma 1
  Form the eigenvectors and eigenvalues according to (4.8)
Case II: Not almost linearly dependent
In this case, the new vector is not almost linearly dependent. Accordingly, we must add $x_t$ to the dictionary: $D_t = D_{t-1} \cup \{x_t\}$. The update rules for $\Pi_t$ and $P_t$ are simply
$$\Pi_t = \begin{bmatrix} \Pi_{t-1} & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad P_t = \begin{bmatrix} P_{t-1} & 0 \\ 0 & 1 \end{bmatrix}. \tag{A 15}$$
The update rule for $\tilde{W}_t$ is
$$\tilde{W}_t = Y_t\Pi_t^{\mathsf T}P_t\tilde{K}_t^{-1} = \begin{bmatrix} Y_{t-1}\Pi_{t-1}^{\mathsf T}P_{t-1} & y_t \end{bmatrix}\tilde{K}_t^{-1}. \tag{A 16}$$
Finally, (3.12) allows us to express the update rule as
$$\tilde{W}_t = \begin{bmatrix} \tilde{W}_{t-1} + \big(\tilde{W}_{t-1}\tilde{k}_{t-1} - y_t\big)\pi_t^{\mathsf T}\,\delta_t^{-1} & \big(y_t - \tilde{W}_{t-1}\tilde{k}_{t-1}\big)\,\delta_t^{-1} \end{bmatrix}. \tag{A 17}$$
This completes the derivation of the equations for the online regression algorithm. Alternatives to this proposed dictionary learning procedure include randomised methods [20, 25] and the recent method in [9].
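Case II can be sketched in the same style as Case I. Here the dictionary grows, so $P$ is padded according to (A 15) and $\tilde{W}$ gains a new column according to (A 17); the Cholesky factor is assumed to be updated separately via (3.13), as in algorithm 1, and the variable names are again our own.

```python
import numpy as np

def update_new_element(W, P, pi, k_tilde, delta, y):
    """Case II (not almost linearly dependent): x_t joins the dictionary.

    Implements (A 15) and (A 17); the Cholesky factor of the enlarged
    kernel matrix is updated separately via (3.13), as in algorithm 1.
    """
    m = P.shape[0]
    P_new = np.block([[P, np.zeros((m, 1))],          # (A 15): block-diagonal padding
                      [np.zeros((1, m)), np.ones((1, 1))]])
    resid = y - W @ k_tilde                           # y_t - W~_{t-1} k~_{t-1}
    W_new = np.hstack([W - resid @ pi.T / delta,      # correct the existing columns
                       resid / delta])                # column for the new dictionary element
    return W_new, P_new
```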
E. Learning control laws
Our algorithm may also be used to simultaneously learn control laws and governing equations
from data. An active control variable can significantly alter the behaviour of a dynamical system and thus further disguise the underlying dynamics [14, 18]. In many practical scenarios – such as
epidemiological modelling of disease spread where the control variables could be the distribution
of vaccinations – it is infeasible to gather data on the unforced system so the effects of control must
be disambiguated from the data [19]. This strategy can be used to uncover the dynamics of the
unforced system, which can then inform design of effective control strategies. In such systems the
underlying dynamics take the form
$$y = F(x, u), \tag{A 1}$$
where $u$ is the control variable. Similarly to $x$ and $y$, the values of $u$ are known at each snapshot time. As in dynamic mode decomposition with control (DMDc, [18]), we may write the supplemented state vector as $\omega = \begin{bmatrix} x \\ u \end{bmatrix}$ so that (A 1) may be expressed as
$$y = G(\omega). \tag{A 2}$$
Our task is now to learn a model $g$ that approximates the underlying system defined by $G$.
We can use ideas explained in section 4(d) to design a suitable kernel. In particular, we can exploit the fact that kernels are closed under direct sums (4.17). Unless we believe that there are nonlinear pairings between the control variable and the state space, we can assume a kernel of the form
$$k(\omega, \omega') = k_x(x, x') + k_u(u, u'). \tag{A 3}$$
For example, we may have reason to believe that $y$ is generated by quadratic interactions between the states $x$ whereas the control variable has only a linear effect. Then, the kernel
$$k(\omega, \omega') = \big(x^{\mathsf T} x'\big)^2 + \big(u^{\mathsf T} u'\big) \tag{A 4}$$
induces the appropriate feature space. The algorithm of section 3 can then be applied with $X$ replaced by the augmented data matrix $\Omega = \begin{bmatrix} X \\ U \end{bmatrix}$ to learn a model of the form
$$g(\omega) = \tilde{W}\, k(\tilde{\Omega}, \omega), \tag{A 5}$$
where $\tilde{\Omega}$ is the augmented dictionary matrix. If the kernel takes the form (A 3) then the reconstruction/prediction model is
$$g(\omega) = \tilde{W}_x k_x(\tilde{X}, x) + \tilde{W}_u k_u(\tilde{U}, u), \tag{A 6}$$
where $\tilde{\Omega} = \begin{bmatrix} \tilde{X} \\ \tilde{U} \end{bmatrix}$ and $\tilde{W} = \begin{bmatrix} \tilde{W}_x & \tilde{W}_u \end{bmatrix}$. The unforced system can then be modelled by setting $\tilde{W}_u = 0$. Furthermore, we can also compute local linear models (i.e., DMD models) of the unforced system using the ideas of section 4(a). The analysis follows exactly, except that $\tilde{W} k(\tilde{X}, x)$ in section 4(a) is replaced with $\tilde{W}_x k_x(\tilde{X}, x)$. Note that if the kernel is taken to be
$$k(\omega, \omega') = \omega^{\mathsf T}\omega' = \big(x^{\mathsf T} x'\big) + \big(u^{\mathsf T} u'\big), \tag{A 7}$$
then we recover the original DMDc formulation [18].
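The additive construction (A 3)–(A 4) amounts to only a few lines of code. The sketch below pairs a quadratic state kernel with a linear actuation kernel and reuses the hypothetical lando_batch helper sketched in appendix A; none of these names are a prescribed interface.

```python
import numpy as np

# Additive kernel of the form (A 4): quadratic in the state, linear in the actuation
k_x = lambda A, B: (A.T @ B) ** 2
k_u = lambda A, B: A.T @ B

def control_kernel(n_states):
    """Kernel acting on augmented samples omega = [x; u], cf. (A 3)."""
    def k(Wa, Wb):
        Xa, Ua = Wa[:n_states], Wa[n_states:]
        Xb, Ub = Wb[:n_states], Wb[n_states:]
        return k_x(Xa, Xb) + k_u(Ua, Ub)
    return k

# Usage sketch: stack states and controls, then learn g(omega) = W~ k(Omega~, omega)
# Omega = np.vstack([X, U])
# Omega_d, W, g = lando_batch(Omega, Y, control_kernel(X.shape[0]), nu=1e-6)
```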
These ideas are also valid for kernels that do not take the form (A 3), but the algebra is slightly more involved and we therefore do not report the results here.
F. Sensitivity to noise
All machine learning algorithms must be understood in the context of their sensitivity to noise.
To explore the effects of noise, we applied our learning framework to noise-contaminated data generated by the Lorenz system from section 6(a). The data are contaminated with Gaussian noise of magnitude 5% of the variance of the original data; the noisy training data are visualised in the left panel of figure 11. We use the same parameters as section 6(a), but the system is now integrated to $t = 50$. Unlike section 6(a), we assume that we only have access to snapshot measurements of $x$; velocity measurements $\dot{x}$ are unavailable. Therefore, we approximate the derivative $\dot{x}$ from the noisy snapshot data with a total-variation regularisation scheme [5].
[Figure 11: three panels showing the true model, the noisy measurements (5% Gaussian noise, 50,000 snapshots), and the learned model; the local linear models evaluated at $\bar{x}$ agree.]
Figure 11: Implicit learning of the Lorenz system in the presence of noise. Initial measurements are corrupted by 5% Gaussian noise. The velocity vector $\dot{x}$ is computed by differentiating the data measurements using total variation regularised differentiation [5]. The reconstruction captures the qualitative features of the original Lorenz system, and also accurately reproduces the local linear model at $\bar{x}$. The trajectories are colored by the adaptive time step, with red indicating a smaller time step.
[Figure 12 panels: clean data; 1% noise; 5% noise; 10% noise.]
Figure 12: Identifying the spectrum of the Burgers equation in the presence of noise over 20 trials. The first three eigenvalues are plotted in black circles and their approximations are plotted in blue crosses.
Then, we use the algorithm of section 3 to learn the Lorenz system with a quadratic kernel. The results of the learned model are illustrated in the middle panel of figure 11, along with the true local linear model, both evaluated at the equilibrium point
$$\bar{x} = \begin{bmatrix} \sqrt{\beta(\rho - 1)} & \sqrt{\beta(\rho - 1)} & \rho - 1 \end{bmatrix}^{\mathsf T}.$$
The reconstructed trajectory shows good qualitative agreement with the true model, and the
local linear model is a good approximation to the true linearisation. The accuracy of these
approximations usually improves as more samples are added.
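For reference, the noise corruption used here can be reproduced roughly as follows. We read "Gaussian noise of magnitude 5% of the variance" as a per-state noise scale, and we substitute a crude smoothed finite difference for the total-variation regularised derivative of [5], so this is only a rough sketch of the actual preprocessing.

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(X, level=0.05):
    """Add Gaussian noise scaled per state (one reading of '5% of the variance')."""
    sigma = level * X.var(axis=1, keepdims=True)
    return X + sigma * rng.standard_normal(X.shape)

def smoothed_derivative(X, dt, width=5):
    """Centred finite difference after a moving-average smooth.

    A crude stand-in for the total-variation regularised differentiation of [5].
    """
    window = np.ones(width) / width
    Xs = np.apply_along_axis(lambda row: np.convolve(row, window, mode="same"), 1, X)
    return np.gradient(Xs, dt, axis=1)
```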
We also demonstrate the effect of noise on identifying the spectrum of the viscous Burgers’ equation from section 5(c). Here, the kinematic viscosity is 0.1 and the equations are integrated to $t = 4$. The data are snapshots of the solution, which are then corrupted by Gaussian noise of varying magnitude. No velocity measurements $\dot{x}$ are used and we do not de-noise the data. Figure 12 plots the first three (repeated) eigenvalues learned by the algorithm for 20 trials with a quadratic kernel. The first two eigenvalues are recovered accurately for small values of
noise, but the sensitivity to noise increases for eigenvalues of larger magnitude. Again, these approximations can be improved by adding more samples, or by suitable de-noising.
Further experiments indicate that, in the absence of de-noising, the algorithm is robust to
noise in
Y
but relatively sensitive to noise in
X
. These observations can be explained through
the linear algebra and statistics underlying our kernel learning framework. The impact of noise
in
X
is felt in two forms. Firstly, noise may cause elements to be included in the dictionary
which may otherwise have been excluded. The dictionary is independent of
Y
and is therefore
unaffected by noise in
Y
. If the sparsification parameter is not chosen carefully then noise in
X
may cause the dictionary to become dense. Secondly, noise causes errors in the final regression,
whether performed online or in batch. As is typical of least-squares regressions, our algorithm is
an unbiased estimator when the noise is restricted to
Y
. This is because least-squares methods
implicitly assume that there are no “errors in variables” [10]. This assumption becomes invalid when X is contaminated by noise. As such, when noise is present, the naïve pseudoinverse (3.14) solution may prove insufficient. For example, a similar issue arises in DMD when there is noise in X, and several approaches have been proposed to mitigate the effects of noise [1–3, 6, 12, 22], for
example solving the regression problem with total least squares (TLS) [12]. Experiments with TLS in our present setting were found to be unsuccessful, in part because the TLS problem is unstable [11] but also because our problem is nonlinear. In particular, TLS guarantees the best solution to $\operatorname*{argmin}_{A} \| AX - Y \|_F$ only when the errors in X and Y are column-wise independent and identically distributed with zero mean and covariance matrix $\sigma^2 I$ [13, chapter 8]. Therefore, there are no guarantees of the effectiveness of TLS for our problem, which is $\operatorname*{argmin}_{\tilde{W}} \| \tilde{W} k(\tilde{X}, X) - Y \|_F$
. In the case of nonlinear dynamics, whether with SINDy or our
kernel approach, the noise in
X
is stretched and transformed through the nonlinearity, adding
nonlinear correlations.
In summary, we have demonstrated that our algorithm can remain effective in the presence
of noise. The Lorenz example in figure
11
indicates that the algorithm is effective when
applied to derivatives computed from noisy data by total-variation regularisation. Additionally,
experiments and theory suggest that the present framework is insensitive to noise in
Y
but
moderately sensitive to noise in
X
. In addition to total-variation regularisation, there are several
other methods that could be deployed to limit the effects of noise. For example, one technique
could combine the KRLS algorithm with a Kalman filter, as explored in [16]. Another filtering approach would be to use the optimal hard threshold criterion for singular values of [8]. Fully
addressing the challenge of noise is an area of active ongoing research and will be the focus of
future work.
References
[1] T. Askham and J. N. Kutz. Variable projection methods for an optimized dynamic mode decomposition. SIAM J. Appl. Dyn. Syst., 17(1):380–416, 2018.
[2] Omri Azencot, Wotao Yin, and Andrea Bertozzi. Consistent dynamic mode decomposition. SIAM J. Appl. Dyn. Syst., 18(3):1565–1585, 2019.
[3] Shervin Bagheri. Effects of weak noise on oscillating flows: Linking quality factor, Floquet modes, and Koopman spectrum. Physics of Fluids, 26(9):094104, 2014.
[4] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci., 113(15):3932–3937, 2016.
[5] Rick Chartrand. Numerical differentiation of noisy, nonsmooth data. ISRN Appl. Math., 2011:1–11, 2011.
[6] Scott T. M. Dawson, Maziar S. Hemati, Matthew O. Williams, and Clarence W. Rowley. Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Exp. Fluids, 57(3):42, 2016.
[7] Y. Engel, S. Mannor, and R. Meir. The kernel recursive least-squares algorithm. IEEE Trans. Signal Process., 52(8):2275–2285, 2004.
[8] Matan Gavish and David L. Donoho. The optimal hard threshold for singular values is $4/\sqrt{3}$. IEEE Trans. Inf. Theory, 60(8):5040–5053, 2014.
[9] Patrick Gelß, Stefan Klus, Ingmar Schuster, and Christof Schütte. Feature space approximation for kernel-based supervised learning. Technical report, 2020.
[10] G. H. Golub and C. F. Van Loan. Matrix Computations, volume 3. JHU Press, 2013.
[11] Gene H. Golub and Charles F. Van Loan. An analysis of the total least squares problem. SIAM J. Numer. Anal., 17(6):883–893, 1980.
[12] Maziar S. Hemati, Clarence W. Rowley, Eric A. Deem, and Louis N. Cattafesta. De-biasing the dynamic mode decomposition for applied Koopman spectral analysis. Theor. Comput. Fluid Dyn., 31(4):349–368, 2017.
[13] S. Van Huffel and J. Vandewalle. The Total Least Squares Problem: Computational Aspects and Analysis. SIAM, 1991.
[14] Eurika Kaiser, J. Nathan Kutz, and Steven L. Brunton. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society of London A, 474(2219), 2018.
[15] J. Nathan Kutz, Steven L. Brunton, Bingni W. Brunton, and Joshua L. Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, 2016.
[16] Weifeng Liu, Il Park, Yiwen Wang, and José C. Principe. Extended kernel recursive least squares algorithm. IEEE Trans. Signal Process., 57(10):3801–3814, 2009.
[17] J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. J. Fluid Mech., 838:42–67, 2018.
[18] J. L. Proctor, S. L. Brunton, and J. N. Kutz. Dynamic mode decomposition with control. SIAM J. Appl. Dyn. Syst., 15(1):142–161, 2016.
[19] J. L. Proctor and P. A. Eckhoff. Discovering dynamic patterns from infectious disease data using dynamic mode decomposition. International Health, 7(2):139–145, 2015.
[20] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. NIPS, 3(4):5, 2007.
[21] Clarence W. Rowley, Igor Mezić, Shervin Bagheri, Philipp Schlatter, and Dan S. Henningson. Spectral analysis of nonlinear flows. J. Fluid Mech., 641:115–127, 2009.
[22] Isabel Scherl, Benjamin Strom, Jessica K. Shang, Owen Williams, Brian L. Polagye, and Steven L. Brunton. Robust principal component analysis for particle image velocimetry. Physical Review Fluids, 5(054401), 2020.
[23] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656:5–28, 2010.
[24] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz. On dynamic mode decomposition: Theory and applications. J. Comput. Dyn., 1(2):391–421, 2014.
[25] Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Adv. Neural Inf. Process. Syst., 2001.
[26] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci., 25(6):1307–1346, 2015.
[27] M. O. Williams, C. W. Rowley, and I. G. Kevrekidis. A kernel-based method for data-driven Koopman spectral analysis. J. Comput. Dyn., 2(2):247–265, 2015.