royalsocietypublishing.org/journal/rspa

Research

Cite this article: Baddoo PJ, Herrmann B, McKeon BJ, Kutz JN, Brunton SL. 2023 Physics-informed dynamic mode decomposition. Proc. R. Soc. A 479: 20220576. https://doi.org/10.1098/rspa.2022.0576

Received: 1 September 2022
Accepted: 23 January 2023

Subject Areas: applied mathematics, computational mathematics, fluid mechanics

Keywords: machine learning, dynamic mode decomposition, data-driven dynamical systems

Author for correspondence: Steven L. Brunton (e-mail: sbrunton@uw.edu)

Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6423942.
Physics-informed dynamic mode decomposition

Peter J. Baddoo¹, Benjamin Herrmann², Beverley J. McKeon³, J. Nathan Kutz⁴ and Steven L. Brunton⁵

¹Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
²Department of Mechanical Engineering, University of Chile, Beauchef 851, Santiago, Chile
³Graduate Aerospace Laboratories, California Institute of Technology, Pasadena, CA 91125, USA
⁴Department of Applied Mathematics and ⁵Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA

PJB, 0000-0002-8671-6952; BJM, 0000-0003-4220-1583; JNK, 0000-0002-6004-2275; SLB, 0000-0002-6565-5118
In this work, we demonstrate how physical principles—such as symmetries, invariances and conservation laws—can be integrated into the dynamic mode decomposition (DMD). DMD is a widely used data analysis technique that extracts low-rank modal structures and dynamics from high-dimensional measurements. However, DMD can produce models that are sensitive to noise, fail to generalize outside the training data and violate basic physical laws. Our physics-informed DMD (piDMD) optimization, which may be formulated as a Procrustes problem, restricts the family of admissible models to a matrix manifold that respects the physical structure of the system. We focus on five fundamental physical principles—conservation, self-adjointness, localization, causality and shift-equivariance—and derive several closed-form solutions and efficient algorithms for the corresponding piDMD optimizations. With fewer degrees of freedom, piDMD models are less prone to overfitting, require less training data, and are often less computationally expensive to build than standard
DMD models. We demonstrate piDMD on a range of problems, including energy-preserving fluid flow, the Schrödinger equation, solute advection-diffusion and three-dimensional transitional channel flow. In each case, piDMD outperforms standard DMD algorithms in metrics such as spectral identification, state prediction and estimation of optimal forcings and responses.

© 2023 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Downloaded from https://royalsocietypublishing.org/ on 05 May 2023
1. Introduction

Integrating partial knowledge of physical principles into data-driven techniques is a primary goal of the scientific machine learning (ML) community [1]. Physical principles—such as conservation laws, symmetries and invariances—can be incorporated into ML algorithms in the form of inductive biases, thereby ensuring that the learned models are constrained to the correct physics. Recent successful examples of ML algorithms that have been modified to respect physical principles include neural networks [2–10], kernel methods [11,12], deep generative models [13], data assimilation [14,15] and sparse regression [16–20]. These examples demonstrate that incorporating partially known physical principles into ML architectures can increase the accuracy, robustness and generalizability of the resulting models, while simultaneously decreasing the required training data. In this work, we integrate knowledge of physical principles into one of the most widely used methods in data-driven dynamical systems research: the dynamic mode decomposition (DMD) [21–25].
DMD is a data diagnostic technique that extracts coherent spatial–temporal patterns from high-dimensional time-series data [21,25]. Although DMD originated in the fluid dynamics community [21], the algorithm has since been applied to a wealth of dynamical systems, including in epidemiology [26], robotics [27,28], neuroscience [29], quantum control [30], power grids [31] and plasma physics [32,33]. Despite its widespread successes, DMD is highly sensitive to noise [34–36], fails to capture travelling wave physics, and can produce overfit models that do not generalize. We demonstrate that integrating physics into the learning framework can help address these challenges.
Suppose that we are studying a dynamical system defined by $x_{j+1} = F(x_j)$ for an unknown function $F$. Given a collection of measurements $\{x_1, \dots, x_m\}$, DMD identifies the best low-rank linear approximation of $F$. In other words, DMD seeks a low-rank matrix $A$ such that $x_{j+1} \approx A x_j$ for $j = 1, \dots, m$. Arranging the data measurements into matrices $X$ and $Y$ (see (2.1) for details) allows us to phrase the above formally as

$$\mathop{\arg\min}_{\operatorname{rank}(A) \le r} \; \|Y - AX\|_F. \quad (1.1)$$

After approximately¹ solving (1.1), the DMD process computes the dominant spectral properties of the learned linear operator [21–25].
The rank-$r$ constraint in (1.1) is motivated by the assumed modal structure of the system but does not account for other important physical properties. For example, one limitation of DMD is that the solution of (1.1) lies within the span of $Y$, so the learned model can fail to generalize outside the training regime. We address this limitation and others by embedding partial physics knowledge into the learning process.
In this work, we incorporate physical principles into the optimization in (1.1) by constraining the solution matrix $A$ to lie on a matrix manifold $\mathcal{M}$:

$$\mathop{\arg\min}_{A \in \mathcal{M}} \; \|Y - AX\|_F. \quad (1.2)$$
¹Although the problem (1.1) admits an exact solution [37,38], the vast majority of DMD implementations solve (1.1) approximately by projecting the problem into the $r$-dimensional subspace spanned by the first $r$ principal components of $X$. For that reason, we focus on comparing our new method to the commonly used exact DMD [39] and optimized DMD [40] algorithms, as opposed to the method of [38].
The matrix manifold $\mathcal{M}$ is dictated by the known physics of the system at hand. For example, we may select $\mathcal{M}$ such that its members satisfy certain symmetries that are known to be obeyed by the system. We call (1.2) physics-informed DMD (piDMD), as the optimization integrates underlying knowledge of the system physics into the learning framework.² Again, the spectral features of the solution of (1.2) can be computed to give insight into the modes that dominate the system and their dynamics. Similarly to DMD, these modes can usually be computed without forming $A$ explicitly. The low-rank DMD regression (1.1) is a special case of piDMD where $\mathcal{M}$ is the manifold of matrices with rank bounded by $r$. However, piDMD models are not always low rank (see, for example, §4d) but nevertheless have few degrees of freedom due to the physics-informed constraints. Constraining the solution matrix improves the generalizability of the learned model, reduces sensitivity to noise and reduces the demand for large training sets, as evidenced by a series of applications in §4. Similarly to standard DMD, our goal is to decompose the data into its most important modes; as such, our results focus on comparing the spectrum of the learned operator $A$ with that of the true underlying operator.
The optimization problem (1.2) is well known in the statistical literature as a Procrustes problem [41–53], which comprises finding the optimal transformation between two matrices subject to certain constraints on the class of admissible transformations. According to Greek mythology, Procrustes was a bandit who would stretch or amputate the limbs of his victims to force them to fit onto his bed. Herein, $X$ plays the role of Procrustes' victim, $Y$ is the bed and $A$ is the 'treatment' (stretching or amputation).³ Procrustes problems (1.2) seek to learn the treatment $A$ that best relates the data measurements $X$ and $Y$. Our recasting of the DMD regression as a Procrustes problem is a new connection, and is the basis of piDMD. This perspective enables us to leverage the substantial extant work on Procrustes problems in new application areas. The literature contains many exact solutions for Procrustes problems, including the notable cases of orthogonal matrices [49] and symmetric matrices [45].
In a similar vein to the present work, [54] deployed techniques from group theory and representation theory to derive a data-driven method to study the Koopman operator for equivariant dynamical systems. By contrast, piDMD does not work with the Koopman operator directly but permits a broader class of physical principles, including those that cannot be expressed as equivariances.
A significant challenge in physics-informed ML is to develop implementations that incorporate known physics but also scale to higher dimensions [1]. One contribution of this work is to develop several exact solutions, in terms of standard linear algebra objects, of physically relevant Procrustes problems. Where possible, we use matrix factorizations and low-rank representations to alleviate the computational burden of implementation; it is rarely necessary to form $A$ explicitly when computing its spectral properties or other diagnostics. For complicated matrix manifolds, an exact solution of the Procrustes problem (1.2) may be intractable and we must resort to algorithmic approaches. Fortunately, optimization on matrix manifolds is a mature field [55], and many algorithms are implemented in the open source software package 'Manopt' [56].
The remainder of the paper is arranged as follows. In §2, we provide background information
on DMD and Procrustes problems. Then, in §3, we describe the broad framework of piDMD. We
consider a range of applications in §4, with a focus on shift-equivariant, conservative, self-adjoint,
local and causal systems. Section 5 concludes with a discussion of the limitations of piDMD
and suggests several future research directions. Further information is available in electronic
supplementary material.
²An open source implementation of piDMD is available in MATLAB at www.github.com/baddoo/piDMD.
³This terminology was first introduced by Hurley & Cattell [41].

2. Mathematical background

In this section, we provide further details on DMD and Procrustes problems. Throughout the article, we assume that we have access to $m$ snapshot pairs of $n$ features each: $\{(x_j, y_j),\ j = 1, \dots, m\}$. For example, $x_j$ may be a discretized fluid flow field at time $t_j$ and $y_j$ may be the flow field at the next time step $t_{j+1}$. It is convenient to arrange the data into $n \times m$ snapshot matrices of the form

$$X = \begin{bmatrix} | & & | \\ x_1 & \cdots & x_m \\ | & & | \end{bmatrix} = \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} | & & | \\ y_1 & \cdots & y_m \\ | & & | \end{bmatrix} = \begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_n \end{bmatrix}, \quad (2.1)$$

so that the $i$th rows of $X$ and $Y$ are the measurements of the $i$th features and the $j$th columns are the $j$th temporal snapshots. Henceforth, we use $\tilde{\cdot}$ to represent a row vector.
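For discrete-time data, where $y_j = x_{j+1}$, the snapshot matrices of (2.1) can be assembled by stacking the state vectors as columns and offsetting by one time step. The following sketch is illustrative (Python/NumPy rather than the paper's MATLAB implementation):

```python
import numpy as np

def snapshot_matrices(snapshots):
    """Arrange m+1 sequential state vectors (each of length n) into the
    n x m snapshot matrices X and Y of (2.1), with y_j = x_{j+1}."""
    data = np.column_stack(snapshots)   # columns are temporal snapshots
    return data[:, :-1], data[:, 1:]    # paired snapshots (x_j, x_{j+1})

# toy example: a three-feature state that doubles at every step
snaps = [2.0**j * np.array([1.0, 2.0, 3.0]) for j in range(5)]
X, Y = snapshot_matrices(snaps)
print(X.shape, Y.shape)   # (3, 4) (3, 4); here Y = 2X exactly
```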
(a) Dynamic mode decomposition
DMD was initially proposed as a dimensionality reduction technique that extracts dominant spatio-temporal coherent structures from high-dimensional time-series data [21]. In particular, DMD identifies the leading-order spatial eigenmodes of the matrix $A$ in (1.1), along with a linear model for how the amplitudes of these coherent structures evolve in time. DMD has been applied to a range of systems, as summarized in the monograph [24] and the recent review by Schmid [25].
To address the challenges associated with DMD, researchers have derived many variations on the original algorithm, including sparsity-promoting DMD [57], DMD with control [58], noise-robust variants [34–36,40], recursive DMD [59], online DMD [60,61], low-rank DMD [38] and versions for under-resolved data in space or time [62–64]. At present, the most widely used variant is 'exact DMD' [23], which phrases the DMD solution in terms of the Moore–Penrose pseudoinverse. Owing to its simplicity and widespread use, most of the comparisons in this paper are made between exact DMD and piDMD.
It is assumed that each $x_j$ and $y_j$ are connected by an unknown dynamical system of the form $y_j = F(x_j)$; for discrete-time dynamics $y_j = x_{j+1}$ and for continuous-time dynamics $y_j = \dot{x}_j$. DMD aims to learn the dominant behaviour of $F$ by finding the best linear approximation for $F$ given the data and then performing diagnostics on that approximation. Thus, DMD seeks the linear operator $A$ that best maps the snapshots in the set $\{x_j\}$ to those in the set $\{y_j\}$:

$$y_j \approx A x_j \quad \text{for } j = 1, \dots, m. \quad (2.2)$$

Expressed in terms of the snapshot matrices in (2.1), the linear system in (2.2) becomes

$$Y \approx AX, \quad (2.3)$$

and the optimization problem for $A$ is given by (1.1). The minimum-norm solution for $A$ is given by

$$A = YX^{\dagger} = YV\Sigma^{\dagger}U^{*}, \quad (2.4)$$

where † indicates the Moore–Penrose pseudoinverse [65] and $X = U\Sigma V^{*}$ is the singular value decomposition.
In many applications, the state dimension $n$ is very large, and forming or storing $A$ explicitly becomes impractical. Instead, we use a rank-$r$ approximation for $A$, denoted by $\hat{A}$, where $r \ll n$. To form $\hat{A}$, we construct the optimal rank-$r$ approximation for $X$ using the truncated singular value decomposition [66]: $X \approx U_r \Sigma_r V_r^{*}$. We then project $A$ onto the leading $r$ principal components of $X$ as

$$\hat{A} = U_r^{*} A U_r = U_r^{*} Y V_r \Sigma_r^{-1}. \quad (2.5)$$

It is now computationally viable to compute the eigendecomposition of $\hat{A}$ as

$$\hat{A}\hat{\Psi} = \hat{\Psi}\Lambda. \quad (2.6)$$

The eigenvectors of $A$ can be approximated from the reduced eigenvectors by [23]

$$\Psi = YV\Sigma^{-1}\hat{\Psi}. \quad (2.7)$$
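Equations (2.5)–(2.7) translate directly into a few lines of numerical linear algebra. The following sketch is illustrative (Python/NumPy rather than the paper's MATLAB implementation), verified on a toy linear system whose eigenvalues are known:

```python
import numpy as np

def exact_dmd(X, Y, r):
    """Exact DMD: project A onto the leading r principal components of X
    (2.5), take the reduced eigendecomposition (2.6), and lift the
    eigenvectors back to the full space (2.7)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, sr, Vr = U[:, :r], s[:r], Vh[:r, :].conj().T
    Ahat = (Ur.conj().T @ Y @ Vr) / sr        # right-multiply by Sigma_r^{-1}
    lam, What = np.linalg.eig(Ahat)           # reduced eigendecomposition
    Psi = Y @ Vr @ np.diag(1.0 / sr) @ What   # exact DMD modes
    return lam, Psi

# toy discrete-time system x_{j+1} = A x_j with eigenvalues 0.9 and 0.5
A_true = np.diag([0.9, 0.5])
X = np.random.default_rng(0).standard_normal((2, 20))
Y = A_true @ X
lam, Psi = exact_dmd(X, Y, r=2)
print(np.sort(lam.real))   # recovers [0.5, 0.9]
```

Because the data here are noise-free and the state dimension equals the truncation rank, the learned eigenvalues match the true ones to machine precision.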
This eigendecomposition is connected to the Koopman operator of the system, and allows reconstructions and predictions [22,24,67,68]. For example, for a discrete-time system (i.e. $y_k = x_{k+1}$) with evenly spaced samples in time, if the eigenvectors form a linearly independent set then

$$x_j = \Psi \Lambda^{j-1} b, \quad (2.8)$$

where the vector $b$ contains the weights of the modes in the initial condition: $b = \Psi^{\dagger} x_1$. From the above, it is clear that the eigenvalues $\Lambda$ govern the temporal behaviour of the system and the eigenvectors $\Psi$ are the spatial modes.
(b) Procrustes problems

Procrustes problems (1.2) comprise finding the optimal transformation between two matrices subject to certain constraints on the class of admissible transformations [44,45,49]. The minimization is usually phrased in the Frobenius norm. Procrustes analysis finds relevance in a wealth of fields including sensory analysis [69], data visualization [70], neural networks [71], climate science [72] and solid mechanics [45]. A summary is available in the monograph [44].
The earliest and most common formulation is the 'orthogonal Procrustes problem' [44,49]. Suppose that we have two sets of measurements $X$ and $Y$ that we know are related by a unitary (orthogonal) transformation (i.e. a rotation or reflection). The goal is to learn the best unitary transformation that relates the data measurements. Thus, $A$ is constrained to be a unitary matrix and the minimization problem is

$$\mathop{\arg\min}_{A^{*}A = I} \; \|Y - AX\|_F. \quad (2.9)$$
The solution to (2.9) was derived by Schönemann [49] as

$$A = U_{YX} V_{YX}^{*}, \quad (2.10)$$

where $U_{YX}\Sigma_{YX}V_{YX}^{*} = YX^{*}$ is a full singular value decomposition. Alternatively, $A = U_P$, where $U_P H_P = YX^{*}$ is a polar decomposition. The solution is unique if and only if $YX^{*}$ is full rank. As we show in §4b, a unitary matrix corresponds to an energy-preserving operator.
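Schönemann's solution is a one-line computation once the singular value decomposition of $YX^{*}$ is available. A minimal sketch (illustrative Python/NumPy, not from the paper) that recovers a known rotation exactly:

```python
import numpy as np

def unitary_procrustes(X, Y):
    """Solve argmin_{A*A=I} ||Y - AX||_F via (2.10): A = U V*,
    where U Sigma V* = Y X* is a full singular value decomposition."""
    U, _, Vh = np.linalg.svd(Y @ X.conj().T)
    return U @ Vh

# data related by a planar rotation through angle 0.3
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = np.random.default_rng(1).standard_normal((2, 50))
A = unitary_procrustes(X, Q @ X)
print(np.allclose(A, Q))   # True: the rotation is recovered
```

Since the data here are noise-free and $YX^{*}$ is full rank, the recovered matrix is exactly the generating rotation; with noisy data the same formula returns the closest unitary fit.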
There are many solutions for Procrustes problems available in the literature for different matrix constraints [45]. When exact solutions are not possible, algorithmic solutions can be effective [56].
3. Physics-informed dynamic mode decomposition

Incorporating physics into ML algorithms involves supplementing an existing technique with additional biases. Usually, ML practitioners take one of three approaches [1]. First, observational biases can be embedded through data augmentation techniques; however, augmented data are not always available, and incorporating additional training samples can become computationally expensive. Second, physical constraints can be included by suitably penalizing the loss function [8]. Third, inductive biases can be incorporated directly into the ML architecture in the form of mathematical constraints [18]. As noted by Karniadakis et al. [1], this third approach is arguably the most principled method since it produces models that strictly satisfy the physical constraints. piDMD falls into the final category. piDMD incorporates physical principles by constraining the matrix manifold of the DMD regression problem (1.2).⁴ We exchange the low-rank representation of $A$ typically sought in model order reduction in favour of alternative or additional matrix structures more relevant to the physical problem at hand. In figure 1, we illustrate the matrix structures used in this paper, along with their corresponding physical principles and references for the optimal solutions.

⁴Although the focus of this paper is on the underlying regression problem, we are still primarily concerned with decomposing the data into the form (2.8); hence, piDMD is a modal decomposition algorithm.
Figure 1. Visual illustrations of the types of matrices, their corresponding physical principles and references to the solutions of the corresponding optimization problem (1.2). Standard dynamic mode decomposition assumes modal physics and a low-rank matrix (solution: Schmid [21]); physics-informed dynamic mode decomposition (piDMD) covers conservative physics (unitary matrices; solution: Schönemann [49]), self-adjoint (symmetric; Higham [45]), shift-equivariant (circulant/Toeplitz/Hankel; new solutions), causal (upper triangular; new solutions), local (tri-diagonal; new solutions) and multi-dimensional physics (block matrices; new solutions).
(i) Modelling: When studying a system from a data-driven perspective, it is common to have
some partial knowledge of the underlying physics of the system. This first step in piDMD
asks the user to summarize the known or suspected physical properties of the system at
hand.
(ii) Interpretation: Having determined the physical principle we would like to enforce, we
must translate these laws into the matrix manifold to which the linear model should be
constrained. This step involves knowledge of the data collection procedure such as the
spatial grid or the energy inner product.
(iii) Optimization: Equipped with a target matrix manifold, we may now solve the relevant
Procrustes problem (1.2). By suitably modifying the optimization routine, we can
guarantee that the resulting model satisfies the physical principle identified in step (i).
(iv) Diagnostics: The final step of piDMD involves extracting physical information from the learned model $A$. For example, one may wish to analyse the spectrum or modes, compute the resolvent modes [73], perform predictions, or investigate other diagnostics. Similarly to the original DMD formulation, the learned modes can be used to decompose the data into the form (2.8).
These steps are expanded on in electronic supplementary material, §A. As an example, consider the travelling wave solution explored in figure 2. We begin with the physical principle that the system is shift equivariant (modelling), and this leads us to seek a circulant matrix model (interpretation). We then find a solution to the corresponding Procrustes problem (optimization) and investigate the predictive behaviour of the model (diagnostics).
Figure 2. Comparing standard dynamic mode decomposition (specifically, optimized DMD [40]) and physics-informed dynamic mode decomposition applied to the advection equation with incomplete data. Standard DMD solves $\arg\min \|Y - AX\|_F$ subject to a rank constraint on $A$, whereas piDMD solves $\arg\min \|Y - AX\|_F$ subject to $A$ being circulant. The data are contaminated with 2% additive Gaussian noise. Having trained the DMD models, we then perform predictions for different initial conditions. Standard DMD fails whereas piDMD produces a faithful prediction. In the figure, $S_\phi$ represents the shift operator. The performance of DMD improves if the data cover a full cycle of the pulse.
While each of the four steps can be challenging, the optimization step is often the most difficult. Moreover, any solutions are peculiar to the matrix manifold under consideration: the symmetric Procrustes problem is largely unconnected to, say, the tridiagonal Procrustes problem. Thus, in this article, we present many new solutions for (1.2) with different physics-informed matrix manifold constraints. When exact solutions are not possible (e.g. if the manifold constraint is quite complicated), there are sophisticated algorithmic solutions available [56].
Standard DMD exploits the low-rank structure of $A$ to efficiently perform diagnostics on the learned model. Some of the manifolds we consider (such as tridiagonal and upper triangular) do not have an obvious or useful low-rank approximation. Instead, these matrices often have an alternative structure that can be exploited to perform fast diagnostic tests. For example, tridiagonal matrices rarely have a meaningful low-rank approximation, but nevertheless admit a fast eigendecomposition and singular value decomposition [65].
Some of the matrix manifolds we consider (such as symmetric, triangular, tridiagonal or circulant) can be phrased as linear equality constraints and can, in principle, be solved with linear-equality constrained least squares [65]. However, the number of equality constraints needed is $O(n^2)$, so the resulting least-squares matrix will have $O(n^2)$ rows, which is intractably large in most applications. Thus, we avoid phrasing the piDMD constraints in terms of linear equality constraints and instead exploit properties of the matrix manifold to find solutions that can be efficiently implemented (see §4a and §4c, for example).
DMD uses the available degrees of freedom in the model to optimize the reconstruction of the
data in a low-dimensional subspace. This can result in overfitting since the algorithm does not
respect global properties of the system; as such, the modes usually fail to characterize unseen
test data. In piDMD, we exchange some of the available degrees of freedom in the model to
enforce a global physical structure. This regularization produces models that are suboptimal in
the reconstruction norm but are more generalizable to unseen datasets. The generalizability of
piDMD is observed in each of our test cases as the algorithm is able to more accurately learn the
true eigenvalues of the underlying system. We emphasize that we are not claiming that the low-
rank constraint is always suboptimal. In fact, the low-rank constraint is often best and should be
used as a first attempt when analysing data. However, if prior physical knowledge of the system
is available then a physics-informed manifold may be more appropriate.
Figure 3. A comparison of the models learned by exact DMD and piDMD for a range of applications: cylinder flow, the Schrödinger equation, two-dimensional advection, a Volterra integro-differential equation, advection-diffusion and channel flow, with physics and matrix structures spanning causal (upper triangular), conservative (unitary), self-adjoint (Hermitian), shift-equivariant (block circulant) and local (tri-diagonal) models. piDMD identifies the spectrum of the true operator with higher accuracy than exact DMD. The structure of the model matrices is also illustrated; piDMD models are generally more coherent than those learned by exact DMD. Details of problem sizes are given in the corresponding sections in the electronic supplementary material. In the spectrum subplots, we plot the true eigenvalues as circles, the DMD eigenvalues as triangles and the piDMD eigenvalues as crosses.
4. Applying physics-informed dynamic mode decomposition to enforce canonical physical structures

We now present a range of applications of piDMD. Each of the following sections investigates a detailed application of piDMD for a specific physical principle. A summary is illustrated in figure 3, and the degrees of freedom in each model, and the associated computational cost, are discussed in electronic supplementary material, F. The physical principles, corresponding matrix structures, and optimal solutions and extensions are summarized below:

§4a: Shift-equivariant: circulant (and symmetric, skew-symmetric or unitary, electronic supplementary material, B.1), low rank (electronic supplementary material, B.2), non-equally spaced samples (electronic supplementary material, B.3), total least squares (electronic supplementary material, B.4), Toeplitz & Hankel (electronic supplementary material, B.5)
§4b: Conservative: unitary (§2b)
§4c: Self-adjoint: symmetric (§4c), skew-symmetric (electronic supplementary material, C.1), symmetric in a subspace (§4c(i))
§4d: Local: tridiagonal (§4d), variable diagonals (electronic supplementary material, D.1), periodic (electronic supplementary material, D.2), symmetric tridiagonal (electronic supplementary material, D.3), total least squares (electronic supplementary material, D.4), regularized locality (electronic supplementary material, D.5)
§4e: Causal: upper triangular (electronic supplementary material, E)
(a) Shift-equivariant systems

Shift equivariance (also called 'translation equivariance') is ubiquitous in the physical sciences. Spatially homogeneous systems—such as constant-coefficient PDEs or convolution operators—are shift equivariant, and thus appear identical with respect to any (stationary) observer. In view of Noether's theorem [74], the conserved quantity related to shift equivariance is (generalized) linear momentum; as such, incorporating shift equivariance into a DMD model means that the model preserves linear momentum. While it may be unusual for practical engineering problems to have a truly homogeneous direction, many problems of basic physical interest—such as wall-bounded turbulent flow—frequently possess a spatially homogeneous dimension. In the sequel, we derive a piDMD formulation for shift-equivariant operators.
We define $S_\phi$ as the $\phi$-shift operator, i.e. $S_\phi v(\xi) = v(\xi + \phi)$ for all test functions $v$. We say that a space-continuous linear operator $A$ is shift equivariant if $A$ commutes with the $\phi$-shift operator for all shifts:

$$S_\phi A = A S_\phi. \quad (4.1)$$

An application of the shift operator shows that if $A$ is shift equivariant then the quantity $e^{-\lambda\xi} A e^{\lambda\xi}$ is constant for all $\xi$. Thus, neglecting boundary conditions for now, $\{e^{\lambda\xi}\}$ are eigenfunctions of $A$. If the domain is normalized to $[-1, 1]$ and has periodic boundaries, then $\lambda = l\pi i$ for integer $l$. In this case, the corresponding eigenfunctions form an orthogonal basis and the operator is diagonalized by the known eigenfunctions.
In the following, we assume that we are studying a shift-equivariant system on a periodic domain $\xi \in [-1, 1]$. We move from the continuous formulation to a discretized space and assume that we are studying a function $u(\xi, t)$ and have access to evenly spaced samples of $u$ at $\xi = [-1, -1 + \Delta, \dots, 1 - \Delta]$. We discuss the case of non-equally spaced samples in electronic supplementary material, B.3. If we define the state variable as $x(t) = [u(\xi_1, t), u(\xi_2, t), \dots, u(\xi_n, t)]$ then the discrete-space linear operator $A$ that generates $x(t)$ is diagonalized by its eigenvectors:

$$A = F^{-1} \operatorname{diag}(\hat{a}) F, \quad (4.2)$$
where $F_{j,k} = e^{-2\pi i (j-1)(k-1)/n}/\sqrt{n}$ is the unitary discrete Fourier transform matrix with $F^{-1} = F^{*}$, and $\{\hat{a}_j\}$ are the unknown eigenvalues. Equation (4.2) is equivalent to stating that $A$ is circulant:

$$A = \begin{bmatrix} a_0 & a_{n-1} & \cdots & a_1 \\ a_1 & a_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{n-1} \\ a_{n-1} & \cdots & a_1 & a_0 \end{bmatrix}, \quad (4.3)$$

i.e. $A_{j,k} = a_{(j-k) \bmod n}$. If the boundary conditions are instead Dirichlet or Neumann then $A$ is not circulant but Toeplitz; we solve the corresponding Procrustes problem in electronic supplementary material, B.5.
Substituting (4.2) into (1.2) and noting that the Frobenius norm is invariant to unitary transformations allows (1.2) to be transformed to

$$\mathop{\arg\min}_{\hat{a}} \; \|\operatorname{diag}(\hat{a})\bar{X} - \bar{Y}\|_F, \quad (4.4)$$

where $\bar{X} = FX$ and $\bar{Y} = FY$ are the spatial discrete Fourier transforms of $X$ and $Y$. As such, $\bar{X}$ and $\bar{Y}$ can be efficiently formed in $O(mn\log(n))$ operations using fast Fourier transform (FFT) methods [75].
The rows of the cost function (4.4) now decouple to produce $n$ minimization problems:

$$\mathop{\arg\min}_{\hat{a}_j} \; \|\hat{a}_j \tilde{\bar{X}}_j - \tilde{\bar{Y}}_j\|_F \quad \text{for } 1 \le j \le n, \quad (4.5)$$

where $\tilde{\bar{X}}_j$ and $\tilde{\bar{Y}}_j$ are the $j$th rows of $\bar{X}$ and $\bar{Y}$, respectively. An alternative view of this step is that spatial wavenumbers decouple in periodic shift-equivariant systems, so we may analyse each wavenumber individually. The coefficient $\hat{a}_j$ represents the temporal growth of the $j$th Fourier component. The optimal solution for each eigenvalue follows as

$$\hat{a}_j = \tilde{\bar{Y}}_j \tilde{\bar{X}}_j^{\dagger}. \quad (4.6)$$
In principle, standard DMD methods [21,23] could be encouraged to respect shift equivariance by augmenting the data matrices with their shifted counterparts. However, one would strictly need to include $n$ shifts, thus producing $nm$ samples, which is usually too large in applications. By contrast, the above solution is extremely efficient via the use of the FFT and the decoupling of wavenumbers.
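The resulting algorithm is only a few lines: Fourier-transform the snapshot rows, then solve the scalar least-squares problem (4.6) once per wavenumber. The sketch below is illustrative (Python/NumPy rather than the paper's MATLAB implementation, and it assumes no Fourier row of the data is identically zero), applied to a travelling pulse:

```python
import numpy as np

def circulant_pidmd(X, Y):
    """Shift-equivariant piDMD on a periodic grid: the circulant model
    diagonalizes in Fourier space (4.4), so each eigenvalue is the scalar
    fit of (4.6), one per spatial wavenumber."""
    Xh = np.fft.fft(X, axis=0)   # rows are now spatial wavenumbers
    Yh = np.fft.fft(Y, axis=0)
    # pseudoinverse of a row vector: x^+ = x^* / ||x||^2
    return np.sum(Yh * Xh.conj(), axis=1) / np.sum(np.abs(Xh)**2, axis=1)

# travelling wave: a Gaussian pulse advected one grid point per step
n, m = 32, 16
u0 = np.exp(-0.5 * ((np.arange(n) - 8.0) / 2.0) ** 2)
snaps = np.column_stack([np.roll(u0, j) for j in range(m + 1)])
a_hat = circulant_pidmd(snaps[:, :-1], snaps[:, 1:])
# a pure shift preserves energy: all eigenvalues lie on the unit circle
print(np.max(np.abs(np.abs(a_hat) - 1.0)) < 1e-5)   # True
```

The learned eigenvalues are the phases $e^{-2\pi i k/n}$ of a unit shift, which a low-rank DMD model of the same travelling pulse would struggle to identify.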
The electronic supplementary material includes alternative solutions for cases where
the system is shift equivariant and symmetric, skew-symmetric or unitary (electronic
supplementary material, B.1), low rank (electronic supplementary material, B.2), the samples
are not equally spaced (electronic supplementary material, B.3), the total least-squares case
(electronic supplementary material, B.4), and when the system is Toeplitz or Hankel (electronic
supplementary material, B.5).
(i) Example: plane channel flow
We now use this shift-equivariant piDMD to compute the modes of the linearized Navier–Stokes equations from a single simulation. We consider the incompressible flow inside a plane channel of size $2\pi \times 2 \times 2\pi$ along the spatial $x$, $y$ and $z$ coordinates, which indicate the streamwise, wall-normal and spanwise directions, respectively. The configuration considers a Reynolds number of $Re = 2000$ based on the channel half-height and the centreline velocity, and periodic boundary conditions on the open faces of the channel; hence the flow is homogeneous in the $x$- and $z$-directions. We use the spectral code Channelflow [76] to perform direct numerical simulations (DNS) with the same numerical configuration as that presented in [73].
The dataset investigated is generated from a DNS of the response of the laminar flow with parabolic velocity profile to a localized perturbation in the wall-normal velocity component of the form
$$v(x, y, z, 0) = \left(1 - \frac{r^2}{c_r^2}\right)(\cos(\pi y) + 1)\,\mathrm{e}^{-(r^2/c_r^2 + y^2/c_y^2)}, \qquad (4.7)$$
where $r^2 = (x - \pi)^2 + (z - \pi)^2$, the parameters are set to $c_r = 0.7$ and $c_y = 0.6$, and the amplitude of the perturbation was scaled to have an energy norm of $10^{-5}$ to ensure that the effect of nonlinearity is negligible. This initial condition, which was first studied in [77] and also in [73], is a model of a disturbance that could be generated in experiments using a spanwise and streamwise periodic array of axisymmetric jets injecting fluid perpendicular to the wall. The dataset obtained from this simulation is then comprised of a sequence of 300 snapshots of the three-dimensional velocity-perturbation field recorded every 0.5 time units. The data are corrupted with 2% Gaussian noise.
Our aim is to learn the spectrum of the underlying linear operator that best describes the evolution of the DNS snapshots, and thus to obtain a reliable predictive model for the dynamics. When directly applying exact DMD to this dataset, the algorithm attempts to find global modes for the three-dimensional velocity field and their spectrum. This is challenging because modes associated with different streamwise and spanwise wavenumbers may be mixed, leading to spurious eigenvalues, as shown in figure 4. However, we know that the flow is homogeneous in the $x$ and $z$ coordinates, and is therefore shift-equivariant in these directions. Hence, we
Figure 4. Learning the spectrum of the linearized Navier–Stokes equations from velocity measurements of the response to a localized disturbance in a channel flow. piDMD embeds the shift-equivariant structure of the Navier–Stokes equations into the model-learning framework, thus learning the spectrum of the linearized operator with improved accuracy over exact DMD. The training data are corrupted with 2% noise. The predictions of the learned DMD and piDMD models are also compared. The isosurfaces are surfaces of constant vertical velocity. (Panels: the shift-equivariant $x$ and $z$ directions; training data of 200 000 states, 300 snapshots and 2% noise, showing the initial condition, transient growth and decay to laminar; the nested block structure of the learned operator $A$; the learned spectra of DMD and piDMD against the truth; and predictions of unseen test data.)
can leverage piDMD to incorporate this property, forcing the resulting matrix to respect, by
construction, the structure with three nested levels shown in
figure 4
. In practice, this amounts
to reshaping every snapshot, taking the FFT in the dimensions corresponding to
x
and
z
,
and performing exact DMD on the
y
-dependent Fourier amplitudes for every streamwise and
spanwise wavenumber tuple.
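A rough illustration of this reshaping-and-FFT procedure, with invented array shapes and a plain least-squares fit standing in for exact DMD (this is a toy sketch, not the Channelflow workflow), might look like:

```python
import numpy as np

def shift_equivariant_dmd_3d(snapshots):
    """Toy sketch: FFT each snapshot in the two homogeneous directions
    (here axes 0 and 2), then fit a separate small least-squares operator
    to the wall-normal profiles of each streamwise/spanwise wavenumber pair.
    snapshots: array of shape (nx, ny, nz, m) holding m consecutive states."""
    nx, ny, nz, m = snapshots.shape
    hat = np.fft.fft2(snapshots, axes=(0, 2))   # wavenumber representation
    eigs = {}
    for kx in range(nx):
        for kz in range(nz):
            Xk = hat[kx, :, kz, :-1]            # ny x (m - 1) data matrix
            Yk = hat[kx, :, kz, 1:]             # time-shifted counterpart
            Ak = Yk @ np.linalg.pinv(Xk)        # small per-wavenumber operator
            eigs[kx, kz] = np.linalg.eigvals(Ak)
    return eigs

# toy data: random 3-D snapshot sequence
rng = np.random.default_rng(1)
eigs = shift_equivariant_dmd_3d(rng.standard_normal((4, 3, 4, 10)))
assert len(eigs) == 16 and eigs[0, 0].shape == (3,)
```

Each wavenumber pair yields an independent $n_y \times n_y$ operator, which is what makes the method tractable for the 200 000-state channel-flow data.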
The spectra learned from both approaches are compared in
figure 4
, showing that piDMD
results in a far more accurate and complete eigenvalue spectrum. Importantly, piDMD accurately
identifies the most dynamically significant (least damped) eigenvalues, as illustrated in the
close-up plot. piDMD identifies three unstable eigenvalues, but the corresponding modes
are of low amplitude. By contrast, exact DMD identifies very few relevant eigenvalues
despite the model’s comparatively high rank of 200. As a result, the predictive ability of
exact DMD is poor while piDMD produces very accurate predictions of unseen test data
(bottom row).
(b) Conservative systems
Conservation laws are foundational to all areas of science. Identifying quantities that remain constant in time, such as mass, momentum, energy, electric charge and probability, allows us to derive universal governing equations, apply inductive reasoning and understand basic physical mechanisms. In this section, we demonstrate how conservation laws can be incorporated into the DMD framework.
Suppose that we are studying a system that we know conserves energy. In applications of DMD, it is implicitly assumed that measurements of the state have been suitably weighted so that the square of the 2-norm corresponds to the energy of the state [73]: $E(x) = \|x\|_2^2$. In these variables, the original optimization problems (1.1), (1.2) equate to finding the model $A$ that minimizes the energy of the errors between the true and predicted states ($y_k$ and $Ax_k$, respectively). Thus, if $A$ represents a discrete-time linear dynamical system ($y_k = x_{k+1} = Ax_k$) then $A$ is an energy-preserving operator if and only if
$$E(Ax) = \|Ax\|_2^2 = \|x\|_2^2 = E(x) \quad \text{for all } x \in \mathbb{R}^n. \qquad (4.8)$$
In words, (4.8) states that $A$ does not change the energy of the system but merely redistributes energy between the states. In matrix terminology, (4.8) holds if and only if $A$ is unitary. Therefore, for conservative systems, the optimization problem (1.2) is the orthogonal Procrustes problem described in §2b, and the solution is given by (2.10). The eigenvalues of $A$ lie on the unit circle, and the eigenvectors are orthogonal. Thus, the eigenvectors oscillate in time, with no growth or decay. Since the solution of the orthogonal Procrustes problem (2.10) requires the full SVD of an $n \times n$ matrix, it can be more computationally efficient to first project onto the leading POD modes and build a model therein. In this case, the model is only energy preserving within the subspace spanned by the leading POD modes.
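A minimal sketch of this energy-preserving fit, assuming numpy and using a random orthogonal operator as synthetic truth (the function name is ours, not the authors'):

```python
import numpy as np

def unitary_pidmd(X, Y):
    """Energy-preserving piDMD: the orthogonal Procrustes solution,
    A = U V* where U S V* is the SVD of Y X* (cf. (2.10))."""
    U, _, Vh = np.linalg.svd(Y @ X.conj().T)
    return U @ Vh

# sanity check: recover a random orthogonal operator from clean data
rng = np.random.default_rng(2)
n, m = 5, 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal "truth"
X = rng.standard_normal((n, m))
A = unitary_pidmd(X, Q @ X)
assert np.allclose(A, Q)                               # exact recovery
assert np.allclose(np.abs(np.linalg.eigvals(A)), 1.0)  # spectrum on unit circle
```

By construction, the learned operator is unitary, so its eigenvalues sit exactly on the unit circle regardless of the data.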
Noise is a substantial problem for most DMD methods, and a common remedy is to phrase the DMD optimization (1.1) as a total least-squares problem [35]. Perhaps surprisingly, the solution to the orthogonal Procrustes problem (2.10) is also the solution to the total least-squares problem when the solution is constrained to be orthogonal [47]. Thus, the solution (2.10) is statistically optimal even when there is noise in both $X$ and $Y$, as long as the noise has the same Gaussian distribution in $X$ and $Y$. The solutions obtained by energy-preserving piDMD and total least-squares DMD [78] are often similar. Further, there are other optimizations closely related to the orthogonal Procrustes problem, including the case where $A$ lies on the Stiefel manifold (i.e. when $A$ is rectangular with orthonormal columns [52]), there is a weighting matrix [51], there are missing data [46] or there are high amounts of noise [50].
(i) Example: $Re = 100$ flow past a cylinder
We now apply this energy-preserving piDMD to study the flow past a cylinder, which is a benchmark problem for modal decompositions [24]. The Reynolds number is $Re = UD/\nu = 100$, where $D$ is the diameter of the cylinder, $U$ is the free-stream velocity and $\nu$ is the kinematic viscosity. The data consist of 151 samples of vorticity measurements, corresponding to five periods of vortex shedding, taken on a grid of $199 \times 449$ points. We note that the 2-norm of the measurements is approximately constant in time. Note that the cylinder flow problem is only approximately conservative due to the balance of the applied pressure and viscous loss; in reality, the (numerical) energy of the state vector exhibits small oscillations. The data are contaminated with 20% Gaussian noise, as illustrated in the top left panel of figure 5. We truncate the data to the first 15 POD modes and learn a standard DMD model and an energy-preserving piDMD model. piDMD learns a more accurate representation of the leading-order dynamics than standard DMD. For the high-order modes, the eigenvalues learned by DMD exhibit (well-known) spurious damping [34] whereas the eigenvalues learned by piDMD are purely oscillatory and remain on the imaginary axis. The trajectories of the POD coefficients are also more accurate for piDMD,
and do not exhibit the spurious energy loss caused by noise. Note that total least-squares DMD exhibits comparable performance to piDMD in this problem.

Figure 5. Learning the spectrum of the flow past a cylinder with noisy measurements with DMD and energy-preserving piDMD. (Panels: noisy training data; learned spectra; and POD mode trajectories for modes 1, 5, 7 and 9 over time, comparing truth, DMD and piDMD.)
(c) Self-adjoint systems
Self-adjoint systems are another important class of linear systems that arise frequently in solid mechanics, heat and mass transfer, fluid mechanics and quantum mechanics. When studying a system known to be self-adjoint, we can use piDMD to constrain our model $A$ to lie in the manifold of symmetric (Hermitian) matrices, such that $A = A^*$. Symmetric matrices have real eigenvalues and, by the spectral theorem, are diagonalizable. This restriction places a significant constraint on the learned model and can substantially reduce the sensitivity of DMD to noise.
The minimum-norm solution of the symmetric Procrustes problem [79] is
$$A = U_X L U_X^*, \qquad (4.9)$$
where the entries of $L$ are
$$L_{i,j} = L_{j,i} = \begin{cases} \dfrac{\sigma_i C_{j,i} + \sigma_j C_{i,j}}{\sigma_i^2 + \sigma_j^2} & \text{if } \sigma_i^2 + \sigma_j^2 \neq 0, \\[4pt] 0 & \text{otherwise}, \end{cases} \qquad (4.10)$$
where $C = U_X^* Y V_X$ and $X = U_X \Sigma_X V_X^*$ is the SVD of $X$, and $\sigma_i$ is the $i$th singular value. The
solution of the skew-symmetric case is similar (electronic supplementary material, C.1), and algorithmic solutions are available when $A$ is instead constrained to be positive definite [48]. For large-scale systems, a low-rank approximation to $A$ can be obtained by projecting the system onto the first $r$ principal components of $X$. This low-rank projection preserves the self-adjointness of the model.
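A real-valued numpy sketch of (4.9)–(4.10), using a random symmetric operator as synthetic truth (the function name and test data are illustrative assumptions, not the authors' code):

```python
import numpy as np

def symmetric_pidmd(X, Y):
    """Self-adjoint piDMD: minimum-norm symmetric Procrustes solution,
    following (4.9)-(4.10)."""
    Ux, s, Vh = np.linalg.svd(X, full_matrices=False)
    C = Ux.T @ Y @ Vh.T                             # C = Ux* Y Vx
    s2 = s[:, None] ** 2 + s[None, :] ** 2          # sigma_i^2 + sigma_j^2
    num = s[:, None] * C.T + s[None, :] * C         # sigma_i C_ji + sigma_j C_ij
    L = np.where(s2 > 1e-12, num / np.where(s2 > 1e-12, s2, 1.0), 0.0)
    return Ux @ L @ Ux.T

# check: exact recovery of a symmetric operator from clean data
rng = np.random.default_rng(3)
n, m = 4, 12
S = rng.standard_normal((n, n))
S = S + S.T                                         # symmetric "truth"
X = rng.standard_normal((n, m))
A = symmetric_pidmd(X, S @ X)
assert np.allclose(A, A.T)                          # model is symmetric
assert np.allclose(A, S)                            # matches the true operator
```

With clean, full-rank data the construction recovers the true symmetric operator exactly; with noisy data it returns the closest symmetric fit in the Frobenius norm.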
Note that the symmetric Procrustes problem is connected to the orthogonal Procrustes problem (§4b) via the fact that every unitary matrix $U$ can be expressed as the exponential of a Hermitian matrix $H$ as $U = \exp(\mathrm{i}H)$.
(i) Learning energy states of the Schrödinger equation
Possibly the most famous self-adjoint operator is the quantum Hamiltonian, $\hat{H}$ [80]. In quantum mechanics, physical quantities of interest, such as position, momentum, energy and spin, are
represented by self-adjoint linear operators called ‘observables’, of which one example is the Hamiltonian. The Hamiltonian describes the evolution of the probabilistic wave function via the time-dependent Schrödinger equation
$$\mathrm{i}\hbar \frac{\mathrm{d}}{\mathrm{d}t}|\Psi(t)\rangle = \hat{H}|\Psi(t)\rangle. \qquad (4.11)$$

Figure 6. Learning the eigenstates of a quantum Hamiltonian of a hexagonal well with the self-adjoint piDMD. (Panels: the system and data, 6400 states, 400 snapshots, Gaussian noise in $Y$; learned energy levels $E_n$; and energy eigenstates $\Psi$, comparing truth, DMD (rank 15) and piDMD (rank 15).)
Solutions of (4.11) take the form of an eigenmode expansion
$$\Psi(\xi, t) = \sum_{j=1}^{\infty} \alpha_j \psi_j(\xi)\,\mathrm{e}^{-\mathrm{i}E_j t/\hbar}, \qquad (4.12)$$
where $E_j$ is an energy level (eigenvalue) of $\hat{H}$ with corresponding eigenstate $\psi_j$, the coefficients $\alpha_j$ are determined by the initial distribution of the wave function, and $\xi$ is the spatial coordinate.
We consider a wave function evolving according to the two-dimensional Schrödinger equation subject to an unknown Hamiltonian. The unknown potential function is taken to be a finite well in a hexagonal shape. We collect measurements of the wave function and its velocity and train an exact DMD model and a piDMD model. In practice, empirical measurements of the wave function can be obtained through optical tomographic methods [81,82] or sequential measurements of complementary variables [83]. The data matrices $X$ and $Y$ are formed from measurements of $|\Psi(t)\rangle$ and $\mathrm{i}\hbar\,(\mathrm{d}/\mathrm{d}t)|\Psi(t)\rangle$, respectively, which are themselves formed by superposing eigenfunctions obtained by a finite difference method. Thus, the matrices $X$ and $Y$ are connected by an unknown self-adjoint Hamiltonian $\hat{H}$, and we may apply our self-adjoint piDMD optimization. The measurements of $Y$ are contaminated with a large amount (200%) of Gaussian noise but the measurements of $X$ are clean.
Constraining the learned Hamiltonian to be self-adjoint reduces the sensitivity of the learning process to noise, as evidenced by figure 6. From the bottom left panel, we see that the energy levels of the piDMD model are physically consistent insofar as they are all real and positive. By contrast, the energy levels learned by standard DMD exhibit a spurious imaginary component, which will produce unrealistic growth/decay of the eigenstates. Both methods miss one eigenstate ($E_n \approx 7.5$) since this particular mode is not well represented in the data. Additionally, the eigenstates learned by piDMD are much less noisy than those learned by standard DMD. This can be explained by noting that the eigenstates learned by piDMD lie in the span of $X$, which is clean, whereas the eigenstates learned by standard DMD lie in the span of $Y$, which is noisy. In the electronic supplementary material, lemma C.1 proves that the symmetric piDMD model (4.9) is less sensitive to noise than the exact DMD model (2.4). In summary, a physics-aware approach enables a more accurate identification of quantum energy levels and the associated eigenstates.
(d) Spatially local systems
A hallmark of most physical systems is spatial locality. Points close to one another will usually have a stronger coupling than points that are far from one another. For example, basic physical processes such as advection and diffusion are spatially local. In most DMD applications, the data are collected from a spatial grid, but the grid rarely plays a role in the DMD analysis beyond forming the energy norm [73]. In fact, the output of the exact DMD algorithm remains invariant to unitary transformations of the rows and columns of the data matrices, so that randomly shuffling the rows (i.e. the spatial locations) will result in identical DMD models with the corresponding shuffling of the rows of the modes [62,84]. Knowledge of the underlying grid from which the data were collected enables us to bias the learning process to prioritize relationships that we expect to have a strong local coupling.
Herein we consider the one-dimensional case; the analysis generalizes straightforwardly to higher dimensions. We assume that the entries of $x$ are samples of a function $u$ taken at $n$ grid points $\{\xi_i\}$. The grid points can be arranged sequentially so that $\xi_i < \xi_{i+1}$. By the spatial locality principle, we expect states that are spatially close to one another to have a stronger coupling than states that are spatially far from one another. For example, we may expect entries close to the diagonal of $A$ to be larger than entries far from the diagonal. Then the entries of $A$ would satisfy
$$|A_{i,j}| \ge |A_{i,k}| \quad \text{for } |j - i| < |k - i|. \qquad (4.13)$$
Equation (4.13) is merely a heuristic, and we would expect it to hold on average rather than for each $(i, j, k)$. Additionally, (4.13) is a difficult and expensive condition to implement in practice. Instead, we consider an alternative version of spatial locality where we only permit coupling between states that are sufficiently close:
$$A_{i,k} = 0 \quad \text{for } d < |k - i|. \qquad (4.14)$$
Equation (4.14) describes a $d$-diagonal matrix. If $d = 0$ then states can only affect themselves and $A$ is diagonal (comparable to (4.4)). A more interesting case is $d = 1$, where we only allow coupling between adjacent states. Then $A$ is tridiagonal: the entries of $A$ are zero except on the leading, upper and lower diagonals,
$$A = \begin{bmatrix} \beta_1 & \gamma_1 & & & \\ \alpha_2 & \beta_2 & \gamma_2 & & \\ & \alpha_3 & \ddots & \ddots & \\ & & \ddots & \ddots & \gamma_{n-1} \\ & & & \alpha_n & \beta_n \end{bmatrix}. \qquad (4.15)$$
We now solve the optimization problem (1.2) when $\mathcal{M}$ is the manifold of tridiagonal matrices. Electronic supplementary material, D includes many solutions for more general diagonal-like structures, including longer-range and variable coupling (electronic supplementary material, D.1), spatially periodic local systems (electronic supplementary material, D.2), the self-adjoint case (electronic supplementary material, D.3), the total least-squares case (electronic supplementary material, D.4) and weaker local structures (electronic supplementary material, D.5). Since the rows of $A$ are decoupled, we can expand the Frobenius norm in (1.2) row-wise to obtain $n$ smaller minimization problems
$$\underset{\alpha_i, \beta_i, \gamma_i}{\arg\min}\; \|\alpha_i \tilde{x}_{i-1} + \beta_i \tilde{x}_i + \gamma_i \tilde{x}_{i+1} - \tilde{y}_i\|_2 \quad \text{for } 1 \le i \le n, \qquad (4.16)$$
with the convention $\alpha_1 = \gamma_n = 0$ and $\tilde{x}_0 = \tilde{x}_{n+1} = \mathbf{0}$. Recall that $\tilde{x}_i$ and $\tilde{y}_i$ are the $i$th rows of $X$ and $Y$, respectively (2.1). Each minimization problem (4.16) has the minimum-norm solution
$$\begin{bmatrix} \alpha_i & \beta_i & \gamma_i \end{bmatrix} = \tilde{y}_i \begin{bmatrix} \tilde{x}_{i-1} \\ \tilde{x}_i \\ \tilde{x}_{i+1} \end{bmatrix}^{\dagger} \quad \text{for } 1 \le i \le n. \qquad (4.17)$$
Each of the $n$ minimizations costs $O(m)$, so the total cost is $O(mn)$.