IEEE CONTROL SYSTEMS LETTERS (L-CSS), PREPRINT VERSION, ACCEPTED DEC. 2020 (DOI: 10.1109/LCSYS.2020.3046529).
Neural Stochastic Contraction Metrics for Learning-based Control and Estimation

Hiroyasu Tsukamoto, Soon-Jo Chung, and Jean-Jacques E. Slotine
Abstract—We present Neural Stochastic Contraction Metrics (NSCM), a new design framework for provably-stable robust control and estimation for a class of stochastic nonlinear systems. It uses a spectrally-normalized deep neural network to construct a contraction metric, sampled via simplified convex optimization in the stochastic setting. Spectral normalization constrains the state-derivatives of the metric to be Lipschitz continuous, thereby ensuring exponential boundedness of the mean squared distance of system trajectories under stochastic disturbances. The NSCM framework allows autonomous agents to approximate optimal stable control and estimation policies in real-time, and outperforms existing nonlinear control and estimation techniques including the state-dependent Riccati equation, iterative LQR, EKF, and the deterministic neural contraction metric, as illustrated in simulation results.
Index Terms—Machine learning, Stochastic optimal control, Observers for nonlinear systems.
I. INTRODUCTION
The key challenge for control and estimation of autonomous
aerospace and robotic systems is how to ensure optimality and
stability. Oftentimes, their motions are expressed as nonlinear
systems with unbounded stochastic disturbances, the time
evolution of which is expressed as Itô stochastic differential equations [1]. As their onboard computational power is often
limited, it is desirable to execute control and estimation
policies computationally as cheaply as possible.
In this paper, we present a Neural Stochastic Contraction
Metric (NSCM) based robust control and estimation frame-
work outlined in Fig. 1. It uses a spectrally-normalized neural
network as a model for an optimal contraction metric (differen-
tial Lyapunov function), the existence of which guarantees ex-
ponential boundedness of the mean squared distance between
two system trajectories perturbed by stochastic disturbances.
Unlike the Neural Contraction Metric (NCM) [2], where we
proposed a learning-based construction of optimal contrac-
tion metrics for control and estimation of nonlinear systems
with bounded disturbances, stochastic contraction theory [3]–
[5] guarantees stability and optimality in the mean squared
error sense for unbounded stochastic disturbances via convex
optimization. Spectral Normalization (SN) [6] is introduced in
the NSCM training, in order to validate a major assumption
in stochastic contraction that the first state-derivatives of the
metric are Lipschitz. We also extend the State-Dependent-
Coefficient (SDC) technique [7] further to include a target
Graduate Aerospace Laboratories, California Institute of Technology, Pasadena, CA, {htsukamoto, sjchung}@caltech.edu.
Nonlinear Systems Laboratory, Massachusetts Institute of Technology, Cambridge, MA, jjs@mit.edu.
Code: https://github.com/astrohiro/nscm.
Fig. 1. Illustration of NSCM ($M(x,t)$: optimal contraction metric; $x_i$ and $M_i$: sampled states and contraction metrics; $y(t)$: measurements; $x(t)$, $x_d(t)$, and $\hat{x}(t)$: actual, target, and estimated trajectories, respectively).
trajectory in control and estimation, for the sake of global
exponential stability of unperturbed systems.
In the offline phase, we sample contraction metrics by solving convex optimization to minimize an upper bound of the steady-state mean squared distance of stochastically perturbed system trajectories (see Fig. 1). Other convex objectives such as control effort could be used depending on the application of interest. We call this method the modified CV-STEM (mCV-STEM), which differs from the original work [8] in the following points: 1) a simpler stochastic contraction condition with an affine objective function both in control and estimation, thanks to the Lipschitz condition on the first derivatives of the metrics; 2) generalized SDC parameterization, i.e., $\exists A$ s.t. $A(x, x_d, t)(x - x_d) = f(x,t) + B(x,t)u_d - f(x_d,t) - B(x_d,t)u_d$ instead of $A(x,t)x = f(x,t)$, for systems $\dot{x} = f(x,t) + B(x,t)u$, which results in global exponential stability of unperturbed systems even with a target trajectory, $x_d$ for control and $x$ for estimation; and 3) optimality in the contraction rate $\alpha$ and disturbance attenuation parameter $\varepsilon$. The second point is in fact general, since $A$ can always be selected based on the line integral of the Jacobian of $f(x,t) + B(x,t)u_d$, a property which can also be applied to the deterministic NCM setting of [2]. We then train a neural network with the sampled metrics subject to the aforementioned Lipschitz constraint using the SN technique. Note that reference-independent integral forms of control laws [9]–[13] could be considered by changing how we sample the metrics in this phase. Our contraction-based formulation enables larger contracting systems to be built recursively by exploiting combination properties [14], as in systems with hierarchical combinations (e.g. output feedback or negative feedback), or to consider systems with time-delayed communications [15].
In the online phase, the trained NSCM models are exploited
to approximate the optimal control and estimation policies,
which only require one neural network evaluation at each
arXiv:2011.03168v4 [cs.LG] 3 Jan 2021
time step as shown in Fig. 1. The benefits of this framework
are demonstrated in the rocket state estimation and control
problem, by comparing it with the State-Dependent Riccati
Equation (SDRE) method [5], [7], Iterative LQR (ILQR) [16],
[17], EKF, NCM, and mCV-STEM.
Related Work:
Contraction theory [14] is an analytical
tool for studying the differential dynamics of a nonlinear
system under a contraction metric, whose existence leads to
a necessary and sufficient characterization of its exponential
incremental stability. The theoretical foundation of this paper
rests on its extension to stability analysis of stochastic non-
linear systems [3]–[5]. The major difficulty in applying it in
practice is the lack of general analytical schemes to obtain
a suitable stochastic contraction metric for nonlinear systems
written as Itô stochastic differential equations [1].
For deterministic systems, there are several learning-based
techniques for designing real-time computable optimal Lya-
punov functions/contraction metrics. These include [2], [18],
[19], where neural networks are used to represent the optimal
solutions to the problem of obtaining a Lyapunov function.
This paper improves our deterministic NCM [2], as the NSCM
explicitly considers the case of stochastic nonlinear systems,
where deterministic control and estimation policies could fail
due to additional derivative terms in the differential of the
contraction metric under stochastic perturbation.
The CV-STEM [8] is derived to construct a contraction
metric accounting for the stochasticity in dynamical processes.
It is designed to minimize the upper bound of the steady-state
mean squared tracking error of stochastic nonlinear systems,
assuming that the first and second derivatives of the metric
with respect to its state are bounded. In this paper, we only
assume that the first derivatives are Lipschitz continuous,
thereby enabling the use of spectrally-normalized neural net-
works [6]. This also significantly reduces the computational
burden in solving the CV-STEM optimization problems, al-
lowing autonomous agents to perform both optimal control
and estimation tasks in real-time.
II. PRELIMINARIES
We use $\|x\|$ and $\|A\|$ for the Euclidean and induced 2-norm, $I$ for the identity matrix, $E[\cdot]$ for the expected value, $\mathrm{sym}(A) = (A + A^T)/2$, and $A \succ 0$, $A \succeq 0$, $A \prec 0$, and $A \preceq 0$ for positive definite, positive semi-definite, negative definite, and negative semi-definite matrices, respectively. Also, $f_x$ is the partial derivative of $f(x,t)$ with respect to the state $x$, $M_{x_i}$ is that of $M(x,t)$ with respect to the $i$th element of $x$, and $M_{x_i x_j}$ is that of $M(x,t)$ with respect to the $i$th and $j$th elements of $x$.
A. Neural Network and Spectral Normalization
A neural network is a mathematical model for representing training samples $\{(x_i, y_i)\}_{i=1}^{N}$ of $y = \varphi(x)$ by optimally tuning its hyperparameters $W_\ell$, and is given as
$$y_i = \varphi(x_i; W_\ell) = T_{L+1} \circ \sigma \circ T_L \circ \cdots \circ \sigma \circ T_1(x_i) \quad (1)$$
where $T_\ell(x) = W_\ell x$, $\circ$ denotes composition of functions, and $\sigma$ is an activation function $\sigma(x) = \tanh(x)$. Note that $\varphi(x) \in C^\infty$.
Spectral normalization (SN) [6] is a technique to overcome the instability of neural network training by constraining (1) to be globally Lipschitz, i.e., $\exists L_{nn} \geq 0$ s.t. $\|\varphi(x) - \varphi(x')\| \leq L_{nn}\|x - x'\|$, $\forall x, x'$, which is shown to be useful in nonlinear control designs [20]. SN normalizes the weight matrices $W_\ell$ as $W_\ell = (C_{nn}\Omega_\ell)/\|\Omega_\ell\|$, with $C_{nn} \geq 0$ being a given constant and $\Omega_\ell$ the unnormalized weights, and trains a network with respect to $\Omega_\ell$. Since this results in $\|\varphi(x) - \varphi(x')\| \leq C_{nn}^{L+1}\|x - x'\|$ [6], setting $C_{nn} = L_{nn}^{1/(L+1)}$ guarantees Lipschitz continuity of $\varphi(x)$. In Sec. III-B, we propose one way to use SN for building a neural network that guarantees the Lipschitz assumption on $M_{x_i}$ in Theorem 1.
B. Stochastic Contraction Analysis for Incremental Stability
Consider the following nonlinear system with stochastic perturbation given by the Itô stochastic differential equation:
$$dx = f(x,t)dt + G(x,t)dW(t), \quad x(0) = x_0 \quad (2)$$
where $t \in \mathbb{R}_{\geq 0}$, $x : \mathbb{R}_{\geq 0} \to \mathbb{R}^n$, $f : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^n$, $G : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^{n \times d}$, $W(t)$ is a $d$-dimensional Wiener process, and $x_0$ is a random variable independent of $W(t)$ [21]. We assume that 1) $\exists L_1 > 0$ s.t. $\|f(x_1,t) - f(x_2,t)\| + \|G(x_1,t) - G(x_2,t)\|_F \leq L_1\|x_1 - x_2\|$, $\forall t \in \mathbb{R}_{\geq 0}$ and $\forall x_1, x_2 \in \mathbb{R}^n$, and 2) $\exists L_2 > 0$ s.t. $\|f(x_1,t)\|^2 + \|G(x_1,t)\|_F^2 \leq L_2(1 + \|x_1\|^2)$, $\forall t \in \mathbb{R}_{\geq 0}$ and $\forall x_1 \in \mathbb{R}^n$, for the sake of existence and uniqueness of the solution to (2).
Theorem 1 analyzes stochastic incremental stability of two trajectories of (2), $x_1$ and $x_2$. In Sec. IV, we use it to find a contraction metric $M(x,t)$ for given $\alpha$, $\varepsilon$, and $L_m$, where $\alpha$ is a contraction rate, $\varepsilon$ is a parameter for disturbance attenuation, and $L_m$ is the Lipschitz constant of $M_{x_i}$. Note that $\varepsilon$ and $L_m$ are introduced for the sake of stochastic contraction and were not present in the deterministic case [2]. Sec. IV-B2 delineates how we select them in practice.
Theorem 1: Suppose $\exists g_1, g_2 \in [0, \infty)$ s.t. $\|G(x_1,t)\|_F \leq g_1$ and $\|G(x_2,t)\|_F \leq g_2$, $\forall x, t$. Suppose also that $\exists M(x,t) \succ 0$ s.t. $M_{x_i}$, $\forall x_i$, is Lipschitz with respect to the state $x$, i.e. $\|M_{x_i}(x,t) - M_{x_i}(x',t)\| \leq L_m\|x - x'\|$, $\forall x, x', t$, with $L_m \geq 0$. If $M(x,t) \succ 0$ and $\alpha, \varepsilon, \overline{\omega}, \underline{\omega} \in (0, \infty)$ are given by
$$\dot{M}(x,t) + 2\,\mathrm{sym}(M(x,t)f_x(x,t)) + \alpha_g I \preceq -2\alpha M(x,t) \quad (3)$$
$$\overline{\omega}^{-1} I \preceq M(x,t) \preceq \underline{\omega}^{-1} I, \ \forall x, t \quad (4)$$
where $\alpha_g = L_m(g_1^2 + g_2^2)(\varepsilon + 1/2)$, then the mean squared distance between $x_1$ and $x_2$ is bounded as follows:
$$E[\|x_1 - x_2\|^2] \leq \frac{C}{2\alpha}\frac{\overline{\omega}}{\underline{\omega}} + \overline{\omega}\,E[V(x(0), \delta x(0), 0)]e^{-2\alpha t} \quad (5)$$
where $V(x, \delta x, t) = \delta x^T M(x,t)\delta x$ and $C = (g_1^2 + g_2^2)(2/\varepsilon + 1)$.
Proof: Let us first derive the bounds of $M_{x_i}$ and $M_{x_i x_j}$. Since $M_{x_i}$, $\forall x_i$, is Lipschitz, we have $\|M_{x_i x_j}\| \leq L_m$, $\forall i, j$, by definition. For $h \geq 0$ and a unit vector $e_i$ with 1 in its $i$th element, Taylor's theorem suggests $\exists \xi_-, \xi_+ \in \mathbb{R}^n$ s.t.
$$M(x \pm h e_i, t) = M(x,t) \pm M_{x_i}(x,t)h + M_{x_i x_i}(\xi_\pm, t)h^2/2. \quad (6)$$
This implies that $M_{x_i}$ is bounded as $\|M_{x_i}\| \leq h^{-1}\underline{\omega}^{-1} + L_m h/2 \leq \sqrt{2L_m\underline{\omega}^{-1}}$, where $h = \sqrt{2/(L_m\underline{\omega})}$ is substituted to obtain the last inequality. Next, let $\mathcal{L}$ be the infinitesimal differential generator [8]. Computing $\mathcal{L}V$ using these bounds as in [8] yields
$$\mathcal{L}V \leq \delta x^T\big(\dot{M} + 2\,\mathrm{sym}(Mf_x)\big)\delta x + (g_1^2 + g_2^2)\Big(L_m\|\delta x\|^2/2 + 2\sqrt{2L_m\underline{\omega}^{-1}}\|\delta x\| + \underline{\omega}^{-1}\Big) \leq \delta x^T\big(\dot{M} + 2\,\mathrm{sym}(Mf_x) + \alpha_g I\big)\delta x + C\underline{\omega}^{-1} \quad (7)$$
where the relation $2ab \leq \varepsilon^{-1}a^2 + \varepsilon b^2$, which holds for any $a, b \in \mathbb{R}$ and $\varepsilon > 0$, is used with $a = \sqrt{2/\underline{\omega}}$ and $b = \sqrt{L_m}\|\delta x\|$ to get the second inequality. This reduces to $\mathcal{L}V \leq -2\alpha V + C\underline{\omega}^{-1}$ under the condition (3). The result (5) follows as in the proof of Theorem 1 in [8].
Remark 1: Note that there is a trade-off in using large $\varepsilon$ in Theorem 1, as it yields small $C$ to decrease the steady-state error in (5), but renders the constraint (3) tighter.
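To build intuition for the bound (5), the following Monte Carlo sketch (our illustration, not from the paper) simulates two independently perturbed trajectories of a contracting scalar SDE, for which $M = I$, $\overline{\omega} = \underline{\omega} = 1$, and $L_m = 0$, and verifies that the empirical mean squared distance stays below (5):

```python
import numpy as np

rng = np.random.default_rng(1)
a, s = 1.0, 0.2             # dx = -a*x dt + s dW: f_x = -a, so alpha = a; g1 = g2 = s
dt, T, N = 1e-3, 5.0, 2000  # step size, horizon, number of sample paths
x1, x2 = np.ones(N), np.zeros(N)   # two trajectories, |x1(0) - x2(0)|^2 = 1
for _ in range(int(T / dt)):       # Euler-Maruyama with independent noises
    x1 += -a * x1 * dt + s * np.sqrt(dt) * rng.standard_normal(N)
    x2 += -a * x2 * dt + s * np.sqrt(dt) * rng.standard_normal(N)
mse = np.mean((x1 - x2) ** 2)

eps = 1.0
C = (s**2 + s**2) * (2 / eps + 1)        # C = (g1^2 + g2^2)(2/eps + 1)
bound = C / (2 * a) + np.exp(-2 * a * T) # (5) with metric bounds equal to 1
print(mse <= bound)  # True
```

The steady-state mean squared distance concentrates near $s^2/a = 0.04$, comfortably below the (conservative) bound $0.12$, illustrating Remark 1's trade-off in $\varepsilon$.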
Lemma 1 is used to convexify the cost function in Sec. IV.
Lemma 1: The inequalities (3) and (4) are equivalent to
$$\begin{bmatrix} -\dot{\bar{W}} + 2\,\mathrm{sym}(f_x(x,t)\bar{W}) + 2\alpha\bar{W} & \bar{W} \\ \bar{W} & -\frac{\nu}{\alpha_g}I \end{bmatrix} \preceq 0 \quad (8)$$
$$I \preceq \bar{W} \preceq \chi I, \ \forall x, t \quad (9)$$
where $\nu = 1/\underline{\omega}$, $\chi = \overline{\omega}/\underline{\omega}$, and $\bar{W} = \nu W = \nu M^{-1}$.
Proof: Multiplying both sides of (3) by $W \succ 0$ and then by $\nu > 0$ preserves matrix definiteness [22, pp. 114]. This operation with Schur's complement lemma [22, pp. 28] yields (8). The rest follows the proof of Lemma 1 of [2].
Remark 2: The variable conversion in Lemma 1 is necessary to get a convex cost function (28) from the non-convex cost (5) as $t \to \infty$. In Sec. IV, we use it to derive a semi-definite program in terms of $\nu$, $\chi$, and $\bar{W}$ for finding a contraction metric computationally efficiently [23]. We show in Proposition 2 that this is equivalent to the non-convex problem of minimizing (5) as $t \to \infty$, subject to (3) and (4) in terms of the original decision variables $\underline{\omega}$, $\overline{\omega}$, and $M$ [8].
Finally, Lemma 2 introduces the generalized SDC form of
dynamical systems to be exploited also in Sec. IV.
Lemma 2: Suppose that $f(x,t)$ and $B(x,t)$ are continuously differentiable. Then $\exists A(x, x_d, t)$ s.t. $A(x, x_d, t)(x - x_d) = f(x,t) + B(x,t)u_d(x_d,t) - f(x_d,t) - B(x_d,t)u_d(x_d,t)$, $\forall x, x_d, u_d, t$, and one such $A$ is given as follows:
$$A(x, x_d, t) = \int_0^1 \bar{f}_x(cx + (1-c)x_d, t)\,dc \quad (10)$$
where $\bar{f}(q,t) = f(q,t) + B(q,t)u_d(x_d,t)$. We call $A$ an SDC form when it is constructed to satisfy controllability and observability conditions (see Theorem 2 and Corollary 1).
Proof: This follows from the integral relation $\int_0^1 (d\bar{f}(cx + (1-c)x_d, t)/dc)\,dc = \bar{f}(x,t) - \bar{f}(x_d,t)$.
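The identity behind (10) is easy to check numerically. The following sketch (our illustration, with a toy drift $f$ and $u_d = 0$) forms $A$ by trapezoidal integration of a finite-difference Jacobian along the segment from $x_d$ to $x$ and verifies $A(x, x_d, t)(x - x_d) = f(x,t) - f(x_d,t)$:

```python
import numpy as np

def f(x, t):
    # toy drift, chosen only for illustration
    return np.array([np.sin(x[0]) + x[1], -x[0] * x[1]])

def jac(x, t, h=1e-6):
    # central-difference Jacobian f_x
    n = len(x)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        J[:, i] = (f(x + e, t) - f(x - e, t)) / (2 * h)
    return J

def sdc(x, xd, t, num=201):
    # A(x, xd, t) = integral over c in [0, 1] of f_x(c x + (1 - c) xd, t),
    # evaluated with the composite trapezoidal rule
    cs = np.linspace(0.0, 1.0, num)
    Js = np.array([jac(c * x + (1 - c) * xd, t) for c in cs])
    w = cs[1] - cs[0]
    return w * (Js[0] / 2 + Js[1:-1].sum(axis=0) + Js[-1] / 2)

x, xd, t = np.array([0.7, -0.3]), np.array([0.1, 0.2]), 0.0
A = sdc(x, xd, t)
print(np.allclose(A @ (x - xd), f(x, t) - f(xd, t), atol=1e-4))  # True
```

In practice a closed-form Jacobian would replace the finite differences, but the line-integral structure of the SDC form is the same.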
III. NEURAL STOCHASTIC CONTRACTION METRICS
This section illustrates how to construct an NSCM using state samples $S = \{x_i\}_{i=1}^{N}$ and stochastic contraction metrics given by Theorem 1. This is analogous to the NCM [2], which gives an optimal contraction metric for nonlinear systems with bounded disturbances, but the NSCM explicitly accounts for unbounded stochastic disturbances. For simplicity, we denote the metric both for feedback control and estimation as $X$ with $\underline{m}I \preceq X \preceq \overline{m}I$, i.e., $\underline{m} = \overline{\omega}^{-1}$, $\overline{m} = \underline{\omega}^{-1}$, $X = M$ for control, and $\underline{m} = \underline{\omega}$, $\overline{m} = \overline{\omega}$, $X = W$ for estimation.
A. Data Pre-processing
Since $X \succ 0$, where $X$ is a contraction metric for control or estimation, it has a unique upper triangular matrix $Y \in \mathbb{R}^{n \times n}$ with positive diagonal entries s.t. $X = Y^T Y$ [24, pp. 441]. We use the nonzero entries of $Y$, denoted as $\theta(x,t) \in \mathbb{R}^{n(n+1)/2}$, for $y_i$ of (1) to reduce its output dimension [2].
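In code, this pre-processing is a Cholesky factorization per sampled metric; the sketch below (our illustration in NumPy) maps a metric sample to the $n(n+1)/2$ network targets $\theta$ and back:

```python
import numpy as np

def metric_to_theta(X):
    # unique upper-triangular Y with positive diagonal s.t. X = Y^T Y;
    # np.linalg.cholesky returns lower-triangular L with X = L L^T, so Y = L^T
    Y = np.linalg.cholesky(X).T
    return Y[np.triu_indices_from(Y)]   # the n(n+1)/2 nonzero entries

def theta_to_metric(theta, n):
    Y = np.zeros((n, n))
    Y[np.triu_indices(n)] = theta
    return Y.T @ Y

A = np.array([[2.0, 0.3, 0.0], [0.1, 1.5, 0.2], [0.0, 0.4, 1.0]])
X = A @ A.T + np.eye(3)       # a positive definite metric sample
theta = metric_to_theta(X)    # network regression target, here of dimension 6
print(np.allclose(theta_to_metric(theta, 3), X))  # True
```

Reconstructing $X = Y^T Y$ from the network output guarantees the predicted metric is positive semi-definite by construction.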
B. Lipschitz Condition and Spectral Normalization (SN)
We utilize SN in Sec. II-A to guarantee the Lipschitz
condition of Theorem 1 or Proposition 2 in Sec. IV.
Proposition 1: Let $\theta(x; W_{sn})$ be a neural network (1) to model $\theta(x,t)$ in Sec. III-A, and $N_{units}$ be the number of neurons in its last layer. Also, let $W_{sn} = \{W_\ell\}_{\ell=1}^{L+1}$, where $W_\ell = C_{nn}(\Omega_\ell/\|\Omega_\ell\|)$ for $1 \leq \ell \leq L$, and $W_\ell = \sqrt{\overline{m}}(\Omega_\ell/\|\Omega_\ell\|)/\sqrt{N_{units}}$ for $\ell = L+1$, with $\Omega_\ell$ the unnormalized weights. If $\exists C_{nn}, L_m > 0$ s.t.
$$2\|\theta_{x_i}(x; W_{sn})\|\|\theta_{x_j}(x; W_{sn})\| + 2\|\theta(x; W_{sn})\|\|\theta_{x_i x_j}(x; W_{sn})\| \leq L_m, \ \forall i, j, x \quad (11)$$
then we have $\|\mathcal{X}\| \leq \overline{m}$ and $\|\mathcal{X}_{x_i x_j}\| \leq L_m$, $\forall x_i, x_j$, where $\mathcal{X}$ is the neural network model for the contraction metric $X(x,t)$. The latter inequality implies $\mathcal{X}_{x_i}$, $\forall i$, is indeed Lipschitz continuous with 2-norm Lipschitz constant $L_m$.
Proof: Let $\mathcal{Y}$ be the neural net model of $Y$ in Sec. III-A. By definition of $\mathcal{X} = \mathcal{Y}^T\mathcal{Y}$ and $\theta$, where $\mathcal{X}$ is the contraction metric model, we have $\|\mathcal{X}\| \leq \|\mathcal{Y}\|^2 \leq \|\mathcal{Y}\|_F^2 = \|\theta\|^2$. Thus, the relation $\|\theta(x; W_{sn})\| \leq \sqrt{N_{units}}\|W_{L+1}\|$ yields $\|\mathcal{X}\| \leq \overline{m}$ for $W_{L+1} = \sqrt{\overline{m}}(\Omega_{L+1}/\|\Omega_{L+1}\|)/\sqrt{N_{units}}$. Also, differentiating $\mathcal{X}$ twice yields $\|\mathcal{X}_{x_i x_j}\|/2 \leq \|\mathcal{Y}_{x_i}\|\|\mathcal{Y}_{x_j}\| + \|\mathcal{Y}\|\|\mathcal{Y}_{x_i x_j}\| \leq \|\theta_{x_i}\|\|\theta_{x_j}\| + \|\theta\|\|\theta_{x_i x_j}\|$, where the second inequality is due to $\|\mathcal{Y}\| \leq \|\mathcal{Y}\|_F = \|\theta\|$. Substituting $W_{sn}$ gives (11).
Example 1: To see how Proposition 1 works, let us consider a scalar input/output neural net with one neuron at each layer in (1). Since we have $\|\theta(x; W_{sn})\| \leq \|W_{L+1}\|$, $\mathcal{X} \preceq \overline{m}I$ is indeed guaranteed by $W_{L+1} = \sqrt{\overline{m}}$. Also, we can get the bounds as $\|\theta_x(x; W_{sn})\| \leq \sqrt{\overline{m}}C_{nn}^L$ and $\|\theta_{xx}(x; W_{sn})\| \leq \|W_{L+1}\|C_{nn}^L\big(\sum_{\ell=1}^{L}C_{nn}^\ell\big) = \sqrt{\overline{m}}C_{nn}^{L+1}(C_{nn}^L - 1)/(C_{nn} - 1)$ using SN. Thus, (11) can be solved for $C_{nn}$ by standard nonlinear equation solvers, treating $\overline{m}$ and $L_m$ as given constants.
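For the scalar example, the bounds above reduce (11) to $2\overline{m}C_{nn}^{2L} + 2\overline{m}C_{nn}^L\sum_{\ell=1}^{L}C_{nn}^\ell \leq L_m$, a monotone scalar equation that a simple bisection solves for $C_{nn}$. A sketch with assumed illustrative values $\overline{m} = 1$, $L_m = 0.5$, $L = 2$ (not the paper's numbers):

```python
def lhs(C, m_bar, L):
    # upper bound on 2*theta_x^2 + 2*theta*theta_xx from Example 1, as a function of C_nn
    geom = sum(C**l for l in range(1, L + 1))   # sum of C^ell for ell = 1..L
    return 2 * m_bar * C**(2 * L) + 2 * m_bar * C**L * geom

m_bar, L_m, L = 1.0, 0.5, 2   # assumed values for illustration
lo, hi = 1e-9, 1.0            # lhs is increasing in C on (0, 1]
for _ in range(100):          # bisection for lhs(C) = L_m
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if lhs(mid, m_bar, L) < L_m else (lo, mid)
C_nn = 0.5 * (lo + hi)
print(round(C_nn, 6))  # 0.5: here lhs(C) = 4C^4 + 2C^3 and 4(0.5)^4 + 2(0.5)^3 = 0.5
```

The smaller of $C_{nn}$ from this equation and $L_{nn}^{1/(L+1)}$ from Sec. II-A would then be used for the SN constant.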
Remark 3: For non-autonomous systems, we can treat $t$ or time-varying parameters $p(t)$ as another input to the neural network (1) by sampling them in a given parameter range of interest. For example, we could use $p = [x_d, u_d]^T$ for systems with a target trajectory. This also allows us to use adaptive control techniques [25], [26] to update an estimate of $p$.
IV. MCV-STEM SAMPLING OF CONTRACTION METRICS
We introduce the modified ConVex optimization-based
Steady-state Tracking Error Minimization (mCV-STEM)
method, an improved version of CV-STEM [8] for sampling
the metrics which minimize an upper bound of the steady-state
mean squared tracking error via convex optimization.
Remark 4:
Due to its contraction-based formulation, com-
bination properties [14] also apply to the NSCM framework.
For example, contraction is preserved through hierarchical
combination of estimation and control (i.e. output feedback
control), or through time-delayed communications [15].
A. Stability of Generalized SDC Control and Estimation
We utilize the general SDC parametrization with a target trajectory (10), which captures nonlinearity through $A(x, x_d, t)$ or through multiple non-unique $A_i$ [5], resulting in global exponential stability if the pair $(A, B)$ of (12) is uniformly controllable [5], [7]. Note that $x_d$ and $u_d$ can be regarded as extra inputs to the NSCM as in Remark 3, but we could use Corollary 2 as a simpler formulation which guarantees local exponential stability without using a target trajectory. Further extension to control contraction metrics, which use differential state feedback $\delta u = K(x,t)\delta x$ [9]–[13], could be considered for sampling the metric with global reference-independent stability guarantees, achieving greater generality at the cost of added computation. Similarly, while we construct an estimator with global stability guarantees using the SDC form as in (22), a more general formulation could utilize geodesic distances between trajectories [4]. We remark that these trade-offs would also hold for deterministic control and estimation design via NCMs [2].
1) Generalized SDC Control: Consider the following system with a controller $u \in \mathbb{R}^m$ and perturbation $W(t)$:
$$dx = (f(x,t) + B(x,t)u)dt + G_c(x,t)dW(t) \quad (12)$$
$$dx_d = (f(x_d,t) + B(x_d,t)u_d(x_d,t))dt \quad (13)$$
where $B : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^{n \times m}$, $G_c : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^{n \times d}$, $W(t)$ is a $d$-dimensional Wiener process, and $x_d : \mathbb{R}_{\geq 0} \to \mathbb{R}^n$ and $u_d : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^m$ denote the target trajectory.
Theorem 2: Suppose $\exists g_c \in [0, \infty)$ s.t. $\|G_c(x,t)\|_F \leq g_c$, $\forall x, t$, and $\exists M(x, x_d, t) \succ 0$ s.t. $M_{x_i}$ and $M_{x_{d,i}}$, $\forall x_i, x_{d,i}$, are Lipschitz with respect to its state with 2-norm Lipschitz constant $L_m$. Let $u$ be designed as
$$u = u_d(x_d,t) - B(x,t)^T M(x, x_d, t)(x - x_d) \quad (14)$$
$$\dot{M} + 2\,\mathrm{sym}(MA) - 2MBB^TM + \alpha_{gc}I \preceq -2\alpha M \quad (15)$$
$$\overline{\omega}^{-1}I \preceq M(x, x_d, t) \preceq \underline{\omega}^{-1}I, \ \forall x, t \quad (16)$$
where $\alpha > 0$, $\alpha_{gc} = L_m g_c^2(\varepsilon + 1/2)$, $\varepsilon > 0$, and $A$ is given by (10) in Lemma 2. If the pair $(A, B)$ is uniformly controllable, we have the following bound for the systems (12) and (13):
$$E[\|x - x_d\|^2] \leq \frac{C_c}{2\alpha}\chi + \overline{\omega}\,E[V(x(0), x_d(0), \delta q(0), 0)]e^{-2\alpha t} \quad (17)$$
where $V(x, x_d, \delta q, t) = \delta q^T M(x, x_d, t)\delta q$, $C_c = g_c^2(2/\varepsilon + 1)$, $\nu = 1/\underline{\omega}$, $\chi = \overline{\omega}/\underline{\omega}$, and $q$ is the state of the differential system with its particular solutions $q = x, x_d$. Further, (15) and (16) are equivalent to the following constraints in terms of $\nu$, $\chi$, and $\bar{W} = \nu W = \nu M^{-1}$:
$$\begin{bmatrix} -\dot{\bar{W}} + 2\,\mathrm{sym}(A\bar{W}) - 2\nu BB^T + 2\alpha\bar{W} & \bar{W} \\ \bar{W} & -\frac{\nu}{\alpha_{gc}}I \end{bmatrix} \preceq 0 \quad (18)$$
$$I \preceq \bar{W} \preceq \chi I, \ \forall x, t \quad (19)$$
where the arguments are omitted for notational simplicity.
Proof: Using the SDC parameterization (10) given in Lemma 2, (12) can be written as $dx = (\bar{f}(x_d,t) + (A(x, x_d, t) - B(x,t)B(x,t)^TM(x, x_d, t))(x - x_d))dt + G_c(x,t)dW$. This results in the following differential system, $dq = (\bar{f}(x_d,t) + (A(x, x_d, t) - B(x,t)B(x,t)^TM)(q - x_d))dt + G(q,t)dW$, where $G(q,t)$ is defined as $G(q = x, t) = G_c(x,t)$ and $G(q = x_d, t) = 0$. Note that it has $q = x, x_d$ as its particular solutions. Since $f_x$, $g_1$, and $g_2$ in Theorem 1 can be viewed as $A(x, x_d, t) - B(x,t)B(x,t)^TM(x, x_d, t)$, $g_c$, and 0, respectively, applying its results for $V = \delta q^T M(x, x_d, t)\delta q$ gives (17) as in (5). The constraints (18) and (19) follow from the application of Lemma 1 to (15) and (16).
Remark 5: For input non-affine nonlinear systems, we can find $f(x,u) - f(x_d, u_d) = A(x, u, t)(x - x_d) + B(x, u, t)(u - u_d)$ by Lemma 2 and use it in Theorem 2, although (14) has to be solved implicitly as $B$ depends on $u$ in this case [12], [13].
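As a sanity check of the controller (14), the following sketch (our illustration, not from the paper; SciPy assumed) treats a linear special case of (12), obtains a constant metric $M$ from a Riccati equation so that (15) holds with $\dot{M} = 0$ and $\alpha_{gc} = 0$, and verifies exponential decay of $V = e^T M e$ along the unperturbed closed loop:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # constant SDC matrix (linear toy system)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
# A^T M + M A - 2 M B B^T M = -Q, so 2 sym(MA) - 2 M B B^T M <= -2*alpha*M
# holds for any alpha with Q >= 2*alpha*M
M = solve_continuous_are(A, B, Q, 0.5 * np.eye(1))
Lc = np.linalg.cholesky(M)
alpha = 0.5 * np.linalg.eigvalsh(np.linalg.inv(Lc) @ Q @ np.linalg.inv(Lc).T).min()

# unperturbed error dynamics under u = u_d - B^T M (x - x_d): e' = (A - B B^T M) e
Acl = A - B @ B.T @ M
e = np.array([1.0, 0.5])
V0 = e @ M @ e
dt, T = 1e-3, 3.0
for _ in range(int(T / dt)):   # forward Euler integration
    e = e + dt * (Acl @ e)
print(e @ M @ e <= V0 * np.exp(-2 * alpha * T))  # True
```

In the full framework $M$ varies with $(x, x_d, t)$ and is produced by the mCV-STEM sampling of Sec. IV-B instead of a Riccati solve; this fragment only illustrates the structure of (14) and the decay rate certified by (15).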
2) Generalized SDC Estimation: Consider the following system and a measurement $y(t)$ with perturbation $W_{1,2}(t)$:
$$dx = f(x,t)dt + G_e(x,t)dW_1(t) \quad (20)$$
$$y\,dt = h(x,t)dt + D(x,t)dW_2(t) \quad (21)$$
where $h : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^m$, $G_e : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^{n \times d_1}$, $D : \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}^{m \times d_2}$, and $W_{1,2}(t)$ are two independent Wiener processes. We have an analogous result to Theorem 2.
Corollary 1: Suppose $\exists g_e, d \in [0, \infty)$ s.t. $\|G_e(x,t)\|_F \leq g_e$ and $\|D(x,t)\|_F \leq d$, $\forall x, t$. Suppose also that $\exists W(\hat{x},t) = M(\hat{x},t)^{-1} \succ 0$ s.t. $W_{x_i}$, $\forall x_i$, is Lipschitz with respect to its state with 2-norm Lipschitz constant $L_m$. Let $\nu = 1/\underline{\omega}$ and $x$ be estimated as
$$d\hat{x} = f(\hat{x},t)dt + M(\hat{x},t)C_L(\hat{x},t)^T(y - h(\hat{x},t))dt \quad (22)$$
$$\dot{W} + 2\,\mathrm{sym}(WA - C_L^TC) + \alpha_{ge}I \preceq -2\alpha W \quad (23)$$
$$\underline{\omega}I \preceq W(\hat{x},t) \preceq \overline{\omega}I, \ 0 < \nu^3 \leq \nu_c, \ \forall x, \hat{x}, t \quad (24)$$
where $\alpha, \nu_c, \varepsilon > 0$, $\alpha_{ge} = \alpha_{e1} + \nu_c\underline{\omega}\,\alpha_{e2}$, $\alpha_{e1} = L_m g_e^2(\varepsilon + 1/2)$, and $\alpha_{e2} = L_m c^2 d^2(\varepsilon + 1/2)$. Also, $A(x, \hat{x}, t)$ and $C(x, \hat{x}, t)$ are given by (10) of Lemma 2 with $(f, x, x_d, u_d)$ replaced by $(f, \hat{x}, x, 0)$ and $(h, \hat{x}, x, 0)$, respectively, and $C_L(\hat{x},t) = C(\hat{x}, \hat{x}, t)$. If $(A, C)$ is uniformly observable and $\|C(x, \hat{x}, t)\| \leq c$, $\forall x, \hat{x}, t$, then we have the following bound:
$$E[\|x - \hat{x}\|^2] \leq \frac{C_e}{2\alpha} + \frac{1}{\underline{\omega}}E[V(x(0), \delta q(0), 0)]e^{-2\alpha t} \quad (25)$$
where $V(\hat{x}, \delta q, t) = \delta q^T W(\hat{x},t)\delta q$, $C_e = C_{e1}\chi + C_{e2}\chi\nu^2$, $C_{e1} = g_e^2(2/\varepsilon + 1)$, $C_{e2} = c^2 d^2(2/\varepsilon + 1)$, $\chi = \overline{\omega}/\underline{\omega}$, and $q$ is the state of the differential system with its particular solutions $q = \hat{x}, x$. Further, (23) and (24) are equivalent to the following constraints in terms of $\nu$, $\nu_c$, $\chi$, and $\bar{W} = \nu W$:
$$\dot{\bar{W}} + 2\,\mathrm{sym}(\bar{W}A - \nu C_L^TC) + \nu\alpha_{e1}I + \nu_c\alpha_{e2}I \preceq -2\alpha\bar{W} \quad (26)$$
$$I \preceq \bar{W} \preceq \chi I, \ 0 < \nu^3 \leq \nu_c, \ \forall x, \hat{x}, t \quad (27)$$
where the arguments are omitted for notational simplicity.
Proof: The differential system of (20) and (22) is given as $dq = (f(x,t) + (A(x, \hat{x}, t) - M(\hat{x},t)C_L(\hat{x},t)^TC(x, \hat{x}, t))(q - x))dt + G(q,t)dW$, where $G(q,t)$ is defined as $G(q = x, t) = G_e(x,t)$ and $G(q = \hat{x}, t) = M(\hat{x},t)C_L(\hat{x},t)^TD(x,t)$. Viewing $V$, $g_1$, and $g_2$ in Theorem 1 as $V = \delta q^T W(\hat{x},t)\delta q$, $g_1 = g_e$, and $g_2 = cd/\underline{\omega}$, (25)–(27) follow as in the proof of Theorem 2 due to $\nu^3 = \underline{\omega}^{-3} \leq \nu_c$ and the contraction condition (23).
Note that (15) and (23) depend on their target trajectory, i.e., $x_d$ for control and $x$ for estimation. We can treat them as time-varying parameters $p(t)$ in a given space during the mCV-STEM sampling as in Remark 3. Alternatively, we could use the following to avoid this complication.
Corollary 2: Using predefined trajectories (e.g. $(x_d, u_d) = (0, 0)$ for control or $x = 0$ for estimation) in Thm. 2 or Cor. 1 leads to local exponential stability of (12) or (22).
Proof: This follows as in the proof of Thm. 2 [2].
B. mCV-STEM Formulation
The following proposition summarizes the mCV-STEM.
Proposition 2: The optimal contraction metric $M = W^{-1}$ that minimizes the upper bound of the steady-state mean squared distance ((17) of Thm. 2 or (25) of Cor. 1 with $t \to \infty$) of stochastically perturbed system trajectories is found by the following convex optimization problem:
$$J_{CV} = \min_{\nu > 0,\ \nu_c > 0,\ \chi \in \mathbb{R},\ \bar{W} \succ 0}\ c_1\chi + c_2\nu + c_3 P(\nu, \nu_c, \chi, \bar{W}) \quad (28)$$
s.t. (18) & (19) for control, (26) & (27) for estimation
where $c_1, c_2, c_3 \in [0, \infty)$ and $P$ is an additional performance-based convex cost (see Sec. IV-B1). The weight of $\nu$, $c_2$, can either be viewed as a penalty on the 2-norm of feedback gains or an indicator of how much we trust the measurement $y(t)$. Note that $\alpha$, $\varepsilon$, and $L_m$ are assumed to be given in (28) (see Sec. IV-B2 for how to handle $\dot{\bar{W}}$ preserving convexity).
Proof: For control (17), using $c_1 = C_c/(2\alpha)$ and $c_2 = c_3 = 0$ gives (28). We can set $c_2 > 0$ to penalize excessively large $u$ through $\nu \geq \sup_{x,t}\|M(x, x_d, t)\|$. Since we have $\nu > 0$ and $1 \leq \chi \leq \chi^3$, (25) as $t \to \infty$ can be bounded as
$$C_{e1}\chi + C_{e2}\chi\nu^2 \leq 2\gamma\big(3\sqrt{3C_{e1}}\big)^{-1}\left(\frac{\sqrt{3C_{e1}}}{\sqrt[3]{2\gamma}}\chi + \frac{\sqrt{C_{e2}}}{\sqrt[3]{2\gamma}}\nu\right)^3 \quad (29)$$
for any $\gamma > 0$. Minimizing the right-hand side of (29) gives (28) with $c_1 = \sqrt{3C_{e1}}/\sqrt[3]{2\gamma}$, $c_2 = \sqrt{C_{e2}}/\sqrt[3]{2\gamma}$, and $c_3 = 0$. Finally, since $d = 0$ in (21) means $C_{e2} = 0$ and no noise acts on $y$, $c_2$ also indicates how much we trust the measurement.
1) Choice of $P(\nu, \nu_c, \chi, \bar{W})$: Selecting $c_3 = 0$ in Proposition 2 yields an affine objective function which leads to a straightforward interpretation of its weights. Users could also select $c_3 > 0$ with other performance-based cost functions $P(\nu, \nu_c, \chi, \bar{W})$ in (28) as long as they are convex. For example, an objective function $\sum_{x_i \in S}\|u\|^2 = \sum_{x_i \in S}\|{-B(x_i,t)^TM(x_i,t)x_i}\|^2 \leq \sum_{x_i \in S}\|B(x_i,t)\|^2\|x_i\|^2\nu^2$, where $S$ is the state space of interest, gives an optimal contraction metric which minimizes the upper bound of its control effort.
2) Additional Parameters and $\dot{\bar{W}}$: We assumed $\alpha$, $\varepsilon$, and $L_m$ are given in Proposition 2. For $\alpha$ and $\varepsilon$, we perform a line search to find their optimal values as will be demonstrated in Sec. V. For $L_m$, we guess it by a deterministic NCM [2] and guarantee the Lipschitz condition by SN as explained in Sec. III-B. Also, (28) can be solved as a finite-dimensional problem by using backward difference approximation on $\dot{\bar{W}}$, where we can then use $-\bar{W} \preceq -I$ to obtain a sufficient condition of its constraints, or solve it along pre-computed trajectories $\{x(t_i)\}_{i=0}^{M}$ [2], [27]. The pseudocode to obtain the NSCM depicted in Fig. 1 is given in Algorithm 1.
Algorithm 1: NSCM Algorithm
Inputs: States & parameters: $S = \{x_i\}_{i=1}^{N}$ or $\{\hat{x}_i\}_{i=1}^{N}$ & $T = \{p_i\}_{i=1}^{M}$ (e.g. $p = t$, $[x_d, u_d]^T$, or $x$)
Outputs: NSCM and $J_{CV}$ in (28)
1. Sampling of Optimal Contraction Metrics
  Find $L_m$ in Thm. 1 using a deterministic NCM [2]
  for $(\alpha, \varepsilon) \in A_{LS}$ ($A_{LS}$ is a search set) do
    Solve (28) of Prop. 2 using $x, p$ or $\hat{x}, p$ in $S$ & $T$
    Save the optimizer $(\nu, \chi, \{\bar{W}_i\}_{i=1}^{N})$ and optimal value $J(\alpha, \varepsilon) = c_1\chi + c_2\nu + c_3P(\nu, \nu_c, \chi, \bar{W})$
  Find $(\alpha, \varepsilon) = \arg\min_{(\alpha,\varepsilon) \in A_{LS}} J$ and $J_{CV} = J(\alpha, \varepsilon)$
  Obtain $(\nu(\alpha, \varepsilon), \chi(\alpha, \varepsilon), \{\bar{W}_i(\alpha, \varepsilon)\}_{i=1}^{N})$
2. Spectrally-Normalized Neural Network Training
  Pre-process data as in Sec. III-A
  Split data into a train set $S_{train}$ and test set $S_{test}$
  for epoch = 1 to $N_{epochs}$ do
    for $s \in S_{train}$ do
      Train a neural network using SGD with the Lipschitz condition on $X_{x_i}$ as in Prop. 1
    Compute the test error for data in $S_{test}$
    if test error is small enough then break
Fig. 2. Rocket model (angle of attack $\varphi$, pitch rate $q$).
V. NUMERICAL IMPLEMENTATION EXAMPLE

We demonstrate the NSCM on a rocket autopilot problem (https://github.com/astrohiro/nscm). CVXPY [28] with the MOSEK solver [29] is used to solve convex optimization.
A. Simulation Setup
We use the nonlinear rocket model in Fig. 2 [30], assuming $q$ and specific normal force are available via rate gyros and accelerometers. We use $G_c = 6.0 \times 10^{-2} I_n$, $G_e = 3.0 \times 10^{-2} I_n$, and $D = 3.0 \times 10^{-2} I_m$ for perturbation in the NSCM construction. The Mach number is varied linearly in time from 2 to 4.
B. NSCM Construction
We construct NSCMs by Algorithm 1. For estimation, we select the Lipschitz constant on $X_{x_i}$ to be $L_m = 0.50$ (see Sec. IV-B2). The optimal $\alpha$ and $\varepsilon$, $\alpha = 0.40$ and $\varepsilon = 3.30$, are found by line search in Fig. 3. A neural net with 3 layers and 100 neurons is trained using $N = 1000$ samples, where its SN constant is selected as $C_{nn} = 0.85$ as a result of Proposition 1. We use the same approach for the NSCM control and the resultant design parameters are given in Table I. Figure 4 implies that the NSCMs indeed satisfy the Lipschitz condition with its prediction error smaller than 0.08 thanks to SN.
VI. DISCUSSION AND CONCLUDING REMARKS
We compare the NSCM with the SDRE [7], ILQR [16],
[17], EKF, NCM [2], and mCV-STEM. As shown in Fig. 5,