Online Robust Control of Nonlinear Systems
with Large Uncertainty
Dimitar Ho (dho@caltech.edu)
Caltech, Department of Control and Dynamical Systems
Hoang M. Le (hoang.le@microsoft.com)
Microsoft Research
John C. Doyle (doyle@caltech.edu)
Caltech, Department of Control and Dynamical Systems
Yisong Yue (yyue@caltech.edu)
Caltech, Department of Computing and Mathematical Sciences
Abstract
Robust control is a core approach for controlling systems with performance guarantees
that are robust to modeling error, and is widely used in real-world systems. However,
current robust control approaches can only handle small system uncertainty, and thus
require significant effort in system identification prior to controller design. We present an
online approach that robustly controls a nonlinear system under large model uncertainty.
Our approach is based on decomposing the problem into two sub-problems, “robust control
design” (which assumes small model uncertainty) and “chasing consistent models”, which
can be solved using existing tools from control theory and online learning, respectively. We
provide a learning convergence analysis that yields a finite mistake bound on the number of times performance requirements are not met, and can provide strong safety guarantees by bounding the worst-case state deviation. To the best of our knowledge, this is the first approach for online robust control of nonlinear systems with such learning-theoretic and safety guarantees. We also show how to instantiate this framework for general robotic systems, demonstrating the practicality of our approach.
1. Introduction
We study the problem of online control for nonlinear systems with large model uncertainty, under the requirement to provide upfront control-theoretic guarantees for the worst-case online performance; by large uncertainty, we mean that we are given an arbitrarily large set of potential models, of which an unknown few are exact descriptions of the true system dynamics. Algorithms with such capabilities can enable us to (at least partially) sidestep laborious system identification tasks prior to robust controller design. Motivated by real-world control applications, we formulate a class of problems which allows us to address in a unified way common control problems such as stabilization, tracking, disturbance rejection, robust set invariance, etc. We introduce this as online control with mistake guarantees (OC-MG): we define a problem instance by specifying a desired system behavior and search for online control algorithms which can quantify, in terms of the number of mistakes, how often
1. This is an extended version of a conference paper that appeared in AISTATS 2021.
arXiv:2103.11055v1 [math.OC] 19 Mar 2021
the online controlled system could deviate from this behavior in the worst case (i.e., the worst possible scenario of true system dynamics, disturbances, noise, etc.).
We propose a modular framework for OC-MG: use robust control to design a robust oracle π, use online learning to design an algorithm SEL which chases consistent models, and fuse them together via a simple meta-algorithm; the end result is Algorithm 1, which we refer to as A_π(SEL). Our approach is based on decomposing the original problem into the two independent sub-problems "robust oracle design" (ROD) and "consistent model chasing" (CMC), which for many problem instances can be readily addressed with existing tools from control theory (see Section 3) and online learning (see Section 4). We demonstrate in Section 5 that for general robotic systems, we can solve CMC via competitive convex body chasing (Bubeck et al., 2020; Argue et al., 2019, 2020; Sellke, 2020) and ROD using well-known robust control methods (Zhou and Doyle, 1998; Spong and Vidyasagar, 1987; Spong, 1992a; Freeman and Kokotovic, 2008; Mayne et al., 2011). Once suitable subroutines π and SEL are selected, we can provide online performance guarantees for the resulting control algorithm A_π(SEL) which hold in the large uncertainty setting:
• Mistake guarantee: a worst-case bound on the total number of times desired system behavior is violated. See Theorems 2.9, 2.10 and 2.13.
• Safety guarantee: a worst-case norm bound on the state trajectory. See Theorem 2.12.
To provide the above guarantees, π and SEL have to be solutions to the corresponding ROD and CMC sub-problems. In Section 3 and Section 4 we discuss why, in many problem settings, this is not a restrictive assumption. In particular, assuming that ROD can be solved merely ensures that the overall OC-MG problem is well-posed: the underlying control problem (i.e., assuming no uncertainty) has to be tractably solvable with robust control, which is a bare minimum requirement for stating a meaningful OC-MG problem. In Section 4 we discuss different versions of the CMC sub-problem and present a reduction to nested convex body chasing (Bubeck et al., 2020) which is applicable to a large class of systems.
In Section 6, we follow our approach to design a high-performing control algorithm for a difficult nonlinear adaptive control problem: swinging up a cartpole with large parametric uncertainty and state constraints. We benchmark the performance of the online algorithm A_π(SEL) against the offline optimal algorithm over 900 problem settings (adversarially chosen system parameters, noise, disturbances) and show that A_π(SEL) performs only marginally worse than the optimal offline controller, which has access to the true system model.
Our framework reveals a promising connection between online learning and robust control theory which enables systematic and modular design of robust learning and control algorithms with provable safety and performance guarantees in the large uncertainty setting. To the best of our knowledge, this is the first work to establish such a fundamental connection between online learning and control theory in this context.
1.1 Problem Statement
Consider controlling a discrete-time nonlinear dynamical system with system equations:

x_{t+1} = f^*(t, x_t, u_t),   f^* ∈ F,   (1)
where x_t ∈ X and u_t ∈ U denote the system state and control input at time step t, and X × U denotes the state-action space. We assume that f^* is an unknown function and that we only know of an uncertainty set F which contains the true f^*.

Large uncertainty setting. We impose no further assumptions on F and explicitly allow F to represent arbitrarily large model uncertainties.
Control objective. The control objective is specified as a sequence G = (G_0, G_1, ...) of binary cost functions G_t : X × U → {0, 1}, where each function G_t encodes a desired condition per time-step t: G_t(x_t, u_t) = 0 means the state x_t and input u_t meet the requirements at time t; G_t(x_t, u_t) = 1 means that some desired condition is violated at time t, and we say that the system made a mistake at t. The performance metric for system trajectories x := (x_0, x_1, ...) and u := (u_0, u_1, ...) is the sum of the incurred costs G_t(x_t, u_t) over the time interval [0, ∞), which we call the total number of mistakes:

# mistakes of x, u = ∑_{t=0}^{∞} G_t(x_t, u_t).   (2)
For a system state-input trajectory (x, u) to achieve an objective G, we want the above quantity to be finite, i.e., eventually the system stops making mistakes and meets the requirements of the objective for all time.
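The mistake count (2) is straightforward to compute on any finite horizon. Below is a minimal Python illustration for a hypothetical scalar tracking objective; the binary cost G_t, the tolerance, and the trajectory are made-up examples, not taken from the paper's experiments.

```python
# Illustration of the mistake count (2) on a finite horizon, for a
# hypothetical tracking objective: G_t(x, u) = 0 iff |x - x_ref| <= tol.

def G_t(x, u, x_ref=0.0, tol=0.1):
    """Binary cost: 0 if the state meets the tracking requirement, else 1."""
    return 0 if abs(x - x_ref) <= tol else 1

def count_mistakes(xs, us):
    """Finite-horizon version of the total number of mistakes (2)."""
    return sum(G_t(x, u) for x, u in zip(xs, us))

# A trajectory that settles near the reference after a transient:
xs = [1.0, 0.5, 0.2, 0.05, 0.01, 0.0]
us = [0.0] * len(xs)
print(count_mistakes(xs, us))  # first three states violate |x| <= 0.1 -> 3
```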
Algorithm design goal. The goal is to design an online decision rule u_t = A(t, x_t, ..., x_0) such that, regardless of the unknown f^* ∈ F, we are guaranteed to have finite or even explicit upper bounds on the total number of mistakes (2) of the online trajectories. Thus, we require a strong notion of robustness: A can control any system (1) with the certainty that the objective G will be achieved after finitely many mistakes. It is therefore suitable to refer to our problem setting as online control with mistake guarantees.
1.2 Motivation and related work
How can we design control algorithms for dynamical systems with strong guarantees without
requiring much a-priori information about the system?
This question is of particular interest in safety-critical settings involving real physical systems,
which arise in engineering domains such as aerospace, industrial robotics, automotive, energy
plants (Vaidyanathan et al., 2016). Frequently, one faces two major challenges during control
design: guarantees and uncertainty.
Guarantees. Control policies can only be deployed if it can be certified in advance that the policies will meet desired performance requirements online. This makes the mistake guarantee w.r.t. an objective G a natural performance metric, as G can incorporate control specifications such as tracking, safety and stability that often arise in practical nonlinear dynamical systems. The online learning for control literature has mostly focused on linear systems or linear controllers (Dean et al., 2018; Simchowitz et al., 2018; Hazan et al., 2020; Chen and Hazan, 2020), with some emerging work on online control of nonlinear systems. One approach is to incorporate stability into the neural network policy as part of the RL algorithm (Donti et al., 2021). Alternatively, the parameters of nonlinear systems can be transformed into a linear space to leverage linear analysis (Kakade et al., 2020; Boffi et al., 2020). These prior works focus on sub-linear regret bounds, which is not the focus of our problem setup. We note that regret is not necessarily the only way to measure performance. For example, competitive ratio is an alternative performance metric for online learning to control (Goel and Wierman, 2019; Shi et al., 2020). In addition, our mistake guarantee requirement is stricter than the no-regret criterion and more amenable to control-theoretic guarantees. Specifically, (fast) convergence of (1/T) ∑_{t=0}^{T} G_t(x_t, u_t) → 0 does not imply that the total number of mistakes ∑_{t=0}^{∞} G_t(x_t, u_t) is bounded. We provide additional discussion of the large and growing literature on learning control for linear systems, as well as adaptive control techniques from the control community, in Appendix C.
Large uncertainty. Almost always, the dynamics of the real system are not known exactly and one has to resort to an approximate model. The most common approach in the online learning for control literature (Dean et al., 2017) is to perform system identification (Ljung, 1999) and then use tools from robust control theory (Zhou and Doyle, 1998). Robust controller synthesis can provide policies with the desired guarantees if one can obtain an approximate model which is "provably close enough" to the real system dynamics. However, estimating a complex system to a desired accuracy level quickly becomes intractable in terms of computational and/or sample complexity. In the adversarial noise setting, system identification of even simple linear systems with precision guarantees can be NP-hard (Dahleh et al., 1993). General approaches for nonlinear system identification with precision guarantees are for the most part not available (recently, Mania et al. (2020) analyzed sample complexity under stochastic noise).
1.3 Overview of our approach
An alternative to the pipeline SysID → Control: start with rough models, learn to control online. While accurate models of real systems are hard to obtain, it is often easy to provide more qualitative or rough models of the system dynamics without requiring system identification. Having access to a rough system description, we aim to learn to control the real system from online data and to provide control-theoretic guarantees on the online performance in advance.
Rough models as compactly parametrizable uncertainty sets. In practice, we never have exact knowledge of f^* in advance. However, for engineering applications involving physical systems, the functional form of f^* can often be derived through first principles and application-specific domain knowledge. Conceptually, we can view the unknown parameters of the functional form as conveying both the 'modeled dynamics' and 'unmodeled (adversarial) disturbance' components of the ground truth f^* in the system x_{t+1} = f^*(t, x_t, u_t). It is almost always the case that we can represent the uncertainty in f^* via a collection of parameters in bounded ranges. How we choose to parametrize a given uncertainty set F is not unique and poses a design choice.
We will take this as the starting point for our approach and assume a fixed parametrization of F in the form of a tuple (T, K, d), where (K, d) is a compact metric space, called the parameter space, and T is a map K → 2^F which defines a collection of models {T[θ] | θ ∈ K} representing a cover of the uncertainty set F. We define this formally as a compact parametrization of F:

Definition 1.1. A tuple (T, K, d), where T : K → 2^F, is a compact parametrization of F if (K, d) is a compact metric space and F ⊂ ⋃_{θ∈K} T[θ].
We will work with candidate parameters θ ∈ K of the system and consider θ^* to be a true parameter of f^* if f^* ∈ T[θ^*]. Ideally, each candidate model T[θ] has small uncertainty; the precise notion of "small uncertainty" is however problem-specific and always depends on the objective. For concreteness, we give several simple examples of common parameter spaces K:
1. Linear time-invariant system: a linear system with matrices A, B, perturbed by a bounded disturbance sequence w ∈ ℓ_∞, ‖w‖_∞ ≤ η:

f^*(t, x, u) = Ax + Bu + w_t.   (3)

The parameter space K contains bounded intervals describing the parameters θ = (A, B, η).
2. Nonlinear system, linear parametrization: a nonlinear system whose dynamics are a weighted sum of nonlinear functions ψ_i, perturbed by a bounded disturbance sequence w ∈ ℓ_∞, ‖w‖_∞ ≤ η:

f^*(t, x, u) = ∑_{i=1}^{M} a_i ψ_i(x, u) + w_t.   (4)

K contains bounded intervals describing θ = ({a_i}, η).
3. Nonlinear system, nonlinear parametrization: a nonlinear system with a function g parametrized by a fixed parameter vector p ∈ R^m (e.g., neural networks), perturbed by a bounded disturbance sequence w ∈ ℓ_∞, ‖w‖_∞ ≤ η:

f^*(t, x, u) = g(x, u; p) + w_t.   (5)

K contains bounded intervals describing θ = (p, η).
In these examples, the uncertainty set F ⊂ ⋃_{θ∈K} T[θ] is covered by models T[θ] with smaller uncertainty of the form T[θ] = {t, x, u ↦ f_θ(x, u, w_t) | ‖w‖_∞ ≤ η}, where f_θ denotes one of the functional forms on the right-hand side of eq. (3), (4) or (5).
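As an illustration of such a model cover, the following sketch implements a membership test for T[θ] in the linearly parametrized case (4), specialized to a scalar state: a one-step transition belongs to T[θ] iff its residual can be attributed to a disturbance of magnitude at most η. The basis functions ψ_i and all numbers are hypothetical.

```python
import math

# Sketch of a model set T[theta] for the linearly parametrized family (4),
# scalar case, with made-up basis functions psi_i.

PSI = [lambda x, u: x, lambda x, u: math.sin(x), lambda x, u: u]

def in_model(theta, x, u, x_plus):
    """theta = (a_1, ..., a_M, eta); test whether (x, u) -> x_plus is
    explainable by T[theta], i.e. the residual fits inside |w| <= eta."""
    *a, eta = theta
    residual = x_plus - sum(ai * psi(x, u) for ai, psi in zip(a, PSI))
    return abs(residual) <= eta

# Hypothetical true system: x+ = 1.0*x + 0.5*sin(x) + 1.0*u + w, |w| <= 0.05
theta_true = (1.0, 0.5, 1.0, 0.05)
x, u = 0.3, -0.1
x_plus = 1.0 * x + 0.5 * math.sin(x) + 1.0 * u + 0.02   # w = 0.02
print(in_model(theta_true, x, u, x_plus))                # True
print(in_model((0.0, 0.0, 0.0, 0.05), x, u, x_plus))     # False
```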
Online robust control algorithm. Given a compact parametrization (T, K, d) of the uncertainty set F, we design a meta-algorithm A_π(SEL) (Algorithm 1) that controls the system (1) online by invoking two sub-routines π and SEL in each time step.
• Consistent model chasing. Procedure SEL receives a finite data set D, which contains state and input observations, and returns a parameter θ ∈ K.
Design goal: for each time t, the procedure SEL should select θ_t such that the set of models T[θ_t] stays "consistent" with D_t, i.e., candidate models in T[θ_t] can explain the past data. Moreover, posited parameters θ_t should only change when necessary: two posited parameters θ_t and θ_{t'} should not be very different from each other if both data sets D_t and D_{t'} contain a "similar" amount of information.
• Robust oracle. Procedure π receives a posited system parameter θ ∈ K as input and returns a control policy π[θ] : N × X → U which can be evaluated at time t to compute a control action u_t = π[θ](t, x_t) based on the current state x_t.
Design goal: we require that π represents a robust control design subroutine for the collection of models T, in the sense that policy π[θ] could provide mistake guarantees for G which are robust to bounded noise if the uncertainty set F were T[θ].
Algorithm 1 Meta-Implementation of A_π(SEL) for (OC-MG)
Require: procedures π and SEL
Initialization: D_0 ← {}, x_0 is set to initial condition ξ_0
1: for t = 0, 1, ... to ∞ do
2:   D_t ← append (t, x_t, x_{t−1}, u_{t−1}) to D_{t−1} (if t ≥ 1)   ▷ update online history of observations
3:   θ_t ← SEL[D_t]   ▷ present online data to SEL, get posited parameter θ_t
4:   u_t ← π[θ_t](t, x_t)   ▷ query π for policy π[θ_t] and evaluate it
5:   x_{t+1} ← f^*(t, x_t, u_t)   ▷ system transitions with unknown f^* to next state
6: end for
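To make the control flow of Algorithm 1 concrete, here is a minimal executable sketch in Python. The plant f_star, oracle pi, and selector SEL are toy stand-ins (a scalar linear plant with a certainty-equivalence policy), not the robust-control and chasing sub-routines developed in Sections 3 and 4.

```python
# Minimal executable sketch of Algorithm 1 on a toy scalar plant.
# theta is the (unknown) scalar gain a of the plant x+ = a*x + u.

def f_star(t, x, u, a_true=1.5):
    return a_true * x + u              # unknown unstable plant

def pi(theta):
    """Toy oracle: certainty-equivalence deadbeat policy for posited a = theta."""
    return lambda t, x: -theta * x

def SEL(D):
    """Toy selector: return a parameter consistent with the observed data.
    For this noiseless scalar plant, one transition with x != 0 pins a down."""
    if not D:
        return 0.0                     # arbitrary initial guess
    t, x_plus, x, u = D[-1]
    return (x_plus - u) / x if x != 0 else 0.0

# Meta-algorithm A_pi(SEL), lines 1-6 of Algorithm 1:
D, x = [], 1.0                         # D_0 <- {}, x_0 <- xi_0
traj = [x]
for t in range(5):
    theta_t = SEL(D)                   # posited parameter
    u_t = pi(theta_t)(t, x)            # query oracle policy and evaluate
    x_next = f_star(t, x, u_t)         # unknown system transitions
    D.append((t + 1, x_next, x, u_t))  # update history of observations
    x = x_next
    traj.append(x)
print(traj)  # -> [1.0, 1.5, 0.0, 0.0, 0.0, 0.0]
```

After a single mistaken step under the initial guess, the selector recovers the true gain from data and the state is driven to the origin.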
Theoretical contribution. Our main theoretical results certify safety and finite-mistake guarantees for the online control scheme A_π(SEL) if the sub-routines π and SEL meet the design requirements of "robust oracle" and "consistent model chasing" for a given uncertainty set F and objective G. We will clarify the consistency and robustness requirements on the sub-routines π and SEL in Section 2.1 and Section 2.2. For now, we present an informal version of the finite mistake guarantees and worst-case state deviation for the online control scheme A_π(SEL):
Theorem (Informal). For any (adversarial) f^* ∈ F, the online control scheme A_π(SEL) described in Algorithm 1 guarantees a priori that the trajectories x, u will achieve the objective G after finitely many mistakes. The total number of mistakes ∑_{t=0}^{∞} G_t(x_t, u_t) is at most

M^π_ρ ∗ Γ_1( (size of uncertainty F ∗ efficiency of SEL) / (robustness margin ρ of π) ),

where M^π_ρ is the oracle performance, and the norm of the state ‖x_t‖ is at most

Γ_2( (size of uncertainty F ∗ efficiency of SEL) / (single-step robustness margin of π), ‖x_0‖ ),

for some increasing function Γ_1 : R_+ → R_+ and some function Γ_2 : R_+ × R_+ → R_+ which is increasing in its first argument and linear in its second.
• Performance of π: assume the worst possible f^* ∈ F, but also access to direct online measurements θ_t = θ^* + v_t of a true parameter θ^* with small noise v_t of size ρ; M^π_ρ denotes the worst-case number of mistakes if we were to apply the almost-ideal control law u_t = π[θ_t](t, x_t) in this setting.
• Efficiency of SEL: we quantify the efficiency of SEL through competitive analysis of online algorithms. The procedure SEL posits parameters efficiently if, as a function of time, the parameter selection θ_t changes only when necessary, that is, it changes only when new observations are informative and keeps a constant value otherwise. We phrase this in terms of a competitive ratio γ (with γ ≥ 1) and distinguish here between γ-competitive and (γ, T)-weakly competitive algorithms. The smaller the constant γ is, the more efficiently the algorithm posits parameters: as discussed in Section 2.2, a smaller γ indicates that an online algorithm performs closer to the ideal in-hindsight-optimal algorithm.
Remark 1. If the same procedure π serves as a robust oracle for a set of criteria G^(1), G^(2), ..., G^(M), then the instantiation A_π(SEL) correspondingly provides multiple finite mistake guarantees, i.e., one for each criterion G^(i), i = 1, ..., M.
This approach brings several attractive qualities:
• Generality. The result applies to a wide range of problem settings. The objective G and uncertainty set F serve as a flexible abstraction to represent a large class of dynamical systems and control-theoretic performance objectives.
• Robust guarantees in the large uncertainty setting. Our result applies in settings where only rough models are available. As an example, we can use the result to provide guarantees in control settings with unstable nonlinear systems where stabilizing policies are not known a priori and which are subject to online adversarial disturbances.
• Decoupling algorithm design for learning and control. The construction of the "robust oracle" π and the consistent model chasing procedure SEL can be addressed with existing tools from control and learning. More generally, this perspective enables us, for the first time, to decouple learning and control problems in the large uncertainty setting into separate robust control and online learning problems. See the discussion in Section 3 and Section 4.
• Modular algorithm design for robust learning and control. The above approach provides a first interface between robust control and online learning, which enables a modular design of learning and control algorithms with versatile worst-case performance guarantees against large model uncertainty.
• A new tool for performance analysis of learning and control algorithms. We can also view the above theorem from an analysis point of view: many existing certainty-equivalence-based learning and control algorithms can easily be represented as an instance of the meta-algorithm A_π(SEL) and thus can be analyzed using the above theorem.
Promising for the design of efficient algorithms in practice. Besides providing worst-case guarantees in a general setting, empirical results show that our framework is a promising approach for designing efficient learning and control algorithms in practice. In Section 6, we apply our approach to the problem of swinging up a cartpole with large parametric uncertainty in a realistic and highly challenging setting, and show that it consistently achieves good performance over 900 experiments with different parameter settings.
2. Main result
As summarized in Algorithm 1, the main ingredients of our approach are a robust control oracle π that returns a robust controller under posited system parameters, and an online algorithm SEL that chases parameter sets that are consistent with the data collected so far. Here we expand on the desired conditions on π and SEL.
2.1 Required conditions on procedure π
Online control with oracle under idealized conditions. Generally, procedure π is a map K → C from the parameter space K to the space C := {κ : N × X → U} of all (non-stationary) control policies of the form u_t = κ(t, x_t). A desired property of π as an oracle is that π
ρ-robust: ∀θ ∈ K : sup_{γ≥0} m^π_ρ(γ; θ) < ∞
uniformly ρ-robust: M^π_ρ := sup_{γ≥0, θ∈K} m^π_ρ(γ; θ) < ∞
locally ρ-robust: ∀γ ≥ 0, θ ∈ K : m^π_ρ(γ; θ) < ∞
locally uniformly ρ-robust: ∀γ ≥ 0 : M^π_ρ(γ) := sup_{θ∈K} m^π_ρ(γ; θ) < ∞
Table 1: Notions of oracle robustness
returns controllers that satisfy G if the model uncertainty were small. In other words, if the uncertainty set F were contained in the set T[θ], then the control policy π[θ] could guarantee achieving the objective G with finite mistake guarantees. Further, in an idealized setting where the true parameter were known exactly, the oracle should return a policy such that the system performance is robust to some level of bounded noise, which is a standard notion of robustness. We make this design specification more precise below and discuss how to instantiate the oracle in Section 3.
Idealized problem setting. Let θ^* be a parameter of the true dynamics f^*, and assume that online we have access to noisy observations θ = (θ_0, θ_1, ...), where each measurement θ_t is ρ-close to θ^* under the metric d. The online control algorithm queries π at each time-step and applies the corresponding policy π[θ_t]. The resulting trajectories obey the equations:

x_{t+1} = f^*(t, x_t, u_t),   u_t = π[θ_t](t, x_t)   (6a)
θ_t s.t. d(θ_t, θ^*) ≤ ρ,   where f^* ∈ T[θ^*]   (6b)
To facilitate later discussion, define the set of all feasible trajectories of the dynamic equations (6) as the nominal trajectories S_I[ρ; θ] of the oracle:

Definition 2.1. For a time-interval I = [t_1, t_2] ⊂ N and fixed θ ∈ K, let S_I[ρ; θ] denote the set of all pairs of finite trajectories x_I := (x_{t_1}, ..., x_{t_2}), u_I := (u_{t_1}, ..., u_{t_2}) which, for θ^* = θ, satisfy conditions (6) with some feasible f^* and sequence (θ_{t_1}, ..., θ_{t_2}).
Design specification for oracles. We will say that π is ρ-robust for some objective G if all trajectories in S_I[ρ; θ] achieve G after finitely many mistakes. We distinguish between robustness and uniform robustness, which we define precisely below:
Definition 2.2 (robust oracle). Equip X with some norm ‖·‖. For each ρ, γ ≥ 0 and θ ∈ K, define the quantity m^π_ρ(γ; θ) as

m^π_ρ(γ; θ) := sup_{I=[t,t'] : t<t'}  sup_{(x_I, u_I) ∈ S_I[ρ; θ], ‖x_0‖ ≤ γ}  ∑_{t∈I} G_t(x_t, u_t).

If m^π_0(γ; θ) < ∞ for all γ ≥ 0, θ ∈ K, we call π an oracle for G w.r.t. the parametrization (T, K, d). In addition, we say an oracle π is (locally) (uniformly) ρ-robust if the corresponding property shown in Table 1 holds. If it exists, M^π_ρ is the mistake constant/function of π.
The constant ρ > 0 will be referred to as the robustness margin of π. If we use the above terms without referencing ρ, it should be understood that there exists some ρ > 0 for which the corresponding property holds. The mistake constant M^π_ρ can be viewed as a robust offline benchmark: it quantifies how many mistakes we would make in the worst case if we could use the oracle π under the idealized conditions described by (6).
Invariance property. On top of ρ-robustness, some results will require the following additional condition on the oracle:

Definition 2.3. For a fixed objective G, define the set of admissible states at time t as X_t = {x | ∃u' : G_t(x, u') = 0}, i.e., the set of states from which it is possible to achieve zero cost at time step t. We call a ρ-robust oracle π cost invariant if for all θ ∈ K and t ≥ 0 the following holds:
• For all x ∈ X_t: G_t(x, π[θ](t, x)) = 0.
• For all x ∈ X_t, f ∈ T[θ] and θ' s.t. d(θ', θ) ≤ ρ: f(t, x, π[θ'](t, x)) ∈ X_{t+1}.
Remark 2. The above condition is related to the well-known notion of positive set/tube invariance in control theory (Blanchini, 1999): it requires that the oracle policies π[θ] ensure, for their nominal model T[θ], the closed-loop condition x_t ∈ X_t ⟹ x_{t+1} ∈ X_{t+1}, ∀t.
Robustness of single-step closed-loop transitions. To provide worst-case guarantees on the online state trajectory, we need to bound how system uncertainty can affect a single online state transition in the worst case. To this end, we equip the state space X with some norm ‖·‖ and define a desired property of the oracle in terms of its performance in the idealized scenario:

Definition 2.4. π : K → C is (α, β)-single-step robust in the space (X, ‖·‖) if for any 2-time-step nominal trajectory (x_{t+1}, x_t), (u_{t+1}, u_t) ∈ S_{[t,t+1]}[ρ; θ] it holds that

‖x_{t+1}‖ ≤ αρ‖x_t‖ + β.
The above property requires that in the idealized setting (6), we can uniformly bound the single-step growth of the state by a function that is linear in the noise level ρ and the previous state norm. Equivalently, we can write out the condition of Definition 2.4 explicitly as:

∃α, β > 0 : ∀θ, θ' ∈ K, x ∈ X, f ∈ T[θ], t ≥ 0 : ‖f(t, x, π[θ'](t, x))‖ ≤ α d(θ, θ')‖x‖ + β.
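As a sanity check, the condition above can be verified numerically for a hypothetical scalar model family x⁺ = θx + u + w with |w| ≤ η and proportional oracle policies π[θ'](t, x) = −θ'x: closing the loop gives x⁺ = (θ − θ')x + w, so the condition holds with α = 1 and β = η, taking d to be the absolute difference. All constants in the sketch are made up.

```python
import random

# Numerical spot-check of the explicit single-step robustness condition
# for the toy family x+ = theta*x + u + w, |w| <= eta, with oracle
# pi[theta'](t, x) = -theta'*x. Closed loop: x+ = (theta - theta')*x + w,
# so alpha = 1 and beta = eta should work (d = absolute difference).

eta = 0.05
alpha, beta = 1.0, eta

random.seed(0)
ok = True
for _ in range(10_000):
    theta = random.uniform(-2, 2)     # model parameter
    theta_p = random.uniform(-2, 2)   # posited (noisy) parameter
    x = random.uniform(-10, 10)
    w = random.uniform(-eta, eta)
    u = -theta_p * x                  # oracle action pi[theta'](t, x)
    x_plus = theta * x + u + w        # transition under some f in T[theta]
    ok &= abs(x_plus) <= alpha * abs(theta - theta_p) * abs(x) + beta + 1e-12
print(ok)  # True
```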
2.2 Required conditions on procedure SEL
We now describe the required conditions on SEL, and will discuss specific strategies for designing SEL in Section 4.
Let D := {(d_1, ..., d_N) | d_i ∈ N × X × X × U, N < ∞} be the space of all data sets D = (d_1, ..., d_N) of time-indexed data points d_i = (t_i, x^+_i, x_i, u_i). We will call a data sequence D = (D_1, D_2, ...) an online stream if the data sets D_t = (d_1, ..., d_t) are formed from a sequence of observations (d_1, d_2, ...).
Consistent models and parameters. Intuitively, given a data set D = (d_1, ..., d_N) of tuples d_i = (t_i, x^+_i, x_i, u_i), any candidate f ∈ F which satisfies x^+_i = f(t_i, x_i, u_i) for all 1 ≤ i ≤ N is consistent with D; similarly, we will say that f is consistent with an online stream D if, at each time-step t, it is consistent with the data set D_t. We extend this definition to models T[θ] and parameters θ ∈ K: the model T[θ] is a consistent model for a data set D or online stream D if it contains at least one function f which is consistent with D or D, respectively; θ is then called a consistent parameter. Correspondingly, we define, for a data set D, the set of all consistent parameters as P(D):
Definition 2.5 (Consistent Sets). The consistent set map P : D → 2^K returns for each data set D ∈ D the corresponding set of consistent parameters P(D):

P(D) := closure({θ ∈ K | ∃f ∈ T[θ] s.t. ∀(t, x^+, x, u) ∈ D : x^+ = f(t, x, u)}).   (7)
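For intuition, the consistent-set map (7) can be approximated by brute force when K is discretized to a grid. The sketch below does this for the hypothetical scalar family x⁺ = ax + u + w, |w| ≤ η, with η fixed; the grid and the data are made up.

```python
# Brute-force approximation of the consistent-set map P(D) from (7) for a
# gridded scalar family T[theta]: x+ = a*x + u + w, |w| <= eta, eta fixed.

eta = 0.1
grid = [a / 100 for a in range(-200, 201)]   # K ~ [-2, 2], step 0.01

def consistent(a, D):
    """a is consistent with D iff every transition fits with |w| <= eta
    (small float tolerance to avoid boundary artifacts)."""
    return all(abs(x_plus - (a * x + u)) <= eta + 1e-9
               for (t, x_plus, x, u) in D)

def P(D):
    return [a for a in grid if consistent(a, D)]

# Data generated by a = 0.8 with disturbances inside |w| <= eta:
D = [(0, 0.8 * 1.0 + 0.0 + 0.05, 1.0, 0.0),
     (1, 0.8 * 2.0 + 0.1 - 0.03, 2.0, 0.1)]
S = P(D)
print(min(S), max(S))   # -> 0.75 0.83, a shrinking interval around a = 0.8
```

Each new datum can only shrink the consistent set, which is what a chasing algorithm SEL exploits.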
Chasing conditions for selection SEL. The first and basic design specification for the subroutine SEL is to output a consistent parameter θ = SEL[D] for a given data set D, provided such a parameter θ ∈ K exists. This requirement can be stated precisely in terms of set-valued analysis: we want SEL to act as a selection map D → K of the consistent set map P : D → 2^K.
Definition 2.6 (Aubin and Frankowska, 2009). A function f : X → Y is a selection of the set-valued map F : X → 2^Y if ∀x ∈ X : f(x) ∈ F(x).
Since we intend to use SEL in an online manner where we are given a stream of data D, we additionally require that SEL can posit consistent parameters θ_t = SEL[D_t] efficiently online. A fitting notion of "efficiency" can be defined by comparing the variation of the parameter sequence θ with that of the sequence of consistent sets P(D) := (P(D_1), P(D_2), ...) over time, where we quantify the latter using the Hausdorff distance d_H : 2^K × 2^K → R_+ defined as

d_H(S, S') = max{ max_{x∈S} d(x, S'), max_{y∈S'} d(y, S) }.
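For finite sets, the Hausdorff distance above reduces to a few lines of Python. Finite point sets stand in here for the consistent sets P(D_t), which are generally not finite, so this is purely illustrative.

```python
# Hausdorff distance between two finite subsets of a metric space,
# matching the definition of d_H above (metric: |x - y| on the reals).

def d(x, S):
    """Point-to-set distance min_{y in S} |x - y|."""
    return min(abs(x - y) for y in S)

def d_H(S1, S2):
    return max(max(d(x, S2) for x in S1),
               max(d(y, S1) for y in S2))

S1 = [0.0, 1.0, 2.0]
S2 = [0.0, 1.0]
print(d_H(S1, S2))   # 1.0: the point 2.0 in S1 is far from S2
print(d_H(S1, S1))   # 0.0: identical sets are at distance zero
```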
We phrase this in terms of the desired "chasing" properties (A)-(D) below and refer to algorithms SEL with such properties as consistent model chasing (CMC) algorithms:
Definition 2.7. Let SEL : D → K be a selection of P. Let D = (D_1, D_2, ...) be an online data stream and let θ be the sequence defined for each time t by θ_t = SEL[D_t]. Assume that there always exists an f ∈ F consistent with D and consider the following statements:
(A) θ^* = lim_{t→∞} θ_t exists.
(B) lim_{t→∞} d(θ_t, θ_{t−1}) = 0.
(C) γ-competitive: ∑_{t=t_1+1}^{t_2} d(θ_t, θ_{t−1}) ≤ γ d_H(P(D_{t_2}), P(D_{t_1})) holds for any t_1 < t_2.
(D) (γ, T)-weakly competitive: ∑_{t=t_1+1}^{t_2} d(θ_t, θ_{t−1}) ≤ γ d_H(P(D_{t_2}), P(D_{t_1})) holds for any t_2 − t_1 ≤ T.
We will say that SEL is a consistent model chasing (CMC) algorithm if one of the above chasing properties can be guaranteed for any pair of D and corresponding θ.
The desired properties describe a natural notion of efficiency for this problem: the posited consistent parameter θ_t should only change online if new data is also informative. The competitiveness properties (C) and (D) naturally restrict changes in θ_t when little new information is available and permit bigger changes in θ_t only when new data is informative. Per time interval I = [t_1, t_2], the inequalities in (C) and (D) enforce the following: if the consistent sets P(D_{t_2}) and P(D_{t_1}) are the same, i.e., d_H(P(D_{t_2}), P(D_{t_1})) = 0, the posited parameters θ_{t_1}, ..., θ_{t_2} should all have the same value; on the other hand, if P(D_{t_2}) is much smaller than P(D_{t_1}), i.e., d_H(P(D_{t_2}), P(D_{t_1})) is large, the total variation ∑_{t=t_1+1}^{t_2} d(θ_t, θ_{t−1}) of the posited parameters is allowed to be at most γ d_H(P(D_{t_2}), P(D_{t_1})).
Properties (C) and (D) are stronger versions of (A) and (B), called competitiveness and weak competitiveness, respectively. The relationships between the chasing properties are summarized in the following corollary:

Corollary 2.8. The following implications hold between the properties of Definition 2.7:

(C) ⇒ (A),  (C) ⇒ (D),  (A) ⇒ (B),  (D) ⇒ (B).

The reverse (and any other) implications between the properties do not hold in general.
From these implications, we can see that $\gamma$-competitiveness is the strongest chasing property, as it implies all other properties. In Section 4, we discuss that in cases where the consistent set map $P$ returns convex sets, $\gamma$-competitive CMC algorithms can be designed via a reduction to the nested convex body chasing (NCBC) problem (Bubeck et al., 2020). On the other hand, for $T = 1$ and any $\gamma \ge 1$, the weaker chasing property (D) can always be achieved, even if the map $P$ returns arbitrary non-convex sets. We show in Section 4.4 a simple projection-based selection rule which satisfies $(\gamma, 1)$-weak competitiveness.
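To make the projection-based rule concrete, the following sketch (our own illustration, not code from the paper) runs such a selection rule on a scalar system $x_{t+1} = \theta^* x_t + w_t$ with $|w_t| \le 1$, where each consistent set is an interval that can be intersected and projected onto in closed form:

```python
def sel_projection(theta_prev, lo, hi):
    """Projection-based SEL: keep the previous parameter if it is still
    consistent, otherwise project it onto the current consistent interval
    [lo, hi].  Movement per step is then bounded by how much the
    consistent set shrank between consecutive time steps."""
    return min(max(theta_prev, lo), hi)

# Scalar system x_{t+1} = theta* x_t + w_t with |w_t| <= 1 and theta* = 2.
# Each observed transition (x, x_next) restricts theta to the interval
# {theta : |x_next - theta x| <= 1}; the consistent set is the running
# intersection of these intervals with the prior set K = [-5, 5].
lo, hi = -5.0, 5.0
theta_hat, true_theta, x = 0.0, 2.0, 1.0
for t in range(20):
    w = 0.5 * (-1) ** t                  # a fixed admissible disturbance
    x_next = true_theta * x + w
    a, b = (x_next - 1.0) / x, (x_next + 1.0) / x
    lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    theta_hat = sel_projection(theta_hat, lo, hi)
    x = x_next
```

Since `theta_hat` moves only when it is forced out of the shrinking interval, its total movement over any window is controlled by the contraction of the consistent set, which is the behavior the $(\gamma, 1)$-weak competitiveness property formalizes.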
2.3 Main Results
Assuming that $\pi$ and $\mathrm{SEL}$ meet the required specifications, we can provide the overall guarantees for the algorithm. Let $(T, K, d)$ be a compact parametrization of a given uncertainty set $\mathcal{F}$. Let $\pi$ be robust per Definition 2.2 and $\mathrm{SEL}$ return consistent parameters per Definition 2.7. We apply the online control strategy $A_\pi(\mathrm{SEL})$ described in Algorithm 1 to the system $x_{t+1} = f^*(t, x_t, u_t)$ with unknown dynamics $f^* \in \mathcal{F}$ and denote by $(\mathbf{x}, \mathbf{u})$ the corresponding state and input trajectories.
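For concreteness, the structure of this online strategy can be sketched as follows (our own schematic of the loop, together with a toy noiseless instantiation that is not from the paper): at every step, $\mathrm{SEL}$ posits a model consistent with the data so far, and the oracle policy designed for that model is applied to the true system.

```python
def run_A_pi(sel, pi, f_star, x0, horizon):
    """Schematic A_pi(SEL) loop: chase a consistent model, then act with
    the oracle policy designed for the currently posited model."""
    data, x, trajectory = [], x0, []
    for t in range(horizon):
        theta_t = sel(data)           # theta_t = SEL[D_t]
        u = pi(theta_t)(t, x)         # u_t = pi[theta_t](t, x_t)
        x_next = f_star(t, x, u)      # unknown true dynamics f*
        data.append((x, u, x_next))
        trajectory.append((x, u))
        x = x_next
    return trajectory

# Toy noiseless instantiation: x_{t+1} = theta* x_t + u_t, theta* unknown.
# pi[theta] is the deadbeat policy u = -theta x; SEL returns any parameter
# consistent with the (noiseless) data, here recovered by exact division.
def sel(data):
    for (x, u, x_next) in data:
        if abs(x) > 1e-12:
            return (x_next - u) / x   # noiseless data pins down theta
    return 0.0                        # no informative data yet

pi = lambda th: (lambda t, x: -th * x)
f_star = lambda t, x, u: 1.5 * x + u
traj = run_A_pi(sel, pi, f_star, x0=1.0, horizon=5)
states = [x for (x, u) in traj]       # [1.0, 1.5, 0.0, 0.0, 0.0]
```

After a single informative transition the consistent set collapses to the true parameter, and the deadbeat policy drives the state to zero, illustrating how model chasing and control interact in the loop.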
2.3.1 Finite mistake guarantees
Theorem 2.9. Assume that $\mathrm{SEL}$ is a CMC algorithm with chasing property (A) and that $\pi$ is an oracle for an objective $G$. Then, the following mistake guarantees hold:
(i) If $\pi$ is robust, then $(\mathbf{x}, \mathbf{u})$ always satisfies $\sum_{t=0}^{\infty} G_t(x_t, u_t) < \infty$.
(ii) If $\pi$ is uniformly $\rho$-robust and $\mathrm{SEL}$ is $\gamma$-competitive, then $(\mathbf{x}, \mathbf{u})$ obeys the inequality:
$$\sum_{t=0}^{\infty} G_t(x_t, u_t) \le M_\rho^\pi \left( \frac{2\gamma}{\rho} \operatorname{diam}(K) + 1 \right).$$
The strong competitiveness property is not necessary if stronger conditions on the oracle can be enforced instead: Theorem 2.10 states that if the oracle $\pi$ is cost-invariant, we can weaken the assumptions on $\mathrm{SEL}$ and still provide finite mistake guarantees.
Theorem 2.10. Assume that $\mathrm{SEL}$ is a CMC algorithm with chasing property (B) and that $\pi$ is a uniformly $\rho$-robust, cost-invariant oracle for an objective $G$. Then the following mistake guarantees hold:
(i) $(\mathbf{x}, \mathbf{u})$ always satisfies $\sum_{t=0}^{\infty} G_t(x_t, u_t) < \infty$.
(ii) If $\mathrm{SEL}$ is $(\gamma, T)$-weakly competitive, then $(\mathbf{x}, \mathbf{u})$ obeys the following inequality, with $N$ denoting the packing number of $K$ (see definition below):
$$\sum_{t=0}^{\infty} G_t(x_t, u_t) \le M_\rho^\pi \left( N(K, r^*) + 1 \right), \qquad r^* := \frac{1}{2}\, \frac{\rho}{\gamma T M_\rho^\pi + T}.$$
Definition 2.11 ((Dudley, 1987)). Let $(M, d)$ be a metric space and $S \subset M$ a compact set. For $r > 0$, define $N(S, r)$ as the $r$-packing number of $S$:
$$N(S, r) := \max \left\{ n \in \mathbb{N} \,\middle|\, \exists\, \theta_1, \dots, \theta_n \in S \text{ s.t. } d(\theta_i, \theta_j) > r \text{ for all } 1 \le i, j \le n,\ i \ne j \right\}.$$
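For intuition, a greedy sweep certifies a lower bound on this packing number (a small illustration of Definition 2.11, not code from the paper):

```python
import itertools

def greedy_packing(points, r, dist):
    """Greedily collect points with pairwise distance > r; the result is
    r-separated, so its size lower-bounds the packing number N(S, r)."""
    packing = []
    for p in points:
        if all(dist(p, q) > r for q in packing):
            packing.append(p)
    return packing

# S = unit square [0,1]^2 with the sup-norm metric: grid points spaced
# more than r apart certify the (1/r)^n growth of N(S, r) that appears
# when comparing the bounds of Theorems 2.9 and 2.10.
pts = [(i / 10, j / 10) for i, j in itertools.product(range(11), repeat=2)]
d_inf = lambda p, q: max(abs(p[0] - q[0]), abs(p[1] - q[1]))
pack = greedy_packing(pts, r=0.25, dist=d_inf)   # a 4 x 4 separated grid
```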
Basic asymptotic mistake guarantees. The guarantees stated in part (i) of the above theorems are asymptotic ones, since they are equivalent to stating $\lim_{t \to \infty} G_t(x_t, u_t) = 0$. To provide this guarantee, we only impose weak asymptotic conditions on $\pi$ and $\mathrm{SEL}$: $\mathrm{SEL}$ needs to satisfy either the convergence condition (A) or (B), while $\pi$ has to be either robust or uniformly robust and cost-invariant for some non-zero margin, respectively. The appeal of guarantee (i) is that we do not require knowledge of any specific constants (such as $\rho$ or $\gamma$) to give this basic asymptotic guarantee.
Guarantees with explicit mistake bounds for convex and non-convex $P(D)$. On the other hand, the guarantees stated in part (ii) provide additional explicit mistake bounds if stronger chasing properties can be established for the CMC algorithm $\mathrm{SEL}$. Notice that the mistake bound provided in Theorem 2.9 is an exponential improvement over the bound provided in Theorem 2.10. As an example, assume $K$ were the unit cube $[0,1]^n$ in $\mathbb{R}^n$ and take $d(x, y) := \|x - y\|_\infty$; then the corresponding packing number is at least $N(K, r^*) \ge \left( M_\rho^\pi \frac{\gamma}{\rho} \operatorname{diam}(K) \right)^n$. The stronger result Theorem 2.9 requires $\mathrm{SEL}$ to be a $\gamma$-competitive CMC algorithm. We show in Section 4 that if the consistent set map $P$ always returns convex sets, we can design $\gamma$-competitive CMC algorithms through a reduction to the nested convex body chasing problem. However, it is unclear whether $\gamma$-competitiveness can be achieved for the case of non-convex consistent sets. The weaker mistake bound presented in Theorem 2.10, on the other hand, only requires the weak competitiveness property (D), which can always be achieved with $T = 1$ (see Section 4.4). Therefore, in contrast to Theorem 2.9, the result of Theorem 2.10 is directly applicable to general maps $P$.
A theoretical tool for performance analysis. Theorems 2.9 and 2.10 can be invoked on any learning and control method that instantiates $A_\pi(\mathrm{SEL})$. They offer a set of sufficient conditions to verify whether a learning agent $A_\pi(\mathrm{SEL})$ can provide mistake guarantees: we need to show that, w.r.t. some compact parametrization $(T, K, d)$ of the uncertainty set $\mathcal{F}$, $\pi$ operates as a robust oracle for some objective $G$, and that $\mathrm{SEL}$ satisfies strong enough chasing properties.
A design guideline. Theorem 2.9 also suggests a design philosophy of decoupling the learning and control problem into two separate problems while retaining the appropriate guarantees: (1) design a robust oracle $\pi$ for a specified control goal $G$; and (2) design an online selection procedure $\mathrm{SEL}$ that satisfies the chasing properties defined in Definition 2.7.
Extending the domain of robust control through augmentation. Robust control methods which apply only in the small uncertainty setting can be naturally extended to the large uncertainty setting. Provided that the design method can be embedded as an oracle sub-routine $\pi_{rc}$ and that we can find a suitable $\mathrm{SEL}$ routine, the meta-algorithm $A_{\pi_{rc}}(\mathrm{SEL})$ provides a simple extension of the original method: $A_{\pi_{rc}}(\mathrm{SEL})$ carries any guarantees of the original control method (assuming they can be paraphrased as mistake guarantees) over to the large uncertainty setting.
2.3.2 A worst-case bound on the state norm
Regardless of the objective $G$, we can provide worst-case state norm guarantees for $A_\pi(\mathrm{SEL})$ in a normed state space $(\mathcal{X}, \|\cdot\|)$, if $\mathrm{SEL}$ is a competitive or a weakly competitive CMC algorithm and $\pi$ provides sufficient robustness guarantees for a single time-step transition:
Theorem 2.12. Assume that $\pi: K \mapsto \mathcal{C}$ is $(\alpha, \beta)$-single-step robust in the space $(\mathcal{X}, \|\cdot\|)$. Then, the following state bound guarantees hold:
(i) If $\mathrm{SEL}$ is a $\gamma$-competitive CMC algorithm, then:
$$\forall t: \quad \|x_t\| \le e^{\alpha \gamma \phi(K)} \left( e^{-t} \|x_0\| + \beta \frac{e}{e - 1} \right).$$
(ii) If $\mathrm{SEL}$ is a $(\gamma, T)$-weakly competitive CMC algorithm, then:
$$\|\mathbf{x}\|_\infty \le \inf_{0 < \mu < 1} \left( 1 + (\alpha \phi(K))^{n^*} \right) \max\left\{ \frac{\beta}{1 - \mu},\, \|x_0\| \right\} + \beta \sum_{k=0}^{n^*} (\alpha \phi(K))^k,$$
where $n^* = N(K, \frac{\mu}{\alpha \gamma})$ and $\phi(K)$ denotes the diameter of $K$.
Notice that the above worst-case guarantee holds for arbitrarily large diameter $\phi(K)$ of the parameter space $K$ and applies naturally to scenarios where initially stabilizing control policies do not exist. To this end, it is important to verify that the required property defined in Definition 2.4 does not imply the existence of an initially stabilizing control policy.
2.3.3 Characterizing performance in terms of #mistakes vs. information gain
As shown in Appendix B, the mistake bounds presented in Theorem 2.9(ii) and Theorem 2.10(ii) are actually simplifications of the stricter mistake bound inequalities:
Under the assumptions of Thm. 2.9(ii): $\forall\, \bar\tau > \tau: \quad \sum_{t=\tau}^{\bar\tau} G_t(x_t, u_t) \le M_\rho^\pi \left( \frac{2\gamma}{\rho} d_H(P(D_\tau), P(D_{\bar\tau})) + 1 \right)$
Under the assumptions of Thm. 2.10(ii): $\forall\, \bar\tau > \tau: \quad \sum_{t=\tau}^{\bar\tau} G_t(x_t, u_t) \le M_\rho^\pi \left( N\left( P(D_\tau) \setminus \operatorname{int}(P(D_{\bar\tau})),\, r^* \right) + 1 \right)$
The above bounds provide more insight into the performance of an $A_\pi(\mathrm{SEL})$ algorithm, by showing that the mistake guarantees scale in an efficient way with the system uncertainty: the total mistakes occurring in any time interval $[\tau, \bar\tau]$ increase with the total information gained during the same interval. The amount of information gained in the time frame $[\tau, \bar\tau]$ is measured by the terms $d_H(P(D_\tau), P(D_{\bar\tau}))$ and $N(P(D_\tau) \setminus \operatorname{int}(P(D_{\bar\tau})), r^*)$, which quantify (via Hausdorff distance and packing numbers) how much parametric uncertainty has been reduced during $[\tau, \bar\tau]$, by comparing the consistent sets $P(D_\tau)$, $P(D_{\bar\tau})$ at the start and the end of the time window. Thus, the bounds allow for a simple interpretation: the algorithm $A_\pi(\mathrm{SEL})$ only makes informative mistakes.
2.3.4 Mistake guarantees with locally robust oracles
The worst-case bound shown in Theorem 2.12 can be directly used to extend the result of
Theorem 2.9 to problem settings where we only have access to locally robust oracles.
Theorem 2.13 (Corollary of Thm. 2.9). Consider the setting and assumptions of Thm. 2.9 and Thm. 2.10, but relax the oracle robustness requirements to the corresponding local versions and enforce the additional oracle assumption stated in Thm. 2.12. Then all guarantees of Thm. 2.9 (i), (ii) and Thm. 2.10 (i), (ii) still hold, if we replace $M_\rho^\pi$ in Thm. 2.9(ii) and Thm. 2.10(ii) respectively by $M_\rho^\pi(\gamma_\infty)$ and $M_\rho^\pi(\gamma_\infty^w)$ with the constants:
$$\gamma_\infty = e^{\alpha \gamma \phi(K)} \left( \|x_0\| + \beta \frac{e}{e - 1} \right)$$
$$\gamma_\infty^w = \inf_{0 < \mu < 1} \left( 1 + (\alpha \phi(K))^{n^*} \right) \max\left\{ \frac{\beta}{1 - \mu},\, \|x_0\| \right\} + \beta \sum_{k=0}^{n^*} (\alpha \phi(K))^k$$
where $n^* = N(K, \frac{\mu}{\alpha \gamma})$ and $\phi(K)$ denotes the diameter of $K$.
2.3.5 Oracle policies with memory.
The previous results assume that $\pi$ returns static policies of the type $(t, x) \mapsto u$. However, this assumption is only made for ease of exposition. All previous results also hold in the case where $\pi$ returns policies which have an internal state, as long as we can define the internal state to be shared among all oracle policies, i.e., as part of the oracle implementation online, we update the state $z_t$ at each step $t$ according to some fixed update rule $h$:
$$z_t = h(t, z_{t-1}, x_t, u_t, \dots, x_0, u_0),$$
and control policies $\pi[\theta]$, $\theta \in K$, are maps $(t, x, z) \mapsto u$ which we evaluate at time $t$ as $u_t = \pi[\theta](t, x_t, z_t)$.
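As an illustration of such a shared internal state (our own sketch; the PI-style control law is a stand-in, not a policy from the paper), all policies $\pi[\theta]$ read and write one common state $z_t$ through a fixed rule $h$:

```python
class StatefulOracle:
    """Oracle policies pi[theta] with a shared internal state z_t that is
    updated by a fixed rule h, as described above.  Because z_t is common
    to all policies, switching theta online preserves the internal state."""
    def __init__(self, h, z0=0.0):
        self.h, self.z = h, z0
    def act(self, theta, t, x):
        u = -theta * x - 0.1 * self.z        # u_t = pi[theta](t, x_t, z_t)
        self.z = self.h(t, self.z, x, u)     # shared update z_t = h(...)
        return u

# Shared rule: accumulate the state (integral action).
oracle = StatefulOracle(h=lambda t, z, x, u: z + x)
u0 = oracle.act(theta=1.0, t=0, x=2.0)   # z was 0.0, so u0 = -2.0
u1 = oracle.act(theta=0.5, t=1, x=1.0)   # same z survives the theta switch
```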
We discuss in Section 3 how robust oracle design is a pure robust control problem and briefly overview the main available methods. Designing procedures $\mathrm{SEL}$ with the properties stated in Definition 2.7 poses a new class of online learning problems. We propose in Section 4 a reduction to the well-known nested convex body chasing problem, which enables design and analysis of competitive CMC procedures in the scenario where the consistent sets $P(D_t)$ are always convex. For the non-convex scenario, we highlight a general projection-based selection rule which can ensure weak competitiveness.
3. Robust Control Oracle
Designing robust oracles $\pi$ as introduced in Definition 2.2 can be mapped to well-studied problems in robust control theory. We use a simplified problem setting to explain this correspondence.
Figure 1: Closed loop $CL_{\theta^*}$: an idealized setting in which we have noisy measurements of the true $\theta^*$.
Consider a class of systems of the form $x_{t+1} = g(x_t, u_t; \theta^*) + w_t$, $\|w_t\| \le 1$, where $\theta^*$ is an unknown system parameter which lies in a known compact set $K \subset \mathbb{R}^m$. We represent the uncertainty set as $\mathcal{F} = \bigcup_{\theta \in K} T[\theta]$ with $T[\theta] := \{ f^* : (t, x, u) \mapsto g(x, u; \theta) + w_t \mid \|\mathbf{w}\|_\infty \le 1 \}$.
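For this parametric class, membership in the consistent set $P(D)$ reduces to a residual check: $\theta$ is consistent with the data iff every observed transition is explainable by a disturbance of norm at most one. A minimal sketch (the function names and numbers below are illustrative, not from the paper):

```python
def is_consistent(theta, data, g, w_bound=1.0):
    """theta lies in the consistent set P(D) iff each observed transition
    (x, u, x_next) satisfies |x_next - g(x, u; theta)| <= w_bound."""
    return all(abs(x_next - g(x, u, theta)) <= w_bound
               for (x, u, x_next) in data)

# g(x, u; theta) = theta * x + u with true parameter theta* = 0.7.
g = lambda x, u, th: th * x + u
data = [(2.0, 0.5, 0.7 * 2.0 + 0.5 + 0.3),    # disturbance w = 0.3
        (4.0, -1.0, 0.7 * 4.0 - 1.0 - 0.9)]   # disturbance w = -0.9
# The true parameter is consistent; parameters far from it are ruled out:
ok, far = is_consistent(0.7, data, g), is_consistent(0.0, data, g)
```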
Let $\pi: K \mapsto \mathcal{C}$ be a procedure which returns state feedback policies $\pi[\theta]: \mathcal{X} \mapsto \mathcal{U}$ for a given $\theta \in K$. Designing a uniformly $\rho$-robust oracle $\pi$ can be equivalently viewed as making the closed-loop system (described by (6)) of the idealized setting robust to disturbance and noise. For the considered example, the closed loop is depicted in Figure 1 and is represented by $CL_{\theta^*}$, which maps system perturbations $(\mathbf{w}, \mathbf{v})$ to corresponding system trajectories $(\mathbf{x}, \mathbf{u})$. We call $\pi$ a uniformly $\rho$-robust oracle if the cost performance (measured as $\sum_{t=0}^{\infty} G_t(x_t, u_t)$) of the closed loop $CL_{\theta^*}$ is robust to disturbances of size $1$ and noise of size $\rho$ for any $\theta^* \in K$. For any noise $\|\mathbf{v}\|_\infty \le \rho$ and disturbance $\|\mathbf{w}\|_\infty \le 1$, the performance cost has to be bounded as:
$$\sum_{t=0}^{\infty} G_t(x_t, u_t) \le M_\rho^\pi \quad \text{or} \quad \sum_{t=0}^{\infty} G_t(x_t, u_t) \le M_\rho^\pi(\|x_0\|),$$
for some fixed constant $M_\rho^\pi$ or fixed function $M_\rho^\pi(\cdot)$, in case we can only establish local properties. Now, if we identify the cost functions $G_t$ with their level sets $S_t := \{ (x, u) \mid G_t(x, u) = 0 \}$, we can phrase the former equivalently as a form of robust trajectory tracking problem or a set-point control problem$^1$ (Khalil and Grizzle, 2002). It is common in control theory to provide guarantees in the form of convergence rates (finite-time or exponential convergence) on the tracking error; these guarantees can be directly mapped to $M_\rho^\pi$ and $M_\rho^\pi(\cdot)$, as shown in the next example:
Example 1. Assume we want to track a desired trajectory $x^d$ within $\varepsilon$ accuracy in some normed state space $(\mathcal{X}, \|\cdot\|)$. We can phrase this as a control objective $G$ by defining $G_t(x, u) := 0$ if $\|x_t^d - x\| \le \varepsilon$, and $G_t(x, u) := 1$ otherwise. Providing an exponential convergence guarantee of the type $\|x_t^d - x_t\| \le c \mu^t$ with constants $c$, $\mu < 1$ is a basic problem studied in control theory. It is easy to see that such a guarantee implies $\sum_{t=0}^{\infty} G_t(x_t, u_t) \le \frac{\log(c \|x_0\|) + \log(\varepsilon^{-1})}{\log(\mu^{-1})}$. Hence, if the design method $\pi$ provides a policy $\pi[\theta]$ which guarantees (for any $\|\mathbf{w}\|_\infty \le 1$, $\|\mathbf{v}\|_\infty \le \rho$) for any trajectory of the closed loop $CL_\theta$ the convergence condition $\forall t: \|x_t^d - x_t\| \le c \mu^t$, then $\pi$ is a locally $\rho$-uniformly robust oracle with the mistake function $M_\rho^\pi(\|x_0\|) = \frac{\log(c \|x_0\|) + \log(\varepsilon^{-1})}{\log(\mu^{-1})}$.
1. In that case $G_t$ would not be time dependent.
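The counting step in Example 1 is easy to verify numerically (the constants $c = 2$, $\mu = 0.8$, $\varepsilon = 0.01$, $\|x_0\| = 1$ below are illustrative, not from the paper): the $\varepsilon$-tube is violated only while $c \mu^t > \varepsilon$, i.e. for at most roughly $\log(c/\varepsilon) / \log(\mu^{-1})$ steps.

```python
import math

c, mu, eps, x0 = 2.0, 0.8, 0.01, 1.0     # illustrative constants

# Worst-case tracking error allowed by the guarantee: ||x_t^d - x_t|| <= c mu^t
errors = [c * mu ** t for t in range(200)]
mistakes = sum(1 for e in errors if e > eps)   # G_t = 1 iff outside the tube

# Mistake function from Example 1 (up to integer rounding):
bound = (math.log(c * x0) + math.log(1.0 / eps)) / math.log(1.0 / mu)
# mistakes = 24 here, and ceil(bound) = 24 as well
```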
Available methods for robust oracle design. Many control methods exist for robust oracle design. Which method to use depends on the control objective $G$, the specific application, and the system class (linear/nonlinear/hybrid, etc.). For a broad survey, see (Zhou et al., 1996; Spong and Vidyasagar, 1987; Spong, 1992a) and references therein. We characterize two general methodologies (which can also be combined):
• Robust stability analysis focus: In an initial step, we use analytical design principles from robust nonlinear and linear control design to propose an oracle $\pi[\theta](x)$ in closed form for all $\theta$ and $x$. In a second step, we prove robustness using analysis tools such as Input-to-State Stability (ISS) analysis (Jiang et al., 1999) or robust set invariance methods (Rakovic et al., 2006; Rakovic and Baric, 2010).
• Robust control synthesis: If the problem permits, we can also directly address the control design problem from a computational point of view, by formulating the design problem as an optimization problem and computing a control law with the desired guarantees directly. This can happen partially online, partially offline. Some common nonlinear approaches are robust (tube-based) MPC (Mayne et al., 2011; Borrelli et al., 2017), SOS methods (Parrilo, 2000; Aylward et al., 2008), and Hamilton-Jacobi reachability methods (Bansal et al., 2017).
There are different advantages and disadvantages to both approaches, and it is important to point out that robust control problems are not always tractably solvable. See (Blondel and Tsitsiklis, 2000; Braatz et al., 1994) for simple examples of robust control problems which are NP-hard. The computational complexity of robust controller synthesis tends to increase (or even become infeasible) with the complexity of the system of interest; it also increases further as we try to optimize for larger robustness margins $\rho$.
The dual purpose of the oracle. In our framework, access to a robust oracle is a necessary prerequisite to designing learning and control agents $A_\pi(\mathrm{SEL})$ with mistake guarantees. However, this is a mild assumption and is often more enabling than restrictive. First, it represents a natural way to ensure well-posedness of the overall learning and control problem: if robust oracles cannot be found for an objective, then the overall problem is likely intrinsically hard or ill-posed (for example, necessary fundamental system properties like stabilizability/detectability are not satisfied).
Second, the oracle abstraction enables a modular approach to robust learning and control problems, and directly leverages existing powerful methods in robust control: any model-based design procedure $\pi$ which works well in the small uncertainty setting (i.e., acts as a robust oracle) can be augmented with an online chasing algorithm $\mathrm{SEL}$ (with the required chasing properties) to provide robust control performance (in the form of mistake guarantees) in the large uncertainty setting via the augmented algorithm $A_\pi(\mathrm{SEL})$.
4. Consistent Model Chasing
The "chasing properties" defined in Definition 2.7 express different solution approaches to the same underlying online learning problem, which we refer to broadly as a consistent model chasing (CMC) problem. We discuss how this problem is linked to the online learning literature and discuss a reduction to the well-known nested convex body chasing (NCBC) problem.