Online Robust Control of Nonlinear Systems with Large Uncertainty

Dimitar Ho (Caltech), Hoang M. Le (Microsoft Research), John Doyle (Caltech), Yisong Yue (Caltech)
Abstract
Robust control is a core approach for controlling systems with performance guarantees that are robust to modeling error, and is widely used in real-world systems. However, current robust control approaches can only handle small system uncertainty, and thus require significant effort in system identification prior to controller design. We present an online approach that robustly controls a nonlinear system under large model uncertainty. Our approach is based on decomposing the problem into two sub-problems, "robust control design" (which assumes small model uncertainty) and "chasing consistent models", which can be solved using existing tools from control theory and online learning, respectively. We provide a learning convergence analysis that yields a finite mistake bound on the number of times performance requirements are not met, and can provide strong safety guarantees by bounding the worst-case state deviation. To the best of our knowledge, this is the first approach for online robust control of nonlinear systems with such learning-theoretic and safety guarantees. We also show how to instantiate this framework for general robotic systems, demonstrating the practicality of our approach.
1 Introduction
We study the problem of online control for nonlinear systems with large model uncertainty, under the requirement to provide upfront control-theoretic guarantees for the worst-case online performance. Algorithms with such capabilities can enable us to (at least partially) sidestep undertaking laborious system identification tasks prior to robust controller design. Our goal is to design frameworks that can leverage existing approaches for online learning (to ensure fast convergence (Bubeck et al., 2020)) and robust control (which have control-theoretic guarantees under small model uncertainty (Zhou and Doyle, 1998)), in order to exploit the vast literature of prior work and simplify algorithm design. To this end, we introduce a class of problems which we refer to as online control with mistake guarantees (OC-MG). To the best of our knowledge, this is the first rigorous treatment of online robust control of nonlinear systems under large uncertainty.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).
1.1 Problem Statement

Consider controlling a discrete-time nonlinear dynamical system with system equations:

$$x_{t+1} = f(t, x_t, u_t), \quad f \in \mathcal{F}, \qquad (1)$$

where $x_t \in \mathcal{X}$ and $u_t \in \mathcal{U}$ denote the system state and control input at time step $t$, and $\mathcal{X} \times \mathcal{U}$ denotes the state-action space. We assume that $f$ is an unknown function and that we only know of an uncertainty set $\mathcal{F}$ which contains the true $f$.

Large uncertainty setting. We impose no further assumptions on $\mathcal{F}$ and explicitly allow $\mathcal{F}$ to represent arbitrarily large model uncertainties.
Control objective. The control objective is specified as a sequence $\mathcal{G} = (G_0, G_1, \ldots)$ of binary cost functions $G_t : \mathcal{X} \times \mathcal{U} \mapsto \{0, 1\}$, where each function $G_t$ encodes a desired condition per time-step $t$: $G_t(x_t, u_t) = 0$ means the state $x_t$ and input $u_t$ meet the requirements at time $t$; $G_t(x_t, u_t) = 1$ means that some desired condition is violated at time $t$, and we will say that the system made a mistake at $t$. The performance metric of system trajectories $x := (x_0, x_1, \ldots)$ and $u := (u_0, u_1, \ldots)$ is the sum of incurred costs $G_t(x_t, u_t)$ over the time interval $[0, \infty)$, which we denote the total number of mistakes:

$$\#\text{ mistakes of } x, u = \sum_{t=0}^{\infty} G_t(x_t, u_t). \qquad (2)$$

For a system state-input trajectory $(x, u)$ to achieve an objective $\mathcal{G}$, we want the above quantity to be finite, i.e., eventually the system stops making mistakes and meets the requirements of the objective for all time.
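To make the objective concrete, here is a small illustrative sketch (our own toy construction, not from the paper) of a binary cost $G_t$ encoding a tracking requirement plus an input limit; the tolerance `tol` and limit `u_max` are hypothetical choices:

```python
def make_tracking_cost(x_ref, tol, u_max=10.0):
    """Binary cost G_t: returns 0 iff the state is within `tol` of the
    reference trajectory and the input stays below the safety limit."""
    def G(t, x, u):
        ok = abs(x - x_ref(t)) <= tol and abs(u) <= u_max
        return 0 if ok else 1
    return G

# Track the constant reference x_ref = 1 within tolerance 0.1.
G = make_tracking_cost(lambda t: 1.0, tol=0.1)
print(G(0, 1.05, 0.5))   # 0: requirements met at this step
print(G(0, 2.00, 0.5))   # 1: a "mistake" at this step
```

Summing such costs over the trajectory gives exactly the mistake count (2).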
Algorithm design goal. The goal is to design an online decision rule $u_t = \mathcal{A}(t, x_t, \ldots, x_0)$ such that, regardless of the unknown $f \in \mathcal{F}$, we are guaranteed to have finite, or even explicit, upper bounds on the total number of mistakes (2) of the online trajectories. Thus, we require a strong notion of robustness: $\mathcal{A}$ can control any system (1) with the certainty that the objective $\mathcal{G}$ will be achieved after finitely many mistakes. It is suitable to refer to our problem setting as online control with mistake guarantees.
1.2 Motivation and related work

How can we design control algorithms for dynamical systems with strong guarantees, without requiring much a-priori information about the system?

This question is of particular interest in safety-critical settings involving real physical systems, which arise in engineering domains such as aerospace, industrial robotics, automotive, and energy plants (Vaidyanathan et al., 2016). Frequently, the designer of control policies faces two major challenges: guarantees and uncertainty.

Guarantees. Control policies can only be deployed if it can be certified in advance that the policies will meet desired performance requirements online. This makes the mistake guarantee w.r.t. objective $\mathcal{G}$ a natural performance metric, as $\mathcal{G}$ can incorporate control specifications such as tracking, safety and stability that often arise in practical nonlinear dynamical systems.
The online learning for control literature has mostly focused on linear systems or linear controllers (Dean et al., 2018; Simchowitz et al., 2018; Hazan et al., 2020; Chen and Hazan, 2020), with some emerging work on online control of nonlinear systems. One approach is to incorporate stability into the neural network policy as part of the RL algorithm (Donti et al., 2021). Alternatively, the parameters of nonlinear systems can be transformed into a linear space to leverage linear analysis (Kakade et al., 2020; Boffi et al., 2020). These prior works focus on sub-linear regret bounds, which is not the focus of our problem setup. We note that regret is not necessarily the only way to measure performance. For example, competitive ratio is an alternative performance metric for online learning to control (Goel and Wierman, 2019; Goel et al., 2019; Shi et al., 2020). In addition, our mistake guarantee requirement is stricter than the no-regret criterion and more amenable to control-theoretic guarantees. Specifically, (fast) convergence of $\frac{1}{T}\sum_{t=0}^{T} G_t(x_t, u_t) \to 0$ does not imply that the total number of mistakes $\sum_{t=0}^{\infty} G_t(x_t, u_t)$ is bounded. We provide additional discussion of the large and growing literature on learning control for linear systems, as well as adaptive control techniques from the control community, in Appendix F.
Large uncertainty.
Almost always, the dynamics of
the real system are not known exactly and one has to
resort to an approximate model. The most common
approach in online learning for control literature (Dean
et al., 2017) is to perform system identification (Ljung,
1999), and then use tools from robust control theory
(Zhou and Doyle, 1998). Robust controller synthesis
can provide policies with desired guarantees, if one
can obtain an approximate model which is “provably
close enough” to the real system dynamics. However,
estimating a complex system to a desired accuracy level
quickly becomes intractable in terms of computational
and/or sample complexity. In the adversarial noise
setting, system identification of simple linear systems
with precision guarantees can be NP-hard (Dahleh
et al., 1993). General approaches for nonlinear system
identification with precision guarantees are for the most
part not available (recently Mania et al. (2020) analyzed
sample complexity under stochastic noise).
1.3 Overview of our approach

An alternative to SysID+control: using only rough models, learn to control online. While accurate models of real systems are hard to obtain, it is often easy to provide more qualitative or rough models of the system dynamics without requiring system identification. Having access to a rough system description, we aim to learn to control the real system from online data and provide control-theoretic guarantees on the online performance in advance.
Rough models as compactly parametrizable uncertainty sets. In practice, we never have exact knowledge of $f$ in advance. However, for engineering applications involving physical systems, the functional form of $f$ can often be derived through first principles and application-specific domain knowledge. Conceptually, we can view the unknown parameters of the functional form as conveying both the 'modeled dynamics' and 'unmodeled (adversarial) disturbance' components of the ground truth $f$ in the system $x_{t+1} = f(t, x_t, u_t)$. The range of unknown parameters will form a compact parameter set $\mathcal{K}$, which in turn determines the size of the model uncertainty set $\mathcal{F}$.
Definition 1.1 (Compact parametrization). A tuple $(T, \mathcal{K}, d)$ is a compact parametrization of $\mathcal{F}$, if $(\mathcal{K}, d)$ is a compact metric space and if $T : \mathcal{K} \mapsto 2^{\mathcal{F}}$ is a mapping such that $\mathcal{F} \subset \bigcup_{\theta \in \mathcal{K}} T(\theta)$.
We will work with candidate parameters $\theta \in \mathcal{K}$ of the system. Intuitively, consider a (non-unique) compact parametrization $\mathcal{K}$ of the uncertain model set $\mathcal{F}$, i.e., there exists a mapping $T : \mathcal{K} \mapsto 2^{\mathcal{F}}$ such that for each parameter $\theta \in \mathcal{K}$, $T[\theta]$ represents a set of candidate models, ideally with small uncertainty. The uncertainty will be reflected more precisely by whether the system is robustly controllable if the true parameter were $\theta$.
For concreteness, we give several simple examples of possible parametrizations $\mathcal{K}$ for different systems:

1. Linear time-invariant system: linear system with matrices $A$, $B$ perturbed by a bounded disturbance sequence $w \in \ell_\infty$, $\|w\|_\infty \le \eta$:

$$f(t, x, u) = Ax + Bu + w_t. \qquad (3)$$

$\mathcal{K}$ can describe known bounds for $\theta = (A, B, \eta)$.
2. Nonlinear system, linear parametrization: nonlinear system, where the dynamics are a weighted sum of nonlinear functions $\psi_i$ perturbed by a bounded disturbance sequence $w \in \ell_\infty$, $\|w\|_\infty \le \eta$:

$$f(t, x, u) = \sum_{i=1}^{M} a_i \psi_i(x, u) + w_t. \qquad (4)$$

$\mathcal{K}$ can represent known bounds on $\theta = (\{a_i\}, \eta)$.
3. Nonlinear system, nonlinear parametrization: nonlinear system, with function $g$ parametrized by a fixed parameter vector $p \in \mathbb{R}^m$ (e.g., neural networks), perturbed by a bounded disturbance sequence $w \in \ell_\infty$, $\|w\|_\infty \le \eta$:

$$f(t, x, u) = g(x, u; p) + w_t. \qquad (5)$$

$\mathcal{K}$ can represent known bounds on $\theta = (p, \eta)$.
In these examples, the set $\mathcal{F}$ can be summarized as:

$$\mathcal{F} = \{ f_\theta(x, u, w_t) \text{ with } \|w\|_\infty \le \eta \text{ and } \theta \in \mathcal{K} \}, \qquad (6)$$

where $f_\theta$ denotes one of the functional forms on the right-hand side of equation (3), (4) or (5).
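As a minimal sketch (ours, with hypothetical feature functions), a member of the uncertainty set $\mathcal{F}$ for the linear parametrization (4) can be simulated by fixing one $\theta = (\{a_i\}, \eta)$ and a disturbance sequence with $\|w\|_\infty \le \eta$:

```python
import numpy as np

def f_theta(x, u, a, w):
    """Form (4): weighted sum of known nonlinear features plus a
    bounded disturbance term w."""
    psi = np.array([np.sin(x), x * u])   # hypothetical feature functions psi_i
    return float(a @ psi + w)

a_true, eta = np.array([0.8, 0.1]), 0.05   # one theta = ({a_i}, eta) in K
rng = np.random.default_rng(0)
x, traj = 1.0, []
for t in range(5):
    w = rng.uniform(-eta, eta)             # adversarial in general; sampled here
    x = f_theta(x, u=0.0, a=a_true, w=w)
    traj.append(x)
print(traj)
```

The learner only knows the functional form and the bounds encoded by $\mathcal{K}$, not `a_true` or the realized disturbances.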
Online robust control algorithm. Given a compact parametrization $(T, \mathcal{K}, d)$ for the uncertainty set $\mathcal{F}$, we design a meta-algorithm $\mathcal{A}_\pi(\texttt{SEL})$ (Algorithm 1) that controls the system (1) online by invoking two subroutines $\pi$ and SEL in each time step.
Consistent model chasing. Procedure SEL receives a finite data set $\mathcal{D}$, which contains state and input observations, and returns a parameter $\theta \in \mathcal{K}$. The design criterion is that, for each time $t$, the procedure SEL selects $\theta_t$ such that the set of models $T[\theta_t]$ stays "consistent" with $\mathcal{D}_t$, i.e., candidate models can explain the past data.
Robust oracle. Procedure $\pi$ receives a posited system parameter $\theta \in \mathcal{K}$ as input and returns a control policy $\pi[\theta] : \mathbb{N} \times \mathcal{X} \mapsto \mathcal{U}$, which can be evaluated at time $t$ to compute a control action $u_t = \pi[\theta](t, x_t)$ based on the current state $x_t$. The control policy design is a robust control procedure, in the sense that $\pi[\theta]$ achieves $\mathcal{G}$ if $f \in T[\theta]$ (which may not be the case at a given time step).
Theoretical result. We will clarify the consistency and robustness requirements of the sub-routines $\pi$ and SEL in the next section. For now, we present an informal finite mistake guarantee for the online control scheme $\mathcal{A}_\pi(\texttt{SEL})$ from Algorithm 1.
Algorithm 1 Meta-implementation of $\mathcal{A}_\pi(\texttt{SEL})$ for (OC-MG)

Require: procedures $\pi$ and SEL
Initialization: $\mathcal{D}_0 \leftarrow \{\}$, $x_0$ is set to initial condition $\xi_0$
1: for $t = 0, 1, \ldots$ do
2:   $\mathcal{D}_t \leftarrow$ append $(t, x_t, x_{t-1}, u_{t-1})$ to $\mathcal{D}_{t-1}$ (if $t \ge 1$)  ▷ update online history of observations
3:   $\theta_t \leftarrow \texttt{SEL}[\mathcal{D}_t]$  ▷ present online data to SEL, get posited parameter $\theta_t$
4:   $u_t \leftarrow \pi[\theta_t](t, x_t)$  ▷ query $\pi$ for policy $\pi[\theta_t]$ and evaluate it
5:   $x_{t+1} \leftarrow f(t, x_t, u_t)$  ▷ system transitions with unknown $f$ to next state
6: end for
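To illustrate the control flow of Algorithm 1, here is a toy end-to-end instantiation (our own sketch, not the paper's implementation) for the scalar system $x_{t+1} = \theta^* x_t + u_t + w_t$, $|w_t| \le \eta$: SEL greedily intersects the interval of parameters consistent with each observed transition, and $\pi[\theta]$ is the certainty-equivalence policy $u = -\theta x$. All constants are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_star, eta = 1.5, 0.1           # unknown parameter, disturbance bound
K = (-3.0, 3.0)                      # compact parameter set

def SEL(D, theta_prev):
    """Greedy consistent selection: intersect the intervals of parameters
    that explain each transition x+ = theta*x + u + w with |w| <= eta,
    then project the previous guess onto the result."""
    lo, hi = K
    for (x_next, x, u) in D:
        if abs(x) > 1e-9:
            b1 = (x_next - u - eta) / x
            b2 = (x_next - u + eta) / x
            lo, hi = max(lo, min(b1, b2)), min(hi, max(b1, b2))
    return min(max(theta_prev, lo), hi)  # projection onto consistent interval

def pi(theta):
    return lambda t, x: -theta * x       # certainty-equivalent deadbeat policy

D, x, theta_t, mistakes = [], 5.0, 0.0, 0
for t in range(30):
    u = pi(theta_t)(t, x)
    x_next = theta_star * x + u + rng.uniform(-eta, eta)
    D.append((x_next, x, u))
    theta_t = SEL(D, theta_t)
    mistakes += int(abs(x_next) > 1.0)   # G_t: mistake if |x| leaves [-1, 1]
    x = x_next
print(mistakes, theta_t)
```

After a single informative transition, the consistent interval already pins $\theta^*$ down to width $2\eta/|x_0| = 0.04$, so in this toy run only the very first step is a mistake.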
Theorem (Informal). For any (adversarial) $f \in \mathcal{F}$, the online control scheme $\mathcal{A}_\pi(\texttt{SEL})$ described in Algorithm 1 guarantees a priori that the trajectories $x, u$ will achieve the objective $\mathcal{G}$ after finitely many mistakes. The number of mistakes $\sum_{t=0}^{\infty} G_t(x_t, u_t)$ is at most

$$M^\pi_\rho \left( \frac{2\gamma\, \mathrm{diameter}(\mathcal{K})}{\rho} + 1 \right),$$

where $\rho$ is the oracle robustness margin, and the state $\|x_t\|$ is never larger than

$$\exp(\alpha_\pi \gamma\, \mathrm{diameter}(\mathcal{K}))\,(\|x_0\| + C_\pi),$$

where $M^\pi_\rho$ denotes the worst-case total mistakes of the $\rho$-robust oracle $\pi$ (under the true parameter), $\alpha_\pi, C_\pi$ are "robustness" constants of $\pi$, and $\gamma$ is the "competitive ratio" of SEL.
This approach brings several attractive qualities:

- Generality. The result applies to a wide range of problem settings. The objective $\mathcal{G}$ and uncertainty set $\mathcal{F}$ serve as a flexible abstraction to represent a large class of dynamical systems and control-theoretic performance objectives.

- Robust guarantees in the large uncertainty setting. Our result applies in settings where only rough models are available. As an example, we can use the result to provide guarantees in control settings with unstable uncertain nonlinear systems where stabilizing policies are not known a priori.

- Decoupling algorithm design for learning and control. The construction of the "robust oracle" $\pi$ and the consistent model chasing procedure SEL can be addressed with existing tools from control and learning. More generally, this perspective enables us to decouple, for the first time, learning and control problems into separate robust control and online learning problems.
2 Main result

As summarized in Algorithm 1, the main ingredients of our approach are a robust control oracle $\pi$ that returns a robust controller under posited system parameters, and an online algorithm SEL that chases parameter sets that are consistent with the data collected so far. Here we expand on the desired conditions of $\pi$ and SEL.
2.1 Required conditions on procedure $\pi$

Online control with oracle under idealized conditions. Generally, procedure $\pi$ is a map $\mathcal{K} \mapsto \mathcal{C}$ from the parameter space $\mathcal{K}$ to the space $\mathcal{C} := \{\kappa : \mathbb{N} \times \mathcal{X} \mapsto \mathcal{U}\}$ of all (non-stationary) control policies of the form $u_t = \kappa(t, x_t)$. A desired property of $\pi$ as an oracle is that $\pi$ returns controllers that satisfy $\mathcal{G}$ if the model uncertainty were small. In other words, if the uncertainty set $\mathcal{F}$ were contained in the set $T[\theta]$, then the control policy $\pi[\theta]$ could guarantee to achieve the objective $\mathcal{G}$ with finite mistake guarantees. Further, in an idealized setting where the true parameter were known exactly, the oracle should return a policy such that the system performance is robust to some level of bounded noise, which is a standard notion of robustness. We make this design specification more precise below and discuss how to instantiate the oracle in Section 3.
Idealized problem setting. Let $\theta$ be a parameter of the true dynamics $f$, and assume that online we have access to noisy observations $\hat{\theta} = (\hat{\theta}_0, \hat{\theta}_1, \ldots)$, where each measurement $\hat{\theta}_t$ is $\rho$-close to $\theta$ under the metric $d$. The online control algorithm queries $\pi$ at each time-step and applies the corresponding policy $\pi[\hat{\theta}_t]$. The resulting trajectories obey the equations:

$$x_{t+1} = f(t, x_t, u_t), \quad u_t = \pi[\hat{\theta}_t](t, x_t) \qquad (7a)$$
$$\hat{\theta}_t \text{ s.t. } d(\hat{\theta}_t, \theta) \le \rho, \text{ where } f \in T[\theta] \qquad (7b)$$
To facilitate later discussion, define the set of all feasible trajectories of the dynamic equations (7) as the nominal trajectories $S_I[\rho; \theta]$ of the oracle:

Definition 2.1. For a time interval $I = [t_1, t_2] \subset \mathbb{N}$ and fixed $\theta \in \mathcal{K}$, let $S_I[\rho; \theta]$ denote the set of all pairs of finite trajectories $x_I := (x_{t_1}, \ldots, x_{t_2})$, $u_I := (u_{t_1}, \ldots, u_{t_2})$ which, for the true parameter equal to $\theta$, satisfy the conditions (7) with some feasible $f$ and sequence $(\hat{\theta}_{t_1}, \ldots, \hat{\theta}_{t_2})$.
Design specification for oracles. We will say that $\pi$ is $\rho$-robust for some objective $\mathcal{G}$, if all trajectories in $S_I[\rho; \theta]$ achieve $\mathcal{G}$ after finitely many mistakes. We distinguish between robustness and uniform robustness, which we define precisely below:
Definition 2.2 (robust oracle). For each $\rho \ge 0$ and $\theta \in \mathcal{K}$, define the quantity $m^\pi_\rho(\theta)$ as

$$m^\pi_\rho(\theta) := \sup_{I = [t, t'] \,:\, t < t'} \;\; \sup_{(x_I, u_I) \in S_I[\rho; \theta]} \;\; \sum_{t \in I} G_t(x_t, u_t).$$

If $m^\pi_0(\theta) < \infty$ for all $\theta \in \mathcal{K}$, we call $\pi$ an oracle for $\mathcal{G}$ w.r.t. the parametrization $(T, \mathcal{K}, d)$. In addition, we say an oracle $\pi$ is (uniformly) $\rho$-robust, if the corresponding property below holds:

$\rho$-robust: $m^\pi_\rho(\theta) < \infty$ for all $\theta \in \mathcal{K}$
uniformly $\rho$-robust: $M^\pi_\rho := \sup_{\theta \in \mathcal{K}} m^\pi_\rho(\theta) < \infty$

If it exists, $M^\pi_\rho$ is the mistake constant of $\pi$.
$\pi$ is a robust oracle for $\mathcal{G}$ if the property in the definition above holds for some $\rho > 0$, and we refer to $\rho$ as the robustness margin. The mistake constant $M^\pi_\rho$ can be viewed as a robust offline benchmark: it quantifies how many mistakes we would make in the worst case if we could use the oracle $\pi$ under the idealized conditions described by (7).
2.2 Required conditions on procedure SEL

We now describe the required conditions for SEL, and will discuss specific strategies to design SEL in Section 4. Let $\mathbb{D}$ be the space of all tuples of time-indexed data points $d_i = (t_i, x^+_i, x_i, u_i)$:

$$\mathbb{D} := \{ (d_1, \ldots, d_N) \;|\; d_i \in \mathbb{N} \times \mathcal{X} \times \mathcal{X} \times \mathcal{U},\; N < \infty \}.$$

Generally, procedure SEL takes as input a data set $\mathcal{D} \in \mathbb{D}$ and outputs a parameter $\texttt{SEL}[\mathcal{D}] \in \mathcal{K}$.
Intuitively, given a data set $\mathcal{D} = (d_1, \ldots, d_N)$ of tuples $d_i = (t_i, x^+_i, x_i, u_i)$, any candidate $f \in \mathcal{F}$ which satisfies $x^+_i = f(t_i, x_i, u_i)$ for all $1 \le i \le N$ is consistent with $\mathcal{D}$. We extend this definition to parameters: $\theta \in \mathcal{K}$ is a consistent parameter for $\mathcal{D}$ if $T[\theta]$ contains at least one function $f$ which is consistent with $\mathcal{D}$. Correspondingly, we define, for some data $\mathcal{D}$, the set of all consistent parameters as $P(\mathcal{D})$:
Definition 2.3 (Consistent Sets). For all $\mathcal{D} \in \mathbb{D}$, define the set $P(\mathcal{D})$ as:

$$P(\mathcal{D}) := \mathrm{closure}\left(\{ \text{all } \theta \in \mathcal{K} \text{ such that (9) holds} \}\right), \qquad (8)$$
$$\exists f \in T(\theta) : \;\forall (t, x^+, x, u) \in \mathcal{D} : \; x^+ = f(t, x, u). \qquad (9)$$
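For intuition, consider the scalar model $x^+ = \theta x + u + w$, $|w| \le \eta$ (a special case of parametrization (4)). Each data point then constrains $\theta$ to an interval, and the consistent set $P(\mathcal{D})$ is an intersection of intervals, hence convex. A sketch (our own, with hypothetical numbers):

```python
def consistent_interval(D, K=(-3.0, 3.0), eta=0.1):
    """Consistent parameter set P(D) for the scalar model
    x+ = theta*x + u + w with |w| <= eta: each data point
    (x+, x, u) contributes one interval constraint on theta."""
    lo, hi = K
    for (x_next, x, u) in D:
        if x == 0:
            continue  # such a data point places no constraint on theta
        b1 = (x_next - u - eta) / x
        b2 = (x_next - u + eta) / x
        lo, hi = max(lo, min(b1, b2)), min(hi, max(b1, b2))
    return (lo, hi) if lo <= hi else None  # None: no consistent theta in K

# Two transitions generated by theta = 1.2 with disturbances inside [-eta, eta]:
D = [(1.25, 1.0, 0.0),   # realized w = +0.05
     (2.35, 2.0, 0.0)]   # realized w = -0.05
print(consistent_interval(D))
```

Here the two constraints intersect to roughly $[1.15,\ 1.225]$, which indeed contains the true $\theta = 1.2$.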
Chasing conditions for selection SEL. The design specification for the subroutine SEL is to output a consistent parameter $\theta = \texttt{SEL}[\mathcal{D}]$ for a given data set $\mathcal{D}$, provided such a parameter $\theta \in \mathcal{K}$ exists. In addition, we require that for a stream of data $\mathcal{D}_t = (d_1, \ldots, d_t)$ collected from the same system $f \in \mathcal{F}$, the sequence of parameters $\theta_t = \texttt{SEL}[\mathcal{D}_t]$ posited by SEL satisfies certain convergence properties.
Definition 2.4 (consistent model chasing). Let $\mathcal{D}_t = (d_1, \ldots, d_t)$ be a stream of data generated by:

$$x_{t+1} = f(t, x_t, u_t), \quad x_0 = \xi_0, \quad f \in \mathcal{F}, \qquad d_t = (t, x_t, x_{t-1}, u_{t-1}),$$

and let $\theta_t = \texttt{SEL}[\mathcal{D}_t]$ define a parameter sequence $\theta$ returned by SEL. We say that SEL is chasing consistent models if

$$\theta_t \in P(\mathcal{D}_t) \;\;\forall t \quad \text{and} \quad \lim_{t \to \infty} \theta_t \in P(\mathcal{D}_\infty)$$

holds regardless of the initial condition $\xi_0$, input sequence $u$ or $f \in \mathcal{F}$. We further say SEL is $\gamma$-competitive if

$$\sum_{t=1}^{\infty} d(\theta_t, \theta_{t-1}) \le \gamma \max_{\theta_0 \in \mathcal{K}} d(P(\mathcal{D}_\infty), \theta_0)$$

holds for a fixed constant $\gamma > 0$, which we call the competitive ratio. ($d(S, p) := \inf_{q \in S} d(q, p)$.)
2.3 Main theorem

Assuming for now that $\pi$ and SEL meet the required specifications, we can provide the overall guarantees for the algorithm. Let $(T, \mathcal{K}, d)$ be a compact parametrization of a given uncertainty set $\mathcal{F}$. Let $\pi$ be robust per Definition 2.2 and let SEL return consistent parameters per Definition 2.4. We apply the online control strategy $\mathcal{A}_\pi(\texttt{SEL})$ described in Algorithm 1 to the system $x_{t+1} = f(t, x_t, u_t)$ with unknown dynamics $f \in \mathcal{F}$, and denote $(x, u)$ as the corresponding state and input trajectories. The mistakes will be bounded as follows:
Theorem 2.5. Assume that SEL chases consistent models and that $\pi$ is an oracle for an objective $\mathcal{G}$. Then the following mistake guarantees hold:

(i) If $\pi$ is robust, then $(x, u)$ always satisfy $\sum_{t=0}^{\infty} G_t(x_t, u_t) < \infty$.

(ii) If $\pi$ is uniformly $\rho$-robust and SEL is $\gamma$-competitive, then $(x, u)$ obey the inequality:

$$\sum_{t=0}^{\infty} G_t(x_t, u_t) \le M^\pi_\rho \left( \frac{2\gamma}{\rho}\, \mathrm{diam}(\mathcal{K}) + 1 \right).$$
Theorem 2.6. If $\pi$ is $(\alpha, \beta)$-single step robust¹ and SEL is $\gamma$-competitive, then $\|x_t\|$ is always bounded by

$$\|x_t\| \le e^{\alpha\gamma\, \mathrm{diam}(\mathcal{K})} \left( e^{-t} \|x_0\| + \beta \frac{e}{e - 1} \right). \qquad (10)$$
Theorem 2.5 can be invoked on any learning and control method that instantiates $\mathcal{A}_\pi(\texttt{SEL})$. It offers a set of sufficient conditions to verify whether a learning agent $\mathcal{A}_\pi(\texttt{SEL})$ can provide mistake guarantees: we need to show that, w.r.t. some compact parametrization $(T, \mathcal{K}, d)$ of the uncertainty set $\mathcal{F}$, $\pi$ operates as a robust oracle for some objective $\mathcal{G}$, and that SEL satisfies strong enough chasing properties. Theorem 2.6 provides a general safety guarantee for $\mathcal{A}_\pi(\texttt{SEL})$ without requiring $\pi$ to be an oracle for any particular objective $\mathcal{G}$.

Theorem 2.5 also suggests a design philosophy of decoupling the learning and control problem into two separate problems while retaining the appropriate guarantees: (1) design a robust oracle $\pi$ for a specified control goal $\mathcal{G}$; and (2) design an online selection procedure SEL that satisfies the chasing properties defined in Definition 2.4. We discuss in Section 3 how addressing (1) is a pure robust control problem and briefly overview the main available methods. Designing procedures SEL with the properties stated in Definition 2.4 poses a new class of online learning problems. We propose in Section 4 a reduction of SEL to the well-known nested convex body chasing problem, which enables the design and analysis of competitive SEL procedures.

¹ See Appendix A.

Figure 1: Closed loop $\mathrm{CL}_\theta$: an idealized setting in which we have noisy measurements of the true $\theta$.
2.4 Extensions

In Appendix A we present the full version of the main result, which considers a broader definition of robust oracle and chasing properties (see Definitions A.2 and A.6).

Oracle policies with memory. The results assume that $\pi$ returns static policies of the type $(t, x) \mapsto u$. However, this assumption is only made for ease of exposition. The results of Theorem 2.5 also hold in the case where $\pi$ returns policies which have an internal state, as long as we can define the internal state to be shared among all oracle policies. That is, as part of the oracle implementation online, we update the state $z_t$ at each step $t$ according to some fixed update rule $h$:

$$z_t = h(t, z_{t-1}, x_t, u_t, \ldots, x_0, u_0),$$

and the control policies $\pi[\theta]$, $\theta \in \mathcal{K}$ are maps $(t, x, z) \mapsto u$ which we evaluate at time $t$ as $u_t = \pi[\theta](t, x_t, z_t)$.
3 Robust Control Oracle

Designing the robust oracles $\pi$ introduced in Definition 2.2 (extended version in A.2) can be mapped to well-studied problems in robust control theory. We use a simplified problem setting to explain this correspondence.

Consider a class of systems of the form $x_{t+1} = g(x_t, u_t; \theta) + w_t$, $\|w_t\| \le 1$, where $\theta$ is an unknown system parameter which lies in a known compact set $\mathcal{K} \subset \mathbb{R}^m$. We represent the uncertainty set as $\mathcal{F} = \bigcup_{\theta \in \mathcal{K}} T[\theta]$ with

$$T[\theta] := \{ f : (t, x, u) \mapsto g(x, u; \theta) + w_t \;|\; \|w\|_\infty \le 1 \}.$$
Let $\pi : \mathcal{K} \mapsto \mathcal{C}$ be a procedure which returns state-feedback policies $\pi[\theta] : \mathcal{X} \mapsto \mathcal{U}$ for a given $\theta \in \mathcal{K}$. Designing a uniformly $\rho$-robust oracle $\pi$ can be equivalently viewed as making the closed-loop system (described by (7)) of the idealized setting robust to disturbance and noise. For the considered example, the closed loop is depicted in Figure 1 and is represented by $\mathrm{CL}_\theta$, which maps system perturbations $(w, v)$ to the corresponding system trajectories $x, u$. We call $\pi$ a uniformly $\rho$-robust oracle if the cost-performance (measured as $\sum_{t=0}^{\infty} G_t(x_t, u_t)$) of the closed loop $\mathrm{CL}_\theta$ is robust to disturbances of size $1$ and noise of size $\rho$ for any $\theta \in \mathcal{K}$: for any noise $\|v\|_\infty \le \rho$ and disturbance $\|w\|_\infty \le 1$, the performance cost has to be bounded as

$$\sum_{t=0}^{\infty} G_t(x_t, u_t) \le M^\pi_\rho \quad \text{or} \quad \sum_{t=0}^{\infty} G_t(x_t, u_t) \le m^\pi_\rho(x_0),$$

for some fixed constant $M^\pi_\rho$ or fixed function $m^\pi_\rho$, in case we can only establish local properties. Now, if we identify the cost functions $G_t$ with their level sets $S_t := \{(x, u) \;|\; G_t(x, u) = 0\}$, we can phrase the former equivalently as a form of robust trajectory tracking problem or a set-point control problem (if $G_t$ is time-independent) (Khalil and Grizzle, 2002). It is common in control theory to provide guarantees in the form of convergence rates (finite-time or exponential convergence) on the tracking error; these guarantees can be directly mapped to $M^\pi_\rho$ and $m^\pi_\rho(\cdot)$.
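As a toy illustration of such an oracle (our own construction, not from the paper), take $g(x, u; \theta) = \theta x + u$ with scalar state. The policy $\pi[\hat\theta](x) = -\hat\theta x$ is uniformly $\rho$-robust for any $\rho < 1$: under parameter noise $|\hat\theta - \theta| \le \rho$ and disturbances $|w_t| \le 1$, the closed loop $x^+ = (\theta - \hat\theta)x + w$ contracts into the tube $|x| \le 1/(1-\rho)$, so an objective such as "$|x_t| \le 2$" is violated only finitely often:

```python
import numpy as np

def pi(theta_hat):
    """Oracle policy for g(x, u; theta) = theta*x + u: cancel the posited
    dynamics. With |theta - theta_hat| <= rho < 1 the closed loop is
    x+ = (theta - theta_hat)*x + w, a contraction toward |x| <= 1/(1-rho)."""
    return lambda x: -theta_hat * x

theta, rho = 2.0, 0.2
rng = np.random.default_rng(0)
x, mistakes = 50.0, 0
for t in range(200):
    theta_hat = theta + rng.uniform(-rho, rho)  # rho-close noisy parameter
    u = pi(theta_hat)(x)
    x = theta * x + u + rng.uniform(-1, 1)      # disturbance |w_t| <= 1
    mistakes += int(abs(x) > 2.0)               # G_t: mistake if |x_t| > 2
print(mistakes, x)
```

Even from the far-away initial state $x_0 = 50$, the worst-case recursion $|x^+| \le \rho|x| + 1$ implies at most a couple of initial mistakes before the state enters (and stays in) the target set.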
Available methods for robust oracle design. Many control methods exist for robust oracle design. Which method to use depends on the control objective $\mathcal{G}$, the specific application, and the system class (linear/nonlinear/hybrid, etc.). For a broad survey, see (Zhou et al., 1996; Spong and Vidyasagar, 1987; Spong, 1992a) and references therein. We characterize two general methodologies (which can also be combined):

- Robust stability analysis focus: In an initial step, we use analytical design principles from robust nonlinear and linear control design to propose an oracle $\pi[\theta](x)$ in closed form for all $\theta$ and $x$. In a second step, we prove robustness using analysis tools such as Input-to-State Stability (ISS) analysis (Jiang et al., 1999) or robust set invariance methods (Rakovic et al., 2006; Rakovic and Baric, 2010).

- Robust control synthesis: If the problem permits, we can also directly address the control design problem from a computational point of view, by formulating the design problem as an optimization problem and computing a control law with the desired guarantees directly. This can happen partially online, partially offline. Some common nonlinear approaches are robust (tube-based) MPC (Mayne et al., 2011; Borrelli et al., 2017), SOS methods (Parrilo, 2000; Aylward et al., 2008), and Hamilton-Jacobi reachability methods (Bansal et al., 2017).

There are different advantages and disadvantages to both approaches, and it is important to point out that robust control problems are not always tractably solvable. See (Blondel and Tsitsiklis, 2000; Braatz et al., 1994) for simple examples of robust control problems which are NP-hard. The computational complexity of robust controller synthesis tends to increase (or even become infeasible) with the complexity of the system of interest; it also further increases as we try to optimize for larger robustness margins $\rho$.
The dual purpose of the oracle. In our framework, access to a robust oracle is a necessary prerequisite to design learning and control agents $\mathcal{A}_\pi(\texttt{SEL})$ with mistake guarantees. However, this is a mild assumption and is often more enabling than restrictive. First, it represents a natural way to ensure well-posedness of the overall learning and control problem: if robust oracles cannot be found for an objective, then the overall problem is likely intrinsically hard or ill-posed (for example, necessary fundamental system properties like stabilizability/detectability are not satisfied).

Second, the oracle abstraction enables a modular approach to robust learning and control problems, and directly leverages existing powerful methods in robust control: any model-based design procedure $\pi$ which works well in the small uncertainty setting (i.e., acts as a robust oracle) can be augmented with an online chasing algorithm SEL (with the required chasing properties) to provide robust control performance (in the form of mistake guarantees) in the large uncertainty setting via the augmented algorithm $\mathcal{A}_\pi(\texttt{SEL})$.
4 Chasing consistent models

To provide mistake guarantees for the algorithm $\mathcal{A}_\pi(\texttt{SEL})$, the procedure SEL must satisfy the "chasing property" as defined in Definition 2.4.
4.1 Chasing consistent models competitively

Assume that at each timestep $1 \le k \le T$, we are given a data point $d_k = (t_k, x^+_k, x_k, u_k)$, where $t_k \in \mathbb{N}$, $x^+_k, x_k \in \mathcal{X}$, $u_k \in \mathcal{U}$, and assume that at time $t = 0$ it is known that each data point $d_k$ will satisfy the equation $x^+_k = f(t_k, x_k, u_k)$ for some fixed but unknown dynamics $f \in \mathcal{F}$. Denote $\mathcal{D}_k := (d_1, \ldots, d_k)$ as the tuple of observations collected until time $k$. Given a compact parametrization $(T, \mathcal{K}, d)$ of $\mathcal{F}$, we are looking for a chasing procedure SEL which, given a stream of online data $\mathcal{D}_t = (d_1, \ldots, d_t)$, finds as quickly as possible a parameter $\theta$ consistent with $\mathcal{D}_t$.²
We quantify performance using competitiveness, which is a common performance objective in the design of online learning algorithms (Koutsoupias and Papadimitriou, 2000; Borodin and El-Yaniv, 2005; Chen et al., 2018; Goel and Wierman, 2019; Shi et al., 2020; Yu et al., 2020). Starting with some initial $\theta_0$, a sequence of parameters $\theta_1, \ldots, \theta_T$ returned by SEL will be evaluated by its total moving cost $\sum_{j=1}^{T} d(\theta_j, \theta_{j-1})$. For each time $T$, we benchmark against the best parameter selection $\theta^*_1, \ldots, \theta^*_T$ in hindsight, that is, the sequence with the smallest moving cost assuming we know the set $P(\mathcal{D}_T)$. It is clear that the best possible choice for $k \ge 1$ is to choose $\theta^*_k$ as some constant parameter: $\theta^* \in \arg\min_{\theta \in P(\mathcal{D}_T)} d(\theta, \theta_0)$. We denote the corresponding optimal offline cost as $\mathrm{OPT}(\mathcal{D}_T; \theta_0) = \min_{\theta \in P(\mathcal{D}_T)} d(\theta, \theta_0)$ and evaluate it at the worst possible starting condition $\theta_0$ to define the offline benchmark $\mathrm{OPT}(\mathcal{D}_T) = \max_{\theta_0 \in P(\mathcal{D}_0)} \mathrm{OPT}(\mathcal{D}_T; \theta_0)$.

² Notice that if we know that the data $\mathcal{D}_T$ is collected from a single trajectory, we can add the conditions $t_{k+1} = t_k + 1$ and $x_{k+1} = x^+_k$ as known constraints.
Definition 4.1 (Competitive consistent model chasing). Let $J_T(\mathcal{D}_T; \theta_0)$ denote the total moving cost of the online algorithm for a data sequence $\mathcal{D}_T$ and initial selection $\theta_0$. We call an algorithm $\gamma$-competitive for consistent model chasing in the parametrization $(T, \mathcal{K}, d)$ if, for any data sequence $\mathcal{D}_T$, sequence length $T \ge 0$ and $\theta_0 \in \mathcal{K}$, the cost $J_T(\mathcal{D}_T; \theta_0)$ is bounded as:

$$J_T(\mathcal{D}_T; \theta_0) \le \gamma\, \mathrm{OPT}(\mathcal{D}_T). \qquad (11)$$
4.2 Reduction to chasing convex bodies

The main difficulty in selecting the parameters $\theta_t$ to solve CMC competitively is that, for any time $t < T$, we cannot guarantee to select a parameter $\theta_t$ which is guaranteed to lie in the future consistent set $P(\mathcal{D}_T)$. However, a sequence of consistent sets is always nested, i.e., $\mathcal{K} \supset P(\mathcal{D}_1) \supset \cdots \supset P(\mathcal{D}_T)$. This inspires a competitive procedure for the CMC problem, through a reduction to a known online learning problem.
Under the following Assumption 4.2 we can reduce the CMC problem to the well-known problem of nested convex body chasing (NCBC) (Bubeck et al., 2020).

Assumption 4.2. Given a compact parametrization $(T, \mathcal{K}, d)$ of the uncertainty set $\mathcal{F}$, the consistent sets $P(\mathcal{D})$ are convex for any data set $\mathcal{D} \in \mathbb{D}$.

This assumption is valid, for instance, for the general class of robotic manipulation problems (Section 5).
Nested convex body chasing (NCBC). In NCBC, we have access to an online nested sequence $S_0 \supset S_1 \supset \cdots \supset S_T$ of convex sets in some metric space $(\mathcal{M}, d)$ (i.e., $S_t \subset S_{t-1}$). The learner selects at each time $t$ a point $p_t$ from $S_t$. The goal in competitive NCBC is to produce $p_1, \ldots, p_T$ online such that the total moving cost $\sum_{j=1}^{T} d(p_j, p_{j-1})$ at time $T$ is competitive with the offline optimum, i.e., there is some $\gamma > 0$ s.t. $\sum_{j=1}^{T} d(p_j, p_{j-1}) \le \gamma\, \mathrm{OPT}_T$, where $\mathrm{OPT}_T := \max_{p_0 \in S_0} \min_{p \in S_T} d(p, p_0)$.
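A minimal sketch (ours) of greedy projection on nested sets in one dimension: because the sets are nested, the greedy iterate moves monotonically, so its total movement equals the offline optimum $d(p_0, S_T)$ from that starting point (greedy is 1-competitive in this special 1-D case, though not in general):

```python
def project(p, interval):
    lo, hi = interval
    return min(max(p, lo), hi)

def greedy_chase(p0, nested_intervals):
    """Greedy 1-D nested convex body chasing: move to the closest point
    of each newly revealed set and accumulate the moving cost."""
    p, moved = p0, 0.0
    for S in nested_intervals:
        q = project(p, S)
        moved += abs(q - p)
        p = q
    return p, moved

# Nested sets S_1 ⊃ S_2 ⊃ S_3 shrinking toward [2, 3].
sets = [(-1.0, 8.0), (1.0, 5.0), (2.0, 3.0)]
p, cost = greedy_chase(0.0, sets)
print(p, cost)  # ends at 2.0 with total movement 2.0 = d(0, [2, 3])
```

In higher dimensions greedy projection is no longer competitive in general, which is what motivates the more careful algorithms cited below.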
Remark 1. NCBC is a special case of the more general convex body chasing (CBC) problem, first introduced by Friedman and Linial (1993), which studied competitive algorithms for metrical goal systems.

Let the sequence of convex consistent sets $P(\mathcal{D}_t)$ be the corresponding $S_t$ of the NCBC problem; any $\gamma$-competitive agent $\mathcal{A}$ for the NCBC problem can instantiate a $\gamma$-competitive selection for competitive model chasing, as summarized in the following reduction:
Algorithm 2 $\gamma$-competitive CMC selection $\mathrm{SEL}_{\mathrm{NCBC}}$
Require: $\gamma$-competitive NCBC algorithm $\mathcal{A}_{\mathrm{NCBC}}$, consistent set map $\mathcal{P} : \mathbb{D} \mapsto 2^{\mathcal{K}}$
1: procedure $\mathrm{SEL}_{\mathrm{NCBC}}(t, x^{+}, x, u)$
2:   $\mathcal{D}_t \leftarrow \mathcal{D}_{t-1} \cup (t, x^{+}, x, u)$
3:   $S_t \leftarrow \mathcal{P}(\mathcal{D}_t)$ ▷ construct/update new consistent set
4:   present set $S_t$ to $\mathcal{A}_{\mathrm{NCBC}}$
5:   $\mathcal{A}_{\mathrm{NCBC}}$ chooses $\theta_t \in S_t$
6:   return $\theta_t$
7: end procedure
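Algorithm 2 can be sketched in a few lines of Python. This is an illustrative stand-in, not the paper's implementation: the consistent set is represented by a finite candidate list, and a greedy nearest-point rule plays the role of the $\gamma$-competitive NCBC algorithm $\mathcal{A}_{\mathrm{NCBC}}$ (the class and argument names are ours):

```python
import numpy as np

class SelNCBC:
    """Sketch of Algorithm 2: chase consistent models by delegating the
    selection to an NCBC-style chooser over the consistent sets S_t."""

    def __init__(self, candidates, consistent, theta0):
        self.S = list(candidates)     # current consistent set S_t (finite stand-in)
        self.consistent = consistent  # predicate: consistent(theta, datum) -> bool
        self.theta = theta0

    def select(self, t, x_plus, x, u):
        datum = (t, x_plus, x, u)
        # Steps 2-3: fold the new observation into the consistent set
        self.S = [th for th in self.S if self.consistent(th, datum)]
        # Steps 4-5: NCBC chooser; moving greedily to the nearest surviving
        # candidate keeps the total moving cost small
        if not any(np.allclose(th, self.theta) for th in self.S):
            self.theta = min(self.S, key=lambda th: np.linalg.norm(
                np.asarray(th) - np.asarray(self.theta)))
        return self.theta
```

With exact convex consistent sets, step 5 would instead invoke a projection or Steiner-point oracle, as discussed next.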
Proposition 4.3. Consider the setting of Assumption 4.2. Then any $\gamma$-competitive algorithm for NCBC in the metric space $(\mathcal{K}, d)$ instantiates via Algorithm 2 a $\gamma$-competitive CMC procedure $\mathrm{SEL}_{\mathrm{NCBC}}$ for the parametrization $(\mathcal{T}, \mathcal{K}, d)$.
Simple competitive NCBC algorithms in Euclidean space $\mathbb{R}^n$. When $(\mathcal{K}, d)$ is a compact subset of finite-dimensional Euclidean space, recent exciting progress on the NCBC problem provides a variety of competitive algorithms (Argue et al., 2019, 2020; Bubeck et al., 2020; Sellke, 2020) that can instantiate competitive selections per Algorithm 2.
We highlight two simple instantiations based on results in (Argue et al., 2019) and (Bubeck et al., 2020). Both algorithms can be tractably implemented in the setting of Assumption 4.2. The selection criteria for $\mathrm{SEL}_p(\mathcal{D}_t)$ and $\mathrm{SEL}_s(\mathcal{D}_t)$ are defined as:
$$\mathrm{SEL}_p(\mathcal{D}_t) := \arg\min_{\theta \in \mathcal{P}(\mathcal{D}_t)} \left\| \theta - \mathrm{SEL}_p(\mathcal{D}_{t-1}) \right\|, \tag{12a}$$
$$\mathrm{SEL}_s(\mathcal{D}_t) := s(\mathcal{P}(\mathcal{D}_t)), \tag{12b}$$
where $\mathrm{SEL}_p$ simply defines a greedy projection operator and $\mathrm{SEL}_s$ selects the Steiner point $s(\mathcal{P}(\mathcal{D}_t))$ of the consistent set $\mathcal{P}(\mathcal{D}_t)$ at time $t$.
Definition 4.4 (Steiner Point). For a convex body $K$, the Steiner point is defined as the following integral over the $(n-1)$-dimensional sphere $S^{n-1}$:
$$s(K) = n \int_{v \in S^{n-1}} \max_{x \in K} \langle v, x \rangle\, v \, dv. \tag{13}$$
Remark 2. As shown in Bubeck et al. (2020), the Steiner point can be approximated efficiently by solving randomized linear programs. We take this approach for our later empirical validation in Section 6.
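To make this concrete, here is a minimal Monte Carlo sketch of our own (not the scheme from Bubeck et al. (2020)): using the identity $s(K) = \mathbb{E}_v[\arg\max_{x \in K} \langle v, x \rangle]$ for $v$ uniform on the sphere, each sample is a random linear program over $K$. For brevity we take $K$ in vertex representation, where each such LP reduces to a maximum over vertices; with a halfspace description $A\theta \leq b$ one would call an LP solver instead:

```python
import numpy as np

def steiner_point(vertices, n_samples=4000, rng=None):
    """Monte Carlo Steiner point: average the maximizers of random linear
    objectives over the polytope K, given here by its vertices."""
    rng = np.random.default_rng(rng)
    V = np.asarray(vertices, dtype=float)        # shape (num_vertices, n)
    dirs = rng.standard_normal((n_samples, V.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # uniform directions
    # argmax of a linear function over a polytope is attained at a vertex
    best = V[np.argmax(dirs @ V.T, axis=1)]
    return best.mean(axis=0)
```

Applied to a square, the estimate concentrates on the center, as symmetry dictates; for the consistent sets $\mathcal{P}(\mathcal{D}_t)$ below, the same scheme runs with one LP per sampled direction.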
The competitive analysis presented in (Bubeck et al., 2020) can easily be adapted to establish that $\mathrm{SEL}_p$ and $\mathrm{SEL}_s$ are competitive for the CMC problem:
Corollary 4.5 (of Theorem 1.3 (Argue et al., 2019) and Theorem 2.1 (Bubeck et al., 2020)). Assume $\mathcal{K}$ is a compact convex set in $\mathbb{R}^n$ and $d(x, y) := \|x - y\|_2$. Then, the procedures $\mathrm{SEL}_p$ and $\mathrm{SEL}_s$ are competitive (CMC)-algorithms with constants $\gamma_p$ and $\gamma_s$:
$$\gamma_p = (n-1)\, n^{\frac{n+1}{2}}, \qquad \gamma_s = \frac{n}{2}. \tag{14}$$
4.3 Constructing consistent sets online

Constructing the consistent sets $\mathcal{P}(\mathcal{D}_t)$ online can be addressed with tools from set-membership identification. For a large collection of linear and nonlinear systems, the sets $\mathcal{P}(\mathcal{D})$ can be constructed efficiently online. Such methods have been developed and studied in the literature on set-membership identification; for a recent survey see (Milanese et al., 2013). Moreover, it is often possible to construct $\mathcal{P}(\mathcal{D})$ as an intersection of finitely many half-spaces, allowing for tractable representations as LPs. To see a particularly simple example, consider the following nonlinear system with some unknown parameters $\alpha \in \mathbb{R}^M$ and $\eta$, where $w_t$ is a vector with entries in the interval $[-\eta, \eta]$:
$$x_{t+1} = \sum_{i=1}^{M} \alpha_i\, \psi_i(x_t, u_t) + w_t, \tag{15}$$
where $\psi_i : \mathcal{X} \times \mathcal{U} \mapsto \mathcal{X}$ are $M$ known nonlinear functions. If we represent the above system as an uncertain system $\mathcal{T}$ with parameter $\theta = [\alpha; \eta]$, it is easy to see that the consistent set $\mathcal{P}(\mathcal{D})$ for some data $\mathcal{D} = \{(x_i^{+}, x_i, u_i) \mid 1 \leq i \leq H\}$ of $H$ observations takes the form of a polyhedron:
$$\mathcal{P}(\mathcal{D}) = \{\theta = [\alpha; \eta] \mid \text{(16) holds for all } 1 \leq i \leq H\},$$
defined by the inequalities:
$$[\psi_1(x_i, u_i), \ldots, \psi_M(x_i, u_i)]\, \alpha \leq x_i^{+} + \mathbf{1}\eta, \tag{16a}$$
$$[\psi_1(x_i, u_i), \ldots, \psi_M(x_i, u_i)]\, \alpha \geq x_i^{+} - \mathbf{1}\eta. \tag{16b}$$
We can see that any linear discrete-time system can be put into the above form (15). Moreover, as shown in Section 5, the above representation also applies to a large class of (nonlinear) robotic systems.
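As a concrete sketch (function names are ours), the constraints (16) for the scalar-state case of (15) can be stacked into a pair $(A, b)$ describing $\mathcal{P}(\mathcal{D}) = \{\theta : A\theta \leq b\}$ with $\theta = [\alpha; \eta]$:

```python
import numpy as np

def consistent_polyhedron(data, psis):
    """Stack the halfspace constraints (16) on theta = [alpha; eta] from data
    D = [(x_plus, x, u), ...] for the scalar model
    x+ = sum_i alpha_i psi_i(x, u) + w with |w| <= eta.
    Returns (A, b) such that P(D) = {theta : A @ theta <= b}."""
    rows, rhs = [], []
    for x_plus, x, u in data:
        phi = np.array([psi(x, u) for psi in psis], dtype=float)
        # phi @ alpha - eta <= x_plus   and   -phi @ alpha - eta <= -x_plus
        rows.append(np.append(phi, -1.0)); rhs.append(x_plus)
        rows.append(np.append(-phi, -1.0)); rhs.append(-x_plus)
    return np.array(rows), np.array(rhs)

def in_consistent_set(theta, A, b, tol=1e-9):
    return bool(np.all(A @ np.asarray(theta, dtype=float) <= b + tol))
```

Each observation contributes two halfspaces, so membership queries, projections, and Steiner-point samples over $\mathcal{P}(\mathcal{D})$ all reduce to LPs over $(A, b)$.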
5 Examples

Here we demonstrate two illustrative examples of how to instantiate our approach: learning to stabilize a scalar linear system, and learning to track a trajectory on a fully actuated robotic system. Detailed derivations are provided in Appendix E.
5.1 Control of an uncertain scalar linear system

Consider the basic setting of controlling a scalar linear system with unknown parameters and bounded disturbance $|w_k| \leq \eta < 1$:
$$x_{k+1} = \alpha x_k + \beta u_k + w_k =: f(k, x_k, u_k),$$
with the goal to reach the interval $X_T = [-1, 1]$ and remain there. Equivalently, this goal can be expressed as the objective $\mathcal{G} = (\mathcal{G}_0, \mathcal{G}_1, \ldots)$ with cost functions:
$$\mathcal{G}_t(x, u) := \begin{cases} 0, & \text{if } |x| \leq 1 \\ 1, & \text{else} \end{cases}, \qquad \forall\, t \geq 0,$$
since "reaching and remaining in $X_T$" is equivalent to achieving $\mathcal{G}$ with finitely many mistakes. The true parameter $\theta = (\alpha, \beta)$ lies in the set $\mathcal{K} = [-a, a] \times [1, 1+2b]$ with known $a > 0$, $\eta < 1$ and $b > 0$.
Parametrization of uncertainty set. We describe the uncertainty set $\mathcal{F} = \bigcup_{\theta \in \mathcal{K}} \mathcal{T}[\theta]$ through the compact parametrization $(\mathcal{T}, \mathcal{K}, d)$ with parameter space $(\mathcal{K}, d)$, $d(x, y) := \|x - y\|$, $\|x\| := |x_1| + a|x_2|$, and the collection of models:
$$\mathcal{T}[\theta] := \{(t, x, u) \mapsto \theta_1 x + \theta_2 u + w_t \mid \|w\|_\infty \leq \eta\}.$$
Robust oracle $\pi$. We use the simple deadbeat controller $\pi[\theta](t, x) := -(\theta_1 / \theta_2)\, x$. It can be easily shown that $\pi$ is a locally $\rho$-uniformly robust oracle for $\mathcal{G}$ for any margin in the interval $(0, 1-\eta)$.
Construction of consistent sets via LPs. The consistent sets $\mathcal{P}(\mathcal{D}_t)$ are convex, can be constructed online, and are the intersection of $\mathcal{K}$ with $2t$ halfspaces:
$$\mathcal{P}(\mathcal{D}_t) = \{\theta \in \mathcal{K} \mid \forall\, i < t : |\theta_1 x_i + \theta_2 u_i - x_{i+1}| \leq \eta\}.$$
Competitive $\mathrm{SEL}_s$ via NCBC. We instantiate $\mathrm{SEL}_s$ using Algorithm 2 with Steiner-point selection. From Corollary 4.5, we have $\gamma_s = \frac{n}{2} = 1$.
Mistake guarantee for $\mathcal{A}_\pi(\mathrm{SEL}_s)$. The extension of the results in Theorem A.12, (ii) applies, and we obtain for $\mathcal{A}_\pi(\mathrm{SEL}_s)$ and the stabilization objective $\mathcal{G}$ the following mistake guarantee:
$$\sum_{t=0}^{\infty} \mathcal{G}_t(x_t, u_t) \leq 2\varepsilon^2 + \varepsilon, \qquad \varepsilon = 2(a+b).$$
The above inequality shows that the worst-case total number of mistakes grows quadratically with the size of the initial uncertainty $\varepsilon$ in the system parameters. Notice, however, that the above inequality holds for arbitrarily large choices of $a$ and $b$. Thus, $\mathcal{A}_\pi(\mathrm{SEL}_s)$ gives finite mistake guarantees for this problem setting for arbitrarily large system parameter uncertainty.
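The full online loop for this scalar example can also be sketched numerically. The sketch below is our own toy approximation: it replaces the exact convex consistent sets and the Steiner-point selection by a finite parameter grid with greedy nearest-point reselection (a discrete $\mathrm{SEL}_p$). The true parameter is never eliminated, and every step with $|x_{k+1}| > \eta$ permanently removes at least the currently selected candidate (whose one-step prediction under the deadbeat law is exactly zero):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, eta = 2.0, 1.5, 0.1       # true (unknown) parameters and noise bound
# Finite grid over K = [-3, 3] x [1, 3], a stand-in for the convex
# consistent sets P(D_t); the paper chases the exact sets instead.
grid = [(a, b) for a in np.arange(-3.0, 3.01, 0.5)
               for b in np.arange(1.0, 3.01, 0.5)]
theta = grid[0]                         # initial selection theta_0
x, mistakes = 10.0, 0
for k in range(200):
    u = -(theta[0] / theta[1]) * x      # deadbeat oracle pi[theta]
    x_next = alpha * x + beta * u + rng.uniform(-eta, eta)
    mistakes += abs(x_next) > 1.0       # cost G_t(x_t, u_t)
    # keep only parameters consistent with the new observation
    # (small relative slack guards against floating-point rounding)
    grid = [th for th in grid
            if abs(th[0] * x + th[1] * u - x_next) <= eta + 1e-9 * abs(x_next)]
    if theta not in grid:               # discrete SEL_p: move to nearest survivor
        theta = min(grid, key=lambda th: (th[0] - theta[0])**2
                                        + (th[1] - theta[1])**2)
    x = x_next
```

Since each mistake eliminates a candidate, the total number of mistakes in this sketch is bounded by the grid size, mirroring the finite-mistake flavor of the guarantee above.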
5.2 Trajectory following in robotic systems

We consider uncertain fully-actuated robotic systems. A vast majority of robotic systems can be modeled via the robotic equation of motion (Murray, 2017):
$$M_\eta(q)\ddot{q} + C_\eta(q, \dot{q})\dot{q} + N_\eta(q, \dot{q}) = \tau + \tau_d, \tag{17}$$
where $q \in \mathbb{R}^n$ is the vector of generalized coordinates of the system, $\dot{q}$ and $\ddot{q}$ are its first and second (continuous) time derivatives, and $M_\eta(q)$, $C_\eta(q, \dot{q})$, $N_\eta(q, \dot{q})$ are matrix- and vector-valued functions that depend on the parameters $\eta \in \mathbb{R}^m$ of the robotic system, i.e., $\eta$ comes from a parametric physical model. Often, $\tau$ is the control action (e.g., torques and forces of actuators), which acts as the input of the system. Disturbances and other uncertainties present in the system can be modeled as additional torques $\tau_d \in \mathbb{R}^n$ perturbing the equations. The disturbances are bounded as $|\tau_d(t)| \leq \omega$, where $\omega \in \mathbb{R}^n$ and the inequality is entry-wise.
Consider a system with unknown $\eta$, $\omega$, where the parameter $\theta = [\eta; \omega]$ is known to be contained in a bounded set $\mathcal{K}$. Our goal is to track a desired trajectory $q_d$, given as a function of time $q_d : \mathbb{R} \mapsto \mathbb{R}^n$, within $\epsilon$ precision: denoting $x = [q^\top, \dot{q}^\top]^\top$ as the state vector and $x_d = [q_d^\top, \dot{q}_d^\top]^\top$ as the desired state, we want the system state trajectory $x(t)$ to satisfy:
$$\limsup_{t \to \infty} \|x(t) - x_d(t)\| \leq \epsilon. \tag{18}$$
As is common in practice, we assume we can observe the sampled measurements $x_k := x(t_k)$, $x_k^d := x_d(t_k)$ and apply a constant control action (zero-order-hold actuation) $\tau_k := \tau(t_k)$ at the discrete time-steps $t_k = kT_s$, with a small enough sampling time $T_s$ to allow for continuous-time control design and analysis.
Control objective $\mathcal{G}^{\epsilon}$. We phrase trajectory tracking as a control objective $\mathcal{G}^{\epsilon}$ with the cost functions:
$$\mathcal{G}^{\epsilon}_k(x, u) := \begin{cases} 0, & \text{if } \|x - x_k^d\| \leq \epsilon \\ 1, & \text{else} \end{cases}, \qquad \forall\, k \geq 0.$$
Robust oracle design. We outline in Appendix G how to design a control oracle using established methods for robotic manipulators (Spong, 1992b).
Constructing consistent sets. For many robotic systems (for example, robot manipulators), one can derive from first principles (Murray, 2017) that the left-hand side of (17) can be factored into an $n \times m$ matrix of known functions $Y(q, \dot{q}, \ddot{q})$ and a constant vector $\eta \in \mathbb{R}^m$. We can then construct the consistent sets at each time $t$ as polyhedra of the form:
$$\mathcal{P}(\mathcal{D}_t) = \{\theta \in \mathcal{K} \subset \mathbb{R}^{m+n} \mid \forall\, k \leq t : A_k \theta \leq b_k\}, \tag{19}$$
(19)
where $A_k = A_k(x_k, \tau_k)$ and $b_k = b_k(x_k, \tau_k)$ are a matrix and a vector of "features" constructed from $u_k = \tau_k$ and $x_k$ via the known functional form of $Y$.
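For instance, for a hypothetical 1-DOF pendulum with dynamics $m\ell^2 \ddot{q} + b\dot{q} + mg\ell \sin q = \tau + \tau_d$ (our illustrative example, with $\eta = [m\ell^2;\, b;\, mg\ell]$ and scalar disturbance bound $\omega$), one such feature pair $(A_k, b_k)$ per sample can be built as:

```python
import numpy as np

def pendulum_features(q, qd, qdd, tau):
    """Feature pair for a 1-DOF pendulum, written as Y(q, qd, qdd) @ eta
    = tau + tau_d with eta = [m*l^2, b, m*g*l] and |tau_d| <= omega.
    Returns (A_k, b_k) encoding |Y @ eta - tau| <= omega as the two
    halfspaces A_k @ theta <= b_k on theta = [eta; omega]."""
    Y = np.array([qdd, qd, np.sin(q)])
    #  Y @ eta - omega <= tau    and    -Y @ eta - omega <= -tau
    A_k = np.vstack([np.append(Y, -1.0), np.append(-Y, -1.0)])
    b_k = np.array([tau, -tau])
    return A_k, b_k
```

Stacking such pairs over the observed samples yields the polyhedron (19), again amenable to LP-based selection.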
Designing $\pi$ and $\mathrm{SEL}$. We outline in Appendix E how to design a robust oracle based on a well-established robust control method for robotic manipulators proposed in (Spong, 1992b). Since the consistent sets are convex and can be constructed online, we can implement the procedures $\mathrm{SEL}_p$ and $\mathrm{SEL}_s$ defined in (12) as competitive algorithms for the CMC problem.
Mistake guarantee for $\mathcal{A}_\pi(\mathrm{SEL}_{p/s})$. The resulting online algorithm $\mathcal{A}_\pi(\mathrm{SEL}_p)$ or $\mathcal{A}_\pi(\mathrm{SEL}_s)$ guarantees finiteness of the total number of mistakes $\sum_{k=0}^{\infty} \mathcal{G}^{\epsilon}_k(x_k, u_k)$, which implies the desired tracking behavior $\limsup_{k \to \infty} \|x_k - x_k^d\| \leq \epsilon$. Moreover, if we can provide a bound $M$ on the mistake constant $M^{\pi}_{\rho} < M$, we obtain from Theorem 2.5, (ii) the mistake guarantee:
$$\sum_{k=0}^{\infty} \mathcal{G}^{\epsilon}_k(x_k, u_k) \leq M\left(\frac{2L}{\rho} + 1\right),$$
which bounds the number of times the system could have a tracking error larger than $\epsilon$.
$T$                              3s     6s     12s    30s    50s
$\pi[\theta]$                    0      0.4    0.99   1      1
$\mathcal{A}_\pi(\mathrm{SEL})$  0      0.2    0.8    0.95   1

Table 1: Fraction of experiments completing the swing-up before time $T$: ideal policy $\pi[\theta]$ vs. $\mathcal{A}_\pi(\mathrm{SEL})$
6 Empirical Validation

We illustrate the practical potential of our approach on a challenging cart-pole swing-up goal with a limited amount of interaction. Compared to the standard cart-pole domain that is commonly used in RL (Brockman et al., 2016a), we introduce modifications motivated by real-world concerns in several important ways:
1. Goal specification: the goal is to swing up and balance the cart-pole from the down position, which is significantly harder than balancing from the upright position (the standard RL benchmark).
2. Realistic dynamics: we use a high-fidelity continuous-time nonlinear model, with noisy measurements of discrete-time state observations.
3. Safety: the cart position has to be kept in a bounded interval for all time. In addition, the acceleration should not exceed a specified maximum limit.
4. Robustness to structured adversarial disturbance: we evaluate on 900 uncertainty settings, each with a different $\theta$ reflecting mass, length, and friction. The tuning parameters remain the same for all experiments. This robustness requirement amounts to a generalization goal in contemporary RL.
5. Other constraints: no system reset is allowed during learning (i.e., a truly continual goal).
Our introduced modifications make this goal significantly more challenging from both online learning and adaptive control perspectives. Table 2 summarizes the results over 900 different parameter conditions (corresponding to 900 adversarial settings). See Appendix G for additional description of our setup and results.
We employ well-established techniques to synthesize model-based oracles. The expert controllers are a hybrid combination of a linear state-feedback LQR around the upright position, a so-called energy-based swing-up controller (see (Åström and Furuta, 2000)), and a control barrier function to satisfy the safety constraints. As also described in (Dulac-Arnold et al., 2019), adding constraints on state and acceleration makes learning the swing-up of the cart-pole a significantly harder goal for state-of-the-art learning and control algorithms.
Table 2 compares the online algorithm to the corresponding ideal oracle policy $\pi[\theta]$ and shows that the online controller is only marginally slower.
References

Yasin Abbasi-Yadkori and Csaba Szepesvári. Regret bounds for the adaptive control of linear quadratic systems. In Proceedings of the 24th Annual Conference on Learning Theory, pages 1–26, 2011.
Yasin Abbasi-Yadkori, Nevena Lazic, and Csaba Szepesvári. Regret bounds for model-free linear quadratic control. arXiv preprint arXiv:1804.06021, 2018.
Naman Agarwal, Brian Bullins, Elad Hazan, Sham M Kakade, and Karan Singh. Online control with adversarial disturbances. arXiv preprint arXiv:1902.08721, 2019a.
Naman Agarwal, Elad Hazan, and Karan Singh. Logarithmic regret for online control. In Advances in Neural Information Processing Systems, pages 10175–10184, 2019b.
A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, 62(8):3861–3876, Aug 2017. ISSN 1558-2523. doi: 10.1109/TAC.2016.2638961.
Brian DO Anderson, Thomas Brinsmead, Daniel Liberzon, and A Stephen Morse. Multiple model adaptive control with safe switching. International Journal of Adaptive Control and Signal Processing, 15(5):445–470, 2001.
CJ Argue, Sébastien Bubeck, Michael B Cohen, Anupam Gupta, and Yin Tat Lee. A nearly-linear bound for chasing nested convex bodies. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 117–122. SIAM, 2019.
CJ Argue, Anupam Gupta, Guru Guruganesh, and Ziye Tang. Chasing convex bodies with linear competitive ratio. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1519–1524. SIAM, 2020.
Karl Johan Åström and Katsuhisa Furuta. Swinging up a pendulum by energy control. Automatica, 36(2):287–295, 2000.
Erin M Aylward, Pablo A Parrilo, and Jean-Jacques E Slotine. Stability and robustness analysis of nonlinear systems via contraction metrics and SOS programming. Automatica, 44(8):2163–2170, 2008.
Somil Bansal, Mo Chen, Sylvia Herbert, and Claire J Tomlin. Hamilton-Jacobi reachability: A brief overview and recent advances. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 2242–2253. IEEE, 2017.
Franco Blanchini. Set invariance in control. Automatica, 35(11):1747–1767, 1999.
Vincent D Blondel and John N Tsitsiklis. A survey of computational complexity results in systems and control. Automatica, 36(9):1249–1274, 2000.
Nicholas M Boffi, Stephen Tu, and Jean-Jacques E Slotine. Regret bounds for adaptive nonlinear control. arXiv preprint arXiv:2011.13101, 2020.
Allan Borodin and Ran El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 2005.
Francesco Borrelli, Alberto Bemporad, and Manfred Morari. Predictive Control for Linear and Hybrid Systems. Cambridge University Press, 2017.
R. P. Braatz, P. M. Young, J. C. Doyle, and M. Morari. Computational complexity of μ calculation. IEEE Transactions on Automatic Control, 39(5):1000–1002, 1994. doi: 10.1109/9.284879.
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016a.
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016b.
Sébastien Bubeck, Bo'az Klartag, Yin Tat Lee, Yuanzhi Li, and Mark Sellke. Chasing nested convex bodies nearly optimally. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1496–1508. SIAM, 2020.
Niangjun Chen, Gautam Goel, and Adam Wierman. Smoothed online convex optimization in high dimensions via online balanced descent. In Conference On Learning Theory (COLT), 2018.
Xinyi Chen and Elad Hazan. Black-box control for linear dynamical systems. arXiv preprint arXiv:2007.06650, 2020.
Alon Cohen, Avinatan Hasidim, Tomer Koren, Nevena Lazic, Yishay Mansour, and Kunal Talwar. Online linear quadratic control. In International Conference on Machine Learning, pages 1029–1038, 2018.
Martin Corless and George Leitmann. Continuous state feedback guaranteeing uniform ultimate boundedness for uncertain dynamic systems. IEEE Transactions on Automatic Control, 26(5):1139–1144, 1981.
Munther A Dahleh, Theodore V Theodosopoulos, and John N Tsitsiklis. The sample complexity of worst-case identification of FIR linear systems. Systems & Control Letters, 20(3):157–166, 1993.
Sarah Dean, Horia Mania, Nikolai Matni, Benjamin Recht, and Stephen Tu. On the sample complexity of the linear quadratic regulator. Foundations of Computational Mathematics, pages 1–47, 2017.