Proceedings of Machine Learning Research vol XX:1–14, 2023
Learning Disturbances Online for Risk-Aware Control:
Risk-Aware Flight with Less Than One Minute of Data
Prithvi Akella¹ PAKELLA@CALTECH.EDU
Skylar X. Wei¹ SWEI@CALTECH.EDU
Joel W. Burdick¹ JWB@ROBOTICS.CALTECH.EDU
Aaron D. Ames¹ AMES@CALTECH.EDU
¹ 1200 E California Blvd MC 104-44, Pasadena, CA 91101
Abstract
Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk, a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.
Keywords: Value-at-Risk, Risk-Aware Control, Gaussian Process, Scenario Optimization
1. Introduction
The models we use for control synthesis are useful, though oftentimes inaccurate. To wit, reduced order models are heavily utilized for controller synthesis for complex robotic systems, e.g. quadrupeds, bipeds, drones, etc. (Bouman et al. (2020); Fan et al. (2021); Ubellacker et al. (2021); Xiong (2021)). However, these models require robustification to disturbances (e.g. to compensate for the gap between the reduced and full order models) to function reliably on these complex systems (Thieffry et al. (2018); Kim et al. (2020); Alan et al. (2021); Kolathaya and Ames (2018); Ahmadi et al. (2020)). As a result, recent studies on the robust control of nonlinear systems center around input-to-state-safe control (Kolathaya and Ames (2018); Romdlony and Jayawardhana (2016); Taylor et al. (2020)) and risk-aware control (Ahmadi et al. (2020); Lindemann et al. (2021); Majumdar and Pavone (2020); Dixit et al. (2021); Akella et al. (2022a)), among other techniques. These methods typically assume a priori knowledge of a model and possible disturbances (or at least the magnitude thereof) and employ control techniques designed to reject those known disturbances. On the other hand, learning-based approaches attempt to identify the underlying model (Buisson-Fenet et al. (2020); Nguyen-Tuong and Peters (2011); Jain et al. (2018); Berkenkamp and Schoellig (2015); Folkestad et al. (2022); Westenbroek et al. (2021); Wang et al. (2018)), in many cases through Gaussian Process Regression (GPR) (Williams and Rasmussen (2006)).
© 2023 P. Akella¹, S.X. Wei¹, J.W. Burdick¹ & A.D. Ames¹.
arXiv:2212.06253v1 [eess.SY] 12 Dec 2022
[Figure 1 image omitted. Labels appearing in the figure: Wind, Ground Effect, Tether; panels A through E; Controller, True System, Nominal Model, Learning Disturbances.]
Figure 1: (Top Left) A general overview of our procedure, (Top Right) a photo of our experimental setup, and (Bottom) snippets of flight paths taken by the drone during the second set of experiments run, i.e. the experiments depicted on the left in Figure 3. Our procedure has two parts. First, we implement a nominal controller and calculate norm discrepancies between predicted model evolution and true system evolution. Then, we fit, via Gaussian process regression, a risk-aware disturbance model for the disturbances that the nominal system experiences. We show in Section 4 how our procedure dramatically improves baseline controller performance and provide a statement on the theoretical accuracy of our model in Section 3.
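The first step described in the caption, computing norm discrepancies between predicted and true evolution, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the helper `discrepancy_norms`, the single-integrator `nominal_step`, the time step `dt`, and the synthetic constant `drift` standing in for an unmodeled disturbance.

```python
import numpy as np

def discrepancy_norms(states, inputs, nominal_step):
    """Return ||x_{t+1} - nominal_step(x_t, u_t)|| for each logged transition."""
    states = np.asarray(states, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    # One-step predictions of the nominal model from each logged state and input.
    predicted = np.array([nominal_step(x, u) for x, u in zip(states[:-1], inputs)])
    return np.linalg.norm(states[1:] - predicted, axis=1)

# Hypothetical nominal model: discrete-time single integrator, x+ = x + dt * u.
dt = 0.1
nominal_step = lambda x, u: x + dt * u

# Synthetic "true" trajectory: the nominal model plus a constant unmodeled drift.
drift = np.array([0.03, 0.04])
inputs = np.ones((5, 2))
states = [np.zeros(2)]
for u in inputs:
    states.append(nominal_step(states[-1], u) + drift)

norms = discrepancy_norms(states, inputs, nominal_step)
```

In this toy setting every entry of `norms` equals the norm of the drift; on real flight data the entries would form the scalar samples to which a disturbance model is fitted.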
However, assuming a priori knowledge of disturbances might not be accurate in real-world settings, and Gaussian process regression for model determination tends to be sample-complex and to uncover only expected system behavior. While learning expected behavior is indeed useful, control predicated on expected models of system behavior might yield problematic behavior in safety-critical settings where risk-sensitive approaches are preferable (Ahmadi et al. (2021); Ono et al. (2018)). Skipping the model identification step, recent work in Bayesian Optimization and Reinforcement Learning aims to identify such risk-aware policies in a model-free fashion (Cakmak et al. (2020); Makarova et al. (2021); Heger (1994); Chow et al. (2017); Mihatsch and Neuneier (2002); Geibel and Wysotzki (2005)). However, these prior works assume an ability to sample disturbances directly, assume a priori knowledge of disturbances, or are sample-complex.
Our Contribution: We propose a risk-aware model augmentation approach via learning disturbance models online that does not require a priori disturbance knowledge. Our approach is sample-efficient as shown in Section 4, where we require less than a minute of flight data to make risk-aware control improvements on a drone mid-flight. Furthermore, by building off prior work (Akella et al. (2022b,a)), we both define a Surface-at-Risk and ensure that our learned disturbance surface is one for the stochastic process accounting for the discrepancy between model and true system evolution. Hence, augmenting the controller with our learned disturbance model yields an efficient risk-aware controller, as we demonstrate experimentally.
Structure: Section 2.1 provides a brief background on Gaussian process regression, and Section 2.2 formally defines a Surface-at-Risk for a stochastic process. Section 3 presents the problem of upper-bounding such a surface and provides a theoretical statement on the accuracy of our procedure with respect to identifying such an upper bound. Finally, Section 4 showcases the utility of our procedure for risk-aware control of a drone with online disturbance learning.
2. Mathematical Preliminaries and Definitions
2.1. A Brief Aside on Gaussian Process Regression
A key concept in our approach is the notion of Surfaces-at-Risk, which we fit via GPR as part of our procedure. GPR typically assumes the existence of an unknown function $f : \mathcal{X} \to \mathbb{R}$ that we aim to represent by taking noisy samples $y$ of $f$ at points $x \in \mathcal{X}$, where the noise $\xi$ is typically assumed to be sub-Gaussian (Srinivas et al. (2009); Chowdhury and Gopalan (2017); Williams and Rasmussen (2006)). Let $X = \{x_i\}_{i=1}^{N}$ be a set of $N$ points $x \in \mathcal{X}$ and $Y$ be the corresponding set of noisy observations, i.e. $Y = \{y_i = f(x_i) + \xi, \ \forall x_i \in X\}$. Furthermore, let $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a positive-definite kernel function. Then, a Gaussian process is uniquely defined by its mean function $\mu : \mathcal{X} \to \mathbb{R}$ and its variance function $\sigma : \mathcal{X} \to \mathbb{R}$. These functions are defined as follows, with $k_N(x) = [k(x, x_i)]_{x_i \in X}$, $K_N = [k(x_i, x_j)]_{x_i, x_j \in X}$, $y_{1:N} = [y_i]_{y_i \in Y}$, and $\lambda = 1 + \frac{2}{N}$:

$$\mu_N(x) = k_N(x)^T (K_N + \lambda I_N)^{-1} y_{1:N}, \qquad \sigma_N(x) = k_N(x, x), \tag{1}$$
$$k_N(x, x') = k(x, x') - k_N(x)^T (K_N + \lambda I_N)^{-1} k_N(x').$$
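As an illustration of (1), the following minimal sketch evaluates $\mu_N$ and $\sigma_N$ with NumPy. The squared-exponential kernel, its length scale, the helper names `rbf_kernel` and `gp_posterior`, and the synthetic sine data are assumptions chosen for the example; the text above does not fix a particular kernel.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between row-wise point sets A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(X, y, X_query, kernel=rbf_kernel):
    """Evaluate mu_N and sigma_N from (1) at each row of X_query."""
    N = X.shape[0]
    lam = 1.0 + 2.0 / N                  # lambda = 1 + 2/N, as in the text
    K = kernel(X, X)                     # K_N = [k(x_i, x_j)]
    k_q = kernel(X, X_query)             # column j holds k_N(x) for query point j
    A = K + lam * np.eye(N)
    mu = k_q.T @ np.linalg.solve(A, y)   # k_N(x)^T (K_N + lam I_N)^{-1} y_{1:N}
    # sigma_N(x) = k_N(x, x) = k(x, x) - k_N(x)^T (K_N + lam I_N)^{-1} k_N(x)
    prior = np.diag(kernel(X_query, X_query))
    sigma = prior - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    return mu, sigma

# Synthetic data (assumed): noisy samples of sin(x) on [-3, 3].
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(30)
Xq = np.linspace(-3.0, 3.0, 7)[:, None]
mu, sigma = gp_posterior(X, y, Xq)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly; since $\lambda \geq 1$, the matrix $K_N + \lambda I_N$ is well conditioned for any positive-definite kernel.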
Lastly, each kernel function has a space of functions it can reproduce to point-wise accuracy, its Reproducing Kernel Hilbert Space (RKHS). Under the assumption that the function to be fitted, $f$, has bounded norm in the RKHS of the chosen kernel $k$, GPR guarantees a high-probability representation of $f$, as formalized in the theorem below, taken from Chowdhury and Gopalan (2017):
Theorem 1 Let $f : \mathcal{X} \to \mathbb{R}$, $X = \{x_i\}_{i=1}^{N}$ be a set of $N$ points $x \in \mathcal{X}$, $Y = \{y_i = f(x_i) + \xi\}_{x_i \in X}$ be a set of noisy observations $y_i$ of $f(x_i)$ with $R$ sub-Gaussian noise $\xi$, and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a positive-definite kernel function. If $f$ has $B$-bounded RKHS norm for some $B > 0$, i.e. $\|f\|_{RKHS} \leq B$, then, with $\mu_N$ and $\sigma_N$ as per (1) and with probability at least $1 - \delta$,

$$|\mu_N(x) - f(x)| \leq \left( B + R \sqrt{2 \ln \frac{\sqrt{\det\left( \left(1 + \frac{2}{N}\right) I_N + K_N \right)}}{\delta}} \right) \sigma_N(x), \quad \forall x \in \mathcal{X}.$$
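To make the bound in Theorem 1 concrete, the sketch below computes the factor that multiplies $\sigma_N(x)$, read as $B + R\sqrt{2\ln(\sqrt{\det((1 + 2/N) I_N + K_N)}/\delta)}$ following Chowdhury and Gopalan (2017). The function name `confidence_multiplier` and the numerical values of $B$, $R$, $\delta$, and the kernel matrix are assumptions picked only for illustration.

```python
import numpy as np

def confidence_multiplier(K, B, R, delta):
    """B + R * sqrt(2 * ln( sqrt(det((1 + 2/N) I_N + K_N)) / delta ))."""
    N = K.shape[0]
    # slogdet returns (sign, log|det|); it avoids overflow of det for large N.
    _, logdet = np.linalg.slogdet((1.0 + 2.0 / N) * np.eye(N) + K)
    # ln( sqrt(det(M)) / delta ) = 0.5 * ln det(M) - ln(delta)
    log_term = 0.5 * logdet - np.log(delta)
    return B + R * np.sqrt(2.0 * log_term)

# Assumed toy values: two data points with an identity kernel matrix.
K_toy = np.eye(2)
beta = confidence_multiplier(K_toy, B=1.0, R=0.1, delta=0.05)
```

For these toy values the matrix inside the determinant is $3 I_2$, so the logarithm's argument is $3 / 0.05 = 60$. Multiplying `beta` by $\sigma_N(x)$ gives the half-width of the high-probability band around $\mu_N(x)$.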
2.2. Surfaces-at-Risk for Scalar Stochastic Processes
This section formally defines a Surface-at-Risk for a scalar stochastic process, the specific structure we aim to fit via GPR. Given a probability space $(\Omega, \mathcal{F}, P)$ with $\Omega$ a sample space, $\mathcal{F}$ a $\sigma$-algebra