
Proceedings of Machine Learning Research vol XX:1–14, 2023

Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data

Prithvi Akella, PAKELLA@CALTECH.EDU

Skylar X. Wei, SWEI@CALTECH.EDU

Joel W. Burdick, JWB@ROBOTICS.CALTECH.EDU

Aaron D. Ames, AMES@CALTECH.EDU

1200 E California Blvd MC 104-44, Pasadena, CA 91101

Abstract

Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk, a risk measure commonly used in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.

Keywords: Value-at-Risk, Risk-Aware Control, Gaussian Process, Scenario Optimization

1. Introduction

The models we use for control synthesis are useful, though oftentimes inaccurate. To wit, reduced-order models are heavily utilized in controller synthesis for complex robotic systems, e.g. quadrupeds, bipeds, and drones (Bouman et al. (2020); Fan et al. (2021); Ubellacker et al. (2021); Xiong (2021)). However, these models require robustification to disturbances (e.g. to compensate for the gap between the reduced- and full-order models) to function reliably on these complex systems (Thieffry et al. (2018); Kim et al. (2020); Alan et al. (2021); Kolathaya and Ames (2018); Ahmadi et al. (2020)). As a result, recent studies on the robust control of nonlinear systems center around input-to-state-safe control (Kolathaya and Ames (2018); Romdlony and Jayawardhana (2016); Taylor et al. (2020)) and risk-aware control (Ahmadi et al. (2020); Lindemann et al. (2021); Majumdar and Pavone (2020); Dixit et al. (2021); Akella et al. (2022a)), among other techniques. These methods typically assume a priori knowledge of a model and of the possible disturbances (or at least the magnitude thereof) and employ control techniques designed to reject those known disturbances. On the other hand, learning-based approaches attempt to identify the underlying model (Buisson-Fenet et al. (2020); Nguyen-Tuong and Peters (2011); Jain et al. (2018); Berkenkamp and Schoellig (2015); Folkestad et al. (2022); Westenbroek et al. (2021); Wang et al. (2018)), in many cases through Gaussian Process Regression (GPR) (Williams and Rasmussen (2006)).


arXiv:2212.06253v1 [eess.SY] 12 Dec 2022


[Figure 1 diagram labels: Wind, Ground Effect, Tether, Controller, True System, Nominal Model, Learning Disturbances]

Figure 1: (Top Left) A general overview of our procedure, (Top Right) a photo of our experimental setup, and (Bottom) snippets of flight paths taken by the drone during the second set of experiments (the experiments depicted on the left in Figure 3). Our procedure has two parts. First, we implement a nominal controller and calculate norm discrepancies between predicted model evolution and true system evolution. Then, we fit, via Gaussian process regression, a risk-aware disturbance model for the disturbances that the nominal system experiences. We show in Section 4 how our procedure dramatically improves baseline controller performance and provide a statement on the theoretical accuracy of our model in Section 3.

However, assuming a priori knowledge of disturbances might not be accurate in real-world settings, and Gaussian process regression for model determination tends to be sample-complex and to uncover only expected system behavior. While learning expected behavior is indeed useful, control predicated on expected models of system behavior might yield problematic behavior in safety-critical settings where risk-sensitive approaches are preferable (Ahmadi et al. (2021); Ono et al. (2018)). Skipping the model identification step, recent work in Bayesian Optimization and Reinforcement Learning aims to identify such risk-aware policies in a model-free fashion (Cakmak et al. (2020); Makarova et al. (2021); Heger (1994); Chow et al. (2017); Mihatsch and Neuneier (2002); Geibel and Wysotzki (2005)). However, these prior works assume an ability to sample disturbances directly, assume a priori knowledge of disturbances, or are sample-complex.

Our Contribution: We propose a risk-aware model augmentation approach that learns disturbance models online and does not require a priori disturbance knowledge. Our approach is sample-efficient, as shown in Section 4, where we require less than a minute of flight data to make risk-aware control improvements on a drone mid-flight. Furthermore, by building off prior work (Akella et al. (2022b,a)), we both define a Surface-at-Risk and ensure that our learned disturbance surface is a Surface-at-Risk for the stochastic process accounting for the discrepancy between model and true system evolution. Hence, augmenting the controller with our learned disturbance model yields an efficient risk-aware controller, as we demonstrate experimentally.

Structure: Section 2.1 provides a brief background on Gaussian process regression, and Section 2.2 formally defines a Surface-at-Risk for a stochastic process. Section 3 presents the problem of upper-bounding such a surface and provides a theoretical statement on the accuracy of our procedure with respect to identifying such an upper bound. Finally, Section 4 showcases the utility of our procedure for risk-aware control of a drone with online disturbance learning.

2. Mathematical Preliminaries and Definitions

2.1. A Brief Aside on Gaussian Process Regression

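A key concept in our approach is the notion of Surfaces-at-Risk, which we fit via GPR as part of our procedure. GPR typically assumes the existence of an unknown function $f: \mathcal{X} \to \mathbb{R}$ that we aim to represent by taking noisy samples $y$ at points $x \in \mathcal{X}$, where the noise $\xi$ is typically assumed to be sub-Gaussian (Srinivas et al. (2009); Chowdhury and Gopalan (2017); Williams and Rasmussen (2006)). Let $X_N = \{x_i\}_{i=1}^{N}$ be a set of $N$ points $x_i \in \mathcal{X}$ and $Y_N$ be the corresponding set of noisy observations, i.e. $Y_N = \{y_i = f(x_i) + \xi, \ \forall\, x_i \in X_N\}$. Furthermore, let $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a positive-definite kernel function. Then, a Gaussian process is uniquely defined by its mean function $\mu_N: \mathcal{X} \to \mathbb{R}$ and its variance function $\sigma_N^2: \mathcal{X} \to \mathbb{R}$. These functions are defined as follows, with $k_N(x) = [k(x, x_i)]_{x_i \in X_N}$, $K_N = [k(x_i, x_j)]_{x_i, x_j \in X_N}$, $y_N = [y_i]_{y_i \in Y_N}$, and $\lambda = (1 + 2/N)$:

$$\mu_N(x) = k_N(x)^T (K_N + \lambda I)^{-1} y_N, \qquad \sigma_N^2(x) = k_N(x, x), \tag{1}$$

$$k_N(x, x') = k(x, x') - k_N(x)^T (K_N + \lambda I)^{-1} k_N(x').$$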
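As a minimal sketch of the posterior mean and standard deviation above, the following NumPy snippet evaluates $\mu_N(x) = k_N(x)^T (K_N + \lambda I)^{-1} y_N$ and $\sigma_N^2(x) = k(x,x) - k_N(x)^T (K_N + \lambda I)^{-1} k_N(x)$ at a batch of query points. The squared-exponential kernel, its length scale, the function names `rbf_kernel` and `gp_posterior`, the default $\lambda = 1 + 2/N$, and the sine-wave data are illustrative assumptions and are not taken from the paper's implementation.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=0.2):
    """Squared-exponential kernel between rows of X1 (n, d) and X2 (m, d).

    The kernel choice and length scale are assumptions for illustration.
    """
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * length_scale**2))


def gp_posterior(X_train, y_train, X_query, kernel=rbf_kernel, lam=None):
    """Return (mu_N, sigma_N) evaluated at each row of X_query."""
    N = X_train.shape[0]
    if lam is None:
        lam = 1.0 + 2.0 / N  # assumed regularizer, lambda = 1 + 2/N
    K_N = kernel(X_train, X_train)   # (N, N) Gram matrix
    k_q = kernel(X_train, X_query)   # column j is k_N(x_j), shape (N, M)
    A = K_N + lam * np.eye(N)
    # Linear solves avoid forming the matrix inverse explicitly.
    mean = k_q.T @ np.linalg.solve(A, y_train)
    prior_var = np.diag(kernel(X_query, X_query))            # k(x, x)
    reduction = np.sum(k_q * np.linalg.solve(A, k_q), axis=0)
    var = prior_var - reduction
    return mean, np.sqrt(np.maximum(var, 0.0))


# Hypothetical data: noisy samples of a sine wave on [0, 1].
rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 1.0, 20)[:, None]
y_train = np.sin(2.0 * np.pi * X_train[:, 0]) + 0.05 * rng.standard_normal(20)
X_query = np.array([[0.5], [5.0]])  # one point among the data, one far away
mu, sigma = gp_posterior(X_train, y_train, X_query)
print(mu, sigma)
```

In this sketch the standard deviation is small near the training data and returns to the prior value far from it, which is the behavior the variance expression encodes.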

Lastly, each kernel function has a space of functions it can reproduce to point-wise accuracy, its Reproducing Kernel Hilbert Space (RKHS). Under the assumption that the function to be fitted, $f$, has bounded norm in the RKHS of the chosen kernel $k$, GPR guarantees high-probability representation of $f$, as formalized in the theorem below, taken from Chowdhury and Gopalan (2017):

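Theorem 1 Let $f: \mathcal{X} \to \mathbb{R}$, $X_N = \{x_i\}_{i=1}^{N}$ be a set of $N$ points $x_i \in \mathcal{X}$, $Y_N = \{y_i = f(x_i) + \xi\}_{x_i \in X_N}$ be a set of noisy observations of $f(x_i)$ with $R$-sub-Gaussian noise $\xi$, and $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a positive-definite kernel function. If $f$ has $B$-bounded RKHS norm for some $B > 0$, i.e. $\|f\|_{RKHS} \leq B$, then, with $\mu_N$ and $\sigma_N$ as per (1) and with minimum probability $1 - \delta$,

$$|\mu_N(x) - f(x)| \leq \left( B + R \sqrt{2 \ln \frac{\sqrt{\det\left((1 + 2/N) I + K_N\right)}}{\delta}} \right) \sigma_N(x), \quad \forall\, x \in \mathcal{X}.$$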
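To make the bound in Theorem 1 concrete, the sketch below computes the scaling factor $\beta = B + R\sqrt{2\ln\left(\sqrt{\det((1 + 2/N) I + K_N)}/\delta\right)}$ that multiplies $\sigma_N(x)$. The function name `confidence_scale`, the default $\lambda = 1 + 2/N$, and the numerical values of $B$, $R$, and $\delta$ are assumptions chosen for illustration. In practice $B$ and $R$ are properties of the unknown function and noise and have to be assumed or estimated.

```python
import numpy as np


def confidence_scale(K_N, B, R, delta, lam=None):
    """Scaling factor beta such that |mu_N(x) - f(x)| <= beta * sigma_N(x).

    K_N is the (N, N) kernel Gram matrix, B the assumed RKHS-norm bound,
    R the assumed sub-Gaussian noise parameter, and delta in (0, 1) the
    allowed failure probability.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    N = K_N.shape[0]
    if lam is None:
        lam = 1.0 + 2.0 / N  # assumed regularizer, lambda = 1 + 2/N
    # ln(sqrt(det(M))) = 0.5 * logdet(M); slogdet avoids overflow for large N.
    sign, logdet = np.linalg.slogdet(lam * np.eye(N) + K_N)
    if sign <= 0:
        raise ValueError("lam * I + K_N must be positive definite")
    return B + R * np.sqrt(2.0 * (0.5 * logdet - np.log(delta)))


# Hypothetical example: two uncorrelated points, so K_N is the identity.
K_example = np.eye(2)
beta = confidence_scale(K_example, B=1.0, R=0.5, delta=0.1)
print(beta)
```

Shrinking $\delta$ or adding data that increases $\det(\lambda I + K_N)$ enlarges $\beta$, so the high-probability envelope $\mu_N(x) \pm \beta\,\sigma_N(x)$ widens by a larger multiple of $\sigma_N(x)$.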

2.2. Surfaces-at-Risk for Scalar Stochastic Processes

This section formally defines a Surface-at-Risk for a scalar stochastic process, the specific structure we aim to fit via GPR. Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with $\Omega$ a sample space, $\mathcal{F}$ a $\sigma$-algebra