Meta-Learning-Based Robust Adaptive Flight
Control Under Uncertain Wind Conditions
Michael O’Connell, Guanya Shi, Xichen Shi, and Soon-Jo Chung
Abstract: Realtime model learning proves challenging for complex dynamical systems, such as drones flying in variable wind conditions. Machine learning techniques such as deep neural networks have high representation power but are often too slow to update onboard. On the other hand, adaptive control relies on simple linear parameter models that can update as fast as the feedback control loop. We propose an online composite adaptation method that treats outputs from a deep neural network as a set of basis functions capable of representing different wind conditions. To help with training, meta-learning techniques are used to optimize the network output so that it is useful for adaptation. We validate our approach by flying a drone in an open air wind tunnel under varying wind conditions and along challenging trajectories. We compare the results with other adaptive controllers that use different basis function sets and show improvement in tracking and prediction errors.
I. INTRODUCTION
For a given dynamical system, complexity and uncertainty can arise either from its inherent properties or from a changing environment. Thus model accuracy is often key in designing a high-performance and robust control system. If the model structure is known, conventional system identification techniques can be used to resolve the parameters of the model. When the system becomes too complex to model analytically, modern machine learning research turns to data-driven and neural network approaches that often achieve state-of-the-art performance given enough samples, proper tuning, and adequate time for training. However, a learning-based control system demands both representation power and fast execution simultaneously. Thus it is natural to seek wisdom from the classic field of adaptive control, where successes have been seen using simple linear-in-parameter models with provably robust control designs [1], [2]. On the other hand, the field of machine learning has made its own progress toward fast online paradigms, with rising interest in few-shot learning [3], continual learning [4], [5], and meta-learning [6], [7].
A particularly interesting scenario for a system in a changing environment is a multi-rotor flying in varying wind conditions. Classic multi-rotor control does not consider aerodynamic forces such as drag or ground effects [8]. The thrust direction is controlled to follow the desired acceleration along a trajectory. To account for aerodynamic forces in practice, an integral term is often added to the velocity controller [9]. Recently, [10] used incremental nonlinear dynamic inversion (INDI) to estimate the external force through filtered accelerometer measurements, and then applied direct force cancellation in the controller. [11] assumed a diagonal rotor drag model and proved differential flatness of the system for cancellation, and [12] used a nonlinear aerodynamic model for force prediction. When a linear-in-parameter (LIP) model is available, adaptive control theories can be applied for controller synthesis. This does not limit the model to only physics-based parameterizations, and a neural network basis can be used [13], [2]. This approach has been applied to multi-rotors for wind disturbance rejection in [14].

Authors are with California Institute of Technology, USA. {moc, gshi, xshi, sjchung}@caltech.edu.

Fig. 1. Intel Aero drone flying in front of Caltech CAST Fan Array.
When adapting to complex system dynamics or a fast-changing environment, one would expect the network approximator to have enough representation power, which makes a deep neural network (DNN) a desirable candidate. However, there are several issues associated with using a deep network for adaptive control. First, training a DNN often requires back propagation, easily leading to a computation bottleneck for realtime control on small drones. Second, continual online training may incur catastrophic interference, where previously learned knowledge is forgotten unintentionally. Third, a vanilla network for a regression problem often does not have guarantees on properties desirable for control design, such as output boundedness and Lipschitz continuity. Fortunately, advances have been made in circumventing these issues. Training a deep network by updating the last layer's weights more frequently than the rest of the network has been shown to work for approximating the $Q$-function in reinforcement learning [15], [16]. This enables fast adaptation without incurring a high computation burden. Spectral normalization of all the network weights can constrain the Lipschitz constant [17]. We used this technique in our prior work [18] to derive a stable DNN-based controller for multirotor landing.
We thus propose the following method for online composite adaptive control based on a DNN. We approximate the unknown part of our dynamics model with a DNN trained offline on previously collected data. When deployed online, only the last-layer weights are updated, in a fashion similar to composite adaptive control [19]. The training process employs the model-agnostic meta-learning (MAML) technique from [7] to encourage the hidden layer outputs to become good basis functions for online adaptation. All the network weights are spectrally normalized during training as well as online adaptation to constrain the approximator's Lipschitz constant, which was proven to be a necessary condition for stable control design [18].

arXiv:2103.01932v1 [cs.RO] 2 Mar 2021
II. PROBLEM STATEMENT
A. Mixed Model for Robot Dynamics
Consider the general robot dynamics model:

$$H(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) + f(q,\dot{q};c) = \tau \tag{1}$$

where $q, \dot{q}, \ddot{q} \in \mathbb{R}^n$ are the $n$-dimensional position, velocity, and acceleration vectors, $H(q)$ is the symmetric, positive definite inertia matrix, $C(q,\dot{q})\dot{q}$ is the centripetal and Coriolis torque vector, $g(q)$ is the gravitational torque vector, $f(q,\dot{q};c)$ incorporates unmodeled dynamics, and $c = c(t)$ is a hidden state used to represent the changing environment.
We approximate the unmodeled dynamics term by a linear combination of a set of $m$ neural network kernels. We consider two formulations here. First, we approximate $f(q,\dot{q};c)$ by linearly combining $m$ outputs from $m$ separately trained neural networks $\vec{\phi}_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ parameterized by $\theta_i$:

$$f(q,\dot{q};c) \approx \sum_{i=1}^{m} a_i(c)\,\vec{\phi}_i(q,\dot{q};\theta_i) = \phi(q,\dot{q};\Theta)\,a(c) \tag{2}$$

where $a(c) = [a_i(c)] \in \mathbb{R}^m$ and the kernels are stacked such that $\phi(q,\dot{q};\Theta) = [\vec{\phi}_i(q,\dot{q},\theta_i)]$ and $\Theta = [\theta_i]$.
Second, we consider the alternative formulation where $f(q,\dot{q};c)$ is approximated with a single neural network, where $a$ represents the weights of its last layer, and $\{\phi_i\}$ represent the hidden states before the last layer. This can be explicitly written as

$$f(q,\dot{q};c) \approx \sum_{i=1}^{m}\sum_{j=1}^{n} a_{i,j}(c)\,\phi_i(q,\dot{q};\theta_i)\,\hat{e}_j = \phi(q,\dot{q};\Theta)\,a(c), \qquad \phi(q,\dot{q};\Theta) \triangleq \begin{bmatrix} [\phi_i]^T & 0 & 0 \\ 0 & [\phi_i]^T & 0 \\ 0 & 0 & [\phi_i]^T \end{bmatrix} \tag{3}$$

where $\hat{e}_j$ represent the standard basis vectors.
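The stacking in (3) can be illustrated with a minimal sketch, assuming NumPy and a hypothetical last-layer weight matrix of shape $n \times m$. The names `block_kernel`, `hidden`, and `last_layer` are illustrative and do not come from the paper.

```python
import numpy as np

def block_kernel(hidden, n_out):
    """Build the block-diagonal kernel of (3): n_out copies of hidden^T on the diagonal."""
    hidden = np.asarray(hidden, dtype=float)
    # kron(I_n, hidden^T) has shape (n_out, n_out * m)
    return np.kron(np.eye(n_out), hidden[None, :])

# Hypothetical hidden-layer output (m = 4) and last-layer weights (n = 3 outputs)
hidden = np.array([0.5, -1.0, 2.0, 0.25])
last_layer = np.arange(12, dtype=float).reshape(3, 4)

phi = block_kernel(hidden, n_out=3)
a = last_layer.reshape(-1)   # stack the rows of the last layer into one vector
force = phi @ a              # same numbers as last_layer @ hidden
```

Flattening `last_layer` row by row gives the stacked coefficient vector, so `phi @ a` reproduces the ordinary last-layer product `last_layer @ hidden`.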
In both cases, the maximum representation error, $\epsilon$, is

$$\epsilon = \min_{a \in \mathbb{R}^m} \max_{(q,\dot{q},c) \in \Xi} \left\| \phi(q,\dot{q};\Theta)\,a - f(q,\dot{q};c) \right\| \tag{4}$$

where $\Xi$ is the compact domain of interest. Note that the boundedness of $\epsilon$ is apparent under the assumption of a bounded Lipschitz constant of $f(q,\dot{q},c)$ and bounded training error. Given $\Theta$, the goal then is to design a control law, $\tau(q,\dot{q},q_d,\dot{q}_d)$, that drives $(q,\dot{q}) \to (q_d,\dot{q}_d)$, subject to the dynamics in (1).
B. Quadrotor Position Control Subject to Uncertain Wind
Conditions
Now we specialize the problem to quadrotors. Consider states given by global position, $p \in \mathbb{R}^3$, velocity $v \in \mathbb{R}^3$, attitude rotation matrix $R \in SO(3)$, and body angular velocity $\omega \in \mathbb{R}^3$. Then the dynamics of a quadrotor are

$$\dot{p} = v, \qquad m\dot{v} = m\mathbf{g} + R f_u + f_a, \tag{5a}$$

$$\dot{R} = R S(\omega), \qquad J\dot{\omega} = J\omega \times \omega + \tau_u + \tau_a, \tag{5b}$$

where $m$ is the mass and $J$ is the inertia matrix of the system, $S(\cdot)$ is the skew-symmetric mapping, $\mathbf{g} = [0, 0, -g]^T$ is the gravity vector, $f_u = [0, 0, T]^T$ and $\tau_u = [\tau_x, \tau_y, \tau_z]^T$ are the total thrust and body torques from the four rotors predicted by a nominal model, and $f_a = [f_{a,x}, f_{a,y}, f_{a,z}]^T$ and $\tau_a = [\tau_{a,x}, \tau_{a,y}, \tau_{a,z}]^T$ are forces and torques resulting from unmodelled aerodynamic effects due to varying wind conditions.
For drones in strong wind conditions, the primary disturbance is the unmodelled aerodynamic force $f_a = [f_{a,x}, f_{a,y}, f_{a,z}]^T$. Thus, considering only position dynamics, we cast (5a) into the form of (1) by taking $H(q) = mI$, where $I$ is the identity matrix, $C(q,\dot{q}) \equiv 0$, $g(q) = -m\mathbf{g}$, $f(q,\dot{q};c) = -f_a$, and $\tau = R f_u$. Note that the quadrotor attitude dynamics is just a special case of (1).
C. Meta-Learning and Adaptation Goal
Suppose we have pre-collected meta-training data $D_{\text{meta}} = \{D_1, D_2, \cdots, D_T\}$, with $T$ sub-datasets. In each sub-dataset, $D_i$, we have $L_i$ state and force measurement pairs, $([q_k, \dot{q}_k], \hat{f}(q_k, \dot{q}_k, c_i))$, generated from some fixed but unknown wind condition, represented by $c_i$. The goal of meta-learning is to generate a set of parameters, $\Theta = \{\theta_i\}_{i=1}^{m}$, such that a linear combination of the neural net kernels, $\{\phi_i\}_{i=1}^{m}$, can represent any wind condition with small error.

Consequently, the adaptive controller aims to stabilize the system to a desired trajectory given the prior information of the dynamic model (1) and the learned kernels, (2) or (3). If exponential convergence is guaranteed, the closed-loop system is robust to aerodynamic effects not captured by the prior from meta-learning, encapsulated in $\epsilon$.
III. META-LEARNING NEURAL NET KERNELS
Recall that the meta-learning goal is to learn a set of kernel functions, $\{\phi(q,\dot{q})\}$, such that for any wind condition, $c$, there exists a suitable $a$ such that $\phi(q,\dot{q},\Theta)\,a$ is a good approximation of $f(q,\dot{q},c)$. We can formulate this problem as the minimization problem

$$\Theta^* = \arg\min_{a_i,\Theta} \sum_i J_i(a_i,\Theta) \tag{6}$$

with

$$J_i(a_i,\Theta,(q_i(\cdot),\dot{q}_i(\cdot))) = \int_0^{t_f} \left\| f(q_i(r),\dot{q}_i(r),c_i) - \phi(q_i(r),\dot{q}_i(r),\Theta)\,a_i \right\|^2 dr \tag{7}$$

$$\approx J(a_i,\Theta,D_i) = \Delta t \cdot \sum_{(q,\dot{q},f) \in D_i} \left\| f - \phi(q,\dot{q},\Theta)\,a_i \right\|^2$$

where the training data is divided into subsets, $D_i$, each corresponding to a fixed wind condition.
Note that (6) can be equivalently written as

$$\min_{\Theta} \sum_i \min_{a_i} J_i(a_i,\Theta) \tag{8}$$

The inner problem, $\sum_i \min_{a_i} J_i(a_i,\Theta)$, is simply a linear least squares problem and can be solved exactly for a fixed value of $\Theta$. Since there are many more training examples, given as discrete measurements of $(q,\dot{q},f(q,\dot{q},c))$, than parameters, $a$, the least squares problem is overdetermined, and we can approximate its solution well with the least squares solution on a small, randomly sampled subset of the training examples, $D_i^a$. The remaining examples form $D_i^\Theta$, such that $D_i^a \cup D_i^\Theta = D_i$ and $D_i^a \cap D_i^\Theta = \emptyset$.

Write the least squares solution for $a$ as

$$a = a_{LS}(\Theta, D_i^a). \tag{9}$$
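Note that this solution can be explicitly written as the solution to the following equation.

$$\underbrace{\begin{bmatrix} \phi(q^{(1)},\dot{q}^{(1)}) \\ \phi(q^{(2)},\dot{q}^{(2)}) \\ \vdots \\ \phi(q^{(K)},\dot{q}^{(K)}) \end{bmatrix}}_{\Phi \in \mathbb{R}^{K \times m}} a = \underbrace{\begin{bmatrix} f^{(1)} \\ f^{(2)} \\ \vdots \\ f^{(K)} \end{bmatrix}}_{F} \tag{10}$$

where $K$ is the size of $D_i^a$. Therefore the least-squares solution will be

$$a_{LS}(\Theta, D_i^a) = LS(\Theta, D_i^a) = (\Phi^T\Phi)^{-1}\Phi^T F \tag{11}$$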
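As a small numerical illustration of (10) and (11), the sketch below fits $a$ on synthetic data, assuming NumPy. The random matrix `Phi` and the coefficients `a_true` are made-up stand-ins for kernel outputs and wind-dependent weights; `np.linalg.lstsq` is used in place of forming $(\Phi^T\Phi)^{-1}$ explicitly, which is usually better conditioned.

```python
import numpy as np

def least_squares_coefficients(Phi, F):
    """Return the a that minimizes ||Phi a - F||^2, as in (11)."""
    a, *_ = np.linalg.lstsq(Phi, F, rcond=None)
    return a

# Synthetic stand-in data: K = 50 samples, m = 4 kernels
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))
a_true = np.array([1.0, -2.0, 0.5, 3.0])
F = Phi @ a_true                      # noise-free "force" measurements

a_ls = least_squares_coefficients(Phi, F)
# Same answer from the normal equations written out in (11)
a_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ F)
```

With noise-free data both routes recover `a_true` up to floating-point error.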
Now with $a$ as a function of $\Theta$, we can solve the outer problem in (8) using stochastic gradient descent on $\Theta$. This gives the following iterative algorithm for solving for $\Theta$.

Algorithm 1 Meta-Learning Algorithm
procedure META-LEARNING($\{D_i\}$)  ▷ returns $\Theta$
    initialize $\Theta^0$ randomly, $k = 0$
    repeat
        for $i \in \{1, \ldots, T\}$ do
            $(D_i^a, D_i^\Theta) \leftarrow$ random split of $D_i$
            $\Theta^{k+1} = \Theta^k - \beta \cdot \nabla_{\Theta^k} J_i(a_{LS}(\Theta^k, D_i^a), \Theta^k, D_i^\Theta)$
        end for
        $k \leftarrow k + 1$
    until converged
    return $\Theta^k$
end procedure
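The loop in Algorithm 1 can be sketched on a toy problem as follows, assuming NumPy. Everything here is a simplification for illustration: the kernels are a single `tanh` layer, not the paper's DNN, the gradient is taken by finite differences in place of back propagation, the loss is a mean rather than $\Delta t$ times a sum, and the data, step size `beta`, and split fraction `frac_a` are hypothetical.

```python
import numpy as np

def kernels(X, Theta):
    """Toy kernels phi(x; Theta) = tanh([x, 1] @ Theta), shape (N, m)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.tanh(Xb @ Theta)

def task_loss(Theta, X_a, F_a, X_th, F_th):
    """J_i(a_LS(Theta, D_a), Theta, D_Theta): fit a on D_a, score on D_Theta."""
    a, *_ = np.linalg.lstsq(kernels(X_a, Theta), F_a, rcond=None)
    residual = kernels(X_th, Theta) @ a - F_th
    return float(np.mean(residual ** 2))

def numerical_grad(fun, Theta, h=1e-5):
    """Central finite differences, standing in for back propagation."""
    grad = np.zeros_like(Theta)
    for idx in np.ndindex(*Theta.shape):
        step = np.zeros_like(Theta)
        step[idx] = h
        grad[idx] = (fun(Theta + step) - fun(Theta - step)) / (2.0 * h)
    return grad

def meta_learn(datasets, m=4, beta=0.05, epochs=30, frac_a=0.3, seed=0):
    rng = np.random.default_rng(seed)
    dim = datasets[0][0].shape[1] + 1          # input dimension plus bias
    Theta = rng.normal(scale=0.5, size=(dim, m))
    for _ in range(epochs):
        for X, F in datasets:                  # one wind condition per dataset
            perm = rng.permutation(len(X))     # random split into D_a and D_Theta
            k = max(m, int(frac_a * len(X)))
            ia, ith = perm[:k], perm[k:]
            loss = lambda Th: task_loss(Th, X[ia], F[ia], X[ith], F[ith])
            Theta = Theta - beta * numerical_grad(loss, Theta)
    return Theta

# Three synthetic "wind conditions": the same shape scaled by a hidden factor c
rng = np.random.default_rng(1)
datasets = []
for c in (0.5, 1.0, 2.0):
    X = rng.uniform(-1.0, 1.0, size=(60, 1))
    datasets.append((X, c * np.sin(2.0 * X[:, 0])))

Theta = meta_learn(datasets)
```

Because the finite-difference gradient perturbs $\Theta$ before the least squares solve, it differentiates through $a_{LS}(\Theta, D_i^a)$, which is the dependence the outer problem in (8) relies on.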
Note that if $D_i^a = D_i^\Theta = D_i$, then asymptotic convergence is guaranteed: the cost decreases monotonically under the least squares solve at each iteration, it decreases monotonically under the batch update law for small enough $\beta$, and the 2-norm, and hence the cost, is lower bounded by 0.
IV. ROBUST COMPOSITE ADAPTATION
Recall that the control design objective is to design a control system that leverages the kernels, $\phi(q,\dot{q};\Theta)$, to stabilize the system defined in (1) to some desired trajectory $(q_d,\dot{q}_d)$. Treating $\Theta$ as fixed, we will not notate dependence on $\Theta$ in this section. The control system will have two parts: a control law, $\tau(q,\dot{q},q_d,\dot{q}_d,\hat{a})$, and an update law, $\hat{a}(q,\dot{q},q_d,\dot{q}_d)$.

In the process of designing the control system, we make a few key assumptions.

Assumption 1: The desired trajectory and its first and second derivatives, $\{q_d(t), \dot{q}_d(t), \ddot{q}_d(t)\}$, are bounded.

Assumption 2: The flown flight trajectory, $(q(t),\dot{q}(t))$, and the current wind conditions, $c$, are a subset of $\Xi$. Thus, the optimal parameters for the flown flight trajectory and current wind conditions, given by $a = a_{LS}(\Theta, (q(t),\dot{q}(t),f(q(t),\dot{q}(t),c)))$, with pointwise representation error, $d(q,\dot{q}) = \phi(q,\dot{q};\Theta)\,a - f(q,\dot{q};c)$, have maximum representation error along the flown flight trajectory, $d$, less than the maximum global representation error, $\epsilon$. That is,

$$\|d(q,\dot{q})\| \triangleq \|\phi(q,\dot{q};\Theta)\,a - f(q,\dot{q};c)\| \le d \tag{12}$$

$$d \triangleq \min_{a \in \mathbb{R}^m} \max_t \|\phi(q(t),\dot{q}(t))\,a - f(q(t),\dot{q}(t),c)\| \le \epsilon \tag{13}$$

Note that for time-varying optimal parameters, $a = a(t)$, we can follow the same formulation but have an additional disturbance term proportional to $\dot{a}$.
A. Nonlinear Control Law
In formulating our control problem, we first define the composite velocity tracking error term, $s$, and the reference velocity, $\dot{q}_r$, such that $s = \dot{q} - \dot{q}_r = \dot{\tilde{q}} + \Lambda\tilde{q}$, where $\tilde{q} = q - q_d$ is the position tracking error and $\Lambda$ is a positive definite control gain. Then, given the parameter estimate $\hat{a}$, we define the following control law

$$\tau = H(q)\ddot{q}_r + C(q,\dot{q})\dot{q}_r + g(q) + \phi(q,\dot{q})\hat{a} - Ks \tag{14}$$

where $K$ is another positive definite control gain. Combining (1) and (14) leads to the closed-loop dynamics

$$H(q)\dot{s} + (C(q,\dot{q}) + K)s - \phi(q,\dot{q})\tilde{a} = d(q,\dot{q}) \tag{15}$$

where $\tilde{a} = \hat{a} - a$ is the parameter estimation error.
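For the quadrotor position dynamics, where $H = mI$ and $C \equiv 0$, the control law (14) reduces to a few vector operations. The sketch below assumes NumPy; the function name `control_force`, the gains, and the numerical values are hypothetical, and `g_term` stands for whatever $g(q)$ evaluates to in (1).

```python
import numpy as np

def control_force(mass, g_term, p, v, p_d, v_d, a_d, phi, a_hat, Lam, K):
    """Control law (14) specialized to H = mass * I and C = 0."""
    q_tilde = p - p_d                    # position tracking error
    v_r = v_d - Lam @ q_tilde            # reference velocity
    s = v - v_r                          # composite velocity tracking error
    a_r = a_d - Lam @ (v - v_d)          # reference acceleration (derivative of v_r)
    tau = mass * a_r + g_term + phi @ a_hat - K @ s
    return tau, s

# Hypothetical numbers: 1 kg vehicle, identity kernel, untrained parameters
mass = 1.0
g_term = np.array([0.0, 0.0, 9.81])
Lam = 2.0 * np.eye(3)
K = 3.0 * np.eye(3)
phi = np.eye(3)
a_hat = np.zeros(3)
zero = np.zeros(3)
p_err = np.array([0.1, 0.0, 0.0])        # 10 cm offset from a fixed set point

tau, s = control_force(mass, g_term, p_err, zero, zero, zero, zero,
                       phi, a_hat, Lam, K)
```

In this static example the composite error is $\Lambda\tilde{q}$, so the commanded force is the gravity term plus a restoring term $-K\Lambda\tilde{q}$.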
B. Composite Adaptation Law
We will define an adaptation law that combines a tracking error update term, a prediction error update term, and a regularization term. This formulation follows [19] with the inclusion of regularization.

First, we define the prediction error as

$$e(q,\dot{q}) \triangleq \phi(q,\dot{q})\hat{a} - f(q,\dot{q},c) = \phi(q,\dot{q})\tilde{a} + d(q,\dot{q}) \tag{16}$$

Next, we filter the right hand side of (16) with a stable first-order filter with impulse response $w(r)$ to define the filtered prediction error.

$$e_1(\hat{a},t) \triangleq W(t)\hat{a} - y_1(t) = W(t)\tilde{a} + d_1(t) \tag{17}$$

with filtered measurement, $y_1 = y_1(t) = \int_0^t w(t-r)\,y(r)\,dr$, where $y(r) = f(q(r),\dot{q}(r),c)$ is the measured force, filtered kernel function, $W = W(t) = \int_0^t w(t-r)\,\phi(r)\,dr$, and filtered disturbance, $d_1 = d_1(t) = \int_0^t w(t-r)\,d(r)\,dr$.
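A discrete-time version of such a filter might look like the sketch below. The paper does not specify the filter, so this assumes a unit-gain low-pass filter with impulse response $w(r) = b\,e^{-br}$ and an input held constant over each sample; `first_order_filter`, `bandwidth`, and the sample values are hypothetical.

```python
import math

def first_order_filter(samples, dt, bandwidth):
    """Filter a scalar signal with y1' = -bandwidth * y1 + bandwidth * y.

    The recursion is exact when the input is constant over each step of length dt.
    """
    alpha = math.exp(-bandwidth * dt)
    state = 0.0                      # y1(0) = 0, matching the integral from 0 to t
    filtered = []
    for y in samples:
        state = alpha * state + (1.0 - alpha) * y
        filtered.append(state)
    return filtered

# Constant measurement of 2.0 sampled at 100 Hz for 10 s
y1 = first_order_filter([2.0] * 1000, dt=0.01, bandwidth=5.0)
```

The same recursion applied elementwise to $\phi$ and to the measured force gives $W$ and $y_1$ in (17).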
Now consider the following cost function.

$$J_2(\hat{a}) = \int_0^t e^{-\lambda(t-r)} \left\| W(r)\hat{a} - y_1(r) \right\|^2 dr + \gamma\|\hat{a}\|^2 \tag{18}$$

Note that this is closely related to the cost function defined in (6), with three modifications. First, the inclusion of an exponential forgetting factor will lead to exponential convergence of $\bar{P}^{-1}\tilde{a}$, where $\bar{P}$ will be defined in (21). Second, the regularization term, $\gamma\|\hat{a}\|^2$, will guarantee invertibility of $\bar{P}$, even without the persistence of excitation condition usually required to guarantee parameter convergence. However, note that lack of persistence of excitation could lead to poorly conditioned $\bar{P}^{-1}$. Regularization also introduces an additional disturbance term proportional to $a$, as seen later in (24). Third, this cost function uses the filtered prediction error instead of the unfiltered prediction error to smooth the update law and to allow us to remove $\ddot{q}$ from the update law via integration by parts on $y_1$.

Note that $J_2$ is quadratic and convex in $\hat{a}$, leading to a simple closed form solution for $\hat{a}$. However, this requires evaluating an integral over the entire trajectory at every time step, so we instead differentiate this closed form solution for $\hat{a}$, which gives the following prediction error update law with regularization.

$$\dot{\hat{a}} = -\bar{P}\,(W^T e_1 + \lambda\gamma\hat{a}) \tag{19}$$

$$\dot{\bar{P}} = (\lambda I - \lambda\gamma\bar{P} - \bar{P}W^T W)\,\bar{P} \tag{20}$$

where

$$\bar{P} \triangleq \left( \int_0^t e^{-\lambda(t-r)}\,W^T W\,dr + \gamma I \right)^{-1} \tag{21}$$

Now, we define the composite adaptation law with regularization, which incorporates an additional tracking-error-based term proportional to $s$ into (19). Later, we will see that this tracking error term exactly cancels the $\tilde{a}$ term in (15).

$$\dot{\hat{a}} = -\bar{P}\,(\phi^T s + W^T e_1 + \lambda\gamma\hat{a}) \tag{22}$$
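One way to run (20) and (22) on a flight computer is a forward-Euler step per control cycle. The sketch below assumes NumPy; the step size, gains, and the constant test signals are hypothetical, and an onboard implementation would likely need a more careful integrator.

```python
import numpy as np

def adaptation_step(a_hat, P, phi, s, W, y1, lam, gamma, dt):
    """One forward-Euler step of the composite adaptation law (20), (22)."""
    e1 = W @ a_hat - y1                                   # filtered prediction error (17)
    a_dot = -P @ (phi.T @ s + W.T @ e1 + lam * gamma * a_hat)
    P_dot = (lam * np.eye(len(a_hat)) - lam * gamma * P - P @ W.T @ W) @ P
    return a_hat + dt * a_dot, P + dt * P_dot

# Hypothetical constant signals: 3 force components, 2 parameters, zero tracking error
lam, gamma, dt = 1.0, 0.1, 1e-3
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
a_true = np.array([0.5, -1.0])
y1 = W @ a_true
s = np.zeros(3)

a_hat = np.zeros(2)
P = np.eye(2) / gamma            # (21) evaluated at t = 0
for _ in range(15000):           # 15 s of simulated time
    a_hat, P = adaptation_step(a_hat, P, W, s, W, y1, lam, gamma, dt)

# Steady state of (22) with s = 0: a regularized least squares solution
a_ss = np.linalg.solve(W.T @ W + lam * gamma * np.eye(2), W.T @ y1)
P_ss = lam * np.linalg.inv(W.T @ W + lam * gamma * np.eye(2))
```

The estimate settles at `a_ss`, not `a_true`; the gap is the regularization-induced disturbance proportional to $a$ noted in the text.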
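Theorem 4.1: Under Assumptions 1 and 2 and using the control law defined in (14), the composite tracking error and parameter estimation error evolving according to the dynamics in (15) and adaptation law in (20)-(22) exponentially converge to the error ball

$$\left\| \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} \right\| \le \frac{\sup_t \left[ \kappa(H(q(t))),\ \kappa(\bar{P}^{-1}) \right]}{\min\left[ k,\ \frac{\lambda}{2}\left( \lambda_{\min}(\bar{P}^{-1}) + \gamma \right) \right]} \ \sup_t \left\| \begin{bmatrix} d \\ W^T d_1 + \lambda\gamma a \end{bmatrix} \right\| \tag{23}$$

where $k = \lambda_{\min}K$ and $\kappa(\cdot)$ is the condition number.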
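Proof: Rearranging the composite tracking error dynamics and the parameter estimate dynamics, defined in (15) and (22), and using the derivative of $\bar{P}^{-1}$ given in (20), we get the combined closed loop dynamics

$$\begin{bmatrix} H(q) & 0 \\ 0 & \bar{P}^{-1} \end{bmatrix} \begin{bmatrix} \dot{s} \\ \dot{\tilde{a}} \end{bmatrix} + \begin{bmatrix} C(q,\dot{q}) + K & -\phi(q,\dot{q}) \\ \phi(q,\dot{q})^T & W^T W + \lambda\gamma I \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} = \begin{bmatrix} d(q,\dot{q}) \\ -(W^T d_1 + \lambda\gamma a) \end{bmatrix} \tag{24}$$

Consider the Lyapunov-like function $V = y^T M y$, with $y = [s^T\ \tilde{a}^T]^T$ and metric function, $M$, given by

$$M = \begin{bmatrix} H(q) & 0 \\ 0 & \bar{P}^{-1} \end{bmatrix} \tag{25}$$

Using the closed loop dynamics given in (24) and the skew symmetric property of $\dot{H} - 2C$, we get the following inequality relationship for the derivative of $V$.

$$\frac{d}{dt}\left( y^T M y \right) = 2y^T M \dot{y} + y^T \dot{M} y \tag{26}$$

$$= -y^T \begin{bmatrix} 2K & 0 \\ 0 & \lambda\bar{P}^{-1} + \lambda\gamma I + W^T W \end{bmatrix} y + 2y^T \begin{bmatrix} d \\ -(W^T d_1 + \lambda\gamma a) \end{bmatrix} \tag{27}$$

$$\le -2\lambda_{con} V + 2\bar{d}\sqrt{V} \tag{28}$$

where

$$\lambda_{con} = \frac{\min\left[ k,\ \frac{\lambda}{2}\left( \lambda_{\min}(\bar{P}^{-1}) + \gamma \right) \right]}{\lambda_{\max}(M)} \tag{29}$$

and

$$\bar{d} = \frac{1}{\sqrt{\lambda_{\min}(M)}} \left\| \begin{bmatrix} d \\ W^T d_1 + \lambda\gamma a \end{bmatrix} \right\| \tag{30}$$

Applying the transformation $\mathcal{W} = \sqrt{y^T M y}$ and the comparison lemma from [20], we arrive at

$$\|y\| \le \sqrt{\kappa(M)}\,\|y(0)\|\,e^{-\lambda_{con} t} + \frac{\bar{d}}{\lambda_{con}\sqrt{\lambda_{\min}(M)}} \left( 1 - e^{-\lambda_{con} t} \right) \tag{31}$$

thus proving the exponential convergence to the error ball given in (23).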
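The last step of the proof can be checked numerically. With $\mathcal{W} = \sqrt{V}$, the worst case permitted by the comparison lemma is the scalar system $\dot{\mathcal{W}} = -\lambda_{con}\mathcal{W} + \bar{d}$, whose solution is the envelope that appears in (31). The sketch below uses only the Python standard library; the values of `lam_con`, `d_bar`, and `w0` are arbitrary illustrative numbers, not quantities from the experiments.

```python
import math

def envelope(w0, lam_con, d_bar, t):
    """Closed-form solution of W' = -lam_con * W + d_bar with W(0) = w0."""
    decay = math.exp(-lam_con * t)
    return w0 * decay + d_bar / lam_con * (1.0 - decay)

def simulate(w0, lam_con, d_bar, t_end, dt=1e-3):
    """Forward-Euler integration of the same worst-case scalar system."""
    w = w0
    for _ in range(int(round(t_end / dt))):
        w += dt * (-lam_con * w + d_bar)
    return w

# Hypothetical rate and disturbance level
w_sim = simulate(w0=2.0, lam_con=1.5, d_bar=0.3, t_end=10.0)
w_env = envelope(w0=2.0, lam_con=1.5, d_bar=0.3, t=10.0)
ball_radius = 0.3 / 1.5      # limit d_bar / lam_con as t grows
```

Both values approach `d_bar / lam_con`, which is the radius of the error ball in the $\mathcal{W}$ coordinate; dividing by $\sqrt{\lambda_{\min}(M)}$ converts it to a bound on $\|y\|$.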
V. EXPERIMENTAL VALIDATION
We implemented our learning-based composite-adaptation controller on an Intel Aero Ready to Fly Drone and tested it with three different trajectories for each of three different kernel functions. In each test, wind conditions were generated using CAST's open air wind tunnel, pictured in Fig. 1. The first test had the drone hover in increasing wind speeds. The second test had the drone move quickly between different set points with increasing wind speeds. These time-varying wind conditions showed the ability of the controller to adapt to new conditions in real time. The third test had the drone fly in a figure 8 pattern in constant wind to demonstrate performance over a dynamic trajectory. The prediction error, composite velocity tracking error, and position tracking error for each kernel and each trajectory are listed in Fig. 7.

The Intel Aero Drone incorporates a PX4 flight controller with the Intel Aero Compute Board, which runs Linux on a 2.56GHz Intel Atom x7 processor with 4 GB RAM. The controller was implemented on the Linux board and sent thrust and attitude commands to the PX4 flight controller using MAVROS software. CAST's Optitrack motion capture system was used for global position information, which was broadcast to the drone via wifi. An EKF running on the PX4 controller filtered the IMU and motion capture information to produce position and velocity estimates.

[Figure 2: histograms of $f_{a,x}$, $f_{a,y}$, and $f_{a,z}$ (g) at wind speeds of 0, 1.3, 2.5, 3.7, and 4.9 m/s.]
Fig. 2. Distributions of collected data. There are significant shifts in $f_{a,x}$ and $f_{a,z}$ under different wind speeds.

[Figure 3: panels "Trajectory of adaptation data" (px, py, pz), "$f_{a,x}$ of adaptation data", "$f_{a,z}$ of adaptation data", "Trajectory of validation data", "$f_{a,x}$ of validation data", and "$f_{a,z}$ of validation data"; legend: ground truth, vector kernel pred., scalar kernel pred., constant kernel pred.]
Fig. 3. These plots show the validation process for each of the three kernel methods discussed in the Experiments section. The first row shows the least squares fit of the parameters $a$ for each kernel choice. The second row shows the prediction performance for $a$ taken from the least squares estimator in the first row.
The CAST open air wind tunnel consists of approximately
1,400 distributed fans, each individually controllable, in a 3
by 3 meter grid.
A. Data collection and Kernel Training
Position, velocity, acceleration, and motor speed data were gathered by flying the drone on a random walk trajectory at 0, 1.3, 2.5, 3.7, and 4.9 m/s wind speeds for 2 minutes each, to generate training data. The trajectory was generated by randomly moving to different set points in a predefined cube centered in front of the wind tunnel. Then, using the dynamics equations defined previously, we computed the aerodynamic disturbance force, $f$.
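The force label for each sample can be obtained by solving (5a) for $f_a$. The sketch below assumes NumPy, a world frame with the z-axis pointing up (so the gravity vector is $[0, 0, -g]^T$), and a nominal thrust already computed from the motor speeds; the function name and the numbers are hypothetical.

```python
import numpy as np

def aerodynamic_force(mass, accel, R, thrust, g=9.81):
    """Solve (5a) for f_a: f_a = m * v_dot - m * g_vec - R @ f_u."""
    gravity = np.array([0.0, 0.0, -g])       # assumed z-up world frame
    f_u = np.array([0.0, 0.0, thrust])       # nominal thrust along the body z-axis
    return mass * np.asarray(accel, dtype=float) - mass * gravity - R @ f_u

# Level hover with thrust equal to weight: no residual force is attributed to wind
f_hover = aerodynamic_force(1.5, np.zeros(3), np.eye(3), thrust=1.5 * 9.81)

# Same attitude and thrust, but the vehicle accelerates at 1 m/s^2 along x
f_pushed = aerodynamic_force(2.0, [1.0, 0.0, 0.0], np.eye(3), thrust=2.0 * 9.81)
```

Any acceleration not explained by gravity and nominal thrust is attributed to $f_a$, so errors in the thrust model end up in the training labels as well.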
Three different kernel functions were used in the tests. The first was an identity kernel, $\phi \equiv I$. Note that with only a tracking error update term in the adaptation law, this would be equivalent to integral control. The second and third kernels were the vector and scalar kernels, defined in (2) and (3), respectively.

During offline training, the kernels were validated by estimating $a$ using a least squares estimator on a continuous segment of a validation trajectory. Then, the predicted force was compared to the measured force for another part of the validation trajectory. This can be seen in Fig. 3.
Fig. 4. These plots visualize the results from hovering in variable wind conditions. Note that the increase in wind speed can be clearly seen at 0, 15, and 25 seconds in each plot. Plot (a) shows that each choice of kernel function leads to similar composite velocity tracking error convergence. Plot (b) shows that the drone gained efficiency and required a lower throttle to maintain hover as wind speed increased. Plot (c) shows the increase in the magnitude of the measured aerodynamic force as the wind speed increased.
B. Hovering in Increasing Wind
In this test, the drone was set to hover at a fixed height centered in the wind tunnel test section. The wind tunnel was set to 2.5 m/s for 15 seconds, then 4.3 m/s for the next 10 seconds, then 6.2 m/s for 10 seconds. The results from this test are shown in Fig. 4.

In this test we see that each controller achieves similar parameter convergence. In each case, as the drone converges to the target hover position, the kernel functions approach a constant value, and each controller uses the same adaptation law, which probably leads to the similar convergence properties. Note, however, as seen in Fig. 7, that the prediction error is lower for the learned kernels, as the learned kernels likely capture some of the variation in aerodynamic forces with the change in state.
C. Random Walk with Increasing Wind
The second test had the drone move quickly between
random set points in a cube centered in front of the wind
tunnel, with a new set point generated every second for 60
seconds. For the first 20 seconds, the wind tunnel was set
to 2.5 m/s, for the second 20 seconds, 4.3 m/s, and for the
last 20 seconds, 6.2 m/s. The random number generator seed
was fixed before each test so that each controller received
the exact same set points.
Note that the desired trajectory for this test had sudden changes in desired acceleration and velocity when the set point was moved. Thus, the composite velocity error is significantly higher than in the other tests. The learned kernel methods in both cases outperformed the constant kernel method in prediction error, but all three methods had similar tracking error, as seen in Fig. 7. A histogram of the prediction error at each time step at which the control input was calculated is shown in Fig. 5. Here we can see a slight skew of the constant kernel toward higher prediction error compared to the learned kernel methods. Although not shown in the plots here, we also noticed that for each choice of kernel, the input was completely saturated at several points throughout the test due to the discontinuities in the desired trajectory.
Fig. 5. Histogram of prediction error for each choice of kernel in the set point tracking test. A slight bias of the constant kernel towards higher prediction error can be seen here.

[Figure 6: panels (a) and (b), z-position (m) versus x-position (m); legend: constant kernel, vector kernel, scalar kernel, desired.]
Fig. 6. In plot (a), the first four laps around the figure 8 trajectory are shown. The parameters were initialized to $0$ for each kernel, leading to very high initial tracking error. For each choice of kernel, the tracking error converged near some limit cycle behavior, as seen in plot (b).
D. Figure 8 with Constant Wind
The third trajectory was a figure 8 pattern oriented up and down (z-axis) and towards and away from the wind tunnel (x-axis). This test used a fixed wind speed of 4.3 m/s. In each test, the drone was started from hovering near the center of the figure 8. Then, the wind tunnel was turned on and allowed to ramp up for 5 seconds, and the figure 8 trajectory was flown repeatedly for one minute. Each loop around the figure 8 took 8 seconds.

In this test, we see the most striking difference between the prediction error performance of the learned kernels versus the constant kernel, as seen in Fig. 7. Despite the significantly better prediction error, however, the velocity and position tracking errors were still comparable between the three kernel choices: constant kernel, vector kernel, and scalar kernel. Examining the estimated xz position for each test, plotted in Fig. 6, can offer some additional insight. For each controller, the first one to two laps around the figure 8 pattern show large tracking errors. However, even after the parameters converge to a limit cycle, each controller is unable to turn fast enough around the corners of the figure 8. This suggests that tracking error in the attitude controller is causing the position tracking performance degradation. Two possible causes are that the trajectory required a quick turning speed, causing input saturation, or that disturbance torques arose from aerodynamic effects associated with the wind.
E. Results
Three key error metrics are given for each kernel and trajectory in Fig. 7. The learned kernels were able to consistently outperform the constant kernel function in prediction error, suggesting that the learned kernels were able to capture part of the position and velocity dependence of the aerodynamic effects. This trend becomes more pronounced in the more dynamic trajectories. However, in each test, similar composite velocity and position tracking error is seen.

                         Mean error metric
Trajectory   Kernel      $\|e_1\|$ (N)   $\|s\|$ (m/s)   $\|q - q_d\|$ (m)
Hover        constant    0.75            0.11            0.13
Hover        vector      0.72            0.12            0.13
Hover        scalar      0.71            0.11            0.12
Set point    constant    1.58            0.63            0.25
Set point    vector      1.45            0.61            0.24
Set point    scalar      1.42            0.60            0.24
Figure 8     constant    0.91            0.24            0.20
Figure 8     vector      0.74            0.26            0.22
Figure 8     scalar      0.75            0.25            0.21

Fig. 7. Prediction error (mean($\|e_1\|$)), composite velocity error (mean($\|s\|$)), and position tracking error (mean($\|q - q_d\|$)) for each kernel function and trajectory tested.
VI. RESULTS AND DISCUSSION
In this paper we have presented an integrated approach that uses prior data to develop a drone controller capable of adapting to new and changing wind conditions. A meta-learning formulation of the offline training helped us design kernel functions that can represent the dynamics effects observed in the training data. Then we designed an adaptive controller that can exponentially stabilize the system.

In our experiments, we saw that the learned kernels were able to reduce prediction error compared with a constant kernel. However, this did not translate into improved tracking error performance. We believe that this could be caused by a combination of attitude tracking error, input saturation, and dependence of unmodelled dynamics on control input. In our tests we saw both input saturation and attitude tracking error lead to increased position tracking error. We also know that different aerodynamic effects can cause a change in rotor thrust, usually modelled as a change in the coefficient of thrust.

All of our tests have demonstrated the viability of our approach. Our tests have led us to two conclusions. First, for the tests we ran, the adaptive control formulation (with either constant kernel or learned kernel) is able to effectively compensate for the unmodelled aerodynamic effects and adapt to changing wind conditions in real time. Second, we have demonstrated that our approach can incorporate learned dynamics into a robust control design.
VII. ACKNOWLEDGEMENTS
We thank Yisong Yue, Animashree Anandkumar, Kamyar
Azizzadenesheli, Joel Burdick, Mory Gharib, Daniel Pastor
Moreno, and Anqi Liu for helpful discussions. The work is
funded in part by Caltech’s Center for Autonomous Systems
and Technologies and Raytheon Company.
REFERENCES
[1] J.-J. E. Slotine, W. Li, et al., Applied nonlinear control. Prentice Hall, Englewood Cliffs, NJ, 1991, vol. 199, no. 1.
[2] J. A. Farrell and M. M. Polycarpou, Adaptive approximation based control: unifying neural, fuzzy and traditional adaptive approximation approaches. John Wiley & Sons, 2006, vol. 48.
[3] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems, 2017, pp. 4077–4087.
[4] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al., “Overcoming catastrophic forgetting in neural networks,” Proceedings of the national academy of sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
[5] F. Zenke, B. Poole, and S. Ganguli, “Continual learning through synaptic intelligence,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 3987–3995.
[6] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, “Meta-learning with memory-augmented neural networks,” in International conference on machine learning, 2016, pp. 1842–1850.
[7] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 1126–1135.
[8] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011, pp. 2520–2525.
[9] L. Meier, P. Tanskanen, L. Heng, G. H. Lee, F. Fraundorfer, and M. Pollefeys, “Pixhawk: A micro aerial vehicle design for autonomous flight using onboard computer vision,” Autonomous Robots, vol. 33, no. 1-2, pp. 21–39, 2012.
[10] E. Tal and S. Karaman, “Accurate tracking of aggressive quadrotor trajectories using incremental nonlinear dynamic inversion and differential flatness,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 4282–4288.
[11] M. Faessler, A. Franchi, and D. Scaramuzza, “Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 620–626, 2017.
[12] X. Shi, K. Kim, S. Rahili, and S.-J. Chung, “Nonlinear control of autonomous flying cars with wings and distributed electric propulsion,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 5326–5333.
[13] J. Nakanishi, J. A. Farrell, and S. Schaal, “A locally weighted learning composite adaptive controller with structure adaptation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1. IEEE, 2002, pp. 882–889.
[14] M. Bisheban and T. Lee, “Geometric adaptive control with neural networks for a quadrotor uav in wind fields,” arXiv preprint arXiv:1903.02091, 2019.
[15] N. Levine, T. Zahavy, D. J. Mankowitz, A. Tamar, and S. Mannor, “Shallow updates for deep reinforcement learning,” in Advances in Neural Information Processing Systems, 2017, pp. 3135–3145.
[16] K. Azizzadenesheli, E. Brunskill, and A. Anandkumar, “Efficient exploration through bayesian deep q-networks,” in 2018 Information Theory and Applications Workshop (ITA). IEEE, 2018, pp. 1–9.
[17] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.
[18] G. Shi, X. Shi, M. O’Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural lander: Stable drone landing control using learned dynamics,” arXiv preprint arXiv:1811.08027, 2018.
[19] J.-J. E. Slotine and W. Li, “Composite adaptive control of robot manipulators,” Automatica, vol. 25, no. 4, pp. 509–519, 1989.
[20] H. K. Khalil, Nonlinear systems; 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 2002. [Online]. Available: https://cds.cern.ch/record/1173048