Supplementary Materials for
Neural-Fly enables rapid learning for agile flight in strong winds
Michael O'Connell et al.
Corresponding author: Soon-Jo Chung, sjchung@caltech.edu

Sci. Robot. 7, eabm6597 (2022)
DOI: 10.1126/scirobotics.abm6597

This PDF file includes:
Sections S1 to S8
Figs. S1 to S4
Tables S1 to S3
References (53-58)
SUPPLEMENTARY TEXT
Section S1 Drone Configuration Details
Table S1 presents the configuration information of the custom-built drone (fig. 1(A)) and the Intel Aero drone. We use both drones for data collection and use the custom-built drone exclusively for experiments.
Precision tracking for drones often relies on specialized hardware and optimized vehicle design, whereas our method achieves precise tracking using improved dynamics prediction through online learning. Although most researchers report the numeric tracking error of their method, it can be difficult to disentangle the improvement resulting from the algorithmic advancement from the improvement due to specialized hardware. For example, the moment of inertia generally scales with the radius squared and the lever arm for the motors scales with the radius, so the attitude maneuverability roughly scales with the inverse of the vehicle radius. Similarly, a high thrust-to-weight ratio provides more attitude control authority during high-acceleration maneuvers. More powerful motors, electronic speed controllers, and batteries together allow faster motor response time, further improving maneuverability. Thus, state-of-the-art (SOTA) tracking performance usually requires specialized hardware often used for racing drones, resulting in a vehicle with greater maneuverability than our platform, a higher thrust-to-weight ratio, and high-rate controllers sometimes even including direct motor RPM control. In contrast, our custom drone is more representative of typical consumer drone hardware. A detailed comparison with the hardware from some recent work in agile flight control is provided in Table S2.
Section S2 The Expressiveness of the Learning Architecture
In this section, we theoretically justify the decomposition $f(x,w) \approx \phi(x)a(w)$. In particular, we prove that any analytic function $\bar{f}(x,w) : [-1,1]^n \times [-1,1]^m \to \mathbb{R}$ can be split into a $w$-invariant part $\bar{\phi}(x)$ and a $w$-dependent part $\bar{a}(w)$ in the structure $\bar{\phi}(x)\bar{a}(w)$ with arbitrary precision $\epsilon$, where $\bar{\phi}(x)$ and $\bar{a}(w)$ are two polynomials. Further, the dimension of $\bar{a}(w)$ only scales polylogarithmically with $1/\epsilon$.
We first introduce the following multivariate polynomial approximation lemma in the hypercube, proved in (52).
Lemma 2. (Multivariate polynomial approximation in the hypercube) Let $\bar{f}(x,w) : [-1,1]^n \times [-1,1]^m \to \mathbb{R}$ be a smooth function of $[x,w] \in [-1,1]^{n+m}$ for $n,m \geq 1$. Assume $\bar{f}(x,w)$ is analytic for all $[x,w] \in \mathbb{C}^{n+m}$ with $\Re\left(x_1^2 + \cdots + x_n^2 + w_1^2 + \cdots + w_m^2\right) \geq -t^2$ for some $t > 0$, where $\Re(\cdot)$ denotes the real part of a complex number. Then $\bar{f}$ has a uniformly and absolutely convergent multivariate Chebyshev series
$$\bar{f}(x,w) = \sum_{k_1=0}^{\infty} \cdots \sum_{k_n=0}^{\infty} \sum_{l_1=0}^{\infty} \cdots \sum_{l_m=0}^{\infty} b_{k_1,\cdots,k_n,l_1,\cdots,l_m} T_{k_1}(x_1) \cdots T_{k_n}(x_n)\, T_{l_1}(w_1) \cdots T_{l_m}(w_m).$$
Define $s = [k_1,\cdots,k_n,l_1,\cdots,l_m]$. The multivariate Chebyshev coefficients satisfy the following exponential decay property: $b_s = \mathcal{O}\left((1+t)^{-\|s\|_2}\right)$.
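As a quick numerical illustration of this decay property (our example, not from the paper), the Chebyshev coefficients of a one-dimensional analytic function fall off geometrically; the test function $e^x$ and the interpolation degree are arbitrary choices.

```python
# Coefficient decay in Lemma 2 for the 1-D case: the Chebyshev interpolant of
# an analytic function on [-1, 1] has coefficients b_k that decay geometrically.
import numpy as np
from numpy.polynomial import chebyshev as C

coeffs = C.chebinterpolate(np.exp, 20)  # b_0, ..., b_20 for f(x) = exp(x)
print(np.abs(coeffs))                   # magnitudes decay roughly geometrically in k
```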
Note that this lemma shows that the truncated Chebyshev expansion
$$\mathcal{C}_p = \sum_{k_1=0}^{p} \cdots \sum_{k_n=0}^{p} \sum_{l_1=0}^{p} \cdots \sum_{l_m=0}^{p} b_{k_1,\cdots,k_n,l_1,\cdots,l_m} T_{k_1}(x_1) \cdots T_{k_n}(x_n)\, T_{l_1}(w_1) \cdots T_{l_m}(w_m)$$
will converge to $\bar{f}$ with the rate $\mathcal{O}\left((1+t)^{-p/\sqrt{n+m}}\right)$ for some $t > 0$, i.e., $\sup_{[x,w] \in [-1,1]^{n+m}} \left\|\bar{f}(x,w) - \mathcal{C}_p(x,w)\right\| \leq \mathcal{O}\left((1+t)^{-p/\sqrt{n+m}}\right)$. Finally, we are ready to present the following representation theorem.
Theorem 3. Let $\bar{f}(x,w)$ be a function satisfying the assumptions in Lemma 2. For any $\epsilon > 0$, there exist $h \in \mathbb{Z}^+$ and two Chebyshev polynomials $\bar{\phi}(x) : [-1,1]^n \to \mathbb{R}^{1 \times h}$ and $\bar{a}(w) : [-1,1]^m \to \mathbb{R}^{h \times 1}$ such that
$$\sup_{[x,w] \in [-1,1]^{n+m}} \left\|\bar{f}(x,w) - \bar{\phi}(x)\bar{a}(w)\right\| \leq \epsilon \quad \text{and} \quad h = \mathcal{O}\left((\log(1/\epsilon))^m\right).$$
Proof. First note that there exists $p = \mathcal{O}\left(\log(1/\epsilon)\sqrt{n+m}\right)$ such that $\sup_{[x,w] \in [-1,1]^{n+m}} \left\|\bar{f}(x,w) - \mathcal{C}_p(x,w)\right\| \leq \epsilon$. To simplify the notation, define
$$g(x,k,l) = g(x_1,\cdots,x_n,k_1,\cdots,k_n,l_1,\cdots,l_m) = b_{k_1,\cdots,k_n,l_1,\cdots,l_m} T_{k_1}(x_1) \cdots T_{k_n}(x_n)$$
$$g(w,l) = g(w_1,\cdots,w_m,l_1,\cdots,l_m) = T_{l_1}(w_1) \cdots T_{l_m}(w_m)$$
Then we have
$$\mathcal{C}_p(x,w) = \sum_{k_1,\cdots,k_n=0}^{p} \sum_{l_1,\cdots,l_m=0}^{p} g(x,k_1,\cdots,k_n,l_1,\cdots,l_m)\, g(w,l_1,\cdots,l_m)$$
Then we rewrite $\mathcal{C}_p$ as $\mathcal{C}_p(x,w) = \bar{\phi}(x)\bar{a}(w)$:
$$\bar{\phi}(x)^\top = \begin{bmatrix} \sum_{k_1,\cdots,k_n=0}^{p} g(x,k_1,\cdots,k_n,l=[0,0,\cdots,0]) \\ \sum_{k_1,\cdots,k_n=0}^{p} g(x,k_1,\cdots,k_n,l=[1,0,\cdots,0]) \\ \sum_{k_1,\cdots,k_n=0}^{p} g(x,k_1,\cdots,k_n,l=[2,0,\cdots,0]) \\ \vdots \\ \sum_{k_1,\cdots,k_n=0}^{p} g(x,k_1,\cdots,k_n,l=[p,p,\cdots,p]) \end{bmatrix}, \qquad \bar{a}(w) = \begin{bmatrix} g(w,l=[0,0,\cdots,0]) \\ g(w,l=[1,0,\cdots,0]) \\ g(w,l=[2,0,\cdots,0]) \\ \vdots \\ g(w,l=[p,p,\cdots,p]) \end{bmatrix}$$
Note that the dimension of $\bar{\phi}(x)$ and $\bar{a}(w)$ is
$$h = (p+1)^m = \mathcal{O}\left(\left(1 + \log(1/\epsilon)\sqrt{n+m}\right)^m\right) = \mathcal{O}\left((\log(1/\epsilon))^m\right)$$
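To make the construction concrete, the following sketch (our illustration for $n = m = 1$; the test function, grid size, and truncation order $p$ are arbitrary) fits a truncated bivariate Chebyshev expansion and evaluates it in the separable form $\bar{\phi}(x)\bar{a}(w)$:

```python
# Separable Chebyshev decomposition of an analytic f(x, w) on [-1, 1]^2,
# mirroring the proof of Theorem 3 with n = m = 1 (so h = p + 1).
import numpy as np
from numpy.polynomial import chebyshev as C

def f_bar(x, w):
    return np.exp(-x * w) * np.sin(x + 2 * w)  # analytic test function

p = 10
xg = np.cos(np.pi * (np.arange(64) + 0.5) / 64)          # Chebyshev nodes
X, W = np.meshgrid(xg, xg, indexing="ij")
# Least-squares fit of the coefficients b_{k,l} in sum_{k,l} b_{k,l} T_k(x) T_l(w).
Tx = np.stack([C.Chebyshev.basis(k)(xg) for k in range(p + 1)])  # (p+1, 64)
b = np.linalg.lstsq(np.kron(Tx.T, Tx.T), f_bar(X, W).reshape(-1), rcond=None)[0]
B = b.reshape(p + 1, p + 1)                               # B[k, l] = b_{k,l}

# w-invariant part: phi(x)[l] = sum_k b_{k,l} T_k(x); w-dependent part: a(w)[l] = T_l(w).
phi = lambda x: np.array([C.Chebyshev(B[:, l])(x) for l in range(p + 1)])
a = lambda w: np.array([C.Chebyshev.basis(l)(w) for l in range(p + 1)])

x0, w0 = 0.3, -0.7
print(abs(phi(x0) @ a(w0) - f_bar(x0, w0)))               # truncation error, small for p = 10
```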
Note that Theorem 3 can be generalized to vector-valued functions with bounded input space straightforwardly. Finally, since deep neural networks are universal approximators for polynomials (53), Theorem 3 immediately guarantees the expressiveness of our learning structure, i.e., $\phi(x)a(w)$ can approximate $f(x,w)$ with arbitrary precision, where $\phi(x)$ is a deep neural network and $\hat{a}$ includes the linear coefficients for all the elements of $f$. In experiments, we show that a four-layer neural network can efficiently learn an effective representation for the underlying unknown dynamics $f(x,w)$.
Section S3 Hyperparameters for DAIML and the Interpretation
We implemented DAIML (Algorithm 1) using PyTorch, with hyperparameters reported in Table S3. We iteratively tuned these hyperparameters by trial and error. We notice that the behavior of the learning algorithm is not sensitive to most of the parameters in Table S3. The training process is shown in fig. S1, where we present the $f$-loss curve on both the training set and the validation set using three random seeds. The $f$-loss is defined by $\sum_{i \in B} \left\|y^{(i)}_k - \phi(x^{(i)}_k)a\right\|^2$ (see Line 7 in Algorithm 1), which reflects how well $\phi$ can approximate the unknown dynamics $f(x,w)$. The validation set we considered is from the figure-8 trajectory tracking tasks using the PID and nonlinear baseline methods. Note that the training set consists of a very different set of trajectories (using random waypoint tracking, see Results), and this difference is for studying whether and when the learned model $\phi$ starts over-fitting during the training process.
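For concreteness, a minimal PyTorch sketch of this loss is given below; the per-batch least-squares fit of $a$ and all shapes are our assumptions for illustration, not the released training code.

```python
import torch

def f_loss(phi_net, x, y):
    """Sketch of the f-loss: x holds (B, n) states and y holds (B, 3) measured
    residual forces, all drawn from a single wind condition."""
    Phi = phi_net(x)                                  # (B, h) learned basis
    h = Phi.shape[1]
    # Ridge-regularized least-squares fit of the condition-specific a: (h, 3).
    a = torch.linalg.solve(Phi.T @ Phi + 1e-6 * torch.eye(h), Phi.T @ y)
    return ((Phi @ a - y) ** 2).sum(dim=1).mean()     # mean of ||y_i - phi(x_i) a||^2
```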
We emphasize a few important parameters as follows. (i) The frequency $0 < \eta \leq 1$ controls how often the discriminator $h$ is updated. Note that $\eta = 1$ corresponds to the case where $\phi$ and $h$ are both updated in each iteration. We use $\eta = 0.5$ for training stability, which is also commonly used in training generative adversarial networks (49). (ii) The regularization parameter $\alpha \geq 0$. Note that $\alpha = 0$ corresponds to the non-adversarial meta-learning case, which does not incorporate the adversarial regularization term in Eq. (5). From fig. S1, clearly a proper choice of $\alpha$ can effectively avoid over-fitting. Moreover, another benefit of having $\alpha > 0$ is that the learned model is more explainable. As observed in the t-SNE visualization in Fig. 4, $\alpha > 0$ disentangles the linear coefficients $a$ between wind conditions. However, if $\alpha$ is too high it may degrade the prediction performance, so we recommend using a relatively small value for $\alpha$, such as $0.1$. A schematic of this alternating update is sketched below.
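The sketch below summarizes how $\eta$ and $\alpha$ enter the alternating update; it is a self-contained paraphrase of Algorithm 1 on synthetic data, in which the network sizes, optimizers, batch construction, and the discriminator input (the flattened coefficients $a$) are all illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical dimensions: state dim 6, basis dim 12, force dim 3, 4 wind conditions.
phi_net = torch.nn.Sequential(torch.nn.Linear(6, 64), torch.nn.ReLU(), torch.nn.Linear(64, 12))
h_net = torch.nn.Sequential(torch.nn.Linear(12 * 3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
phi_opt = torch.optim.Adam(phi_net.parameters(), lr=1e-3)
h_opt = torch.optim.Adam(h_net.parameters(), lr=1e-3)
eta, alpha = 0.5, 0.1                                  # the values discussed above

for step in range(1000):
    cond = torch.randint(4, (1,))                      # stand-in for the sampled wind condition
    x, y = torch.randn(256, 6), torch.randn(256, 3)    # stand-in for a batch from that condition
    Phi = phi_net(x)
    a = torch.linalg.solve(Phi.T @ Phi + 1e-6 * torch.eye(12), Phi.T @ y)  # per-batch linear fit
    pred_loss = ((Phi @ a - y) ** 2).sum(dim=1).mean() # the f-loss
    # Adversarial regularization: phi is rewarded when the discriminator h
    # cannot identify the wind condition from the fitted coefficients a.
    adv_loss = -F.cross_entropy(h_net(a.flatten())[None], cond)
    phi_opt.zero_grad()
    (pred_loss + alpha * adv_loss).backward()
    phi_opt.step()
    if step % 2 == 0:                                  # h updated at frequency eta = 0.5
        h_loss = F.cross_entropy(h_net(a.detach().flatten())[None], cond)
        h_opt.zero_grad()
        h_loss.backward()
        h_opt.step()
```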
The importance of having a domain-invariant representation. We use the following example to illustrate the importance of having a domain-invariant representation $\phi(x)$ for online adaptation. Suppose the data distributions in wind conditions 1 and 2 are $P_1(x)$ and $P_2(x)$, respectively, and they do not overlap. Ideally, we would hope these two conditions share an invariant representation and the latent variables are distinct ($a(1)$ and $a(2)$ in the first line in fig. S2 shown below). However, because of the expressiveness of DNNs, $\phi$ may memorize $P_1$ and $P_2$ and learn two modes $\phi_1(x)$ and $\phi_2(x)$. In the second line in fig. S2, $\phi_1$ and $\phi_2$ are triggered if $x$ is in $P_1$ and $P_2$, respectively ($1_{x \in P_1}$ and $1_{x \in P_2}$ are indicator functions), such that the latent variable $a$ is identical in both wind conditions. Such an overfitted $\phi$ is not robust and not generalizable: for example, if the drone flies to $P_1$ in wind condition 2, the wrong mode $\phi_1$ will be triggered. The key idea to tackle this challenge is to encourage diversity in the latent space, which is why we introduced a discriminator in DAIML. Figure 4 shows DAIML indeed makes the latent space much more disentangled.
Section S4 Discrete Version of the Proposed Controller
In practice, we implement Neural-Fly on a digital system, and therefore, we require a discrete version of the controller. The feedback control policy $u$ remains the same as presented in the main body of this article. However, the adaptation law must be integrated, and therefore we must be concerned with both the numerical accuracy and the computation time of this integration, particularly for the covariance matrix $P$. During the development of our algorithm, we observed that a naive one-step Euler integration of the continuous-time adaptation law would sometimes result in $P$ becoming non-positive-definite, due to a large $\dot{P}$ magnitude and a coarse integration step size (see (54) for more discussion on the positive definiteness of numerical integration of the differential Riccati equation). To avoid this issue, we instead implemented the adaptation law in two discrete steps, a propagation step and an update step, summarized below. We denote the time at step $k$ as $t_k$, the value of a parameter before the update step but after the propagation step with a superscript $-$ (as in $t_k^-$), and the value after both the propagation and update steps with a superscript $+$ (as in $t_k^+$). The value used in the controller is the value after both the propagation and update steps, that is, $\hat{a}(t_k) = \hat{a}_{t_k^+}$. During the propagation step in Eq. (15) and (16), both $\hat{a}$ and $P$ are regularized. Then, in the update step in Eq. (18) and (19), $P$ and $\hat{a}$ are updated according to the gain in Eq. (17). This mirrors a discrete Kalman filter implementation (55) with the tracking error term added in the update step. The discrete Kalman filter exactly integrates the continuous-time Kalman filter when the prediction error $e$, tracking error $s$, and learned basis functions $\phi$ are constant between time steps, ensuring the positive definiteness of $P$.
$$\hat{a}_{t_k^-} = \underbrace{(1 - \lambda \Delta t_k)}_{\text{damping}}\, \hat{a}_{t_{k-1}^+} \tag{15}$$
$$P_{t_k^-} = (1 - \lambda \Delta t_k)^2 P_{t_{k-1}^+} + Q \Delta t_k \tag{16}$$
$$K_{t_k} = P_{t_k^-} \phi_{t_k}^\top \left(\phi_{t_k} P_{t_k^-} \phi_{t_k}^\top + R/\Delta t_k\right)^{-1} \tag{17}$$
$$\hat{a}_{t_k^+} = \hat{a}_{t_k^-} - \underbrace{K_{t_k}\left(\phi_{t_k}\hat{a}_{t_k^-} - y_{t_k}\right)}_{\text{prediction error adaptation}} + \underbrace{\Delta t_k\, P_{t_k^-}\phi_{t_k}^\top s_{t_k}}_{\text{tracking error adaptation}} \tag{18}$$
$$P_{t_k^+} = \left(I - K_{t_k}\phi_{t_k}\right) P_{t_k^-} \left(I - K_{t_k}\phi_{t_k}\right)^\top + K_{t_k}\left(R/\Delta t_k\right) K_{t_k}^\top \tag{19}$$
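For reference, a minimal numpy sketch of one propagation-plus-update cycle is given below. It follows our reading of the reconstructed Eq. (15) to (19), including the $\Delta t_k$ factors and the discretization of $R$; the function signature and dimensions are illustrative, not the flight code.

```python
import numpy as np

def adapt_step(a_hat, P, Phi, y, s, dt, lam, Q, R):
    """One discrete adaptation cycle; returns the post-update (a_hat, P)."""
    # Propagation, Eq. (15)-(16): damp the estimate, damp and inflate the covariance.
    a_hat = (1.0 - lam * dt) * a_hat
    P = (1.0 - lam * dt) ** 2 * P + Q * dt
    # Gain, Eq. (17), with the continuous measurement noise R discretized as R / dt.
    K = P @ Phi.T @ np.linalg.inv(Phi @ P @ Phi.T + R / dt)
    # Update, Eq. (18): Kalman-style prediction-error term plus tracking-error term.
    a_hat = a_hat - K @ (Phi @ a_hat - y) + dt * (P @ Phi.T @ s)
    # Joseph-form covariance update, Eq. (19), which preserves positive definiteness.
    I = np.eye(P.shape[0])
    P = (I - K @ Phi) @ P @ (I - K @ Phi).T + K @ (R / dt) @ K.T
    return a_hat, P
```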
Section S5 Stability and Robustness Formal Guarantees and Proof
We divide the proof of Eq. (12) into two steps. First, in Theorem 4, we show that the combined composite velocity tracking error and adaptation error, $[s; \tilde{a}]$, exponentially converges to a bounded error ball. This implies the exponential convergence of $s$. Then in Corollary 5 we show that when $s$ is exponentially bounded, $\tilde{q}$ is also exponentially bounded. Combining the exponential bound from Theorem 4 and the ultimate bound from Corollary 5 proves Theorem 1.
Before discussing the main proof, let us consider the robustness properties of the feedback controller without considering any specific adaptation law. Taking the dynamics Eq. (1), control law Eq. (7), the composite velocity error definition Eq. (10), and the parameter estimation error $\tilde{a} = \hat{a} - a$, we find
$$M\dot{s} + (C + K)s = -\phi\tilde{a} + d \tag{20}$$
We can use the Lyapunov function $V = s^\top M s$ under the assumption of bounded $\tilde{a}$ to show that
$$\lim_{t\to\infty} \|s\| \leq \sup_t \left\|d - \phi\tilde{a}\right\| \frac{\lambda_{\max}(M)}{\lambda_{\min}(K)\,\lambda_{\min}(M)} \tag{21}$$
Taking this result alone, one might expect that any online estimator or learning algorithm would lead to good performance. However, the boundedness of $\tilde{a}$ is not guaranteed; Slotine and Li discuss this topic thoroughly (15).
In the full proof below, we show the stability and robustness of the Neural-Fly adaptation algorithm.
First, we introduce the parameter measurement noise $\bar{\epsilon}$, where $\bar{\epsilon} = y - \phi a$. Thus, $\bar{\epsilon} = \epsilon + d$ and $\|\bar{\epsilon}\| \leq \|\epsilon\| + \|d\|$ by the triangle inequality. Using the above closed-loop dynamics Eq. (20), the parameter estimation error $\tilde{a}$, and the adaptation law Eq. (8) and (9), the combined velocity and parameter-error closed-loop dynamics are given by
$$\begin{bmatrix} M & 0 \\ 0 & P^{-1} \end{bmatrix}\begin{bmatrix} \dot{s} \\ \dot{\tilde{a}} \end{bmatrix} + \begin{bmatrix} C + K & \phi \\ -\phi^\top & \phi^\top R^{-1}\phi + \lambda P^{-1} \end{bmatrix}\begin{bmatrix} s \\ \tilde{a} \end{bmatrix} = \begin{bmatrix} d \\ \phi^\top R^{-1}\bar{\epsilon} - \lambda P^{-1}a - P^{-1}\dot{a} \end{bmatrix} \tag{22}$$
$$\frac{d}{dt}\left(P^{-1}\right) = -P^{-1}\dot{P}P^{-1} = -P^{-1}\left(-2\lambda P + Q - P\phi^\top R^{-1}\phi P\right)P^{-1} \tag{23}$$
For our stability proof, we rely on the fact that $P^{-1}$ is both uniformly positive definite and uniformly bounded, that is, there exist some positive definite, constant matrices $A$ and $B$ such that $A \preceq P^{-1} \preceq B$. Dieci and Eirola (54) show the slightly weaker result that $P$ is positive definite and finite when $\phi$ is bounded, under the looser assumption $Q \succeq 0$. Following the proof from (54) with the additional assumption that $Q$ is uniformly positive definite, one can show the uniform positive definiteness and uniform boundedness of $P$. Hence, $P^{-1}$ is also uniformly positive definite and uniformly bounded.
Theorem 4. Given dynamics that evolve according to Eq. (22) and (23), and uniform positive definiteness and uniform boundedness of $P^{-1}$, the norm of $[s; \tilde{a}]$ exponentially converges to the bound given in Eq. (24) with rate $\alpha$:
$$\lim_{t\to\infty}\left\|\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}\right\| \leq \frac{1}{\alpha\,\lambda_{\min}(\mathcal{M})}\left(\sup_t \|d\| + \sup_t\left\|\phi^\top R^{-1}\bar{\epsilon}\right\| + \lambda_{\max}\left(P^{-1}\right)\sup_t\left(\lambda\|a\| + \|\dot{a}\|\right)\right) \tag{24}$$
where $\alpha$ and $\mathcal{M}$ are functions of $\phi$, $R$, $Q$, $K$, $M$, and $\lambda$, and $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ are the minimum and maximum eigenvalues of $(\cdot)$ over time, respectively. Given Corollary 5 and Eq. (24), the bound in Eq. (12) is proven. Note $\lambda_{\max}(P^{-1}) = 1/\lambda_{\min}(P)$, and a sufficiently large value of $\lambda_{\min}(P)$ will make the RHS of Eq. (24) small.
Proof. Now consider the Lyapunov function $V$ given by
$$V = \begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} M & 0 \\ 0 & P^{-1} \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} \tag{25}$$
This Lyapunov function has the derivative
$$\dot{V} = 2\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} M & 0 \\ 0 & P^{-1} \end{bmatrix} \begin{bmatrix} \dot{s} \\ \dot{\tilde{a}} \end{bmatrix} + \begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} \dot{M} & 0 \\ 0 & \frac{d}{dt}\left(P^{-1}\right) \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} \tag{26}$$
$$= -2\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} C + K & \phi \\ -\phi^\top & \phi^\top R^{-1}\phi + \lambda P^{-1} \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} + 2\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} d \\ \phi^\top R^{-1}\bar{\epsilon} - \lambda P^{-1}a - P^{-1}\dot{a} \end{bmatrix} + \begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} \dot{M} & 0 \\ 0 & \frac{d}{dt}\left(P^{-1}\right) \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} \tag{27, 28}$$
$$= -2\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} K & \phi \\ -\phi^\top & \phi^\top R^{-1}\phi + \lambda P^{-1} \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} + 2\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} d \\ \phi^\top R^{-1}\bar{\epsilon} - \lambda P^{-1}a - P^{-1}\dot{a} \end{bmatrix} + \begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} 0 & 0 \\ 0 & 2\lambda P^{-1} - P^{-1}QP^{-1} + \phi^\top R^{-1}\phi \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} \tag{29, 30}$$
$$= -\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} 2K & 0 \\ 0 & \phi^\top R^{-1}\phi + P^{-1}QP^{-1} \end{bmatrix} \begin{bmatrix} s \\ \tilde{a} \end{bmatrix} + 2\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}^\top \begin{bmatrix} d \\ \phi^\top R^{-1}\bar{\epsilon} - \lambda P^{-1}a - P^{-1}\dot{a} \end{bmatrix} \tag{31}$$
where we used the fact that $\dot{M} - 2C$ is skew-symmetric. As $K$, $P^{-1}QP^{-1}$, $M$, and $P^{-1}$ are all uniformly positive definite and uniformly bounded, and $\phi^\top R^{-1}\phi$ is positive semidefinite, there exists some $\alpha > 0$ such that
$$\begin{bmatrix} 2K & 0 \\ 0 & \phi^\top R^{-1}\phi + P^{-1}QP^{-1} \end{bmatrix} \succeq 2\alpha \begin{bmatrix} M & 0 \\ 0 & P^{-1} \end{bmatrix} \tag{32}$$
for all $t$. Define an upper bound for the disturbance term $D$ as
$$D = \sup_t \left\|\begin{bmatrix} d \\ \phi^\top R^{-1}\bar{\epsilon} - \lambda P^{-1}a - P^{-1}\dot{a} \end{bmatrix}\right\| \tag{33}$$
and define the function $\mathcal{M}$,
$$\mathcal{M} = \begin{bmatrix} M & 0 \\ 0 & P^{-1} \end{bmatrix} \tag{34}$$
By Eq. (32), the Cauchy-Schwarz inequality, and the definition of the minimum eigenvalue, we have the following inequality for $\dot{V}$:
$$\dot{V} \leq -2\alpha V + 2\sqrt{V}\,\frac{D}{\sqrt{\lambda_{\min}(\mathcal{M})}} \tag{35}$$
Consider the related system $W$, where $W = \sqrt{V}$ and $2\dot{W}W = \dot{V}$, so that
$$2\dot{W}W \leq -2\alpha W^2 + \frac{2DW}{\sqrt{\lambda_{\min}(\mathcal{M})}} \tag{36}$$
$$\dot{W} \leq -\alpha W + \frac{D}{\sqrt{\lambda_{\min}(\mathcal{M})}} \tag{37}$$
By the Comparison Lemma (56),
$$\sqrt{V} = W \leq e^{-\alpha t}\left(W(0) - \frac{D}{\alpha\sqrt{\lambda_{\min}(\mathcal{M})}}\right) + \frac{D}{\alpha\sqrt{\lambda_{\min}(\mathcal{M})}} \tag{38}$$
and the stacked state exponentially converges to the ball
$$\lim_{t\to\infty}\left\|\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}\right\| \leq \frac{D}{\alpha\,\lambda_{\min}(\mathcal{M})} \tag{39}$$
This completes the proof.
Next, we present a corollary which shows the exponential convergence of $\tilde{q}$ when $s$ is exponentially stable.
Corollary 5. If $\|s(t)\| \leq A\exp(-\alpha t) + B/\alpha$ for some constants $A$, $B$, and $\alpha$, and $s = \dot{\tilde{q}} + \Lambda\tilde{q}$, then
$$\|\tilde{q}\| \leq e^{-\lambda_{\min}(\Lambda)t}\|\tilde{q}(0)\| + \int_0^t e^{-\lambda_{\min}(\Lambda)(t-\tau)} A e^{-\alpha\tau}\, d\tau + \int_0^t e^{-\lambda_{\min}(\Lambda)(t-\tau)}\,\frac{B}{\alpha}\, d\tau \tag{40}$$
thus $\tilde{q}$ exponentially approaches the bound
$$\lim_{t\to\infty}\|\tilde{q}\| \leq \frac{B}{\alpha\,\lambda_{\min}(\Lambda)} \tag{41}$$
Proof. From the Comparison Lemma (56), we can easily show Eq. (40). This can be further reduced as follows:
$$\|\tilde{q}\| \leq e^{-\lambda_{\min}(\Lambda)t}\|\tilde{q}(0)\| + A e^{-\lambda_{\min}(\Lambda)t}\int_0^t e^{(\lambda_{\min}(\Lambda) - \alpha)\tau}\, d\tau + \int_0^t e^{-\lambda_{\min}(\Lambda)(t-\tau)}\,\frac{B}{\alpha}\, d\tau \tag{42}$$
$$\leq e^{-\lambda_{\min}(\Lambda)t}\|\tilde{q}(0)\| + A\,\frac{e^{-\alpha t} - e^{-\lambda_{\min}(\Lambda)t}}{\lambda_{\min}(\Lambda) - \alpha} + \frac{B\left(1 - e^{-\lambda_{\min}(\Lambda)t}\right)}{\alpha\,\lambda_{\min}(\Lambda)} \tag{43}$$
Taking the limit, we arrive at Eq. (41).
With the following corollary, we justify that $\alpha$ is strictly positive even when $\phi \equiv 0$, and thus the adaptive control algorithm guarantees robustness even in the absence of persistent excitation or with ineffective learning. In practice we expect some measurement information about all the elements of $a$, that is, we expect a non-zero $\phi$.
Corollary 6. If $\phi \equiv 0$, then the bound in Eq. (24) can be simplified to
$$\lim_{t\to\infty}\left\|\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}\right\| \leq \frac{\sup_t \|d\| + \lambda_{\max}\left(P^{-1}\right)\sup_t\left(\lambda\|a\| + \|\dot{a}\|\right)}{\min\left(\lambda,\ \lambda_{\min}(K)/\lambda_{\max}(M)\right)\lambda_{\min}(\mathcal{M})} \tag{44}$$
Proof. Assuming $\phi \equiv 0$ immediately leads to an $\alpha$ of
$$\alpha = \min\left(\frac{1}{2}\lambda_{\min}\left(P^{-1}Q\right),\ \frac{\lambda_{\min}(K)}{\lambda_{\max}(M)}\right) \tag{45}$$
$\phi \equiv 0$ also simplifies the $\dot{P}$ equation to a stable first-order differential matrix equation. By integrating this simplified $\dot{P}$ equation, we can show $P$ exponentially converges to the value $P = \frac{Q}{2\lambda}$. This leads to the bound in Eq. (44).
We now introduce another corollary for Neural-Fly-Constant, where $\phi = I$. In this case, the regularization term is not needed, as it is intended to regularize the linear coefficient estimate in the absence of persistent excitation, so we set $\lambda = 0$. This corollary also shows that Neural-Fly-Constant is sufficient for perfect tracking control when $f$ is constant; though in this case, even the nonlinear baseline controller with integral control will converge to perfect tracking. In practice for quadrotors, we only expect $f$ to be constant when the drone air-velocity is constant, such as in hover or steady level flight with constant wind velocity.
Corollary 7. If $\phi \equiv I$, $Q = qI$, $R = rI$, $\lambda = 0$, and $P(0) = p_0 I$ is diagonal, where $q$, $r$, and $p_0$ are strictly positive scalar constants, then the bound in Eq. (24) can be simplified to
$$\lim_{t\to\infty}\left\|\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}\right\| \leq \left(\left(1 + r^{-1}\right)\sup_t \|f - a\| + \sup_t\|\epsilon\|/r\right)\frac{\lambda_{\max}(M)}{\lambda_{\min}(K)\,\lambda_{\min}(\mathcal{M})} \tag{46}$$
Proof. Under these assumptions, the matrix differential equation for $P$ is reduced to the scalar differential equation
$$\frac{dp}{dt} = q - p^2/r \tag{47}$$
where $P(t) = p(t)I$. This equation can be integrated to find that $p$ exponentially converges to $p = \sqrt{qr}$. Then by Eq. (32), $\alpha \leq \sqrt{q/r}$ and $\alpha \leq \lambda_{\min}(K)/\lambda_{\max}(M)$. If we choose $q$ and $r$ such that $\sqrt{q/r} = \lambda_{\min}(K)/\lambda_{\max}(M)$, then we can take $\alpha = \lambda_{\min}(K)/\lambda_{\max}(M)$. Then, the error bound reduces to
$$\lim_{t\to\infty}\left\|\begin{bmatrix} s \\ \tilde{a} \end{bmatrix}\right\| \leq D\,\frac{\lambda_{\max}(M)}{\lambda_{\min}(K)\,\lambda_{\min}(\mathcal{M})} \tag{48}$$
Take $a$ as a constant. Then $\dot{a} = 0$, $d = f - a$, and $D$ is bounded by
$$D \leq \left(1 + r^{-1}\right)\sup_t \|f - a\| + \sup_t\|\epsilon\|/r \tag{49}$$
Substituting Eq. (49) into Eq. (48) gives Eq. (46).
Section S6 Gain Tuning
The attitude controller was tuned following the method in (57). The gains for all of the position controllers tested were tuned on a step input of 1 m in the x-direction. The proportional (P) and derivative (D) gains were tuned using the baseline nonlinear controller for good rise time with minimal overshoot or oscillations. The same P and D gains were used across all methods. The integral and adaptation gains were tuned separately for each method. In each case, the gains were increased to minimize response time until we observed large overshoot or noticeably jittery or oscillatory behavior. For $\mathcal{L}_1$ and INDI, this gave first-order filters with a cutoff frequency of 5 Hz. For each of the Neural-Fly methods, we used $R = rI$ and $Q = qI$, where $r$ and $q$ are scalar values. The tuning method gave $R$ gains similar to the measurement noise of the residual force, $Q$ values on the order of 0.1, and $\lambda$ values of 0.01.
Section S7 Force Prediction Performance
This section discusses fig. S3, which is useful for understanding why learning improves force prediction (which in turn improves control).
For the nonlinear baseline method, the integral (I) term compensates for the average wind effect, as seen in fig. S3. Thus, the UAV trajectory remains roughly centered on the desired trajectory for all wind conditions, as seen in fig. 5. The relative velocity of the drone changes too quickly for the integral action to compensate for the changes in the wind effect. Although increasing the I gain would allow the integral control to react more quickly, a large I gain can also lead to overshoot and instability; thus, the gain is effectively limited by the combined stability of the P, D, and I gains.
Next, consider the two SOTA baseline methods, INDI and $\mathcal{L}_1$, along with the non-learning version of our method, Neural-Fly-Constant. These methods represent different adaptive control approaches that assume no prior model for the residual dynamics. Instead, each of these methods effectively outputs a filtered version of the measured residual force, and the controller compensates for this adapted term. In fig. S3, we observe that each of these methods has a slight lag behind the measured residual force, shown in grey. This lag is reduced by increasing the adaptation gain; however, increasing the adaptation gain leads to noise amplification. Thus, these reactive approaches are limited by more inherent system properties, such as measurement noise.
Finally, consider the two learning versions of our method, Neural-Fly and Neural-Fly-Transfer. These methods use a learned model in the adaptive control algorithm. Thus, once the linear parameters have adapted to the current wind condition, the model can predict future aerodynamic effects with minimal changes to the coefficients. As we extrapolate to higher wind speeds and time-varying conditions, some model mismatch occurs and is manifested as discrepancies between the predicted force, $\hat{f}$, and the measured force, $f$, as seen in fig. S3. Thus, our learning-based control is limited by the learning representation error. This matches the conclusion drawn in our theoretical analysis, where tracking error scales linearly with representation error.