A Appendix
A.1 Proof of Lemma 1
Proof.
Using the e-ISS property in Assumption 1, we have:
$$
\begin{aligned}
\frac{1}{TN}\sum_{i=1}^{N}\sum_{t=1}^{T}\|x^{(i)}_t\|
&\le \frac{1}{TN}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\gamma\sum_{k=1}^{t-1}\rho^{\,t-1-k}\,\big\|B^{(i)}_k u^{(i)}_k - f^{(i)}_k + w^{(i)}_k\big\|\right) \\
&\overset{(a)}{\le} \frac{\gamma}{1-\rho}\,\frac{1}{TN}\sum_{i=1}^{N}\sum_{t=1}^{T-1}\big\|B^{(i)}_t u^{(i)}_t - f^{(i)}_t + w^{(i)}_t\big\| \\
&\overset{(b)}{\le} \frac{\gamma}{1-\rho}\sqrt{\frac{1}{TN}}\sqrt{\sum_{i=1}^{N}\sum_{t=1}^{T}\big\|B^{(i)}_t u^{(i)}_t - f^{(i)}_t + w^{(i)}_t\big\|^2},
\end{aligned}
\tag{8}
$$
where (a) follows from the geometric series bound and (b) from the Cauchy-Schwarz inequality.
A.2 Proof of Lemma 2
This proof is based on the proof of Theorem 4.1 in [28].
Proof.
For any $\bar\Theta\in\mathcal{K}_1$ and $\bar c^{(1:N)}\in\mathcal{K}_2$ we have
$$
\begin{aligned}
&\sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t) - \sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\bar\Theta,\bar c^{(i)}) \\
&\overset{(a)}{\le} \sum_{i=1}^{N}\sum_{t=1}^{T}\nabla_{\hat\Theta}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)\cdot(\hat\Theta^{(i)}-\bar\Theta) + \sum_{i=1}^{N}\sum_{t=1}^{T}\nabla_{\hat c}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)\cdot(\hat c^{(i)}_t-\bar c^{(i)}) \\
&= \sum_{i=1}^{N}\left[G^{(i)}(\hat\Theta^{(i)})-G^{(i)}(\bar\Theta)\right] + \sum_{i=1}^{N}\sum_{t=1}^{T}\left[g^{(i)}_t(\hat c^{(i)}_t)-g^{(i)}_t(\bar c^{(i)})\right] \\
&\le \underbrace{\sum_{i=1}^{N}G^{(i)}(\hat\Theta^{(i)}) - \min_{\Theta\in\mathcal{K}_1}\sum_{i=1}^{N}G^{(i)}(\Theta)}_{\text{the total regret of }\mathcal{A}_1,\ T\cdot o(N)} + \underbrace{\sum_{i=1}^{N}\sum_{t=1}^{T}g^{(i)}_t(\hat c^{(i)}_t) - \sum_{i=1}^{N}\min_{c^{(i)}\in\mathcal{K}_2}\sum_{t=1}^{T}g^{(i)}_t(c^{(i)})}_{\text{the total regret of }\mathcal{A}_2,\ N\cdot o(T)},
\end{aligned}
\tag{9}
$$
where (a) holds because $\ell^{(i)}_t$ is convex. Note that the total regret of $\mathcal{A}_1$ is $T\cdot o(N)$ because $G^{(i)}$ is scaled up by a factor of $T$.
A.3 Proof of Theorem 3
Proof.
Since $\Theta\in\mathcal{K}_1$ and $c^{(1:N)}\in\mathcal{K}_2$, applying Lemma 2 we have
$$
\sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t) - \sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\Theta,c^{(i)}) \le T\cdot o(N) + N\cdot o(T).
\tag{10}
$$
Recall that the definition of $\ell^{(i)}_t$ is $\ell^{(i)}_t(\hat\Theta,\hat c)=\big\|F(\phi(x^{(i)}_t;\hat\Theta),\hat c)-y^{(i)}_t\big\|^2$, and $y^{(i)}_t=f^{(i)}_t-w^{(i)}_t$.
Therefore we have
$$
\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)=\big\|\hat f^{(i)}_t-f^{(i)}_t+w^{(i)}_t\big\|^2=\big\|B^{(i)}_t u^{(i)}_t-f^{(i)}_t+w^{(i)}_t\big\|^2,
\qquad
\ell^{(i)}_t(\Theta,c^{(i)})=\big\|w^{(i)}_t\big\|^2\le W^2.
\tag{11}
$$
Then applying Lemma 1 we have
$$
\begin{aligned}
\mathrm{ACE}
&\le \frac{\gamma}{1-\rho}\sqrt{\frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\big\|B^{(i)}_t u^{(i)}_t-f^{(i)}_t+w^{(i)}_t\big\|^2}{TN}}
= \frac{\gamma}{1-\rho}\sqrt{\frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)}{TN}} \\
&\overset{(a)}{\le} \frac{\gamma}{1-\rho}\sqrt{\frac{T\cdot o(N)+N\cdot o(T)+\sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\Theta,c^{(i)})}{TN}}
\le \frac{\gamma}{1-\rho}\sqrt{W^2+\frac{o(T)}{T}+\frac{o(N)}{N}},
\end{aligned}
\tag{12}
$$
where (a) uses (10).
A.4 Proof of Corollary 4
Before the proof, we first present a lemma [27] which bounds the regret of the Online Gradient Descent (OGD) algorithm.
Lemma 7 (Regret of OGD [27]). Suppose $f_{1:T}(x)$ is a sequence of differentiable convex cost functions from $\mathbb{R}^n$ to $\mathbb{R}$, and $\mathcal{K}$ is a convex set in $\mathbb{R}^n$ with diameter $D$, i.e., $\forall x_1,x_2\in\mathcal{K}$, $\|x_1-x_2\|\le D$. We denote by $G>0$ an upper bound on the norm of the gradients of $f_{1:T}$ over $\mathcal{K}$, i.e., $\|\nabla f_t(x)\|\le G$ for all $t\in[1,T]$ and $x\in\mathcal{K}$.

The OGD algorithm initializes $x_1\in\mathcal{K}$. At time step $t$, it plays $x_t$, observes cost $f_t(x_t)$, and updates $x_{t+1}$ by $\Pi_{\mathcal{K}}(x_t-\eta_t\nabla f_t(x_t))$, where $\Pi_{\mathcal{K}}$ is the projection onto $\mathcal{K}$, i.e., $\Pi_{\mathcal{K}}(y)=\arg\min_{x\in\mathcal{K}}\|x-y\|$. OGD with learning rates $\{\eta_t=\frac{D}{G\sqrt{t}}\}$ guarantees the following:
$$
\sum_{t=1}^{T}f_t(x_t) - \min_{x^*\in\mathcal{K}}\sum_{t=1}^{T}f_t(x^*) \le \frac{3}{2}GD\sqrt{T}.
\tag{13}
$$
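The OGD update in Lemma 7 is a single projected gradient step with a decaying step size. The snippet below is a minimal sketch of projected OGD (our illustration, not code from the paper), assuming for simplicity that $\mathcal{K}$ is a Euclidean ball so that projection reduces to rescaling; the quadratic costs in the usage example are placeholders.

```python
import numpy as np

def project_to_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def ogd(grad_oracles, x1, D, G, radius):
    """Projected OGD with step sizes eta_t = D / (G * sqrt(t)) as in Lemma 7.

    grad_oracles[t-1](x) returns the gradient of the cost f_t at x.
    Returns the points x_1, ..., x_T actually played.
    """
    x = x1.copy()
    played = []
    for t, grad in enumerate(grad_oracles, start=1):
        played.append(x.copy())
        eta = D / (G * np.sqrt(t))
        x = project_to_ball(x - eta * grad(x), radius)
    return played

# Illustrative usage: costs f_t(x) = ||x - b_t||^2 on the unit ball (diameter D = 2).
rng = np.random.default_rng(0)
targets = [rng.normal(size=3) for _ in range(100)]
grad_oracles = [lambda x, b=b: 2.0 * (x - b) for b in targets]
xs = ogd(grad_oracles, x1=np.zeros(3), D=2.0, G=10.0, radius=1.0)
```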
Define $R(\mathcal{A}_1)$ as the total regret of the outer-adapter $\mathcal{A}_1$, and $R(\mathcal{A}_2)$ as the total regret of the inner-adapter $\mathcal{A}_2$. Recall that in Theorem 3 we show that $\mathrm{ACE(OMAC)}\le\frac{\gamma}{1-\rho}\sqrt{W^2+\frac{R(\mathcal{A}_1)+R(\mathcal{A}_2)}{TN}}$. Now we will prove Corollary 4 by analyzing $R(\mathcal{A}_1)$ and $R(\mathcal{A}_2)$ respectively.
Proof of Corollary 4. Since the true dynamics are $f(x,c^{(i)})=Y_1(x)\Theta+Y_2(x)c^{(i)}$, we have
$$
\ell^{(i)}_t(\hat\Theta,\hat c)=\big\|Y_1(x^{(i)}_t)\hat\Theta+Y_2(x^{(i)}_t)\hat c-Y_1(x^{(i)}_t)\Theta-Y_2(x^{(i)}_t)c^{(i)}+w^{(i)}_t\big\|^2.
\tag{14}
$$
Recall that $g^{(i)}_t(\hat c)=\nabla_{\hat c}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)\cdot\hat c$, which is convex (linear) w.r.t. $\hat c$. The gradient of $g^{(i)}_t$ is upper bounded as
$$
\begin{aligned}
\big\|\nabla_{\hat c}g^{(i)}_t\big\|
&=\Big\|2Y_2(x^{(i)}_t)^\top\big(Y_1(x^{(i)}_t)\hat\Theta^{(i)}+Y_2(x^{(i)}_t)\hat c^{(i)}_t-Y_1(x^{(i)}_t)\Theta-Y_2(x^{(i)}_t)c^{(i)}+w^{(i)}_t\big)\Big\| \\
&\le 2K_2K_1K_\Theta+2K_2^2K_c+2K_2K_1K_\Theta+2K_2^2K_c+2K_2W
=\underbrace{4K_1K_2K_\Theta+4K_2^2K_c+2K_2W}_{C_2}.
\end{aligned}
\tag{15}
$$
From Lemma 7, using learning rates $\eta^{(i)}_t=\frac{2K_c}{C_2\sqrt{t}}$ for all $i$, the regret of $\mathcal{A}_2$ at each outer iteration is upper bounded by $3K_cC_2\sqrt{T}$. Then the total regret of $\mathcal{A}_2$ is bounded as
$$
R(\mathcal{A}_2)\le 3K_cC_2N\sqrt{T}.
\tag{16}
$$
Now let us study $\mathcal{A}_1$. Similarly, recall that $G^{(i)}(\hat\Theta)=\sum_{t=1}^{T}\nabla_{\hat\Theta}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)\cdot\hat\Theta$, which is convex (linear) w.r.t. $\hat\Theta$. The gradient of $G^{(i)}$ is upper bounded as
$$
\begin{aligned}
\big\|\nabla_{\hat\Theta}G^{(i)}\big\|
&=\Big\|\sum_{t=1}^{T}2Y_1(x^{(i)}_t)^\top\big(Y_1(x^{(i)}_t)\hat\Theta^{(i)}+Y_2(x^{(i)}_t)\hat c^{(i)}_t-Y_1(x^{(i)}_t)\Theta-Y_2(x^{(i)}_t)c^{(i)}+w^{(i)}_t\big)\Big\| \\
&\le T\big(2K_1^2K_\Theta+2K_1K_2K_c+2K_1^2K_\Theta+2K_1K_2K_c+2K_1W\big)
= T\big(\underbrace{4K_1^2K_\Theta+4K_1K_2K_c+2K_1W}_{C_1}\big).
\end{aligned}
\tag{17}
$$
From Lemma 7, using learning rates $\bar\eta^{(i)}=\frac{2K_\Theta}{TC_1\sqrt{i}}$, the total regret of $\mathcal{A}_1$ is upper bounded as
$$
R(\mathcal{A}_1)\le 3K_\Theta TC_1\sqrt{N}.
\tag{18}
$$
Finally using Theorem 3 we have
$$
\mathrm{ACE(OMAC)}\le\frac{\gamma}{1-\rho}\sqrt{W^2+\frac{R(\mathcal{A}_1)+R(\mathcal{A}_2)}{TN}}
\le\frac{\gamma}{1-\rho}\sqrt{W^2+3\Big(K_\Theta C_1\frac{1}{\sqrt{N}}+K_cC_2\frac{1}{\sqrt{T}}\Big)}.
\tag{19}
$$
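To make the roles of the two adapters concrete, the following is a minimal sketch (our illustration under the convex setting above, not the authors' implementation) of the nested OGD structure the bounds refer to: the inner adapter $\mathcal{A}_2$ updates $\hat c$ at every time step with step sizes proportional to $1/\sqrt{t}$, while the outer adapter $\mathcal{A}_1$ takes one step on $\hat\Theta$ per environment with step sizes proportional to $1/\sqrt{i}$. The data format and gradient oracles are placeholders.

```python
import numpy as np

def project(x, radius):
    """Projection onto the Euclidean ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def omac_convex(envs, grad_theta, grad_c, p, h, K_Theta, K_c, C1, C2):
    """Nested OGD adapters for the convex case.

    envs[i]              : iterable of per-step data in environment i
    grad_theta(d, Th, c) : gradient of the per-step loss w.r.t. Theta_hat
    grad_c(d, Th, c)     : gradient of the per-step loss w.r.t. c_hat
    """
    Theta = np.zeros(p)                  # shared estimate, updated once per environment (A_1)
    for i, env in enumerate(envs, start=1):
        c = np.zeros(h)                  # environment-specific estimate (A_2)
        G_grad = np.zeros(p)             # accumulated gradient defining G^{(i)}
        T = len(env)
        for t, data in enumerate(env, start=1):
            G_grad += grad_theta(data, Theta, c)          # evaluated at the played (Theta, c_t)
            eta_in = 2.0 * K_c / (C2 * np.sqrt(t))        # inner step size from Corollary 4
            c = project(c - eta_in * grad_c(data, Theta, c), K_c)
        eta_out = 2.0 * K_Theta / (T * C1 * np.sqrt(i))   # outer step size from Corollary 4
        Theta = project(Theta - eta_out * G_grad, K_Theta)
    return Theta
```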
Now let us analyze $\mathrm{ACE}$(baseline adaptive control). To simplify notation, we define $\bar Y(x)=[Y_1(x)\ \ Y_2(x)]:\mathbb{R}^n\to\mathbb{R}^{n\times(p+h)}$ and $\hat\alpha=[\hat\Theta;\hat c]\in\mathbb{R}^{p+h}$. The baseline adaptive controller updates the whole vector $\hat\alpha$ at every time step. We denote the ground-truth parameter by $\alpha^{(i)}=[\Theta;c^{(i)}]$, and the estimate by $\hat\alpha^{(i)}_t=[\hat\Theta^{(i)}_t;\hat c^{(i)}_t]$. We have $\|\alpha^{(i)}\|\le\sqrt{K_\Theta^2+K_c^2}$. Define $\bar{\mathcal{K}}=\{\hat\alpha=[\hat\Theta;\hat c]:\|\hat\Theta\|\le K_\Theta,\ \|\hat c\|\le K_c\}$, which is a convex set in $\mathbb{R}^{p+h}$.
Note that the loss function for the baseline adaptive control is $\bar\ell^{(i)}_t(\hat\alpha)=\big\|\bar Y(x^{(i)}_t)\hat\alpha-Y_1(x^{(i)}_t)\Theta-Y_2(x^{(i)}_t)c^{(i)}+w^{(i)}_t\big\|^2$.
The gradient of $\bar\ell^{(i)}_t$ is
$$
\nabla_{\hat\alpha}\bar\ell^{(i)}_t(\hat\alpha)=2\begin{bmatrix}Y_1(x^{(i)}_t)^\top\\ Y_2(x^{(i)}_t)^\top\end{bmatrix}\big(Y_1(x^{(i)}_t)\hat\Theta+Y_2(x^{(i)}_t)\hat c-Y_1(x^{(i)}_t)\Theta-Y_2(x^{(i)}_t)c^{(i)}+w^{(i)}_t\big),
\tag{20}
$$
whose norm on $\bar{\mathcal{K}}$ is bounded by
$$
\sqrt{4(K_1^2+K_2^2)\,(2K_1K_\Theta+2K_2K_c+W)^2}=\sqrt{C_1^2+C_2^2}.
\tag{21}
$$
Therefore, from Lemma 7, running OGD on $\bar{\mathcal{K}}$ with learning rates $\frac{2\sqrt{K_\Theta^2+K_c^2}}{\sqrt{C_1^2+C_2^2}\,\sqrt{t}}$ gives the following guarantee at each outer iteration:
$$
\sum_{t=1}^{T}\bar\ell^{(i)}_t(\hat\alpha^{(i)}_t)-\bar\ell^{(i)}_t(\alpha^{(i)})\le 3\sqrt{K_\Theta^2+K_c^2}\sqrt{C_1^2+C_2^2}\sqrt{T}.
\tag{22}
$$
Finally, similarly to (12) we have
$$
\begin{aligned}
\mathrm{ACE}(\text{baseline adaptive control})
&\le \frac{\gamma}{1-\rho}\sqrt{\frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\bar\ell^{(i)}_t(\hat\alpha^{(i)}_t)}{TN}} \\
&\le \frac{\gamma}{1-\rho}\sqrt{\frac{\sum_{i=1}^{N}3\sqrt{K_\Theta^2+K_c^2}\sqrt{C_1^2+C_2^2}\sqrt{T}+\sum_{i=1}^{N}\sum_{t=1}^{T}\bar\ell^{(i)}_t(\alpha^{(i)})}{TN}} \\
&\le \frac{\gamma}{1-\rho}\sqrt{W^2+3\sqrt{K_\Theta^2+K_c^2}\sqrt{C_1^2+C_2^2}\,\frac{1}{\sqrt{T}}}.
\end{aligned}
\tag{23}
$$
Note that this bound does not improve as the number of environments (i.e., $N$) increases.
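For contrast with the sketch of the two nested adapters above, the baseline adaptive controller corresponds to a single OGD instance on the concatenated vector $\hat\alpha=[\hat\Theta;\hat c]$, with nothing shared across environments. A minimal sketch follows (our illustration, with placeholder gradient oracles; the estimate is reset per environment here for simplicity, which does not change the per-environment bound).

```python
import numpy as np

def project_ball(x, radius):
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def baseline_adaptive(envs, grad_alpha, p, h, K_Theta, K_c, C1, C2):
    """Single-level OGD on alpha_hat = [Theta_hat; c_hat], run separately in each environment."""
    G = np.sqrt(C1**2 + C2**2)                 # gradient-norm bound from (21)
    D = 2.0 * np.sqrt(K_Theta**2 + K_c**2)     # diameter of bar{K}
    for env in envs:
        alpha = np.zeros(p + h)                # re-estimated from scratch; no meta-learned part
        for t, data in enumerate(env, start=1):
            eta = D / (G * np.sqrt(t))         # the learning rate used for (22)
            alpha = alpha - eta * grad_alpha(data, alpha)
            # projection onto bar{K} factorizes over the two blocks
            alpha[:p] = project_ball(alpha[:p], K_Theta)
            alpha[p:] = project_ball(alpha[p:], K_c)
    return alpha
```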
A.5 Proof of Theorem 5
Proof.
For any $\Theta\in\mathcal{K}_1$ and $c^{(1:N)}\in\mathcal{K}_2$ we have
$$
\begin{aligned}
&\sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)-\sum_{i=1}^{N}\sum_{t=1}^{T}\ell^{(i)}_t(\Theta,c^{(i)}) \\
&=\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\ell^{(i)}_t(\hat\Theta^{(i)},\hat c^{(i)}_t)-\ell^{(i)}_t(\hat\Theta^{(i)},c^{(i)})\right]+\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\ell^{(i)}_t(\hat\Theta^{(i)},c^{(i)})-\ell^{(i)}_t(\Theta,c^{(i)})\right] \\
&=\sum_{i=1}^{N}\underbrace{\sum_{t=1}^{T}\left[g^{(i)}_t(\hat c^{(i)}_t)-g^{(i)}_t(c^{(i)})\right]}_{\le\, o(T)}+\underbrace{\sum_{i=1}^{N}\left[G^{(i)}(\hat\Theta^{(i)})-G^{(i)}(\Theta)\right]}_{\le\, T\cdot o(N)}.
\end{aligned}
\tag{24}
$$
Then combining with Lemma 1 results in the ACE bound.
A.6 Proof of Theorem 6
Proof.
Note that in this case the available measurement of $f$ at the end of the outer iteration $i$ is:
$$
y^{(j)}_t=Y(x^{(j)}_t)\Theta c^{(j)}-w^{(j)}_t,\qquad 1\le j\le i,\ 1\le t\le T.
\tag{25}
$$
Recall that the Ridge-regression estimation of $\hat\Theta$ is given by
$$
\hat\Theta^{(i+1)}=\arg\min_{\hat\Theta}\ \lambda\|\hat\Theta\|_F^2+\sum_{j=1}^{i}\sum_{t=1}^{T}\big\|Y(x^{(j)}_t)\hat\Theta c^{(j)}-y^{(j)}_t\big\|^2
=\arg\min_{\hat\Theta}\ \lambda\|\hat\Theta\|_F^2+\sum_{j=1}^{i}\sum_{t=1}^{T}\big\|Z^{(j)}_t\mathrm{vec}(\hat\Theta)-y^{(j)}_t\big\|^2.
\tag{26}
$$
Note that $y^{(j)}_t=(c^{(j)\top}\otimes Y(x^{(j)}_t))\cdot\mathrm{vec}(\Theta)-w^{(j)}_t=Z^{(j)}_t\mathrm{vec}(\Theta)-w^{(j)}_t$. Define $V_i=\lambda I+\sum_{j=1}^{i}\sum_{t=1}^{T}Z^{(j)\top}_tZ^{(j)}_t$. Then from Theorem 2 of [32] we have
$$
\big\|\mathrm{vec}(\hat\Theta^{(i+1)}-\Theta)\big\|_{V_i}\le R\sqrt{\bar p h\log\Big(\frac{1+iT\cdot nK_Y^2K_c^2/\lambda}{\delta}\Big)}+\sqrt{\lambda}K_\Theta
\tag{27}
$$
for all $i$ with probability at least $1-\delta$. Note that the environment diversity condition implies $V_i\succeq\Omega(i)\cdot I$. Finally we have
$$
\big\|\hat\Theta^{(i+1)}-\Theta\big\|_F^2=\big\|\mathrm{vec}(\hat\Theta^{(i+1)}-\Theta)\big\|^2\le O\Big(\frac{1}{i}\Big)\cdot O\big(\log(iT/\delta)\big)=O\Big(\frac{\log(iT/\delta)}{i}\Big).
\tag{28}
$$
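The vectorization above turns the matrix-valued ridge regression (26) into an ordinary regularized least-squares problem. Below is a minimal NumPy sketch of that computation (our illustration; the variable names, shapes, and the assumption that the $c^{(j)}$ used in the regression are available are placeholders for whatever the algorithm actually supplies):

```python
import numpy as np

def ridge_estimate_theta(Ys, cs, ys, lam, p_bar, h):
    """Ridge-regression estimate of Theta as in (26), via vec(Y Theta c) = (c^T kron Y) vec(Theta).

    Ys[j][t] : feature matrix Y(x_t^{(j)}), shape (n, p_bar)
    cs[j]    : latent vector c^{(j)}, shape (h,)
    ys[j][t] : measurement y_t^{(j)}, shape (n,)
    Returns Theta_hat with shape (p_bar, h).
    """
    d = p_bar * h
    V = lam * np.eye(d)                        # V_i = lambda*I + sum_j sum_t Z^T Z
    rhs = np.zeros(d)
    for Y_env, c, y_env in zip(Ys, cs, ys):
        for Y, y in zip(Y_env, y_env):
            Z = np.kron(c.reshape(1, -1), Y)   # Z_t^{(j)} = c^{(j)T} (kron) Y(x_t^{(j)})
            V += Z.T @ Z
            rhs += Z.T @ y
    theta_vec = np.linalg.solve(V, rhs)
    return theta_vec.reshape(p_bar, h, order="F")   # undo the column-major vec()
```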
Then with a fixed $\hat\Theta^{(i+1)}$, in outer iteration $i+1$ we have
$$
g^{(i+1)}_t(\hat c)=\big\|Y(x^{(i+1)}_t)\hat\Theta^{(i+1)}\hat c-Y(x^{(i+1)}_t)\Theta c^{(i+1)}+w^{(i+1)}_t\big\|^2.
\tag{29}
$$
Since $\mathcal{A}_2$ gives sublinear regret, we have
$$
\sum_{t=1}^{T}\big\|Y(x^{(i+1)}_t)\hat\Theta^{(i+1)}\hat c^{(i+1)}_t-Y(x^{(i+1)}_t)\Theta c^{(i+1)}+w^{(i+1)}_t\big\|^2
-\min_{\hat c\in\mathcal{K}_2}\sum_{t=1}^{T}\big\|Y(x^{(i+1)}_t)\hat\Theta^{(i+1)}\hat c-Y(x^{(i+1)}_t)\Theta c^{(i+1)}+w^{(i+1)}_t\big\|^2=o(T).
\tag{30}
$$
Note that
$$
\begin{aligned}
\min_{\hat c\in\mathcal{K}_2}\sum_{t=1}^{T}\big\|Y(x^{(i+1)}_t)\hat\Theta^{(i+1)}\hat c-Y(x^{(i+1)}_t)\Theta c^{(i+1)}+w^{(i+1)}_t\big\|^2
&\le\sum_{t=1}^{T}\big\|Y(x^{(i+1)}_t)\hat\Theta^{(i+1)}c^{(i+1)}-Y(x^{(i+1)}_t)\Theta c^{(i+1)}+w^{(i+1)}_t\big\|^2 \\
&\overset{(a)}{\le} TW^2+T\cdot K_Y^2\cdot O\Big(\frac{\log(iT/\delta)}{i}\Big)\cdot K_c^2,
\end{aligned}
\tag{31}
$$
where (a) uses (28).
Finally we have
$$
\sum_{t=1}^{T}\big\|\hat f^{(i+1)}_t-f^{(i+1)}_t+w^{(i+1)}_t\big\|^2
=\sum_{t=1}^{T}\big\|Y(x^{(i+1)}_t)\hat\Theta^{(i+1)}\hat c^{(i+1)}_t-Y(x^{(i+1)}_t)\Theta c^{(i+1)}+w^{(i+1)}_t\big\|^2
\overset{(b)}{\le} o(T)+TW^2+O\Big(\frac{T\log(iT/\delta)}{i}\Big)
\tag{32}
$$
for all $i$ with probability at least $1-\delta$, where (b) is from (30) and (31). Then with Lemma 1 we have (with probability at least $1-\delta$)
$$
\begin{aligned}
\mathrm{ACE}
&\le\frac{\gamma}{1-\rho}\sqrt{\frac{\sum_{i=1}^{N}\Big[o(T)+TW^2+O\big(\frac{T\log(iT/\delta)}{i}\big)\Big]}{TN}} \\
&\le\frac{\gamma}{1-\rho}\sqrt{W^2+\frac{o(T)}{T}+\frac{O(\log(NT/\delta))}{N}\sum_{i=1}^{N}\frac{1}{i}}
\le\frac{\gamma}{1-\rho}\sqrt{W^2+\frac{o(T)}{T}+O\Big(\frac{\log(NT/\delta)\log(N)}{N}\Big)}.
\end{aligned}
\tag{33}
$$
If we relax the environment diversity condition to $\Omega(\sqrt{i})$, in (28) we will have $O\big(\frac{\log(iT/\delta)}{\sqrt{i}}\big)$. Therefore in (33) the last term becomes $\frac{O(\log(NT/\delta))}{N}\sum_{i=1}^{N}\frac{1}{\sqrt{i}}\le\frac{O(\log(NT/\delta))}{\sqrt{N}}$.
A.7 Experimental details
A.7.1 Theoretical justification of Deep OMAC
Recall that in Deep OMAC (Table 4 in Section 5) the model class is $F(\phi(x;\hat\Theta),\hat c)=\phi(x;\hat\Theta)\cdot\hat c$, where $\phi:\mathbb{R}^n\to\mathbb{R}^{n\times h}$ is a neural network parameterized by $\hat\Theta$. We provide the following proposition to justify this choice of model class.
Proposition 1. Let $\bar f(x,\bar c):[-1,1]^n\times[-1,1]^{\bar h}\to\mathbb{R}$ be an analytic function of $[x,\bar c]\in[-1,1]^{n+\bar h}$ for $n,\bar h\ge 1$. Then for any $\epsilon>0$, there exist $h(\epsilon)\in\mathbb{Z}^+$, a polynomial $\bar\phi(x):[-1,1]^n\to\mathbb{R}^{h(\epsilon)}$, and another polynomial $c(\bar c):[-1,1]^{\bar h}\to\mathbb{R}^{h(\epsilon)}$ such that $\max_{[x,\bar c]\in[-1,1]^{n+\bar h}}\big\|\bar f(x,\bar c)-\bar\phi(x)^\top c(\bar c)\big\|\le\epsilon$ and $h(\epsilon)=O\big((\log(1/\epsilon))^{\bar h}\big)$.
Note that here the dimension of $c$ depends on the precision $1/\epsilon$. In practice, for OMAC algorithms, the dimension of $\hat c$ or $c$ (i.e., the latent space dimension) is a hyperparameter, and is not necessarily equal to the dimension of $\bar c$ (i.e., the dimension of the actual environmental condition). A variant of this proposition is proved in [34]. Since neural networks are universal approximators for polynomials, this proposition implies that the structure $\phi(x;\hat\Theta)\hat c$ can approximate any analytic function $\bar f(x,\bar c)$, with the dimension of $\hat c$ growing only polylogarithmically as the required precision increases.
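As a concrete, purely illustrative instance of this model class, the following PyTorch sketch implements a feature network $\phi(x;\hat\Theta)$ that outputs an $n\times h$ matrix, which is then multiplied by the latent vector $\hat c$; the architecture, sizes, and names are assumptions, not the exact network used in the experiments.

```python
import torch
import torch.nn as nn

class DeepOMACModel(nn.Module):
    """f_hat(x, c_hat) = phi(x; Theta_hat) @ c_hat, with phi an MLP producing an (n x h) matrix."""

    def __init__(self, n, h, hidden=64):
        super().__init__()
        self.n, self.h = n, h
        self.phi = nn.Sequential(
            nn.Linear(n, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n * h),
        )

    def forward(self, x, c_hat):
        # x: (batch, n); c_hat: (h,) or (batch, h)
        Phi = self.phi(x).view(-1, self.n, self.h)               # (batch, n, h)
        if c_hat.dim() == 1:
            c_hat = c_hat.expand(Phi.shape[0], -1)
        return torch.bmm(Phi, c_hat.unsqueeze(-1)).squeeze(-1)   # (batch, n)

# Illustrative usage: predict a 2-D disturbance force with a 3-D latent code.
model = DeepOMACModel(n=2, h=3)
f_hat = model(torch.randn(5, 2), torch.randn(3))    # shape (5, 2)
```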
A.7.2 Pendulum dynamics model and controller design
In the experiments, we consider nonlinear pendulum dynamics with unknown gravity, damping, and external 2D wind $w=[w_x;w_y]\in\mathbb{R}^2$. The continuous-time dynamics model is given by
$$
ml^2\ddot\theta-ml\hat g\sin\theta=u+\underbrace{f(\theta,\dot\theta,c(w))}_{\text{unknown}},
\tag{34}
$$
[Figure 2: Trajectories (top) and force predictions (bottom) in the pendulum experiment from one random seed, for the four adaptive controllers: baseline adaptation (ACE: 0.204), OMAC (convex) (ACE: 0.128), OMAC (element-wise convex) (ACE: 0.112), and OMAC (deep learning) (ACE: 0.062). The wind condition is switched randomly every 2 s (indicated by the dashed red lines). The performance of OMAC improves as it encounters more environments, while the baseline's does not.]
where
$$
f(\theta,\dot\theta,c(w))=\underbrace{\vec{l}\times F_{\mathrm{wind}}}_{\text{air drag}}-\underbrace{\alpha_1\dot\theta}_{\text{damping}}+\underbrace{ml(g-\hat g)\sin\theta}_{\text{gravity mismatch}},
\qquad
F_{\mathrm{wind}}=\alpha_2\cdot\|r\|_2\cdot r,\quad
r=w-\begin{bmatrix}l\dot\theta\cos\theta\\-l\dot\theta\sin\theta\end{bmatrix}.
\tag{35}
$$
This model generalizes the pendulum-with-external-wind model in [35] by introducing extra modelling mismatches (e.g., gravity mismatch and unknown damping). In this model, $\alpha_1$ is the damping coefficient, $\alpha_2$ is the air drag coefficient, $r$ is the velocity of the wind relative to the pendulum, $F_{\mathrm{wind}}$ is the air drag force vector, and $\vec{l}$ is the pendulum vector. Define the state of the pendulum as $x=[\theta;\dot\theta]$. The discrete-time dynamics are given by
$$
x_{t+1}=\begin{bmatrix}\theta_t+\delta\cdot\dot\theta_t\\[4pt]\dot\theta_t+\delta\cdot\dfrac{ml\hat g\sin\theta_t+u_t+f(\theta_t,\dot\theta_t,c)}{ml^2}\end{bmatrix}
=\underbrace{\begin{bmatrix}1&\delta\\0&1\end{bmatrix}}_{A}x_t+\underbrace{\begin{bmatrix}0\\ \frac{\delta}{ml^2}\end{bmatrix}}_{B}\big(u_t+ml\hat g\sin\theta_t+f(x_t,c)\big),
\tag{36}
$$
where $\delta$ is the discretization step. We use the controller structure $u_t=-Kx_t-ml\hat g\sin\theta_t-\hat f$ for all 6 controllers in the experiments, but different controllers have different methods to calculate $\hat f$ (e.g., the no-adapt controller uses $\hat f=0$ and the omniscient one uses $\hat f=f$). We choose $K$ such that $A-BK$ is stable (i.e., the spectral radius of $A-BK$ is strictly smaller than 1), so that the e-ISS assumption in Assumption 1 naturally holds. We visualize the pendulum experiment results in Figure 2.
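For concreteness, here is a minimal NumPy sketch of the discrete-time dynamics (36) together with the shared controller structure; the physical constants, the gain $K$, and the placeholder residual force are illustrative assumptions rather than the experimental values.

```python
import numpy as np

# Illustrative constants (not the values used in the experiments)
m, l, g_hat, delta = 1.0, 1.0, 9.8, 0.02
A = np.array([[1.0, delta], [0.0, 1.0]])
B = np.array([[0.0], [delta / (m * l**2)]])
K = np.array([[30.0, 10.0]])      # chosen so that A - B K has spectral radius < 1

def step(x, u, f_res):
    """One step of the discrete pendulum dynamics (36); x = [theta, theta_dot]."""
    theta = x[0]
    return A @ x + (B @ np.array([u + m * l * g_hat * np.sin(theta) + f_res])).ravel()

def control(x, f_hat):
    """u_t = -K x_t - m l g_hat sin(theta_t) - f_hat, the structure shared by all controllers."""
    return (-K @ x).item() - m * l * g_hat * np.sin(x[0]) - f_hat

x = np.array([0.5, 0.0])
for t in range(500):
    f_res = 0.1 * np.sin(0.01 * t)      # placeholder for the unknown residual force f
    u = control(x, f_hat=0.0)           # e.g., the no-adapt controller uses f_hat = 0
    x = step(x, u, f_res)
```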
A.7.3 Quadrotor dynamics model and controller design
Now we introduce the quadrotor dynamics with aerodynamic disturbance. Consider states given by the global position $p\in\mathbb{R}^3$, velocity $v\in\mathbb{R}^3$, attitude rotation matrix $R\in\mathrm{SO}(3)$, and body angular velocity $\omega\in\mathbb{R}^3$. Then the dynamics of a quadrotor are
$$
\dot p=v,\qquad m\dot v=mg+Rf_T+f,
\tag{37a}
$$
$$
\dot R=RS(\omega),\qquad J\dot\omega=J\omega\times\omega+\tau,
\tag{37b}
$$
where $m$ is the mass, $J$ is the inertia matrix of the quadrotor, $S(\cdot)$ is the skew-symmetric mapping, $g$ is the gravity vector, $f_T=[0,0,T]^\top$ and $\tau=[\tau_x,\tau_y,\tau_z]^\top$ are the total thrust and body torques from the four rotors, and $f=[f_x,f_y,f_z]^\top$ are forces resulting from unmodelled aerodynamic effects and varying wind conditions. In the simulator, $f$ is implemented as the aerodynamic model given in [36].
Controller design. Quadrotor control, as part of multicopter control, generally has a cascaded structure that separates the design of the position controller, the attitude controller, and the thrust mixer (allocation). In this paper, we incorporate the online-learned aerodynamic force $\hat f$ in the position controller via the following equation:
$$
f_d=-mg-m(K_P\cdot p+K_D\cdot v)-\hat f,
\tag{38}
$$
where $K_P,K_D\in\mathbb{R}^{3\times3}$ are gain matrices for the PD nominal term, and different controllers have different methods to calculate $\hat f$ (e.g., the omniscient controller uses $\hat f=f$). Given the desired force $f_d$, a kinematic module decomposes it into the desired attitude $R_d$ and the desired thrust $T_d$ so that $R_d\cdot[0,0,T_d]^\top\approx f_d$. Then the desired attitude and thrust are sent to a lower-level attitude controller (e.g., the attitude controller in [51]).
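A minimal sketch of the position-control step (38) and of one standard way to decompose the desired force into attitude and thrust is given below. This is our illustration under the assumptions of zero desired yaw and of aligning the desired body z-axis with $f_d$, not the exact kinematic module used in the experiments; in practice $p$ and $v$ would be position and velocity tracking errors.

```python
import numpy as np

def position_controller(p, v, f_hat, m, g_vec, K_P, K_D):
    """Desired force from equation (38)."""
    return -m * g_vec - m * (K_P @ p + K_D @ v) - f_hat

def decompose_force(f_d, yaw_des=0.0):
    """Split f_d into thrust T_d and attitude R_d with R_d @ [0, 0, T_d] ~= f_d."""
    T_d = np.linalg.norm(f_d)
    z_b = f_d / T_d                                   # desired body z-axis
    x_c = np.array([np.cos(yaw_des), np.sin(yaw_des), 0.0])
    y_b = np.cross(z_b, x_c)
    y_b /= np.linalg.norm(y_b)
    x_b = np.cross(y_b, z_b)
    R_d = np.column_stack([x_b, y_b, z_b])
    return R_d, T_d

# Illustrative usage (hover: the desired force simply cancels gravity)
g_vec = np.array([0.0, 0.0, -9.81])
K_P, K_D = 6.0 * np.eye(3), 4.0 * np.eye(3)
f_d = position_controller(np.zeros(3), np.zeros(3), np.zeros(3), 1.0, g_vec, K_P, K_D)
R_d, T_d = decompose_force(f_d)
```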