Neural Lander: Stable Drone Landing Control using Learned Dynamics

Guanya Shi¹, Xichen Shi¹, Michael O'Connell¹, Rose Yu², Kamyar Azizzadenesheli³, Animashree Anandkumar¹, Yisong Yue¹, and Soon-Jo Chung¹

¹California Institute of Technology, ²Northeastern University, ³University of California, Irvine.
Abstract—Precise trajectory control near the ground is difficult for multi-rotor drones, due to the complex ground effects caused by interactions between multi-rotor airflow and the environment. Conventional control methods often fail to properly account for these complex effects and fall short of accomplishing smooth landing. In this paper, we present a novel deep-learning-based robust nonlinear controller (Neural-Lander) that improves control performance of a quadrotor during landing. Our approach blends a nominal dynamics model with a Deep Neural Network (DNN) that learns the high-order interactions. We employ a novel application of spectral normalization to constrain the DNN to have bounded Lipschitz behavior. Leveraging this Lipschitz property, we design a nonlinear feedback linearization controller using the learned model and prove system stability with disturbance rejection. To the best of our knowledge, this is the first DNN-based nonlinear feedback controller with stability guarantees that can utilize arbitrarily large neural nets. Experimental results demonstrate that the proposed controller significantly outperforms a baseline linear proportional-derivative (PD) controller in both 1D and 3D landing cases. In particular, we show that, compared to the PD controller, Neural-Lander can decrease the error in the $z$ direction from 0.13 m to zero and mitigate average $x$ and $y$ drifts by 90% and 34%, respectively, in 1D landing; in 3D landing, Neural-Lander can decrease the $z$ error from 0.12 m to zero. We also empirically show that the DNN generalizes well to new test inputs outside the training domain.
I. INTRODUCTION
Unmanned Aerial Vehicles (UAVs) require high-precision control of aircraft positioning, especially during landing and take-off. This problem is challenging largely due to complex interactions of rotor airflows with the ground. The aerospace community has long identified the change in aerodynamic forces when helicopters or aircraft fly close to the ground. Such ground effects cause an increased lift force and a reduced aerodynamic drag, which can be both helpful and disruptive to flight stability [1], and the complications are exacerbated with multiple rotors. Therefore, performing automatic landing of UAVs is risk-prone, and requires expensive high-precision sensors as well as carefully designed controllers.
Compensating for ground effect is a long-standing problem in the aerial robotics community. Prior work has largely focused on mathematical modeling (e.g., [2]) as part of system identification (ID). These ground-effect models are later used to approximate aerodynamic forces during flights close to the ground and are combined with controller design for feed-forward cancellation (e.g., [3]). However, existing theoretical ground-effect models are derived under steady-flow conditions, whereas most practical cases exhibit unsteady flow. Alternative approaches, such as integral or adaptive control methods, often suffer from slow response and delayed feedback. Bayesian Optimization has been employed for open-air quadrotor control [4], but not for take-off/landing. Given these limitations, the precision of existing fully automated systems for UAVs is still insufficient for landing and take-off, thereby necessitating the guidance of a human UAV operator during those phases.
To capture complex aerodynamic interactions without being overly constrained by conventional modeling assumptions, we take a machine-learning (ML) approach and build a black-box ground-effect model using Deep Neural Networks (DNNs). However, incorporating black-box models into a UAV controller faces three key challenges. First, DNNs are notoriously data-hungry, and it is challenging to collect sufficient real-world training data. Second, due to high dimensionality, DNNs can be unstable and generate unpredictable outputs, which makes the system susceptible to instability in the feedback control loop. Third, DNNs are often difficult to analyze, which makes it hard to design provably stable DNN-based controllers.
The aforementioned challenges pervade previous works using DNNs to capture high-order non-stationary dynamics. For example, [5], [6] use DNNs to improve system ID of helicopter aerodynamics, but not for downstream controller design. Other approaches aim to generate reference inputs or trajectories from DNNs [7]–[10]. However, such approaches can lead to challenging optimization problems [7], or heavily rely on a well-designed closed-loop controller and require a large amount of labeled training data [8]–[10]. A more classical approach is to use DNNs for direct inverse control [11]–[13], but the non-parametric nature of a DNN controller also makes it challenging to guarantee stability and robustness to noise. [14] proposes a provably stable model-based Reinforcement Learning method based on Lyapunov analysis; however, this approach requires a potentially expensive discretization step and relies on the native Lipschitz constant of the DNN.
Contributions. In this paper, we propose a learning-based controller, Neural-Lander, to improve the precision of quadrotor landing with guaranteed stability. Our approach directly learns the ground effect on coupled unsteady aerodynamics and vehicular dynamics. We use deep learning for system ID of the residual dynamics and then integrate it with nonlinear feedback linearization control.

We train DNNs with spectral normalization of layer-wise weight matrices. We prove that the resulting controller
is globally exponentially stable under bounded learning
errors. This is achieved by exploiting the Lipschitz bound of
spectrally normalized DNNs. It has earlier been shown that
spectral normalization of DNNs leads to good generalization,
i.e. stability in a learning-theoretic sense [15]. It is intriguing
that spectral normalization simultaneously guarantees stability
both in a learning-theoretic and a control-theoretic sense.
We evaluate Neural-Lander for trajectory tracking of a quadrotor during take-off, landing, and near-ground maneuvers. Neural-Lander is able to land a quadrotor much more accurately than a naive PD controller with a pre-identified system. In particular, we show that, compared to the PD controller, Neural-Lander can decrease the error in the $z$ direction from 0.13 m to zero and mitigate $x$ and $y$ drifts by 90% and 34%, respectively, in 1D landing; in 3D landing, Neural-Lander can decrease the $z$ error from 0.12 m to zero (demo videos: https://youtu.be/C_K8MkC_SSQ). We also demonstrate that the learned ground-effect model can handle temporal dependency, and is an improvement over the steady-state theoretical models in use today.
II. PROBLEM STATEMENT: QUADROTOR LANDING
Given quadrotor states as global position $\mathbf{p} \in \mathbb{R}^3$, velocity $\mathbf{v} \in \mathbb{R}^3$, attitude rotation matrix $R \in SO(3)$, and body angular velocity $\boldsymbol{\omega} \in \mathbb{R}^3$, we consider the following dynamics:

$$\dot{\mathbf{p}} = \mathbf{v}, \qquad m\dot{\mathbf{v}} = m\mathbf{g} + R\mathbf{f}_u + \mathbf{f}_a,$$
$$\dot{R} = RS(\boldsymbol{\omega}), \qquad J\dot{\boldsymbol{\omega}} = J\boldsymbol{\omega} \times \boldsymbol{\omega} + \boldsymbol{\tau}_u + \boldsymbol{\tau}_a, \tag{1}$$
where $\mathbf{g} = [0, 0, -g]^\top$ is the gravity vector, and $\mathbf{f}_u = [0, 0, T]^\top$ and $\boldsymbol{\tau}_u = [\tau_x, \tau_y, \tau_z]^\top$ are the total thrust and body torques from four rotors predicted by a nominal model. We use $\boldsymbol{\eta} = [T, \tau_x, \tau_y, \tau_z]^\top$ to denote the output wrench. The control input of squared motor speeds, $\mathbf{u} = [n_1^2, n_2^2, n_3^2, n_4^2]^\top$, is linearly related to the output wrench, with the nominal relation given as $\boldsymbol{\eta} = B_0\mathbf{u}$:
$$B_0 = \begin{bmatrix} c_T & c_T & c_T & c_T \\ 0 & c_T l_{\mathrm{arm}} & 0 & -c_T l_{\mathrm{arm}} \\ -c_T l_{\mathrm{arm}} & 0 & c_T l_{\mathrm{arm}} & 0 \\ -c_Q & c_Q & -c_Q & c_Q \end{bmatrix}, \tag{2}$$

where $c_T$ and $c_Q$ denote empirical coefficients for the force and torque generated by an individual rotor, and $l_{\mathrm{arm}}$ denotes the length of each rotor arm.
The key difficulty of precise landing is the influence of unknown disturbance forces $\mathbf{f}_a = [f_{a,x}, f_{a,y}, f_{a,z}]^\top$ and torques $\boldsymbol{\tau}_a = [\tau_{a,x}, \tau_{a,y}, \tau_{a,z}]^\top$, which originate from complex aerodynamic interactions between the quadrotor and the environment. For example, during the landing process, when the quadrotor is close to the ground, the vertical aerodynamic force $f_{a,z}$ becomes significant. Also, as $\mathbf{v}$ increases, air drag grows and contributes to $\mathbf{f}_a$.
Problem Statement: For system (1), our goal is to learn the unknown disturbance forces $\mathbf{f}_a$ and torques $\boldsymbol{\tau}_a$ from partial states and control inputs, in order to improve controller accuracy. In this paper, we are only interested in the position dynamics (the first two equations in (1)). As we mainly focus on landing and take-off, the attitude dynamics is limited and the aerodynamic disturbance torque $\boldsymbol{\tau}_a$ is bounded. We take a deep learning approach, approximating $\mathbf{f}_a$ using a Deep Neural Network (DNN) with spectral normalization to guarantee the stability of the DNN outputs. We then design an exponentially-stabilizing controller with superior robustness compared to using only the nominal system dynamics. Training is done offline, and the learned dynamics is applied in the on-board controller in real time.
III. LEARNING STABLE DNN DYNAMICS
To learn the residual dynamics, we employ a deep neural network with Rectified Linear Unit (ReLU) activations. In general, DNNs equipped with ReLU converge faster during training, demonstrate more robust behavior with respect to hyperparameter changes, and suffer less from vanishing gradients than other activation functions such as sigmoid or tanh [16].
A. ReLU Deep Neural Networks

A ReLU deep neural network represents the functional mapping from the input $\mathbf{x}$ to the output $f(\mathbf{x}; \theta)$, parameterized by the DNN weights $\theta = W^1, \cdots, W^{L+1}$:

$$f(\mathbf{x}; \theta) = W^{L+1}\phi(W^L(\phi(W^{L-1}(\cdots \phi(W^1\mathbf{x})\cdots)))), \tag{3}$$
where the activation function $\phi(\cdot) = \max(\cdot, 0)$ is called the element-wise ReLU function. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. However, deep neural networks are usually trained by first-order gradient-based optimization, which is highly dependent on the curvature of the training objective and can be very unstable [17]. To alleviate this issue, we apply the spectral normalization technique [15] in the feedback control loop to guarantee stability.
B. Spectral Normalization

Spectral normalization stabilizes DNN training by constraining the Lipschitz constant of the objective function. Spectral normalization has also been shown to generalize well [18], and in machine learning, generalization is a notion of stability. Mathematically, the Lipschitz constant $\|f\|_{\mathrm{Lip}}$ of a function $f$ is defined as the smallest value such that

$$\forall\, \mathbf{x}, \mathbf{x}': \quad \|f(\mathbf{x}) - f(\mathbf{x}')\|_2 \,/\, \|\mathbf{x} - \mathbf{x}'\|_2 \le \|f\|_{\mathrm{Lip}}.$$
It is known that the Lipschitz constant of a general differentiable function $f$ is the maximum spectral norm (maximum singular value) of its gradient over its domain, $\|f\|_{\mathrm{Lip}} = \sup_{\mathbf{x}} \sigma(\nabla f(\mathbf{x}))$.
The ReLU DNN in (3) is a composition of functions. Thus we can bound the Lipschitz constant of the network by constraining the spectral norm of each layer $g^l(\mathbf{x}) = \phi(W^l\mathbf{x})$. For a linear map $g(\mathbf{x}) = W\mathbf{x}$, the spectral norm of each layer is given by $\|g\|_{\mathrm{Lip}} = \sup_{\mathbf{x}} \sigma(\nabla g(\mathbf{x})) = \sup_{\mathbf{x}} \sigma(W) = \sigma(W)$. Using the fact that the Lipschitz norm of the ReLU activation function $\phi(\cdot)$ is equal to 1, together with the inequality $\|g_1 \circ g_2\|_{\mathrm{Lip}} \le \|g_1\|_{\mathrm{Lip}} \cdot \|g_2\|_{\mathrm{Lip}}$, we can find the following bound on $\|f\|_{\mathrm{Lip}}$:

$$\|f\|_{\mathrm{Lip}} \le \|g^{L+1}\|_{\mathrm{Lip}} \cdot \|\phi\|_{\mathrm{Lip}} \cdots \|g^1\|_{\mathrm{Lip}} = \prod_{l=1}^{L+1} \sigma(W^l). \tag{4}$$
In practice, we can apply spectral normalization to the weight matrices in each layer during training as follows:

$$\bar{W} = W / \sigma(W). \tag{5}$$
The following lemma bounds the Lipschitz constant of a ReLU DNN with spectral normalization.

Lemma 3.1: For a multi-layer ReLU network $f(\mathbf{x}; \theta)$ defined in (3) without an activation function on the output layer, spectral normalization guarantees that the Lipschitz constant of the entire network satisfies $\|f(\mathbf{x}; \bar{\theta})\|_{\mathrm{Lip}} \le 1$, with spectrally-normalized parameters $\bar{\theta} = \bar{W}^1, \cdots, \bar{W}^{L+1}$.

Proof: As in (4), the Lipschitz constant can be written as a composition of spectral norms over all layers. The proof follows from the spectral norms constrained as in (5).
C. Constrained Training

We apply first-order gradient-based optimization to train the ReLU DNN. Estimating $\mathbf{f}_a$ in (1) boils down to optimizing the parameters $\theta$ of the ReLU network model in (3), given observed values of $\mathbf{x}$ and the target output. In particular, we want to control the Lipschitz constant of the ReLU network. The optimization objective is as follows, where we minimize the prediction error subject to a constrained Lipschitz constant:

$$\underset{\theta}{\text{minimize}} \quad \frac{1}{T}\sum_{t=1}^{T} \|\mathbf{y}_t - f(\mathbf{x}_t; \theta)\|^2 \quad \text{subject to} \quad \|f\|_{\mathrm{Lip}} \le 1. \tag{6}$$
Here $\mathbf{y}_t$ is the observed disturbance force and $\mathbf{x}_t$ contains the observed states and control inputs. According to the upper bound in (4), we can substitute the constraint by minimizing the spectral norm of the weights in each layer. We use stochastic gradient descent (SGD) to optimize (6) and apply spectral normalization to regulate the weights. From Lemma 3.1, the trained ReLU DNN has a bounded Lipschitz constant.
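For concreteness, a minimal PyTorch training sketch is shown below. The hidden widths, learning rate, and synthetic stand-in data are hypothetical placeholders of ours; the essential step is re-normalizing every weight matrix via (5) after each SGD update:

```python
import torch
import torch.nn as nn

# Four hidden ReLU layers with input dim 12 and output dim 3 (cf. Sec. VI-C);
# the width 64 is a placeholder.
model = nn.Sequential(
    nn.Linear(12, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def normalize_spectra(model):
    """Enforce (5): divide each weight matrix by its spectral norm."""
    with torch.no_grad():
        for mod in model.modules():
            if isinstance(mod, nn.Linear):
                mod.weight.div_(torch.linalg.matrix_norm(mod.weight, ord=2))

# Synthetic stand-in for the observed pairs (x_t, y_t) in (6).
dataset = torch.utils.data.TensorDataset(torch.randn(1024, 12),
                                         torch.randn(1024, 3))
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

for x_t, y_t in loader:
    loss = ((model(x_t) - y_t) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    normalize_spectra(model)  # project back onto the constraint set of (6)
```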
IV. NEURAL LANDER CONTROLLER DESIGN
We design our controller to enable 3D landing-trajectory tracking for quadrotors. Our controller integrates a DNN-based dynamics-learning module with a proportional-derivative (PD) controller. To keep the design simple, we re-design the PD controller to account for the disturbance-force term learned by the ReLU DNN. We solve for the resulting nonlinear controller using fixed-point iteration.
A. Reference Trajectory Tracking

The position tracking error is defined as $\tilde{\mathbf{p}}(t) = \mathbf{p}(t) - \mathbf{p}_d(t)$. We design an integral controller with the composite variable

$$\mathbf{s} = \dot{\mathbf{p}} - \mathbf{v}_r = \dot{\tilde{\mathbf{p}}} + 2\Lambda\tilde{\mathbf{p}} + \Lambda^2 \int_0^t \tilde{\mathbf{p}}(\tau)\,d\tau, \tag{7}$$
with $\Lambda$ a positive diagonal matrix. Then $\mathbf{s} = 0$ defines a manifold on which $\tilde{\mathbf{p}}(t) \to 0$ exponentially. Having transformed the position tracking problem into a velocity tracking one, we would like the actual force exerted by the rotors to satisfy

$$\bar{\mathbf{f}}_d = m\dot{\mathbf{v}}_r - K_v\mathbf{s} - m\mathbf{g}, \qquad (R\mathbf{f}_u)_d = \bar{\mathbf{f}}_d - \hat{\mathbf{f}}_a, \tag{8}$$
(8)
so that the closed-loop dynamics would simply become
m
̇
s
+
K
v
s
=
f
a
ˆ
f
a
=

. Hence, these exponentially-stabilizing
dynamics guarantee that
p
(
t
)
converge exponentially and
globally to
p
d
(
t
)
with bounded error, if
f
a
ˆ
f
is bounded [19],
[20](see Sec. V). Let
f
d
denote the total desired force vector
from the quadrotor, then total thrust
T
and desired force
direction
ˆ
k
d
can be computed from (8),
T
d
=
f
d
·
ˆ
k,
ˆ
k
d
=
f
d
/
f
d
,
(9)
with $\hat{\mathbf{k}}$ being the direction of rotor thrust (typically the $z$-axis of the quadrotor). Using $\hat{\mathbf{k}}_d$ and fixing a desired yaw angle, $R_d \in SO(3)$ or a desired value $\mathbf{q}_d$ of any attitude representation can be obtained [21]. We assume the attitude controller comes in the form of a desired torque $\boldsymbol{\tau}_d$ to be generated by the four rotors. One such example is

$$\boldsymbol{\tau}_d = J\dot{\boldsymbol{\omega}}_r - J\boldsymbol{\omega} \times \boldsymbol{\omega}_r - K_\omega(\boldsymbol{\omega} - \boldsymbol{\omega}_r), \tag{10}$$

where $\boldsymbol{\omega}_r = Z^{-1}(\dot{\mathbf{q}}_d - \Lambda_r\tilde{\mathbf{q}})$ with $\dot{\mathbf{q}} = Z(\mathbf{q})\boldsymbol{\omega}$, or see [20] for SO(3) tracking control. Note that (10) guarantees exponential tracking of a desired attitude trajectory within some bounded error in the presence of some disturbance torques.
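To make the position loop concrete, here is a minimal Python sketch of one evaluation of (7)-(9). It omits the integral term of (7), as in the experiments of Sec. VI-D, and all names and shapes are our own conventions:

```python
import numpy as np

def position_control(p, v, p_d, v_d, a_d, f_a_hat, k_hat, m, g, Lam, K_v):
    """One step of (7)-(9) with the integral term omitted
    (composite variable s = p_tilde_dot + Lam @ p_tilde)."""
    p_tilde = p - p_d                            # position tracking error
    s = (v - v_d) + Lam @ p_tilde                # composite variable (7)
    v_r_dot = a_d - Lam @ (v - v_d)              # reference acceleration
    g_vec = np.array([0.0, 0.0, -g])             # gravity vector
    f_bar_d = m * v_r_dot - K_v @ s - m * g_vec  # nominal desired force (8)
    f_d = f_bar_d - f_a_hat                      # subtract learned disturbance
    T_d = f_d @ k_hat                            # desired thrust (9)
    k_hat_d = f_d / np.linalg.norm(f_d)          # desired thrust direction (9)
    return T_d, k_hat_d
```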
B. Learning-based Discrete-time Nonlinear Controller

Using the methods described in Sec. III, we define $\hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})$ as the approximation to the disturbance aerodynamic forces, with $\boldsymbol{\zeta}$ being the partial states used as input features. The desired total force is then revised as $\mathbf{f}_d = \bar{\mathbf{f}}_d - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})$. Because of the dependency of $\hat{\mathbf{f}}_a$ on $\mathbf{u}$, the control synthesis problem here is non-affine in the control input $\mathbf{u}$:

$$B_0\mathbf{u} = \begin{bmatrix} (\bar{\mathbf{f}}_d - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})) \cdot \hat{\mathbf{k}} \\ \boldsymbol{\tau}_d \end{bmatrix}. \tag{11}$$
With $\boldsymbol{\eta}_d = [T_d, \boldsymbol{\tau}_d^\top]^\top$, we propose the following fixed-point iterative method for solving (11):

$$\mathbf{u}(t) = \mathbf{u}_k = B_0^{-1}\boldsymbol{\eta}_d(\mathbf{u}_{k-1}), \tag{12}$$

where $\mathbf{u}_{k-1}$ is the control input from the previous time-step of the controller. The stability of the system and the convergence of controller (12) are proved in Sec. V.
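A sketch of how (12) might run inside one control step is given below; `f_a_hat` stands for the learned network evaluated at the current partial state $\boldsymbol{\zeta}$, and the iteration count is an assumption of ours:

```python
import numpy as np

def allocate_control(f_bar_d, tau_d, k_hat, f_a_hat, B0, u_prev, n_iters=3):
    """Fixed-point iteration (12): u_k = B0^{-1} eta_d(u_{k-1})."""
    B0_inv = np.linalg.inv(B0)
    u = u_prev                                   # warm-start from last time-step
    for _ in range(n_iters):
        T_d = (f_bar_d - f_a_hat(u)) @ k_hat     # thrust component of (11)
        eta_d = np.concatenate(([T_d], tau_d))   # desired wrench [T_d, tau_d]
        u = B0_inv @ eta_d                       # one fixed-point update
    return u
```

Warm-starting from the previous control input is what keeps the one-step difference of Assumption 2 (Sec. V) small: when the contraction condition of Lemma 5.1 holds, a handful of iterations suffice.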
V. NONLINEAR STABILITY ANALYSIS
The closed-loop tracking error analysis gives direct guidance on how to tune the neural network and controller parameters to improve control performance and robustness.
A. Control Allocation as Contraction Mapping

We first show that $\mathbf{u}_k$ converges to the solution of (11) when all states are fixed.

Lemma 5.1: Fixing all current states, define the mapping $\mathbf{u}_k = \mathcal{F}(\mathbf{u}_{k-1})$ based on (12):

$$\mathcal{F}(\mathbf{u}) = B_0^{-1}\begin{bmatrix} (\bar{\mathbf{f}}_d - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})) \cdot \hat{\mathbf{k}} \\ \boldsymbol{\tau}_d \end{bmatrix}. \tag{13}$$

If $\hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})$ is $L_a$-Lipschitz continuous and $\sigma(B_0^{-1}) \cdot L_a < 1$, then $\mathcal{F}(\cdot)$ is a contraction mapping, and $\mathbf{u}_k$ converges to the unique solution of $\mathbf{u}^* = \mathcal{F}(\mathbf{u}^*)$.
Proof: $\forall\, \mathbf{u}_1, \mathbf{u}_2 \in \mathcal{U}$, with $\mathcal{U}$ being a compact set of feasible control inputs, and given fixed states $\bar{\mathbf{f}}_d$, $\boldsymbol{\tau}_d$ and $\hat{\mathbf{k}}$:

$$\|\mathcal{F}(\mathbf{u}_1) - \mathcal{F}(\mathbf{u}_2)\| = \left\|B_0^{-1}\left(\hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u}_1) - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u}_2)\right)\right\| \le \sigma(B_0^{-1}) \cdot L_a \|\mathbf{u}_1 - \mathbf{u}_2\|.$$

Thus, $\exists\, \gamma < 1$ such that $\|\mathcal{F}(\mathbf{u}_1) - \mathcal{F}(\mathbf{u}_2)\| < \gamma\|\mathbf{u}_1 - \mathbf{u}_2\|$. Hence, $\mathcal{F}(\cdot)$ is a contraction mapping.
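As a quick numerical illustration of the contraction condition (all coefficient values below are made-up, normalized placeholders, not measured from hardware):

```python
import numpy as np

# Hypothetical, normalized coefficients -- for illustration only.
c_T, c_Q, l_arm = 1.0, 0.1, 0.2
B0 = np.array([
    [ c_T,          c_T,         c_T,          c_T        ],
    [ 0.0,          c_T * l_arm, 0.0,         -c_T * l_arm],
    [-c_T * l_arm,  0.0,         c_T * l_arm,  0.0        ],
    [-c_Q,          c_Q,        -c_Q,          c_Q        ],
])
L_a = 0.1  # assumed Lipschitz constant of f_a_hat w.r.t. u, in these units

sigma_B0_inv = np.linalg.norm(np.linalg.inv(B0), ord=2)  # max singular value
print(sigma_B0_inv * L_a < 1.0)  # True here, so F is a contraction
```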
B. Stability of Learning-based Nonlinear Controller

Before proving the stability of the full system, we make the following assumptions.

Assumption 1: The desired states along the position trajectory $\mathbf{p}_d(t)$, $\dot{\mathbf{p}}_d(t)$, and $\ddot{\mathbf{p}}_d(t)$ are bounded.

Note that trajectory generation can guarantee tight bounds through optimization [21], [22] or simple clipping.

Assumption 2: $\mathbf{u}$ updates much faster than the position controller, and the one-step difference of the control signal satisfies $\|\mathbf{u}_k - \mathbf{u}_{k-1}\| \le \rho\|\mathbf{s}\|$ for a small positive $\rho$.
Tikhonovs’s Theorem (Theorem 11.1 [23]) provides a foun-
dation for such a time-scale separation, where
u
converges
much faster than the slower
s
dynamics. From (13), we
can derive the following approximate relation with
∆(
·
)
k
=
(
·
)
k
(
·
)
k
1
:
u
k
σ
(
B
1
0
)
(
L
a
u
k
1
+
L
a
ζ
k
+
m
∆ ̇
v
r,k
+
λ
max
(
K
v
)∆
s
k
+ ∆
τ
d,k
)
.
By using the fact that the frequencies of attitude control ($>$100 Hz) and motor speed control ($>$5 kHz) are much higher than that of the position controller ($\approx$10 Hz) in practice, we can safely assume that $\Delta s_k$, $\Delta\dot{v}_{r,k}$, and $\Delta\zeta_k$ in one update step become negligible. Furthermore, $\Delta\tau_{d,k}$ can be limited internally by the attitude controller. This leads to

$$\Delta u_k \le \sigma(B_0^{-1})\left(L_a\Delta u_{k-1} + c\right),$$

with $c$ a small constant. Since $\sigma(B_0^{-1}) \cdot L_a < 1$ from Lemma 5.1, we can deduce that $\Delta u$ rapidly converges to a small ultimate bound between each position controller update.
Assumption 3: The approximation error of $\hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})$ over the compact sets $\mathcal{Z}$, $\mathcal{U}$ is upper bounded by $\epsilon_m = \sup_{\boldsymbol{\zeta}\in\mathcal{Z}, \mathbf{u}\in\mathcal{U}} \|\boldsymbol{\epsilon}(\boldsymbol{\zeta}, \mathbf{u})\|$, where $\boldsymbol{\epsilon}(\boldsymbol{\zeta}, \mathbf{u}) = \mathbf{f}_a(\boldsymbol{\zeta}, \mathbf{u}) - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})$.
DNNs have been shown to generalize well to unseen events drawn from almost the same distribution as the training set [24], [25]. This empirical observation has also been studied theoretically to shed more light on the complexity of these models [18], [26]–[28]. Our experimental results show that our proposed training method in Sec. III generalizes well to unseen events and yields better performance on unexplored data (Sec. VI-C). Composing our stability result rigorously with generalization error would be an interesting direction for future work.
Based on these assumptions, we can now present our overall
robustness result.
Theorem 5.2: Under Assumptions 1-3, for a time-varying desired trajectory $\mathbf{p}_d(t)$, the controller defined in (8) and (12) with $\lambda_{\min}(K_v) > L_a\rho$ achieves exponential convergence to the error ball

$$\lim_{t\to\infty} \|\mathbf{s}(t)\| = \frac{\epsilon_m}{\lambda_{\min}(K_v) - L_a\rho}, \qquad \lim_{t\to\infty} \|\mathbf{p} - \mathbf{p}_d\| = \frac{|\dot{e}_m|}{\lambda_{\min}(\Lambda)^2\left(\lambda_{\min}(K_v) - L_a\rho\right)}, \tag{14}$$

where $\boldsymbol{\epsilon}(\boldsymbol{\zeta}, \mathbf{u})$ is further decomposed as $\boldsymbol{\epsilon}(\boldsymbol{\zeta}, \mathbf{u}) = \mathbf{e}_0 + \mathbf{e}(\boldsymbol{\zeta}, \mathbf{u})$ with a constant $\mathbf{e}_0$ and $|\dot{e}_m| = \sup_{\boldsymbol{\zeta}\in\mathcal{Z}, \mathbf{u}\in\mathcal{U}} \|\dot{\mathbf{e}}(\boldsymbol{\zeta}, \mathbf{u})\|$.
Proof: We begin by selecting the Lyapunov function $\mathcal{V}(\mathbf{s}) = \frac{1}{2}m\|\mathbf{s}\|^2$; then, by applying the controller (8), we get the time-derivative of $\mathcal{V}$:

$$\dot{\mathcal{V}} = \mathbf{s}^\top\left(-K_v\mathbf{s} + \hat{\mathbf{f}}_a(\boldsymbol{\zeta}_k, \mathbf{u}_k) - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}_k, \mathbf{u}_{k-1}) + \boldsymbol{\epsilon}(\boldsymbol{\zeta}_k, \mathbf{u}_k)\right)$$
$$\le -\mathbf{s}^\top K_v\mathbf{s} + \|\mathbf{s}\|\left(\|\hat{\mathbf{f}}_a(\boldsymbol{\zeta}_k, \mathbf{u}_k) - \hat{\mathbf{f}}_a(\boldsymbol{\zeta}_k, \mathbf{u}_{k-1})\| + \epsilon_m\right).$$

Let $\lambda = \lambda_{\min}(K_v)$ denote the minimum eigenvalue of the positive-definite matrix $K_v$. By applying the Lipschitz property of the network approximator (Lemma 3.1) and Assumption 2, we obtain

$$\dot{\mathcal{V}} \le -(\lambda - L_a\rho)\|\mathbf{s}\|^2 + \|\mathbf{s}\|\epsilon_m \le -\frac{2(\lambda - L_a\rho)}{m}\mathcal{V} + \sqrt{\frac{2\mathcal{V}}{m}}\,\epsilon_m.$$

Using the Comparison Lemma [23], we define $\mathcal{W}(t) = \sqrt{\mathcal{V}(t)} = \sqrt{m/2}\,\|\mathbf{s}\|$ and $\dot{\mathcal{W}} = \dot{\mathcal{V}}/(2\sqrt{\mathcal{V}})$ to obtain

$$\|\mathbf{s}(t)\| \le \|\mathbf{s}(t_0)\|\exp\left(-\frac{\lambda - L_a\rho}{m}(t - t_0)\right) + \frac{\epsilon_m}{\lambda - L_a\rho}.$$
It can be shown that this leads to finite-gain $\mathcal{L}_p$ stability and input-to-state stability (ISS) [29]. Furthermore, the hierarchical combination between $\mathbf{s}$ and $\tilde{\mathbf{p}}$ in (7) yields (14). Note that disabling the integral control in (7) (i.e., $\mathbf{s} = \dot{\tilde{\mathbf{p}}} + \Lambda\tilde{\mathbf{p}}$) results in $\lim_{t\to\infty} \|\tilde{\mathbf{p}}(t)\| = \lim_{t\to\infty} \|\mathbf{s}(t)\|/\lambda_{\min}(\Lambda)$.

By designing the controller gain $K_v$ and the Lipschitz constant $L_a$ of the DNN, we ensure $\lambda - L_a\rho > 0$ and achieve exponential tracking within the error bound in (14).
VI. EXPERIMENTS
In our experiments, we evaluate both the generalization performance of our DNN and the overall control performance of Neural-Lander. The experimental setup is composed of 17 motion-capture cameras, a communication router for sending signals, and the drone. The data was collected from an Intel Aero quadrotor weighing 1.47 kg with an onboard computer (2.56 GHz Intel Atom x7 processor, 4 GB DDR3 RAM). We retrofit the drone with eight reflective markers to allow accurate position, attitude, and velocity estimation at 100 Hz. The Intel Aero drone and the test space are shown in Fig. 1.
A. Bench Test

To identify a good nominal model, we first performed bench tests to estimate $m$, $D$, $\rho$, $g$, and $c_T$, which are the mass, rotor diameter, air density, gravity, and thrust coefficient, respectively. The non-dimensional thrust coefficient is defined as $C_T = c_T/(\rho D^4)$. Note that $C_T$ is a function of the propeller speed $n$; here we used a nominal value at $n = 2000$ RPM (the idle RPM) for the following data collection sessions. How $C_T$ changes with $n$ is discussed in Sec. VI-C.
Fig. 1: Intel Aero drone during experiments.
Fig. 2: Training data trajectory (Part I and Part II).
B. Real-World Flying Data and Preprocessing

In order to estimate the disturbance force $\mathbf{f}_a$, we collected states and control inputs while flying the drone close to the ground, manually controlled by an expert pilot. Our training data is shown in Fig. 2. We collected a single trajectory with varying heights and velocities. The trajectory has two parts. Part I (0 s-250 s in Fig. 2) contains maneuvers at different fixed heights $z$ (0.05 m-1.5 m) with random $x$ and $y$ motion; this part can be used to estimate the ground effect. Part II (250 s-350 s in Fig. 2) includes random $x$, $y$, and $z$ motions to cover the feasible state space as much as possible; with this part, we aim to learn non-dominant aerodynamics such as air drag. We note that our training data is quite modest in size by the standards of deep learning.

Since our learning task is to regress $\mathbf{f}_a$ from states and control inputs, we also need output data for $\mathbf{f}_a$. We utilized the relation $\mathbf{f}_a = m\dot{\mathbf{v}} - m\mathbf{g} - R\mathbf{f}_u$ from (1) to calculate $\mathbf{f}_a$. Here $\mathbf{f}_u$ is calculated based on the nominal $c_T$ from the bench test (Sec. VI-A). Our training set consists of sequences of $\{(\mathbf{p}, \mathbf{v}, R, \mathbf{u}), \mathbf{y}\}$, where $\mathbf{y}$ is the observed value of $\mathbf{f}_a$. The entire dataset was split into training (60%), test (20%), and validation (20%) sets for model hyper-parameter tuning.
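A sketch of this label computation (assuming, as our own choice, that $\dot{\mathbf{v}}$ is obtained by numerically differentiating and low-pass filtering the estimated velocity):

```python
import numpy as np

def fa_label(m, g, v_dot, R, T):
    """Disturbance-force label from (1): f_a = m*v_dot - m*g - R*f_u.

    v_dot : (3,) filtered acceleration estimate
    R     : (3, 3) attitude rotation matrix
    T     : total thrust from the nominal model, T = c_T * sum(n_i^2)
    """
    g_vec = np.array([0.0, 0.0, -g])   # gravity vector, g > 0
    f_u = np.array([0.0, 0.0, T])      # nominal thrust along body z-axis
    return m * v_dot - m * g_vec - R @ f_u
```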
C. DNN Prediction Performance

We learn $\hat{\mathbf{f}}_a$ using a deep ReLU network, where $\hat{\mathbf{f}}_a = \hat{\mathbf{f}}_a(z, \mathbf{v}, R, \mathbf{u}) = \hat{\mathbf{f}}_a(\boldsymbol{\zeta}, \mathbf{u})$, with $z$, $\mathbf{v}$, $R$, $\mathbf{u}$ corresponding to the global height, global velocity, attitude, and control input. We build the ReLU network using PyTorch, an open-source deep learning library [30]. Our ReLU network consists of four fully-connected hidden layers, with input and output dimensions of 12 and 3, respectively. We use spectral normalization (SN) (5) to bound the Lipschitz constant.
To investigate how well our DNN can estimate $\mathbf{f}_a$, especially close to the ground, we compare it with a well-known 1D steady ground-effect model [1], [3]:

$$T(n, z) = \frac{n^2 c_T(n)}{1 - \mu\left(\frac{D}{8z}\right)^2} = n^2 c_T(n_0) + \bar{f}_{a,z}, \tag{15}$$
where $T$ is the thrust generated by the propellers, $n$ is the rotation speed, $n_0$ is the idle RPM, and $\mu$ depends on the number and arrangement of propellers ($\mu = 1$ for a single propeller, but it must be tuned for multiple propellers). Note that $c_T$ is a function of $n$. Thus, we can derive $\bar{f}_{a,z}(n, z)$ from $T(n, z)$.

Fig. 3: (a) Learned $\hat{f}_{a,z}$ compared to the ground-effect model with respect to height $z$ ($v_z = 0$ m/s and other state dimensions fixed: $v_x, v_y = 0$ m/s, $R = I$, $\mathbf{u} = 6400$ RPM). Ground-truth points are from hovering data at different heights. (b) Learned $\hat{f}_{a,z}$ with respect to rotation speed $n$ ($z = 0.2$ m, $v_z = 0$ m/s and other state dimensions fixed), compared to $C_T$ measured in the bench test. (c) Heatmaps of learned $\hat{f}_{a,z}$ versus $z$ and $v_z$, with other dimensions fixed, covering both the training-set domain and a new domain; (left) ReLU network with spectral normalization, $\|f\|_{\mathrm{Lip}} = 1$; (right) ReLU network without spectral normalization, $\|f\|_{\mathrm{Lip}} = 4.97$.
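For reference, a small sketch of this derivation of $\bar{f}_{a,z}(n, z)$ from (15); the coefficient function `c_T_fn` and the numbers in the usage line are placeholders of ours, not bench-test values:

```python
def fa_z_ground_effect(n, z, c_T_fn, n0, D, mu):
    """f_bar_{a,z}(n, z) from (15): in-ground-effect thrust minus nominal thrust."""
    T_ige = n ** 2 * c_T_fn(n) / (1.0 - mu * (D / (8.0 * z)) ** 2)
    return T_ige - n ** 2 * c_T_fn(n0)

# Usage with placeholder values (n in RPM, z in meters):
fa_z = fa_z_ground_effect(n=2500.0, z=0.3, c_T_fn=lambda n: 2.0e-8,
                          n0=2000.0, D=0.23, mu=2.0)
```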
Fig. 3(a) shows the comparison between the estimated $\mathbf{f}_a$ from the DNN and the theoretical ground-effect model (15) as we vary the global height $z$ (assuming $T = mg$ when $z = \infty$). We see that our DNN achieves much better estimates than the theoretical ground-effect model. We further investigate the trend of $\bar{f}_{a,z}$ with respect to the rotation speed $n$. Fig. 3(b) shows the learned $\hat{f}_{a,z}$ over the rotation speed $n$ at a given height, in comparison with the $C_T$ measured from the bench test. We observe that the increasing trend of the estimates $\hat{f}_{a,z}$ is consistent with the bench-test results for $C_T$.

To understand the benefits of SN, we compared $\hat{f}_{a,z}$ predicted by DNNs trained both with and without SN. Fig. 3(c) shows the results. Note that $v_z$ from -1 m/s to 1 m/s is covered in our training set, but -2 m/s to -1 m/s is not. We see differences in:
1) Ground effect: $\hat{f}_{a,z}$ increases as $z$ decreases, which is also shown in Fig. 3(a).

2) Air drag: $\hat{f}_{a,z}$ increases as the drone descends ($v_z < 0$) and decreases as the drone ascends ($v_z > 0$).

3) Generalization: the spectrally-normalized DNN is much smoother and also generalizes to new input domains not contained in the training set.

Fig. 4: PD and Neural-Lander performance in 1D take-off and landing. Means (solid curves) and standard deviations (shaded areas) of 10 trajectories.
In [18], the authors theoretically show that spectral nor-
malization can provide tighter generalization guarantees on
unseen data, which is consistent with our empirical results.
An interesting future direction is to connect generalization
theory more tightly with our robustness guarantees.
D. Control Performance

We used a PD controller as the baseline and implemented both the baseline and Neural-Lander without an integral term in (7)-(8). First, we tested the two controllers on the 1D take-off/landing task, i.e., moving the drone from $(0, 0, 0)$ to $(0, 0, 1)$ and then returning it to $(0, 0, 0)$, as shown in Fig. 4. Second, we compared the controllers on the 3D take-off/landing task, i.e., moving the drone from $(0, 0, 0)$ to $(0.5, 0.5, 1)$ and then returning it to $(0, 0, 0)$, as shown in Fig. 5. For both tasks, we repeated the experiments 10 times and computed the means and standard deviations of the take-off/landing trajectories.
From Figs. 4 and 5, we conclude that the main benefits of Neural-Lander are: (a) in both 1D and 3D cases, Neural-Lander can control the drone to precisely land on the ground surface, while the baseline controller cannot land due to the ground effect; (b) in both 1D and 3D cases, Neural-Lander mitigates drifts in the $x$ and $y$ directions, as it also learned non-dominant aerodynamics such as air drag.
Fig. 5: PD and Neural-Lander performance in 3D take-off and landing. Means (solid curves) and standard deviations (shaded areas) of 10 trajectories. Final $z$ error: 0.119 m (PD) vs. zero (Neural-Lander).
In our experiments, we also observed that a naive un-normalized DNN ($\|f\|_{\mathrm{Lip}} = 247$) can even result in a crash, which further underscores the importance of spectral normalization.
VII. CONCLUSIONS
In this paper, we present Neural-Lander, a deep-learning-based nonlinear controller with guaranteed stability for precise quadrotor landing. Compared to traditional ground-effect models, Neural-Lander significantly improves control performance. The main benefits are: (1) our method can learn from coupled unsteady aerodynamics and vehicle dynamics, and provides more accurate estimates than theoretical ground-effect models; (2) our model captures both the ground effect and non-dominant aerodynamics, and outperforms the conventional controller in all directions ($x$, $y$, and $z$); (3) we provide rigorous theoretical analysis of our method and guarantee the stability of the controller, which also implies generalization to unseen domains.
Future work includes further generalizing the capabilities of Neural-Lander to handle unseen state and disturbance domains, such as those generated by a wind fan array. Another interesting direction is to capture long-term temporal correlations with RNNs.
ACKNOWLEDGEMENT
The authors thank Joel Burdick, Mory Gharib and Daniel
Pastor Moreno. The work is funded in part by Caltech’s Center
for Autonomous Systems and Technologies and Raytheon
Company.
REFERENCES
[1]
I. Cheeseman and W. Bennett, “The effect of ground on a helicopter
rotor in forward flight,” 1955.
[2]
K. Nonaka and H. Sugizaki, “Integral sliding mode altitude control for a
small model helicopter with ground effect compensation,” in
American
Control Conference (ACC), 2011
. IEEE, 2011, pp. 202–207.
[3] L. Danjun, Z. Yan, S. Zongying, and L. Geng, “Autonomous landing
of quadrotor based on ground effect modelling,” in
Control Conference
(CCC), 2015 34th Chinese
. IEEE, 2015, pp. 5647–5652.
[4] F. Berkenkamp, A. P. Schoellig, and A. Krause, "Safe controller optimization for quadrotors with Gaussian processes," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 493–496. [Online]. Available: https://arxiv.org/abs/1509.01066
[5]
P. Abbeel, A. Coates, and A. Y. Ng, “Autonomous helicopter aerobatics
through apprenticeship learning,”
The International Journal of Robotics
Research
, vol. 29, no. 13, pp. 1608–1639, 2010.
[6]
A. Punjani and P. Abbeel, “Deep learning helicopter dynamics
models,” in
Robotics and Automation (ICRA), 2015 IEEE International
Conference on
. IEEE, 2015, pp. 3223–3230.
[7]
S. Bansal, A. K. Akametalu, F. J. Jiang, F. Laine, and C. J. Tomlin,
“Learning quadrotor dynamics using neural network for flight control,”
in
Decision and Control (CDC), 2016 IEEE 55th Conference on
. IEEE,
2016, pp. 4653–4660.
[8]
Q. Li, J. Qian, Z. Zhu, X. Bao, M. K. Helwa, and A. P. Schoellig,
“Deep neural networks for improved, impromptu trajectory tracking
of quadrotors,” in
Robotics and Automation (ICRA), 2017 IEEE
International Conference on
. IEEE, 2017, pp. 5183–5189.
[9]
S. Zhou, M. K. Helwa, and A. P. Schoellig, “Design of deep neural
networks as add-on blocks for improving impromptu trajectory tracking,”
in
Decision and Control (CDC), 2017 IEEE 56th Annual Conference
on
. IEEE, 2017, pp. 5201–5207.
[10] C. Sánchez-Sánchez and D. Izzo, "Real-time optimal control via deep neural networks: study on landing problems," Journal of Guidance, Control, and Dynamics, vol. 41, no. 5, pp. 1122–1135, 2018.
[11]
S. Balakrishnan and R. Weil, “Neurocontrol: A literature survey,”
Mathematical and Computer Modelling
, vol. 23, no. 1-2, pp. 101–
117, 1996.
[12]
M. T. Frye and R. S. Provence, “Direct inverse control using an artificial
neural network for the autonomous hover of a helicopter,” in
Systems,
Man and Cybernetics (SMC), 2014 IEEE International Conference on
.
IEEE, 2014, pp. 4121–4122.
[13]
H. Suprijono and B. Kusumoputro, “Direct inverse control based on
neural network for unmanned small helicopter attitude and altitude
control,”
Journal of Telecommunication, Electronic and Computer
Engineering (JTEC)
, vol. 9, no. 2-2, pp. 99–102, 2017.
[14]
F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe
model-based reinforcement learning with stability guarantees,” in
Proc.
of Neural Information Processing Systems (NIPS)
, 2017. [Online].
Available: https://arxiv.org/abs/1705.08551
[15]
T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral
normalization for generative adversarial networks,”
arXiv preprint
arXiv:1802.05957
, 2018.
[16]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in
Advances in neural
information processing systems
, 2012, pp. 1097–1105.
[17]
T. Salimans and D. P. Kingma, “Weight normalization: A simple
reparameterization to accelerate training of deep neural networks,” in
Advances in Neural Information Processing Systems
, 2016, pp. 901–
909.
[18]
P. L. Bartlett, D. J. Foster, and M. J. Telgarsky, “Spectrally-normalized
margin bounds for neural networks,” in
Advances in Neural Information
Processing Systems
, 2017, pp. 6240–6249.
[19]
J. Slotine and W. Li,
Applied Nonlinear Control
. Prentice Hall, 1991.
[20]
S. Bandyopadhyay, S.-J. Chung, and F. Y. Hadaegh, “Nonlinear attitude
control of spacecraft with a large captured object,”
Journal of Guidance,
Control, and Dynamics
, vol. 39, no. 4, pp. 754–769, 2016.
[21]
D. Morgan, G. P. Subramanian, S.-J. Chung, and F. Y. Hadaegh,
“Swarm assignment and trajectory optimization using variable-swarm,
distributed auction assignment and sequential convex programming,”
Int. J. Robotics Research
, vol. 35, no. 10, pp. 1261–1285, 2016.
[22]
D. Mellinger and V. Kumar, “Minimum snap trajectory generation and
control for quadrotors,” in
2011 IEEE International Conference on
Robotics and Automation
, May 2011, pp. 2520–2525.
[23]
H. Khalil,
Nonlinear Systems
, ser. Pearson Education.
Prentice Hall,
2002.
[24]
C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understand-
ing deep learning requires rethinking generalization,”
arXiv preprint
arXiv:1611.03530
, 2016.
[25]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in
Proceedings of the IEEE conference on computer vision
and pattern recognition
, 2016, pp. 770–778.
[26]
B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, “A pac-
bayesian approach to spectrally-normalized margin bounds for neural
networks,”
arXiv preprint arXiv:1707.09564
, 2017.
[27]
G. K. Dziugaite and D. M. Roy, “Computing nonvacuous generaliza-
tion bounds for deep (stochastic) neural networks with many more
parameters than training data,”
arXiv preprint arXiv:1703.11008
, 2017.
[28]
B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro, “Explor-
ing generalization in deep learning,” in
Advances in Neural Information
Processing Systems
, 2017, pp. 5947–5956.
[29]
S.-J. Chung, S. Bandyopadhyay, I. Chang, and F. Y. Hadaegh, “Phase
synchronization control of complex networks of Lagrangian systems on
adaptive digraphs,”
Automatica
, vol. 49, no. 5, pp. 1148–1161, 2013.
[30]
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin,
A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in
pytorch,” 2017.