Neural-Fly Enables Rapid Learning
for Agile Flight in Strong Winds
Michael O’Connell, Guanya Shi, Xichen Shi,
Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, Soon-Jo Chung
Division of Engineering and Applied Science, California Institute of Technology
The first two authors contributed equally to this article. Alphabetical order.
Corresponding authors. Email: sjchung@caltech.edu
This is the accepted version of Science Robotics Vol. 7, Issue 66, eabm6597 (2022)
DOI: 10.1126/scirobotics.abm6597
Video: https://youtu.be/TuF9teCZX0U
Data and training code: https://github.com/aerorobotics/neural-fly
Abstract
Executing safe and precise flight maneuvers in dynamic high-speed winds is important for the ongoing
commoditization of uninhabited aerial vehicles (UAVs). However, since the relationship between
various wind conditions and their effect on aircraft maneuverability is not well understood, it is challenging
to design effective robot controllers using traditional control design methods. We present Neural-Fly,
a learning-based approach that allows rapid online adaptation by incorporating pre-trained representa-
tions through deep learning. Neural-Fly builds on two key observations that aerodynamics in different
wind conditions share a common representation and that the wind-specific part lies in a low-dimensional
space. To that end, Neural-Fly uses a proposed learning algorithm, Domain Adversarially Invariant
Meta-Learning (DAIML), to learn the shared representation, only using 12 minutes of flight data. With
the learned representation as a basis, Neural-Fly then uses a composite adaptation law to update a set
of linear coefficients for mixing the basis elements. When evaluated under challenging wind conditions
generated with the Caltech Real Weather Wind Tunnel with wind speeds up to 43.6 km/h (12.1 m/s),
Neural-Fly achieves precise flight control with substantially smaller tracking error than state-of-the-art
nonlinear and adaptive controllers. In addition to strong empirical performance, the exponential stability
of Neural-Fly results in robustness guarantees. Finally, our control design extrapolates to unseen wind
conditions, is shown to be effective for outdoor flights with only on-board sensors, and can transfer
across drones with minimal performance degradation.
1 INTRODUCTION
The commoditization of uninhabited aerial vehicles (UAVs) requires that the control of these vehicles be-
come more precise and agile. For example, drone delivery requires transporting goods to a narrow target
area in various weather conditions; drone rescue and search require entering and searching collapsed build-
ings with little space; urban air mobility needs a flying car to follow a planned trajectory closely to avoid
collision in the presence of strong unpredictable winds.
arXiv:2205.06908v1 [cs.RO] 13 May 2022
Figure 1: Agile flight through narrow gates. (A) Caltech Real Weather Wind Tunnel system, the quadrotor
UAV, and the gate. In our flight tests, the UAV follows an agile trajectory through narrow gates, which are
slightly wider than the UAV itself, under challenging wind conditions. (B-C) Trajectories used for the gate
tests. In (B), the UAV follows a figure-8 through one gate, with wind speed 3.1 m/s or time-varying wind
condition. In (C), the UAV follows an ellipse in the horizontal plane through two gates, with wind speed
3.1 m/s. (D-E) Long-exposure photos (with an exposure time of 5 s) showing one lap in two tasks. (F-I)
High-speed photos (with a shutter speed of 1/200 s) showing the moment the UAV passed through the gate
and the interaction between the UAV and the wind.
Unmodeled and often complex aerodynamics are among the most notable challenges to precise flight
control. Flying in windy environments (as shown in Fig. 1) introduces even more complexity because of
the unsteady aerodynamic interactions between the drone, the induced airflow, and the wind (see Fig. 1(F)
for a smoke visualization). These unsteady and nonlinear aerodynamic effects substantially degrade the
performance of conventional UAV control methods that neglect to account for them in the control design.
Prior approaches partially capture these effects with simple linear or quadratic air drag models, which limit
the tracking performance in agile flight and cannot be extended to external wind conditions [1, 2]. Although
more complex aerodynamic models can be derived from computational fluid dynamics [3], such modelling
is often computationally expensive, and is limited to steady non-dynamic wind conditions. Adaptive control
addresses this problem by estimating linear parametric uncertainty in the dynamical model in real time to
improve tracking performance. Recent state-of-the-art quadrotor flight control has used adaptive control
methods that directly estimate the unknown aerodynamic force without assuming the structure of the
underlying physics, but that rely on high-frequency and low-latency control [4, 5, 6, 7]. In parallel, there
has been increased interest in data-driven modeling of aerodynamics (e.g., [8, 9, 10, 11]); however, existing
approaches cannot effectively adapt in changing or unknown environments such as time-varying wind
conditions.
In this article, we present a data-driven approach called Neural-Fly, which is a deep-learning-based tra-
jectory tracking controller that learns to quickly adapt to rapidly-changing wind conditions. Our method, de-
picted in Fig. 2, advances and offers insights into both adaptive flight control and deep-learning-based robot
control. Our experiments demonstrate that Neural-Fly achieves centimeter-level position-error tracking of
an agile and challenging trajectory in dynamic wind conditions on a standard UAV.
Our method has two main components: an offline learning phase and an online adaptive control phase
that serves as real-time online learning. For the offline learning phase, we have developed Domain Adversarially
Invariant Meta-Learning (DAIML) that learns a wind-condition-independent deep neural network (DNN)
representation of the aerodynamics in a data-efficient manner. The output of the DNN is treated as a set
of basis functions that represent the aerodynamic effects. This representation is adapted to different wind
conditions by updating a set of linear coefficients that mix the output of the DNN. DAIML is data effi-
cient and uses only 12 total minutes of flight data in just 6 different wind conditions to train the DNN.
DAIML incorporates several key features which not only improve the data efficiency but also are informed
by the downstream online adaptive control phase. In particular, DAIML uses spectral normalization [8, 12]
to control the Lipschitz property of the DNN to improve generalization to unseen data and provide closed-
loop stability and robustness guarantees. DAIML also uses a discriminative network, which ensures that the
learned representation is wind-invariant and that the wind-dependent information is only contained in the
linear coefficients that are adapted in the online control phase.
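
The effect of spectral normalization on a single layer can be illustrated with a short sketch. The code below is an illustration under stated assumptions, not the training code used for DAIML: it estimates the largest singular value of a weight matrix by power iteration and rescales the matrix so that this value does not exceed a bound `gamma` (a hypothetical parameter name). For a network with 1-Lipschitz activations such as ReLU, the product of the per-layer spectral norms bounds the Lipschitz constant of the whole network, which is why bounding each layer is useful.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W by power iteration on W^T W.

    Assumes W is nonzero and that the all-ones start vector is not
    orthogonal to the leading right singular vector.
    """
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    Wv = matvec(W, v)
    return math.sqrt(sum(x * x for x in Wv))


def spectrally_normalize(W, gamma=1.0):
    """Rescale W so that its spectral norm is at most gamma."""
    sigma = spectral_norm(W)
    scale = min(1.0, gamma / sigma)
    return [[scale * w for w in row] for row in W]
```

In this sketch a layer whose spectral norm is already below `gamma` is left unchanged, so the normalization only shrinks weights and never inflates them.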
For the online adaptive control phase, we have developed a regularized composite adaptive control
law, which we derived from a fundamental understanding of how the learned representation interacts with
the closed-loop control system and which we support with rigorous theory. The adaptation law updates the
wind-dependent linear coefficients using a composite of the position tracking error term and the aerodynamic
force prediction error term. Such a principled approach effectively guarantees stable and fast adaptation to
any wind condition and robustness against imperfect learning. Although this adaptive control law could be
used with a number of learned models, the speed of adaptation is further aided by the concise representation
learned from DAIML.
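
To make the idea of a composite update concrete, the following is a minimal sketch, assuming a single force axis, a fixed scalar adaptation gain, and forward-Euler integration. It is a simplification for illustration only and is not the regularized adaptation law derived in this article; the function and parameter names (`gain`, `lam`, `s`) are hypothetical, and the sign of the tracking term depends on how the tracking error is defined.

```python
import math


def composite_adaptation_step(a_hat, phi, y, s, dt, gain=1.0, lam=0.0):
    """One Euler step of a simplified composite adaptation update.

    a_hat : current coefficient estimates (list of floats)
    phi   : basis-function values at the current state (same length)
    y     : measured residual force (scalar)
    s     : tracking-error signal (scalar); sign convention is assumed
    lam   : regularization (forgetting) rate pulling a_hat toward zero
    """
    f_hat = sum(p * a for p, a in zip(phi, a_hat))  # predicted residual force
    prediction_error = y - f_hat
    return [
        a + dt * (-lam * a + gain * p * prediction_error + gain * p * s)
        for a, p in zip(a_hat, phi)
    ]


def run_demo(steps=200):
    """Recover hypothetical coefficients from noise-free force measurements."""
    a_true = [2.0, -1.0]  # hypothetical wind-specific coefficients
    a_hat = [0.0, 0.0]
    for k in range(steps):
        t = 0.5 * k
        phi = [math.cos(t), math.sin(t)]  # stand-in for learned features
        y = sum(p * a for p, a in zip(phi, a_true))
        a_hat = composite_adaptation_step(a_hat, phi, y, s=0.0, dt=0.5)
    return a_hat
```

In the demo the tracking term is switched off (`s=0.0`) so that only the prediction-error term acts; because the stand-in features keep changing direction, the estimate converges to the hypothetical true coefficients.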
Figure 2: Offline meta-learning and online adaptive control design. [Diagram with three panels: (A) Online
adaptation, (B) Offline meta-learning, (C) Control diagram.] (A) The online adaptation block in
our adaptive controller. Our controller leverages the meta-trained basis function $\phi$, which is a wind-invariant
representation of the aerodynamic effects, and uses composite adaptation (that is, including tracking-error-based
and prediction-error-based adaptation) to update wind-specific linear weights $\hat{a}$. The output of this
block is the wind-effect force estimate, $\hat{f} = \phi \hat{a}$. (B) The illustration of our meta-learning algorithm DAIML.
We collected data from wind conditions $\{w_1, \cdots, w_K\}$ and applied Algorithm 1 to train the $\phi$ net. (C) The
diagram of our control method, where the grey part corresponds to (A). Interpreting the learned block as an
aerodynamic force allows it to be incorporated into the feedback control easily.

Using Neural-Fly, we report an average improvement of 66% over a nonlinear tracking controller, 42%
over an $\mathcal{L}_1$ adaptive controller, and 35% over an Incremental Nonlinear Dynamics Inversion (INDI)
controller. These results are all accomplished using standard quadrotor UAV hardware, while running the PX4’s
default regulation attitude control. Our tracking performance is competitive even compared to related work
without external wind disturbances and with more complex hardware (for example, [4] requires a 10 times
higher control frequency and onboard optical sensors for direct motor speed feedback). We also compare
Neural-Fly with two variants of our method: Neural-Fly-Transfer, which uses a learned representation
trained on data from a different drone, and Neural-Fly-Constant, which only uses our adaptive control law
with a trivial non-learning basis. Neural-Fly-Transfer demonstrates that our method is robust to changes
in vehicle configuration and model mismatch. Neural-Fly-Constant, $\mathcal{L}_1$, and INDI all directly adapt to the
unknown dynamics without assuming the structure of the underlying physics, and they have similar perfor-
mance. Furthermore, we demonstrate that our method enables a new set of capabilities that allow the UAV
to fly through low-clearance gates following agile trajectories in gusty wind conditions (Fig. 1).
Related Work for Precise Quadrotor Control
Typical quadrotor control consists of a cascaded or hierarchical control structure which separates the
design of the position controller, attitude controller, and thrust mixer (allocation). Commonly-used off-the-
shelf controllers, such as PX4, design each of these loops as proportional-integral-derivative (PID) regulation
controllers [13]. The control performance can be substantially improved by designing each layer of the
cascaded controller as a tracking controller using the concept of differential flatness [14], or, as has recently
been popular, using a single optimization-based controller such as model predictive control (MPC) to directly
compute motor speed commands from desired trajectories. State-of-the-art tracking performance relies on
MPC with fast adaptive inner loops to correct for modeling errors [4, 7]; however, this approach requires
full custom flight controllers. In contrast, our method is designed to be integrated with a typical PX4 flight
controller, yet it achieves state-of-the-art flight performance in wind.
Prior work on agile quadrotor control has achieved impressive results by considering aerodynamics [4,
7, 11, 2]. However, those approaches require specialized onboard hardware [4], full custom flight control
stacks [4, 7], or cannot adapt to external wind disturbances [11, 2]. For example, state-of-the-art tracking
performance has been demonstrated using incremental nonlinear dynamics inversion to estimate aerodynamic
disturbance forces, with a root-mean-square tracking error of 6.6 cm and drone ground speeds up to
12.9 m/s [4]. However, [4] relies on high-frequency control updates (500 Hz) and direct motor speed
feedback using optical encoders to rapidly estimate external disturbances. Both are challenging to deploy
on standard systems. [7] simplifies the hardware setup and does not require optical motor speed sensors
and has demonstrated state-of-the-art tracking performance. However, [7] relies on a high-rate $\mathcal{L}_1$ adaptive
controller inside a model predictive controller and uses a racing drone with a fully customized control stack.
[11] leverages an aerodynamic model learned offline and represented as Gaussian Processes. However, [11]
cannot adapt to unknown or changing wind conditions and provides no theoretical guarantees. Another
recent work focuses on deriving simplified rotor-drag models that are differentially flat [2]. However, [2]
focuses on horizontal xy-plane trajectories at ground speeds of 4 m/s without external wind, where the
thrust is more constant than ours, achieves 6 cm tracking error [2], uses an attitude controller running at
4000 Hz, and is not extensible to faster flights as pointed out in [11].
Relation between Neural-Fly and Conventional Adaptive Control
Adaptive control theory has been extensively studied for online control and identification problems with
parametric uncertainty, for example, unknown linear coefficients for mixing known basis functions [15, 16,
17, 18, 19, 20]. There are three common aspects of adaptive control which must be addressed carefully in
any well-designed system and which we address in Neural-Fly: designing suitable basis functions for online
adaptation, stability of the closed-loop system, and persistence of excitation, which is a property related to
robustness against disturbances. These challenges arise due to the coupling between the unknown underlying
dynamics and the online adaptation. This coupling precludes naive combinations of online learning and
control. For example, gradient-based parameter adaptation has well-known stability and robustness issues
as discussed in [15].
The basis functions play a crucial role in the performance of adaptive control, but designing or selecting
proper basis functions might be challenging. A good set of basis functions should reflect important features
of the underlying physics. In practice, basis functions are often designed using physics-informed modeling
of the system, such as the nonlinear aerodynamic modeling in [21]. However, physics-informed modeling
requires a tremendous amount of prior knowledge and human labor, and is often still inaccurate. Another
approach is to use random features as the basis set, such as random Fourier features [22, 23], which can
model all possible underlying physics as long as the number of features is large enough. However, the
high-dimensional feature space is not optimal for a specific system because many of the features might be
redundant or irrelevant. Such suboptimality and redundancy not only increase the computational burden but
also slow down the convergence speed of the adaptation process.
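
For readers unfamiliar with random Fourier features, the sketch below shows one common construction, assuming a Gaussian (RBF) kernel with a hypothetical `length_scale` parameter. It is not taken from [22, 23] verbatim: it draws random frequencies and phases once, and the resulting cosine features have inner products that approximate the kernel as the number of features grows.

```python
import math
import random


def make_rff(input_dim, num_features, length_scale=1.0, seed=0):
    """Build a random Fourier feature map approximating an RBF kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from a Gaussian with std 1 / length_scale.
    weights = [
        [rng.gauss(0.0, 1.0 / length_scale) for _ in range(input_dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, phases)
        ]

    return features


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

With 2000 features, `dot(features(x), features(y))` is close to `exp(-||x - y||^2 / (2 * length_scale^2))`. Reaching that accuracy takes thousands of coefficients, which illustrates the redundancy discussed above when such a basis is used for online adaptation.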
Given a set of basis functions, naive adaptive control designs may cause instability and fragility in the
closed-loop system, due to the nontrivial coupling between the adapted model and the system dynamics.
In particular, asymptotically stable adaptive control cannot guarantee robustness against disturbances and
so exponential stability is desired. Even so, existing adaptive control methods often guarantee exponential
stability only when the desired trajectory is persistently exciting, meaning that information about all of the
coefficients (including irrelevant ones) is constantly provided at the required spatial and time scales. In practice,
persistent excitation requires either a succinct set of basis functions or perturbing the desired trajectory,
which compromises tracking performance.
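
One standard way to state persistent excitation is that the Gram matrix of the basis-function values, summed over a time window, has its smallest eigenvalue bounded away from zero. The sketch below checks this for the two-feature case only, using the closed-form eigenvalues of a symmetric 2x2 matrix; the window contents and the threshold `alpha` are hypothetical.

```python
import math


def min_eig_gram_2x2(phis):
    """Smallest eigenvalue of sum_k phi_k phi_k^T for 2-dimensional phi_k."""
    g11 = sum(p[0] * p[0] for p in phis)
    g12 = sum(p[0] * p[1] for p in phis)
    g22 = sum(p[1] * p[1] for p in phis)
    trace = g11 + g22
    det = g11 * g22 - g12 * g12
    disc = max(0.0, trace * trace - 4.0 * det)  # guard against round-off
    return 0.5 * (trace - math.sqrt(disc))


def is_persistently_exciting(phis, alpha):
    """True if the window provides at least alpha of information in every direction."""
    return min_eig_gram_2x2(phis) >= alpha
```

A window in which the features never change direction gives a smallest eigenvalue of zero, so one combination of coefficients is never observed; a window that alternates between independent directions does not have this problem.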
Recent multirotor flight control methods, including INDI [4] and $\mathcal{L}_1$ adaptive control, presented in [5]
and demonstrated inside a model predictive control loop in [7], achieve good results by abandoning complex
basis functions. Instead, these methods directly estimate the aerodynamic residual force vector. The residual
force is observable; thus, these methods bypass the challenge of designing good basis functions and the
associated stability and persistent excitation issues. However, these methods suffer from lag in estimating
the residual force and face a filter design trade-off between reduced lag and amplified
noise. Neural-Fly-Constant only uses Neural-Fly’s composite adaptation law to estimate the residual force,
and therefore, Neural-Fly-Constant also falls into this class of adaptive control structures. The results of this
article demonstrate that the inherent estimation lag in these existing methods limits performance on agile
trajectories and in strong wind conditions.
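
The lag-versus-noise trade-off can be seen in even the simplest estimator. The sketch below uses a generic first-order low-pass filter as a stand-in; it is not the filter structure used by INDI or by the $\mathcal{L}_1$ controllers, and the smoothing factor `alpha` (assumed to lie in (0, 1]) is a hypothetical tuning parameter. It reports how many samples the filter needs to reach 90% of a unit step in the residual force, and the factor by which white measurement noise variance is scaled at steady state.

```python
def low_pass_step(f_hat, measurement, alpha):
    """First-order low-pass update of the residual-force estimate."""
    return f_hat + alpha * (measurement - f_hat)


def steps_to_reach(alpha, fraction=0.9):
    """Samples needed for the estimate to reach `fraction` of a unit step."""
    f_hat, steps = 0.0, 0
    while f_hat < fraction:
        f_hat = low_pass_step(f_hat, 1.0, alpha)
        steps += 1
    return steps


def noise_variance_gain(alpha):
    """Steady-state output variance divided by white-noise input variance."""
    return alpha / (2.0 - alpha)
```

With `alpha = 0.5` the estimate reaches 90% of the step in 4 samples but passes one third of the noise variance; with `alpha = 0.1` it passes only about 5% of the noise variance but needs 22 samples.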
Neural-Fly solves the aforementioned issues of basis function design and adaptive control stability, us-
ing newly developed methods for meta-learning and composite adaptation that can be seamlessly integrated
together. Neural-Fly uses DAIML and flight data to learn an effective and compact set of basis functions,
represented as a DNN. The regularized composite adaptation law uses the learned basis functions to quickly
respond to wind conditions. Neural-Fly enjoys fast adaptation because of the conciseness of the feature
space, and it guarantees closed-loop exponential stability and robustness without assuming persistent exci-
tation.
Related to Neural-Fly, neural network based adaptive control has been researched extensively, but by and
large was limited to shallow or single-layer neural networks without pretraining. Some early works focus
on shallow or single-layer neural networks with unknown parameters which are adapted online [19, 24, 25,
26, 27]. A recent work applies this idea to perform an impressive quadrotor flip [28]. However, the existing
neural network based adaptive control work does not employ multi-layer DNNs, and lacks a principled
and efficient mechanism to pretrain the neural network before deployment. Instead of using shallow neural
networks, recent trends in machine learning highly rely on DNNs due to their representation power [29]. In
this work, we leverage modern deep learning advances to pretrain a DNN which represents the underlying
physics compactly and effectively.
Related Work in Multi-environment Deep Learning for Robot Control
Recently, researchers have been addressing the data and computation requirements for DNNs to help the
field progress towards the fast online-learning paradigm. In turn, this progress has been enabling adaptable
DNN-based control in dynamic environments. The most popular learning scheme in dynamic environments
is meta-learning, or “learning-to-learn”, which aims to learn an efficient model from data across different
tasks or environments [30, 31, 32]. The learned model, typically represented as a DNN, ideally should be
capable of rapid adaptation to a new task or an unseen environment given limited data. For robotic appli-
cations, meta-learning has shown great potential for enabling autonomy in highly-dynamic environments.
For example, it has enabled quick adaptation against unseen terrain or slopes for legged robots [33, 34],
changing suspended payload for drones [35], and unknown operating conditions for wheeled robots [36].
In general, learning algorithms typically can be decomposed into two phases: offline learning and online
adaptation. In the offline learning phase, the goal is to learn a model from data collected in different environments,
such that the model contains shared knowledge or features across all environments, for example,
learning aerodynamic features shared by all wind conditions. In the online adaptation phase, the goal is
to adapt the offline-learned model, given limited online data from a new environment or a new task, for
example, fine tuning the aerodynamic features in a specific wind condition.
There are two ways that the offline-learned model can be adapted. In the first class, the adaptation phase
adapts the whole neural network model, typically using one or more gradient descent steps [30, 33, 35, 37].
However, due to the notoriously data-hungry and high-dimensional nature of neural networks, for real-world
robots it is still impossible to run such adaptation on-board as fast as the feedback control loop (e.g., 100 Hz
for a quadrotor). Furthermore, adapting the whole neural network often lacks explainability and robustness
and could generate unpredictable outputs that make the closed-loop unstable.
In the second class (including Neural-Fly), the online adaptation only adapts a relatively small part of
the learned model, for example, the last layer of the neural network [38, 36, 39, 40]. The intuition is that
different environments share a common representation (e.g., the wind-invariant representation in Fig. 2(A)),
and the environment-specific part is in a low-dimensional space (e.g., the wind-specific linear weight in
Fig. 2(A)), which enables real-time adaptation as fast as the control loop. In particular, the idea of integrating
meta-learning with adaptive control was first presented in our prior work [38], later followed by [39].
However, the representation learned in [38] is ineffective and the tracking performance in [38] is similar to
the baselines; [39] focuses on a planar and fully-actuated rotorcraft simulation without experimental validation
and provides no stability or robustness analysis. Neural-Fly instead learns an effective representation
using our meta-learning algorithm called DAIML, demonstrates state-of-the-art tracking performance on
real drones, and achieves non-trivial stability and robustness guarantees.
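
The second class of methods can be illustrated with a toy example in which the representation is frozen and only two linear coefficients are fitted per environment. In the sketch below, `shared_representation` is a hypothetical hand-written feature map standing in for a pretrained network, and the fit is an ordinary least-squares solve of the 2x2 normal equations; the example is a sketch under these assumptions rather than the adaptation procedure used in Neural-Fly.

```python
def shared_representation(x):
    """Hypothetical frozen features standing in for a pretrained network."""
    return [x, x * x]


def fit_last_layer(features, targets):
    """Least-squares fit of two linear coefficients via the normal equations."""
    g11 = sum(p[0] * p[0] for p in features)
    g12 = sum(p[0] * p[1] for p in features)
    g22 = sum(p[1] * p[1] for p in features)
    b1 = sum(p[0] * y for p, y in zip(features, targets))
    b2 = sum(p[1] * y for p, y in zip(features, targets))
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("features do not determine both coefficients")
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]


def predict(coeffs, x):
    phi = shared_representation(x)
    return coeffs[0] * phi[0] + coeffs[1] * phi[1]
```

Given three samples from each of two environments with different underlying coefficients, the same frozen features recover each environment's coefficients exactly. Only two numbers change between environments, which is what makes adaptation at the control-loop rate feasible.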
Another popular deep-learning approach for control in dynamic environments is robust policy learning
via domain randomization [41, 42, 43]. The key idea is to train the policy with random physical parame-
ters such that the controller is robust to a range of conditions. For example, the quadrupedal locomotion
controller in [41] retains its robustness over challenging natural terrains. However, robust policy learning
optimizes average performance under a broad range of conditions rather than achieving precise control by
adapting to specific environments.