Neural-Swarm: Decentralized Close-Proximity Multirotor Control Using Learned Interactions

Guanya Shi, Wolfgang Hönig, Yisong Yue, and Soon-Jo Chung
Abstract — In this paper, we present Neural-Swarm, a nonlinear decentralized stable controller for close-proximity flight of multirotor swarms. Close-proximity control is challenging due to the complex aerodynamic interaction effects between multirotors, such as downwash from higher vehicles to lower ones. Conventional methods often fail to properly capture these interaction effects, resulting in controllers that must maintain large safety distances between vehicles, and thus are not capable of close-proximity flight. Our approach combines a nominal dynamics model with a regularized permutation-invariant Deep Neural Network (DNN) that accurately learns the high-order multi-vehicle interactions. We design a stable nonlinear tracking controller using the learned model. Experimental results demonstrate that the proposed controller significantly outperforms a baseline nonlinear tracking controller, with up to four times smaller worst-case height tracking errors. We also empirically demonstrate the ability of our learned model to generalize to larger swarm sizes.
I. INTRODUCTION
The ongoing commoditization of unmanned aerial vehicles (UAVs) is propelling interest in advanced control methods for large aerial swarms [1], [2]. Potential applications are plentiful, including manipulation, search, surveillance, and mapping, amongst many others. Many settings require the UAVs to fly in close proximity to each other, also known as dense formation control. For example, consider a search-and-rescue mission where the aerial swarm must enter and search a collapsed building. In such scenarios, close-proximity flight enables the swarm to navigate the building much faster compared to swarms that must maintain large distances from each other.
A major challenge of close-proximity control is that the small distance between UAVs creates complex aerodynamic interactions. For instance, one multirotor flying above another causes the so-called downwash effect on the lower one, which is difficult to model using conventional approaches [3]. In lieu of better downwash interaction modeling, one must require a large safety distance between vehicles, e.g., 60 cm for the small Crazyflie 2.0 quadrotor (9 cm rotor-to-rotor) [4]. However, the downwash for two Crazyflie quadrotors hovering 30 cm on top of each other is only −9 g, which is well within their thrust capabilities, and suggests that proper modeling of downwash and other interaction effects can lead to more precise dense formation control.
Demo videos: https://youtu.be/v4j-9pH11Q8
The authors are with the California Institute of Technology, USA. {gshi, whoenig, yyue, sjchung}@caltech.edu.
We thank Michael O'Connell and Xichen Shi for helpful discussions and Anya Vinogradsky for help with the firmware implementation.
The work is funded in part by Caltech's Center for Autonomous Systems and Technologies (CAST) and the Raytheon Company.
Fig. 1. We learn complex interactions between multirotors using regularized deep sets and design a nonlinear stable controller for close-proximity flight.
In this paper, we propose a learning-based controller, Neural-Swarm, to improve the precision of close-proximity control of homogeneous multirotor swarms. In particular, we train a regularized permutation-invariant deep neural network (DNN) to predict the residual interaction forces not captured by nominal models of free-space aerodynamics. The DNN only requires relative positions and velocities of neighboring multirotors as inputs, similar to existing collision-avoidance techniques [5], which enables a fully decentralized computation. We use the predicted interaction forces as a feed-forward term in the multirotors' position controller, which enables close-proximity flight. Our solution is computationally efficient and can run in real-time on a small 32-bit microcontroller. We validate our approach on different tasks using two to five quadrotors. To our knowledge, our approach is the first that models interactions between more than two multirotor vehicles.
From a learning perspective, we leverage two state-of-the-art tools to arrive at effective DNN models. The first is spectral normalization [6], which ensures the DNN is Lipschitz continuous. As in our prior work [7], Lipschitz continuity enables us to derive stability guarantees, and also helps the DNN generalize well on test examples that lie outside the training set. We also employ deep sets [8] to encode multi-vehicle interactions in an index-free or permutation-invariant manner, enabling better generalization to new formations and varying numbers of vehicles.
Related Work: The use of DNNs to learn higher-order residual dynamics or control outputs is becoming increasingly common across a range of control and reinforcement learning settings [7], [9]–[14]. The closest approach to ours is the Neural Lander [7], which uses a DNN to capture the
interaction between a single UAV and the ground, i.e., the well-studied ground effect [3], [15], [16]. In contrast, our work focuses on learning inter-vehicle aerodynamic interactions between several multirotors.
The interaction between two rotor blades has been studied in a lab setting to optimize the placement of rotors on a multirotor [17]. However, it remains an open question how this influences the flight of two or more multirotors in close proximity. Interactions between two multirotors can be estimated using a propeller velocity field model [18]. Unfortunately, this method is hard to generalize to the multi-robot case and only considers the stationary case, so it does not apply to many scenarios such as the swapping maneuver in Fig. 1. We instead use a learning-based method that can directly estimate the interaction forces of multiple neighboring robots from training data.
For motion planning, empirical models have been used to avoid harmful interactions [2], [19]–[21]. Typical safe-interaction shapes are ellipsoids or cylinders, and such models work for homogeneous and heterogeneous multirotor teams. Estimating such shapes requires potentially dangerous flight tests, and the resulting shapes are in general conservative. In contrast, we use learning to estimate the interaction forces accurately and use those forces in the controller to improve trajectory tracking performance in close-proximity flight. The learned forces can potentially be used for motion planning as well.
II. PROBLEM STATEMENT: SWARM INTERACTIONS
A. Single Multirotor Dynamics
A single multirotor's state comprises the global position $p \in \mathbb{R}^3$, global velocity $v \in \mathbb{R}^3$, attitude rotation matrix $R \in \mathrm{SO}(3)$, and body angular velocity $\omega \in \mathbb{R}^3$. We consider the following dynamics:

$\dot{p} = v, \quad m\dot{v} = mg + Rf_u + f_a,$  (1a)

$\dot{R} = RS(\omega), \quad J\dot{\omega} = J\omega \times \omega + \tau_u + \tau_a,$  (1b)
where $m$ and $J$ are the mass and inertia matrix of the system, respectively; $S(\cdot)$ is a skew-symmetric mapping; $g = [0, 0, -g]^\top$ is the gravity vector; and $f_u = [0, 0, T]^\top$ and $\tau_u = [\tau_x, \tau_y, \tau_z]^\top$ are the total thrust and body torques from the rotors, respectively. The output wrench $\eta = [T, \tau_x, \tau_y, \tau_z]^\top$ is linearly related to the control input via $\eta = B_0 u$, where $u = [n_1^2, n_2^2, \dots, n_k^2]^\top$ is the vector of squared motor speeds for a vehicle with $k$ rotors and $B_0$ is the actuation matrix. The key difficulty stems from disturbance forces $f_a = [f_{a,x}, f_{a,y}, f_{a,z}]^\top$ and disturbance torques $\tau_a = [\tau_{a,x}, \tau_{a,y}, \tau_{a,z}]^\top$, generated by other multirotors.
B. Swarm Dynamics
Consider $n$ homogeneous multirotors. To simplify notation, we use $x^{(i)} = [p^{(i)}; v^{(i)}; R^{(i)}; \omega^{(i)}]$ to denote the state of the $i$th multirotor. Then (1) can be simplified as:

$\dot{x}^{(i)} = f(x^{(i)}, u^{(i)}) + [0;\ f_a^{(i)};\ 0;\ \tau_a^{(i)}], \quad i = 1, \dots, n,$  (2)
where $f(x^{(i)}, u^{(i)})$ is the nominal dynamics, and $f_a^{(i)}$ and $\tau_a^{(i)}$ are the unmodeled force and torque from interactions with other multirotors.
We use $x^{(ij)}$ to denote the relative state component between robot $i$ and $j$, e.g., $x^{(ij)} = [p^{(j)} - p^{(i)}; v^{(j)} - v^{(i)}]$. For robot $i$, the unmodeled force and torque in (2) are functions of the relative states of its neighbors:

$f_a^{(i)} = f_a(\mathcal{N}^{(i)}) \quad \text{and} \quad \tau_a^{(i)} = \tau_a(\mathcal{N}^{(i)}),$  (3)
where $\mathcal{N}^{(i)} = \{x^{(ij)} \mid j \in \text{neighbor}(i)\}$ is the set of the relative states of the neighbors of $i$. Note that here we assume the swarm system is homogeneous, i.e., each robot has the same functions $f$, $f_a$, and $\tau_a$.
C. Problem Statement & Approach
We aim to improve the control performance of a multirotor swarm during close formation flight by learning the unknown interaction terms $f_a$ and $\tau_a$. Here, we focus on the position dynamics (1a), so $f_a$ is our primary concern.

We first approximate $f_a$ using a permutation-invariant deep neural network (DNN), and then incorporate the DNN in our exponentially-stabilizing controller. Training is done offline, and the learned interaction dynamics model is applied in the on-board controller in real-time.
III. LEARNING APPROACH
We employ state-of-the-art deep learning methods to capture the unknown (or residual) multi-vehicle interaction effects. In particular, we require that the deep neural nets (DNNs) have strong Lipschitz properties (for stability analysis), can generalize well to new test cases, and use compact encodings to achieve high computational and statistical efficiency. To that end, we employ deep sets [8] and spectral normalization [6] in conjunction with a standard feed-forward neural architecture.¹
A. Permutation-Invariant Neural Networks
The permutation-invariant aspect of the interaction term $f_a$ (3) can be characterized as:

$f_a(\mathcal{N}^{(i)}) = f_a(\pi(\mathcal{N}^{(i)})),$  (4)

for any permutation $\pi$. Since our goal is to learn the function $f_a$ using DNNs, we need to guarantee that the learned DNN is permutation-invariant. The following lemma (a corollary of Theorem 7 in [8]) gives the necessary and sufficient condition for a DNN to be permutation-invariant.
Lemma 1 (adapted from Thm 7 in [8]): A continuous function $h(\{x_1, \dots, x_k\})$, with $x_i \in [x_{\min}, x_{\max}]^n$, is permutation-invariant if and only if it is decomposable into $\rho(\sum_{i=1}^{k} \phi(x_i))$, for some functions $\phi$ and $\rho$.
The proof from [8] is highly non-trivial and only holds for a fixed number of vehicles $k$. Furthermore, their proof technique (which is likely loose) involves a large expansion in the intrinsic dimensionality (specifically of $\rho$) compared to the dimensionality of $h$. We will show in our experiments that $\rho$ and $\phi$ can be learned using relatively compact DNNs, and can generalize well to larger swarms.

¹An alternative approach is to discretize the input space and employ convolutional neural networks (CNNs), which also yields a permutation-invariant encoding. However, CNNs suffer from two limitations: 1) they require much more training data and computation; and 2) they are restricted to a pre-determined resolution and input domain.
Lemma 1 implies we can consider the following "deep sets" [8] architecture to approximate $f_a$:

$f_a^{(i)} = f_a(\mathcal{N}^{(i)}) \approx \rho\Big(\sum_{x^{(ij)} \in \mathcal{N}^{(i)}} \phi(x^{(ij)}, \theta_\phi),\ \theta_\rho\Big) = \hat{f}_a^{(i)},$  (5)
where $\phi(\cdot)$ and $\rho(\cdot)$ are two DNNs, and $\theta_\phi$ and $\theta_\rho$ are their corresponding parameters. The output of $\phi$ is a hidden state that represents the "contribution" from each neighbor, and $\rho$ is a nonlinear mapping from the summation of these hidden states to the total effect. The advantages of this approach are as follows (a code sketch of the architecture appears after the list):
• Representation ability. Since Lemma 1 is necessary and sufficient, we do not lose approximation power by using this constrained framework. We demonstrate strong empirical performance using relatively compact DNNs for $\rho$ and $\phi$.

• Computational and sampling efficiency and scalability. Since the input dimension of $\phi(\cdot)$ is always the same as in the single-vehicle case, the feed-forward computational complexity of (5) grows linearly with the number of neighboring vehicles. Moreover, given training data from $n$ vehicles, under the homogeneous dynamics assumption, we can reuse the data $n$ times. In practice, we found that a few minutes of flight data are sufficient to accurately learn interactions between two to five multirotors.

• Generalization to varying swarm size. Given learned $\phi(\cdot)$ and $\rho(\cdot)$, (5) can be used to predict interactions for any swarm size. In other words, a model trained on swarms of a certain size may also accurately model (slightly) larger swarms. In practice, we found that, trained with data from three-multirotor swarms, our model gives good predictions for five-multirotor swarms.
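To make this concrete, the following is a minimal PyTorch sketch of the deep-sets encoder (5). The class and argument names are ours; the layer widths follow the architecture reported in Sec. V-B, and only the $z$ component of $f_a$ is predicted, also per Sec. V-B:

```python
import torch
import torch.nn as nn

class DeepSetInteraction(nn.Module):
    """Permutation-invariant approximation of f_a, Eq. (5):
    rho(sum_j phi(x^(ij))). Layer widths follow Sec. V-B."""
    def __init__(self, input_dim=6, hidden_dim=40, output_dim=1):
        super().__init__()
        # phi: per-neighbor encoder, 6 -> 25 -> 40 -> 40 -> 40
        self.phi = nn.Sequential(
            nn.Linear(input_dim, 25), nn.ReLU(),
            nn.Linear(25, 40), nn.ReLU(),
            nn.Linear(40, 40), nn.ReLU(),
            nn.Linear(40, hidden_dim))
        # rho: decoder from the summed hidden states, 40 -> 40 -> 40 -> 40 -> 1
        self.rho = nn.Sequential(
            nn.Linear(hidden_dim, 40), nn.ReLU(),
            nn.Linear(40, 40), nn.ReLU(),
            nn.Linear(40, 40), nn.ReLU(),
            nn.Linear(40, output_dim))

    def forward(self, neighbors):
        # neighbors: (k, 6) tensor of relative states x^(ij); k may vary.
        hidden = self.phi(neighbors)   # (k, 40): one contribution per neighbor
        pooled = hidden.sum(dim=0)     # summation is permutation-invariant
        return self.rho(pooled)        # predicted z component of f_a
```

Permutation invariance holds by construction: shuffling the rows of `neighbors` leaves the sum unchanged, and the same weights apply for any neighbor count $k$.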
B. Spectral Normalization for Robustness and Generalization
To improve the robustness and generalization of DNNs, we use spectral normalization [6] during training optimization. Spectral normalization stabilizes DNN training by constraining the network's Lipschitz constant. Spectrally normalized DNNs have been shown to generalize well, which is an indication of stability in machine learning. Spectrally normalized DNNs have also been shown to be robust, a property that can be used to provide control-theoretic stability guarantees [7], [22].
Mathematically, the Lipschitz constant $\|g\|_{\text{Lip}}$ of a function $g$ is defined as the smallest value such that

$\forall x, x': \ \|g(x) - g(x')\|_2 / \|x - x'\|_2 \le \|g\|_{\text{Lip}}.$
Let $g(x, \theta)$ be a ReLU DNN parameterized by the DNN weights $\theta = \{W_1, \dots, W_{L+1}\}$:

$g(x, \theta) = W_{L+1}\, \sigma(W_L\, \sigma(\cdots \sigma(W_1 x) \cdots)),$  (6)

where the activation function $\sigma(\cdot) = \max(\cdot, 0)$ is the element-wise ReLU function. In practice, we apply spectral normalization to the weight matrices in each layer after each batch gradient descent step as follows:

$W_i \leftarrow W_i / \|W_i\|_2 \cdot \gamma^{1/(L+1)}, \quad i = 1, \dots, L+1,$  (7)
where $\|W_i\|_2$ is the maximum singular value of $W_i$ and $\gamma$ is a hyperparameter. With (7), $\|g\|_{\text{Lip}}$ is upper bounded by $\gamma$. Since the spectrally normalized $g$ is $\gamma$-Lipschitz continuous, it is robust to noise $\Delta x$, i.e., $\|g(x + \Delta x) - g(x)\|_2$ is always bounded by $\gamma \|\Delta x\|_2$. In this paper, we apply spectral normalization to both the $\phi(\cdot)$ and $\rho(\cdot)$ DNNs in (5).
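The following is a minimal PyTorch sketch of the normalization step (7), applied after each batch gradient update; the helper name is ours:

```python
import torch

def spectral_normalize_(weights, gamma):
    """In-place application of Eq. (7):
    W_i <- W_i / ||W_i||_2 * gamma^(1/(L+1)),
    which upper bounds the Lipschitz constant of the ReLU network by gamma."""
    scale = gamma ** (1.0 / len(weights))  # len(weights) = L + 1
    with torch.no_grad():
        for W in weights:
            sigma = torch.linalg.matrix_norm(W, ord=2)  # max singular value
            W.mul_(scale / sigma)
```

As written, (7) rescales every layer; a common projection variant rescales a layer only when $\|W_i\|_2$ exceeds $\gamma^{1/(L+1)}$, leaving already-feasible layers untouched.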
C. Data Collection
Learning a DNN to approximate $f_a$ requires collecting close formation flight data. However, the downwash effect causes the nominally controlled multirotors (without compensation for the interaction forces) to move apart from each other, see Fig. 1. Thus, we use a cumulative/curriculum learning approach: first, we collect data for two multirotors without a DNN and learn a model. Second, we repeat the data collection using our learned model as a feed-forward term, which allows closer-proximity flight of the two vehicles. Third, we repeat the procedure with an increasing number of vehicles, using the current best model.

Note that our data collection and learning are independent of the controller used and independent of the $f_a$ compensation. In particular, if we actively compensate for a learned $f_a$, this will only affect $f_u$ in (1a) and not the observed $f_a$.
IV. NONLINEAR DECENTRALIZED CONTROLLER DESIGN
Our Neural-Swarm controller is a nonlinear feedback linearization controller using the learned interaction term $\hat{f}_a^{(i)}$. Note that Neural-Swarm is decentralized, since $\hat{f}_a^{(i)}$ is a function of the neighbor set $\mathcal{N}^{(i)}$ of vehicle $i$. Moreover, the computational complexity of $\hat{f}_a^{(i)}(\mathcal{N}^{(i)})$ grows linearly with the size of $\mathcal{N}^{(i)}$, since we employ deep sets to encode $\hat{f}_a^{(i)}$.
A. Reference Trajectory Tracking
Similar to [7], we employ an integral controller that accounts for the predicted residual dynamics, which in our case are the multi-vehicle interaction effects. For vehicle $i$, we define the position tracking error as $\tilde{p}^{(i)} = p^{(i)} - p_d^{(i)}$ and the composite variable $s^{(i)}$ as:

$s^{(i)} = \dot{\tilde{p}}^{(i)} + 2\Lambda \tilde{p}^{(i)} + \Lambda^2 \int \tilde{p}^{(i)}\, dt = \dot{p}^{(i)} - v_r^{(i)},$  (8)
where $v_r^{(i)} = \dot{p}_d^{(i)} - 2\Lambda \tilde{p}^{(i)} - \Lambda^2 \int \tilde{p}^{(i)}\, dt$ is the reference velocity. We design the total desired rotor force $f_d^{(i)}$ as:

$f_d^{(i)} = m \dot{v}_r^{(i)} - K s^{(i)} - m g - \hat{f}_a^{(i)}, \quad \hat{f}_a^{(i)} = \rho\Big(\sum_{x^{(ij)} \in \mathcal{N}^{(i)}} \phi(x^{(ij)}, \theta_\phi),\ \theta_\rho\Big).$  (9)
Note that the position control law in (9) is decentralized, because we only consider the relative states $x^{(ij)} \in \mathcal{N}^{(i)}$ in the controller.
Using $f_d^{(i)}$, the desired total thrust $T_d^{(i)}$ and desired attitude $R_d^{(i)}$ can be easily computed [23]. Given $R_d^{(i)}$, we can use any attitude controller to compute $\tau_d^{(i)}$, for example robust nonlinear tracking control with global exponential stability [23], or geometric tracking control on $\mathrm{SE}(3)$ [24]. From this process, we get $\eta_d^{(i)} = [T_d^{(i)}; \tau_d^{(i)}]$, and then the desired control signal of each vehicle is $u_d^{(i)} = B_0^\dagger \eta_d^{(i)}$, which can be computed in a decentralized manner for each vehicle.
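For illustration, below is a NumPy sketch of the per-vehicle position-control computation (8)–(9). The gain matrices shown are placeholders rather than the values used in our experiments, and the integral $\int \tilde{p}^{(i)}\, dt$ is assumed to be accumulated by the caller:

```python
import numpy as np

def position_control(p, v, p_d, v_d, a_d, p_int, f_a_hat, m=0.034,
                     Lam=np.eye(3) * 4.0, K=np.eye(3) * 8.0,
                     g=np.array([0.0, 0.0, -9.81])):
    """Decentralized position control law, Eqs. (8)-(9).
    p_int is the running integral of the position error; the gains
    Lam and K are placeholder values, not flight parameters."""
    p_tilde = p - p_d                                   # position error
    # reference velocity v_r and composite variable s, Eq. (8)
    v_r = v_d - 2.0 * Lam @ p_tilde - Lam @ Lam @ p_int
    s = v - v_r
    # time derivative of v_r, used as the reference acceleration
    vdot_r = a_d - 2.0 * Lam @ (v - v_d) - Lam @ Lam @ p_tilde
    # desired rotor force, Eq. (9), with the learned interaction
    # force f_a_hat from the deep-sets model as a feed-forward term
    return m * vdot_r - K @ s - m * g - f_a_hat
```

The desired thrust $T_d^{(i)}$ and attitude $R_d^{(i)}$ would then be extracted from the returned force vector as described above.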
B. Nonlinear Stability and Robustness Analysis
Note that since $\|\hat{f}_a^{(i)} - f_a^{(i)}\| \neq 0$, we cannot guarantee that the tracking error $\tilde{p}^{(i)} \to 0$. However, under some mild assumptions, we can guarantee input-to-state stability (ISS) using exponential stability [25] for all the vehicles.
Assumption 1: The desired position trajectory $p_d^{(i)}$, $\dot{p}_d^{(i)}$, and $\ddot{p}_d^{(i)}$ are bounded for all $i$.
Assumption 2: Define the learning error as $\hat{f}_a^{(i)} - f_a^{(i)}$, with two components: $\hat{f}_a^{(i)} - f_a^{(i)} = \epsilon_0^{(i)} + \epsilon^{(i)}(t)$, where $\epsilon_0^{(i)}$ is some constant bias and $\epsilon^{(i)}(t)$ is a time-varying term. We assume that for vehicle $i$, $\|\dot{\epsilon}^{(i)}(t)\|$ is upper bounded by $d_m^{(i)}$.
Theorem 2: Under Assumptions 1 and 2, for vehicle $i$, for some desired trajectory $p_d^{(i)}(t)$, (9) achieves exponential convergence of the tracking error to an error ball:

$\lim_{t \to \infty} \|\tilde{p}^{(i)}(t)\| = \frac{d_m^{(i)}}{\lambda_{\min}^2(\Lambda)\, \lambda_{\min}(K)}.$  (10)
Proof: For vehicle $i$, consider the Lyapunov function $V^{(i)} = \frac{1}{2} m \|\dot{s}^{(i)}\|^2$. With controller (9), we get the time-derivative of $V^{(i)}$:

$\dot{V}^{(i)} = \dot{s}^{(i)\top}\big({-K} \dot{s}^{(i)} + \dot{\epsilon}^{(i)}\big) \le -\dot{s}^{(i)\top} K \dot{s}^{(i)} + \|\dot{s}^{(i)}\|\, d_m^{(i)}.$

Using $V^{(i)} = \frac{1}{2} m \|\dot{s}^{(i)}\|^2$, we have

$\dot{V}^{(i)} \le -\lambda_{\min}(K)\, \frac{2 V^{(i)}}{m} + \sqrt{\frac{2 V^{(i)}}{m}}\, d_m^{(i)}.$  (11)
Using the Comparison Lemma [26], we obtain

$\|\dot{s}^{(i)}(t)\| \le \|\dot{s}^{(i)}(0)\| \exp\Big({-\frac{\lambda_{\min}(K)}{m}} t\Big) + \frac{d_m^{(i)}}{\lambda_{\min}(K)}.$
Note that $\dot{s}^{(i)} = \ddot{\tilde{p}}^{(i)} + 2\Lambda \dot{\tilde{p}}^{(i)} + \Lambda^2 \tilde{p}^{(i)}$, and the hierarchical combination between $\dot{s}^{(i)}$ and $\tilde{p}^{(i)}$ results in $\lim_{t\to\infty} \|\tilde{p}^{(i)}\| = \lim_{t\to\infty} \|\dot{s}^{(i)}\| / \lambda_{\min}^2(\Lambda)$, yielding (10).
V. EXPERIMENTS
We use a slightly modified Crazyflie 2.0 (CF) as our quadrotor platform, a small (9 cm rotor-to-rotor) and lightweight (34 g) product that is commercially available. We use the Crazyswarm [27] package to control multiple Crazyflies simultaneously. Each quadrotor is equipped with four reflective markers for pose tracking at 100 Hz using a motion capture system. The nonlinear controller, extended Kalman filter, and neural network evaluation all run on-board the STM32 microcontroller.
For data collection, we use the uSD card extension board and store binary encoded data roughly every 10 ms. Each dataset is timestamped using the on-board microsecond timer, and the clocks are synchronized before takeoff using broadcast radio packets. The drift of the clocks of different Crazyflies can be ignored for our short flight times (less than 2 min).
A. Calibration and System Identification
Prior to learning the residual term $f_a$, we first calibrate the nominal dynamics model $f(x, u)$. We found that existing motor thrust models [28], [29] are not very accurate, because they only consider a single motor and ignore the effect of the battery state of charge. We calibrate each Crazyflie by mounting the whole quadrotor on a 100 g load cell which is directly connected to a custom extension board. We collect the current battery voltage, PWM signals (identical for all 4 motors), and measured force from the load cell for various motor speeds. We use this data to fit two polynomial functions. The first computes the PWM signal given the current battery voltage and desired force. The second computes the maximum achievable force given the current battery voltage. This second function is important for thrust mixing when motors are saturated [30].
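As an illustration, both maps can be fit from the logged load-cell data with least squares; the snippet below is a sketch in which the log format, feature choice, and polynomial degrees are assumptions (the text above does not prescribe them):

```python
import numpy as np

# Assumed log format: an (N, 3) array of battery voltage [V],
# commanded PWM duty cycle, and measured thrust from the load cell.
voltage, pwm, force = np.load("thrust_calibration.npy").T  # hypothetical file

# Map 1: PWM command as a bivariate quadratic in (voltage, desired force).
A = np.column_stack([np.ones_like(force), voltage, force,
                     voltage * force, voltage**2, force**2])
pwm_coeffs, *_ = np.linalg.lstsq(A, pwm, rcond=None)

# Map 2: maximum achievable force vs. voltage, fit on full-throttle
# samples only; used for thrust mixing when motors saturate [30].
sat = pwm >= 0.99 * pwm.max()
fmax_coeffs = np.polyfit(voltage[sat], force[sat], deg=2)
```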
We noticed that the default motors and propellers can only produce a total force of about 48 g with a full battery, resulting in a best-case thrust-to-weight ratio of 1.4. Thus, we replaced the motors with more powerful ones (that have the same physical dimensions) to improve the best-case thrust-to-weight ratio to 2.6. We use the remaining parameters ($J$, thrust-to-torque ratio) from the existing literature [29].
B. Data Collection and Learning
We utilize two types of data collection tasks: random walk and swapping. For the random walk, we implement a simple reactive collision avoidance approach based on artificial potentials on-board each Crazyflie [31]. The host computer randomly selects new goal points within a small cube for each vehicle at a fixed frequency. Those goal points are used as an attractive force, while neighboring vehicles contribute a repulsive force. For swapping, we place vehicles in different horizontal planes on a cylinder and let them move to the opposite side. All vehicles are vertically aligned at one time instance, causing a large interaction force; see Fig. 1, 2, and 4 for examples with two, three, and four vehicles. The random walk data helps us to explore the whole space quickly, while the swapping data ensures that we have data for a specific task of interest. For both task types, we varied the scenarios from two to four vehicles and collected one minute of data for each scenario.
To learn the interaction function $f_a(\mathcal{N}^{(i)})$, we collect the timestamped states $x^{(i)} = [p^{(i)}; v^{(i)}; \dot{v}^{(i)}; R^{(i)}; f_u^{(i)}]$ for each vehicle $i$. We then compute $y^{(i)}$ as the observed value of $f_a(\mathcal{N}^{(i)})$. We compute $f_a^{(i)} = f_a(\mathcal{N}^{(i)})$ using $m\dot{v}^{(i)} = mg + R^{(i)} f_u^{(i)} + f_a^{(i)}$ from (1a), where $f_u^{(i)}$ is calculated based on our system identification in Sec. V-A. Our training data consist of sequences of $(\mathcal{N}^{(i)}, y^{(i)})$ pairs, where $\mathcal{N}^{(i)} = \{x^{(ij)} \mid j \in \text{neighbor}(i)\}$ is the set of the relative states of the neighbors of $i$. In practice, we compute the relative states from our collected data as $x^{(ij)} = [p^{(j)} - p^{(i)}; v^{(j)} - v^{(i)}] \in \mathbb{R}^6$ (i.e., relative global position and relative global velocity), since the attitude information $R$ and $\omega$ is not dominant for $f_a$.
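A minimal sketch of assembling one training pair from the logs, following (1a); the function names here are illustrative:

```python
import numpy as np

def interaction_label(m, vdot, R, f_u, g=np.array([0.0, 0.0, -9.81])):
    """Observed interaction force from Eq. (1a):
    f_a = m*vdot - m*g - R @ f_u, where f_u = [0, 0, T] comes from the
    identified thrust model (Sec. V-A). Only the z component is kept."""
    f_a = m * vdot - m * g - R @ f_u
    return f_a[2]

def relative_state(p_i, v_i, p_j, v_j):
    """Relative state x^(ij) = [p^(j) - p^(i); v^(j) - v^(i)] in R^6."""
    return np.concatenate([p_j - p_i, v_j - v_i])
```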
TABLE I
MAXIMUM z-ERROR (IN METERS) FOR VARYING SWARM SIZE SWAPPING TASKS AND NEURAL NETWORKS. TRAINING ON MORE VEHICLES LEADS TO THE BEST OVERALL PERFORMANCE FOR ALL SWARM SIZES.

Controller         2 CF Swap   3 CF Swap   4 CF Swap   5 CF Swap
Baseline             0.094       0.139       0.209       0.314
Trained w/ 2 CF      0.027       0.150       0.294       N.A.
Trained w/ 3 CF      0.026       0.082       0.140       0.159
Trained w/ 4 CF      0.024       0.061       0.102       0.150
In this work, we only learn the $z$ component of $f_a$, since we found the other two components, $x$ and $y$, are very small and do not significantly alter the nominal dynamics.
Since our swarm is homogeneous, each vehicle has the same function $f_a$. Thus, we stack all the vehicles' data and train on them together, which implies more training data overall for larger swarms. Let $\mathcal{D}^{(i)}$ denote the training data of vehicle $i$, where the input-output pair is $(\mathcal{N}^{(i)}, y^{(i)})$. We use the ReLU network class for both the $\phi$ and $\rho$ neural networks, and our training loss is:

$\sum_i \sum_{\mathcal{D}^{(i)}} \Big\| \rho\Big( \sum_{x^{(ij)} \in \mathcal{N}^{(i)}} \phi(x^{(ij)}, \theta_\phi),\ \theta_\rho \Big) - y^{(i)} \Big\|_2^2,$  (12)
where $\theta_\phi$ and $\theta_\rho$ are the neural network weights to be learned. Our $\phi$ DNN has four layers with architecture $6 \to 25 \to 40 \to 40 \to 40$, and our $\rho$ DNN also has four layers, with architecture $40 \to 40 \to 40 \to 40 \to 1$. We use PyTorch [32] for training and for the implementation of spectral normalization (see Sec. III-B) of $\phi$ and $\rho$. We found that spectral normalization is particularly important for the small Crazyflie quadrotors, because their IMUs are directly mounted on the PCB frame, causing noisier measurements compared to bigger quadrotors.
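For reference, here is a sketch of one training pass combining loss (12) with the normalization step (7); it reuses the DeepSetInteraction and spectral_normalize_ sketches above, and processes samples one at a time because the neighbor count $k$ varies:

```python
import torch

def train_epoch(model, optimizer, dataset, gamma):
    """One pass over the stacked per-vehicle data with loss (12).
    `dataset` yields (neighbors, y) pairs: a (k, 6) tensor of relative
    states and the observed z interaction force."""
    for neighbors, y in dataset:
        optimizer.zero_grad()
        loss = (model(neighbors) - y).pow(2).sum()  # squared error, Eq. (12)
        loss.backward()
        optimizer.step()
        # re-impose the Lipschitz bound after each gradient step, Eq. (7)
        weights = [mod.weight for mod in model.modules()
                   if isinstance(mod, torch.nn.Linear)]
        spectral_normalize_(weights, gamma)
```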
Using the learned weights $\theta_\phi$ and $\theta_\rho$, we generate C code to evaluate both networks efficiently on-board the quadrotor, similar to prior work [33]. The STM32 168 MHz microcontroller can evaluate each of the networks in about 550 μs. Thus, we can compute $f_a$ in less than 4 ms for 6 or fewer neighbors, which is sufficient for real-time operation.
C. Neural-Swarm Control Performance
We study the performance and generalization of different controllers on a swapping task using a varying number of quadrotors. An example of our swapping task for two vehicles is shown in Fig. 1. The swapping task for multiple vehicles causes them to align vertically at one point in time, with vertical distances of 0.2 m to 0.25 m between neighbors. This task is challenging because: i) the lower vehicles experience downwash from multiple vehicles flying above; ii) the different velocity vectors of the vehicles create interesting effects, including an effect where $f_a$ is positive for a short period of time (see Fig. 3(b) for an example); and iii) for the case with more than two vehicles, the aerodynamic effect is not a simple superposition of each pair (see Fig. 3(c-f) for examples).
Fig. 2. Three vehicles moving using different control policies (corresponding to Table I, column 2). Each quadrotor flies at a different fixed height (25 cm vertical separation) and swaps sides such that all vehicles align vertically at t = 2 s and t = 6.5 s. Our approach trained on 3 or 4 vehicles controls the height much better than the baseline approach.

We use the following four controllers: 1) the baseline controller, which uses our position tracking controller (9) with $\hat{f}_a^{(i)} \equiv 0\ \forall i$ and a nonlinear attitude tracking controller [24];
2) – 4) the same controller with the same gains, but with $\hat{f}_a^{(i)}$ computed using different neural networks (trained on data from flying 2, 3, and 4 quadrotors, respectively). Note that all controllers, including the baseline controller, always have integral control compensation parts. Though an integral gain can cancel steady-state error during set-point regulation, it can struggle with complex time-variant interactions between vehicles. This issue is also reflected in the tracking error bound in Theorem 2: the tracking error converges to $d_m^{(i)} / (\lambda_{\min}^2(\Lambda)\, \lambda_{\min}(K))$. For our baseline we have $d_m^{(i)} = \sup_t \|\frac{d}{dt} f_a^{(i)}(t)\|$, which means that if $f_a^{(i)}$ changes quickly, as in the swapping task, our baseline will not perform well.
We repeat the swapping task for each controller six times, and report the maximum $z$-error that occurred for any vehicle over the whole flight. We also verified that the $x$- and $y$-error distributions are similar across the different controllers, and do not report those numbers for brevity.
Results. Our results, summarized in Table I, show three important findings: i) our controller successfully reduces the worst-case $z$-error by a factor of two to four (e.g., 2.4 cm instead of 9.4 cm for the two-vehicle case); ii) our controller successfully generalizes to cases with more vehicles when trained with at least three vehicles (e.g., the controller trained with three quadrotors significantly improves flight performance even when flying five quadrotors); and iii) our controllers do not marginalize small-vehicle cases (e.g., the controller trained with four quadrotors works very well for the two-vehicle case). The observed maximum $z$-error for the test cases with three to five quadrotors is larger compared to the two-vehicle case because we occasionally saturate the motors during flight.