Machine learning reveals the control mechanics of an insect wing hinge
In the format provided by the authors and unedited
Nature | www.nature.com/nature
https://doi.org/10.1038/s41586-024-07293-4
Supplementary Information
Initial prediction of wing pose using a convolutional neural network (Flynet)
Our pose estimation strategy required constructing an anatomically accurate 3D model of a fly that we would fit, frame-by-frame, to the data in our three high-speed video sequences. We based our 3D model on images of a tethered flying fly taken at different angles, from which we extracted aspect ratios and curvatures of the head, thorax, and abdomen. These 2D contours were converted into 3D surfaces via Non-Uniform Rational B-splines (NURBS)^71. Like B-splines, NURBS surfaces can be manipulated by changing the locations of control points. By collapsing control points at the edges of NURBS surfaces onto each other, we created spheres for the head, thorax, and abdomen and then manipulated the control points to reshape the objects according to the aspect ratios and curvatures derived from the fly images.
Observations from high-speed videos show that the wings exhibit chord-wise deformations during flight
that are concentrated at the L3, L4, and L5 veins (Fig. 1d). Accordingly, our model wing consists of four rigid
panels connected by three hinge lines. To reduce the number of parameters representing wing deformation,
we assumed that the deformation angle along each hinge line was equal (Fig. 2c). We thus modeled the total deformation angle, ξ, as the sum of the local deformation angles of the hinge lines (ξ = ξ/3 + ξ/3 + ξ/3), an assumption that was supported by manual inspection of the raw images.
Each wing pose vector consisted of 8 values: 4 parameters for a quaternion describing the orientation
of the leading edge panel of the wing relative to the world reference frame, 3 parameters for a translation
vector describing the distance between the wing root and the origin of the world reference frame, and the
wing deformation parameter, ξ. For the head, thorax, and abdomen pose vectors we used 7 parameters: 4 for the quaternion and 3 for the translation vector. Our total fly pose vector thus required 37 parameters to account for the head (7), thorax (7), abdomen (7), left wing (8), and right wing (8). Besides this pose vector,
we defined a vector that contained 5 parameters for scaling the body and wing components independently.
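The layout described above can be sketched as a simple packing helper. The dictionary layout, component names, and `pack_pose` function are illustrative assumptions; only the per-component parameter counts (7 per body part, 8 per wing, 37 total) come from the text.

```python
import numpy as np

# Layout of the 37-parameter pose vector described above.
POSE_LAYOUT = {
    "head": 7,        # quaternion (4) + translation (3)
    "thorax": 7,
    "abdomen": 7,
    "left_wing": 8,   # quaternion (4) + translation (3) + deformation angle (1)
    "right_wing": 8,
}

def pack_pose(components):
    """Concatenate per-component vectors into one 37-element pose vector."""
    parts = []
    for name, size in POSE_LAYOUT.items():
        vec = np.asarray(components[name], dtype=float)
        assert vec.shape == (size,), f"{name} must have {size} parameters"
        parts.append(vec)
    return np.concatenate(parts)

identity = [1.0, 0.0, 0.0, 0.0]   # unit quaternion, no rotation
pose = pack_pose({
    "head": identity + [0.0, 0.0, 0.0],
    "thorax": identity + [0.0, 0.0, 0.0],
    "abdomen": identity + [0.0, 0.0, 0.0],
    "left_wing": identity + [0.0, 0.0, 0.0, 0.0],
    "right_wing": identity + [0.0, 0.0, 0.0, 0.0],
})
assert pose.shape == (37,)
```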
We developed software, called Flynet, to implement the entire wing pose estimation process, including
the manual annotation required for creating a training set. A module in the Flynet GUI constructed in Python
with the PyQt and PyQtGraph packages allows the user to load the 3D fly model and scale, rotate, translate,
and deform all the components. The GUI uses the DLT calibration^72 of the high-speed cameras to project a wireframe of the fly model onto the three camera views. By adjusting the pose and scaling vectors of each component until it matches the images in each view, an accurate label can be created for each frame triplet.
Using this GUI, we collected a final manually annotated dataset of 2905 frame triplets and associated pose vectors. Neural networks work best with data centered around 0 with a standard deviation of 1; thus, the 8-bit pixels of the high-speed images were rescaled by dividing by 255. To reduce computation time, the normalized frames were cropped into 224 × 224 pixel images centered around the thorax of the fly.
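A minimal sketch of the preprocessing described above. The border-clamping behavior and function name are assumptions for illustration; the rescaling by 255 and the 224 × 224 crop come from the text.

```python
import numpy as np

def preprocess_frame(frame, center, size=224):
    """Rescale an 8-bit frame to [0, 1] and crop a square window
    around the thorax center. The window is clamped so it stays
    inside the image (an assumption, not from the paper)."""
    norm = frame.astype(np.float32) / 255.0
    half = size // 2
    r, c = center
    r0 = min(max(r - half, 0), norm.shape[0] - size)
    c0 = min(max(c - half, 0), norm.shape[1] - size)
    return norm[r0:r0 + size, c0:c0 + size]

frame = np.random.randint(0, 256, (512, 640), dtype=np.uint8)
crop = preprocess_frame(frame, center=(250, 300))
assert crop.shape == (224, 224) and float(crop.max()) <= 1.0
```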
The CNN was implemented in Python using the TensorFlow library with Graphics Processing Unit (GPU) support. The CNN architecture started with three convolutional blocks, one for each camera view. A graphical representation of the complete Flynet code (created using the model plotting utility in Keras 3.0) is shown in Supplementary Figures 1 and 2; the full code is available as described in the Data Availability Section of the main manuscript.
We randomly split our set of manually annotated data into groups of 95% and 5% for training and validation, respectively. The network was trained for 1000 epochs with a batch size of 50. After some testing, we chose a dropout rate of 0.1, as it gave the lowest validation error; lower dropout rates reduced the training error but increased the validation error, whereas higher dropout rates increased the training error and therefore also the obtainable validation error. Mean squared error (MSE) served as the loss function. We used a learning rate of 1.0 · 10^−4 and a decay of 1.0 · 10^−7. The decay value was subtracted from the learning rate after each epoch, resulting in a linear decrease of the learning rate during training. Using an Nvidia GeForce GTX 1080 graphics card with 8 GB of memory, it took approximately 24 hours to train Flynet. During the training phase, the MSE for the independent components of the pose vector decayed up to 400 epochs and remained constant afterwards (Extended Data Fig. 2b). Stabilization of the validation error indicates that the network's performance did not improve with more epochs, even though the training error continued to decline. The weights of the trained network were saved in an hdf5 file and loaded in the Flynet GUI for the pose prediction.
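The linear schedule described above (decay subtracted from the learning rate after each epoch) amounts to the following; the function name is illustrative.

```python
def learning_rate(epoch, lr0=1.0e-4, decay=1.0e-7):
    """Linear schedule: subtract the decay value once per elapsed epoch,
    matching the training description above."""
    return lr0 - decay * epoch

# At epoch 500 the rate is 1e-4 - 500 * 1e-7 = 5e-5,
# and at epoch 1000 it has decayed linearly to 0.
assert abs(learning_rate(500) - 5.0e-5) < 1e-12
```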
Refining pose with Particle Swarm Optimization
The neural network predictions of body and wing pose were close to the actual wing pose as observed in the images, but not accurate enough to study the influence of muscle activity and the resulting aerodynamic forces, which are extraordinarily sensitive to small changes in kinematics^32. A larger training set might have resolved this issue, but many additional frames would likely have been required. Instead, we opted to refine the pose estimate in a second step using Particle Swarm Optimization (PSO)^30. This approach avoids an extensive manual annotation process while still maintaining the benefits of time-independent tracking.
To implement the refinement step, it was necessary to segment the images and create binary masks of the
body and wing pixels (Extended Data Fig. 2a). Instead of using a median image for background subtraction,
we determined a minimum pixel intensity image over a batch of 100 frames. Body pixels were found by a simple threshold (intensity > 200) of the raw, 8-bit image data. Wing pixels were identified by subtracting the minimum image from the frame and subsequently selecting all pixels with an intensity > 10. Subtracting
the minimum image removed all stationary pixels, thereby identifying those corresponding to moving wing
parts.
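The segmentation steps above can be sketched with plain NumPy; the thresholds are the values quoted in the text, while the function name and array shapes are illustrative.

```python
import numpy as np

def segment(frame, min_image, body_thresh=200, wing_thresh=10):
    """Binary masks for body and wing pixels as described above:
    body = simple intensity threshold on the raw 8-bit frame;
    wing = threshold on the frame minus the per-batch minimum image,
    which removes all stationary pixels."""
    body = frame > body_thresh
    moving = frame.astype(np.int16) - min_image.astype(np.int16)
    wing = moving > wing_thresh
    return body, wing

frames = np.random.randint(0, 256, (100, 64, 64), dtype=np.uint8)
min_image = frames.min(axis=0)   # minimum over a batch of 100 frames
body, wing = segment(frames[0], min_image)
assert body.dtype == bool and wing.dtype == bool
```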
Our refinement approach used area matching as a measure of how well the 3D model aligned with the
images. The advantage of area matching is that it is simple and computationally efficient; the disadvantage is that the cost function is not continuous, so gradient-based optimization algorithms will not work well. Fortunately, PSO is a gradient-free optimization algorithm that has the ability to find the
global minimum, even when the cost function contains multiple local minima. PSO is not guaranteed to
converge on the global minimum in a finite number of iterations, but for complex problems such as area
matching, it is one of the few optimization methods available.
PSO relies on a swarm of particles that move through the state space with a position and a velocity.
The state vector consists of all variables that affect the cost function. At the start of the PSO algorithm, the
particles in the swarm are given random positions and random velocity vectors. For a set number of iterations,
the algorithm updates the position and velocity of each particle according to the following equations:
x_i^k = x_i^(k−1) + v_i^k,  (1)

v_i^k = w·v_i^(k−1) + c_1·r_1·(x_pbest,i − x_i^(k−1)) + c_2·r_2·(x_gbest − x_i^(k−1)),  (2)

where i is the particle index, x_i^k is the particle position at iteration k, v_i^k is the particle velocity, r_1 and r_2 are random weight vectors drawn from a normal distribution, x_pbest,i is the personal best of the particle, x_gbest is the global best for the whole particle swarm, and w, c_1, and c_2 are weights. A particle's personal best
is the position with the lowest value for the cost function throughout the travel history. The global best is
the position with minimum cost value in the travel history of all particles of the swarm. In each iteration,
the cost function is evaluated for the current position of each particle in the swarm. Subsequently, the cost
function values of the personal and global best are compared to the cost evaluations for the current positions.
If the current cost function of a particle position is lower than the personal or global best, these values and
the associated position vectors are updated. Additionally, in each iteration, the vectors r_1 and r_2 are updated with randomized weights drawn from a normal distribution.
The velocity update of each particle (equation 2) consists of three parts. The inertia term, w·v_i^(k−1), is the previous velocity multiplied by the weight, w. The second term points towards the personal best of the particle, with some added random noise, and is multiplied by the scaling factor, c_1. The third term is a vector pointing towards the global best of the swarm, with added random noise, multiplied by the scaling factor, c_2. This scaled randomization introduces stochasticity in the search process, thus ensuring that the particles sample the space near a minimum more often. The vectors pointing towards the personal and global best positions help the particles converge towards a minimum value. The inertia term controls how quickly the swarm converges to a global best. If the inertia term is small, the swarm will converge quickly and risks getting stuck in a local minimum. Setting the inertia term too high, however, may result in no convergence at all, or convergence only after a large number of iterations.
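A minimal, self-contained PSO following equations 1 and 2. The hyperparameters, velocity clamp, and sphere test function below are illustrative assumptions; the paper's implementation is multithreaded C++ with an image-based cost function.

```python
import numpy as np

def pso(cost, dim, n_particles=16, n_iter=300, w=0.5, c1=1.0, c2=1.0,
        init_sigma=0.1, rng=np.random.default_rng(0)):
    """Minimal PSO per equations 1 and 2: random normal weight
    vectors r1, r2 redrawn every iteration, personal/global bests
    updated whenever a lower cost is found."""
    x = rng.normal(0.0, init_sigma, (n_particles, dim))   # positions
    v = rng.normal(0.0, init_sigma, (n_particles, dim))   # velocities
    pbest = x.copy()
    pcost = np.array([cost(p) for p in x])
    gbest = pbest[pcost.argmin()].copy()
    for _ in range(n_iter):
        r1 = rng.normal(size=(n_particles, dim))          # random weight vectors
        r2 = rng.normal(size=(n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -1.0, 1.0)   # stability clamp (illustrative, not from the paper)
        x = x + v
        c = np.array([cost(p) for p in x])
        better = c < pcost
        pbest[better], pcost[better] = x[better], c[better]
        gbest = pbest[pcost.argmin()].copy()
    return gbest, pcost.min()

best, val = pso(lambda p: np.sum((p - 0.3) ** 2), dim=3)
assert best.shape == (3,) and val < 0.5
```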
In our particular application of PSO, the cost function was based on the projected images of the 3D model. To speed up evaluation, the PSO algorithm and the cost function were implemented in C++. The PSO algorithm lends itself well to parallel processing, as the cost function evaluations of all particles can be executed at the same time. Using the std library in C++17, a separate thread for evaluating the cost function was assigned to each particle. The cost function relies on matrix operations that were implemented using the Armadillo library. To integrate the C++ code within the Flynet GUI, the Boost library was used to make the functions callable in Python.
The 37-dimensional pose vector was a large parameter space in which to search. However, we simplified
the process by splitting the pose vector into 5 different components (head, thorax, abdomen, left wing, and
right wing), with 7 or 8 state parameters each. Although the pose vectors of the five model components are
independent, the cost function evaluation is not, as different model components can overlap in the projected
view of the 3D model. To address this dependence, the PSO algorithm updated one randomly selected
component at a time.
The variables in the pose vector are all linear, except for the quaternion. A unit quaternion, with ‖q‖ = 1, can be seen as a point on a 4D sphere with radius 1. However, when updating the quaternion according to equations 1 and 2, the result is not likely to be a unit quaternion. Determining the quaternion difference that satisfies the constraint of a 4D sphere with radius 1 requires application of the concepts of quaternion multiplication and the quaternion conjugate^73. The details by which we implemented the PSO algorithm using quaternion operations are detailed elsewhere^74.
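A sketch of how an update can be kept on the unit 4-sphere using quaternion multiplication and the conjugate. The specific update rule (`apply_delta`) is illustrative, not the paper's exact scheme, which is detailed in the cited reference.

```python
import numpy as np

def qmult(a, b):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qconj(q):
    """Quaternion conjugate [w, -x, -y, -z]."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def apply_delta(q, delta):
    """Apply a small rotation update and renormalize so the result
    stays on the unit 4-sphere (an illustrative constrained update)."""
    out = qmult(delta, q)
    return out / np.linalg.norm(out)

q = np.array([1.0, 0.0, 0.0, 0.0])
delta = np.array([0.99, 0.1, 0.0, 0.0])   # non-unit update, as PSO would produce
q2 = apply_delta(q, delta)
assert abs(np.linalg.norm(q2) - 1.0) < 1e-12
# the quaternion difference is recovered via the conjugate:
diff = qmult(q2, qconj(q))
assert abs(np.linalg.norm(diff) - 1.0) < 1e-12
```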
The cost function for each model component j, C_j, is given by:

C_j = |B Δ M_j| / |B ∪ M_j|,  (3)

for body components, and

C_j = |W Δ M_j| / |W ∪ M_j|,  (4)

for the wings, where B and W are the dynamic bitsets of the body and wing pixels, respectively, M_j is the dynamic bitset of the model component projections in all 3 views, and Δ and ∪ are the bitwise symmetric difference and union operators, respectively. We vectorized both the image masks and the projected images to accelerate these operations.
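One reading of equations 3 and 4 with boolean arrays standing in for the C++ dynamic bitsets; the normalization of the symmetric difference by the union is an interpretation of the operators named above.

```python
import numpy as np

def area_cost(mask, proj):
    """Area-matching cost: size of the symmetric difference between
    the segmented mask and the projected model silhouette, normalized
    by the size of their union (0 = perfect overlap)."""
    sym_diff = np.logical_xor(mask, proj).sum()
    union = np.logical_or(mask, proj).sum()
    return sym_diff / union if union else 0.0

mask = np.zeros((8, 8), dtype=bool); mask[2:6, 2:6] = True   # 16 px square
proj = np.zeros((8, 8), dtype=bool); proj[3:7, 3:7] = True   # same square, shifted
# overlap = 3x3 = 9 px, union = 23 px, symmetric difference = 14 px
assert abs(area_cost(mask, proj) - 14 / 23) < 1e-12
assert area_cost(mask, mask) == 0.0
```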
Before the tracking started, we set the number of particles, the number of iterations, and the standard
deviation of the normal distribution that is used to perturb the initial state and search parameters. The initial
particle pose vectors were created by adding random noise to the predicted pose vector from the CNN.
Similarly, the particle velocities were randomly sampled from a normal distribution. By specifying the
standard deviation of the normal distribution used to generate the random noise vectors, we specified how
close the particles searched around the initial pose. We chose a standard deviation of 0.1, which meant that most particles searched around the initial pose value.
With 300 iterations and 16 or more particles, the PSO algorithm converged on the manually annotated pose if the initial CNN prediction was sufficiently accurate. After refining wing pose using PSO, we smoothed the data using an extended Kalman filter, as described in detail elsewhere^74.
Representing wing motion in the strokeplane reference frame
Although mathematically preferable, quaternions do not provide an intuitive sense of wing motion. Instead, we defined a set of Tait-Bryan angles relative to a thorax-fixed reference frame. Prior studies of free flight kinematics^32,75,76 have used this approach to define a reference frame at a fixed angle relative to the longitudinal body axis. In the case of tethered flight, however, the animal often deflects its head and abdomen for prolonged periods as it attempts to fictively steer. This is problematic, as the body's longitudinal axis does not necessarily align with the symmetry plane of the thorax under these conditions. The tracking accuracy of the thorax alone is generally not sufficient to provide a stationary and symmetric reference frame. Instead of determining a reference axis based on the body, we defined our reference frame using the positions of the left and right wing roots. Because the fly is tethered, the wing roots move in a stable, 'c'-shaped trajectory around the thorax (Fig. 2b). Performing a principal components analysis (PCA) on the left and right root traces provides a convenient means of defining a strokeplane reference frame (SRF). The first three principal
components correspond to the x-, y-, and z-axes of the SRF, respectively. Although the axes defined in this way are orthogonal, they are not fixed consistently in space. To create a reference frame that has the x-axis pointing forward and the z-axis pointing upward, we used the thorax pose vector to establish directionality. Using the normalized axes of the SRF, it is possible to compute the quaternion of the SRF, q_SRF. The left and right wing quaternions relative to the SRF may be obtained by multiplying the left and right wing quaternions with the conjugate, q*_SRF. The left and right wing root traces can then be expressed relative to the SRF by subtracting the origin location of the SRF and multiplying by the rotation matrix of the SRF.
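The PCA construction above can be sketched as follows. The synthetic 'c'-shaped traces and the omission of the thorax-based sign fixing are simplifications for illustration.

```python
import numpy as np

def strokeplane_axes(left_roots, right_roots):
    """PCA on the pooled left and right wing-root traces; the three
    principal components give candidate x-, y-, and z-axes of the
    strokeplane reference frame (sign fixing from the thorax pose,
    described in the text, is omitted here)."""
    pts = np.vstack([left_roots, right_roots])
    centered = pts - pts.mean(axis=0)
    # right singular vectors = principal axes, sorted by variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return pts.mean(axis=0), vt   # origin, rows = principal axes

rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 200)
# synthetic root traces lying mostly in one plane (illustrative)
left = np.c_[np.cos(t), 0.5 * np.sin(t), 0.01 * rng.normal(size=t.size)]
right = left * np.array([1.0, -1.0, 1.0])   # mirrored about the midline
origin, axes = strokeplane_axes(left, right)
assert axes.shape == (3, 3)
assert np.allclose(axes @ axes.T, np.eye(3), atol=1e-8)   # orthonormal frame
```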
The mechanics of the wing hinge are complex, and the root of the wing is not realistically approximated as a single point. This is why we chose not to define any constraints between the wing and thorax in Flynet.
However, to simplify the analysis of wing motion and aerodynamics, we assumed the hinge to act as a ball
joint at a fixed position on the thorax with 3 degrees of freedom. From an aerodynamics perspective, this
assumption is not likely to change computed or measured aerodynamic forces substantially, as the arm of the
wing root is relatively small and does not increase wing velocity significantly.
The three Tait-Bryan angles that specify wing orientation are: 1) the stroke angle, φ, which describes the angle between the x-axis of the SRF and the projection of the wingtip on the x-y plane; 2) the deviation angle, θ, which is the angle between the wingtip and the x-y plane; and 3) the wing pitch angle, η, which specifies the orientation of the leading edge panel of the wing with respect to the z-axis (Fig. 2c). Wing shape is described by ξ and remains unaltered from the Flynet definition. The wing kinematic angles of the right wing are similarly defined such that the signs are symmetric with respect to the left wing.
With the wing kinematic angles defined, we parsed the left and right wing motion into distinct wingbeats. A wingbeat was defined as the time between two subsequent dorsal stroke reversals. We computed the derivative of the stroke angle, ∂φ/∂t, and used the condition φ > 0 and ∂φ/∂t > 0 to find the dorsal stroke reversal times of the left and right wings in each video sequence. The dorsal reversal time of the left wing does not necessarily align with that of the right wing, and can be up to 3 frames apart. To remedy these small phase differences, we computed an average time point between the dorsal reversals of the two wings and subsequently rounded to the closest time point. For each high-speed video sequence, the average dorsal stroke reversal times were determined and used to parse out individual wingbeats, which we indexed so as to preserve the instantaneous frequency and the numerical position within each sequence.
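A sketch of reversal detection from the stroke-angle derivative. Detecting the sign change of ∂φ/∂t while φ > 0 is one reading of the stated condition; the sign convention and the synthetic sinusoidal trace are assumptions.

```python
import numpy as np

def dorsal_reversals(phi, dt):
    """Find candidate dorsal stroke-reversal frames: samples where the
    stroke angle is positive and its derivative changes sign from
    positive to non-positive (i.e. a local maximum of phi)."""
    dphi = np.gradient(phi, dt)
    idx = np.where((dphi[:-1] > 0) & (dphi[1:] <= 0) & (phi[1:] > 0))[0] + 1
    return idx

fs = 15000.0                        # frames per second, as in the text
t = np.arange(0, 0.02, 1 / fs)      # 20 ms = ~4 wingbeats at 200 Hz
phi = np.sin(2 * np.pi * 200 * t)   # synthetic stroke-angle trace
rev = dorsal_reversals(phi, 1 / fs)
# one reversal per wingbeat is expected over this window
assert 3 <= len(rev) <= 5
assert np.all(phi[rev] > 0.9)       # detections sit near stroke maxima
```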
Encoding wing motion with Legendre polynomials
Depending on the activation level of the indirect power and tension muscles, wingbeat frequencies can vary between 150 Hz and 250 Hz^39. This variation in wingbeat duration makes it difficult to compare wing kinematics between different high-speed video sequences. A wingbeat frequency of 200 Hz corresponds to about 75 high-speed video frames when captured at 15,000 frames per second. As there are four kinematic angles per wing, there are thus 300 data points representing wing motion during one wingbeat. In order to reduce the number of data points per wingbeat and allow for comparison across wingbeats, we used Legendre polynomials to parameterize wing kinematic traces. Legendre polynomials are well suited for encoding wing kinematics, as they can fit non-periodic traces and asymmetric waveforms. Fourier coefficients, which are often used for fitting time series, require periodic boundary conditions and are biased toward symmetric waveforms when a low number of harmonics is used. Similar to Fourier series, Legendre polynomials form a complete and orthogonal system. An easy way to generate a Legendre polynomial is using Rodrigues' formula:
P_n(x) = (1/2^n) Σ_{k=0}^{n} (n choose k)^2 (x − 1)^(n−k) (x + 1)^k,  (5)

where n is the order of the Legendre polynomial and x ranges between −1 and 1. A Legendre basis of order N is formed by all polynomials from order 0 up to order N. By specifying sample points in the range x = [−1, 1]
and computing the values for each polynomial in the Legendre basis, a Vandermonde matrix is created. The Vandermonde matrix, X, can be used in a least-squares fit of a wing kinematic trace, Y:

β = (XᵀX)⁻¹XᵀY,  (6)

where β is a vector of N + 1 coefficients corresponding to the order of the Legendre basis. With sufficient coefficients it is possible to fit any trace throughout a wingbeat with very little error. When using too many coefficients, however, the polynomial fit may exhibit high-frequency oscillations at the beginning and end of the waveform, known as the Runge phenomenon. This may be remedied by lowering the order of the Legendre basis and imposing boundary conditions at the start and end of the wingbeat. Boundary conditions
on polynomial fits can be imposed using restricted least-squares^77. By this method, the restricted least-squares fit, β*, makes use of the unrestricted least-squares fit, β, a restriction matrix, R, and a restriction vector, r:

β* = β − (XᵀX)⁻¹Rᵀ(R(XᵀX)⁻¹Rᵀ)⁻¹(Rβ − r).  (7)
To reduce the Runge phenomenon, we chose the restriction matrix and vector such that the transition between subsequent wingbeats was continuous up to the fourth derivative. The k-th derivative of a Legendre polynomial is computed from:

P_n^(k)(x) = (n + k − 1)·P_{n−1}^(k−1)(x) + x·P_{n−1}^(k)(x).  (8)
The 1st, 2nd, 3rd, and 4th derivatives of the actual wing kinematic trace were computed over a 9-frame time window centered around the start and end time points of the wingbeat. By requiring that the Legendre fit matches the actual values of the wing kinematic trace and its derivatives, the relationship between the least-squares fit, β, the restriction matrix, R, and the restriction vector, r, is:

Rβ = r.  (9)
The restriction matrix contains the Legendre basis and the derivative Legendre bases up to the 4th derivative for x = −1 and x = 1 (start and end of the wingbeat). Matching values of the wing kinematic trace and its 1st, 2nd, 3rd, and 4th derivatives are contained in the restriction vector, r. Inserting R and r in the restricted least-squares solution of equation 7 provides the Legendre coefficients that make the transitions between the previous and next wingbeat continuous up to the 4th derivative.
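Equations 6, 7, and 9 can be combined into a short restricted least-squares sketch. The toy constraint below pins only the endpoint values; the paper additionally constrains the first four derivatives, which would add one extra row of R per constraint. The trace and basis order are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def restricted_lstsq(X, y, R, r):
    """Restricted least squares: compute the unrestricted fit
    (equation 6), then correct it so that R @ beta = r (equation 7)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                                  # equation 6
    A = XtX_inv @ R.T
    return beta - A @ np.linalg.solve(R @ A, R @ beta - r)    # equation 7

x = np.linspace(-1.0, 1.0, 75)                    # ~one wingbeat of samples
X = legendre.legvander(x, 8)                      # Vandermonde matrix of the basis
y = np.sin(2.5 * x)                               # stand-in kinematic trace
R = legendre.legvander(np.array([-1.0, 1.0]), 8)  # basis values at the endpoints
r = np.array([0.0, 0.0])                          # pin both endpoint values to 0
beta_r = restricted_lstsq(X, y, R, r)
assert np.allclose(R @ beta_r, r, atol=1e-8)      # constraint (equation 9) holds
```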
After finding the restricted least-squares solution, we fit the waveforms using higher-order Legendre polynomials, making sure to stay below the number of coefficients that induced Runge oscillations. Using higher-order polynomials is preferable, as it allows for a more accurate fit. For each wing kinematic angle, we tested various numbers of Legendre polynomials. The stroke angle, φ, could be fitted accurately with 16 polynomials. The deviation and deformation angles (θ and ξ) required 20 polynomials each. The wing pitch angle, η, required 24 polynomials. Thus, we used a total of 80 Legendre coefficients to accurately describe the motion of a single wing during each wingbeat.
Convolutional neural network predicting wing motion from muscle activity
We created a CNN to perform non-linear regression between muscle activity, as measured by GCaMP fluorescence^27, and wing motion, as extracted from the high-speed video data using Flynet and PSO. A graphical representation of this code (created using the model plotting utility in Keras 3.0) is shown in Supplementary Figure 3; the full code is available as described in the Data Availability Section of the main manuscript.
Before the CNN could be trained, we first normalized the input and output data. To rescale muscle activity traces, the mean, μ, and standard deviation, σ, were computed for the periods when the fly was flying. A boolean value that specified whether the fly was flying (as determined from wingbeat frequency) was saved to an hdf5 file with a timestamp during each experiment, and could be subsequently used to identify flight bouts and exclude sequences when the animal stopped flying. For all steering muscles except the b2 and iii1 muscles, the activity was rescaled such that a value of 0 corresponds to μ − 2σ and a value of 1 to μ + 2σ. The b2 and iii1 muscles are typically quiescent during flight, and in some experiments were not active at all. To accommodate sequences when these muscles were quiescent, the values of b2 and iii1 were not rescaled if the standard deviation was below 0.01. If the standard deviation was above this threshold, the b2 and iii1 traces were rescaled such that a value of 0 corresponded to μ and a value of 1 to μ + 3σ. Although
we tracked wing kinematics for both wings, steering muscle activity was only recorded from the left side of the thorax; thus, we only used the kinematics data from the left wing in our analysis. The 80 values of the Legendre coefficients are in units of radians, which we normalized by dividing by π. Wingbeat frequency was rescaled by subtracting 150 Hz from the raw values and subsequently dividing by 100 Hz, such that a value of 0 corresponds to 150 Hz and a value of 1 to 250 Hz.
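A sketch of the muscle-trace normalization described above; the function name, the `tonic` flag distinguishing the two rescaling rules, and the synthetic trace are illustrative assumptions.

```python
import numpy as np

def normalize_muscle(trace, flying, tonic=False, sigma_floor=0.01):
    """Rescale a muscle activity trace using flight-period statistics:
    for most muscles 0 maps to mu - 2*sigma and 1 to mu + 2*sigma
    (tonic=True); for normally quiescent muscles (b2, iii1 in the text)
    0 maps to mu and 1 to mu + 3*sigma, skipped entirely when the
    standard deviation is below the floor."""
    mu = trace[flying].mean()
    sigma = trace[flying].std()
    if tonic:
        return (trace - (mu - 2 * sigma)) / (4 * sigma)
    if sigma < sigma_floor:
        return trace                      # leave quiescent traces unscaled
    return (trace - mu) / (3 * sigma)

rng = np.random.default_rng(2)
trace = rng.normal(5.0, 1.0, 1000)        # synthetic fluorescence trace
flying = np.ones(1000, dtype=bool)        # flight-bout mask
norm = normalize_muscle(trace, flying, tonic=True)
# the mean maps midway between the 0 and 1 anchor points
assert abs(norm.mean() - 0.5) < 1e-9
```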
Although neural networks are capable of handling outliers, too many outlying data points could result in decreased performance. We excluded wingbeats from the dataset under any of the following conditions: 1) the normalized muscle activity was lower than −0.5 or larger than 1.5; 2) the normalized wingbeat frequency was lower than 0 or larger than 1; or 3) any of the normalized Legendre coefficients fell outside the specified ranges: [−1/3 ≤ φ ≤ 1/3], [−2/3 ≤ θ ≤ 2/3], [−2/3 ≤ η ≤ 2/3], and [−1/3 ≤ ξ ≤ 1/3]. After removing all outliers, the final dataset consisted of 72,219 wingbeats. For the validation set, we selected the first 30 wingbeats of each high-speed video sequence (10,868 total). The remaining wingbeats in each sequence (61,351 total) were used for the training set.
In the first layer of our CNN (Fig. 3), the network processes the GCaMP7f fluorescence signal via a number of convolutional kernels over a fixed time window. After evaluating several different values, we found that a time window length of 9 wingbeats (approximately 45 ms) worked well. A shorter time window did not contain sufficient information, and prediction accuracy was worse as a consequence. Longer time windows improved the training error, but the validation error tended to be higher, especially for very long windows (>50 wingbeats).
Our CNN architecture consisted of 2 convolutional layers and 2 dense layers (Fig. 3a). Input to the network consisted of a 13 × 9 matrix of normalized muscle activity and normalized frequency over 9 subsequent wingbeats. The output of the network is a prediction of the 80 normalized Legendre coefficients corresponding to the first wingbeat in the 9-wingbeat time window. Before the input data was fed into the first convolutional layer, Gaussian noise with a standard deviation of 0.05 was added to the input matrix, which helps the CNN generalize on the data and makes the network more robust to noise in the fluorescence recordings. The first convolutional layer of the CNN consisted of 64 kernels with a 1 × 9 kernel window, a 1 × 9 stride, and SELU activation (Fig. 3a). A total of 256 kernels were used for the second layer, with a 13 × 1 kernel window and stride, and SELU activation. The output of the second layer was a vector with 256 elements. A fully connected dense layer of 1024 virtual neurons takes in the output of the second layer and applies a SELU activation for each virtual neuron. The last layer of the CNN has 80 neurons with a linear activation function, corresponding to the 80 normalized Legendre coefficients.
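The layer shapes described above can be checked with a plain-NumPy walk-through using random weights. Only the layer sizes come from the text; the weight values, random seed, and rounded SELU constants are dummies (the paper's network is built in TensorFlow/Keras).

```python
import numpy as np

rng = np.random.default_rng(0)

def selu(x, alpha=1.67326, scale=1.05070):
    """SELU activation (constants rounded)."""
    return scale * np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

# Input: 13 signals (12 muscles + frequency) over 9 wingbeats.
x = rng.normal(size=(13, 9))
x = x + rng.normal(0.0, 0.05, x.shape)        # input Gaussian-noise layer

k1 = rng.normal(size=(64, 9))                 # 64 kernels, 1 x 9 window and stride
h1 = selu(x @ k1.T)                           # -> (13, 64): one step per signal row

k2 = rng.normal(size=(256, 13, 64))           # 256 kernels, 13 x 1 window and stride
h2 = selu(np.tensordot(k2, h1, axes=([1, 2], [0, 1])))   # -> (256,)

w3 = rng.normal(size=(1024, 256))
h3 = selu(w3 @ h2)                            # dense layer, 1024 neurons, SELU

w4 = rng.normal(size=(80, 1024))
out = w4 @ h3                                 # linear output layer
assert out.shape == (80,)                     # 80 normalized Legendre coefficients
```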
The CNN was trained for a total of 1000 epochs with a batch size of 100 samples. After every epoch, the network was evaluated using the validation set. The learning rate was set at 10^−4 with a decay of 10^−7 per epoch, with MSE as the loss function. After the first 200 epochs, the validation error stabilized at approximately 10^−3.7, whereas the training error continued to decline (below 10^−3.33) after 1000 epochs (Fig. 3d). After
training, the prediction performance of the network was tested on all videos in the dataset. Examples of
the CNN prediction performance are given in Fig. 3c and Extended Data Fig. 3. For most sequences, the wing motion predicted from muscle activity was within ±2° of the tracked wing motion for each angle. This accuracy was quite remarkable, given the complex waveforms of the wing kinematic angles, as well as the sparse and filtered nature of the input data.
Virtual experiments using the trained CNN
After developing a CNN to predict wing motion from observed muscle activity, we used it as an in-silico model of the hinge to conduct virtual experiments. The wing kinematics predicted by baseline muscle activity were determined by inspecting the dataset for sequences that contained no significant changes in muscle activity or wing motion, and subsequently averaging data over those sequences. The baseline values of activity for each muscle and wingbeat frequency, kept constant over the 9 wingbeats, were fed into the trained CNN to yield the baseline pattern of wing motion (gray traces, Fig. 4). A naive way of using the CNN to predict the action of individual muscles would be to simply vary the activity of each muscle independently, while keeping that of all other muscles constant. However, we observed strong correlations among the activity patterns of many of the muscles within our dataset, determined using RANSAC^68 (Extended Data Fig. 4, Extended Data Table 1). Thus, if we performed virtual experiments by independently changing the activity of each muscle, the input patterns would likely reside outside the data subspace on which the CNN was trained, making the network predictions unreliable. To make more reliable predictions of wing motion from the trained CNN, we deliberately interrogated the network using input patterns that would not deviate substantially from those used to train it, by making use of the measured muscle correlations (Extended Data Fig. 4). According to this strategy, we used the trained CNN to probe the influence of each muscle by changing its input value to the maximum normalized value, while simultaneously changing the activity of the other 11 muscles according to the measured correlations.
Evaluating the function of individual sclerites using latent variable analysis
The CNN that we constructed and trained to predict wing kinematics from muscle activity deliberately mixed input from all 12 control muscles, making it difficult to draw conclusions on the mechanical function of individual wing sclerites. As a complementary approach, we trained a CNN with an encoder-decoder architecture to learn how the activity of all the muscles that attach to a particular sclerite correlates with changes in wing motion (Fig. 6; Extended Data Fig. 8). The encoder block contained deliberate bottlenecks that forced the network to learn how to represent the input data by a small set of latent variables representing the state of each sclerite and wingbeat frequency. Training the encoder-decoder with a bottleneck is roughly equivalent to performing a non-linear PCA on the dataset^4. A graphical representation of this code (created using the model plotting utility in Keras 3.0) is shown in Supplementary Figure 4; the full code is available as described in the Data Availability Section of the main manuscript.
We constructed the encoder-decoder with two decoder heads, one that predicts muscle activity from
the latent variable space and another that predicts wing kinematics (Extended Data Fig. 8). Input to the
encoder-decoder was split into 5 streams, grouping the activity of muscles that insert on each of the 4 wing
sclerites with a separate stream for wingbeat frequency. Each stream uses two convolutional layers, as in our
original CNN model (Fig. 3a), but followed by a dense layer of 512 neurons and a final layer of one neuron
with linear activation that corresponded to one of the five latent variables. After the latent space, the network
is split into two streams: one stream uses a decoder to predict the input to the network (i.e. the muscle
activity), the second stream uses two dense layers to predict the Legendre coefficients of the wing motion.
We placed a back-propagation stop between the latent space and the muscle activity decoder layers, such that
the latent space is only trained on correlations with wing motion. The muscle activity decoder layers are still
functional, as the decoder predicts the correlation between the latent space and muscle activity, but without
the noise of the input to the network.
We trained the network on the muscle activity and wing motion dataset using 1000 epochs, a batch size of 100, a learning rate of 10^−4, and a decay rate of 10^−7. The network predicts two vectors, the latent space (5 × 1) and the Legendre coefficients (80 × 1), and one matrix, the muscle activity and wingbeat frequency over 9 wingbeats (13 × 9). It is important to note that the prediction error of the wing kinematics by the sclerite-function encoder-decoder is expected to be worse than that of our original CNN, as the bottleneck required to create the latent variable space greatly restricts the amount of information that can be used to learn the correlations.
In Figure 6 and Extended Data Fig. 8, only the decoder sections of the network were used to predict muscle activity and wing motion for each latent variable, which we varied systematically between −3 and +3 in 9 steps while keeping the other latent variables constant at 0. The muscle activity and wing motion were
then predicted from the latent input by the two decoders.
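The latent sweep described above (each variable varied from $-3$ to $+3$ in 9 steps, the others held at 0) amounts to a simple grid of decoder inputs; the decoder calls themselves are omitted in this sketch:

```python
import numpy as np

# Build the decoder inputs for the latent-variable sweep: one latent variable
# at a time is varied from -3 to +3 in 9 steps while the others stay at 0.
n_latent, n_steps = 5, 9
sweep = np.linspace(-3, 3, n_steps)

inputs = np.zeros((n_latent, n_steps, n_latent))
for i in range(n_latent):
    inputs[i, :, i] = sweep          # vary latent i, leave the rest at zero
inputs = inputs.reshape(-1, n_latent)

print(inputs.shape)                  # (45, 5): 9 decoder inputs per variable
print(sweep)                         # steps of 0.75 from -3 to +3
```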
Dynamically-scaled flapping wing experiments
In the last three decades, our laboratory has developed several versions of a dynamically-scaled flapping wing robot$^{38,78-81}$; however, none of these had the capability of creating the chord-wise deformations that we observed and tracked in our high-speed videos. The robotic wings we designed for this study were actuated
by one stepper motor and two servo motors each (Extended Data Fig. 5a). Stroke angle was controlled by a stepper motor (10 kHz clock) via two gears with a 1:3 gear ratio. The position of the stepper motor was controlled via micro-stepping, which permitted fine motor control ($7.8 \cdot 10^{-3}$ degrees per microstep). Two magnets were positioned on the gear so that a Hall-effect sensor was activated when the wing moved out of bounds. Besides protecting the wing, the Hall-effect sensor was also used to home the stepper motor. During an experiment, a Teensy 3.2 microcontroller tracked the number of microsteps travelled relative to the home position. To account for motor slip, which sometimes occurred due to large torques, the steppers were homed after each experiment. During an experiment, the stepper's position, $\phi$, and velocity, $\dot{\phi}$, were controlled via a feed-forward process. The stepper made micro-steps on a 10 kHz clock; position and velocity set points were updated at 200 Hz. The custom instrumentation software computed the required stepping frequency and direction to reach the specified position and velocity values after each update.
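One such feed-forward update can be sketched as follows. The function name and the exact control law are illustrative assumptions; the text states only that a stepping frequency and direction are computed from the position and velocity set points at each 200 Hz update:

```python
# Hypothetical sketch of one feed-forward stepper update, using the
# micro-stepping resolution and clock rates quoted above.
DEG_PER_MICROSTEP = 7.8e-3     # degrees per microstep (from the text)
UPDATE_DT = 1.0 / 200.0        # set points arrive at 200 Hz
CLOCK_HZ = 10_000              # micro-steps are issued on a 10 kHz clock

def stepper_update(current_steps, target_deg, target_deg_per_s):
    """Return (direction, micro-steps to issue) for the next update interval."""
    target_steps = target_deg / DEG_PER_MICROSTEP
    error_steps = target_steps - current_steps
    # feed-forward term: steps needed to track the velocity set point
    feedforward_steps = (target_deg_per_s / DEG_PER_MICROSTEP) * UPDATE_DT
    steps = error_steps + feedforward_steps
    direction = 1 if steps >= 0 else -1
    # the 10 kHz clock caps how many micro-steps fit in one 5 ms interval
    max_steps = int(CLOCK_HZ * UPDATE_DT)          # 50 steps per interval
    return direction, min(abs(round(steps)), max_steps)

print(stepper_update(current_steps=0, target_deg=0.2, target_deg_per_s=10.0))
# -> (1, 32): ~25.6 steps of position error plus ~6.4 steps of velocity tracking
```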
The deviation and wing pitch angles were controlled by two servos (HiTec D951TW). The deviation angle servo operated via two gears (1:1 gear ratio), and the wing was directly attached to the wing pitch servo. The position of the servos was specified by a pulse-width modulation (PWM) signal at 50 Hz. The deviation servo could move between $-45°$ and $+45°$ and the wing pitch servo could move between $-90°$ and $+90°$. During an experiment, the position of the servos was updated at 50 Hz in a feed-forward control loop.
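As an illustration of this PWM interface, a commanded servo angle maps linearly to a pulse width; the 1000-2000 µs range below is a typical hobby-servo assumption, not a value given in the text:

```python
# Sketch of mapping a commanded angle to a 50 Hz PWM pulse width.
# The 1000-2000 us pulse range is an assumed, typical servo convention.
def angle_to_pulse_us(angle_deg, min_deg, max_deg,
                      min_pulse_us=1000.0, max_pulse_us=2000.0):
    angle_deg = max(min(angle_deg, max_deg), min_deg)   # clip to travel limits
    frac = (angle_deg - min_deg) / (max_deg - min_deg)
    return min_pulse_us + frac * (max_pulse_us - min_pulse_us)

# deviation servo travels -45..+45 deg, wing-pitch servo -90..+90 deg
print(angle_to_pulse_us(0.0, -45, 45))    # mid-travel -> 1500.0 us
print(angle_to_pulse_us(90.0, -90, 90))   # full travel -> 2000.0 us
```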
We measured forces and torques on the wing using a 6 degree-of-freedom force and torque (FT) sensor
(ATI Nano 17), mounted on the rotation axis of the wing pitch servo. Custom-machined aluminium mounts
coupled the base of the FT sensor to the servo axis and the wing. During each experiment, six 16-bit unsigned integers corresponding to all forces and torques ($F_x$, $F_y$, $F_z$, $T_x$, $T_y$, $T_z$) were sampled at 200 Hz. The robot was submerged in an acrylic tank ($2.4 \times 1 \times 1.2$ m) filled with mineral oil (Chevron Superla white oil) with a kinematic viscosity of $115 \cdot 10^{-6}$ m$^2\cdot$s$^{-1}$ and a density of 880 kg$\cdot$m$^{-3}$ at 22 °C. The oil level in the RoboFly tank was approximately 1 m. The robot had a wingtip radius of 31 cm and was submerged at a depth of 50 cm. A previous study that systematically mapped the effects of the tank boundaries on lift and drag determined that the system approximated the forces generated within an infinite volume, provided the wing remains further than 12 cm from any boundary (top, bottom, or side)$^{38}$.
Besides the three Tait-Bryan angles describing wing orientation, Flynet tracked a fourth angle describing wing shape, the deformation angle $\xi$. To implement this fourth angle on the robot, the wing was composed of four panels connected by three hinge lines (Extended Data Fig. 6a). The four panels were cut out of an acrylic sheet (2.75 mm thickness). Each hinge consisted of a 2 mm steel rod core, surrounded by an acrylic tube with inner and outer diameters of 2 mm and 4 mm, respectively. The acrylic tube was cut into sections of 20 mm, and these sections were glued in an alternating pattern to two adjacent panels. The rotation angle between two subsequent panels was controlled by a micro servo (HiTec HS-7115TH), which was screwed onto one panel and connected to the next panel via a 1 mm metal rod coupled to the servo arm. Three micro servos were used to deform the wing. Following the assumption that the wing bending angle was uniformly distributed over the three hinge lines, the deformation angle $\xi$ was divided by 3 to obtain the rotation angle per hinge line.
Dynamic scaling was ensured by matching the Reynolds number of the robotic wing and a real fly, which requires:

$$\frac{f_{fly} \cdot R_{fly}^2}{\nu_{air}} = \frac{f_{robo} \cdot R_{robo}^2}{\nu_{robo}}, \qquad (10)$$

where $f_{fly}$ is the fly's wingbeat frequency, $f_{robo}$ the wingbeat frequency of the robot, $R_{fly}$ the fly's wing length, $R_{robo}$ the wing length of the robot, $\nu_{air}$ the kinematic viscosity of air, and $\nu_{robo}$ the kinematic viscosity of the mineral oil. Rewriting the Reynolds number equality yields:

$$f_{robo} = \frac{R_{fly}^2}{R_{robo}^2} \, \frac{\nu_{robo}}{\nu_{air}} \, f_{fly}. \qquad (11)$$
By entering the values for a typical fly ($R_{fly} = 2.7$ mm, $\nu_{air} = 15.7 \cdot 10^{-6}$ m$^2$s$^{-1}$ at 25 °C, $f_{fly} = 200$ Hz) and the RoboFly parameters ($R_{robo} = 310$ mm, $\nu_{robo} = 115 \cdot 10^{-6}$ m$^2$s$^{-1}$ at 22 °C), the required flapping frequency for the robot is 0.11 Hz. The force scaling factor may be written as$^{74}$:

$$\frac{F_{fly}}{F_{robo}} = \frac{\rho_{air} \cdot f_{fly}^2 \cdot R_{fly}^4}{\rho_{robo} \cdot f_{robo}^2 \cdot R_{robo}^4}, \qquad (12)$$

with relevant density values of $\rho_{air} = 1.18$ kg m$^{-3}$ and $\rho_{robo} = 880$ kg m$^{-3}$. The torque scaling factor is the product of force and moment arm:

$$\frac{T_{fly}}{T_{robo}} = \frac{\rho_{air} \cdot f_{fly}^2 \cdot R_{fly}^5}{\rho_{robo} \cdot f_{robo}^2 \cdot R_{robo}^5}. \qquad (13)$$
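Plugging the quoted parameters into equations (11)-(13) gives a quick numeric check of the scaling:

```python
# Numeric check of equations (11)-(13) using the fly and robot parameters
# quoted above (nu_oil is the text's nu_robo).
R_fly, R_robo = 2.7e-3, 310e-3          # wing lengths, m
nu_air, nu_oil = 15.7e-6, 115e-6        # kinematic viscosities, m^2 s^-1
rho_air, rho_oil = 1.18, 880.0          # densities, kg m^-3
f_fly = 200.0                           # fly wingbeat frequency, Hz

# equation (11): robot flapping frequency that matches the Reynolds number
f_robo = (R_fly**2 / R_robo**2) * (nu_oil / nu_air) * f_fly
print(round(f_robo, 2))                 # 0.11 Hz, as stated in the text

# equations (12) and (13): force and torque scaling from robot to fly
force_scale = (rho_air * f_fly**2 * R_fly**4) / (rho_oil * f_robo**2 * R_robo**4)
torque_scale = (rho_air * f_fly**2 * R_fly**5) / (rho_oil * f_robo**2 * R_robo**5)
print(torque_scale / force_scale)       # extra moment-arm factor R_fly / R_robo
```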
Any experiment using the robotic wing required an input file that specified the following angles: $\phi$, $\dot{\phi}$, $\theta$, $\eta$, and $\xi$ at intervals of 5 ms. A C++ program read the five angles and sent them to the Teensy microcontroller via a USB cable at 200 Hz (RawHID protocol). The Teensy microcontroller converted the angles to PWM commands for the servos and step-and-direction commands for the stepper motors. At the same time, the force and torque data were sent from the Teensy microcontroller to the C++ program, which logged the measurements.
At the start of an experiment, the wing was moved to the home position ($\phi = 0$, $\theta = 0$, $\eta = 0$, $\xi = 0$). In order to prevent rapid acceleration of the wing, the wing kinematic angles during the first wingbeat of an experiment were multiplied with the following function:

$$f_{start}(t) = \sin^2\!\left(\frac{\pi t}{4T}\right), \qquad (14)$$

where $T$ corresponds to the wingbeat period. Similarly, the last wingbeat of the experiment ends at the home position and the wing kinematic angles are multiplied by:

$$f_{end}(t) = \sin^2\!\left(\frac{\pi}{4} + \frac{\pi t}{4T}\right). \qquad (15)$$
When wings start flapping in the oil, it takes approximately two to three wingbeats before the wake is fully developed$^{78}$. In order to exclude these start-up effects, each wing kinematic pattern was repeated for 9 wingbeats and only wingbeats 4 through 8 were used for analysis. After each experiment, the data were converted from 16-bit unsigned integers into actual force and torque vectors and smoothed using a linear Kalman filter. Besides aerodynamic forces, the FT sensor also measured inertial and gravitational components. The low flapping frequency of the robot ensures that the inertial forces may be ignored; however, the gravitational forces are significant. To remove the gravitational force from the analysis, every wing kinematic pattern was replayed at a 5x slower frequency, which reduces the aerodynamic forces by a factor of 25. The forces and moments measured during the slow replay, which approximate the gravity component, were subtracted from the fast-frequency data after interpolation.
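The gravity-subtraction step can be sketched with synthetic signals: the 5x slow replay approximates the pose-dependent gravity trace, which is interpolated onto the fast time base and subtracted. The signal shapes below are made up for illustration only:

```python
import numpy as np

# Sketch of gravity subtraction: replaying the pattern 5x slower drops the
# aerodynamic forces by 5^2 = 25x, so the slow trace ~= gravity alone.
def subtract_gravity(t_fast, f_fast, t_slow, f_slow, slowdown=5.0):
    # map the slow recording onto the fast wingbeat phase, then interpolate
    gravity = np.interp(t_fast, t_slow / slowdown, f_slow)
    return f_fast - gravity

t_fast = np.linspace(0, 1, 200)                  # one fast wingbeat
t_slow = np.linspace(0, 5, 1000)                 # same pattern, 5x slower
gravity_true = 0.3 * np.sin(2 * np.pi * t_fast)  # pose-dependent gravity
aero_true = 1.0 * np.sin(4 * np.pi * t_fast)     # aerodynamic component
f_fast = aero_true + gravity_true
f_slow = 0.3 * np.sin(2 * np.pi * t_slow / 5)    # gravity only, slow replay

aero_est = subtract_gravity(t_fast, f_fast, t_slow, f_slow)
print(np.max(np.abs(aero_est - aero_true)) < 1e-2)   # True
```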
The aerodynamic and gravitational forces were measured by the FT sensor in the wing reference frame. For subsequent analysis it is useful to translate the forces and torques to the stroke reference frame (SRF), which was accomplished by the following Tait-Bryan operations:

$$\mathbf{F} = R_\eta \cdot R_\theta \cdot R_\phi \cdot \mathbf{F}_{SRF}, \qquad (16)$$

and

$$\mathbf{F}_{SRF} = R_\phi^T \cdot R_\theta^T \cdot R_\eta^T \cdot \mathbf{F}, \qquad (17)$$

where $\mathbf{F}$ and $\mathbf{F}_{SRF}$ are the forces in the two different reference frames, and $R_\phi$, $R_\theta$, and $R_\eta$ correspond to the rotation matrices of the stroke, deviation, and wing pitch angles. More details on these operations are provided elsewhere$^{74}$.
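The transformation pair in equations (16)-(17) can be sketched as composed elementary rotations. The axis assignments below (stroke about $z$, deviation about $y$, pitch about $x$) are an assumption for illustration; the exact conventions are given in ref. 74:

```python
import numpy as np

# Sketch of eqs. (16)-(17): rotating a force vector between the stroke and
# wing reference frames. Axis conventions are assumed, not from the paper.
def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

phi, theta, eta = np.deg2rad([30.0, 10.0, 45.0])  # stroke, deviation, pitch
R_wing_from_srf = Rx(eta) @ Ry(theta) @ Rz(phi)   # cf. equation (16)

F_srf = np.array([0.0, 0.0, 1.0])
F_wing = R_wing_from_srf @ F_srf
# eq. (17) is the inverse rotation: transposes applied in reverse order
F_back = Rz(phi).T @ Ry(theta).T @ Rx(eta).T @ F_wing
print(np.allclose(F_back, F_srf))   # True: the two operations are inverses
```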
Computing inertial forces via the Newton-Euler equations
Besides aerodynamic forces, wing inertia plays a significant role in fly flight. Although the wing mass is only 0.2% of the body mass$^{39}$, the angular velocity and acceleration of the wing motion are very high. We estimated the inertial forces and torques of a rotating rigid body using the Newton-Euler equations:

$$\begin{bmatrix} \mathbf{F} \\ \mathbf{T} \end{bmatrix} = \begin{bmatrix} m E & -m[\mathbf{c}\times] \\ m[\mathbf{c}\times] & I - m[\mathbf{c}\times][\mathbf{c}\times] \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \dot{\boldsymbol{\omega}} \end{bmatrix} + \begin{bmatrix} m[\boldsymbol{\omega}\times][\boldsymbol{\omega}\times]\mathbf{c} \\ [\boldsymbol{\omega}\times]\left(I - m[\mathbf{c}\times][\mathbf{c}\times]\right)\boldsymbol{\omega} \end{bmatrix}, \qquad (18)$$

where $m$ corresponds to the wing mass, $E$ is the $3\times3$ identity matrix, $\mathbf{c}$ is the position of the center of gravity on the wing, $I$ is the inertia tensor of the wing, $\mathbf{a}$ is the linear acceleration of the wing, $\dot{\boldsymbol{\omega}}$ is the angular acceleration, and $\boldsymbol{\omega}$ the angular velocity, with all values relative to the wing reference frame. We ignored the effects of the linear acceleration of the wing because they are very small compared to the angular acceleration. The first right-hand term in equation 18 corresponds to the inertial forces and torques due to wing acceleration, $\mathbf{F}_{acc}$ and $\mathbf{T}_{acc}$. Inertial effects dependent on angular velocity, such as the Coriolis and centrifugal forces, are captured by the second right-hand term and are collectively referred to as $\mathbf{F}_{angvel}$ and $\mathbf{T}_{angvel}$. We assumed that the inertial effects of wing deformation are small, and thus calculated inertial forces and torques based on the assumption that the wing is a rigid, flat plate.
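Equation (18) translates directly into code. The sketch below uses the wing parameters reported later in equations (21)-(23) (converted to SI units, signs as printed there) and arbitrary test values for the angular rates:

```python
import numpy as np

# Sketch of eq. (18): Newton-Euler forces/torques on a rigid flat-plate wing.
def skew(v):
    """Cross-product matrix [v x]."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def newton_euler(m, c, I, a, omega, omega_dot):
    C, W = skew(c), skew(omega)
    M = np.block([[m * np.eye(3), -m * C],
                  [m * C, I - m * C @ C]])
    bias = np.concatenate([m * W @ W @ c,
                           W @ (I - m * C @ C) @ omega])
    FT = M @ np.concatenate([a, omega_dot]) + bias
    return FT[:3], FT[3:]   # forces, torques in the wing frame

m = 1.61e-9                                 # wing mass, kg (eq. 21)
c = np.array([0.16e-3, 1.31e-3, 0.0])       # center of gravity, m (eq. 22)
I = np.array([[6.32, 0.55, 0.0],            # eq. 23 (kg mm^2, x1e-9),
              [0.55, 0.36, 0.0],            # converted to kg m^2
              [0.0, 0.0, 6.67]]) * 1e-9 * 1e-6
F, T = newton_euler(m, c, I,
                    a=np.zeros(3),                        # linear accel ignored
                    omega=np.array([0.0, 0.0, 100.0]),    # rad s^-1 (test value)
                    omega_dot=np.array([0.0, 0.0, 1e5]))  # rad s^-2 (test value)
print(F, T)
```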
The angular velocity and acceleration had to be computed from the Legendre polynomials describing wing motion. We computed the temporal derivatives of the Legendre polynomials ($\dot{\phi}$, $\dot{\theta}$, $\dot{\eta}$, $\ddot{\phi}$, $\ddot{\theta}$, $\ddot{\eta}$) using equation 8. However, the temporal derivatives of the wing kinematic angles do not correspond directly to the angular velocity and acceleration. Angular velocity in the wing reference frame is given by:

$$\boldsymbol{\omega} = R_\eta \cdot R_\theta \cdot \begin{bmatrix} \dot{\phi} \\ 0 \\ 0 \end{bmatrix} + R_\eta \cdot \begin{bmatrix} 0 \\ 0 \\ \dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 \\ \dot{\eta} \\ 0 \end{bmatrix}, \qquad (19)$$
where $R_\theta$ and $R_\eta$ correspond to the rotation matrices of the Tait-Bryan operations for $\theta$ and $\eta$, respectively. Angular acceleration is given by:

$$\dot{\boldsymbol{\omega}} = \dot{R}_\eta \cdot R_\theta \cdot \begin{bmatrix} \dot{\phi} \\ 0 \\ 0 \end{bmatrix} + R_\eta \cdot \dot{R}_\theta \cdot \begin{bmatrix} \dot{\phi} \\ 0 \\ 0 \end{bmatrix} + R_\eta \cdot R_\theta \cdot \begin{bmatrix} \ddot{\phi} \\ 0 \\ 0 \end{bmatrix} + \dot{R}_\eta \cdot \begin{bmatrix} 0 \\ 0 \\ \dot{\theta} \end{bmatrix} + R_\eta \cdot \begin{bmatrix} 0 \\ 0 \\ \ddot{\theta} \end{bmatrix} + \begin{bmatrix} 0 \\ \ddot{\eta} \\ 0 \end{bmatrix}, \qquad (20)$$

with $\dot{R}$ being the temporal derivative of the corresponding rotation matrix.
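The temporal derivatives of a Legendre series can be obtained analytically (analogous to the use of equation 8 above) and verified by finite differences; the coefficients below are random placeholders, not fitted wing kinematics:

```python
import numpy as np
from numpy.polynomial import legendre as L

# Evaluate an angle trace and its time derivatives from Legendre coefficients.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=8)             # placeholder coefficients, one angle

T = 0.005                               # wingbeat period, s
s = np.linspace(-1, 1, 200)             # Legendre domain over one wingbeat
t = (s + 1) / 2 * T                     # corresponding time axis

angle = L.legval(s, coeffs)
# chain rule: d/dt = (ds/dt) d/ds, with ds/dt = 2/T
angle_dot = L.legval(s, L.legder(coeffs, 1)) * (2 / T)
angle_ddot = L.legval(s, L.legder(coeffs, 2)) * (2 / T) ** 2

# finite-difference check of the first derivative (endpoints excluded)
fd = np.gradient(angle, t)
err = np.max(np.abs(fd[5:-5] - angle_dot[5:-5])) / np.max(np.abs(angle_dot))
print(err < 0.05)   # True
```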
To compute the inertial forces and torques, the mass, center of gravity, and inertia tensor of the wing must be determined. These parameters were computed using the scaled 3D model we created for Flynet, with a wing length of 2.7 mm, an estimated cuticle density of 1200 kg$\cdot$m$^{-3}$, and a wing thickness of 5.4 µm$^{82}$.
The inertial parameters of the (left) wing are given by:

$$m = 1.61 \cdot 10^{-9}\ \mathrm{kg}, \qquad (21)$$

$$\mathbf{c} = \begin{bmatrix} 0.16 & 1.31 & 0 \end{bmatrix}^T \mathrm{mm}, \qquad (22)$$

$$I = \begin{bmatrix} 6.32 & 0.55 & 0 \\ 0.55 & 0.36 & 0 \\ 0 & 0 & 6.67 \end{bmatrix} \cdot 10^{-9}\ \mathrm{kg \cdot mm^2}. \qquad (23)$$
With the wing inertia parameters, angular velocity, and angular acceleration defined, it is possible to compute the inertial forces and torques for any arbitrary pattern of wing motion. Extended Data Fig. 5b shows the aerodynamic and inertial forces and torques in the SRF for the baseline wingbeat. The aerodynamic forces were measured on the robot and rescaled to the appropriate units for an actual fly using equations 12 and 13. To make interpretation easier, the forces and torques have been non-dimensionalized by dividing the forces by the body weight, and the torques by the product of body weight and wing length. Using the estimated cuticle density and the scaled body components of the 3D fly model, the body mass was found to be $1.16 \cdot 10^{-6}$ kg. Inspection of the traces in Extended Data Fig. 5b indicates that the inertial forces and torques in wing motion are substantial, as found in prior studies$^{83}$.
With the methods to measure aerodynamic forces and compute inertial forces established, it is possible to evaluate the forces generated by changes in steering muscle activity. For each steering muscle, seven wing kinematic patterns were tested on the robotic wing. The seven patterns were found by sampling the muscle activity patterns on a line between the baseline muscle activity and the maximum muscle activity of a selected muscle (Fig. 4), and subsequently predicting the corresponding wing motion using the trained CNN. The workflow to obtain the control forces and torques of muscle activity consisted of measuring the force and torque traces of 84 wing kinematic patterns on the robot, performing the gravity subtraction, smoothing the traces with a Kalman filter, converting the traces to the SRF, computing the median force and torque traces from the measured wingbeats ($\mathbf{F}_{aero}$ and $\mathbf{T}_{aero}$), estimating the inertial components ($\mathbf{F}_{acc}$, $\mathbf{T}_{acc}$, $\mathbf{F}_{angvel}$, $\mathbf{T}_{angvel}$), and finally, summing the inertial and aerodynamic forces and torques ($\mathbf{F}_{total}$ and $\mathbf{T}_{total}$). A time history of the individual force and torque components is plotted over a baseline wingbeat in Extended Data Fig. 5b. The corresponding instantaneous total force vectors are shown superimposed over the 3D kinematics in Extended Data Fig. 5c, with comparable plots for the kinematics in each virtual muscle activation experiment (Fig. 4) shown in Extended Data Fig. 6. Further details on the construction and operation of the robot and the analysis of aerodynamic forces and torques are provided elsewhere$^{74}$.
Simulating arbitrary free flight maneuvers with Model Predictive Control
Using our CNN, the virtual muscle activation experiments, and the robotic wing experiments provides us with the components necessary to construct a state-space simulation of a flying fly using Model Predictive Control (MPC)$^{70}$, in which the virtual fly regulates its flight trajectory via changes in the time course of steering muscle activation. By specifying an initial state, a goal state, and a time period in which to achieve the goal state, an MPC controller finds the optimal trajectory given a cost function and constraints. The explicit discrete time-invariant state-space system is governed by the following equations:

$$\begin{aligned} \mathbf{x}(k+1) &= A(k)\,\mathbf{x}(k) + B(k)\,\mathbf{u}(k), \\ \mathbf{y}(k) &= C(k)\,\mathbf{x}(k) + D(k)\,\mathbf{u}(k), \end{aligned} \qquad (24)$$

where $\mathbf{x}(k)$ and $\mathbf{x}(k+1)$ are the state vectors at times $k$ and $k+1$ respectively, $\mathbf{u}(k)$ is the control vector, $A(k)$ is the system matrix, $B(k)$ is the control matrix, $C(k)$ the output matrix, $D(k)$ the feed-forward matrix, and $\mathbf{y}(k)$ the output vector. In the case of fly flight, there is no feed-forward process and the output of the state-space system is the state vector. This means that matrices $C(k)$ and $D(k)$ do not need to be defined (Extended Data Fig. 7a).
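The discrete update in equation (24), with the output taken to be the state itself, can be sketched as a simple loop; $A$ and $B$ below are small random placeholders, not the fly's linearized dynamics:

```python
import numpy as np

# Minimal sketch of x(k+1) = A x(k) + B u(k); the output y(k) is simply x(k).
rng = np.random.default_rng(1)
n_state, n_control = 4, 2
A = np.eye(n_state) + 0.01 * rng.normal(size=(n_state, n_state))
B = 0.01 * rng.normal(size=(n_state, n_control))

def step(x, u):
    return A @ x + B @ u

x = np.zeros(n_state)
for k in range(10):              # e.g. a 10-wingbeat maneuver horizon
    u = np.ones(n_control)       # placeholder control input per wingbeat
    x = step(x, u)
print(x)
```

An MPC controller would choose the sequence of `u(k)` values that minimizes a cost over this horizon subject to constraints; the loop above shows only the forward rollout of the state-space model.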
The state vector describes the orientation, position, velocity, and angular velocity of the fly's body:

$$\mathbf{x} = \begin{bmatrix} \mathbf{v} \\ \boldsymbol{\omega} \\ \mathbf{q} \\ \mathbf{p} \end{bmatrix}, \qquad (25)$$

where $\mathbf{v}$ and $\boldsymbol{\omega}$ are the linear and angular velocity of the body in the SRF, respectively, $\mathbf{q}$ is the quaternion describing SRF orientation relative to the inertial reference frame, and $\mathbf{p}$ is the position of the SRF in the inertial reference frame. For simplicity, we assume that the head and abdomen are stationary relative to the thorax. The temporal derivative of the state vector, $\dot{\mathbf{x}}$, is required for updating the state at each time step:

$$\dot{\mathbf{x}} = \begin{bmatrix} \mathbf{a} \\ \dot{\boldsymbol{\omega}} \\ \dot{\mathbf{q}} \\ \dot{\mathbf{p}} \end{bmatrix}, \qquad (26)$$

where $\mathbf{a}$ and $\dot{\boldsymbol{\omega}}$ are the linear and angular acceleration of the body in the SRF, respectively.
At each time step, the state vector is multiplied with the system matrix, $A(k)$, to compute the temporal derivative of the state vector. The time step, $\Delta t$, corresponds to the wingbeat period, which is assumed to be constant at $1/f = 1/200 = 0.005$ s. Although flies regulate wingbeat frequency during flight, it typically takes 20 wingbeats to increase or decrease the wingbeat frequency to a new level$^{32}$. As our model was developed for simulating rapid maneuvers of only 10 wingbeats, frequency was held constant in our simulations.
The equations of motion of the system matrix include the following flight forces and torques: body weight, body inertia, body aerodynamics, and the aerodynamic and inertial damping of the wings. The mass, center of gravity, and inertia tensor of the body were determined from the scaled 3D body model (head, thorax, abdomen) for the average fly in the dataset. For the wing mass, the cuticle density was assumed to be 1200 kg$\cdot$m$^{-3}$, and the body mass estimated as $m_b = 1.16 \cdot 10^{-6}$ kg. The center of gravity of the body in the SRF was estimated as $\mathbf{c}_b = \begin{bmatrix} 0.04 & 0 & 0.24 \end{bmatrix}^T$ mm. The inertia tensor of the body is given by:

$$I_b = \begin{bmatrix} 0.56 & 0 & 0.19 \\ 0 & 0.67 & 0 \\ 0.19 & 0 & 0.23 \end{bmatrix} \cdot 10^{-6}\ \mathrm{kg \cdot mm^2}. \qquad (27)$$
The inertial forces and torques on the body can be computed with the Newton-Euler equations, evaluated at the center of gravity of the body:

$$\begin{bmatrix} \mathbf{F} \\ \mathbf{T} \end{bmatrix} = \begin{bmatrix} m_b E & 0 \\ 0 & I_b \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \dot{\boldsymbol{\omega}} \end{bmatrix}. \qquad (28)$$

Obtaining the rotational accelerations of the body requires solving the inverse problem:

$$\begin{bmatrix} \mathbf{a} \\ \dot{\boldsymbol{\omega}} \end{bmatrix} = \begin{bmatrix} m_b E & 0 \\ 0 & I_b \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{F} \\ \mathbf{T} \end{bmatrix}. \qquad (29)$$
As the state is evaluated at the center of gravity, there are no torques due to the fly's weight. Although the aerodynamics of the wings generate larger forces, the body of the fly also experiences drag during flight. The combined 3D shape of the head, thorax, abdomen, and legs is complex, but we chose to model the body conservatively as a sphere with a radius of 1 mm. While the actual body drag is likely to be lower, the simplicity of the drag model makes it easier to implement in the state-space system. The drag on the sphere can be calculated as:

$$\begin{bmatrix} F_{Dx} \\ F_{Dy} \\ F_{Dz} \end{bmatrix} = - \begin{bmatrix} \mathrm{sgn}(v_x)\,\frac{1}{2} C_D \pi r^2 \rho v_x^2 \\ \mathrm{sgn}(v_y)\,\frac{1}{2} C_D \pi r^2 \rho v_y^2 \\ \mathrm{sgn}(v_z)\,\frac{1}{2} C_D \pi r^2 \rho v_z^2 \end{bmatrix}, \qquad (30)$$

with the drag coefficient $C_D = 0.5$, $r$ the body radius, $\rho$ the air density, and $\mathrm{sgn}$ the sign operator. Because the center of pressure of the sphere is assumed to be at the center of gravity of the body, there are no torques due to body drag.
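This drag model translates directly into code; the air density and body radius below follow the values quoted earlier in the text:

```python
import numpy as np

# Quadratic drag on the 1 mm sphere used as the body-drag model; sgn(v)
# makes each drag component oppose the corresponding velocity component.
def body_drag(v, rho=1.18, r=1e-3, Cd=0.5):
    return -np.sign(v) * 0.5 * Cd * np.pi * r**2 * rho * v**2

v = np.array([0.5, 0.0, -0.2])     # body velocity in the SRF, m s^-1
print(body_drag(v))                # drag opposes v componentwise
```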
Because the aerodynamic and inertial damping terms due to the motion of the flapping wings are
dependent on body rotational velocity, these components are added to the system matrix. The equations of
motion to compute the relevant linear and rotational accelerations can be written as: