nature methods

High-throughput ethomics in large groups of Drosophila

Kristin Branson, Alice A. Robie, John Bender, Pietro Perona & Michael H. Dickinson

Nature Methods: doi:10.1038/nmeth.1328
Supplementary figures and text:

Supplementary Figure 1: Illustration of groundtruthing using high-resolution video.
Supplementary Figure 2: Cumulative distribution of position errors.
Supplementary Figure 3: Extended behavioral vectors.
Supplementary Figure 4: Summary statistics of behavior.
Supplementary Table 1: Identity errors.
Supplementary Table 2: Parameters of each of the eight behaviors.
Supplementary Table 3: Position errors for close flies.
Supplementary Note

Note: Supplementary Software and Supplementary Videos are available on the Nature Methods website.
Supplementary Figure 1: Illustration of groundtruthing using high-resolution video. (a) Example images of the same flies captured by the low- and high-resolution cameras. The left image of each pair is from the low-resolution camera; the right image is from the high-resolution camera. The red box in the low-resolution images indicates the location of the adjacent high-resolution image. The last pair of frames shows a male and female fly near each other. (b) Examples of high-resolution labels and corresponding low-resolution tracks. Each row shows the manually labeled high-resolution image and the positions estimated by the tracker for a different example. The first column shows the true labels plotted on the high-resolution frame. The second column shows the projection of the positions estimated by the tracker onto the high-resolution frame. The third column shows the projection of the true labels onto the low-resolution frame. The last column shows the positions estimated by the tracker on the low-resolution frame. The last row shows touching flies. The red circle and arrow show the bounding box-based center position and orientation. The colored circles indicate the positions of the five parts.
Supplementary Figure 2: Cumulative distribution of position errors. We plot the cumulative distribution of position errors for each of the methods of estimating error. On the left, we plot error in estimating point locations. On the right, we plot error in estimating the orientation of the fly. The line labeled 'Bounding box' corresponds to errors in estimating the center and orientation of the fly by manually drawing a bounding box on the high-resolution images. The next five lines correspond to errors in estimating the positions of the parts manually labeled on the high-resolution images. The 'Walk wobble' line corresponds to estimates of the center and orientation error computed by smoothing the velocity and orientation during walks. The 'Stop wobble' line corresponds to estimates of the center and orientation error computed from the variance in position of a stopped fly. On the x-axes, the colored ticks mark the median values.
Supplementary Figure 3: Extended behavioral vectors for each male, female, and fru¹/fru¹ fly (see Fig. 3e of the main text for details). In the top image, the color indicates the (normalized) fraction of time each fly performs each behavior. In the bottom image, the color indicates the (normalized) mean duration of bouts of each behavior for each fly. Mean duration cannot be computed for flies that do not perform a given behavior; this is indicated by black.
Supplementary Figure 4: Summary statistics of behavior. For each fly and behavior, we computed (a) the frequency of onsets of the behavior (Fig. 3d-f), (b) the fraction of time the fly performed the behavior (Supplementary Fig. 3, top), and (c) the mean duration of sequences of the behavior (Supplementary Fig. 3, bottom). The mean and standard deviation of these statistics over the entire fly population are shown in the top row (black). In the bottom row (colored), we show the normalized (z-scored) mean and standard error for each of the five fly and trial types (male/wild type/single-sex, ..., female/wild type/mixed-sex, fru¹/fru¹). These plots show that many of these behavioral statistics are significantly different for different fly types.
Supplementary Table 1: Identity errors. Number of each type of identity error for each video evaluated. Each row corresponds to a different video. The first two columns show the number of female and male flies in the video. The third column shows the number of frames in the video. The fourth column shows the number of times a pair of flies was close (note that this number can be greater than the number of frames in the video, as multiple pairs of flies may be close in a single frame). The next five columns show the number of errors of each type: Swap identity, Lost fly, Spurious track, Merged fly, and Split fly.
Fly Sex     N. Frames           N. Errors                             Error Freq
♀*    ♂    Total    Close     Swap  Lost  Spurious  Merged  Split    Swaps/Close   Errors/Fly/Frame
10     0    6364       42        0     0      0        0      0       0             0
10     0    6118       67        1     0      0        0      0       1.5e-2        1.9e-05
20     0    6221      261        0     1      0        0      0       0             9.4e-06
20     0    6206      358        0     1      0        0      0       0             8.5e-06
50     0    6123     2135        0     2      0        0      0       0             7.3e-06
50     0    6587     4545        1     0      0        0      0       2.2e-4        7.1e-06
 0    10    6106      268        0     0      0        0      0       0             0
 0    10    6133      316        0     0      0        0      0       0             0
 0    20    6169     1237        0     1      0        0      0       0             8.1e-06
 0    20    6118     1029        0     1      0        0      0       0             8.8e-06
 0    50    8696     9765        4     0      1        0      0       4.1e-4        1.2e-05
 0    50    6088     4855        7     1      0        0      0       1.4e-3        3.0e-05
 5     5    6268      499        0     0      0        0      0       0             0
 5     5    7859      753        0     0      0        0      0       0             0
10    10    6121     2050        1     1      0        0      0       4.9e-3        1.9e-05
10    10    6104     1460        0     1      0        0      0       0             8.9e-06
25    25    6220     9843        1     3      0        0      0       1.0e-4        1.4e-05
25    25    6536     9902        1     1      1        0      0       1.0e-4        9.6e-06
* Column key: ♀: number of female flies. ♂: number of male flies. Total: total number of frames in the video. Close: number of (frame, fly-pair) combinations in which the two flies are close to each other. Swap: swap identity errors. Lost: lost fly errors. Spurious: spurious track errors. Merged: merged fly errors. Split: split track errors. Swaps/Close: ratio of the number of identity swaps to the number of (frame, fly-pair) combinations in which the flies are close.
Supplementary Table 2: Parameters for each of the eight behaviors. Each row for a behavior corresponds to a different property that is computed at every time instant. The 'Property' column gives this property an abbreviated name; the exact definitions are described in the key below the table. The four parameter-range columns show the ranges the property is allowed to take for each of the four conditions enumerated in the Methods section. The 'Min Length' column shows the minimum length of the sequence. The 'Radius' column shows the definition of 'near' for condition (2): frame t_1 is near frame t_2 if t_1 is within Radius seconds of t_2.
Ranges for the four conditions — (1) Per-frame, (2) Near frame, (3) Sum, (4) Mean — are listed in order; cells that were blank in the original table are omitted.

walking (Min Length .25 s; Radius .2 s)
    speed (mm/s):          [9.9, ∞); [11.9, ∞); [2.4, ∞); [10.6, ∞)
    |θ̇| (°/.05 s):         [0, 31.3]; [0, 19.8]; [0, 5.2]
    acc (mm/(.05 s)²):     [0, 2.0]; [0, 0.8]; [0, 1.1]
    |ṡmth(θ)| (°/.05 s):   [0, 12.9]; [0, 10.5]

stopped (Min Length .35 s; Radius .1 s)
    speed (mm/s):          [0, 4.8]; [0, 1.7]; [0, 2.6]; [0, 3.1]
    |θ̇| (°/.05 s):         [0, 5.7]; [0, 2.3]; [0, 9.7]; [0, 3.1]
    acc (mm/(.05 s)²):     [0, 1.00]; [0, 0.80]; [0, 0.88]

sharp turn (Min Length .35 s; Radius .1 s)
    |θ̇| (°/.05 s):         [4.0, ∞); [6.6, ∞); [27.6, ∞); [9.3, ∞)
    speed (mm/s):          [0, 27.2]; [0, 9.2]; [0, 4.5]; [0, 13.6]
    acc (mm/(.05 s)²):     [0, 1.4]; [0, 1.2]; [0, 1.2]

crabwalk (Min Length .5 s; Radius .1 s)
    |φ − θ| (°/.05 s):     [7.1, ∞); [26.8, ∞); [99.7, ∞); [18.9, ∞)
    |θ̇| (°/.05 s):         [0, 21.6]; [0, 3.9]; [0, 151.8]; [0, 7.7]
    |ṡmth(θ)| (°/.05 s):   [0, 16.9]; [0, 12.6]; [0, 6.7]
    ⊥ tail vel (mm/s):     [1.54, ∞); [0.54, ∞)

backing up (Min Length .15 s; Radius .15 s)
    fwd vel (mm/s):        (−∞, 0]; (−∞, −0.50]; (−∞, .08]; (−∞, 0.50]

chasing (Min Length .55 s; Radius .55 s)
    min speed (mm/s):      [9.9, ∞); [8.5, ∞)
    vel twd (mm/s):        [−18.6, ∞); [−6.5, ∞); [−5.0, ∞); [3.0, ∞)
    max speed (mm/s):      [3.3, ∞); [14.3, ∞)
    dist btn (mm):         [0, 26.1]; [0, 12.1]; [0, 7.8]
    speed apt (mm/s):      [0, 41.2]; [0, 37.7]; [0, 14.7]
    angle btn (°):         [0, 79.0]; [0, 62.9]; [0, 61.7]
    θ diff (°):            [0, 180.5]; [0, 120.3]; [0, 98.4]
    φ diff (°):            [0, 173.7]; [0, 108.2]; [0, 40.9]

jumping (Min Length 0 s; Radius .05 s)
    speed (mm/s):          [48.1, ∞)

touching (Min Length .25 s; Radius .05 s)
    dist btn (mm):         [0, 2.5]
    ang btn (°):           [0, 0]
Property key:
• speed: Magnitude of the vector between the fly's centroids in frames t−1 and t, normalized by the frame rate. The integral of speed (the 'Sum' condition (3)) is the total distance traveled during the segment, measured in mm.
• |θ̇|: Absolute value of the change in orientation from frame t−1 to t, normalized by the frame rate. The integral of |θ̇| is the absolute value of the total change in orientation during the segment, measured in degrees.
• acc: Absolute change in velocity between frames t−1 and t+1, measured in mm per frame² = (0.05 s)².
• |ṡmth(θ)|: Absolute change in the smoothed orientation from frame t−1 to t, normalized by the frame rate = 0.05 s/frame.
• |φ − θ|: Absolute difference between the orientation at frame t and the velocity direction from frame t to t+1.
• ⊥ tail vel: Change in position of the fly's tail from frame t to t+1, projected onto the direction orthogonal to the fly's orientation in frame t.
• fwd vel: Dot product of the orientation vector at frame t and the vector connecting the centroids in frames t and t+1, normalized by the frame rate.
• min speed: Minimum speed of either fly involved in the chase.
• vel twd: Dot product of the direction of the chased fly from the chasing fly and the vector connecting the chasing fly's centroids in frames t and t+1, normalized by the frame rate.
• max speed: Maximum speed of either fly involved in the chase.
• dist btn: Distance between the head of the chasing fly and the nearest point on the boundary ellipse of the chased fly.
• speed apt: Magnitude of the difference between the velocity vectors of the two flies from frames t−1 to t, normalized by the frame rate.
• angle btn: Absolute difference between the angle from the chasing fly to the chased fly and the chasing fly's orientation.
• θ diff: Absolute difference between the flies' orientations.
• φ diff: Absolute difference between the flies' velocity directions.
• ang btn: Minimum absolute angle between the touching fly's orientation and the direction to any point on the other fly.
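To make a few of these per-frame property definitions concrete, here is a minimal numpy sketch of speed, |θ̇|, and fwd vel; the function and array names are ours, not Ctrax's, and trajectories are assumed to be sampled at the 20 frames per second used in our experiments.

import numpy as np

DT = 0.05  # seconds per frame (20 fps; see the Apparatus section)

def per_frame_properties(x, y, theta):
    """Compute a few of the per-frame properties of Supplementary Table 2.

    x, y  : centroid coordinates in mm, one entry per frame
    theta : orientation in degrees, one entry per frame
    """
    dx, dy = np.diff(x), np.diff(y)
    # speed: centroid displacement between consecutive frames, per second
    speed = np.hypot(dx, dy) / DT
    # |dtheta|: absolute per-frame change in orientation, wrapped to [0, 180)
    dtheta = np.abs((np.diff(theta) + 180.0) % 360.0 - 180.0)
    # fwd vel: displacement projected onto the orientation vector, per second
    th = np.deg2rad(theta[:-1])
    fwd_vel = (np.cos(th) * dx + np.sin(th) * dy) / DT
    return speed, dtheta, fwd_vel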
Supplementary Table 3: Position errors for close flies.

Type            Median      Std
Center          0.046 mm    0.090 mm
Orientation     2.8°        2.0°
Left Antenna    0.18 mm     0.09 mm
Right Antenna   0.19 mm     0.09 mm
Left Leg        0.21 mm     0.09 mm
Right Leg       0.21 mm     0.09 mm
Scutellum       0.14 mm     0.10 mm
Each row of this table corresponds to a different error measure computed from the high-resolution groundtruth samples in which another fly was near the labeled fly. The Type column describes the measure, the Median column lists the median error in the estimate, and the Std column lists the standard deviation of the error of the estimate. The first two rows show the error for the center position and orientation estimates, computed from the bounding box labels. The last five rows show the error estimates for the labeled parts.
Supplementary Note

1 Behavior Definitions
(Details related to Behavior Definitions section of Methods)

In this section, we describe how we chose the quantitative definitions of the behaviors analyzed in the Results section. We also provide a table with the properties and precise thresholds we used (Supplementary Table 2). We provide code for automatically detecting these behaviors online (http://dickinson.caltech.edu/ctrax).
Six of our behaviors (walk, stop, sharp turn, crabwalk, backup, and jump) involve basic locomotor actions. The majority of the time, the flies either walked at a relatively constant velocity or stopped in place. The next-most common behavior was the 'sharp turn', in which a fly made a large and rapid change in orientation. Sharp turns usually occurred in response to the arena wall or another fly. Most commonly, the fly pivoted around its hind end, though occasionally it pivoted around its center or front end. These front-end pivots primarily occurred during 'encounters' with another fly (see below). Other locomotor classifications included 'crabwalks', in which the fly walked with a substantial sideways component, and 'backups', in which the fly's translational velocity was negative. Both crabwalks and backups were often exhibited as a result of encounters with other flies. Jumps consisted of rapid translations within the arena, often as a consequence of encounters with other flies. Our two remaining behaviors (touch and chase) related to social interactions between flies. A touch occurred when the head of one fly came into contact with another fly. Chases were cases in which one fly (always a male) followed another across the arena.
For six of the eight behaviors (walking, stopping, making sharp turns, jumping, crabwalking, and chasing), it was time-consuming and non-intuitive to manually search the parameter space for the precise set of parameters that best fit our intuition about the behavior definitions. Thus, we chose the parameters by manually labeling a number of trajectories. These labeled trajectories are the training examples used to learn the parameter intervals. The advantage of this learning-based approach is that, to obtain an automatic behavior classifier, a behavior expert can simply supply segmented training trajectories for the behavior of interest; he or she does not need to write any code. We chose the structure of our classifier so that the training examples are converted into 'characteristic' parameter ranges for each behavior. We thus end up with a clear and concise definition of each behavior that is open to inspection and criticism, and that makes experiments easily reproducible.
Given the segmented example trajectories, we then manually chose which properties the classifier would depend on, and whether each property would be bounded from above, below, or both for each of the four conditions described in the Methods section. We used the cross-entropy method [1] to find the ranges that produced segmentations closest to the labeled trajectories. Details of the algorithm are given in Supplementary Note Sec. 1.1. Our cost function for comparing a proposed segmentation to the groundtruth segmentation is

    cost = (# spurious detections) + (# missed detections) + 0.2 (# mislabeled frames).
Spurious detections are detected sequences that do not match any labeled sequence, and missed detections are labeled sequences that do not match any detected sequence. We say that two segments match if the proposed segmentation can be made identical to the groundtruth by relabeling at most 5 frames (= 0.25 s) at the beginning, end, or middle. We made the definition of matching tolerant to small errors because the exact frame on which a behavior begins is subject to interpretation. The exact details of the scoring function are shown in Algorithm 1.
Algorithm 1 Score an automatically detected behavior labeling x_{1:T} against the groundtruth y_{1:T}.

Input: Proposed labeling x_{1:T} and groundtruth labeling y_{1:T}.
Output: Measure of the number of errors in the labeling, cost.

  cost ← 0
  for all maximal mismatched intervals t_1, t_2 with y_{t_1:t_2} ≠ x_{t_1:t_2}, x_{t_1−1} = y_{t_1−1}, and x_{t_2+1} = y_{t_2+1} do
      length ← t_2 − t_1 + 1
      if length > 5 then
          cost ← cost + 1
      end if
      cost ← cost + length/5
  end for
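For concreteness, here is a direct transcription of Algorithm 1 into Python; it assumes the labelings are boolean per-frame arrays, and the function name is ours.

import numpy as np

def segmentation_cost(x, y, tol=5):
    """Score a proposed behavior labeling x against groundtruth y (Algorithm 1).

    x, y : boolean arrays of length T (frame t is labeled as the behavior or not)
    tol  : mismatched runs longer than this count as a full detection error
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    cost = 0.0
    t, T = 0, len(x)
    while t < T:
        if x[t] != y[t]:
            # find the maximal run of mismatched frames starting at t
            t2 = t
            while t2 + 1 < T and x[t2 + 1] != y[t2 + 1]:
                t2 += 1
            length = t2 - t + 1
            if length > tol:
                cost += 1.0        # counts as a spurious or missed detection
            cost += length / tol   # 0.2 per mislabeled frame when tol = 5
            t = t2 + 1
        else:
            t += 1
    return cost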
Overall, we observed that the segmentations produced by the learned classifiers match our intuitive definitions of the behaviors. In Supplementary Video 5, we show examples of behaviors that were labeled manually, and of behaviors that were detected automatically.
1.1 Cross-Entropy Method

To find the behavior definition parameters that minimize the cost criterion described above, we use the cross-entropy method [1]. The cross-entropy method is initialized with hyperparameters that describe the prior distribution on the parameters (see Supplementary Note Sec. 1.2). It then generates a set of N joint samples of all parameters according to this prior (we set N = 100). The cost criterion is evaluated for each sample, and the M 'elite' samples with the best scores (lowest costs) are chosen (we set M = .25N = 25). The hyperparameters defining the prior are reset to those which maximize the likelihood of these elite samples. These steps are iterated until there is no decrease in the cost of the highest-cost elite sample over a span of d iterations (we use d = 5).
1.2 Cross-Entropy Initialization

For all parameters, we assume independent Gaussian prior distributions. To set the initial means and variances of these prior distributions, for each positive training sequence, the system computes the most aggressive parameter values that would result in the sequence being classified as positive. The prior distribution mean is initialized to the average of the largest and smallest aggressive parameter values over all training examples. The prior standard deviation is initialized to one quarter of the difference between the largest and smallest aggressive parameter values.
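The following sketch illustrates one plausible implementation of this optimization loop under the assumptions above (independent Gaussian priors, N = 100, M = .25N, stall limit d = 5). The cost function would aggregate the Algorithm 1 score over all training trajectories; all names are ours, not Ctrax's.

import numpy as np

def cross_entropy_search(cost_fn, mu0, sigma0, n=100, m=25, d=5, max_iter=200):
    """Minimize cost_fn over a real-valued parameter vector (e.g. range
    endpoints) via the cross-entropy method.

    mu0, sigma0 : initial mean/std of the independent Gaussian prior per parameter
    """
    mu, sigma = np.array(mu0, float), np.array(sigma0, float)
    best_worst_elite, stall = np.inf, 0
    for _ in range(max_iter):
        # sample N candidate parameter vectors from the current prior
        samples = mu + sigma * np.random.randn(n, len(mu))
        costs = np.array([cost_fn(s) for s in samples])
        elite = samples[np.argsort(costs)[:m]]   # M lowest-cost samples
        worst_elite = np.sort(costs)[m - 1]
        # refit the prior by maximum likelihood on the elite samples
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-12
        # stop when the worst elite cost has not improved for d iterations
        if worst_elite < best_worst_elite:
            best_worst_elite, stall = worst_elite, 0
        else:
            stall += 1
            if stall >= d:
                break
    return mu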
2 Flies
(Details related to Flies section of Methods)

Unless noted, all experiments were carried out on adult Drosophila melanogaster selected from a laboratory population that was derived from 200 wild-caught isofemale lines. Flies were maintained on a 16:8 light:dark cycle, and all experiments were conducted during the evening activity peak. Approximately 24 hours prior to each experimental trial, we collected between 20 and 50 two-day-old flies from culture bottles and anesthetized them using a cold plate cooled to 2 °C. While they were anesthetized, we clipped both wings of each fly to approximately half their normal length so that the flies could not fly out of the arena. After cold anesthetization, the flies were allowed to recover overnight in food vials and, 6 hours prior to experiments, were transferred to vials with damp paper for wet starvation.
3 Apparatus
(Details related to Apparatus section of Methods)

The walking arena is depicted in Figure 1 of the main paper. The temperature of the 6-mm-thick, 245-mm-diameter aluminum platform was actively controlled by an array of four circular thermoelectric modules (TE Technologies) bolted to the underside. The distal (heating) side of each module was attached to a heat exchanger through which water was actively circulated and cooled by a radiator. The thermoelectric modules were driven in parallel to a set point of 25 °C by a PID controller (FerroTec) with feedback from a thermistor also mounted to the bottom of the platform. The surface temperature varied by less than 1 °C, as measured by an infrared non-contact thermometer (OmegaScope). The surface of the platform was lightly sandblasted and painted with ultra-flat black spray paint (Krylon), which is minimally reflective in the near infrared. The surface was cleaned with detergent and distilled water between trials. The thermal barrier consisted of a 0.2-mm-thick strip of galvanized steel wrapped around the platform, flush with the top surface. A rope heater (OmegaLux) was wrapped tightly around the steel strip and powered by a variable AC transformer (Staco), which heated the barrier in open loop. The thermal barrier was separated from the platform by a 2-mm-thick strip of neoprene so that the edge of the arena itself was not heated. To provide a visual stimulus, we placed a paper cylinder around the platform. The paper was printed with a random pattern of squares with 50% filling probability; each square subtended 5° of azimuth as seen from the center of the arena. The cylinder was backlit by an array of eight halogen lights, powered by DC current to avoid visual stimulus flicker, creating a luminance of 450 lux at the center of the arena floor. The arena was illuminated with near-infrared LEDs (Dale Wheat) mounted in the same horizontal plane as the camera above the platform. We used a 1280×1024-pixel FireWire camera (Basler A622f) equipped with an 8-mm lens (PENTAX). This imaging setup gave us a resolution of 4 px/mm, i.e., flies were 8 pixels long. An infrared-pass filter was placed in front of the camera to block stray light from the arena. Images were recorded onto hard disk at 20 frames per second by a computer using the Motmot Python camera interface package [2]. Each group of experimental flies was allowed to walk into the arena from a vial through a hole in the floor that was plugged once all the flies had entered (the larger groups used for groundtruthing were simply dumped from a vial into the center of the arena and allowed to settle for 1 minute).
While our software was developed in conjunction with this setup, it is adaptable to other arrangements with similar characteristics. In particular, we have successfully tracked flies in another arena that does not require wing clipping, as shown in Supplementary Videos 6 and 7. In this alternate setup, we use a Sigma-Coat-coated glass ceiling to keep the flies from walking or flying out of the arena. The floor of the translucent arena is illuminated with IR light to create a backlit image of the flies.
4 Tracking Algorithm Details
(Details related to Tracking Algorithm section of Methods)

In this section, we provide details of our tracking algorithm. First, we describe the preprocessing steps, in which the tracker learns what the arena image looks like without flies in it (Supplementary Note Sec. 4.1), what shapes a fly can take (Supplementary Note Sec. 4.2), and what regions of the image flies are in (Supplementary Note Sec. 4.3). Then, we provide details of how the positions of flies in the current video frame are estimated (Supplementary Note Sec. 4.4). Next, we describe how the observed fly positions are assigned identities by matching them with the positions predicted from the previous frames (Supplementary Note Sec. 4.5). We then describe how tracks are modified in hindsight so that track births and deaths are avoided (Supplementary Note Sec. 4.6). Finally, we describe the post-processing step that resolves the head-tail orientation ambiguity (Supplementary Note Sec. 4.7).
4.1 Background Modeling
(Details related to Detection section of Methods)

We model the background pixel intensities at each location in the image as independent Gaussian distributions. Instead of fitting each Gaussian using the maximum-likelihood estimates (the sample mean and standard deviation), we use the more robust median and median absolute deviation. That is, the tracker estimates the center of the Gaussian μ(p) at a given pixel location p as the median pixel intensity of the sampled frames at that location:

    μ(p) = median_{t ∈ {0, ∆, 2∆, ..., T}} I_t(p),

where I_t is the video frame at time t, T is the number of frames in the video, and ∆ is the interval between sampled frames (we chose ∆ = ⌊T/200⌋ in all our experiments). For each sampled frame, the tracker also computes the absolute difference between the observed pixel value I_t(p) and the median μ(p). The tracker estimates the standard deviation from the median of these absolute differences:

    σ(p) = c · median_{t ∈ {0, ∆, 2∆, ..., T}} |I_t(p) − μ(p)|,

where the constant c ensures that the correct fraction of the data is within one standard deviation: c = 1/(√2 erf⁻¹(1/2)) ≈ 1.4826.
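A minimal numpy sketch of this robust background model, assuming the sampled frames fit in memory; the names are ours.

import numpy as np

def fit_background(frames):
    """Fit the robust per-pixel Gaussian background model.

    frames : array of shape (num_sampled_frames, height, width), grayscale
    Returns the per-pixel center mu and spread sigma.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # robust center: per-pixel median over the sampled frames
    mu = np.median(frames, axis=0)
    # robust spread: scaled median absolute deviation; the constant makes
    # the MAD consistent with a Gaussian standard deviation
    c = 1.4826
    sigma = c * np.median(np.abs(frames - mu), axis=0)
    return mu, sigma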
4.2 Shape Modeling
(Details related to Detection section of Methods)

As discussed in detail in Supplementary Note Sec. 4.4.3, the tracker decides whether to merge, split, or delete connected components based on the expected image area of a fly. The shape parameters are computed automatically. To compute the female shape parameters, the tracker detected all connected components in 50 frames evenly spaced through each all-female video and computed the mean and standard deviation of their areas. The maximum and minimum area bounds were set to the mean plus and minus three standard deviations. The same computation was performed for the male parameters using the all-male videos. The mixed-arena parameters were set as the extrema of the single-sex parameters: since females are larger than males, the minimum area was set to the minimum area for males and the maximum area was set to the maximum area for females. In all our experiments, we used the same fly area parameters. The minimum area was 16.77 px² = 1.0488 mm² for all-male arenas, 21.445 px² = 1.3155 mm² for all-female arenas, and 16.78 px² = 1.0488 mm² for mixed arenas. The maximum area was 63.19 px² = 3.9494 mm² for all-male arenas, 76.945 px² = 4.8091 mm² for all-female arenas, and 76.945 px² = 4.8091 mm² for mixed arenas. The mean area was the average of these extreme values.

Currently, we only use the fly's area to model its shape, but in future work we plan to use the lengths of the major and minor axes and the eccentricities of the ellipses as well.
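A sketch of the automatic area-bound computation described above, assuming foreground masks for the sampled frames are available; scipy.ndimage is our choice of library, not necessarily Ctrax's.

import numpy as np
from scipy import ndimage

def area_bounds(foreground_masks):
    """Estimate min/max fly area from connected components in sampled frames.

    foreground_masks : iterable of boolean images (True = foreground)
    """
    areas = []
    for mask in foreground_masks:
        labeled, n = ndimage.label(mask)                    # connected components
        if n > 0:
            areas.extend(np.bincount(labeled.ravel())[1:])  # pixels per component
    areas = np.asarray(areas, dtype=float)
    mean, std = areas.mean(), areas.std()
    return mean - 3.0 * std, mean + 3.0 * std               # mean ± 3 std bounds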
4.3 Arena Detection
(Details related to Detection section of Methods)
The tracker automatically detects the circular arena floor by fitting a large circle to edges in the background image using the Hough circle
transform [3]. All pixels outside of the arena floor are labeled as background. This step is necessary because the wall of the arena is extremely
reflective. As the flies are restricted from walking on the wall by the heat barrier, most foreground pixels on the wall are due to reflections of
flies on the arena floor. In addition, in future experiments we plan to use dynamic LED panels to affect the flies’ behavior, as in [4]. Thus, we
would like to ignore all foreground detections not on the floor of the arena. In future work, we plan to extend the tracker interface to allow
arena shapes other than the circle.
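As an illustration, arena detection might look like the following sketch using OpenCV's Hough circle transform; the blur kernel, accumulator parameters, and radius bounds are assumptions for illustration, not the values used by the tracker.

import cv2
import numpy as np

def detect_arena(background, min_radius, max_radius):
    """Find the circular arena floor in the background image.

    background : grayscale background image
    Returns (cx, cy, r) of the strongest large circle, or None.
    """
    img = cv2.medianBlur(background.astype(np.uint8), 5)  # suppress noise
    circles = cv2.HoughCircles(
        img, cv2.HOUGH_GRADIENT, dp=2, minDist=img.shape[0],
        param1=100, param2=50, minRadius=min_radius, maxRadius=max_radius)
    if circles is None:
        return None
    cx, cy, r = circles[0, 0]
    return cx, cy, r

# To mask out everything outside the detected floor:
# mask = np.zeros(background.shape, np.uint8)
# cv2.circle(mask, (int(cx), int(cy)), int(r), 255, thickness=-1)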
4.4 Observation Detection
(Details related to Detection section of Methods)
Here, we give an overview of the steps involved in estimating the positions of flies in the current frame. To understand the choices made, it is helpful to view observation detection in terms of probabilistic modeling and inference. The general goal in detection is to find the positions of the flies that best explain the current video frame. More formally, we would like to find the positions of the flies of maximum density given the current video frame. Let x_i = (x, y, θ, a, b)ᵀ be the ellipse position for the i-th fly detected (the x- and y-coordinates of the centroid, the orientation, the semi-major axis length, and the semi-minor axis length), let X = {x_1, ..., x_N} be a set of N fly positions, and let I be the current video frame. We would like to find the X that maximizes

    p(X | I) ∝ p(X) p(I | X).
We assume that the prior density on fly positions is independent for each fly:

    p(X) = ∏_{i=1}^{N} p(x_i).

Our prior on the position of a single fly, p(x_i), is the model of the fly's area:

    p(x_i) ∝ exp[ −| π a_i b_i − μ_area | / σ_area ].
To compute the likelihood p(I | X) of the video frame I given the fly positions X, we construct the foreground/background labels predicted at each pixel location given X. The predicted label L_ij(X) at pixel location (i, j) is foreground if the location is part of a fly, that is, if it is inside the ellipse of some fly in X, and background otherwise. We assume that the intensity of each pixel in the video frame is independent given its label; thus the likelihood factors as

    p(I | X) = ∏_{(i,j)} p(I_ij | L_ij(X)),

where p(I_ij | background) is the Gaussian background model described in Supplementary Note Sec. 4.1 and p(I_ij | foreground) is uniform.
Finding the optimal fly positions is difficult because p(X | I) has many local maxima and is non-smooth, and X has a discrete component (the number of flies). We therefore use a sequence of heuristics to optimize the criterion efficiently. First, each pixel is classified as foreground or background (Supplementary Note Sec. 4.4.1). Each connected component of foreground pixels usually corresponds to exactly one fly; thus, initially one ellipse is fit to each of these connected components (Supplementary Note Sec. 4.4.2). To identify connected components that do not correspond to exactly one fly, the tracker finds ellipse fits that have a low density under the prior p(x_i). For these connected components, the number of ellipses fit to the connected component is iteratively increased or decreased to optimize the criterion (Supplementary Note Sec. 4.4.3).
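To make the objective concrete, here is a sketch of the (unnormalized) log-posterior log p(X | I) for a candidate set of ellipses under the model above. It scores every pixel densely for clarity, whereas a practical implementation would restrict attention to pixels near the flies; all names are ours.

import numpy as np

def log_posterior(ellipses, frame, mu_bg, sigma_bg, mu_area, sigma_area,
                  fg_range=255.0):
    """Unnormalized log p(X|I) = log p(X) + log p(I|X) for candidate ellipses.

    ellipses        : list of (x, y, theta, a, b) in pixel units
    frame           : grayscale image I
    mu_bg, sigma_bg : per-pixel background model from Sec. 4.1
    """
    h, w = frame.shape
    jj, ii = np.meshgrid(np.arange(w), np.arange(h))
    fg = np.zeros((h, w), dtype=bool)
    log_prior = 0.0
    for (x, y, theta, a, b) in ellipses:
        # area prior: penalty on deviation from the expected fly area
        log_prior += -abs(np.pi * a * b - mu_area) / sigma_area
        # mark pixels inside this ellipse as predicted foreground
        c, s = np.cos(theta), np.sin(theta)
        u = ((jj - x) * c + (ii - y) * s) / a
        v = (-(jj - x) * s + (ii - y) * c) / b
        fg |= (u * u + v * v) <= 1.0
    # background pixels: Gaussian log-likelihood under the background model
    z = (frame - mu_bg) / sigma_bg
    log_bg = -0.5 * z ** 2 - np.log(sigma_bg * np.sqrt(2.0 * np.pi))
    # foreground pixels: uniform over the intensity range
    log_fg = -np.log(fg_range)
    return log_prior + np.where(fg, log_fg, log_bg).sum()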
4.4.1 Background Subtraction

Classifying a pixel location as foreground (belonging to a fly) or background (not belonging to any fly) is referred to as background subtraction [5]. To perform this classification, the tracker thresholds the likelihood of the observed pixel intensities in the current frame under the background model described in Supplementary Note Sec. 4.1. Pixels with high likelihood are labeled background and pixels with low likelihood are labeled foreground. As we use a Gaussian model, this is equivalent to thresholding the absolute difference from the mean, normalized by the standard deviation:

    l(p) = foreground if |I(p) − μ(p)| / σ(p) > thresh, background otherwise.

The tracking software allows us to specify the special case that flies are always brighter than the background (or vice versa); in this case, the signed difference (I(p) − μ(p)) / σ(p) is thresholded.

To improve robustness to noise when thresholding foreground from background pixels, we use a hysteresis approach. The tracker first thresholds the difference at a low threshold, thresh_small. If no pixel in a connected component has a difference larger than a larger threshold, thresh_large, then all labels in this connected component are flipped to background. In all our experiments, we set thresh_small = 10 and thresh_large = 20.
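A sketch of this hysteresis rule, using scipy.ndimage for connected components (our choice of library; names are ours):

import numpy as np
from scipy import ndimage

def hysteresis_foreground(frame, mu, sigma, thresh_small=10.0, thresh_large=20.0):
    """Label pixels as foreground with hysteresis on the normalized difference."""
    d = np.abs(frame - mu) / sigma
    fg = d > thresh_small                  # permissive initial labeling
    labeled, n = ndimage.label(fg)
    # keep only components containing at least one strong (> thresh_large) pixel
    strong = np.unique(labeled[d > thresh_large])
    return np.isin(labeled, strong[strong > 0])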
4.4.2 Ellipse Fitting

To fit an ellipse to a connected component of foreground pixels, the tracker fits a 2-D Gaussian to the locations of the foreground pixels in the connected component. Given the parameters of the best-fitting Gaussian, the parameters of the ellipse can be computed. Consider all the pixel locations within an ellipse with semi-major axis length a, semi-minor axis length b, and orientation θ. The sample mean of the pixel locations will be close to the ellipse center. The sample covariance of the pixel locations within this ellipse will be close to

    Σ = Rᵀ diag((a/2)², (b/2)²) R,
where R is the rotation matrix

    R = (  cos θ   sin θ
          −sin θ   cos θ ).
Thus, given the covariance matrix Σ, the semi-major and semi-minor axis lengths and the orientation can be computed by finding the eigendecomposition Σ = UᵀDU. The axis lengths are twice the square roots of the eigenvalues, and the orientation can be computed as the arctangent of two of the entries of U:

    a = 2 √(D_11),  b = 2 √(D_22),  θ = atan2(U_12, U_11).
Instead of just computing the sample mean and covariance of all the pixels in the connected component, the tracker computes a weighted mean and covariance, where the weight of a pixel is proportional to its distance from the background model. Let {p_1, ..., p_n} be the locations of the pixels in a given connected component, and let the weight of pixel i be the normalized distance of the pixel intensity from the background image:

    w_i = |I(p_i) − μ(p_i)| / σ(p_i).

Then the sample mean and covariance are computed as

    Z = ∑_i w_i,
    μ = (1/Z) ∑_i w_i p_i,
    Σ = (1/Z) ∑_i w_i (p_i − μ)(p_i − μ)ᵀ.
Using the weighted mean and covariance improves our accuracy in two ways. First, it improves robustness to imperfect threshold parameters and allows us to use a single threshold throughout the image, despite lighting differences in different regions. In dimmer parts of the arena, a low threshold is needed; in brighter parts, using this low threshold results in pixels at the edge of the fly being classified as foreground. These pixels are actually background, but contain some foreground light from blurring effects. While they are classified as foreground, they receive relatively small weight in the estimate of the ellipse. Second, using the weighted mean and covariance improves the subpixel accuracy of our estimates.
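A sketch of the weighted-moments ellipse fit described above; the names are ours.

import numpy as np

def fit_ellipse_weighted(points, weights):
    """Fit ellipse parameters from weighted moments of pixel locations.

    points  : (n, 2) array of (x, y) pixel locations in one connected component
    weights : (n,) normalized distances from the background model
    Returns (center, a, b, theta) with a >= b and theta in radians.
    """
    w = weights / weights.sum()
    mu = w @ points                      # weighted mean = ellipse center
    d = points - mu
    cov = (w[:, None] * d).T @ d         # weighted covariance
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    # axis lengths are twice the square roots of the eigenvalues
    b, a = 2.0 * np.sqrt(evals)
    major = evecs[:, 1]                  # eigenvector of the largest eigenvalue
    theta = np.arctan2(major[1], major[0])
    return mu, a, b, theta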
4.4.3 Splitting and Merging Connected Components

If the prior density p(x_i) of the ellipse x_i fit to a given connected component is small, then this connected component may actually correspond to multiple flies, to part of a fly, or to a spurious detection. Recall that the prior density is currently based only on the area of the ellipse. If the prior is small because the area of the ellipse is too large, then the tracker determines whether fitting multiple flies to the connected component will increase the score p(X | I). If the prior density is small because the area of the ellipse is too small, then the tracker determines whether any of the following will increase the score: (1) lowering the foreground threshold to increase the area, (2) merging the ellipse with nearby ellipses, or (3) deleting the ellipse.
Splitting Large Components. Because of image blur and the legs of the flies, the pixels surrounding the fly will actually be a blur of foreground and background. Often, these pixels will be classified as foreground but have a higher background likelihood than pixels in the interior of the fly. We observed that relabeling these pixels as background often resulted in large connected components consisting of multiple flies being split into multiple connected components. Thus, the tracker first determines whether raising the foreground threshold for the connected component results in multiple, reasonably sized components. (The tracker iteratively tries increasing the threshold from thresh_small toward the maximum normalized distance in the connected component for 20 iterations, stopping if it has successfully split the large connected component.) If this fails, the Expectation-Maximization (EM) algorithm is used to fit a mixture of K Gaussians to the weighted pixel locations in the large connected component [6]. This approximately finds a fit of K ellipses (that do not overlap too much) to the weighted pixel locations in the large connected component. More specifically, a mixture of K Gaussians is the distribution

    p(p) = ∑_{k=1}^{K} π_k G(p; μ_k, Σ_k),

where π_k is the relative weight of component k, μ_k and Σ_k are the parameters of the k-th Gaussian, and G(p; μ, Σ) is the density of the Gaussian with parameters μ and Σ at pixel location p.

The tracker determines the number of components K to split the large component into based on the shape prior ∏_{k=1}^{K} p(x_k) of the K ellipse fits. We empirically observed that this prior did not usually have multiple maxima; thus the tracker chooses the number of components greedily. It iteratively increases the number of components it splits the large component into, stopping when there is a decrease in the prior.
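A sketch of this greedy choice of K, using scikit-learn's EM implementation of Gaussian mixtures in place of Ctrax's own EM code; approximating sample weights by repeating pixel locations is our simplification, and all names are ours.

import numpy as np
from sklearn.mixture import GaussianMixture

def log_area_prior(cov, mu_area, sigma_area):
    """log p(x_k) for the ellipse implied by one Gaussian component."""
    evals = np.linalg.eigvalsh(cov)
    a, b = 2.0 * np.sqrt(evals[1]), 2.0 * np.sqrt(evals[0])
    return -abs(np.pi * a * b - mu_area) / sigma_area

def split_component(points, weights, mu_area, sigma_area, max_k=5):
    """Greedily choose K and fit K Gaussians to a large connected component."""
    # crude weighting: repeat each pixel location proportionally to its weight
    reps = np.maximum(1, np.round(3 * weights / weights.mean()).astype(int))
    data = np.repeat(points, reps, axis=0)
    best_fit, best_prior = None, -np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full').fit(data)
        prior = sum(log_area_prior(c, mu_area, sigma_area)
                    for c in gmm.covariances_)
        if prior <= best_prior:   # stop at the first decrease in the prior
            break
        best_fit, best_prior = gmm, prior
    return best_fit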
Fixing Small Connected Components. If a connected component has a small area, the tracker first determines whether the ellipse fit is small because the foreground threshold is too high in that region of the image. Note that, in our setup, the ideal threshold varied with location in the image,