nature methods

High-throughput ethomics in large groups of Drosophila

Kristin Branson, Alice A. Robie, John Bender, Pietro Perona & Michael H. Dickinson

Nature Methods: doi:10.1038/nmeth.1328
Supplementary figures and text:

Supplementary Figure 1: Illustration of groundtruthing using high-resolution video.
Supplementary Figure 2: Cumulative distribution of position errors.
Supplementary Figure 3: Extended behavioral vectors.
Supplementary Figure 4: Summary statistics of behavior.
Supplementary Table 1: Identity errors.
Supplementary Table 2: Parameters of each of the eight behaviors.
Supplementary Table 3: Position errors for close flies.
Supplementary Note

Note: Supplementary Software and Supplementary Videos are available on the Nature Methods website.
Supplementary Figure 1: Illustration of groundtruthing using high-resolution video. (a) Example images of the same flies captured by the low- and high-resolution cameras. The left image of each pair is from the low-resolution camera; the right image is from the high-resolution camera. The red box in the low-resolution images indicates the location of the adjacent high-resolution image. The last pair of frames shows a male and female fly near each other. (b) Examples of high-resolution labels and corresponding low-resolution tracks. Each row shows the manually labeled high-resolution image and the positions estimated by the tracker for a different example. The first column shows the true labels plotted on the high-resolution frame. The second column shows the projection of the positions estimated by the tracker onto the high-resolution frame. The third column shows the projection of the true labels onto the low-resolution frame. The last column shows the positions estimated by the tracker on the low-resolution frame. The last row shows touching flies. The red circle and arrow show the bounding box-based center position and orientation. The colored circles indicate the positions of the five parts.
Supplementary Figure 2: Cumulative distribution of position errors. We plot the cumulative distribution of position errors for each of the methods of estimating error. On the left, we plot error in estimating point locations. On the right, we plot error in estimating the orientation of the fly. The line labeled 'Bounding box' corresponds to errors in estimating the center and orientation of the fly by manually drawing a bounding box on the high-resolution images. The next five lines correspond to errors in estimating the positions of the parts manually labeled on the high-resolution images. The 'Walk wobble' line corresponds to estimates of the center and orientation error computed by smoothing the velocity and orientation during walks. The 'Stop wobble' line corresponds to estimates of the center and orientation error computed from the variance in position of a stopped fly. On the x-axes, the colored ticks mark the median values.
Supplementary Figure 3: Extended behavioral vectors for each male, female, and fru¹/fru¹ fly (see Fig. 3e of the main text for details). In the top image, the color indicates the (normalized) fraction of time each fly performs each behavior. In the bottom image, the color indicates the (normalized) mean duration of bouts of each behavior for each fly. Mean duration cannot be computed for flies that do not perform a given behavior; this is indicated by black.
Supplementary Figure 4: Summary statistics of behavior. For each fly and behavior, we computed (a) the frequency of onsets of the behavior (Fig. 3d-f), (b) the fraction of time the fly performed the behavior (Supplementary Fig. 3, top), and (c) the mean duration of sequences of the behavior (Supplementary Fig. 3, bottom). The mean and standard deviation of these statistics over the entire fly population are shown in the top row (black). In the bottom row (colored), we show the normalized (z-scored) mean and standard error for each of the five fly and trial types (male/wild type/single-sex, ..., female/wild type/mixed-sex, fru¹/fru¹). These plots show that many of these behavioral statistics are significantly different for different fly types.
Supplementary Table 1: Identity errors. Number of each type of identity error for each video evaluated. Each row corresponds to a different video. The first two columns show the number of female and male flies in the video. The third column shows the number of frames in the video. The fourth column shows the number of times a pair of flies was close (note that this number can be greater than the number of frames in the video, as multiple pairs of flies may be close in a single frame). The next five columns show the number of errors of each type: Swap identity, Lost fly, Spurious track, Merged fly, and Split fly.
Fly Sex     N. Frames           N. Errors                             Error Freq
♀*    ♂    Total    Close     Swap  Lost  Spurious  Merged  Split    Swaps/Close   Errors/Fly/Frame
10     0    6364       42        0     0      0        0      0       0             0
10     0    6118       67        1     0      0        0      0       1.5e-2        1.9e-05
20     0    6221      261        0     1      0        0      0       0             9.4e-06
20     0    6206      358        0     1      0        0      0       0             8.5e-06
50     0    6123     2135        0     2      0        0      0       0             7.3e-06
50     0    6587     4545        1     0      0        0      0       2.2e-4        7.1e-06
 0    10    6106      268        0     0      0        0      0       0             0
 0    10    6133      316        0     0      0        0      0       0             0
 0    20    6169     1237        0     1      0        0      0       0             8.1e-06
 0    20    6118     1029        0     1      0        0      0       0             8.8e-06
 0    50    8696     9765        4     0      1        0      0       4.1e-4        1.2e-05
 0    50    6088     4855        7     1      0        0      0       1.4e-3        3.0e-05
 5     5    6268      499        0     0      0        0      0       0             0
 5     5    7859      753        0     0      0        0      0       0             0
10    10    6121     2050        1     1      0        0      0       4.9e-3        1.9e-05
10    10    6104     1460        0     1      0        0      0       0             8.9e-06
25    25    6220     9843        1     3      0        0      0       1.0e-4        1.4e-05
25    25    6536     9902        1     1      1        0      0       1.0e-4        9.6e-06
* Column key: ♀: number of female flies. ♂: number of male flies. Total: total number of frames in the video. Close: number of (frame, fly-pair) combinations in which the two flies are close to each other. Swap: swap identity errors. Lost: lost fly errors. Spurious: spurious track errors. Merged: merged fly errors. Split: split track errors. Swaps/Close: ratio of the number of identity swaps to the number of (frame, fly-pair) combinations in which the flies are close.
Supplementary Table 2: Parameters for each of the eight behaviors. Each row for a behavior corresponds to a different property that is computed at every time instant. The 'Property' column gives this property an abbreviated name; the exact definitions are described in the key below the table. The four parameter-range columns show the ranges the property is allowed to take for each of the four conditions enumerated in the Methods section. The 'Min Length' column shows the minimum length of the sequence. The 'Radius' column shows the definition of 'near' for condition (2): frame t_1 is near frame t_2 if t_1 is within Radius seconds of t_2.
Ranges for the four conditions — (1) Per-frame, (2) Near frame, (3) Sum, (4) Mean — are listed in order; cells that were blank in the original table are omitted.

walking (Min Length .25 s; Radius .2 s)
    speed (mm/s):          [9.9, ∞); [11.9, ∞); [2.4, ∞); [10.6, ∞)
    |θ̇| (°/.05 s):         [0, 31.3]; [0, 19.8]; [0, 5.2]
    acc (mm/(.05 s)²):     [0, 2.0]; [0, 0.8]; [0, 1.1]
    |ṡmth(θ)| (°/.05 s):   [0, 12.9]; [0, 10.5]

stopped (Min Length .35 s; Radius .1 s)
    speed (mm/s):          [0, 4.8]; [0, 1.7]; [0, 2.6]; [0, 3.1]
    |θ̇| (°/.05 s):         [0, 5.7]; [0, 2.3]; [0, 9.7]; [0, 3.1]
    acc (mm/(.05 s)²):     [0, 1.00]; [0, 0.80]; [0, 0.88]

sharp turn (Min Length .35 s; Radius .1 s)
    |θ̇| (°/.05 s):         [4.0, ∞); [6.6, ∞); [27.6, ∞); [9.3, ∞)
    speed (mm/s):          [0, 27.2]; [0, 9.2]; [0, 4.5]; [0, 13.6]
    acc (mm/(.05 s)²):     [0, 1.4]; [0, 1.2]; [0, 1.2]

crabwalk (Min Length .5 s; Radius .1 s)
    |φ − θ| (°/.05 s):     [7.1, ∞); [26.8, ∞); [99.7, ∞); [18.9, ∞)
    |θ̇| (°/.05 s):         [0, 21.6]; [0, 3.9]; [0, 151.8]; [0, 7.7]
    |ṡmth(θ)| (°/.05 s):   [0, 16.9]; [0, 12.6]; [0, 6.7]
    ⊥ tail vel (mm/s):     [1.54, ∞); [0.54, ∞)

backing up (Min Length .15 s; Radius .15 s)
    fwd vel (mm/s):        (−∞, 0]; (−∞, −0.50]; (−∞, .08]; (−∞, 0.50]

chasing (Min Length .55 s; Radius .55 s)
    min speed (mm/s):      [9.9, ∞); [8.5, ∞)
    vel twd (mm/s):        [−18.6, ∞); [−6.5, ∞); [−5.0, ∞); [3.0, ∞)
    max speed (mm/s):      [3.3, ∞); [14.3, ∞)
    dist btn (mm):         [0, 26.1]; [0, 12.1]; [0, 7.8]
    speed apt (mm/s):      [0, 41.2]; [0, 37.7]; [0, 14.7]
    angle btn (°):         [0, 79.0]; [0, 62.9]; [0, 61.7]
    θ diff (°):            [0, 180.5]; [0, 120.3]; [0, 98.4]
    φ diff (°):            [0, 173.7]; [0, 108.2]; [0, 40.9]

jumping (Min Length 0 s; Radius .05 s)
    speed (mm/s):          [48.1, ∞)

touching (Min Length .25 s; Radius .05 s)
    dist btn (mm):         [0, 2.5]
    ang btn (°):           [0, 0]
Property key:
• speed: Magnitude of the vector between the fly's centroids in frames t−1 and t, normalized by the frame rate. The integral of speed (the 'Sum' condition (3)) is the total distance traveled during the segment, measured in mm.
• |θ̇|: Absolute value of the change in orientation from frame t−1 to t, normalized by the frame rate. The integral of |θ̇| is the absolute value of the total change in orientation during the segment, measured in degrees.
• acc: Absolute change in velocity between frames t−1 and t+1, measured in mm per frame² = (0.05 s)².
• |ṡmth(θ)|: Absolute change in the smoothed orientation from frame t−1 to t, normalized by the frame rate = 0.05 s/frame.
• |φ − θ|: Absolute difference between the orientation at frame t and the velocity direction from frame t to t+1.
• ⊥ tail vel: Change in position of the fly's tail from frame t to t+1, projected onto the direction orthogonal to the fly's orientation in frame t.
• fwd vel: Dot product of the orientation vector at frame t and the vector connecting the centroids in frames t and t+1, normalized by the frame rate.
• min speed: Minimum speed of either fly involved in the chase.
• vel twd: Dot product of the direction of the chased fly from the chasing fly and the vector connecting the chasing fly's centroids in frames t and t+1, normalized by the frame rate.
• max speed: Maximum speed of either fly involved in the chase.
• dist btn: Distance between the head of the chasing fly and the nearest point on the boundary ellipse of the chased fly.
• speed apt: Magnitude of the difference between the velocity vectors of the two flies from frames t−1 to t, normalized by the frame rate.
• angle btn: Absolute difference between the angle from the chasing fly to the chased fly and the chasing fly's orientation.
• θ diff: Absolute difference between the flies' orientations.
• φ diff: Absolute difference between the flies' velocity directions.
• ang btn: Minimum absolute angle between the touching fly's orientation and the direction to any point on the other fly.
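To make a few of these per-frame property definitions concrete, here is a minimal numpy sketch of speed, |θ̇|, and fwd vel; the function and array names are ours, not Ctrax's, and trajectories are assumed to be sampled at the 20 frames per second used in our experiments.

import numpy as np

DT = 0.05  # seconds per frame (20 fps; see the Apparatus section)

def per_frame_properties(x, y, theta):
    """Compute a few of the per-frame properties of Supplementary Table 2.

    x, y  : centroid coordinates in mm, one entry per frame
    theta : orientation in degrees, one entry per frame
    """
    dx, dy = np.diff(x), np.diff(y)
    # speed: centroid displacement between consecutive frames, per second
    speed = np.hypot(dx, dy) / DT
    # |dtheta|: absolute per-frame change in orientation, wrapped to [0, 180)
    dtheta = np.abs((np.diff(theta) + 180.0) % 360.0 - 180.0)
    # fwd vel: displacement projected onto the orientation vector, per second
    th = np.deg2rad(theta[:-1])
    fwd_vel = (np.cos(th) * dx + np.sin(th) * dy) / DT
    return speed, dtheta, fwd_vel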
Supplementary Table 3: Position errors for close flies.

Type            Median      Std
Center          0.046 mm    0.090 mm
Orientation     2.8°        2.0°
Left Antenna    0.18 mm     0.09 mm
Right Antenna   0.19 mm     0.09 mm
Left Leg        0.21 mm     0.09 mm
Right Leg       0.21 mm     0.09 mm
Scutellum       0.14 mm     0.10 mm
Each row of this table corresponds to a different error measure computed from the high-resolution groundtruth samples in which another fly was near the labeled fly. The Type column describes the measure, the Median column lists the median error in the estimate, and the Std column lists the standard deviation of the error of the estimate. The first two rows show the error for the center position and orientation estimates, computed from the bounding box labels. The last five rows show the error estimates for the labeled parts.
Supplementary Note

1 Behavior Definitions
(Details related to Behavior Definitions section of Methods)

In this section, we describe how we chose the quantitative definitions of the behaviors analyzed in the Results section. We also provide a table with the properties and precise thresholds we used (Supplementary Table 2). We provide code for automatically detecting these behaviors online (http://dickinson.caltech.edu/ctrax).
Six of our behaviors (walk, stop, sharp turn, crabwalk, backup, and jump) involve basic locomotor actions. The majority of the time, the flies either walked at a relatively constant velocity or stopped in place. The next-most common behavior was the 'sharp turn', in which a fly made a large and rapid change in orientation. Sharp turns usually occurred in response to the arena wall or another fly. Most commonly, the fly pivoted around its hind end, though occasionally it pivoted around its center or front end. These front-end pivots primarily occurred during 'encounters' with another fly (see below). Other locomotor classifications included 'crabwalks', in which the fly walked with a substantial sideways component, and 'backups', in which the fly's translational velocity was negative. Both crabwalks and backups were often exhibited as a result of encounters with other flies. Jumps consisted of rapid translations within the arena, often as a consequence of encounters with other flies. Our two remaining behaviors (touch and chase) related to social interactions between flies. A touch occurred when the head of one fly came into contact with another fly. Chases were cases in which one fly (always a male) followed another across the arena.
For six of the eight behaviors (walking, stopping, making sharp turns, jumping, crabwalking, and chasing), it was time-consuming and non-intuitive to manually search the parameter space for the precise set of parameters that best fit our intuition about the behavior definitions. Thus, we chose the parameters by manually labeling a number of trajectories. These labeled trajectories are the training examples used to learn the parameter intervals. The advantage of this learning-based approach is that, to obtain an automatic behavior classifier, a behavior expert can simply supply segmented training trajectories for the behavior of interest; he or she does not need to write any code. We chose the structure of our classifier so that the training examples are converted into 'characteristic' parameter ranges for each behavior. We thus end up with a clear and concise definition of each behavior that is open to inspection and criticism, and that makes experiments easily reproducible.
Given the segmented example trajectories, we then manually chose which properties the classifier would depend on, and whether each property would be bounded from above, below, or both for each of the four conditions described in the Methods section. We used the cross-entropy method [1] to find the ranges that produced segmentations closest to the labeled trajectories. Details of the algorithm are given in Supplementary Note Sec. 1.1. Our cost function for comparing a proposed segmentation to the groundtruth segmentation is

    cost = (# spurious detections) + (# missed detections) + 0.2 (# mislabeled frames).
Spurious detections are detected sequences that do not match any labeled sequence, and missed detections are labeled sequences that do not match any detected sequence. We say that two segments match if the proposed segmentation can be made identical to the groundtruth by relabeling at most 5 frames (= 0.25 s) at the beginning, end, or middle. We made the definition of matching tolerant to small errors because the exact frame on which a behavior begins is subject to interpretation. The exact details of the scoring function are shown in Algorithm 1.
Algorithm 1 Score an automatically detected behavior labeling x_{1:T} against the groundtruth y_{1:T}.

Input: Proposed labeling x_{1:T} and groundtruth labeling y_{1:T}.
Output: Measure of the number of errors in the labeling, cost.

  cost ← 0
  for all maximal mismatched intervals t_1, t_2 with y_{t_1:t_2} ≠ x_{t_1:t_2}, x_{t_1−1} = y_{t_1−1}, and x_{t_2+1} = y_{t_2+1} do
      length ← t_2 − t_1 + 1
      if length > 5 then
          cost ← cost + 1
      end if
      cost ← cost + length/5
  end for
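For concreteness, here is a direct transcription of Algorithm 1 into Python; it assumes the labelings are boolean per-frame arrays, and the function name is ours.

import numpy as np

def segmentation_cost(x, y, tol=5):
    """Score a proposed behavior labeling x against groundtruth y (Algorithm 1).

    x, y : boolean arrays of length T (frame t is labeled as the behavior or not)
    tol  : mismatched runs longer than this count as a full detection error
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    cost = 0.0
    t, T = 0, len(x)
    while t < T:
        if x[t] != y[t]:
            # find the maximal run of mismatched frames starting at t
            t2 = t
            while t2 + 1 < T and x[t2 + 1] != y[t2 + 1]:
                t2 += 1
            length = t2 - t + 1
            if length > tol:
                cost += 1.0        # counts as a spurious or missed detection
            cost += length / tol   # 0.2 per mislabeled frame when tol = 5
            t = t2 + 1
        else:
            t += 1
    return cost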
Overall, we observed that the segmentations produced by the learned classifiers match our intuitive definitions of the behaviors. In Supplementary Video 5, we show examples of behaviors that were labeled manually, and of behaviors that were detected automatically.
1.1 Cross-Entropy Method

To find the behavior definition parameters that minimize the cost criterion described above, we use the cross-entropy method [1]. The cross-entropy method is initialized with hyperparameters that describe the prior distribution on the parameters (see Supplementary Note Sec. 1.2). It then generates a set of N joint samples of all parameters according to this prior (we set N = 100). The cost criterion is evaluated for each sample, and the M 'elite' samples with the best scores (lowest costs) are chosen (we set M = .25N = 25). The hyperparameters defining the prior are reset to those which maximize the likelihood of these elite samples. These steps are iterated until there is no decrease in the cost of the highest-cost elite sample over a span of d iterations (we use d = 5).
1.2 Cross-Entropy Initialization

For all parameters, we assume independent Gaussian prior distributions. To set the initial means and variances of these prior distributions, for each positive training sequence, the system computes the most aggressive parameter values that would result in the sequence being classified as positive. The prior distribution mean is initialized to the average of the largest and smallest aggressive parameter values over all training examples. The prior standard deviation is initialized to one quarter of the difference between the largest and smallest aggressive parameter values.
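The following sketch illustrates one plausible implementation of this optimization loop under the assumptions above (independent Gaussian priors, N = 100, M = .25N, stall limit d = 5). The cost function would aggregate the Algorithm 1 score over all training trajectories; all names are ours, not Ctrax's.

import numpy as np

def cross_entropy_search(cost_fn, mu0, sigma0, n=100, m=25, d=5, max_iter=200):
    """Minimize cost_fn over a real-valued parameter vector (e.g. range
    endpoints) via the cross-entropy method.

    mu0, sigma0 : initial mean/std of the independent Gaussian prior per parameter
    """
    mu, sigma = np.array(mu0, float), np.array(sigma0, float)
    best_worst_elite, stall = np.inf, 0
    for _ in range(max_iter):
        # sample N candidate parameter vectors from the current prior
        samples = mu + sigma * np.random.randn(n, len(mu))
        costs = np.array([cost_fn(s) for s in samples])
        elite = samples[np.argsort(costs)[:m]]   # M lowest-cost samples
        worst_elite = np.sort(costs)[m - 1]
        # refit the prior by maximum likelihood on the elite samples
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-12
        # stop when the worst elite cost has not improved for d iterations
        if worst_elite < best_worst_elite:
            best_worst_elite, stall = worst_elite, 0
        else:
            stall += 1
            if stall >= d:
                break
    return mu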
2 Flies
(Details related to Flies section of Methods)

Unless noted, all experiments were carried out on adult Drosophila melanogaster selected from a laboratory population that was derived from 200 wild-caught isofemale lines. Flies were maintained on a 16:8 light:dark cycle, and all experiments were conducted during the evening activity peak. Approximately 24 hours prior to each experimental trial, we collected between 20 and 50 two-day-old flies from culture bottles and anesthetized them using a cold plate cooled to 2 °C. While they were anesthetized, we clipped both wings of each fly to approximately half their normal length so that the flies could not fly out of the arena. After cold anesthetization, the flies were allowed to recover overnight in food vials and, 6 hours prior to experiments, were transferred to vials with damp paper for wet starvation.
3 Apparatus
(Details related to Apparatus section of Methods)

The walking arena is depicted in Figure 1 of the main paper. The temperature of the 6-mm-thick, 245-mm-diameter aluminum platform was actively controlled by an array of four circular thermoelectric modules (TE Technologies) bolted to the underside. The distal (heating) side of each module was attached to a heat exchanger through which water was actively circulated and cooled by a radiator. The thermoelectric modules were driven in parallel to a set point of 25 °C by a PID controller (FerroTec) with feedback from a thermistor also mounted to the bottom of the platform. The surface temperature varied by less than 1 °C, as measured by an infrared non-contact thermometer (OmegaScope). The surface of the platform was lightly sandblasted and painted with ultra-flat black spray paint (Krylon), which is minimally reflective in the near infrared. The surface was cleaned with detergent and distilled water between trials. The thermal barrier consisted of a 0.2-mm-thick strip of galvanized steel wrapped around the platform, flush with the top surface. A rope heater (OmegaLux) was wrapped tightly around the steel strip and powered by a variable AC transformer (Staco), which heated the barrier in open loop. The thermal barrier was separated from the platform by a 2-mm-thick strip of neoprene so that the edge of the arena itself was not heated. To provide a visual stimulus, we placed a paper cylinder around the platform. The paper was printed with a random pattern of squares with 50% filling probability; each square subtended 5° of azimuth as seen from the center of the arena. The cylinder was backlit by an array of eight halogen lights, powered by DC current to avoid visual stimulus flicker, creating a luminance of 450 lux at the center of the arena floor. The arena was illuminated with near-infrared LEDs (Dale Wheat) mounted in the same horizontal plane as the camera above the platform. We used a 1280×1024-pixel FireWire camera (Basler A622f) equipped with an 8-mm lens (PENTAX). This imaging setup gave us a resolution of 4 px/mm, i.e., flies were 8 pixels long. An infrared-pass filter was placed in front of the camera to block stray light from the arena. Images were recorded onto hard disk at 20 frames per second by a computer using the Motmot Python camera interface package [2]. Each group of experimental flies was allowed to walk into the arena from a vial through a hole in the floor that was plugged once all the flies had entered (the larger groups used for groundtruthing were simply dumped from a vial into the center of the arena and allowed to settle for 1 minute).
While our software was developed in conjunction with this setup, it is adaptable to other arrangements with similar characteristics. In particular, we have successfully tracked flies in another arena that does not require wing clipping, as shown in Supplementary Videos 6 and 7. In this alternate setup, we use a Sigma-Coat-coated glass ceiling to keep the flies from walking or flying out of the arena. The floor of the translucent arena is illuminated with IR light to create a backlit image of the flies.
4 Tracking Algorithm Details
(Details related to Tracking Algorithm section of Methods)

In this section, we provide details of our tracking algorithm. First, we describe the preprocessing steps, in which the tracker learns what the arena image looks like without flies in it (Supplementary Note Sec. 4.1), what shapes a fly can take (Supplementary Note Sec. 4.2), and what regions of the image flies are in (Supplementary Note Sec. 4.3). Then, we provide details of how the positions of flies in the current video frame are estimated (Supplementary Note Sec. 4.4). Next, we describe how the observed fly positions are assigned identities by matching them with the positions predicted from the previous frames (Supplementary Note Sec. 4.5). We then describe how tracks are modified in hindsight so that track births and deaths are avoided (Supplementary Note Sec. 4.6). Finally, we describe the post-processing step that resolves the head-tail orientation ambiguity (Supplementary Note Sec. 4.7).
4.1 Background Modeling
(Details related to Detection section of Methods)

We model the background pixel intensities at each location in the image as independent Gaussian distributions. Instead of fitting each Gaussian using the maximum-likelihood estimates (the sample mean and standard deviation), we use the more robust median and median absolute deviation. That is, the tracker estimates the center of the Gaussian μ(p) at a given pixel location p as the median pixel intensity of the sampled frames at that location:

    μ(p) = median_{t ∈ {0, ∆, 2∆, ..., T}} I_t(p),

where I_t is the video frame at time t, T is the number of frames in the video, and ∆ is the interval between sampled frames (we chose ∆ = ⌊T/200⌋ in all our experiments). For each sampled frame, the tracker also computes the absolute difference between the observed pixel value I_t(p) and the median μ(p). The tracker estimates the standard deviation from the median of these absolute differences:

    σ(p) = c · median_{t ∈ {0, ∆, 2∆, ..., T}} |I_t(p) − μ(p)|,

where the constant c ensures that the correct fraction of the data is within one standard deviation: c = 1/(√2 erf⁻¹(1/2)) ≈ 1.4826.
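A minimal numpy sketch of this robust background model, assuming the sampled frames fit in memory; the names are ours.

import numpy as np

def fit_background(frames):
    """Fit the robust per-pixel Gaussian background model.

    frames : array of shape (num_sampled_frames, height, width), grayscale
    Returns the per-pixel center mu and spread sigma.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # robust center: per-pixel median over the sampled frames
    mu = np.median(frames, axis=0)
    # robust spread: scaled median absolute deviation; the constant makes
    # the MAD consistent with a Gaussian standard deviation
    c = 1.4826
    sigma = c * np.median(np.abs(frames - mu), axis=0)
    return mu, sigma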
4.2 Shape Modeling
(Details related to Detection section of Methods)

As discussed in detail in Supplementary Note Sec. 4.4.3, the tracker decides whether to merge, split, or delete connected components based on the expected image area of a fly. The shape parameters are computed automatically. To compute the female shape parameters, the tracker detected all connected components in 50 frames evenly spaced through each all-female video and computed the mean and standard deviation of their areas. The maximum and minimum area bounds were set to the mean plus and minus three standard deviations. The same computation was performed for the male parameters using the all-male videos. The mixed-arena parameters were set as the extrema of the single-sex parameters: since females are larger than males, the minimum area was set to the minimum area for males and the maximum area was set to the maximum area for females. In all our experiments, we used the same fly area parameters. The minimum area was 16.77 px² = 1.0488 mm² for all-male arenas, 21.445 px² = 1.3155 mm² for all-female arenas, and 16.78 px² = 1.0488 mm² for mixed arenas. The maximum area was 63.19 px² = 3.9494 mm² for all-male arenas, 76.945 px² = 4.8091 mm² for all-female arenas, and 76.945 px² = 4.8091 mm² for mixed arenas. The mean area was the average of these extreme values.

Currently, we only use the fly's area to model its shape, but in future work we plan to use the lengths of the major and minor axes and the eccentricities of the ellipses as well.
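A sketch of the automatic area-bound computation described above, assuming foreground masks for the sampled frames are available; scipy.ndimage is our choice of library, not necessarily Ctrax's.

import numpy as np
from scipy import ndimage

def area_bounds(foreground_masks):
    """Estimate min/max fly area from connected components in sampled frames.

    foreground_masks : iterable of boolean images (True = foreground)
    """
    areas = []
    for mask in foreground_masks:
        labeled, n = ndimage.label(mask)                    # connected components
        if n > 0:
            areas.extend(np.bincount(labeled.ravel())[1:])  # pixels per component
    areas = np.asarray(areas, dtype=float)
    mean, std = areas.mean(), areas.std()
    return mean - 3.0 * std, mean + 3.0 * std               # mean ± 3 std bounds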
4.3 Arena Detection
(Details related to Detection section of Methods)
The tracker automatically detects the circular arena floor by fitting a large circle to edges in the background image using the Hough circle
transform [3]. All pixels outside of the arena floor are labeled as background. This step is necessary because the wall of the arena is extremely
reflective. As the flies are restricted from walking on the wall by the heat barrier, most foreground pixels on the wall are due to reflections of
flies on the arena floor. In addition, in future experiments we plan to use dynamic LED panels to affect the flies’ behavior, as in [4]. Thus, we
would like to ignore all foreground detections not on the floor of the arena. In future work, we plan to extend the tracker interface to allow
arena shapes other than the circle.
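As an illustration, arena detection might look like the following sketch using OpenCV's Hough circle transform; the blur kernel, accumulator parameters, and radius bounds are assumptions for illustration, not the values used by the tracker.

import cv2
import numpy as np

def detect_arena(background, min_radius, max_radius):
    """Find the circular arena floor in the background image.

    background : grayscale background image
    Returns (cx, cy, r) of the strongest large circle, or None.
    """
    img = cv2.medianBlur(background.astype(np.uint8), 5)  # suppress noise
    circles = cv2.HoughCircles(
        img, cv2.HOUGH_GRADIENT, dp=2, minDist=img.shape[0],
        param1=100, param2=50, minRadius=min_radius, maxRadius=max_radius)
    if circles is None:
        return None
    cx, cy, r = circles[0, 0]
    return cx, cy, r

# To mask out everything outside the detected floor:
# mask = np.zeros(background.shape, np.uint8)
# cv2.circle(mask, (int(cx), int(cy)), int(r), 255, thickness=-1)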
4.4 Observation Detection
(Details related to Detection section of Methods)
Here, we give an overview of the steps involved in estimating the positions of flies in the current frame. To understand the choices made, it is helpful to view observation detection in terms of probabilistic modeling and inference. The general goal in detection is to find the positions of the flies that best explain the current video frame. More formally, we would like to find the positions of the flies of maximum density given the current video frame. Let x_i = (x, y, θ, a, b)ᵀ be the ellipse position for the i-th fly detected (the x- and y-coordinates of the centroid, the orientation, the semi-major axis length, and the semi-minor axis length), let X = {x_1, ..., x_N} be a set of N fly positions, and let I be the current video frame. We would like to find the X that maximizes

    p(X | I) ∝ p(X) p(I | X).
We assume that the prior density on fly positions is independent for each fly:

    p(X) = ∏_{i=1}^{N} p(x_i).

Our prior on the position of a single fly, p(x_i), is the model of the fly's area:

    p(x_i) ∝ exp[ −| π a_i b_i − μ_area | / σ_area ].
To compute the likelihood p(I | X) of the video frame I given the fly positions X, we construct the foreground/background labels predicted at each pixel location given X. The predicted label L_ij(X) at pixel location (i, j) is foreground if the location is part of a fly, that is, if it is inside the ellipse of some fly in X, and background otherwise. We assume that the intensity of each pixel in the video frame is independent given its label; thus the likelihood factors as

    p(I | X) = ∏_{(i,j)} p(I_ij | L_ij(X)),

where p(I_ij | background) is the Gaussian background model described in Supplementary Note Sec. 4.1 and p(I_ij | foreground) is uniform.
Finding the optimal fly positions is difficult because p(X | I) has many local maxima and is non-smooth, and X has a discrete component (the number of flies). We therefore use a sequence of heuristics to optimize the criterion efficiently. First, each pixel is classified as foreground or background (Supplementary Note Sec. 4.4.1). Each connected component of foreground pixels usually corresponds to exactly one fly; thus, initially one ellipse is fit to each of these connected components (Supplementary Note Sec. 4.4.2). To identify connected components that do not correspond to exactly one fly, the tracker finds ellipse fits that have a low density under the prior p(x_i). For these connected components, the number of ellipses fit to the connected component is iteratively increased or decreased to optimize the criterion (Supplementary Note Sec. 4.4.3).
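To make the objective concrete, here is a sketch of the (unnormalized) log-posterior log p(X | I) for a candidate set of ellipses under the model above. It scores every pixel densely for clarity, whereas a practical implementation would restrict attention to pixels near the flies; all names are ours.

import numpy as np

def log_posterior(ellipses, frame, mu_bg, sigma_bg, mu_area, sigma_area,
                  fg_range=255.0):
    """Unnormalized log p(X|I) = log p(X) + log p(I|X) for candidate ellipses.

    ellipses        : list of (x, y, theta, a, b) in pixel units
    frame           : grayscale image I
    mu_bg, sigma_bg : per-pixel background model from Sec. 4.1
    """
    h, w = frame.shape
    jj, ii = np.meshgrid(np.arange(w), np.arange(h))
    fg = np.zeros((h, w), dtype=bool)
    log_prior = 0.0
    for (x, y, theta, a, b) in ellipses:
        # area prior: penalty on deviation from the expected fly area
        log_prior += -abs(np.pi * a * b - mu_area) / sigma_area
        # mark pixels inside this ellipse as predicted foreground
        c, s = np.cos(theta), np.sin(theta)
        u = ((jj - x) * c + (ii - y) * s) / a
        v = (-(jj - x) * s + (ii - y) * c) / b
        fg |= (u * u + v * v) <= 1.0
    # background pixels: Gaussian log-likelihood under the background model
    z = (frame - mu_bg) / sigma_bg
    log_bg = -0.5 * z ** 2 - np.log(sigma_bg * np.sqrt(2.0 * np.pi))
    # foreground pixels: uniform over the intensity range
    log_fg = -np.log(fg_range)
    return log_prior + np.where(fg, log_fg, log_bg).sum()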
4.4.1 Background Subtraction

Classifying a pixel location as foreground (belonging to a fly) or background (not belonging to any fly) is referred to as background subtraction [5]. To perform this classification, the tracker thresholds the likelihood of the observed pixel intensities in the current frame under the background model described in Supplementary Note Sec. 4.1. Pixels with high likelihood are labeled background and pixels with low likelihood are labeled foreground. As we use a Gaussian model, this is equivalent to thresholding the absolute difference from the mean, normalized by the standard deviation:

    l(p) = foreground if |I(p) − μ(p)| / σ(p) > thresh, background otherwise.

The tracking software allows us to specify the special case that flies are always brighter than the background (or vice versa); in this case, the signed difference (I(p) − μ(p)) / σ(p) is thresholded.

To improve robustness to noise when thresholding foreground from background pixels, we use a hysteresis approach. The tracker first thresholds the difference at a low threshold, thresh_small. If no pixel in a connected component has a difference larger than a larger threshold, thresh_large, then all labels in this connected component are flipped to background. In all our experiments, we set thresh_small = 10 and thresh_large = 20.
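A sketch of this hysteresis rule, using scipy.ndimage for connected components (our choice of library; names are ours):

import numpy as np
from scipy import ndimage

def hysteresis_foreground(frame, mu, sigma, thresh_small=10.0, thresh_large=20.0):
    """Label pixels as foreground with hysteresis on the normalized difference."""
    d = np.abs(frame - mu) / sigma
    fg = d > thresh_small                  # permissive initial labeling
    labeled, n = ndimage.label(fg)
    # keep only components containing at least one strong (> thresh_large) pixel
    strong = np.unique(labeled[d > thresh_large])
    return np.isin(labeled, strong[strong > 0])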
4.4.2 Ellipse Fitting

To fit an ellipse to a connected component of foreground pixels, the tracker fits a 2-D Gaussian to the locations of the foreground pixels in the connected component. Given the parameters of the best-fitting Gaussian, the parameters of the ellipse can be computed. Consider all the pixel locations within an ellipse with semi-major axis length a, semi-minor axis length b, and orientation θ. The sample mean of the pixel locations will be close to the ellipse center. The sample covariance of the pixel locations within this ellipse will be close to

    Σ = Rᵀ diag((a/2)², (b/2)²) R,
where R is the rotation matrix

    R = (  cos θ   sin θ
          −sin θ   cos θ ).
Thus, given the covariance matrix Σ, the semi-major and semi-minor axis lengths and the orientation can be computed by finding the eigendecomposition Σ = UᵀDU. The axis lengths are twice the square roots of the eigenvalues, and the orientation can be computed as the arctangent of two of the entries of U:

    a = 2 √(D_11),  b = 2 √(D_22),  θ = atan2(U_12, U_11).
Instead of just computing the sample mean and covariance of all the pixels in the connected component, the tracker computes a weighted mean and covariance, where the weight of a pixel is proportional to its distance from the background model. Let {p_1, ..., p_n} be the locations of the pixels in a given connected component, and let the weight of pixel i be the normalized distance of the pixel intensity from the background image:

    w_i = |I(p_i) − μ(p_i)| / σ(p_i).

Then the sample mean and covariance are computed as

    Z = ∑_i w_i,
    μ = (1/Z) ∑_i w_i p_i,
    Σ = (1/Z) ∑_i w_i (p_i − μ)(p_i − μ)ᵀ.
Using the weighted mean and covariance improves our accuracy in two ways. First, it improves robustness to imperfect threshold parameters and allows us to use a single threshold throughout the image, despite lighting differences in different regions. In dimmer parts of the arena, a low threshold is needed; in brighter parts, using this low threshold results in pixels at the edge of the fly being classified as foreground. These pixels are actually background, but contain some foreground light from blurring effects. While they are classified as foreground, they receive relatively small weight in the estimate of the ellipse. Second, using the weighted mean and covariance improves the subpixel accuracy of our estimates.
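A sketch of the weighted-moments ellipse fit described above; the names are ours.

import numpy as np

def fit_ellipse_weighted(points, weights):
    """Fit ellipse parameters from weighted moments of pixel locations.

    points  : (n, 2) array of (x, y) pixel locations in one connected component
    weights : (n,) normalized distances from the background model
    Returns (center, a, b, theta) with a >= b and theta in radians.
    """
    w = weights / weights.sum()
    mu = w @ points                      # weighted mean = ellipse center
    d = points - mu
    cov = (w[:, None] * d).T @ d         # weighted covariance
    evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    # axis lengths are twice the square roots of the eigenvalues
    b, a = 2.0 * np.sqrt(evals)
    major = evecs[:, 1]                  # eigenvector of the largest eigenvalue
    theta = np.arctan2(major[1], major[0])
    return mu, a, b, theta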
4.4.3 Splitting and Merging Connected Components

If the prior density p(x_i) of the ellipse x_i fit to a given connected component is small, then this connected component may actually correspond to multiple flies, to part of a fly, or to a spurious detection. Recall that the prior density is currently based only on the area of the ellipse. If the prior is small because the area of the ellipse is too large, then the tracker determines whether fitting multiple flies to the connected component will increase the score p(X | I). If the prior density is small because the area of the ellipse is too small, then the tracker determines whether any of the following will increase the score: (1) lowering the foreground threshold to increase the area, (2) merging the ellipse with nearby ellipses, or (3) deleting the ellipse.
Splitting Large Components. Because of image blur and the legs of the flies, the pixels surrounding the fly will actually be a blur of foreground and background. Often, these pixels will be classified as foreground but have a higher background likelihood than pixels in the interior of the fly. We observed that relabeling these pixels as background often resulted in large connected components consisting of multiple flies being split into multiple connected components. Thus, the tracker first determines whether raising the foreground threshold for the connected component results in multiple, reasonably sized components. (The tracker iteratively tries increasing the threshold from thresh_small toward the maximum normalized distance in the connected component for 20 iterations, stopping if it has successfully split the large connected component.) If this fails, the Expectation-Maximization (EM) algorithm is used to fit a mixture of K Gaussians to the weighted pixel locations in the large connected component [6]. This approximately finds a fit of K ellipses (that do not overlap too much) to the weighted pixel locations in the large connected component. More specifically, a mixture of K Gaussians is the distribution

    p(p) = ∑_{k=1}^{K} π_k G(p; μ_k, Σ_k),

where π_k is the relative weight of component k, μ_k and Σ_k are the parameters of the k-th Gaussian, and G(p; μ, Σ) is the density of the Gaussian with parameters μ and Σ at pixel location p.

The tracker determines the number of components K to split the large component into based on the shape prior ∏_{k=1}^{K} p(x_k) of the K ellipse fits. We empirically observed that this prior did not usually have multiple maxima; thus the tracker chooses the number of components greedily. It iteratively increases the number of components it splits the large component into, stopping when there is a decrease in the prior.
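A sketch of this greedy choice of K, using scikit-learn's EM implementation of Gaussian mixtures in place of Ctrax's own EM code; approximating sample weights by repeating pixel locations is our simplification, and all names are ours.

import numpy as np
from sklearn.mixture import GaussianMixture

def log_area_prior(cov, mu_area, sigma_area):
    """log p(x_k) for the ellipse implied by one Gaussian component."""
    evals = np.linalg.eigvalsh(cov)
    a, b = 2.0 * np.sqrt(evals[1]), 2.0 * np.sqrt(evals[0])
    return -abs(np.pi * a * b - mu_area) / sigma_area

def split_component(points, weights, mu_area, sigma_area, max_k=5):
    """Greedily choose K and fit K Gaussians to a large connected component."""
    # crude weighting: repeat each pixel location proportionally to its weight
    reps = np.maximum(1, np.round(3 * weights / weights.mean()).astype(int))
    data = np.repeat(points, reps, axis=0)
    best_fit, best_prior = None, -np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full').fit(data)
        prior = sum(log_area_prior(c, mu_area, sigma_area)
                    for c in gmm.covariances_)
        if prior <= best_prior:   # stop at the first decrease in the prior
            break
        best_fit, best_prior = gmm, prior
    return best_fit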
Fixing Small Connected Components. If a connected component has a small area, the tracker first determines whether the ellipse fit is small because the foreground threshold is too high in that region of the image. Note that, in our setup, the ideal threshold varied with location in the image,