Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning
Noah F. Greenwald1,2*, Geneva Miller3*, Erick Moen3, Alex Kong2, Adam Kagel2, Christine Camacho Fullaway2, Brianna J. McIntosh1, Ke Leow1,2, Morgan Sarah Schwartz3, Thomas Dougherty3, Cole Pavelchek3,4, Sunny Cui5,6, Isabella Camplisson3, Omer Bar-Tal7, Jaiveer Singh2, Mara Fong2, Gautam Chaudhry2, Zion Abraham2, Jackson Moseley2, Shiri Warshawsky2, Erin Soon2, Shirley Greenbaum2, Tyler Risom2, Travis Hollmann8, Leeat Keren7, Will Graf3, Michael Angelo2†, David Van Valen3†
1. Cancer Biology Program, Stanford University
2. Department of Pathology, Stanford University
3. Division of Biology and Bioengineering, California Institute of Technology
4. Present address: Washington University in St. Louis Medical School
5. Department of Electrical Engineering, California Institute of Technology
6. Present address: Department of Computer Science, Princeton University
7. Department of Molecular Cell Biology, Weizmann Institute of Science
8. Department of Pathology, Memorial Sloan Kettering Cancer Center
* These authors contributed equally to this work
† These authors jointly supervised this work
Abstract
Understanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.
Introduction
Understanding the structural and functional relationships present within tissues is a challenge at the forefront of basic and translational research. Recent advances in multiplexed imaging have dramatically expanded the number of transcripts and proteins that can be quantified in a single tissue section while also improving the throughput of these platforms1–12. These technological improvements have opened up exciting new frontiers for large-scale analysis of human tissue samples. Ambitious collaborative efforts such as the Human Tumor Atlas Network13, the Human BioMolecular Atlas Program14, and the Human Cell Atlas15 are now using novel imaging techniques to comprehensively characterize the location, function, and phenotype of the cells in the human body. By generating high-quality, open-source datasets characterizing the full breadth of human tissues, these efforts could prove as transformative as the Human Genome Project in unleashing the next era of biological discovery.
Despite this immense promise, the tools to facilitate the analysis and interpretation of these datasets at scale do not yet exist. The clearest example of this shortcoming is the lack of a generalized algorithm for locating single cells in images. Unlike flow cytometry or single-cell RNA sequencing methods, in which individual cells are dissociated and physically separated from one another prior to being analyzed, tissue imaging is performed with intact specimens. Thus, in order to extract single-cell information from images, each pixel must be assigned to a cell after image acquisition in a process known as cell segmentation. Since the features extracted through this process are the basis for downstream analyses like cell-type identification and tissue neighborhood analyses16, inaccuracies at this stage have far-reaching consequences for interpreting image data.
Achieving accurate and automated cell segmentation for tissues remains a substantial challenge. Depending on the tissue, cells can be rare and dispersed within a large bed of extracellular matrix or densely packed such that contrast between adjacent neighbors is limited. Cell size in non-neuronal mammalian tissues can vary over two orders of magnitude17, while cell morphology can vary widely, from small mature lymphocytes with little discernible cytoplasm, to elongated spindle-shaped fibroblasts, to large multinucleated osteoclasts and megakaryocytes18. Accurate cell segmentation has been a long-standing goal of the biological image analysis community, and a diverse array of software tools has been developed to meet this challenge19–24. While these efforts have been crucial for advancing our understanding of biology across a wide range of domains, they fall short for tissue imaging data. A common shortcoming has been the need for manual, image-specific adjustments to produce useful segmentations. This lack of full automation poses a prohibitive barrier given the increasing scale of tissue imaging experiments.
Recent advances in deep learning have transformed the field of computer vision and are increasingly being used for a variety of tasks in biological image analysis, including cell segmentation25–31. These methods differ from conventional algorithms in that they learn how to perform tasks from annotated data. While the accuracy of these data-driven algorithms can render difficult analyses routine, using them in practice can be challenging: high accuracy requires a substantial amount of annotated data. Generating ground-truth data for cell segmentation is time intensive due to the need for pixel-level labels; as a result, existing datasets are of modest size (10⁴–10⁵ annotations). Moreover, most public datasets26,27,32–38 annotate the location of cell nuclei rather than the whole cell. Deploying pre-trained models to the life science community is also difficult and has been the focus of a number of recent works39–42. Despite deep learning's potential, these challenges have left whole-cell segmentation in tissue imaging data an open problem.
Here, we sought to close these gaps by creating an automated, simple, and scalable algorithm for nuclear and whole-cell segmentation that performs accurately across a diverse range of tissue types and imaging platforms. Developing this algorithm required two innovations:
(1) a scalable approach for generating large volumes of pixel-level training data in tissue images, and (2) an integrated deep learning pipeline that utilizes these data to achieve human-level performance. To address the first challenge, we developed a crowdsourced, human-in-the-loop approach for segmenting cells in tissue images in which humans and algorithms work in tandem to produce accurate annotations (Figure 1a). We used this pipeline to create TissueNet, a comprehensive segmentation dataset of >1 million paired whole-cell and nuclear annotations. These curated annotations were derived from images of nine different organs acquired on six distinct imaging platforms. TissueNet is the largest cell-segmentation dataset assembled to date, containing twice as many nuclear labels and 16 times as many whole-cell labels as all previously published datasets combined. To address the second challenge, we developed Mesmer, a deep learning pipeline for scalable, user-friendly segmentation of imaging data. Mesmer was trained on TissueNet and is the first algorithm to demonstrate human-level performance on cell segmentation. To enable broad use by the scientific community, we harnessed DeepCell, an open-source collection of software libraries, to create cloud-native software for using Mesmer, including plugins for ImageJ and QuPath. We have made all code, data, and trained models available under a permissive license as a community resource, setting the stage for the application of these modern, data-driven methods to a broad range of fundamental and translational research challenges.
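As a concrete illustration, the sketch below shows how the released model can be invoked through the deepcell Python package. The class name, channel order, and arguments follow the DeepCell documentation at the time of writing, but signatures may differ across versions; treat this as an assumption-laden example rather than a guaranteed API.

```python
import numpy as np
from deepcell.applications import Mesmer

# Input: a batch of two-channel images; assumed channel order is
# nuclear channel first, membrane/cytoplasm channel second.
nuclear = np.random.rand(512, 512)    # stand-in for a DAPI image
membrane = np.random.rand(512, 512)   # stand-in for e.g. E-cadherin
X = np.stack([nuclear, membrane], axis=-1)[np.newaxis, ...]  # (1, 512, 512, 2)

app = Mesmer()  # downloads the pretrained model on first use
# image_mpp is the resolution in microns per pixel;
# compartment can be 'whole-cell', 'nuclear', or 'both'.
labels = app.predict(X, image_mpp=0.5, compartment='whole-cell')
print(labels.shape)  # (1, 512, 512, 1) integer mask, one label per cell
```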
A human-in-the-loop approach drives scalable construction of TissueNet
Existing annotated datasets for cell segmentation are limited in scope and scale (Figure 1b)26,27,32–38. This limitation is largely due to the linear, time-intensive approach used to construct them, which requires the border of every cell in an image to be manually demarcated. This approach scales poorly, as the time required to label each image remains constant throughout the annotation effort.
We therefore implemented a three-phase approach to create TissueNet. In the first phase, expert human annotators outlined the border of each cell in 80 images. The labeled images were then used to train a preliminary model (Figure 1a, left; Methods). Once the preliminary model reached a sufficient level of accuracy, correcting its mistakes required less time than labeling from scratch. Although the exact point at which this transition occurs depends on model quality and training data diversity, we found that 10,000 cells was a reasonable approximation. The process then moved to the second phase (Figure 1a, middle), where images were first passed through the model to generate predicted annotations. These predictions were sent to crowdsourced annotators to correct errors. The corrected annotations then underwent final inspection by an expert prior to being added to the training dataset. When enough new data were compiled, a new model was trained and phase two was repeated. Each iteration yielded more training data, which led to improved model accuracy and fewer errors that needed to be manually corrected. This virtuous cycle continued until the model achieved human-level performance. At this point, we transitioned to the third phase (Figure 1a, right), where the model was run without human assistance to produce high-quality predictions. One advantage of this approach is that we utilized annotators with different amounts of bandwidth and expertise: experts have experience but limited bandwidth, while crowdsourced annotators have limited experience but higher bandwidth. Triaging each task according to its difficulty and accessing a much larger pool of human annotators further reduced the time and cost of dataset construction.
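The loop structure just described can be summarized in a short sketch. This is not the authors' actual pipeline code; every callable below is a placeholder standing in for a component named in the text (model training, crowdsourced correction, expert review, and the human-level stopping criterion).

```python
def human_in_the_loop(seed_labels, unlabeled_batches, train,
                      crowd_correct, expert_review, is_human_level):
    """Placeholder sketch of the three-phase annotation loop."""
    dataset = list(seed_labels)                 # phase 1: expert-drawn labels
    model = train(dataset)                      # preliminary model
    for batch in unlabeled_batches:             # phase 2: correct, don't redraw
        predictions = [model(image) for image in batch]
        dataset += expert_review(crowd_correct(batch, predictions))
        model = train(dataset)                  # more data -> fewer errors
        if is_human_level(model):
            break                               # phase 3: run fully automated
    return model
```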
Human-in-the-loop pipelines require specialized software that is optimized for the task and can be scalably deployed. We therefore developed DeepCell Label43, a browser-based graphical user interface optimized for editing existing cell annotations in tissue images (Figure S1a; Methods). DeepCell Label is supported by a scalable cloud backend that dynamically adjusts the number of servers according to demand (Figure S1b). Using DeepCell Label, we trained annotators from multiple crowdsourcing platforms to identify whole-cell and nuclear boundaries. To further simplify our annotation workflow, we integrated DeepCell Label into a pipeline that allowed us to prepare and submit images for annotation, have users annotate those images, and download the results. The images and resulting labels were used to train and update our model, completing the loop (Figure S1c; Methods).
Our goal in creating TissueNet was to use it to power general-purpose tissue segmentation models. To ensure that models trained on TissueNet would serve as much of the imaging community as possible, we made two key choices. First, all data in TissueNet contain two channels: a nuclear channel (such as DAPI) and a membrane or cytoplasm channel (such as E-cadherin or Pan-Keratin). Although some highly multiplexed platforms are capable of imaging dozens of markers at once1,2,4,6, restricting TissueNet to the minimum number of channels necessary for whole-cell segmentation maximizes the number of imaging platforms on which the resulting models can be used. Second, the data in TissueNet are derived from a wide variety of tissue types, disease states, and imaging platforms. This diversity allows models trained on TissueNet to handle data from many different experimental setups and biological questions. The images included in TissueNet were acquired from the published and unpublished work of labs that routinely perform tissue imaging44–51. Thus, while this first release of TissueNet encompasses the tissue types most commonly analyzed by the community, we expect that subsequent versions of TissueNet will be expanded to include less-studied organs.
Figure 1: A human-in-the-loop approach enables scalable, pixel-level annotation of large image collections. a, This approach has three phases. During phase 1, annotations are created from scratch to train a model. During phase 2, new data are fed through a preliminary model to generate predictions. These predictions are used as a starting point for annotators to correct. As more images are corrected, the model improves, which decreases the number of errors and increases the speed with which new data can be annotated. During phase 3, an accurate model is run without human correction. b, TissueNet has more nuclear and whole-cell annotations than all previously published datasets. c, The number of cell annotations per platform in TissueNet. d, The number of cell annotations per tissue type in TissueNet. e, The number of hours of annotation time required to create TissueNet.
[Figure 1 panel content: c, platforms: MIBI-TOF, Vectra, CyCIF, CODEX, MxIF, IMC; d, tissues: breast, lung, pancreas, tonsil, colon, esophagus, lymph node, skin, spleen; e, crowd vs. expert annotation hours.]
As a result of the scalability of our human-in-the-loop approach to data labeling, TissueNet is larger than the sum total of all previously published datasets26,27,32–38 (Figure 1b), with 1.3 million whole-cell annotations and 1.2 million nuclear annotations. TissueNet contains data from six imaging platforms (Figure 1c) and nine organs (Figure 1d), and includes both histologically normal and diseased tissue (e.g., tumor resections). TissueNet also encompasses three species, with images from human, mouse, and macaque. Constructing TissueNet required >4,000 person-hours, the equivalent of nearly 2 person-years of full-time effort (Figure 1e). With an average rate of $6 per hour for crowdsourced annotators, we anticipate that subsequent datasets of this size will cost around USD $25,000 to produce, a significant reduction relative to highly trained ($30/h) or expert pathologist (>$150/h) annotators.
Mesmer is a novel algorithm for accurate whole-cell segmentation of tissue data
An ideal deep learning model for cell segmentation must meet two requirements. First, it must be accurate, which is challenging given the range of cell morphologies, tissue types, and imaging platforms present in TissueNet. A model capable of accurately performing whole-cell segmentation in this setting needs sufficient representational capacity to understand and interpret these heterogeneous images. Second, it must be fast. Image datasets are rapidly increasing in size, and a model with high accuracy but poor inference speed would be of limited utility.
To satisfy these requirements, we developed the PanopticNet deep learning architecture. To ensure adequate model capacity, PanopticNets use a ResNet50 backbone coupled to a modified Feature Pyramid Network (FPN)52–54 (Figure S2a; Methods). ResNet backbones are a popular architecture for extracting features from imaging data for a variety of tasks54. FPNs aggregate features across length scales, producing representations that contain both low-level details and high-level semantics52. To perform segmentation, two semantic heads are attached to the highest level of the FPN to create pixel-level predictions. These heads perform two separate prediction tasks. The first head predicts whether a pixel is inside a cell, at the cell boundary, or part of the image background25,26. The second head predicts the distance of each pixel within a cell to the cell centroid (Figure S2a; Methods); we extended previous work30,55 by explicitly accounting for cell size in this step.
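To make the architecture concrete, here is a minimal sketch of the PanopticNet idea in tf.keras: a ResNet50 backbone, an FPN-style top-down pathway, and two per-pixel heads. The layer choices, pyramid depth, and head design are simplifications assumed for illustration; the actual model (four heads, modified FPN) is defined in the released code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def panopticnet_sketch(input_shape=(256, 256, 2)):
    """Simplified PanopticNet-style model: ResNet50 backbone, small FPN,
    one pixelwise-classification head and one centroid-distance head."""
    inputs = tf.keras.Input(shape=input_shape)
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights=None, input_tensor=inputs)
    # feature maps at strides 8, 16, and 32 (keras ResNet50 layer names)
    c3 = backbone.get_layer('conv3_block4_out').output
    c4 = backbone.get_layer('conv4_block6_out').output
    c5 = backbone.get_layer('conv5_block3_out').output
    # FPN-style top-down pathway with 1x1 lateral connections
    p5 = layers.Conv2D(256, 1)(c5)
    p4 = layers.Add()([layers.UpSampling2D()(p5), layers.Conv2D(256, 1)(c4)])
    p3 = layers.Add()([layers.UpSampling2D()(p4), layers.Conv2D(256, 1)(c3)])
    # bring the finest pyramid level back to input resolution
    x = layers.UpSampling2D(size=8, interpolation='bilinear')(p3)
    # head 1: is each pixel interior, boundary, or background?
    pixelwise = layers.Conv2D(3, 1, activation='softmax', name='pixelwise')(x)
    # head 2: distance of each in-cell pixel to its cell centroid
    distance = layers.Conv2D(1, 1, activation='relu', name='centroid_dist')(x)
    return tf.keras.Model(inputs, [pixelwise, distance])
```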
We used the PanopticNet architecture and TissueNet to create Mesmer, a deep learning pipeline for accurate nuclear and whole-cell segmentation of tissue data. Mesmer's PanopticNet model contains four semantic heads (two for nuclear segmentation and two for whole-cell segmentation) attached to a common backbone and FPN. The input to Mesmer is a nuclear image (e.g., DAPI) to define the nucleus of each cell and a membrane or cytoplasm image (e.g., CD45 or E-cadherin) to define the shape of each cell (Figure 2a). These inputs are normalized56 (to improve robustness) and tiled into patches of fixed size (to allow processing of images with arbitrary dimensions) before being fed to the PanopticNet model. The model outputs are then untiled57 to produce predictions for the centroid and boundary of every nucleus and cell in the image. The centroid and boundary predictions are used as inputs to a watershed algorithm58 to create the final instance segmentation mask for each nucleus and each cell in the image (Methods).
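The final watershed step can be illustrated with standard scientific-Python tools. This is a hedged approximation of the postprocessing described above, not Mesmer's exact implementation: the thresholds, seed selection, and use of the interior prediction as the foreground mask are assumptions made for illustration.

```python
import numpy as np
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_postprocess(interior, centroid_dist, interior_threshold=0.3):
    """Turn per-pixel predictions into an instance mask (illustrative only).

    interior: probability that a pixel is inside a cell, shape (H, W)
    centroid_dist: predicted distance-to-centroid transform, shape (H, W)
    """
    foreground = interior > interior_threshold
    # seeds: local maxima of the centroid-distance prediction, one per cell
    peaks = peak_local_max(centroid_dist, min_distance=5,
                           labels=foreground.astype(int))
    markers = np.zeros(interior.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # flood outward from the seeds, confined to the predicted foreground
    return watershed(-centroid_dist, markers, mask=foreground)
```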
We used the newly created TissueNet dataset to train Mesmer's model. We randomly partitioned TissueNet into training (80%), validation (10%), and testing (10%) splits. The training split was used to directly update the model weights during training, with the validation split used to assess increases in model accuracy after each epoch. The test split was completely held out during training and used only to evaluate model performance afterward. We used standard image augmentation during training to increase model robustness. To benchmark model accuracy, we built on our prior framework for classifying segmentation errors37. In brief, we perform a linear assignment between predicted cells and ground-truth cells. Cells that map 1-to-1 with a ground-truth cell are marked as accurately segmented; all other cells are assigned to one of several error modes depending on their relationship with the ground-truth data. We use these assignments to calculate precision, recall, F1 score, and Jaccard index; see the Methods section for detailed descriptions.
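As an illustration of the metric computation, the sketch below scores a prediction via a 1-to-1 assignment between predicted and ground-truth cells, here simplified to an IoU-based matching; the full error-mode taxonomy is described in the Methods and in ref. 37.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def segmentation_f1(iou, threshold=0.5):
    """F1 score from a 1-to-1 matching of true and predicted cells.

    iou: (n_true, n_pred) matrix of intersection-over-union values
    between every ground-truth cell and every predicted cell.
    """
    # linear assignment: maximize total IoU over 1-to-1 pairings
    rows, cols = linear_sum_assignment(-iou)
    # count pairs with sufficient overlap as true positives
    tp = int(np.sum(iou[rows, cols] >= threshold))
    fp = iou.shape[1] - tp  # predicted cells with no 1-to-1 match
    fn = iou.shape[0] - tp  # ground-truth cells with no 1-to-1 match
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```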