Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

Greenwald, Noah F. and Miller, Geneva and Moen, Erick and Kong, Alex and Kagel, Adam and Dougherty, Thomas and Fullaway, Christine Camacho and McIntosh, Brianna J. and Leow, Ke Xuan and Schwartz, Morgan Sarah and Pavelchek, Cole and Cui, Sunny and Camplisson, Isabella and Bar-Tal, Omer and Singh, Jaiveer and Fong, Mara and Chaudhry, Gautam and Abraham, Zion and Moseley, Jackson and Warshawsky, Shiri and Soon, Erin and Greenbaum, Shirley and Risom, Tyler and Hollmann, Travis and Bendall, Sean C. and Keren, Leeat and Graf, Will and Angelo, Michael and Van Valen, David (2022) Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nature Biotechnology, 40 (4). pp. 555-565. ISSN 1087-0156. PMCID PMC9010346. doi:10.1038/s41587-021-01094-0.

PDF - Submitted Version
Creative Commons Attribution Non-commercial.

PDF (Reporting Summary) - Supplemental Material
Image (JPEG) (Extended Data Fig. 1: DeepCell Label annotation workflow) - Supplemental Material
Image (JPEG) (Extended Data Fig. 2: Mesmer benchmarking) - Supplemental Material
Image (JPEG) (Extended Data Fig. 3: TissueNet accuracy comparisons) - Supplemental Material
Image (JPEG) (Extended Data Fig. 4: 3D segmentation) - Supplemental Material
See Usage Policy.


A principal challenge in the analysis of tissue imaging data is cell segmentation—the task of identifying the precise boundary of every cell in an image. To address this problem we constructed TissueNet, a dataset for training segmentation models that contains more than 1 million manually labeled cells, an order of magnitude more than all previously published segmentation training datasets. We used TissueNet to train Mesmer, a deep-learning-enabled segmentation algorithm. We demonstrated that Mesmer is more accurate than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We then adapted Mesmer to harness cell lineage information in highly multiplexed datasets and used this enhanced version to quantify cell morphology changes during human gestation. All code, data and models are released as a community resource.

Item Type:Article
Related URLs:
URL | URL Type | Description
ReadCube access; Paper; Item: Data; Item: Software; Item: Code
ORCID:
Author | ORCID
Greenwald, Noah F. | 0000-0002-7836-4379
Moen, Erick | 0000-0002-5947-7044
McIntosh, Brianna J. | 0000-0003-3626-625X
Schwartz, Morgan Sarah | 0000-0001-8131-9125
Pavelchek, Cole | 0000-0001-9249-6637
Bar-Tal, Omer | 0000-0003-1622-3674
Chaudhry, Gautam | 0000-0003-2240-9846
Greenbaum, Shirley | 0000-0002-0680-7652
Risom, Tyler | 0000-0003-1089-9542
Hollmann, Travis | 0000-0003-1599-0433
Bendall, Sean C. | 0000-0003-1341-2453
Keren, Leeat | 0000-0002-6799-6303
Graf, Will | 0000-0003-0460-4605
Angelo, Michael | 0000-0003-1531-5067
Van Valen, David | 0000-0001-7534-7621
Additional Information:© 2021 Nature Publishing Group. Received 01 March 2021; Accepted 14 September 2021; Published 18 November 2021. We thank K. Borner, L. Cai, M. Covert, A. Karpathy, S. Quake and M. Thomson for interesting discussions; D. Glass and E. McCaffrey for feedback on the manuscript; T. Vora for copy editing; R. Angoshtari, G. Barlow, B. Bodenmiller, C. Carey, R. Coffey, A. Delmastro, C. Egelston, M. Hoppe, H. Jackson, A. Jeyasekharan, S. Jiang, Y. Kim, E. McCaffrey, E. McKinley, M. Nelson, S.-B. Ng, G. Nolan, S. Patel, Y. Peng, D. Philips, R. Rashid, S. Rodig, S. Santagata, C. Schuerch, D. Schulz, Di. Simons, P. Sorger, J. Weirather and Y. Yuan for providing imaging data for TissueNet; the crowd annotators who powered our human-in-the-loop pipeline; and all patients who donated samples for this study. This work was supported by grants from the Shurl and Kay Curci Foundation, the Rita Allen Foundation, the Susan E. Riley Foundation, the Pew Heritage Trust, the Alexander and Margaret Stewart Trust, the Heritage Medical Research Institute, the Paul Allen Family Foundation through the Allen Discovery Centers at Stanford and Caltech, the Rosen Center for Bioengineering at Caltech and the Center for Environmental and Microbial Interactions at Caltech (D.V.V.). This work was also supported by 5U54CA20997105, 5DP5OD01982205, 1R01CA24063801A1, 5R01AG06827902, 5UH3CA24663303, 5R01CA22952904, 1U24CA22430901, 5R01AG05791504 and 5R01AG05628705 from NIH, W81XWH2110143 from DOD, and other funding from the Bill and Melinda Gates Foundation, Cancer Research Institute, the Parker Center for Cancer Immunotherapy and the Breast Cancer Research Foundation (M.A.). N.F.G. was supported by NCI CA246880-01 and the Stanford Graduate Fellowship. B.J.M. was supported by the Stanford Graduate Fellowship and Stanford Interdisciplinary Graduate Fellowship. T.D. was supported by the Schmidt Academy for Software Engineering at Caltech. 
Data availability: The TissueNet dataset is available for noncommercial use.
Code availability: All software for dataset construction, model training, deployment and analysis is available on our GitHub page. All code to generate the figures in this paper is also available.
These authors contributed equally: Noah F. Greenwald, Geneva Miller.
Author Contributions: N.F.G., L.K., M.A. and D.V.V. conceived the project. E.M. and D.V.V. conceived the human-in-the-loop approach. L.K. and M.A. conceived the whole-cell segmentation approach. G.M., T.D., E.M., W.G. and D.V.V. developed DeepCell Label. G.M., N.F.G., E.M., I.C., W.G. and D.V.V. developed the human-in-the-loop pipeline. M.S.S., C.P., W.G. and D.V.V. developed Mesmer’s deep learning architecture. W.G., N.F.G. and D.V.V. developed model training software. C.P. and W.G. developed cloud deployment. M.S.S., S.C., W.G. and D.V.V. developed metrics software. W.G. developed plugins. N.F.G., A. Kong, A. Kagel, J.S. and O.B.-T. developed the multiplex image analysis pipeline. A. Kagel and G.M. developed the pathologist evaluation software. N.F.G., G.M. and T.H. supervised training data creation. N.F.G., C.C.F., B.J.M., K.X.L., M.F., G.C., Z.A., J.M. and S.W. performed quality control on the training data. E.S., S.G. and T.R. generated MIBI-TOF data for morphological analyses. S.C.B. helped with experimental design. N.F.G., W.G. and D.V.V. trained the models. N.F.G., W.G., G.M. and D.V.V. performed data analysis. N.F.G., G.M., M.A. and D.V.V. wrote the manuscript. M.A. and D.V.V. supervised the project. All authors provided feedback on the manuscript.
Peer review information: Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.
Group:Caltech Center for Environmental Microbial Interactions (CEMI), Rosen Bioengineering Center
Funders:
Funding Agency | Grant Number
Shurl and Kay Curci Foundation | UNSPECIFIED
Rita Allen Foundation | UNSPECIFIED
Susan E. Riley Foundation | UNSPECIFIED
Pew Heritage Trust | UNSPECIFIED
Alexander and Margaret Stewart Trust | UNSPECIFIED
Heritage Medical Research Institute | UNSPECIFIED
Paul Allen Family Foundation | UNSPECIFIED
Donna and Benjamin M. Rosen Bioengineering Center | UNSPECIFIED
Caltech Center for Environmental Microbial Interactions (CEMI) | UNSPECIFIED
Department of Defense | W81XWH2110143
Bill and Melinda Gates Foundation | UNSPECIFIED
Cancer Research Institute | UNSPECIFIED
Parker Institute for Cancer Immunotherapy | UNSPECIFIED
Breast Cancer Research Foundation | UNSPECIFIED
National Cancer Institute | CA246880-01
Stanford University | UNSPECIFIED
Schmidt Futures Program | UNSPECIFIED
Subject Keywords:Image processing; Imaging; Software
Issue or Number:4
PubMed Central ID:PMC9010346
Record Number:CaltechAUTHORS:20210303-070232817
Persistent URL:
Official Citation:Greenwald, N.F., Miller, G., Moen, E. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat Biotechnol 40, 555–565 (2022).
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:108281
Deposited By: Tony Diaz
Deposited On:03 Mar 2021 19:23
Last Modified:25 Apr 2022 18:16
