
Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning

Greenwald, Noah F. and Miller, Geneva and Moen, Erick and Kong, Alex and Kagel, Adam and Fullaway, Christine Camacho and McIntosh, Brianna J. and Leow, Ke and Schwartz, Morgan Sarah and Dougherty, Thomas and Pavelchek, Cole and Cui, Sunny and Camplisson, Isabella and Bar-Tal, Omer and Singh, Jaiveer and Fong, Mara and Chaudhry, Gautam and Abraham, Zion and Moseley, Jackson and Warshawsky, Shiri and Soon, Erin and Greenbaum, Shirley and Risom, Tyler and Hollmann, Travis and Keren, Leeat and Graf, Will and Angelo, Michael and Van Valen, David (2021) Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution Non-commercial.

PDF (Supplementary Figures) - Supplemental Material
Creative Commons Attribution Non-commercial.


Understanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.

Item Type: Report or Paper (Discussion Paper)
ORCID:
Author | ORCID
Greenwald, Noah F. | 0000-0002-7836-4379
Moen, Erick | 0000-0002-5947-7044
Schwartz, Morgan Sarah | 0000-0001-8131-9125
Greenbaum, Shirley | 0000-0002-0680-7652
Risom, Tyler | 0000-0003-1089-9542
Hollmann, Travis | 0000-0003-1599-0433
Keren, Leeat | 0000-0002-6799-6303
Graf, Will | 0000-0003-0460-4605
Van Valen, David | 0000-0001-7534-7621
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. This version posted March 2, 2021.

We thank Long Cai, Katy Borner, Matt Thomson, Steve Quake, and Markus Covert for interesting discussions; Sean Bendall, David Glass, and Erin McCaffrey for feedback on the manuscript; Roshan Angoshtari, Graham Barlow, Bernd Bodenmiller, Christopher Carey, Robert Coffey, Alea Delmastro, Colt Egelston, Michal Hoppe, Hartland Jackson, Anand Jeyasekharan, Sizun Jiang, Youn Kim, Erin McCaffrey, Eliot McKinley, Michael Nelson, Siok-Bian Ng, Gary Nolan, Sanjay Patel, Yanfen Peng, Darci Philips, Rumana Rashid, Scott Rodig, Sandro Santagata, Christian Schuerch, Daniel Schulz, Diana Simons, Peter Sorger, Jason Weirather, and Yuan Yuan for providing imaging data for TissueNet; the crowd annotators who powered our human-in-the-loop pipeline; and all patients who donated samples for this study.

This work was supported by grants from the Shurl and Kay Curci Foundation, the Rita Allen Foundation, the Susan E. Riley Foundation, the Paul Allen Family Foundation through the Allen Discovery Centers at Stanford and Caltech, the Rosen Center for Bioengineering at Caltech, and the Center for Environmental and Microbial Interactions at Caltech (D.V.V.). Additional support was provided by grants from the Bill and Melinda Gates Foundation, a Translational Research Award from the Stanford Cancer Institute, and 1-DP5-1051, OD019822, 1R01AG056287, 1R01AG057915, and 1U24CA224309 (M.A.). N.F.G. was supported by NCI CA246880-01 and the Stanford Graduate Fellowship. B.J.M. was supported by the Stanford Graduate Fellowship and Stanford Interdisciplinary Graduate Fellowship. T.D. was supported by the Schmidt Academy for Software Engineering.

Competing interests: M.A. is an inventor on patent US20150287578A1. M.A. is a board member and shareholder in IonPath Inc. T.R. has previously consulted for IonPath Inc. The authors have filed a provisional patent for this work.

Authorship Contributions: N.F.G., L.K., M.A., and D.V.V. conceived the project. E.M. and D.V.V. conceived the human-in-the-loop approach. L.K. and M.A. conceived the whole-cell segmentation approach. G.M., T.D., E.M., W.G., and D.V.V. developed DeepCell Label. G.M., N.F.G., E.M., I.C., W.G., and D.V.V. developed the human-in-the-loop pipeline. M.S., C.P., W.G., and D.V.V. developed PanopticNets. W.G., N.F.G., and D.V.V. developed model training software. C.P. and W.G. developed cloud deployment. M.S., S.C., W.G., and D.V.V. developed metrics software. W.G. developed plug-ins. N.F.G., A.Kong, A.Kagel, J.S., and O.B-T. developed the multiplex image analysis pipeline. A.Kagel and G.M. developed the pathologist evaluation software. N.F.G., G.M., and T.H. supervised training data creation. N.F.G., C.C.F., B.M., K.L., M.F., G.C., Z.A., J.M., and S.W. performed quality control on the training data. E.S., S.G., and T.R. generated MIBI-TOF data for morphological analyses. N.F.G., W.G., and D.V.V. trained the models. N.F.G., W.G., G.M., and D.V.V. performed data analysis. N.F.G., G.M., M.A., and D.V.V. wrote the manuscript. M.A. and D.V.V. supervised the project. All authors provided feedback on the manuscript.

Data and software availability: All data will be made available at upon publication in a peer-reviewed journal. All software for dataset construction, model training, deployment, and analysis is available on our GitHub page. All code to generate the figures in this paper is available at
Group: Rosen Bioengineering Center, Caltech Center for Environmental Microbial Interactions (CEMI)
Funding Agency | Grant Number
Shurl and Kay Curci Foundation | UNSPECIFIED
Rita Allen Foundation | UNSPECIFIED
Susan E. Riley Foundation | UNSPECIFIED
Paul Allen Family Foundation | UNSPECIFIED
Donna and Benjamin M. Rosen Bioengineering Center | UNSPECIFIED
Caltech Center for Environmental Microbial Interactions (CEMI) | UNSPECIFIED
Bill and Melinda Gates Foundation | UNSPECIFIED
Stanford Cancer Institute | UNSPECIFIED
Stanford University | UNSPECIFIED
Schmidt Academy for Software Engineering | UNSPECIFIED
Record Number: CaltechAUTHORS:20210303-070232817
Persistent URL:
Official Citation: Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Noah F. Greenwald, Geneva Miller, Erick Moen, Alex Kong, Adam Kagel, Christine Camacho Fullaway, Brianna J. McIntosh, Ke Leow, Morgan Sarah Schwartz, Thomas Dougherty, Cole Pavelchek, Sunny Cui, Isabella Camplisson, Omer Bar-Tal, Jaiveer Singh, Mara Fong, Gautam Chaudhry, Zion Abraham, Jackson Moseley, Shiri Warshawsky, Erin Soon, Shirley Greenbaum, Tyler Risom, Travis Hollmann, Leeat Keren, William Graf, Michael Angelo, David Van Valen. bioRxiv 2021.03.01.431313; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 108281
Deposited By: Tony Diaz
Deposited On: 03 Mar 2021 19:23
Last Modified: 18 Aug 2021 01:29