CaltechAUTHORS
  A Caltech Library Service

Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons

Higgins, Irina and Chang, Le and Langston, Victoria and Hassabis, Demis and Summerfield, Christopher and Tsao, Doris and Botvinick, Matthew (2021) Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nature Communications, 12. Art. No. 6456. ISSN 2041-1723. doi:10.1038/s41467-021-26751-5. https://resolver.caltech.edu/CaltechAUTHORS:20211111-212227870

Files:
PDF - Published Version (Creative Commons Attribution, 3MB)
PDF - Submitted Version (See Usage Policy, 4MB)
PDF - Supplemental Material (Creative Commons Attribution, 1MB)
PDF (Peer Review File) - Supplemental Material (Creative Commons Attribution, 739kB)
PDF (Reporting Summary) - Supplemental Material (Creative Commons Attribution, 305kB)
MS Excel (Source Data) - Supplemental Material (Creative Commons Attribution, 301kB)

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20211111-212227870

Abstract

To better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together, our results imply that optimising the disentangling objective leads to representations that closely resemble those in IT at the single-unit level. This points to disentangling as a plausible learning objective for the visual brain.
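For readers unfamiliar with the model, below is a minimal sketch in Python (PyTorch) of the β-VAE objective referenced in the abstract. The toy MLP encoder/decoder, layer sizes and beta value are illustrative assumptions, not the paper's configuration; the authors' actual convolutional β-VAE is open-sourced at the disentanglement_lib repository linked under Related URLs.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Toy beta-VAE: a Gaussian encoder and a Bernoulli decoder over pixels."""
    def __init__(self, x_dim=64 * 64, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, z_dim)       # posterior mean
        self.logvar = nn.Linear(512, z_dim)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                                 nn.Linear(512, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def beta_vae_loss(x, x_hat, mu, logvar, beta=10.0):
    # Reconstruction term plus a beta-weighted KL divergence to the
    # unit-Gaussian prior; beta > 1 pressures the latents to disentangle.
    recon = F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

Training amounts to minimising this loss over face images, e.g.:

model = BetaVAE()
x = torch.rand(16, 64 * 64)  # a batch of flattened face images scaled to [0, 1]
x_hat, mu, logvar = model(x)
beta_vae_loss(x, x_hat, mu, logvar).backward()

Setting beta above 1 trades reconstruction fidelity for statistically independent, semantically interpretable latents; this "disentangling" pressure produces the generative factors that the abstract compares against the responses of single IT neurons.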


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1038/s41467-021-26751-5 | DOI | Article
https://arxiv.org/abs/2006.14304 | arXiv | Discussion Paper
https://doi.org/10.6084/m9.figshare.c.5613197.v2 | DOI | Data
https://www.nist.gov/itl/iad/image-group/color-feret-database | Related Item | FERET face database
http://lrv.fri.uni-lj.si/facedb.html | Related Item | CVL face database
http://ninastrohminger.com/the-mr2 | Related Item | MR2 face database
http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html | Related Item | AR face database
https://www.chicagofaces.org/ | Related Item | Chicago face database
http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html | Related Item | CelebA face database
https://github.com/google-research/disentanglement_lib | Related Item | Code
ORCID:
Author | ORCID
Higgins, Irina | 0000-0002-1890-2091
Hassabis, Demis | 0000-0003-2812-9917
Tsao, Doris | 0000-0003-1083-1919
Botvinick, Matthew | 0000-0001-7758-6896
Additional Information: © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Received 15 December 2020; Accepted 22 October 2021; Published 09 November 2021.

We would like to thank Raia Hadsell, Zeb Kurth-Nelson and Koray Kavukcuoglu for comments on the manuscript, and Lucy Campbell-Gillingham and Kevin McKee for their help in running the human psychophysics experiments. Funding sources: NIH (DP1-NS083063, R01-EY030650) and the Howard Hughes Medical Institute.

Data availability: The unprocessed responses of all models to the 2162 face images generated in this study have been deposited in the figshare database (https://doi.org/10.6084/m9.figshare.c.5613197.v2). This includes AAM, VGG (PCA), VAE and β-VAE responses previously published in Chang et al.6. The figshare database also includes the anonymised psychophysics data, a file describing how the semantic labels used in one of the psychophysics studies were obtained from the larger list of 46 descriptive face attributes compiled in Klare et al.61, and the two sample forms used for data collection on Prolific. The raw neural data supporting the current study were previously published in Chang et al.6 and are available under restricted access because of the complexity of the customised data structure and the size of the data; access can be obtained by contacting Le Chang (stevenlechang@gmail.com) or Doris Tsao (tsao.doris@gmail.com). The face image data used in this study are available in the corresponding databases: FERET face database55 (https://www.nist.gov/itl/iad/image-group/color-feret-database), CVL face database54 (http://lrv.fri.uni-lj.si/facedb.html), MR2 face database56 (http://ninastrohminger.com/the-mr2), PEAL face database57, AR face database51 (http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html), Chicago face database53 (https://www.chicagofaces.org) and CelebA face database52 (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html). Source data are provided with this paper.

Code availability: The code that supports the findings of this study is available upon request from Irina Higgins (irinah@google.com) due to its complexity and partial reliance on proprietary libraries. Open-source implementations of the β-VAE model, the alignment score and the UDR measure are available at https://github.com/google-research/disentanglement_lib.

These authors contributed equally: Irina Higgins, Le Chang. Author Contributions: C.S., D.T. and M.B. contributed equally to this manuscript. Conceptualisation, I.H., D.H., M.B.; Methodology, Software, Data Curation and Validation, I.H. and L.C.; Formal Analysis, Investigation, Visualisation, Writing - Original Draft, I.H.; Writing - Review and Editing, I.H., L.C., C.S., D.T. and M.B.; Project Administration, V.L.; Supervision, D.H., C.S., D.T. and M.B.; Funding Acquisition and Resources, D.T., M.B. and D.H. The authors declare no competing interests.

Peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Group: Tianqiao and Chrissy Chen Institute for Neuroscience
Funders:
Funding Agency | Grant Number
NIH | DP1-NS083063
NIH | R01-EY030650
Howard Hughes Medical Institute (HHMI) | UNSPECIFIED
Subject Keywords: Computational neuroscience; Neuroscience; Object vision; Visual system
DOI: 10.1038/s41467-021-26751-5
Record Number: CaltechAUTHORS:20211111-212227870
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20211111-212227870
Official Citation: Higgins, I., Chang, L., Langston, V. et al. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nat Commun 12, 6456 (2021). https://doi.org/10.1038/s41467-021-26751-5
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 111847
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 11 Nov 2021 22:18
Last Modified: 11 Nov 2021 22:19
