Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons

Higgins, Irina and Chang, Le and Langston, Victoria and Hassabis, Demis and Summerfield, Christopher and Tsao, Doris and Botvinick, Matthew (2021) Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nature Communications, 12. Art. No. 6456. ISSN 2041-1723. doi:10.1038/s41467-021-26751-5.

PDF - Published Version. Creative Commons Attribution.
PDF - Submitted Version. See Usage Policy.
PDF - Supplemental Material. Creative Commons Attribution.
PDF (Peer Review File) - Supplemental Material. Creative Commons Attribution.
PDF (Reporting summary) - Supplemental Material. Creative Commons Attribution.
MS Excel (Source Data) - Supplemental Material. Creative Commons Attribution.

In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together our results imply that optimising the disentangling objective leads to representations that closely resemble those in the IT at the single unit level. This points at disentangling as a plausible learning objective for the visual brain.
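
For reference, the disentangling objective mentioned in the abstract is usually written as below. This is the commonly cited form of the β-VAE objective, not text taken from this record; the symbols (encoder q_φ, decoder p_θ, prior p(z), weight β) are assumed notation.

```latex
\mathcal{L}(\theta, \phi; x, \beta)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \beta \, D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

With β = 1 this reduces to the standard VAE evidence lower bound. Values of β > 1 weight the KL term more heavily, which is the pressure generally credited with encouraging disentangled latent factors.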

Item Type: Article
Related URLs:
Paper
Item: FERET face database
Item: CVL face database
Item: MR2 face database
Item: AR face database
Item: Chicago face database
Item: CelebA face database
Item: Code
ORCID:
Higgins, Irina: 0000-0002-1890-2091
Hassabis, Demis: 0000-0003-2812-9917
Tsao, Doris: 0000-0003-1083-1919
Botvinick, Matthew: 0000-0001-7758-6896
Additional Information: © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Received 15 December 2020; Accepted 22 October 2021; Published 09 November 2021.

We would like to thank Raia Hadsell, Zeb Kurth-Nelson and Koray Kavukcuoglu for comments on the manuscript, and Lucy Campbell-Gillingham and Kevin McKee for their help in running the human psychophysics experiments.

Funding source: NIH (DP1-NS083063, R01-EY030650) and the Howard Hughes Medical Institute.

Data availability: The unprocessed responses of all models to the 2162 face images generated in this study have been deposited in the figshare database. This includes AAM, VGG (PCA), VAE and β-VAE responses previously published in Chang et al. [6]. The figshare database also includes the anonymised psychophysics data, a file describing how the semantic labels used in one of the psychophysics studies were obtained from the larger list of 46 descriptive face attributes compiled in Klare et al. [61], and the two sample forms used for data collection on Prolific. The raw neural data supporting the current study were previously published in Chang et al. [6] and are available under restricted access because of the complexity of the customised data structure and the size of the data; access can be obtained by contacting Le Chang or Doris Tsao. The face image data used in this study are available in the corresponding databases: FERET face database [55], CVL face database [54], MR2 face database [56], PEAL face database [57], AR face database [51], Chicago face database [53] and CelebA face database [52]. Source data are provided with this paper.

Code availability: The code that supports the findings of this study is available upon request from Irina Higgins due to its complexity and partial reliance on proprietary libraries. Open-source implementations of the β-VAE model, the alignment score and the UDR measure are available.

These authors contributed equally: Irina Higgins, Le Chang.

Author Contributions: C.S., D.T. and M.B. contributed equally to this manuscript. Conceptualisation, I.H., D.H., M.B.; Methodology, Software, Data Curation and Validation, I.H. and L.C.; Formal Analysis, Investigation, Visualisation, Writing - Original Draft, I.H.; Writing - Review and Editing, I.H., C.S., L.C., D.T., M.B. and C.S.; Project Administration, V.L.; Supervision, D.H., C.S., D.T. and M.B.; Funding Acquisition and Resources, D.T., M.B. and D.H.

The authors declare no competing interests.

Peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Group: Tianqiao and Chrissy Chen Institute for Neuroscience
Funding Agency | Grant Number
Howard Hughes Medical Institute (HHMI) | UNSPECIFIED
Subject Keywords: Computational neuroscience; Neuroscience; Object vision; Visual system
Record Number: CaltechAUTHORS:20211111-212227870
Persistent URL:
Official Citation: Higgins, I., Chang, L., Langston, V. et al. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nat Commun 12, 6456 (2021).
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 111847
Deposited By: Tony Diaz
Deposited On: 11 Nov 2021 22:18
Last Modified: 11 Nov 2021 22:19
