Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons

Creators: Higgins, Irina; Chang, Le; Langston, Victoria; Hassabis, Demis; Summerfield, Christopher; Tsao, Doris; Botvinick, Matthew

Abstract

In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together our results imply that optimising the disentangling objective leads to representations that closely resemble those in the IT at the single unit level. This points at disentangling as a plausible learning objective for the visual brain.
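As a rough illustration of the disentangling objective named in the abstract, the standard β-VAE loss adds a reconstruction term to a KL term weighted by β. The sketch below is not the authors' implementation. The function name `beta_vae_loss`, the squared-error reconstruction term and the default `beta=4.0` are assumptions made for illustration only; this record does not state the settings used in the study.

```python
import math


def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Per-example beta-VAE loss: reconstruction error + beta * KL(q(z|x) || N(0, I)).

    x, x_recon: flat sequences of pixel values (input and reconstruction).
    mu, logvar: mean and log-variance of the diagonal Gaussian posterior q(z|x).
    beta: weight on the KL term (hypothetical default; beta = 1 gives a standard VAE).
    """
    # Squared-error reconstruction term (an assumption; a Bernoulli
    # log-likelihood is another common choice).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))
    return recon + beta * kl
```

Raising `beta` above 1 increases the pressure on the posterior to match the factorised prior, which is the mechanism usually credited with encouraging disentangled latent factors.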

Additional Information

© The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Received 15 December 2020; Accepted 22 October 2021; Published 09 November 2021.

We would like to thank Raia Hadsell, Zeb Kurth-Nelson and Koray Kavukcuoglu for comments on the manuscript, and Lucy Campbell-Gillingham and Kevin McKee for their help in running the human psychophysics experiments.

Funding source: NIH (DP1-NS083063, R01-EY030650) and the Howard Hughes Medical Institute.

Data availability: The unprocessed responses of all models to the 2162 face images generated in this study have been deposited in the figshare database (https://doi.org/10.6084/m9.figshare.c.5613197.v2). This includes AAM, VGG (PCA), VAE and β-VAE responses previously published in Chang et al. [6]. The figshare database also includes the anonymised psychophysics data, a file describing how the semantic labels used in one of the psychophysics studies were obtained from the larger list of 46 descriptive face attributes compiled in Klare et al. [61], and the two sample forms used for data collection on Prolific. The raw neural data supporting the current study were previously published in Chang et al. [6] and are available under restricted access because of the complexity of the customised data structure and the size of the data; access can be obtained by contacting Le Chang (stevenlechang@gmail.com) or Doris Tsao (tsao.doris@gmail.com). The face image data used in this study are available in the corresponding databases: FERET face database [55] (https://www.nist.gov/itl/iad/image-group/color-feret-database), CVL face database [54] (http://lrv.fri.uni-lj.si/facedb.html), MR2 face database [56] (http://ninastrohminger.com/the-mr2), PEAL face database [57], AR face database [51] (http://www2.ece.ohio-state.edu/aleix/ARdatabase.html), Chicago face database [53] (https://www.chicagofaces.org) and CelebA face database [52] (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html). Source data are provided with this paper.

Code availability: The code that supports the findings of this study is available upon request from Irina Higgins (irinah@google.com) due to its complexity and partial reliance on proprietary libraries. Open-source implementations of the β-VAE model, the alignment score and the UDR measure are available at https://github.com/google-research/disentanglement_lib.

These authors contributed equally: Irina Higgins, Le Chang.

Author Contributions: C.S., D.T. and M.B. contributed equally to this manuscript. Conceptualisation, I.H., D.H., M.B.; Methodology, Software, Data Curation and Validation, I.H. and L.C.; Formal Analysis, Investigation, Visualisation, Writing - Original Draft, I.H.; Writing - Review and Editing, I.H., C.S., L.C., D.T. and M.B.; Project Administration, V.L.; Supervision, D.H., C.S., D.T. and M.B.; Funding Acquisition and Resources, D.T., M.B. and D.H.

The authors declare no competing interests.

Peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Attached Files

Published - s41467-021-26751-5.pdf

Submitted - 2006.14304.pdf

Supplemental Material - 41467_2021_26751_MOESM1_ESM.pdf

Supplemental Material - 41467_2021_26751_MOESM2_ESM.pdf

Supplemental Material - 41467_2021_26751_MOESM3_ESM.pdf

Supplemental Material - 41467_2021_26751_MOESM4_ESM.xlsx

Files (11.3 MB)

Name	MD5	Size
41467_2021_26751_MOESM2_ESM.pdf	585ac76356e06b0d76473b000ae18684	739.9 kB
2006.14304.pdf	b444a1c69f462de434e1af421c03be1d	4.6 MB
s41467-021-26751-5.pdf	2648271624afc25ce63f4c8d0ee4330d	3.4 MB
41467_2021_26751_MOESM1_ESM.pdf	e22e75f94296c8dc1d9c95d68d385732	2.0 MB
41467_2021_26751_MOESM3_ESM.pdf	6e47d87837f259f3b65b9f8ee0f771ff	305.1 kB
41467_2021_26751_MOESM4_ESM.xlsx	13eb50d22cd1c7fff1b389bc3c0f84ab	301.4 kB
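
The MD5 values listed with each file can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical and not part of this repository, and the file is assumed to have been saved locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, calling `matches_checksum("s41467-021-26751-5.pdf", "2648271624afc25ce63f4c8d0ee4330d")` should return `True` if the published PDF was saved under that name in the working directory and downloaded without corruption.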
