Feature-based encoding of face identity by single neurons in the human
medial temporal lobe
Runnan Cao^1, Jinge Wang^2, Chujun Lin^3, Ueli Rutishauser^4, Alexander Todorov^5, Xin Li^2, Nicholas Brandmeir^6,7, and Shuo Wang^1,7
1 Department of Chemical and Biomedical Engineering, West Virginia University, Morgantown, WV 26506, USA
2 Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA
3 Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA
4 Departments of Neurosurgery and Neurology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
5 Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
6 Department of Neurosurgery, West Virginia University, Morgantown, WV 26506, USA
7 Rockefeller Neurosciences Institute, West Virginia University, Morgantown, WV 26506, USA
Corresponding authors: Runnan Cao (runnan.cao@mail.wvu.edu) and Shuo Wang (wangshuo45@gmail.com)
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.01.278283; this version posted September 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
Abstract
Neurons in the human medial temporal lobe (MTL) that are selective for the identity of specific
people are classically thought to encode identity invariant to visual features. However, it remains
largely unknown how visual information from higher visual cortex is translated into a semantic
representation of an individual person. Here, we show that some MTL neurons are selective for
multiple different face identities on the basis of shared features that form clusters in the
representation of a deep neural network trained to recognize faces. Contrary to prevailing views,
we find that these neurons represent an individual’s face with feature-based encoding, rather than
through association with concepts. The responses of feature neurons depended on neither face
identity nor face familiarity, and the region of feature space to which they were tuned predicted
their responses to new face stimuli. Our results provide critical evidence bridging the
perception-driven representation of facial features in the higher visual cortex and the
memory-driven representation of semantics in the MTL, which may form the basis for declarative memory.
Keywords: Human single-neuron recordings, Medial temporal lobe, Face, Deep neural network, Identity neuron, Feature coding
Main Text
How the brain encodes different face identities is one of the most fundamental and intriguing
questions in neuroscience. There are two extreme hypotheses. The feature-based model posits
that face representations are encoded over a broad and distributed population of neurons (1-4).
Under this model, recognizing a particular individual requires access to many neurons, with each
neuron responding to many different faces that share specific visual features such as shape and
skin texture (e.g., (5) and (6)). Conclusive evidence for feature-based coding, in particular axis-
based feature coding (i.e., neurons parametrically correlate with facial features along specific
axes in face space), has recently been revealed in the non-human primate inferotemporal cortex
(IT) (7-10). In contrast, on the other extreme, the exemplar-based model posits that explicit facial
representations in the brain are formed by highly selective (sparse) but at the same time highly
visually invariant neurons (11-14). Identity neurons that selectively respond to many different
images showing a specific person’s face embody the exemplar-based coding and are common in
the human hippocampus and other parts of the medial temporal lobe (MTL) (13, 14). Recent
studies have shown that the responses of identity neurons are clustered by high-level conceptual
or semantic relatedness (e.g., Bill Clinton and Hillary Clinton) rather than by lower-level facial
features (15, 16). Feature-based and exemplar-based models are not mutually exclusive given
that both types of neurons have been observed in different brain regions; but there appears to be
an abrupt transition from a distributed axis-coding model in the higher visual cortex to a sparse
exemplar-based model in the MTL. The neural computations achieving this transformation
remain little understood. Here, we ask the critical question of how the brain transitions from the
representation of facial features processed in the higher visual cortex to the representation of
identities in the MTL. We hypothesize that there are traces of feature-based encoding in the MTL
and these remaining feature-based responses will enable the transformation from feature-based
coding to exemplar-based coding.
To test this hypothesis, we recorded from 578 neurons in the amygdala and hippocampus (MTL areas) of 5 neurosurgical patients (16 sessions in total; Table S1; Fig. S1) while they performed a one-back task (Fig. 1A; accuracy = 75.7±5.28% [mean±SD across sessions]). Participants
viewed 500 natural face images of 50 celebrities (Fig. 1; 10 faces per identity). 490 neurons had an overall firing rate greater than 0.15 Hz, and we restricted our analysis to this subset, which included 242 neurons from the amygdala, 186 neurons from the anterior hippocampus, and 62 neurons from the posterior hippocampus (Table S1). The responses of 46/490 neurons (9.39%) differed between face identities in a window 250-1000 ms following stimulus onset (i.e., identity neurons; Fig. 1B; Table S1), consistent with prior recordings from the human MTL (13, 15, 16). Of the 46 identity neurons, 17 responded to a single identity (referred to here as single-identity [SI] neurons) and the remaining 29 each responded to multiple identities (referred to here as multiple-identity [MI] neurons). On average, MI neurons encoded 2.55±0.63 identities (Fig. 1F, J). We confirmed these results using an identity selectivity index (d′ between the most- and least-preferred identities; Fig. 1C) and by ordering responses from the most- to the least-preferred identity (Fig. 1D). As expected, SI neurons showed a sharp drop in response after the most-preferred identity, while MI neurons showed consistently steeper changes from the most- to the least-preferred identity compared with non-identity neurons (Fig. 1D). We further confirmed the results using a depth-of-selectivity (DOS) index (Fig. S2A) and single-trial population decoding (Fig. S2B, C), which showed that it was possible to predict the identity of the face shown.
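The two selectivity measures above can be sketched as follows. This is a minimal illustration assuming the standard pooled-variance form of d′ and the usual depth-of-selectivity index; the paper's exact formulas are specified in its methods, and the function names here are our own:

```python
import numpy as np

def d_prime(rates_best, rates_worst):
    """d' between the most- and least-preferred identities.

    rates_best / rates_worst: arrays of single-trial firing rates
    for the most- and least-preferred identity, respectively.
    Assumes the standard pooled-variance definition.
    """
    mu1, mu2 = rates_best.mean(), rates_worst.mean()
    pooled_var = (rates_best.var(ddof=1) + rates_worst.var(ddof=1)) / 2
    return (mu1 - mu2) / np.sqrt(pooled_var)

def depth_of_selectivity(mean_rates):
    """DOS index over the mean response to each of n identities.

    Yields 0 for a completely flat response profile and 1 for a
    response confined to a single identity.
    """
    r = np.asarray(mean_rates, dtype=float)
    n = r.size
    return (n - r.sum() / r.max()) / (n - 1)
```

A neuron firing at 10 Hz for one identity and 0 Hz for the other 49 would score DOS = 1; a neuron responding equally to all identities scores DOS = 0, with identity neurons falling in between.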
It has been shown that some MI neurons encode conceptually related identities (e.g., Bill Clinton and Hillary Clinton) (15, 16) in a way that makes the response of MI neurons invariant to visual features (13, 14, 16). However, it is unknown whether MI neurons can also encode visually (rather than conceptually) similar identities. To answer this question, we extracted features from the images shown to the patients using a deep neural network (DNN), VGG-16, pre-trained to recognize faces (see Fig. S3A, B for the DNN architecture and feature visualization). We then constructed a two-dimensional stimulus feature space using t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction for each DNN layer (Fig. 1G, K and Fig. S4; note that the quantifications below are in this t-SNE space rather than the full-dimensional space of the DNN; also note that the pairwise distances between face examples in the full-dimensional space are preserved in the t-SNE space; Fig. S3D). This feature space was derived solely from the input images, without knowledge of the neural responses or the tuning of neurons. The feature space demonstrated an
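The construction of this feature space can be sketched as follows. Simulated activations stand in for the real VGG-16 layer outputs (an assumption for illustration; the actual analysis would use activations from a face-trained VGG-16, and a real fully connected layer such as fc6 has 4096 units), and the final correlation mirrors the pairwise-distance-preservation check of Fig. S3D:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for one DNN layer's activations: 500 images x 512 units.
# 50 identities x 10 images each, so the features carry cluster structure.
n_units = 512
centers = rng.normal(size=(50, n_units))
features = np.repeat(centers, 10, axis=0) + 0.3 * rng.normal(size=(500, n_units))

# Reduce the layer's features to a two-dimensional stimulus feature space.
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)

# Check that pairwise distances between faces are preserved (cf. Fig. S3D).
rho, _ = spearmanr(pdist(features), pdist(embedding))
print(f"2-D embedding shape: {embedding.shape}, distance correlation rho = {rho:.2f}")
```

Because the embedding is computed from the images alone, any match between neuronal tuning and clusters in this space cannot be an artifact of fitting the space to the neural data.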