Cao et al., Sci. Adv. 10, eado6166 (2024), 4 December 2024
Science Advances | Research Article
COGNITIVE NEUROSCIENCE

Domain-specific representation of social inference by neurons in the human amygdala and hippocampus

Runnan Cao¹*, Julien Dubois², Adam N. Mamelak², Ralph Adolphs³,⁴†, Shuo Wang¹†, Ueli Rutishauser²,⁴*†
Inferring the intentions and emotions of others from behavior is crucial for social cognition. While neuroimaging studies have identified brain regions involved in social inference, it remains unknown whether performing social inference is an abstract computation that generalizes across different stimulus categories or is specific to certain stimulus domains. We recorded single-neuron activity from the medial temporal lobe (MTL) and the medial frontal cortex (MFC) in neurosurgical patients performing different types of inferences from images of faces, hands, and natural scenes. Our findings indicate distinct neuron populations in both regions encoding inference type for social (faces, hands) and nonsocial (scenes) stimuli, while stimulus category was itself represented in a task-general manner. Uniquely in the MTL, social inference type was represented by separate subsets of neurons for faces and hands, suggesting a domain-specific representation. These results reveal evidence for specialized social inference processes in the MTL, in which inference representations were entangled with stimulus type, as expected from a domain-specific process.
INTRODUCTION

Inferring the latent mental states and intentions of other people from observing their behavior is a critical human ability that is at the core of what is often referred to as either "theory of mind" (ToM), "mentalizing" (1, 2), or "social inference" (3, 4). Emerging during early childhood, typically around the age of four (5), this important skill relies on representing the beliefs, desires, intentions, and feelings of others to make sophisticated social interactions possible (6). Atypical social inference is thought to contribute to the difficulties experienced in mental and neurological disorders, including in autism (7, 8), schizophrenia (9), and Parkinson's disease (10, 11) [for review, see (12)]. Social inference has been an active topic of study since the 1970s, ranging from developmental psychology (13) to social neuroscience (14, 15) to philosophy of mind (16). A wide range of tasks has been developed to study it, including the false-belief task (17), pragmatic language comprehension (18), and belief-desire reasoning (19). A key question in social inference work is how the brain represents its own and others' mental states at the neuronal level (6).
Anatomically, a large number of brain regions have been associated with social inference, ranging from the cerebellum to the superior temporal sulcus to the frontal cortex (12, 20, 21), a list that varies greatly depending on the exact task used [see (22, 23) for reviews]. Three common sets of regions that stand out in the literature are the temporo-parietal junction (TPJ) (24, 25), the medial frontal cortex (MFC) (21), and the medial temporal lobe (MTL), which together form important components of the "social brain" (26, 27). The MFC contains several areas of interest to social inference, including the supplementary motor area (SMA), the pre-SMA, the anterior and middle cingulate cortex (ACC/MCC) (28), and the medial prefrontal cortex (mPFC) (29, 30). These subregions have been broadly associated, to varying extents, with the inference of different mental states (31), such as false beliefs (32), deception (20), intentions (33), empathy (34), desires (24), and preferences (35), indicating a prominent role for sectors of the MFC in specific kinds of social inferences (36). The role of frontal regions in social inference is dissociable from their role in "executive functions" more broadly (37, 38), suggesting that social inference processes are distinct specializations rather than reutilization of more general executive processes.
By contrast, a broader role in making social inferences is suggested for the MTL, notably including the amygdala (AMY) and hippocampus (HIPP) (26, 27, 39). The MTL supports processes such as recognition memory (40), social evaluations (27, 41), categorization (42), facial emotion recognition (43, 44), relational processing (45), and latent state inference (46) that are needed for social inference but are not specialized for doing so. The MTL is closely connected both structurally and functionally with the MFC (47, 48), suggesting that these two regions are two nodes in the social inference network.
Several important questions regarding the neural basis of social inference remain open. First, while a wide set of brain regions has been implicated in social inference, it remains unknown what specifically the neurons in these regions contribute functionally, a question that neuroimaging studies alone cannot resolve. Second, although many subregions of the MFC have been linked to social inference, only the mPFC reached a high degree (90%) of reliability of activation across different inference studies (36). It remains unclear how other frontal brain regions, including the pre-SMA and dorsal anterior cingulate cortex (dACC), contribute to social inference. Third, while the MTL has been hypothesized to play an important role in social inference (26), it is not typically identified as part of the social inference network in imaging studies (36, 49). However, lesion studies (e.g., impaired social inference function following amygdala damage) (49–51) as well as prominent representations of faces and judgments about faces at the single-neuron level (52, 53) suggest that the amygdala plays an important role in social inference.
¹Department of Radiology, Washington University in St. Louis, St. Louis, MO 63110, USA. ²Departments of Neurosurgery, Neurology, and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA. ³Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA 91125, USA. ⁴Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA.
*Corresponding author. Email: r.cao@wustl.edu (R.C.); ueli.rutishauser@cshs.org (U.R.)
†These authors contributed equally to this work.

Copyright © 2024 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).
This discrepancy may be partly attributed to limitations such as poor signal-to-noise ratio (SNR) in subcortical areas with blood oxygen level-dependent (BOLD) functional magnetic resonance imaging (fMRI), thus leaving the involvement of the MTL in social inference unclear.
Last, it remains unclear whether the implementation of social inference recruits specific brain regions dedicated to representing mental states (54, 55). This question is part of a long-standing debate on the domain specificity of social processing in the brain, with seminal debates historically focused on whether or not there are regions specialized for processing faces (56). With respect to the processes engaged in social inference, this important question remains unresolved. On one hand, a recent nonhuman primate fMRI study found that part of the MFC was exclusively activated during social interactions (57), and lesion studies in the macaque have found evidence for specifically social valuation in the anterior cingulate cortex (58). On the other hand, human studies suggested that the MFC plays a more general role in subserving executive function-related neural functions rather than social inference specifically (54, 59, 60).
Here, we used single-neuron recordings to study social inference using a well-validated and well-established ToM task. Our focus is on the specific cognitive function of inferring the mental states of others from observing human behavior. We previously developed and validated with fMRI and behavior a task that contrasts physical judgments about social images ("how" an action is being performed) with social inferences about the mental states responsible ("why" the person is performing that action) (61). We note that this task has been adopted in the National Institute of Mental Health (NIMH) Research Domain Criteria framework for the subconstruct of "action perception" within the construct "perception and understanding of others" in the social processes domain of the framework (www.nimh.nih.gov/research/research-funded-by-nimh/rdoc/constructs/action-perception) (62, 63). Given the importance of this task, it is important to anchor its neural correlates in intracranial electrophysiology. Here, we used this same task with intracranial recordings in humans and asked whether social inference-related processes are represented in the responses of single neurons within parts of the MFC (the dACC and pre-SMA) and the MTL (amygdala and hippocampus). The core analysis approach we took was to contrast why with how questions for the very same images, thereby differentiating the neural representations of social inference from those of perceptual judgments (61).
RESULTS

Task and behavior

We used the validated "why/how" social inference task (61, 64, 65) to probe the neural mechanisms of social inference in the human brain. Patients were presented with naturalistic color images and asked to answer questions about the stimuli that required performing social inference. We varied stimulus domains (hand actions versus facial expressions versus nonsocial events) and the type of inference required to answer a question [perceptual judgment of the action probed by how questions (e.g., "is the person smiling?") versus social inference of the cause probed by why questions (e.g., "is the person admiring someone?")] in a blocked design (Fig. 1A and see Materials and Methods for details). In each trial, patients were first shown the question to be answered, then saw a single image, and then made a "yes" or "no" decision. Our analysis generally used a two (inference type: why versus how) by two (category: faces versus hands) by two (choice: yes versus no) factorial design. In 8 of the 14 patients, we also included a third category of images: scenes of natural events that contained neither a face nor hands. We refer to face and hand images as social stimuli and to scene images as nonsocial stimuli. In the latter subset of sessions, the paradigm was a 2 × 3 × 2 design.
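For concreteness, the condition space implied by this design can be enumerated as in the following minimal sketch (illustrative only; the labels and code are ours, not part of the study's analysis pipeline):

```python
# Enumerate the factorial conditions of the blocked why/how task.
from itertools import product

inference_types = ("why", "how")             # social inference vs. perceptual judgment
categories_base = ("face", "hand")           # all 14 patients
categories_ext = ("face", "hand", "scene")   # 8 of 14 patients also saw scene images
choices = ("yes", "no")

design_2x2x2 = list(product(inference_types, categories_base, choices))  # 8 conditions
design_2x3x2 = list(product(inference_types, categories_ext, choices))   # 12 conditions
```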
Patients' responses were compared with normative data acquired independently in (61). We used the data from this group of subjects as the "ground truth" to calculate the accuracy of the judgments made by the subjects in the present study. The patients performed well on both why (Fig. 1, C and D; accuracy: 87.08 ± 10.5% [mean ± SD]; response time: 0.77 ± 0.12 s [mean ± SD]) and how questions (accuracy: 91.87 ± 7.83% [mean ± SD]; response time: 0.68 ± 0.14 s [mean ± SD]). The accuracy for why questions was lower (two-tailed paired t test: t₁₈ = 4.06, P = 0.0007), and the response time for why questions was longer (two-tailed paired t test: t₁₈ = 5.05, P = 8.03 × 10⁻⁵) compared to how questions. This is similar to the normative data (61) and is expected given the additional inferential processing required for why blocks.
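As an illustration of how such session-level behavior can be scored and compared, here is a minimal sketch (assuming per-session arrays of responses aligned to the normative answers; variable and function names are ours):

```python
import numpy as np
from scipy import stats

def session_accuracy(responses, normative):
    """Percent agreement between a patient's yes/no responses and the
    normative (consensus) answers for the same images."""
    responses, normative = np.asarray(responses), np.asarray(normative)
    return 100.0 * np.mean(responses == normative)

def compare_why_how(vals_why, vals_how):
    """Two-tailed paired t test across sessions (e.g., on accuracy or response
    time), pairing why and how blocks within each session."""
    return stats.ttest_rel(vals_why, vals_how)  # returns (t statistic, P value)
```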
Neuronal correlates of social inference in the MTL and MFC

We isolated in total 726 single neurons from the amygdala, hippocampus, dACC, and pre-SMA across 19 sessions in 14 neurosurgical patients (see Fig. 1B for example locations of the electrodes and table S1 for a complete list; n = 236, 158, 141, and 191 neurons from the AMY, HIPP, dACC, and pre-SMA, respectively). For brevity, we refer to AMY and HIPP together as the MTL (n = 394 cells) and the dACC and pre-SMA together as the MFC (n = 332 cells). Only neurons with an average firing rate greater than 0.2 Hz (n = 683) were included in subsequent analyses.
Answering why and how questions required different types of inference for the very same images, thereby allowing us to isolate signatures of inference while controlling for sensory input (which is the same). We first examined neural activity in a single 1-s-long time window following stimulus onset (200 to 1200 ms relative to stimulus onset). A total of 17.9% of neurons in the MFC responded differentially as a function of whether subjects were performing the why or how task ("inference-type neurons"; see Fig. 2E for a summary of selected neurons in each subregion; Fig. 2, B and D shows examples; 56 of 313, 16 in dACC and 40 in pre-SMA; binomial test, P < 10⁻²⁰; three-way analysis of variance (ANOVA); see Materials and Methods). Similarly, 13.2% of MTL neurons differentiated between the two tasks following stimulus onset (Fig. 2, A and C show examples; 49 of 370, 13.2%, 28 in AMY and 21 in HIPP; binomial test, P = 2.58 × 10⁻²⁰). The proportion of neurons doing so was not significantly different between the MTL and MFC (13.2% versus 17.9%; χ² test of proportion: P = 0.09). Among all inference-type neurons, 60 of 105 (57.14%; see Fig. 2, A and B for examples and Fig. 2G for group results, and see fig. S1G for the proportions in each area) showed higher activity in the why task (why-preferring), and 45 (42.86%; see Fig. 2, C and D for examples and Fig. 2H for group results) had a greater response in the how task (how-preferring). The proportions of why- and how-preferring inference-type neurons were similar across brain areas and hemispheres (see fig. S1G and legend for details). As a control, we also repeated the above analysis by selecting neurons with linear regression using response time as a nuisance regressor, with qualitatively similar results (see the Supplementary Materials).
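A minimal sketch of this selection logic, assuming a per-neuron trial table (the authors' exact three-way ANOVA model, including any interaction terms, is specified in their Materials and Methods; the main-effects-only formula below is our simplification):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy import stats

ALPHA = 0.05

def is_inference_type_neuron(trials: pd.DataFrame) -> bool:
    """trials: one row per trial, with columns
    'rate' (firing rate in the 200-1200 ms post-stimulus window, spikes/s),
    'inference' ('why'/'how'), 'category' ('face'/'hand'), 'choice' ('yes'/'no')."""
    model = ols("rate ~ C(inference) + C(category) + C(choice)", data=trials).fit()
    table = sm.stats.anova_lm(model, typ=2)
    # Selected if the main effect of inference type is significant.
    return bool(table.loc["C(inference)", "PR(>F)"] < ALPHA)

def population_binomial_p(n_selected: int, n_total: int) -> float:
    """Binomial test: is the fraction of selected neurons larger than the
    5% expected by chance at alpha = 0.05?"""
    return stats.binomtest(n_selected, n_total, ALPHA, alternative="greater").pvalue
```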
In addition to the above effects following stimulus onset, inference-type neurons also differentiated between task types during the interstimulus interval period that preceded stimulus onset. This was possible because the task is blocked (Fig. 1A; Fig. 2F shows an example neuron; Fig. 2, G and H shows the average firing rate during the baseline period throughout the entire block for all inference-type neurons; and fig. S1, A to F shows the temporal dynamics aligned to trial onset).

We next investigated how the neural population as a whole represented different types of inference. We performed single-trial population decoding on the firing rates following stimulus onset (200 to 1200 ms) of all recorded neurons pooled across patients to distinguish between inference types (why versus how). Decoding accuracy was significantly above chance in both the MTL (Fig. 2I; accuracy = 73.68 ± 4.69% [mean ± SD], P < 0.001, compared against the empirical null distribution) and the MFC (accuracy = 86.39 ± 3.67% [mean ± SD], P < 0.001, compared against the empirical null distribution). Decoding accuracy in the MFC was significantly higher than that in the MTL (difference in accuracy: 12.72%, P = 0.04, compared against the difference of the empirical null distributions), suggesting that the MFC had a stronger association with social inference at the population level (see Fig. 2K for the decoding performance in each subregion of the MTL and MFC). Similar results were obtained when we matched the number of neurons between the MTL and MFC. Inference type was decodable throughout the whole trial, including the pre-cue period (Fig. 2J and see fig. S1L for decoding performance in the MTL and MFC separately). Together, these data show that the type of inference is encoded in both the MFC and MTL.
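A minimal sketch of this kind of pseudo-population decoding with a label-shuffled null (classifier choice, fold count, and permutation scheme are our assumptions; the study's exact procedure is in its Materials and Methods):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def decode_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy (%) for decoding inference type (why vs. how)
    from a trials x neurons matrix of firing rates (200-1200 ms window)."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return 100.0 * cross_val_score(clf, X, y, cv=n_folds).mean()

def permutation_null(X, y, n_perm=1000, seed=0):
    """Empirical null distribution of decoding accuracy obtained by shuffling
    the why/how labels; the observed accuracy is deemed significant if it
    exceeds, e.g., the 95th percentile of this distribution."""
    rng = np.random.default_rng(seed)
    return np.array([decode_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
```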
Generalizability of inference encoding in the human MTL and MFC

We next examined how the encoding of inference type (how versus why) was modulated by other task variables. We first turned to the visual category (i.e., face and hand), which is prominently encoded across the ventral visual pathway (66–68) and the MTL (42, 69). In the human MFC, on the other hand, representations of visual category are task dependent in a manner that is little understood, with encoding in some tasks and weak encoding in others (70).

We examined whether the selectivity of inference-type neurons differed between images showing faces and hands. To do so, we selected inference-type neurons separately in face and hand trials (using a paired t test). A significant number of inference-type cells (Fig. 3C and see fig. S2C for the proportions in each subregion) was identified for both face (45 of 370, 12.16% in the MTL; 45 of 313, 14.38% in the MFC) and hand stimuli (n = 31, 8.38% in the MTL and n = 48, 15.34% in the MFC). Therefore, both the MTL and MFC represented inference types during both face and hand stimuli.
Fig. 1. Task, electrode locations, and behavior. (A) Task paradigm. Each session consisted of 16 blocks of 8 trials, with 128 trials in total. In each block, a set of face images with emotional expressions or hand images depicting intentional actions was paired with questions about motive (why) and implementation (how). The blocks alternated between why and how questions. Images of the same category were shown in neighboring blocks. Each block began with a short fixation and full question presentation. A brief interstimulus interval (ISI) cue was presented as a reminder of the question between image presentations. Independently acquired normative data were used to ensure that the selected images featured unambiguous (i.e., consensus) responses. Each block contained five images eliciting a yes response and three images eliciting a no response. The participants had up to 1.7 s to respond. The task advanced either 0.2 s after a response or when the response time limit was reached. The block onsets were predesigned and fixed, although the block durations were contingent on response times. As a result, session durations were approximately equal across participants. (B) Example electrode locations are shown on an MNI152 template brain (AMY, HIPP, pre-SMA, dACC). Each dot indicates the location of a microwire bundle in one subject. (C) Behavioral performance. Accuracy was calculated by comparing the participants' responses to the normative responses. Each dot represents a session. Only trials in which participants responded were included in the analysis. (D) Reaction time in why versus how trials. ****P < 0.0001.
Strikingly, the inference-type neurons selected from trials in which faces were shown were largely distinct from those selected in trials in which hands were shown in the MTL (Fig. 3D; chi-squared test of independence, P = 0.82) and the MFC, although only marginally so, indicating more overlap in the MFC compared to the MTL (Fig. 3E; chi-squared test, P = 0.08). This result also held in all subregions (Ps > 0.05) of the MTL and MFC (fig. S2C). Consistent results were revealed at the single-trial level (fig. S2, A and B). Together, this single-neuron analysis indicates that inference-type signals may generalize across face and hand stimuli in the MFC, particularly in the pre-SMA. We next tested this prediction at the population level.
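The overlap statistic used above can be illustrated with a short sketch (a chi-squared test of independence on a 2 × 2 contingency table of selected/non-selected neurons; variable and function names are ours):

```python
import numpy as np
from scipy.stats import chi2_contingency

def overlap_chi2(face_selected, hand_selected):
    """face_selected / hand_selected: boolean arrays (one entry per neuron)
    marking whether a neuron was an inference-type neuron when selected on
    face trials or on hand trials, respectively."""
    f = np.asarray(face_selected, dtype=bool)
    h = np.asarray(hand_selected, dtype=bool)
    # Rows: face-selected yes/no; columns: hand-selected yes/no.
    table = np.array([[np.sum(f & h), np.sum(f & ~h)],
                      [np.sum(~f & h), np.sum(~f & ~h)]])
    chi2, p, dof, expected = chi2_contingency(table)  # Yates correction by default for 2x2
    return chi2, p
```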
We next examined how inference type was represented at the population level of all recorded neurons, with the ultimate goal of examining whether representations generalize across hands and faces.
Fig. 2. Representation of inference type. (A to D) Example cells that discriminate between social (why) and perceptual (how) inference. [(A) and (C)] MTL. [(B) and (D)] MFC. (E) Percentage of inference-type neurons in each brain area. Gray bars represent the percentage, and black circles indicate the chance level estimated by a permutation test. (F) Average firing rate during the baseline period (−0.4 to 0 s relative to stimulus onset) for each block for the cell shown in (D). Neighboring blocks for the same condition were collapsed, resulting in 10 inference-alternating blocks. (G and H) Average normalized response in why versus how blocks for all inference-selective neurons. The responses were aligned to the onset of the first trial in each block. Shaded area denotes ±SEM across neurons. A dot indicates a significant difference between the conditions in that bin (P < 0.05, two-tailed t test, corrected by false discovery rate (90) for Q < 0.05, bin size = 500 ms, sliding window = 100 ms). (G) Inference-type neurons (n = 60) that responded more strongly to social inference (why). (H) Inference-type neurons (n = 45) that had a stronger response to perceptual judgment (how). (I and J) Population decoding of inference type. The central mark on each box indicates the median, and the top and bottom edges of the box represent the 75th and 25th percentiles, respectively. "+" symbols indicate outliers. (I) Decoding with mean firing rate on all MTL neurons (n = 370) versus MFC neurons (n = 313). (J) Decoding with a sliding time window on the whole population (n = 683; see Materials and Methods). AMY, amygdala; HIPP, hippocampus; ACC, anterior cingulate cortex; SMA, supplementary motor area.
Fig. 3. Domain-specific inference type encoding. (A and B) Example inference-type cells. The inference-type contrast is shown separately for face and hand stimuli, with different colors (dark colors for face and light colors for hand). (A) An example cell in the MTL that did not generalize across face and hand stimuli. (B) An example cell in the MFC that showed a generalized effect across face and hand. (C) Proportions of inference-type neurons in the MTL and MFC. Yellow: selected with face stimuli; blue: selected with hand stimuli; purple: overlapping for face and hand. (D and E) Distribution of inference-type cells selected using face and hand stimuli, respectively, in the MTL (D) and MFC (E). (F) Population decoding of inference type using within-category decoders (i.e., train and test using either face or hand stimuli only) and cross-category decoders (i.e., train with one category and test on the other). (G) Generalization index of inference decoding (see Materials and Methods for computation). The representation of inference generalized across face and hand in the MFC but not in the MTL. (H and I) Scatter plots of the importance index (see Materials and Methods for details) assigned by an inference-type decoder to each cell built with face stimuli (x axis) or hand stimuli (y axis). (H) MTL cells. (I) MFC cells. (J) Inference-type decoders trained and tested within each category. Only cells (n = 281 in total) that were collected in sessions where scene images were presented in addition to face and hand images were included in the analysis. Legend conventions as in (G). (K) Decoding performance of inference type from cross social-domain decoders: train on face and test on scene (yellow) or train on hand and test on scene (blue). Legend conventions as in (G).
We trained decoders to distinguish between how versus why questions on one visual category (i.e., face) and then tested them on the same visual category (within-category decoding, i.e., face) or the other (cross-category decoding, i.e., hand). Confirming our earlier finding (for which we pooled across hands and faces), inference type was decodable using within-category decoders for both face and hand stimuli in the MTL (Fig. 3F; face: 84.40 ± 5.10% [mean ± SD], P < 0.001; hand: 68.48 ± 5.91% [mean ± SD], P = 0.001) and the MFC (face: 88.92 ± 4.67% [mean ± SD], P < 0.001; hand: 86.48 ± 4.94% [mean ± SD], P < 0.001). Decoding accuracy for face stimuli was higher than that for hand stimuli in the MTL (difference in accuracy with face-hand: 15.92%, P = 0.01, compared against the difference of the empirical null distribution). This was the case in both the AMY (difference in accuracy with face-hand: 7.23%, P = 0.01) and HIPP (difference in accuracy with face-hand: 14.70%, P = 0.01; fig. S2D).

By contrast, decoding accuracy was not significantly different in the MFC (difference in accuracy with face-hand: 2.44%, P = 0.08). However, there were notable differences when looking at the dACC and pre-SMA separately: Decoding accuracy was higher for face stimuli (difference in decoding accuracy face-hand: 9.91%, P = 0.01) in the dACC, whereas the pre-SMA had a higher decoding accuracy for hand stimuli (difference in accuracy hand-face: 3.83%, P = 0.005).
We next turned to examine cross-condition generalization performance (train on face, test on hand, and vice versa). This revealed that, in the MFC, decoding generalized (train with faces and test with hands: 68.05 ± 4.43% [mean ± SD], P < 0.001; train with hands and test with faces: 67.22 ± 4.73% [mean ± SD], P < 0.001). In contrast, in the MTL, cross-condition generalization was not greater than expected by chance (train with faces and test with hands: 53.25 ± 4.37% [mean ± SD], P = 0.17; train with hands and test with faces: 52.78 ± 4.75% [mean ± SD], P = 0.23). This was also the case separately in both the AMY and HIPP. Quantifying this observation with the generalization index (see Materials and Methods for the definition) confirmed it (Fig. 3G and fig. S2E). Consistently, face-selected inference-type neurons and hand-selected inference-type neurons tended to contribute exclusively to the decoding of inference type for one category in the MTL (Fig. 3H; importance index defined using the weight in the decoder for each neuron) but exhibited mixed effects in the MFC (Fig. 3I). These results indicate that, in the MTL, social inference processes are coupled to specific classes of stimuli and do not generalize across stimulus categories (especially so for faces). In contrast, in the MFC, inference processes were domain general across the two types of stimulus categories (faces and hands).
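A minimal sketch of the cross-category generalization analysis and of one plausible form of a generalization index (the paper's exact index definition is in its Materials and Methods; the classifier choice and the index formula below are our assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cross_category_accuracy(X_train, y_train, X_test, y_test):
    """Train a why/how decoder on trials of one stimulus category (e.g., faces)
    and test it on trials of another (e.g., hands); returns accuracy in %."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    return 100.0 * clf.score(X_test, y_test)

def generalization_index(within_acc, cross_acc, chance=50.0):
    """One plausible index (an assumption, not the paper's definition):
    ~1 when cross-category decoding matches within-category decoding,
    ~0 when cross-category decoding falls to chance."""
    return (cross_acc - chance) / max(within_acc - chance, 1e-9)
```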
Generalizability of inference representation between the social versus nonsocial world

While both the MTL and MFC are implicated in social processing (57, 71), they are also involved in the processing of general nonsocial objects (e.g., selectivity to different object categories) (69). This raises the question of whether making inferences in the social and nonsocial world shares a common neural mechanism in these brain areas, a question related to the long-standing debate about whether social processing is specialized in some way. To address this question, we also included images of scenes showing nonhuman natural events in the task in a subset of patients (n = 281 neurons from nine sessions; see Materials and Methods for details). As before, we asked our patients, for the same image, to either judge its perceptual properties (e.g., "is the photo showing rain?") or make inferences about the hidden states that caused what the image shows (e.g., "is it a result of a thunderstorm?"). At the single-neuron level, a significant number of neurons (fig. S3, C and D) discriminated why versus how following the onset of scene images in both the MTL (18 of 174, 10.34%, binomial P = 0.0012; 8.91% in AMY and 12.33% in HIPP; see fig. S3A for an example) and the MFC (18 of 107, 16.82%, P = 1.47 × 10⁻⁶; 13.95% in dACC and 18.75% in pre-SMA; see fig. S3B for an example). Analysis of the single-trial response selectivity index (RSI) confirmed that these neurons discriminated why versus how questions for natural scene stimuli [fig. S3E; Kolmogorov-Smirnov (KS) test: MTL, KS = 0.21, P = 0.59 × 10⁻¹⁷; MFC, KS = 0.22, P = 0.02 × 10⁻¹⁷]. At the population level, inference type was decodable for scene images in both the MTL (Fig. 3J; 63.50 ± 4.76% [mean ± SD]; P = 0.004) and MFC (69.03 ± 4.87% [mean ± SD]; P = 0.004). These results suggest that the MTL and MFC represent inference type also for nonsocial images.

We next repeated the cross-condition generalization analysis for the scene images. First, mirroring our earlier finding, inference-type neurons selected using social stimuli were largely separate from those selected using scene stimuli in both the MTL (3 of the 21 face-selected and 3 of the 14 hand-selected inference-type neurons were also selective for scene inference) and the MFC (3 of the 19 face-selected and 3 of the 16 hand-selected inference-type neurons were also selective for scene inference). Second, single-trial RSI analysis confirmed this result by showing that the inference-type neurons in the MTL selected with social images could not discriminate why versus how conditions for scene stimuli (fig. S3, E and F). In line with the individual-neuron-level results, decoding did not generalize across categories (Fig. 3K; face versus scene and hand versus scene) in either the MTL (train with faces and test with scenes: 49.87 ± 4.84% [mean ± SD], P = 0.42; train with hands and test with scenes: 49.97 ± 4.54% [mean ± SD], P = 0.50) or the MFC (train with faces and test with scenes: 50.27 ± 5.93% [mean ± SD], P = 0.44; train with hands and test with scenes: 50.07 ± 5.43% [mean ± SD], P = 0.54).

Together, our results suggest that the neural substrates in the MTL and MFC for making inferences in the social versus nonsocial domain are domain specific. In contrast, within the social domain, inference in the MFC was domain general between different subtypes of social stimuli (hands and faces).
Representation of visual categories in the MTL and MFC

An open question is whether social inference processes share neural substrates with other cognitive processes that involve the MTL and MFC. Neurons in both areas prominently encode visual categories (42, 69, 70, 72). We therefore started our analysis by examining the encoding of visual category in our dataset. Note that we restricted this analysis to the face and hand stimuli (scene stimuli were not examined for this analysis). As expected, neurons were modulated by visual category following stimulus onset (200 to 1200 ms) in both the MTL (65 of 370, 17.57%, binomial P < 10⁻²⁰; 42 neurons in AMY and 23 neurons in HIPP; see an example in Fig. 4A and group results in Fig. 4C) and the MFC (48 of 286, 16.61%, binomial P = 9.70 × 10⁻¹³; 20 neurons in dACC and 28 neurons in pre-SMA; see an example in Fig. 4B and group results in Fig. 4C). We refer to these neurons as category-selective (CS) neurons. Sixty-one of the 113 CS neurons (53.98%; see Fig. 4, D to F) showed higher activity for faces (face-preferring), with the remaining 52 (46.02%; Fig. 4, D, G, and H) showing a greater response to hands (hand-preferring). The proportions of the two types of neurons were comparable (Fig. 4D) in HIPP (face-preferring: 10 of 23, 43.48% versus