nature human behaviour
https://doi.org/10.1038/s41562-024-02059-4
Registered Report
How trait impressions of faces shape
subsequent mental state inferences
In the format provided by the
authors and unedited
Supplementary information

Supplementary Information for Stage 2 Registered Report

How trait impressions of faces shape subsequent mental state inferences
Table of Contents
• Supplementary information
  o Supplementary Methods for Pilot Studies in Stage 1 Registered Report
  o Supplementary Methods for Causal Effects in Stage 2 Registered Report
  o Supplementary Methods for Interpretations of Ridge Regression Coefficients
• Supplementary Materials
  o Supplementary Figures 1-6
  o Supplementary Tables 1-4

Supplementary information

Supplementary Methods for Pilot Studies in Stage 1 Registered Report

We collected pilot data for 5 randomly selected mental states from 300 participants via Amazon Mechanical Turk (N = 60 per mental state; Age [M = 35.7, SD = 11.6], 36 males and 23 females for “attentive”; Age [M = 33.0, SD = 10.8], 34 males and 26 females for “gleeful”; Age [M = 33.3, SD = 10.1], 39 males and 21 females for “mad”; Age [M = 37.2, SD = 13.4], 38 males and 22 females for “panicked”; Age [M = 37.8, SD = 10.8], 33 males and 27 females for “sorrow”).

We applied the following exclusion criteria to the data. Participant-wise exclusion was applied if a participant i) was not a native English speaker, or ii) had not completed high school, or iii) reported that the meaning of the mental state was not clear, or iv) reported that the meaning of the scenario was not clear, or v) gave a rating of 1 to any of the average-person questions, or vi) failed more than one attention check, or vii) gave more than 90% of the faces the same rating. Trial-wise exclusion was applied if a trial was an outlier in rating or response time (see Methods: Sampling Plan). After exclusion, the remaining sample sizes were 50, 45, 45, 57, and 55 participants for attentive, gleeful, mad, panicked, and sorrow, respectively.
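
The participant-wise criteria can be read as a single predicate over one participant's responses. The following is a minimal sketch of that reading, not the authors' analysis code: the `Participant` record and all of its field names are hypothetical, and the trial-wise outlier rule is not covered because it is defined in the main Methods.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    # All field names are hypothetical; they mirror criteria i) to vii).
    native_english: bool
    completed_high_school: bool
    state_meaning_clear: bool
    scenario_meaning_clear: bool
    average_person_ratings: List[int]  # answers to the average-person questions
    attention_checks_failed: int
    face_ratings: List[int]            # one rating per face


def should_exclude(p: Participant) -> bool:
    """Return True if any participant-wise exclusion criterion applies."""
    # Share of faces that received the participant's most frequent rating.
    top_count = Counter(p.face_ratings).most_common(1)[0][1]
    same_rating_share = top_count / len(p.face_ratings)
    return (
        not p.native_english                                 # i)
        or not p.completed_high_school                       # ii)
        or not p.state_meaning_clear                         # iii)
        or not p.scenario_meaning_clear                      # iv)
        or any(r == 1 for r in p.average_person_ratings)     # v)
        or p.attention_checks_failed > 1                     # vi)
        or same_rating_share > 0.90                          # vii)
    )
```

The criteria are combined with a logical OR, so a participant is dropped as soon as any one of them holds.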

The pilot data showed that individual-level ratings across all faces and participants for each mental state were, as expected, heavily distributed to the right with a longer tail to the left (Ms = 4.6, 4.4, 4.4, 4.3, 4.4; SDs = 1.3, 1.5, 1.4, 1.5, 1.4; skewness = -0.14, -0.17, -0.21, -0.14, -0.28; kurtosis = -0.43, -0.71, -0.49, -0.60, -0.40, for attentive, gleeful, mad, panicked, sorrow, respectively). The floor and ceiling effects were minimal (rating 1 no more than 2% and rating 7 no more than 7% across all five mental states).
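
For readers who want to check how such descriptives relate to the shape of a rating distribution, here is a small sketch. It assumes moment-based (population) estimators and excess kurtosis, which the source does not specify, and the function name `describe_ratings` is hypothetical. A negative skewness corresponds to the longer left tail described above.

```python
def describe_ratings(ratings, scale_min=1, scale_max=7):
    """Descriptives for a list of ratings on a scale_min..scale_max scale."""
    n = len(ratings)
    mean = sum(ratings) / n
    # Central moments (population form; an assumption, see lead-in).
    m2 = sum((x - mean) ** 2 for x in ratings) / n
    m3 = sum((x - mean) ** 3 for x in ratings) / n
    m4 = sum((x - mean) ** 4 for x in ratings) / n
    return {
        "mean": mean,
        "sd": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,           # < 0 means a longer left tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
        "floor_share": ratings.count(scale_min) / n,
        "ceiling_share": ratings.count(scale_max) / n,
    }
```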

The pilot data showed that the between-subject consensus for the majority of mental states was moderate to good (ICC = 0.70, F = 4.03, df1 = 99, df2 = 4851, CI [0.63, 0.76] for attentive; ICC = 0.85, F = 10.63, df1 = 99, df2 = 4356, CI [0.81, 0.88] for gleeful; ICC = 0.81, F = 7.34, df1 = 99, df2 = 4356, CI [0.77, 0.86] for mad; ICC = 0.62, F = 3.50, df1 = 99, df2 = 5544, CI [0.54, 0.70] for panicked; ICC = 0.44, F = 2.31, df1 = 99, df2 = 5346, CI [0.33, 0.55] for sorrow; all ps < 0.001). All CIs are 95% confidence intervals; all ICCs were estimated with a two-way random-effects model on the mean of raters for consistency.
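
As an illustration of the ICC type named above, the sketch below computes the textbook two-way, consistency, average-of-raters coefficient, (MS_rows − MS_error) / MS_rows, from a complete faces × raters matrix. It is a sketch under stated assumptions rather than the authors' pipeline: the function name is hypothetical, and it requires a complete matrix, whereas the pilot data had trial-wise exclusions that this code does not handle.

```python
def icc_consistency_average(ratings):
    """ratings: list of rows, one row per face, one column per rater."""
    n = len(ratings)      # number of faces (targets)
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Between-face sum of squares and residual (face x rater) sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    df_rows, df_error = n - 1, (n - 1) * (k - 1)
    ms_rows, ms_error = ss_rows / df_rows, ss_error / df_error
    icc = (ms_rows - ms_error) / ms_rows
    return icc, df_rows, df_error
```

With 100 faces and 50 raters this gives df1 = 99 and df2 = 4851, matching the degrees of freedom reported for attentive.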

Supplementary Methods for Causal Effects in Stage 2 Registered Report

To test the causal effect of trait impressions from faces on context-specific mental state inferences (Table 1, H4), we focused on a subset of target mental states, one from each mental state dimension. As pre-registered, we first identified the mental states that loaded on each dimension (Table 1, H2a); for each dimension, we then identified the mental state with the greatest unique explained variance by face traits (Table 1, H1b). These procedures resulted in four targeted mental states: embarrassed (from the sentimental mental state dimension), threatened (from the youthful mental state dimension), jealous (from the empathetic mental state dimension), and lonely (from the competent mental state dimension).

We then identified the trait that was most strongly associated with each target mental state to manipulate. Both embarrassed and jealous were most strongly associated with trait femininity. Thus, as pre-registered, we diversified the target traits by selecting the state-trait pairs that generated the greatest absolute average coefficient (i.e., model weights of the full model in H1b) across the four state-trait pairs. These procedures resulted in four state-trait pairs: state embarrassed with trait strong, state threatened with trait white, state jealous with trait feminine, and state lonely with trait leader-like.

For each target trait, we generated the B+ and B− sets for face manipulation. As pre-registered, to maximize the mean trait rating difference between the B+ and B− sets (i.e., maximize trait manipulation) while minimizing the mean state rating difference between the B+ and B− sets (i.e., balance face states), given a similar number of faces in both the B+ and B− sets, we defined the following metric for any B+ and B− sets:
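
\[
\Delta_{trait} = \mathrm{mean}(trait_i \text{ of } B^{+}) - \mathrm{mean}(trait_i \text{ of } B^{-})
\]

\[
\Delta_{state} = \frac{\sum_{k=1}^{8\ states} \left( \mathrm{mean}(state_k \text{ of } B^{+}) - \mathrm{mean}(state_k \text{ of } B^{-}) \right)}{n_{states}}
\]

\[
fitness = \Delta_{trait} - \Delta_{state}
\]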
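
To make the metric concrete, the sketch below evaluates it for two candidate sets. It follows the three definitions term by term, using the signed state differences as they are printed. The data layout (dictionaries keyed by face identifier) and the function name `fitness` are assumptions made for illustration only.

```python
from statistics import mean


def fitness(b_plus, b_minus, trait, states):
    """Fitness of a candidate (B+, B-) pair.

    trait:  dict mapping face id -> mean rating on the target trait
    states: dict mapping state name -> dict of face id -> mean state rating
    """
    # Difference in mean target-trait rating between the two sets.
    delta_trait = mean(trait[f] for f in b_plus) - mean(trait[f] for f in b_minus)
    # Mean-state difference between the sets, averaged over all face-states.
    delta_state = mean(
        mean(s[f] for f in b_plus) - mean(s[f] for f in b_minus)
        for s in states.values()
    )
    return delta_trait - delta_state
```

For example, with hypothetical ratings where B+ exceeds B− by 4.0 on the trait and by 0.5 on the states on average, the fitness is 3.5.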

We then repeated the following optimization procedure for various set sizes (from 3 to 14 faces in each set), each with 10,000 iterations. For each set size N and trait, we defined the initial $B^{+}$ and $B^{-}$ sets to be the top-N faces and bottom-N faces based on the face-trait ratings. Then for each iteration, we randomly selected one face from $\bigcup\{B^{+}, B^{-}\}$, $face_i$, to be potentially replaced. If $face_i \in B^{+}$, we randomly selected one face from $\{\text{top 50 faces} \setminus B^{+}\}$, $face_j$; if $face_i \in B^{-}$, we randomly selected one face from $\{\text{bottom 50 faces} \setminus B^{-}\}$, $face_j$. If $fitness_i \le fitness_j$, we replaced $face_i$ with $face_j$; otherwise, we replaced $face_i$ with $face_j$ with probability $p$:

\[
p = -\frac{fitness_i - fitness_j}{\text{remaining \% of iterations} \ast 50}
\]
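
The search described above has the structure of a stochastic swap search with annealing-style acceptance. The sketch below illustrates that structure only. The acceptance rule for worse swaps, exp(−(fitness_i − fitness_j) / (remaining fraction × temperature)), is a generic simulated-annealing rule that we substitute here as an assumption; it is not a transcription of p as defined in this document. The `temperature` parameter, the function names, and the data layout are likewise hypothetical.

```python
import math
import random
from statistics import mean


def fitness(b_plus, b_minus, trait, states):
    delta_trait = mean(trait[f] for f in b_plus) - mean(trait[f] for f in b_minus)
    delta_state = mean(
        mean(s[f] for f in b_plus) - mean(s[f] for f in b_minus)
        for s in states.values()
    )
    return delta_trait - delta_state


def optimise_sets(trait, states, n, pool=50, iterations=10_000,
                  temperature=1.0, seed=0):
    """Return (best fitness, B+ set, B- set) found for set size n."""
    rng = random.Random(seed)
    ranked = sorted(trait, key=trait.get, reverse=True)
    top_pool, bottom_pool = ranked[:pool], ranked[-pool:]
    # Initial sets: top-n and bottom-n faces on the target trait.
    b_plus, b_minus = ranked[:n], ranked[-n:]
    current = fitness(b_plus, b_minus, trait, states)
    best = (current, list(b_plus), list(b_minus))
    for it in range(iterations):
        remaining = 1.0 - it / iterations  # fraction of iterations left, > 0
        # Both sets hold n faces, so a coin flip followed by a uniform index
        # draws one face uniformly from the union of B+ and B-.
        in_plus = rng.random() < 0.5
        target, source = (b_plus, top_pool) if in_plus else (b_minus, bottom_pool)
        candidates = [f for f in source if f not in b_plus and f not in b_minus]
        if not candidates:
            continue
        trial = list(target)
        trial[rng.randrange(n)] = rng.choice(candidates)
        new_plus, new_minus = (trial, b_minus) if in_plus else (b_plus, trial)
        proposed = fitness(new_plus, new_minus, trait, states)
        if current <= proposed:
            accept = True  # never reject a swap that is at least as good
        else:
            # ASSUMED annealing-style rule; see the lead-in paragraph.
            p = math.exp(-(current - proposed) / (remaining * temperature))
            accept = rng.random() < p
        if accept:
            b_plus, b_minus, current = new_plus, new_minus, proposed
            if current > best[0]:
                best = (current, list(b_plus), list(b_minus))
    return best
```

Tracking the best pair seen so far means the returned fitness can never fall below that of the initial top-N and bottom-N sets, even though worse swaps are sometimes accepted along the way.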

This optimization procedure produced the $B^{+}_{optimal}$ and $B^{-}_{optimal}$ sets that gave the greatest fitness index per trait per set size. All optimization results converged within 10,000 iterations. Finally, to determine the optimal set size (i.e., the number of faces in each set), we compared the gain in fitness and loss in trait manipulation between the optimal solution (i.e., $B^{+}_{optimal}$ and $B^{-}_{optimal}$ sets) and the initial sets across different set sizes (note that the initial set with the top-N faces as the $B^{+}$ set and bottom-N faces as the $B^{-}$ set always gave the greatest trait manipulation). This procedure showed that set size = 4 gave the greatest gain in fitness and the least loss in trait manipulation. Therefore, for each trait, we used the optimal solutions $B^{+}_{optimal}$ and $B^{-}_{optimal}$ of size 4 for trait manipulation.

\[
\text{gain in fitness} = fitness\{B^{+}_{optimal}, B^{-}_{optimal}\} - fitness\{B^{+}_{initial}, B^{-}_{initial}\}
\]

\[
\text{loss in trait manipulation} = \Delta_{trait}^{initial} - \Delta_{trait}^{optimal}
\]

\[
optimizer = \text{gain in fitness} - \text{loss in trait manipulation}
\]
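
A short sketch of how these three quantities could be compared across set sizes follows. Choosing the set size with the largest optimizer value is our reading of the text, and the function names and the input layout are hypothetical.

```python
def optimizer_score(fit_optimal, fit_initial, dtrait_initial, dtrait_optimal):
    """Gain in fitness minus loss in trait manipulation for one set size."""
    gain_in_fitness = fit_optimal - fit_initial
    loss_in_trait_manipulation = dtrait_initial - dtrait_optimal
    return gain_in_fitness - loss_in_trait_manipulation


def best_set_size(results):
    """results: dict mapping set size -> the four inputs of optimizer_score."""
    return max(results, key=lambda size: optimizer_score(*results[size]))
```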

Supplementary Methods for Interpretations of Ridge Regression Coefficients

To test the causal effect of face-traits on scenario-state inferences, we pre-registered that we would select the pairs of face-traits and scenario-states based on the Ridge regression coefficients in our analysis H1b (see Table 1 and Methods in the manuscript). However, interpreting the coefficients from Ridge regressions as the associations between the dependent variable and the independent variables may not be suitable given the parameter shrinkage introduced by Ridge regressions. To address this concern, we carried out three additional analyses (see details below). Results from all analyses confirmed that the interpretation of coefficients from Ridge regressions as the independent variables’ relative associations with the dependent variables is valid.

Specifically, we conducted three additional analyses to validate the interpretation of the coefficients in Ridge regressions: i) ordinary least squares regressions, ii) ‘Ridge-select’ analysis, where we first applied Ridge regression to identify and eliminate non-significant, zeroed-out independent variables, and then applied ordinary least squares regressions to the remaining independent variables, and iii) Lasso regressions. We fitted all methods to the full model: regressing the factor scores (dependent variables corresponding to Fig. 5 in our manuscript) or scenario-state inferences (dependent variables corresponding to Supplementary Fig. 3) on the 13 face-traits while controlling for the 8 face-states. We then compared these results with the coefficients we obtained with the Ridge regression models.

For all three additional analysis methods mentioned above, we fitted the models using a cross-validation procedure similar to the one we preregistered for Ridge regressions (see Methods in the manuscript).

i) For ordinary least squares regressions, we randomly split the data into 80% training and 20% test sets 2,000 times for cross-validation. Unlike Ridge regressions, ordinary least squares regressions require no hyperparameter optimization, so only this outer-loop cross-validation procedure was performed and no nested cross-validation was needed.

ii) For ‘Ridge-select’, both the outer-loop cross-validation and nested cross-validation procedures were performed. Unlike Ridge regressions, an additional bootstrap sampling procedure was introduced at each outer-loop cross-validation iteration to identify the statistically significant, non-zeroed-out independent variables that we kept for the second-stage ordinary least squares regressions. Specifically, at each outer-loop cross-validation iteration, after determining the optimal regularization hyperparameter via the nested cross-validation, we generated bootstrap samples of the training data 2,000 times, each time fitting a Ridge regression model with the optimal hyperparameter to the bootstrap-sampled training data. This procedure yielded a bootstrap distribution for each model coefficient, from which we calculated bootstrap p-values to identify coefficients that were significantly different from zero (using a two-sided test where the 95% confidence interval of the empirical bootstrap distribution of the coefficient did not include zero). Finally, at each iteration of the outer loop, we fitted an ordinary least squares regression model using only these statistically significant independent variables, thereby estimating their coefficients without the influence of shrinkage.

iii) For Lasso regressions, similar to Ridge regressions, both the outer-loop cross-validation and nested cross-validation procedures were performed. That is, we optimized the regularization hyperparameter over a range of 40 log-spaced values between 0.001 and 100 using nested cross-validation. We then used the optimal regularization hyperparameter to estimate the distribution of model prediction accuracy and model coefficients across the outer-loop cross-validation iterations.
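
The hyperparameter grid described for the Lasso regressions can be generated as follows. This is a minimal sketch that assumes both endpoints (0.001 and 100) are included in the grid, which the text does not state explicitly; the function name `log_spaced` is hypothetical.

```python
import math


def log_spaced(start, stop, num):
    """Return num values evenly spaced on a log10 scale from start to stop."""
    lo, hi = math.log10(start), math.log10(stop)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


# Candidate regularization strengths for the nested cross-validation.
alphas = log_spaced(0.001, 100, 40)
```

Consecutive values differ by a constant ratio, so the grid covers small and large penalties with equal resolution on the log scale.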

Results showed that the model weights from all three additional analysis methods validate those from Ridge regressions: The model coefficients obtained from the Ridge regression models were highly correlated with those from ordinary least squares regressions (rs = 0.967, 0.975, 0.988, and 0.959 for the four factor scores respectively, and on average r = 0.927 across 60 scenario-state inferences; for all predictors p < 0.001, two-sided, assessed with permutation test); ‘Ridge-select’ regressions (rs = 0.983, 0.976, 0.909, and 0.967 for the four factor scores respectively, and on average r = 0.943 across 60 scenario-state inferences; for all predictors p < 0.001); and Lasso regressions (rs = 0.993, 0.992, 0.982, and 0.994 for the four factor scores respectively, and on average r = 0.960 across 60 scenario-state inferences; for all predictors p < 0.001). Detailed results are summarized in Supplementary Figures 4 and 5 shown below.
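
The comparison above rests on correlating two coefficient vectors and assessing the correlation with a two-sided permutation test. The sketch below shows one common way to do this. The details of the authors' permutation scheme are not given here, so shuffling one vector, the number of permutations, and the add-one correction to the p-value are all assumptions, as are the function names.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm=10_000, seed=0):
    """Observed r and a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson_r(x, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `x` and `y` would be, for instance, the 13 trait coefficients from the Ridge model and from one of the alternative models.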

Supplementary Materials

Supplementary Figure 1: Experimental procedures. a, Experimental procedures for the scenario-state task. b, Experimental procedures for the face-trait task. c, Experimental procedures for the face-state task.

Supplementary Figure 2: Prediction accuracy of mental state ratings from trait ratings as a function of sample size from pilot studies. Each graph plots the mean prediction accuracy assessed with Pearson’s correlation averaged across the 40 repeats of randomly selecting a participant to remove (y-axis, solid color line) and the 2.5th and 97.5th percentiles across the 40 random repeats (color shaded area) as a function of the number of participants (x-axis). The color dashed line indicates the prediction accuracy with the full sample (N = 50, 45, 45, 57, 55 participants for attentive, gleeful, mad, panicked, sorrow, respectively). The black dashed line indicates the chance level of prediction accuracy, which was the 95th percentile of the null distributions of the prediction accuracies obtained via permutation tests.

Supplementary Figure 3: Regression coefficients of the 13 representatively sampled traits. Each radar plot shows results for one Ridge regression model that regressed the scenario-state inferences on the 13 representative traits while controlling for 8 representative face-states (i.e., the full model in the variance partition analysis). Dots indicate the mean coefficient values averaged across 2,000 cross-validation iterations. Colours of dots indicate the sign of the coefficient (red: positive, blue: negative; greater saturation indicates greater strength).

Supplementary Figure 4: Prediction accuracy (a) and model coefficients (b) compared between Ridge regressions and one of the three additional analysis methods for scenario-state inferences. Each model regresses one of the 60 scenario-state inferences (using the average scenario-state ratings per face on the given mental state) on the 13 face-traits while controlling for the 8 face-states across the faces. The dots indicate the mean prediction accuracies (a) and model coefficients for the 13 traits (b) averaged across 2,000 outer-loop cross-validation iterations obtained using Ridge regressions (x-axis) versus one of the three additional analysis methods (y-axis).

Supplementary Figure 5: Prediction accuracy (a) and model coefficients (b) compared across four different analysis methods for mental state dimensions. Each model regresses one of the four factors (using their factor scores) on the 13 face-traits while controlling for the 8 face-states across the faces. The color indicates results from four different analysis methods: Ridge, OLS, Ridge-select, and Lasso. The violin plots represent the density and distribution of prediction accuracies in (a) and model coefficients in (b). The central line within each plot indicates the mean values across n = 2,000 cross-validation iterations, and the whiskers extend to cover the full range of the values across these iterations.

Supplementary Figure 6: Variance partitions of mental state dimensions. X-axes indicate explained variances (squared Pearson correlation between predicted and actual values). The y-axis indicates different models, one for each mental state dimension. The bar length indicates mean explained variance averaged across 2,000 cross-validation iterations. Error bars indicate the confidence interval (i.e., 2.5th and 97.5th percentiles) across these iterations. The boxes span from the 25th to 75th percentiles and the mid-lines in the boxes indicate the median values across these iterations. Bar colours indicate different types of explained variances (blue: uniquely explained by 13 trait ratings inferred from faces; red: uniquely explained by 8 state ratings inferred from faces; orange: commonly explained by the 13 trait ratings as well as the 8 state ratings inferred from faces); desaturated colours indicate nonsignificant results (i.e., 2.5th percentile below zero).

Terms for Scenario-State Tasks (N = 60)
affectionate: feeling fondness or tenderness
agitated: feeling annoyed, restless, or nervous
amused: finding something funny or entertaining
attentive: paying close attention to something
awe: feeling respect or wonder
awkward: feeling uncomfortable or unpleasant
bored: lacking interest in one’s current activity
brokenhearted: feeling overwhelmed by grief or disappointment
calm: not feeling any worry, anger, or excitement
cheerful: feeling happy or optimistic
compassionate: feeling what the other person is feeling and wanting to help
concerned: feeling worried or anxious
content: being in a state of peaceful happiness
depressed: feeling hopeless, worthless, or lacking interest in life
desperate: feeling a situation is so bad that it is impossible to deal with
distressed: feeling very unhappy, worried, or anxious
doubtful: feeling uncertain about something
embarrassed: feeling uncomfortable or nervous for the attention of others
encouraged: feeling supported and confident
enthusiastic: feeling strong excitement
fear: feeling an unpleasant emotion caused by perceived danger or threat
gleeful: feeling triumphantly joyful
grief-stricken: feeling deeply affected by sorrow that is caused by someone’s death
guilty: feeling shame as a result of bad conduct
hateful: feeling a strong emotional dislike
helpless: feeling lack of power to do anything helpful
hesitant: feeling unsure or tentative
homesick: yearning for home or family
hopeful: feeling optimism about a future event
impatient: feeling restlessly eager
indecisive: not making decisions quickly and effectively
indifferent: having no particular interest, unconcerned
interested: feeling curious or concerned about something
jealous: feeling envy of someone or their achievements and advantages
lonely: feeling unhappy because of not connecting with others
longing: feeling a strong desire especially for something unattainable
love: feeling a deep affection
mad: feeling very angry
nostalgic: yearning for the past, typically for a period or place with happy personal associations
overwhelmed: feeling completely submerged by one’s emotions
panicked: suddenly feeling uncontrollably anxious
petrified: feeling so frightened that one is unable to move
pleased: feeling pleasure and satisfaction
regretful: feeling sad or disappointed over something that has happened or been done
relaxed: feeling free from tension and anxiety
relieved: no longer feeling distressed or anxious
sad: feeling unhappy or disappointed
satisfied: feeling a desire has been fulfilled
self-conscious: feeling excessively conscious of one’s appearance or manner
smug: feeling an excessive pride in oneself or one’s achievements
sorrow: feeling deep distress caused by loss
sorry: feeling apologetic and acknowledging one’s fault or failure
stressed: feeling mentally or emotionally strained or tense
suspicious: feeling cautious distrust of someone or something
sympathetic: feeling concern about someone who is in a bad situation
thankful: feeling grateful for what one has
threatened: feeling others might do harm or act hostilely against oneself
troubled: feeling distressed over problems or conflict
upset: feeling disappointed, worried, or angry
wanting: feeling a strong desire

Terms for Face-Trait Task (N = 13)
easygoing: a person who is relaxed, tolerant, and not prone to rigid rules or bouts of temper
sensitive: a person who is aware of or careful about others' attitudes, feelings, or circumstances
serious: a person who shows deep thoughts and who doesn't smile or laugh easily
abusive: a person who is extremely offensive and insulting
leader-like: a person who can take charge and help a group accomplish a goal
articulate: a person who speaks fluently and clearly, and who can express their ideas well
disorderly: a person who is untidy and not organized
feminine: a person who has qualities or an appearance traditionally associated with women
strong: a person who is physically vigorous and is able to exert great bodily or muscular power
beautiful: a person who looks appealing and physically attractive
youthful: a person who looks young or has qualities associated with young people
unhealthy: a person who is not in good health
white: a person whose face looks like they are Caucasian or have European ancestry

Terms for Face-State Task (N = 8)
contemplating: thinking deeply and carefully
judging: forming an opinion
attentive: paying close attention to something
gloomy: feeling depressed and pessimistic
peaceful: feeling free from disturbance
angry: feeling strong annoyance, displeasure, or hostility
affectionate: feeling fondness or tenderness
bewildered: feeling confused or puzzled

Supplementary Table 1: Definition of mental state terms and trait terms. The definition of each mental state term and trait term will be provided to participants in our experiment instructions to eliminate possible heterogeneity in how each individual understands the meaning of a term. These definitions were obtained from Google dictionary, with necessary modifications to make the definition easy to understand and fit the study context.