Four dimensions characterize comprehensive trait judgments of faces

Chujun Lin 1*, Umit Keles 1, Ralph Adolphs 1,2

1 Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA.
2 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.

* Correspondence to: clin7@caltech.edu
Abstract

People readily attribute many traits to faces: some look beautiful, some competent, some aggressive[1]. These snap judgments have important consequences in real life, ranging from success in political elections to decisions in courtroom sentencing[2,3]. Modern psychological theories argue that the hundreds of different words people use to describe others from their faces are well captured by only two or three dimensions, such as valence and dominance[4], a highly influential framework that has been the basis for numerous studies in social and developmental psychology[5–10], social neuroscience[11,12], and engineering applications[13,14]. However, all prior work has used only a small number of words (12 to 18) to derive underlying dimensions, limiting conclusions to date. Here we employed deep neural networks to select a comprehensive set of 100 words that are representative of the trait words people use to describe faces, and to select a set of 100 faces. In two large-scale, preregistered studies we asked participants to rate the 100 faces on the 100 words (obtaining 2,850,000 ratings from 1,710 participants), and discovered a novel set of four psychological dimensions that best explain trait judgments of faces: warmth, competence, femininity, and youth. We reproduced these four dimensions across different regions around the world, in both aggregated and individual-level data. These results provide the most comprehensive characterization of face judgments to date, and reconcile prior work on face perception with work in social cognition[15] and personality psychology[16].
Main
People attribute a wide range of traits (temporally stable characteristics, see Methods) to other individuals upon viewing their faces, such as demographics (e.g., gender, age), physical appearance (e.g., baby-faced, beautiful), social evaluation (e.g., trustworthy, competent), and personality (e.g., aggressive, sociable)[4,17]. These trait judgments are made ubiquitously and rapidly[1], and are known to influence most subsequent processing of the face, such as conscious perception[18] and memory[19]. Although trait judgments of faces are in many cases inaccurate and reveal more about our own stereotypes than ground truth[1,20], they have major consequences for social decision-making in real life, ranging from success in job markets and social relationships to political elections and courtroom decisions[2,3,21–23].
Despite the considerable amount of work on the topic[1,24–33], it remains unclear how people make these rapid judgments: do they have distinct representations for each of the hundreds of possible words that describe somebody based on the face, or do they map their judgments of the face into a much lower-dimensional space? By analogy, we can perceive (and have words for) many different shades of colors, but they are all the result of a three-dimensional color space. In the case of color, the answer is easier because we know that there are only three kinds of cones in the retina; in the case of trait judgments of faces, we must infer the psychological space from behavioral data (human subjects' ratings of faces on different trait words). Prior approaches have discovered dimensional frameworks that have largely shaped studies both within and outside the field[1,5–12,24,26,34–36], but those approaches used only a small number of trait words (12 to 18) that were common across studies[17,34,37] or in use by lay people[4,5]. Moreover, those words are partly redundant in meaning and may not encompass the full range of trait words that people can use to describe faces, contributing to disagreements in the literature.
Here we argue that to understand the true dimensionality of face judgments, it is essential to investigate a more comprehensively sampled set of judgments. To meet this challenge, we assembled an extensive list of trait words that people use to describe faces from multiple sources[1,3,4,16,17,21,25–31,33,38,39] and applied a pre-trained neural network to derive a representative subset of 100 words (Fig. 1a-d). Similarly, we combined multiple extant face databases and applied a pre-trained neural network to derive a representative subset of 100 face images (Fig. 1e-h) [see Methods]. We verified that our 100 words were indeed representative of the trait words people spontaneously generate for the selected 100 faces (Extended Data Fig. 1a,b; Fig. 1d), and that our 100 faces were representative of the structural physiognomy of natural faces (Extended Data Fig. 1c,d,e; Fig. 1h), although we note that we only used Caucasian faces with no emotional expressions [see Methods]. We collected ratings of the 100 faces on the 100 words both sparsely online (Study 1) [750,000 ratings from 1,500 participants, with repeated ratings for assessing within-subject consistency for every trait] and densely on-site (Study 2) [10,000 ratings from each of 210 participants across North America, Latvia, Peru, the Philippines, India, Kenya, and Gaza]. All experiments were preregistered on the Open Science Framework (see Methods).
Fig. 1: Sampling trait words (a-d) and face images (e-h) to generate a comprehensive set. a, We began by assembling an extensive list of trait words[1,3,4,16,17,21,25–31,33,38,39] spanning all important categories of trait judgments of faces. b, Each adjective was represented with a vector of 300 semantic features used for word embeddings and text classification, extracted with a state-of-the-art neural network that had been pre-trained to assign words to their contexts across 600 billion words[40]. c, Three filters were applied to remove words with similar meanings, unclear meaning, or infrequent usage (see Methods). d, Comparison of the final selected 100 traits (Extended Data Table S1) with spontaneous trait judgments of faces (Extended Data Fig. 1). Uniform Manifold Approximation and Projection (UMAP[41], a dimensionality reduction technique that generalizes to nonlinearities) showed that the 100 selected traits (blue dots; examples labeled in blue) were representative of the words people freely generated to describe spontaneous judgments of the 100 faces (gray dots, see Methods; non-overlapping examples labeled in gray, which were mostly momentary mental states rather than traits). e, For faces we began by assembling a set of frontal, neutral, white faces from three popular face databases[42–44]. f, Each face was represented with a vector of 128 facial features that are used to classify individual identities, extracted with a neural network[45] pre-trained to identify individuals across millions of faces of all different aspects and races. g, Maximum variation sampling[46] was applied to select faces with maximum variability in facial structure in this 128-D space. h, UMAP showed that the final selected 100 faces (stars) were representative of a larger set of frontal, neutral, white faces from various databases[47–49] (dots) [Extended Data Fig. 1].
Four dimensions underlie trait judgments of faces
Study 1 applied exploratory factor analysis (EFA; preregistered) on aggregate-level ratings that participants had given to faces. We confirmed that these ratings showed sufficient variance (Extended Data Fig. 2a), within-subject consistency (assessed with Pearson's correlations, M = 0.47, range = [0.28, 0.84], as well as with linear mixed-effect modeling [preregistered]; Fig. 2), and between-subject consensus (preregistered; all ICCs > 0.60) [Fig. 2 and Methods]. Eight traits with low factorizability were excluded from EFA (Extended Data Fig. 2b; including them did not change the dimensions we eventually found).
We first determined the optimal number of factors to retain in EFA using five widely recommended methods[50,51] (see Methods), as solutions are considered most reliable when multiple methods agree. Four methods (Horn's parallel analysis, Cattell's scree test, optimal coordinates, and empirical BIC) all indicated that the optimal number of factors to retain was four (Extended Data Fig. 3a).
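As an illustration of the first of these methods, the following is a minimal sketch of Horn's parallel analysis in Python (not the authors' code; "ratings" is a hypothetical faces-by-traits matrix of aggregated ratings): factors are retained while their observed eigenvalues exceed those obtained from random data of the same shape.

import numpy as np

def parallel_analysis(ratings, n_iter=1000, seed=0):
    """Horn's parallel analysis: count factors whose observed eigenvalues
    exceed the 95th percentile of eigenvalues from random data."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = ratings.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(ratings, rowvar=False))[::-1]
    rand_eig = np.empty((n_iter, n_vars))
    for i in range(n_iter):
        noise = rng.standard_normal((n_obs, n_vars))
        rand_eig[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(rand_eig, 95, axis=0)
    return int(np.sum(obs_eig > threshold))

# e.g., ratings: 100 faces x 92 traits (after excluding the 8 low-factorizability traits)
# n_factors = parallel_analysis(ratings)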
EFA was thus applied to extract four factors using the minimal residual method, and the solutions were rotated with oblimin for interpretability. The four factors explained 31%, 31%, 11%, and 12% of the common variance in the data (85% in total; 87% in total if five factors were extracted) and were weakly correlated (r13 = -0.33, r14 = -0.23, r23 = 0.21, r24 = 0.33 [ps < 0.05]; r12 = -0.15, r34 = 0.12 [ps > 0.05]). None of the factors were biased by words with particularly low or high within-subject consistency or between-subject consensus, and the trait words occupied the four-dimensional space fairly homogeneously (Fig. 2). We interpreted these four factors as describing judgments of warmth, competence, femininity, and youth (Fig. 2; see Extended Data Fig. 4a for factor loadings), labels that were validated both with an independent set of participant ratings and with word embedding metrics [see Methods].
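This extraction step can be sketched in Python with the factor_analyzer package (a hedged illustration with placeholder data, not the authors' pipeline; the paper's exact preprocessing may differ):

import numpy as np
from factor_analyzer import FactorAnalyzer  # pip install factor-analyzer

# Placeholder for the aggregated ratings: 100 faces x 92 traits
rng = np.random.default_rng(0)
ratings = rng.standard_normal((100, 92))

# Minimal-residual extraction of four factors with an oblique (oblimin) rotation
fa = FactorAnalyzer(n_factors=4, method="minres", rotation="oblimin")
fa.fit(ratings)

loadings = fa.loadings_        # traits x 4 factor-loading matrix
factor_corrs = fa.phi_         # 4 x 4 factor correlation matrix (oblique rotations only)
_, prop_var, cum_var = fa.get_factor_variance()
print(np.round(prop_var, 2), np.round(cum_var[-1], 2))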
Fig. 2: Reliability and dimensionality of comprehensive trait judgments of faces. Upper right scatterplot: within-subject consistency as assessed with linear mixed-effect modeling (y-axis, regression coefficients) plotted against between-subject consensus as assessed with intraclass correlation coefficients (x-axis) for the 100 traits. The color scale indicates the product of the x- and y-values. Four histograms on the diagonal: each plots the distribution of factor loadings across all words in the EFA for one of the four dimensions (see Extended Data Fig. 4a for factor loadings; color code as in the upper right scatterplot). Six scatterplots in the lower left: each plots the factor loadings of all words in the EFA on two of the four dimensions (dots). Labels are shown for a small subset of datapoints (blue dots) due to limited space (see Extended Data Fig. 4b for full labels).
Comparison with existing dimensional frameworks
Prior work[4,5,17,35,37] suggests that the various words people use to describe faces can be represented by two or three dimensions (e.g., valence and dominance[4]). Our findings support the general idea of a low-dimensional space, but revealed four dimensions that differ from those previously proposed. This discrepancy was not explained by methodological differences: we reanalyzed our data using principal components analysis (PCA), a method used in prior work[4,17,37], and reproduced the same four dimensions as reported here (Extended Data Fig. 5a). Instead, our four-dimensional space did not appear in previous studies because of limited sampling of traits in prior work: we interrogated two subsets of our data, each consisting of 13 traits that corresponded to those used in the discovery of the two most popular prior dimensional frameworks (2D and 3D frameworks)[4,37]. Our four-dimensional space was not evident when analyses were restricted to these two small subsets of traits; instead, we reproduced the prior 2D and 3D frameworks, respectively (Extended Data Table 2a-b).
We next used our reproduction of the popular prior 2D framework for more detailed comparisons with our four dimensions. Replicating prior findings[4], we found that judgments of faces on the traits sociable, trustworthy, responsible, and weird were represented by the 2D framework's valence dimension (absolute rs = 0.94, 0.88, 0.86, 0.85 between factor scores on the valence factor we derived and ratings for these words across the 100 faces), but less well represented by our own warmth dimension (absolute rs = 0.47, 0.67, 0.43, 0.23; Extended Data Fig. 4a); the valence and warmth dimensions were moderately correlated (absolute r = 0.41 between factor scores). Similarly, as previously found[4], judgments on aggressive and submissive were represented by the 2D framework's dominance dimension (absolute rs = 0.94, 0.95), but not by our own competence dimension (absolute rs = 0.15, 0.14, ps > 0.05; Extended Data Fig. 4a); these two dimensions were not significantly correlated (r = 0.01) [see Methods].
We directly compared how well different frameworks characterized trait judgments of faces. Using linear combinations of the traits with the highest loadings on each dimension as regressors (two for each dimension, because only two traits loaded on one of the dimensions in the 3D framework), we found that our four-dimensional framework better explained the variance for 82% of the trait judgments (those that were not part of the linear combinations) than did any of the existing frameworks (Extended Data Fig. 5b; mean adjusted R-squared across all predictions was 0.81 for our framework, 0.72 for the 3D framework[37], and 0.72 for the 2D framework[4]).
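A minimal sketch of this comparison for a single held-out trait (variable names are hypothetical): fit the marker-trait linear combination and score it with adjusted R-squared, which penalizes frameworks that use more regressors.

import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(X, y):
    """Adjusted R^2 of a linear combination of marker-trait ratings."""
    n, p = X.shape
    r2 = LinearRegression().fit(X, y).score(X, y)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# X: 100 faces x 8 (the two highest-loading traits per dimension, 4D framework)
# y: ratings of the same 100 faces on a trait not used as a regressor
# Compare adjusted_r2 across the 2D (4 regressors), 3D (6), and 4D (8) frameworks.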
A final question of interest was how our dimensions that characterize face judgments might relate to dimensions that characterize personality, such as the Big Five[16]. We asked an independent sample of 343 participants to rate themselves on a subset of 68 of our trait words that correspond to personality traits (no faces were shown; the task was self-report on the words; see Methods). As expected, the Big Five personality dimensions emerged in this dataset (Extended Data Fig. 5c; using the same EFA method as for the face-trait ratings). Although there was some overlap in the way that our four face-judgment dimensions and the personality dimensions captured the variance in ratings evoked by these 68 words (Extended Data Fig. 5d), further analysis showed distinct hierarchical structures (Extended Data Fig. 5e,f). We conclude that trait judgments of unfamiliar faces (the focus of the present study) and self-reports of personality (as in the Big Five) are best characterized by two distinct dimensional spaces.
Robustness and validity of the four dimensions
We quantified the robustness of our results across different numbers of trait words or participants. We removed trait words one by one (based on their rank-ordered meaning similarity and lack of clarity) and reperformed EFA as before. Our four dimensions were highly robust (with 75 or more of our 100 words, rs between factors were > 0.95; with 65 or more words, rs were > 0.7; Extended Data Table S2.c). Similarly, we randomly removed participants one by one (50 randomizations each) and used the new aggregated ratings for EFA, showing that the four dimensions were robust to participant sample size (Tucker indices of factor congruence > 0.95 for all four factors between the full dataset and all sub-datasets with no fewer than 19 participants per trait). Finally, we extracted the smallest subset of specific trait words that still yielded the original four-dimensional space, a set of 18 words that could be used most efficiently in future studies when collecting ratings for a larger set of traits is not feasible (Extended Data Table S2.d).
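The Tucker index used here is the congruence coefficient between two columns of factor loadings, i.e., an uncentered cosine. A minimal sketch (loading matrices are hypothetical placeholders):

import numpy as np

def tucker_congruence(a, b):
    """Tucker congruence: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    Values above ~0.95 are conventionally read as 'the same factor'."""
    a, b = np.asarray(a), np.asarray(b)
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

# loadings_full, loadings_sub: traits x 4 loading matrices from two EFAs
# [tucker_congruence(loadings_full[:, k], loadings_sub[:, k]) for k in range(4)]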
To confirm our four dimensions and rule out the possibility of a more complex hierarchical structure, we adopted an approach with minimal assumptions, using artificial neural networks (ANNs) and cross-validation to compare different factor structures (see Methods). Autoencoder ANNs that differed in the number of neurons and hidden layers were constructed so as to model the factor structures that we wished to confirm (the existing 2D and 3D[4,37], our 4D, and hierarchical versions thereof). ANNs trained on half of the data and tested on the held-out other half confirmed a four-dimensional representation (explained variance obtained with linear activation functions = 75% on the test data [SD = 0.6%], Extended Data Fig. 6). This performance is comparable to PCA, and the improvement in model performance became trivial beyond four dimensions (explained variance on the test data increased by 18%, 5%, and 5% from 1 to 4 neurons in the hidden layer, but by less than 1% beyond 4 neurons). The four-dimensional representation learned by the ANN reproduced our four dimensions (mean rs = 0.98, 0.92, 0.91, 0.94 [SDs = 0.01, 0.05, 0.02, 0.05] between our original factor loadings from EFA and the ANN's decoder-layer weights after varimax rotation). Adding hierarchical structure (additional layers to the ANN) did not explain more variance (Extended Data Fig. 6e,f).
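A minimal sketch of the kind of linear-bottleneck autoencoder this analysis describes (PyTorch; training details here are illustrative, and the paper's exact architectures, including the hierarchical variants, are given in its Methods):

import torch
import torch.nn as nn

class LinearAutoencoder(nn.Module):
    """Autoencoder with a k-neuron bottleneck and linear activations,
    so a k-dimensional factor structure is the most it can express."""
    def __init__(self, n_traits=100, k=4):
        super().__init__()
        self.encoder = nn.Linear(n_traits, k)
        self.decoder = nn.Linear(k, n_traits)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def fit(model, train, epochs=2000, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(train), train)
        loss.backward()
        opt.step()
    return model

def explained_variance(model, test):
    with torch.no_grad():
        resid = test - model(test)
    return float(1 - resid.var() / test.var())

# train/test: two halves of the (faces x traits) rating matrix as float tensors.
# Compare explained_variance across bottleneck sizes k = 1..6, and inspect
# model.decoder.weight (n_traits x k) as the analogue of a loading matrix.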
Generalizability across different countries and regions
Prior work has reported both common[7] and discrepant dimensions in different cultures[4,17,24,35,37]. To test the generalizability of our findings, we conducted a second preregistered study to collect data across seven different regions of the world. We first analyzed the aggregate-level ratings for each sample (preregistered; we confirmed these ratings had satisfactory consistency and consensus, see Methods).
We began by asking whether the seven samples shared a similar correlation structure (the Pearson correlation matrix across trait ratings) with the sample in Study 1, using representational similarity analysis[33] [RSA; Fisher z-transformation was applied before computing the correlation between correlation matrices]. Highly similar correlation structures were found across samples (RSAs with Study 1 = 0.96, 0.92, 0.85, 0.85, 0.75, 0.83, 0.86 for North America, Latvia, Peru, the Philippines, India, Kenya, and Gaza, respectively). These high RSAs strongly suggest that a similar psychological space underlies face judgments across different samples.
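A minimal sketch of this RSA computation (hypothetical names; we assume, as one reasonable reading of the description above, that the correlation between correlation matrices is computed over their off-diagonal entries after Fisher z-transformation):

import numpy as np

def rsa(ratings_a, ratings_b):
    """Similarity between two trait-correlation structures: correlate the
    Fisher z-transformed off-diagonal entries of the correlation matrices."""
    corr_a = np.corrcoef(ratings_a, rowvar=False)  # traits x traits
    corr_b = np.corrcoef(ratings_b, rowvar=False)
    iu = np.triu_indices_from(corr_a, k=1)         # off-diagonal entries only
    z_a, z_b = np.arctanh(corr_a[iu]), np.arctanh(corr_b[iu])
    return np.corrcoef(z_a, z_b)[0, 1]

# ratings_a, ratings_b: faces x traits rating matrices from two samples
# rsa(study1_ratings, latvia_ratings) -> e.g., 0.92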
Parallel analysis, optimal coordinates, and empirical BIC all showed that a four-dimensional space was most common across samples (in 5 of 7 samples: North America, Latvia, Peru, the Philippines, India) [Fig. 3a and Extended Data Fig. 3b-h]. We therefore applied EFA to extract four factors from each sample. Results showed that the warmth, competence, femininity, and youth dimensions emerged in multiple samples (interpreted based on factor loadings, see Extended Data Fig. 7). We further computed Tucker indices of factor congruence (the cosine between pairs of factor-loading vectors), which confirmed that the four-dimensional space was largely reproduced across samples (Fig. 3b), but, as expected, reproducibility was attenuated by the available data quality (as assessed by within-subject consistency, Fig. 3c).
Fig. 3: Dimensionality of comprehensive trait judgments of faces across different samples. a, Eigenvalue decomposition. Dots plot the eigenvalues of the first 10 factors across seven samples, indicated by different colors. b, Tucker indices of factor congruence. Columns indicate the four dimensions found in Study 1: warmth (W), competence (C), femininity (F), and youth (Y). Rows indicate the dimensions found in the samples from North America (NA), Latvia (LV), Peru (PE), the Philippines (PH), Kenya (KE), India (IN), and Gaza (GZ). Numbers report the Tucker indices (with orthogonal Procrustes rotation). The color scale shows the sign and strength of the indices. c, Individual within-subject consistency by sample (assessed with Pearson's correlations). Every participant in Study 2 had rated a subset of 20 traits twice for all faces to provide an assessment of individual data quality in terms of within-subject consistency.
Reproducibility across individual participants
So far, we have reproduced the four-dimensional space across samples, but we have not ruled out the possibility that this space might be an artifact of aggregating data across participants. Could the same four-dimensional space be reproduced in a single participant? This important question has been difficult to address since one needs complete data per participant. We met this challenge by collecting ratings on all traits for all faces from every participant in Study 2 (requiring approximately 10 hours of testing per participant; see Methods).
We first performed RSA to investigate whether single participants (n = 86 who had complete datasets for all traits after data exclusion; see Methods) shared the correlation structure of our Study 1 sample. RSAs varied considerably across participants (range = [0.14, 0.85], M = 0.56, SD = 0.16) and, as expected, were attenuated by data quality as assessed by within-subject consistency (Fig. 4a,b).
We next analyzed the dimensionality of each individual dataset. Parallel analysis (preregistered) showed that a four-dimensional space was most common (Fig. 4c) but, again, attenuated by data quality (four-dimensional spaces were found for data with higher within-subject consistency than data that produced other-dimensional spaces [unpaired t-test, t(34.57) = 3.29, p = 0.001]). We therefore applied EFA to extract four factors from each participant's dataset and computed their factor congruence with the data from Study 1. We found that our four dimensions were reproduced in some participants (see examples of factor loading matrices in Extended Data Fig. 8a, and Tucker indices for all participants in Extended Data Fig. 8b), but also found considerable individual differences, in line with prior research[52].
Fig. 4: Dimensionality of comprehensive trait judgments of faces in individual data. a, Representational similarity between aggregated data from Study 1 and individual-level data from Study 2, for individuals who had complete data after exclusion (n = 86, see Methods). Colors indicate different samples (as in Fig. 3). Boxplots indicate the minima (bottommost line), first quartiles (box bottom), medians (line in box), third quartiles (box top), and maxima (topmost line) of RSAs. b, Correlation between within-subject consistency and RSA (R = 0.66, p < 0.001). Each point plots an individual's within-subject consistency (x-axis) and that individual's RSA with the aggregated data in Study 1 (y-axis). c, Distribution of dimensionality (from parallel analysis) across 86 individual-level datasets.
Discussion
Across two large-scale, preregistered studies we found that comprehensive trait judgments of faces are best described by a four-dimensional space, with dimensions interpreted as warmth, competence, femininity, and youth (Fig. 2, Extended Data Fig. 4). This finding was largely reproduced across samples from different regions, even using different languages (Spanish in Peru) [Fig. 3, Extended Data Fig. 7], as well as in individual participants (although this was more difficult to assess, due to data quality) [Fig. 4 and Extended Data Fig. 8]. We showed that our divergence from prior work was not due simply to methodological differences, but to the prior lack of comprehensively and representatively sampled trait words (Fig. 1, Extended Data Figs. 1, 5a,b, 6, and Table S2a,b).
These findings help to reconcile studies of face perception with the broader social cognition literature, which has long theorized that warmth and competence are two universal dimensions of social cognition[15]. The other two dimensions we found, femininity and youth, are likely linked to overgeneralization[27] and corroborate recent neuroimaging findings on social categorization from face perception[32,53]. With the inclusion of a representative list of personality words[16], our results also provide new insights into the distinctions between the dimensions people use to describe people from faces and those derived from self-reported personality, such as the Big Five (Extended Data Fig. 5d,e,f; see also Methods).
Despite the predominance of our four-dimensional space, we also found notable variation across samples and individuals (Figs. 3, 4). Since the sources of this variation are unknown and may largely reflect measurement error (Figs. 3c, 4b), we refrain from drawing any specific conclusions about cultural differences, for which larger-scale studies focusing on cultural effects will be needed. Similarly, conclusions about individual differences will require future studies that collect much denser, and likely longitudinal, data in individual participants. Face stimuli incorporating various races or emotional expressions will likely modify the dimensions of face judgments[1,24,27], as will viewing angle, background, and other context effects. Our findings provide the most comprehensive characterization of trait judgments from the physiognomy of faces alone, yielding candidate mental dimensions to investigate with respect to all these further variables, as well as in neuroimaging studies of face judgments[54].
Methods
Sampling of trait words
Here we follow the definition of a biological trait as being a temporally stable characteristic. Traits in our study include personality traits as well as other temporally stable characteristics that people spontaneously infer from faces, such as age, gender, race, socioeconomic status, and social evaluative qualities (Extended Data Fig. 1a, e.g., "young", "female", "white", "educated", "trustworthy"). By contrast, we excluded state attributions, such as "smiling" or "thinking" (words that can describe both trait and state variables were not excluded; e.g., we included "happy," but disambiguated its usage as a trait in our instructions to participants, e.g., "A person who is usually cheerful;" Extended Data Table S1).
Our goal was to representatively sample a comprehensive list of trait words that are used to describe people from their faces. We derived a final set of 100 traits (Extended Data Table 1) through a series of combinations and filters (detailed below; also in our preregistration at https://osf.io/6p542). These 100 traits were further verified to be representative of words that people freely generate to describe trait judgments of our face stimuli (Fig. 1d and Extended Data Fig. 1a,b).
To derive the final set of trait words, we first gathered an inclusive list of 482 adjectives and 6 nouns that included all major categories of trait judgments of faces: demographic characteristics, physical appearance, social evaluative qualities, personality, and emotional traits, from multiple sources[1,3,4,16,17,21,25–31,33,38,39]. Many of the 482 adjectives were synonyms or antonyms. To avoid redundancy while conserving semantic variability, we sampled these adjectives according to three criteria: their semantic similarity (detailed below), clarity in meaning (from an independent set of 29 MTurk participants), and frequency in usage (detailed below). For words with similar meanings, clarity was the second selection criterion (the one with the highest clarity was retained). For those with similar meanings and clarity, usage frequency was the third selection criterion (the one with the highest usage frequency was retained).
To quantify the semantic similarity between these 482 adjectives, we represented each of them as a vector of 300 computationally extracted semantic features used for word embeddings and text classification, provided by a neural network within the FastText library[40]; this neural network had been trained on Common Crawl data of 600 billion words to predict the identity of a word given a context. We then applied hierarchical agglomerative clustering (HAC) to the word vectors based on their cosine distances to visualize their semantic similarities. To quantify clarity of meaning, we obtained ratings of clarity from an independent set of participants tested via MTurk (N = 31, 17 males, age M = 36, SD = 10). To quantify usage frequency, we obtained the average monthly Google search frequency for the bigram of each adjective (i.e., the adjective together with the word "person" added after it) using the keyword research tool Keywords Everywhere (https://keywordseverywhere.com/).
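A minimal sketch of this embedding-and-clustering step, assuming the fasttext Python bindings and SciPy; the word list and clustering threshold here are illustrative, not the paper's settings:

import fasttext                      # pip install fasttext
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Pre-trained 300-D Common Crawl vectors, e.g. cc.en.300.bin from fasttext.cc
model = fasttext.load_model("cc.en.300.bin")

adjectives = ["trustworthy", "honest", "reliable", "aggressive", "hostile"]
vectors = np.stack([model.get_word_vector(w) for w in adjectives])

# Hierarchical agglomerative clustering on cosine distances between word vectors;
# words falling in the same cluster are candidates for redundancy filtering
tree = linkage(vectors, method="average", metric="cosine")
clusters = fcluster(tree, t=0.4, criterion="distance")  # threshold is illustrative
print(dict(zip(adjectives, clusters)))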
The 94 adjectives representatively sampled using the above procedures, together with the additional 6 nouns, constituted our final set of 100 trait words. To verify the representativeness of these 100 trait words, we compared the distributions of our selected words and of the 973 words human subjects freely generated to describe their spontaneous impressions of the same faces (see Extended Data Fig. 1a and Methods below), using the 300 computationally extracted semantic dimensions. We visualized these distributions using Uniform Manifold Approximation and Projection (UMAP[41]) as shown in Fig. 1d.
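A minimal sketch of such a UMAP comparison with the umap-learn package (placeholder vectors stand in for the FastText embeddings of the two word sets):

import numpy as np
import umap                          # pip install umap-learn
import matplotlib.pyplot as plt

# Placeholders; in practice these are the 300-D FastText vectors of the
# 100 selected trait words and the 973 freely generated words.
rng = np.random.default_rng(0)
selected_vecs = rng.standard_normal((100, 300))
generated_vecs = rng.standard_normal((973, 300))

all_vecs = np.vstack([selected_vecs, generated_vecs])
embedding = umap.UMAP(metric="cosine", random_state=0).fit_transform(all_vecs)

plt.scatter(*embedding[100:].T, c="gray", s=5, label="freely generated words")
plt.scatter(*embedding[:100].T, c="blue", s=15, label="selected 100 traits")
plt.legend()
plt.show()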
To ensure that the dimensionality of the meanings of the words that we used was not limiting the dimensionality of the four factors we discovered in our study, we derived a similarity matrix among our 100 words using the FastText vectors of their meanings in the specific one-sentence definitions we gave to participants in the experiments (Extended Data Table S1; basic stop-words such as "a", "about", "by", "can", "often", "others" were removed from the one-sentence definitions for the computation of vector representations), and then conducted factor analysis on the similarity matrix. Parallel analysis, the optimal coordinate index, and Kaiser's rule all suggested 13 dimensions; Velicer's MAP suggested 14 dimensions, and empirical BIC suggested 5 dimensions (empirical BIC penalizes model complexity). We used EFA to extract 5 and 13 factors using the same method as for the trait ratings (13 factors explained the same common variance as 14 factors, 70%; 5 factors explained 60%; factors were extracted with the minimal residual method and rotated with oblimin to allow for potential factor correlations). None of the dimensions obtained bore resemblance to our four reported dimensions, arguing that the mere semantic similarity structure of our 100 trait words was not a constraint in deriving the four factors that we report.
Sampling of face images
Our goal was to derive a representative set of neutral, frontal, white faces of high quality (clear, direct gaze, frontal, unoccluded, and high resolution) that are diverse in facial structure. We aimed to maximize variability in facial structure while controlling for factors such as race, expression, viewing angle, gaze, and background, which our present project did not intend to investigate. We first combined 909 high-resolution photographs of male and female faces from three publicly available face databases: the Oslo Face Database[44], the Chicago Face Database[43], and the Face Research Lab London Set[42]. We then excluded faces that were not front-facing, did not have direct gaze, or had glasses or other adornments obscuring the face. We further restricted ourselves to images of Caucasian adults with neutral expression. This yielded a set of 426 faces from the three databases.
To reduce the size of the stimulus set while conserving variability in facial structure, we sampled from the 426 faces using maximum variation sampling. For each image, the face region was first detected and cropped using the dlib library[45], and then represented with a vector of 128 computationally extracted facial features for face recognition, using a neural network provided within the dlib library that had been trained to identify individuals across millions of faces of all different aspects and races with very high accuracy[45]. Next, we sampled 50 female faces and 50 male faces that respectively maximized the sum of the Euclidean distances between their face vectors. Specifically, a face image was first randomly selected from the female or male sampling set, and then other images of the same gender were selected so that each new selected image had the farthest Euclidean distance from the previously selected images. We repeated this procedure with 10,000 different initializations and selected the sample with the maximum sum of Euclidean distances. We repeated the whole sampling procedure 50 times to ensure convergence of the final sample. All 100 images in the final sample were high-resolution color images with the eyes at the same height across images, a uniform grey background, and a standard cropped size. See preregistration at https://osf.io/6p542.
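A minimal sketch of the embedding and greedy selection described above, assuming the face_recognition package (a wrapper around dlib's 128-D face descriptor); file paths and the single-start greedy loop are illustrative, whereas the paper reran the procedure over 10,000 initializations and kept the sample with the largest total distance:

import numpy as np
import face_recognition             # pip install face_recognition (wraps dlib)

def encode(paths):
    """128-D dlib face descriptor per image (assumes one face per image)."""
    return np.stack([face_recognition.face_encodings(
        face_recognition.load_image_file(p))[0] for p in paths])

def max_variation_sample(vectors, k=50, seed=0):
    """Greedy maximum variation sampling: start from a random face, then
    repeatedly add the face with the largest summed Euclidean distance
    to everything selected so far."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(vectors)))]
    while len(selected) < k:
        dists = np.linalg.norm(
            vectors[:, None, :] - vectors[selected][None, :, :], axis=2).sum(axis=1)
        dists[selected] = -np.inf   # never re-select a chosen face
        selected.append(int(np.argmax(dists)))
    return selected

# vectors = encode(female_paths); indices = max_variation_sample(vectors, k=50)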
To verify the representativeness of our selected 100 face images, we again performed UMAP analysis[41] to compare the distribution of our selected faces with a) N = 632 neutral, frontal, white faces from a broader set of databases[47–49] (Fig. 1h) and b) N = 5376 white faces "in the wild"[55,56] that varied in angle, gaze, facial expression, lighting, and backgrounds (Extended Data Fig. 1c), using the 128 computationally extracted facial identity dimensions[45] as well as 30 traditional facial metric dimensions[43] (Extended Data Fig. 1d-e).
Freely generated trait words
To verify that our selected 100 trait words were indeed representative of the trait judgments people spontaneously make from faces, we collected an independent dataset from participants who freely generated words about the person that came to mind upon viewing the face. As preregistered, 30 participants were recruited via MTurk (see preregistration at http://bit.ly/osfpre4); different from the preregistration, we included participants of any race rather than only Caucasian participants (27 participants were white, 3 participants were black).
Participants viewed the 100 face images one by one, each for 1 second, and typed in the words (preferably single-word adjectives) that came to mind about the person whose face they just saw. Participants could type in as many as ten words and were encouraged to type in at least four words (the number of words entered per trial, that is, the words entered by a participant for a face, ranged from 0 words [for 8 trials] to 10 words [for 190 trials], with a mean of 5 words). There was no time limit; participants clicked "confirm" to move on to the next trial when they finished entering all the words they wanted to enter for the current trial. All data can be accessed at https://osf.io/4mvyt/.
Study 1 Participants
All studies in this report were approved by the Institutional Review Board of the California Institute of Technology and informed consent was obtained from all participants. We predetermined our sample size for Study 1 based on a recent study that investigated the point of