ARTICLE

Four dimensions characterize attributions from faces using a representative set of English trait words

Chujun Lin¹✉, Umit Keles¹ & Ralph Adolphs¹,²
People readily (but often inaccurately) attribute traits to others based on faces. While the details of attributions depend on the language available to describe social traits, psychological theories argue that two or three dimensions (such as valence and dominance) summarize social trait attributions from faces. However, prior work has used only a small number of trait words (12 to 18), limiting conclusions to date. In two large-scale, preregistered studies we ask participants to rate 100 faces (obtained from existing face stimulus sets), using a list of 100 English trait words that we derived using deep neural network analysis of words that have been used by other participants in prior studies to describe faces. In Study 1 we find that these attributions are best described by four psychological dimensions, which we interpret as "warmth", "competence", "femininity", and "youth". In Study 2 we partially reproduce these four dimensions using the same stimuli among additional participant raters from multiple regions around the world, in both aggregated and individual-level data. These results provide a comprehensive characterization of trait attributions from faces, although our conclusions are limited by the scope of our study (in particular, only white faces and English trait words were included).
https://doi.org/10.1038/s41467-021-25500-y OPEN

¹ Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA. ² Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA. ✉ email: clin7@caltech.edu

NATURE COMMUNICATIONS | (2021) 12:5168 | https://doi.org/10.1038/s41467-021-25500-y | www.nature.com/naturecommunications
People attribute a wide range of traits (temporally stable characteristics, see Methods) to other individuals upon viewing their faces, such as demographics (e.g., gender, age), physical appearance (e.g., baby-faced, beautiful), social evaluation (e.g., trustworthy, competent), and personality (e.g., aggressive, sociable) [1, 2]. These trait attributions are made ubiquitously and rapidly [3], and are known to influence most subsequent processing, such as conscious perception [4] and memory [5] of the face. Although trait attributions from faces may not reflect people's actual traits, and reveal more about our own biases and stereotypes [3, 6, 7], they can influence social decision-making in real life, ranging from success in job markets and social relationships to political elections and courtroom decisions [8–12].
Despite the considerable amount of work on the topic [3, 13–22], it remains unclear how people make these rapid attributions: do they have distinct representations for each of the hundreds of possible words that describe somebody based on the face (which might well vary depending on the language), or do they map their attributions of the face into a much lower-dimensional psychological space? By analogy, we can perceive (and have words for) many different shades of colors, but they are all the result of a three-dimensional color space. In the case of color, the answer is easier because we roughly know the biological mechanism for these perceptions (i.e., there are only three kinds of cones in the retina); in the case of trait attributions from faces, the biological mechanism underlying particular perceptions is unclear and we must infer the descriptive dimensions from behavioral data (typically, using participants' ratings of faces on different trait words). Prior approaches have discovered dimensional frameworks that have largely shaped studies both within and outside the field [3, 13, 15, 23–35], but those approaches used only a small number of trait words (typically 12–18) that were common across studies [2, 31, 36] or in use by lay people [1, 23]. Moreover, those words are partly redundant in meaning and may not encompass the full range of trait words that people can use to describe faces. Consequently, the psychological dimensions suggested by such prior studies may be incomplete.
Here we argue that to understand the comprehensive dimensionality of trait attributions from faces, it is essential to investigate a more comprehensively sampled set of trait words. To meet this challenge, we assembled an extensive list of English trait words that people use to describe faces from multiple sources [1–3, 8, 10, 14–20, 22, 37–39] and applied a data-driven approach with a pretrained neural network to derive a representative subset of 100 traits (Fig. 1a–d). Similarly, we combined multiple extant face databases and applied a data-driven approach with a pretrained neural network to derive a representative subset of 100 neutral face images of white, adult individuals (Fig. 1e–h) [see Methods]. We focus on English words because English is the most-spoken language (native and learned) around the world [40]. We limit our stimulus images to frontal-facing faces of white, adult individuals with what are perceived as neutral facial expressions, in an attempt to control for factors, such as racial and age discrimination, that are known to bias face perception [23, 41–44]. Relatedly, this restriction of the variance in our face stimuli served to increase statistical power, by eliminating factors that our study did not intend to investigate, such as facial expressions (see Methods). We verified that the 100 selected traits were representative of the trait words English-speaking people spontaneously use to describe the 100 face images (Fig. 2a, b) and that the 100 selected face images were representative of the physical structure of white, adult faces (Fig. 2c, d). We collected ratings of the 100 faces on the 100 traits both sparsely online (Study 1) [750,000 ratings from 1500 participants, with repeated ratings for assessing within-subject consistency for every trait] and densely on-site (Study 2) [10,000 ratings from each of 210 participants across North America, Latvia, Peru, the Philippines, India, Kenya, and Gaza]. All experiments were preregistered on the Open Science Framework (see Methods).
Results

Broader considerations and study limitations. Our study is a basic research investigation of the psychological dimensions that people use to make social trait attributions from unfamiliar faces. It offers a specific methodological advance over prior work in this field by representatively sampling study stimuli. This method starts with more comprehensive sets of stimuli, capitalizes on advances in machine learning algorithms to quantify stimuli, and applies statistical procedures to sample stimuli. This method could be flexibly adapted to a wide range of stimuli and other research domains. This methodological improvement revealed four dimensions that differ to some degree from the dimensions discovered in previous work, highlighting the importance of representative stimulus sampling and suggesting that the psychological space people use to organize social attributions from faces is more complex than previously thought.
The four dimensions we found here describe how people attribute traits to others based on faces; they do not describe people's actual traits. In fact, we cannot make any claims about whether or not these attributions were valid, since we did not measure the actual traits of the people whose faces were used as stimuli. It is generally well known that people's trait attributions from faces are not accurate [3] but instead reflect the rater's biases and stereotypes. Indeed, our findings identify four important dimensions that may contribute to the biases and stereotypes that people exhibit when viewing faces, which may inform future work on stereotyping.
We attempted to extend and improve on prior work by being more comprehensive in several aspects, including preregistering our studies, representatively sampling our stimuli, analyzing our data with different methods, testing the robustness of our findings against different factors, and replicating our study in different samples and individuals around the world. However, our study also has important limitations, which constrain the generality of our findings.
First, our study is unlikely to be representative with respect to faces in general. We utilized frontal images of adult, white faces with what would be perceived as a neutral expression. This decision was based on three considerations: controlling for factors, such as race and age discrimination, that are known to influence face perception; relatedly, increasing the statistical power for our aims by reducing those sources of variance that fall outside the scope of our study (e.g., facial expressions); and the availability of a sufficient number of faces for representative sampling in extant face databases.
Second, it remains unknown to what extent our study is representative of the concepts that people use to make social trait attributions. It is possible that a broader range of concepts is commonly used but was not representatively sampled in our study; for instance, concepts denoted by derogatory, swear, or slang words. It is also notable that our study focuses on the concepts denoted by words in English, only one of the more than 6000 languages that exist today. It is possible that cultures and languages shape the concepts available for making trait attributions from faces, and thus the underlying psychological dimensions. Our Study 2 investigated samples in different regions around the world to test the reproducibility of our findings, but it was not intended to survey cultural effects and we make no claims to that effect.
Fig. 1 Sampling traits (a–d) and face images (e–h) to generate a comprehensive set. a Sampling of traits began by assembling an extensive list of trait words [1–3, 8, 10, 14–20, 22, 37–39] spanning all important categories of trait attributions from faces. b Each adjective was represented with a vector of 300 semantic features that describe word embeddings and text classification, using a state-of-the-art neural network that had been pretrained to assign words to their contexts across 600 billion words [70]. c Three filters were applied to remove words with similar meanings, unclear meaning, or infrequent usage (see Methods). d The final set of 100 traits consisted of the sampled adjectives and nouns (see Supplementary Table 1). e Sampling of face images began by assembling a set of frontal, neutral, white faces from three popular face databases [71–73]. f Each face was represented with a vector of 128 facial features that are used to classify individual identities, using a neural network [74] pretrained to identify individuals across millions of faces of all different aspects and races. g Maximum variation sampling [86] was applied to select faces with maximum variability in facial structure in this 128-D space. h Multidimensional scaling visualization of the sampled 100 face images (green and orange dots).
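The "maximum variation sampling" in panel g spreads the selected faces across the 128-D embedding space. As an illustration only (the study's exact procedure follows its ref. 86 and Methods), one common greedy farthest-point variant can be sketched as below; the embeddings and counts here are synthetic stand-ins:

```python
import numpy as np

def maximum_variation_sample(features, k, seed=0):
    """Greedy farthest-point sampling: repeatedly pick the item whose
    minimum Euclidean distance to the already-selected set is largest,
    so the selected subset spreads out over the feature space."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    selected = [int(rng.integers(n))]       # arbitrary starting item
    # distance from every item to its nearest selected item so far
    d = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(d))             # most isolated remaining item
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Toy demo: 500 hypothetical 128-D face embeddings, select 100
emb = np.random.default_rng(1).normal(size=(500, 128))
subset = maximum_variation_sample(emb, 100)
```

Each iteration costs one pass over the data, so the whole selection is O(n·k) distance computations, which is cheap at the scale of a few hundred faces.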
Finally, our study makes no claims about our four factors being
universal, biologically basic, or evolved. This is not only because
of the limitations listed above but also because it is unknown what
kinds of faces and what kinds of social trait concepts were
available to our ancestors.
Four dimensions underlie trait attributions from faces. Study 1 examined the underlying dimensions of the ratings that participants had given to the faces (ratings aggregated across participants), by first applying an exploratory method (exploratory factor analysis [EFA]; preregistered) and subsequently a confirmatory method with cross-validation (an autoencoder artificial neural network [ANN]). We confirmed that these ratings showed sufficient variance (Supplementary Fig. 2a), within-subject consistency (assessed with Pearson's correlations, M = 0.47, range = [0.28, 0.84], as well as linear mixed-effect modeling [preregistered]; Fig. 3), and between-subject consensus (preregistered; all ICCs > 0.60) [Fig. 3 and Methods]. Eight traits with low factorizability were excluded from further analyses (Supplementary Fig. 2b; including them did not change the dimensions we eventually found).
We determined the optimal number of factors to retain in EFA using five widely recommended methods [45, 46] (see Methods), as solutions are considered most reliable when multiple methods agree. Four methods (Horn's parallel analysis, Cattell's scree test, optimal coordinates, and empirical BIC) all indicated that the optimal number of factors to retain was four (Supplementary Fig. 3a).
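Of the retention criteria listed above, Horn's parallel analysis is the easiest to sketch: retain as many factors as there are observed correlation-matrix eigenvalues exceeding the mean eigenvalues obtained from random data of the same shape. A minimal illustration on synthetic ratings with three planted factors (not the study's implementation, which used the methods cited as refs. 45, 46):

```python
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis: count observed eigenvalues of the
    correlation matrix that exceed the mean eigenvalues from random
    normal data of the same sample size and dimensionality."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.zeros(p)
    for _ in range(n_iter):
        noise = rng.normal(size=(n, p))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    rand /= n_iter
    return int(np.sum(obs > rand))

# Toy demo: 100 "faces" rated on 20 "traits" driven by 3 latent factors;
# with this strong signal, parallel analysis should recover 3 factors
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 3))
loadings = rng.normal(size=(3, 20))
ratings = latent @ loadings + 0.3 * rng.normal(size=(100, 20))
n_factors = parallel_analysis(ratings)
```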
EFA was thus applied to extract four factors using the minimal residual method, and the solutions were rotated with oblimin for interpretability. The four factors explained 31, 31, 11, and 12% of the common variance in the data, respectively (85% in total; 87% in total if five factors were extracted), and were weakly correlated (r13 = −0.33, r14 = −0.23, r23 = 0.21, r24 = 0.33 [ps = 8.122 × 10⁻⁴, 0.021, 0.040, 8.358 × 10⁻⁴]; r12 = −0.15, r34 = 0.12 [ps = 0.129, 0.237]). None of the factors were biased by words with particularly low or high within-subject consistency or between-subject consensus, and the trait words occupied the four-dimensional space fairly homogeneously (Fig. 3). We interpreted these four factors as describing attributions of warmth, competence, femininity, and youth (Fig. 3; see Supplementary Fig. 4a for factor loadings) [see Methods]. We note that all trait attributions based on faces, and therefore the dimensions describing these attributions, are a reflection of people's stereotypes of some sort, since in our study nothing else is known about the people whose faces are used as stimuli. Here we omitted "-stereotypes" in our labeling of all dimensions for conciseness.
Fig. 2 Representativeness of the sampled traits (a, b) and face images (c, d). a Distributions of word similarities. The similarity between two words was assessed with the cosine distance between the 300-feature vectors [70] of the two words. The blue histogram plots the pairwise similarities among the 100 sampled traits. The red histogram plots the similarities between each of the freely generated words during spontaneous face attributions (n = 973, see Supplementary Fig. 1a) and its closest counterpart in the sampled 100 traits. Dashed lines indicate means. All freely generated words were found to be similar to at least one of the sampled traits (all similarities greater than the mean similarity among the sampled traits, except for the words "moving" and "round"). Eighty-five freely generated words were identical to those in the 100 sampled traits. b Uniform Manifold Approximation and Projection (UMAP [75], a dimensionality reduction technique that generalizes to nonlinearities) of words. Blue dots indicate the 100 sampled traits (examples labeled in blue) and gray dots indicate the freely generated words during spontaneous face attributions (see Methods; nonoverlapping examples labeled in gray, which were mostly momentary mental states rather than temporally stable traits). c UMAP of the final sampled 100 faces (stars) compared with a larger set of frontal, neutral, white faces from various databases [76–78] (dots, N = 632; see also Supplementary Fig. 1b for comparison with faces in real-world contexts). Each face was represented with 128 facial features as extracted by a state-of-the-art deep neural network [74]. d UMAP of the final sampled 100 faces (stars) compared with the larger set of faces (dots) as in c. Each face was represented here with 30 automatically measured simple facial metrics [72] (e.g., pupillary distance, eye size, nose length, cheekbone prominence). Source data are provided as a Source Data file.
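The word-similarity measure in Fig. 2a can be sketched directly: each word is a 300-feature embedding vector, and each freely generated word is matched to its most similar sampled trait. A toy illustration with random vectors standing in for the pretrained embeddings (the trait names and vectors here are hypothetical):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (1 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_sampled_trait(word_vec, sampled_vecs, sampled_names):
    """Closest counterpart of a freely generated word among the sampled traits."""
    sims = [cosine_similarity(word_vec, s) for s in sampled_vecs]
    i = int(np.argmax(sims))
    return sampled_names[i], sims[i]

# Toy 300-D vectors standing in for the pretrained word embeddings
rng = np.random.default_rng(0)
vocab = {name: rng.normal(size=300) for name in ["warm", "smart", "young"]}
query = vocab["warm"] + 0.1 * rng.normal(size=300)   # a near-synonym
name, sim = nearest_sampled_trait(query, list(vocab.values()), list(vocab))
```

Because unrelated random high-dimensional vectors have cosine similarity near zero, a slightly perturbed copy of "warm" is matched back to "warm" with similarity close to 1.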
To corroborate the four dimensions discovered from EFA, we applied an approach with minimal assumptions: artificial neural networks (ANNs) with cross-validation to compare different factor structures (see Methods). Autoencoder ANNs with one hidden layer that differed in the number of neurons (ranging from 1 to 10) were constructed (Fig. 4a). These ANNs were trained on half of the data (i.e., aggregated ratings across half of the participants) and tested on the other, held-out half (the Adam optimization algorithm [47] and a mean squared error loss function with a batch size of 32 and 1500 epochs were used to train the ANNs, repeated for 50 iterations). Both linear and nonlinear activation functions were examined (Fig. 4b). Model performance of the best configuration (i.e., linear activation functions in both the encoder and decoder layers) increased substantially as the number of neurons in the hidden layer increased from 1 to 4 (explained variance on the test data increased by 18, 5, and 5%, respectively); the improvement was trivial beyond four neurons (increases of less than 1%) [Fig. 4c]. Critically, the four-dimensional representation learned by the ANN reproduced the four dimensions discovered from EFA (mean rs = 0.98, 0.92, 0.91, 0.94 [SDs = 0.01, 0.05, 0.02, 0.05] between the factor loadings from EFA and the ANN's decoder layer weights with varimax rotation) and confirmed good performance (explained variance obtained with linear activation functions was 75% [SD = 0.6%] on the test data, comparable to PCA).
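As the paragraph notes, a linear-activation autoencoder performs comparably to PCA; indeed, the optimal linear autoencoder with a k-neuron bottleneck reconstructs the data via its rank-k truncated SVD, and gradient-trained linear autoencoders such as those used in the study converge toward that same subspace. A minimal sketch of the resulting explained-variance-versus-bottleneck-size curve on synthetic data with four planted dimensions (all data here are invented for illustration):

```python
import numpy as np

def bottleneck_explained_variance(X, k):
    """Explained variance of the best linear autoencoder with a k-neuron
    bottleneck: with linear activations, its optimal reconstruction equals
    the rank-k truncated SVD of the (column-centered) data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    recon = (U[:, :k] * s[:k]) @ Vt[:k]     # rank-k reconstruction
    return 1.0 - np.sum((Xc - recon) ** 2) / np.sum(Xc ** 2)

# Toy ratings driven by 4 latent dimensions plus a little noise: the
# curve should climb steeply up to k = 4 and then flatten, mirroring
# the elbow reported for the real data
rng = np.random.default_rng(0)
ratings = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 30))
ratings += 0.1 * rng.normal(size=ratings.shape)
curve = [bottleneck_explained_variance(ratings, k) for k in range(1, 7)]
```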
Fig. 3 Reliability and dimensionality of trait attributions from faces. Upper right scatterplot: within-subject consistency as assessed with linear mixed-effect modeling (y-axis, regression coefficients) plotted against between-subject consensus as assessed with intraclass correlation coefficients (x-axis) for the 100 traits. The color scale indicates the product of the x- and y-values. We used 94 traits selected from the literature and supplemented the list with additional trait words for which we believe there was no equivalent in the initial list but which would reflect vocabulary used to describe first impressions. Four histograms on the diagonal: each plots the distribution of the factor loadings across all traits in EFA on each of the four dimensions (color code as in the upper right scatterplot; see also Supplementary Fig. 4a for factor loadings). Six scatterplots in the lower left: each plots the factor loadings of all traits in EFA against two of the four dimensions (dots). Labels are shown for a small subset of datapoints (blue dots) due to limited space (see Supplementary Fig. 4b for full labels). Source data are provided as a Source Data file.
Comparison with existing dimensional frameworks. Prior work [1, 2, 23, 32, 36] suggests that attributions from faces with a more limited set of descriptive words can be represented by two or three dimensions. Our findings support the general idea of a low-dimensional space but revealed four dimensions that differ from those previously proposed. One plausible source for this discrepancy could be methodological differences [48, 49], but this turned out not to be the case: we reanalyzed our data using principal components analysis (PCA), a method used in prior work [1, 2, 36] in which dimensions are forced to be orthogonal, and reproduced the same four dimensions as reported above (Supplementary Fig. 5a).

Instead, the four-dimensional space did not appear in previous studies because of limited sampling of traits in prior work: we interrogated two subsets of our data, each consisting of 13 traits that corresponded to those used in the discovery of the two most popular prior dimensional frameworks (the 2D and 3D frameworks [1, 36]). The four-dimensional space was not evident when analyses were restricted to these two small subsets of traits; instead, we reproduced the prior 2D framework (Table 1) and 3D framework (Table 2).
We next showed that using a more comprehensive set of trait words not only revealed a larger number of dimensions but also a dimensional space that is distinct from prior frameworks. While our choice of labels for the first two dimensions (warmth, competence) might suggest correspondence to the two dimensions of the popular prior 2D framework (valence, dominance), owing to the semantic similarity between the words, the face attributions these dimensions describe are distinct: using the subset of 13 traits that replicated the 2D framework (Table 1), we found that the warmth dimension and the valence dimension were weakly correlated (r = 0.41 based on EFA factor scores; r = 0.09 based on scores from PCA, the method used in prior work, with which we also replicated the four dimensions from our full dataset and the 2D framework from the subset of 13 traits); the competence dimension and the dominance dimension were not significantly correlated (r = 0.01, p = 0.894 based on EFA factor scores; r = 0.09, p = 0.383 based on PCA scores). We note that the youth dimension found here was highly correlated with the youthful/attractiveness dimension proposed in the prior 3D framework (r = 0.71 based on EFA scores; r = 0.76 based on PCA scores).
Finally, we directly compared how well the different frameworks characterized trait attributions from faces. Using linear combinations of the traits with the highest loadings on each dimension as regressors (two for each dimension, because only two traits loaded on one of the dimensions in the 3D framework, Table 2), we found that the four-dimensional framework explained the variance better for 82% of the trait attributions (those that were not part of the linear combinations) than did any of the existing frameworks (Supplementary Fig. 5b; mean adjusted R-squared across all predictions was 0.81 for the four-dimensional framework, 0.72 for the 3D framework, and 0.72 for the 2D framework).
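The comparison above amounts to regressing each held-out trait on a framework's marker traits (two per dimension) and comparing adjusted R-squared values, which penalize the larger framework for its extra regressors. A schematic version with synthetic data (the marker construction and coefficients are invented for illustration; this is not the study's analysis code):

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R-squared of an OLS regression of y on X (plus intercept)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    p = X1.shape[1] - 1                       # number of regressors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Toy comparison: predict a held-out trait rating from marker traits of a
# hypothetical 4-D framework (8 regressors) vs. a 2-D one (4 regressors)
rng = np.random.default_rng(0)
factors = rng.normal(size=(100, 4))           # 100 faces, 4 latent dims
markers_4d = np.repeat(factors, 2, axis=1) + 0.2 * rng.normal(size=(100, 8))
markers_2d = markers_4d[:, :4]                # markers of only 2 dimensions
target = factors @ np.array([0.3, 0.3, 0.5, 0.6]) + 0.2 * rng.normal(size=100)
r2_4d = adjusted_r2(target, markers_4d)
r2_2d = adjusted_r2(target, markers_2d)
```

Because the target here depends on all four latent dimensions, the 4-D markers recover most of its variance while the 2-D markers cannot, despite the adjustment for the extra regressors.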
Robustness of the four dimensions. We quantified the robustness of our results both across different numbers of trait words and across different numbers of participants. First, we removed trait words one by one and re-performed EFA to extract four factors as before (all pairs of trait words were ranked from the most to the least similar, and the trait with the lower clarity rating was removed from each pair). The four dimensions discovered from the full set versus the subsets of traits were highly correlated (Fig. 5a; see Supplementary Table 2a for the complete list of correlations). Second, we randomly removed participants one by one
Fig. 4 Dimensionality analysis with artificial neural network and cross-validation. a An example of an autoencoder model with one hidden layer and four neurons in the hidden layer, used to learn the underlying representation of the data. b The means (points) and standard deviations (bars) of the explained variance (n = 50 iterations) on the training data from autoencoders with various numbers of neurons in the hidden layer (red dots in a). Colors indicate different configurations of activation functions in the encoder and decoder layers (linear, tanh, sigmoid, rectified linear activation unit, L1-norm regularization); for example, the blue line indicates configurations with linear functions in both the encoder and decoder layers (AE-linear-linear). c Means (points) and standard deviations (bars) of the explained variance (n = 50 iterations) on the test data from the autoencoders shown in b. Source data are provided as a Source Data file.
Table 1 Factor loadings from EFA on the subset of 13 traits used in the 2D framework. Factor loadings from EFA on the subset of data corresponding to 13 traits (first column) that are the same as or most similar to those used in a prior study that discovered the popular 2D framework [1] (first column, in brackets). Two factors (the optimal number of factors as indicated by both Cattell's scree test and empirical BIC) were extracted and rotated with oblimin. The largest absolute loading across factors for each trait is marked with an asterisk. Source data are provided as a Source Data file.

Traits from our set [traits in 2D framework [1]]    Valence    Dominance
Sociable [Sociable]                                  0.89*       0.14
Weird [Weird]                                       −0.88*       0.13
Beautiful [Attractive]                               0.86*       0.03
Confident [Confident]                                0.85*      −0.53
Responsible [Responsible]                            0.82*       0.12
Trustworthy [Trustworthy]                            0.77*       0.38
Wise [Intelligent]                                   0.70*      −0.06
Thoughtful [Caring]                                  0.64*       0.55
Happy [Unhappy]                                      0.54*       0.45
Submissive [Dominant]                               −0.18        1.00*
Aggressive [Aggressive]                             −0.13       −0.90*
Mean [Mean]                                         −0.22       −0.86*
Emotional [Emotionally stable]                       0.48        0.54*