Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
[ ] Life sciences
[X] Behavioural & social sciences
[ ] Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see
nature.com/documents/nr-reporting-summary-flat.pdf
Behavioural & social sciences study design
All studies must disclose on these points even when the disclosure is negative.
Study description
Our study is a quantitative experiment. We applied deep neural networks to representatively sample multiple stimulus sets, and derived a novel set of 100 traits and 100 faces for a comprehensive protocol that we administered in two preregistered studies. We collected data both online and on-site in different countries and regions (North America, Latvia, Peru, the Philippines, India, Kenya, and Gaza). We analyzed the data using linear mixed modeling, exploratory factor analysis, artificial neural networks, and representational similarity analysis.
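Of the analyses listed above, representational similarity analysis is perhaps the least self-explanatory; the following is a minimal sketch of what an RSA comparison between two samples' averaged trait ratings could look like in Python. The matrix shapes, the correlation-distance RDMs, and the Spearman comparison are illustrative assumptions, not necessarily the exact analysis pipeline.

```python
# Minimal RSA sketch (illustrative; shapes and metrics are assumptions).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Placeholder data standing in for ratings averaged across participants:
# rows are the 100 faces, columns are the 100 traits, on a 1-7 scale.
ratings_sample_a = rng.uniform(1, 7, size=(100, 100))
ratings_sample_b = rng.uniform(1, 7, size=(100, 100))

# Representational dissimilarity matrix (RDM): pairwise correlation
# distances between faces in trait-rating space (condensed form).
rdm_a = pdist(ratings_sample_a, metric="correlation")
rdm_b = pdist(ratings_sample_b, metric="correlation")

# RSA score: rank correlation between the two samples' RDMs.
rho, p = spearmanr(rdm_a, rdm_b)
print(f"RSA (Spearman rho) = {rho:.3f}, p = {p:.3g}")
```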
Research sample
The research samples include Amazon Mechanical Turk (MTurk) workers in the United States (Study 1) and participants in North America (Canada and the United States), Latvia, Peru, the Philippines, India, Kenya, and Gaza recruited via the social enterprise Digital Divide Data (Study 2).

The MTurk sample (Study 1) is not representative of the U.S. population; any worker with an MTurk account who satisfied the following criteria could participate in our study: located in the U.S.; HIT approval rate for all requesters' HITs greater than or equal to 95%; aged 18 or older; normal or corrected-to-normal vision; native English speaker; self-identified as white; highest completed level of education no less than high school (details can be accessed at our preregistration on the Open Science Framework: https://osf.io/6p542/?view_only=fff024253b604edb832a9824cbdafa75). This sample was targeted based on the following rationale: first, an online sample was chosen because the experimental procedure allows for Internet-based data collection, where participants could complete the study at convenient times and places; second, the requirements on HIT approval rate and vision help ensure data quality; third, only participants who self-identified as white were included, to match the race of the faces, which in turn helps alleviate potential cultural effects in face perception that we did not intend to investigate in Study 1; lastly, only participants who were located in the U.S., were native English speakers, and had completed high school were included, to help reduce noise in participants' understanding of the trait words.

The samples in the seven countries and regions (Study 2) are not nationally representative; any participant in the Digital Divide Data subject pools at those locations who satisfied the following criteria could participate in our study: aged 18-40; completed at minimum a high-school education; trained in computer skills; proficient in English (except for participants in Peru); never having visited Western countries (except for participants in North America and Latvia). We also targeted an equal ratio of males and females at each location (details can be accessed at our preregistration on the Open Science Framework: https://osf.io/qxgmw/?view_only=fd43b2e8b25248f7b7de51b9aeae1894). This sample was targeted based on the following rationale: first, using samples from different parts of the world helps test the generalizability of our results from Study 1; second, using samples from as many different continents as possible, and requiring participants in non-Western countries to have minimal prior exposure to Western cultures, allows for a more stringent test of that generalizability; third, the requirements on education, computer training, and English proficiency help ensure data quality.
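For concreteness, the Study 1 inclusion criteria can be written down as a simple screening predicate. The sketch below is purely illustrative: the `Worker` record and its field names are hypothetical stand-ins for the actual MTurk qualifications and screening questions used.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    # Hypothetical record of MTurk qualifications and screening answers.
    country: str
    hit_approval_rate: float      # percent, across all requesters' HITs
    age: int
    normal_or_corrected_vision: bool
    native_english_speaker: bool
    self_identified_race: str
    completed_high_school: bool

def eligible_for_study1(w: Worker) -> bool:
    """Study 1 (MTurk) inclusion criteria, as preregistered."""
    return (
        w.country == "US"
        and w.hit_approval_rate >= 95.0
        and w.age >= 18
        and w.normal_or_corrected_vision
        and w.native_english_speaker
        and w.self_identified_race == "white"
        and w.completed_high_school
    )
```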
Sampling strategy
Participants were randomly sampled from the subject pools satisfying the inclusion criteria stated above. We predetermined sample size based on the point of stability (POS), the minimum sample size needed to achieve a stable average measure, because most of our analyses used ratings averaged across participants. Our estimation was based on a recent study that analyzed the point of stability for the inferences of 24 traits from faces, using 698,829 ratings across 6,593 participants and 3,353 facial stimuli (Hehman, Xie, Ofosu, & Nespoli, 2018). Given that data were collected on a 7-point Likert scale, the corridors of stability (COS) deemed acceptable to us were +/- 0.5 or +/- 1.00, and the level of confidence deemed acceptable to us was 95%. Across the 24 traits, the POSs ranged from 18 to 42 participants for a COS of +/- 0.5, and from 5 to 11 participants for a COS of +/- 1.00. For the MTurk sample (Study 1), given that a large subject pool was available, we predetermined the COS to be +/- 0.5 and the sample size to be 60 participants per trait (details can be accessed at our preregistration on the Open Science Framework: https://osf.io/6p542/?view_only=fff024253b604edb832a9824cbdafa75). For the seven samples in different countries and regions (Study 2), given that smaller subject pools were available, we predetermined the COS to be +/- 1.00 and the sample size to be 30 participants per location (details can be accessed at our preregistration on the Open Science Framework: https://osf.io/qxgmw/?view_only=fd43b2e8b25248f7b7de51b9aeae1894).
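The point-of-stability logic can be made concrete with a small simulation: shuffle the order of participants' ratings, track the running mean, and record the sample size after which the mean stays inside the corridor of stability around the final mean. The sketch below is an illustration of this general approach using simulated data; it is not the estimation code from Hehman et al. (2018) or from our preregistrations.

```python
import numpy as np

def point_of_stability(ratings, cos=0.5, n_boot=1000, confidence=0.95, seed=0):
    """Estimate the minimum n after which the running mean of randomly
    ordered ratings stays within +/- cos of the final mean, in at least
    `confidence` of the bootstrap orderings. Illustrative procedure only."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    n = len(ratings)
    target = ratings.mean()
    pos_samples = np.empty(n_boot, dtype=int)
    for b in range(n_boot):
        order = rng.permutation(ratings)
        running = np.cumsum(order) / np.arange(1, n + 1)
        outside = np.abs(running - target) > cos
        # Last sample size still outside the corridor; stability begins
        # with the next participant (1-based count).
        pos_samples[b] = outside.nonzero()[0].max() + 2 if outside.any() else 1
    # Sample size by which `confidence` of orderings have stabilized.
    return int(np.quantile(pos_samples, confidence))

# Example: 7-point Likert ratings from 200 simulated participants.
sim = np.random.default_rng(1).integers(1, 8, size=200)
print(point_of_stability(sim, cos=0.5))   # COS of +/- 0.5 (Study 1)
print(point_of_stability(sim, cos=1.0))   # COS of +/- 1.00 (Study 2)
```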
Data collection
Data were collected using computers. The researchers were not present during any of the data collection procedures. The researchers were not blinded to experimental condition or to the study hypothesis. For the MTurk sample (Study 1), workers completed the studies online using their own computers in their own environments. For the seven samples in different countries and regions (Study 2), participants completed the studies on-site using computers at the local offices of Digital Divide Data.
Timing
MTurk data (Study 1) were collected from April 5, 2018 to July 14, 2018, with no gap during this data collection period. Cross-cultural data (Study 2) were collected from December 10, 2018 to December 26, 2018, with no gap during this data collection period.
Data exclusions
For the MTurk data in Study 1, of the full sample with a registered size of N = 1,500 participants and L = 750,000 ratings, n = 48 participants and l = 27,491 ratings were excluded from further analysis. Data exclusion was performed according to our preregistration: (a) trial-wise deletion if a response was missing or timed out, or if RT was less than 100 ms; (b) participant-wise deletion if a participant had more than 10% invalid trials, per (a), in any block; (c) block-wise (trait-wise per subject) deletion if all trials in a given block had the same rating (details can be accessed at https://osf.io/6p542/?view_only=fff024253b604edb832a9824cbdafa75). For the cross-cultural data in Study 2, we preregistered data exclusion criteria in our initial preregistration (https://osf.io/qxgmw/?view_only=fd43b2e8b25248f7b7de51b9aeae1894) and our second preregistration (https://osf.io/tbmsy/?view_only=6d8b94575bf0469fb157c89eb9292371). According to the preregistered data
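As an illustration, the three Study 1 exclusion rules (a)-(c) above translate naturally into a data-frame filter. The following pandas sketch is illustrative only; the column names (`participant`, `trait`, `rating`, `rt_ms`, `timed_out`) are assumptions, not taken from the actual analysis code.

```python
import pandas as pd

def apply_exclusions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the preregistered Study 1 exclusion rules (illustrative;
    column names are assumed). A block is one participant x trait pair."""
    # (a) invalid trials: missing/timed-out response, or RT < 100 ms
    invalid = df["rating"].isna() | df["timed_out"] | (df["rt_ms"] < 100)

    # (b) participant-wise: drop anyone with > 10% invalid trials
    # in any block (invalid-trial rate computed per block).
    rate = invalid.groupby([df["participant"], df["trait"]]).mean()
    bad = rate[rate > 0.10].index.get_level_values("participant").unique()
    df = df[~df["participant"].isin(bad)]

    # (a) trial-wise deletion of the remaining invalid trials
    df = df[~invalid.loc[df.index]]

    # (c) block-wise: drop blocks where all trials share one rating
    nunq = df.groupby(["participant", "trait"])["rating"].transform("nunique")
    return df[nunq > 1]
```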