Estimating the heritability of psychological measures in the Human Connectome Project dataset

Yanting Han1*, Ralph Adolphs1,2,3

1 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
2 Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA
3 Chen Neuroscience Institute, California Institute of Technology, Pasadena, CA, USA

* Correspondence:
Yanting Han
yhhan@caltech.edu

Keywords: Human Connectome Project, heritability, machine learning, twin studies, Ridge, Random Forest
Abstract
The Human Connectome Project (HCP) is a large structural and functional MRI dataset with a rich array of behavioral measures and extensive family structure. This makes it a valuable resource for investigating questions about individual differences, including questions about heritability. While its MRI data have been analyzed extensively in this regard, to our knowledge a comprehensive estimation of the heritability of the behavioral dataset has never been conducted. Using a set of behavioral measures of personality, emotion and cognition, we show that it is possible to re-identify the same individual across two testing times, and to identify identical twins. Using machine learning (univariate linear model, Ridge classifier and Random Forest model) we estimated the heritability of 37 behavioral measures and compared the results to those derived from twin correlations. Correlations between the standard heritability metric and each set of model weights ranged from 0.42 to 0.67, and questionnaire-based and task-based measures did not differ significantly in their heritability. We further derived nine latent factors from the 37 measures and repeated the heritability estimation; in this case, the correlations between the standard heritability and each set of model weights were lower, ranging from 0.15 to 0.38. One specific discrepancy arose for the general intelligence factor, to which all models assigned high importance, but the standard heritability calculation did not. We present an alternative method for qualitatively estimating the heritability of the behavioral measures in the HCP as a resource for other investigators, and recommend the use of machine-learning models for estimating heritability.
Introduction
Decades of research have accumulated abundant knowledge on the heritability of various human traits. A recent meta-analysis studied 28 functional domains and found the largest heritability estimates for several physical trait domains (such as the ophthalmologic and skeletal domains) but the lowest heritability for some psychological domains (such as the social values domain; Polderman et al., 2015). This domain-wise characterization was largely consistent with reported values from studies that focused on individual traits. For example, height is one of the most studied traits in the physical domain. An earlier study involving twins from eight countries estimated the heritability of height to be 0.87-0.93 for males and 0.68-0.84 for females (Silventoinen et al., 2003), although a more recent study of larger samples produced estimates up to 0.83 in boys and 0.76 in girls (Jelenkovic et al., 2016), comparable to the reported meta-analytic heritability of 0.73 (Polderman et al., 2015). By contrast, the heritability of psychological traits is generally estimated to be lower: episodic memory has a heritability around 0.3-0.6 (Papassotiropoulos and de Quervain, 2011) (with meta-analytic heritability around 0.6), and personality has a heritability around 0.4 (Vukasović and Bratko, 2015) (with meta-analytic heritability around 0.48). These traits have typically been studied in isolation. Here we took advantage of the comprehensive set of measures available in the Human Connectome Project (HCP) dataset (including both self-report questionnaires and behavioral tasks), which allowed us to describe an individual's psychological profile and similarity to others. Our goal was to apply modern machine learning methods to estimate heritability in this dataset, at the same time providing a resource that could be used for studies of heritability in the neuroimaging data component.
The Human Connectome Project (HCP) offers a uniquely rich sample of measures across the same 1200 subjects: structural, diffusion, and functional MRI, together with questionnaire- and task-based measures that assess many different psychological domains (Van Essen et al., 2013). The HCP dataset has proven to be a valuable resource for investigating individual differences. A number of recent studies have utilized the HCP dataset to predict personal identity, gender, fluid intelligence, personality, and executive function from brain connectivity (Dubois et al., 2018; Finn et al., 2015; Liu et al., 2018; Zhang et al., 2018).
Another valuable aspect of the HCP is that it has a rich and extensive family structure, including 149 genetically confirmed monozygotic twin pairs and 94 genetically confirmed dizygotic twin pairs. In principle, this provides a powerful resource for investigating the heritability of brain-behavior relationships. Several studies have used MRI data in the HCP to investigate the heritability of brain structures and connectivity patterns, many aspects of which are heritable (Ge et al., 2016). For instance, surface area and cortical thickness (Strike et al., 2019), the depth of sulcal pits (Le Guen et al., 2018), subcortical shape (Gutman et al., 2015), hippocampal subfield volumes (Patel et al., 2017) and cortical myelination (Liu et al., 2019) are all heritable structural features. Similarly, connectivity patterns, especially in resting-state fMRI, have been shown to be heritable (Colclough et al., 2017; Adhikari et al., 2017), with the highest estimates found for repeat measurements that account for transient fluctuations (Ge et al., 2017). Other studies have also probed the neural correlates of cognitive processes in the context of heritability using HCP data (Babajani-Feremi, 2017; Guen et al., 2018; Kochunov et al., 2016; Vainik et al., 2018). For instance, one study used bivariate genetic analyses to identify brain networks that were genetically correlated with cognitive tasks in math and language (Guen et al., 2018). Similarly, another study found common genetic influences for white matter microstructure and processing speed (Kochunov et al., 2016). Both studies demonstrated that heritability can provide a powerful link between brain and behavior.
Behavioral heritability is defined as the genetic contribution to the total variance of a phenotypic trait in a population, an important statistic for understanding individual differences. Twins (both monozygotic/MZ and dizygotic/DZ) are particularly useful for the estimation of heritability as they can help to differentiate the contribution of genes versus environment. In classical twin studies, the basic assumptions are that MZ twins share on average 100% of their alleles, while DZ twins share on average 50% of their alleles, and that both MZ and DZ twins share a common environment. The total variance can be split into three components: additive genetics, shared environment and unique environment (often referred to as the ACE model) (Bouchard Jr and Propping, 1993; Falconer et al., 1996; Plomin et al., 1997). The simplest method for calculating heritability is to use Falconer's formula. The formula assumes that unique environment contributes equally to the phenotypic variance for both MZ and DZ twins, and that therefore the difference between the MZ phenotypic correlation and the DZ phenotypic correlation arises solely because of genetic factors (Mayhew and Meyre, 2017; Polderman et al., 2015). Modern maximum likelihood-based modeling estimates the various components of the total variance (Martin and Eaves, 1977; Winkler et al., 2015), but in essence relies on the same set of assumptions and logic, which are continually debated. The equal environment assumption (EEA), for example, is often believed to be violated: MZ twins, due to their physical resemblance, are likely to encounter a more similar social environment than DZ twins. Furthermore, gene-environment interaction is often not properly modeled, or is completely omitted, as in the case of using Falconer's formula in twin studies (Beckwith and Morris, 2008; Charney, 2017; Joseph, 2002; Kamin and Goldberger, 2002; Schönemann, 1997). Yet a recent meta-analysis that investigated the heritability of a wide range of human traits based on twin studies from the past fifty years showed that for 69% of the traits analyzed there was a twofold difference in the MZ correlations relative to the DZ correlations, consistent with a simple model in which all twin resemblance is due solely to additive genetic variation (Polderman et al., 2015).
Given the lack of consensus on modeling the exact causes of the difference between MZ and DZ twins, we here present a model-free approach using data-driven machine-learning tools. These have been shown to yield better results in the literature, most notably in improving the prediction of human phenotypic traits from single-nucleotide polymorphism (SNP) data (de Vlaming and Groenen, 2015; Koo et al., 2013; Mieth et al., 2016; Paré et al., 2017; Sun et al., 2008). One review that evaluated Ridge regression (a model used in our study) lists several advantages over conventional genome-wide association methods: (1) substantially increased accuracy, especially for large sample sizes; (2) the regularization term in Ridge regression allows flexible accounting of the linkage disequilibrium between SNPs; and (3) greater computational efficiency than repeated simple regressions (de Vlaming and Groenen, 2015). Other models, such as Random Forest, a nonlinear machine learning model, have been used to predict coronary artery calcification using SNP data, achieving not only good prediction but also reliable identification of the best predictors across different datasets (Sun et al., 2008). Feature weights have been further utilized in one study that trained support vector machines (SVM) to classify siblings versus unrelated people using resting-state fMRI data in order to derive heritability for brain activity (Miranda-Dominguez et al., 2018). Overall, machine learning models have demonstrated superior prediction performance compared to conventional methods, and the feature weights learned by the models have the potential to be used for qualitative estimation of heritability.
The present study has two broad aims. First, we tried to identify the same individuals and identical twins based on their behavioral profiles, testing whether the success of connectome fingerprinting applied to the neuroimaging component of the HCP (Finn et al., 2015) could be replicated using this set of rich behavioral measures. Second, we set out to characterize the heritability of the behavioral data in this dataset using both the classical method and novel machine-learning-based methods, for the raw behavioral scores as well as for nine latent factors. Aside from providing valuable comprehensive data on the heritability of psychological variables, our results can motivate hypotheses about the heritability of the neural underpinnings, which we hope future studies will pursue in the same subject sample.
Materials and Methods
Data
We used behavioral data from the Human Connectome Project (HCP) S1200 release under the domains of cognition, emotion and personality (Van Essen et al., 2013). The 37 selected variables were summary scores for either a behavioral task or a questionnaire (see Table S1 for a more detailed description of each variable, and Figure 1A for their correlation structure). The NEO agreeableness score was re-calculated since item #59 was incorrectly coded at the time we downloaded the data (an issue reported to and verified by HCP: https://www.mail-archive.com/hcp-users@humanconnectome.org/msg06007.html). Since the variables were on different scales, we first pre-processed them to all have zero mean and unit variance. Each subject was thus essentially described by a vector of 37 scores/features, representing their psychological profile.
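As a minimal sketch of this preprocessing step (the file name and the illustrative subset of column names are our assumptions, not taken from the paper; the study used all 37 summary scores listed in Table S1), the standardization could look like:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical path to the HCP behavioral table; a few real HCP column names are
# shown for illustration only (the study used the 37 summary scores in Table S1).
behav = pd.read_csv("hcp_s1200_behavioral.csv", index_col="Subject")
selected = ["NEOFAC_A", "NEOFAC_N", "PMAT24_A_CR", "CardSort_Unadj", "Flanker_Unadj"]

measures = behav[selected].dropna()           # keep subjects with complete data
z = pd.DataFrame(StandardScaler().fit_transform(measures),
                 index=measures.index,
                 columns=measures.columns)    # zero mean, unit variance per measure
```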
Of 1206 subjects, 1189 had complete data for the 37 scores of interest, and 1142 had family relationship data verified by genotyping, yielding a final set of 149 pairs of genetically confirmed monozygotic (MZ) twins (298 subjects, all of the same sex) and 90 pairs of dizygotic (DZ) twins (180 subjects; one twin pair was of opposite sex and thus excluded) with complete data for the 37 behavioral variables of interest. A subset of 46 MZ subjects had complete test-retest data for the selected 37 scores, which we used to calculate test-retest reliability (as their Pearson correlation coefficients, Figure 1B). We thus used 1189 subjects in total, of which 478 were either MZ or DZ twins.
Same individual and twin identification
Same individual: We first asked how well a subject could be re-identified from their retest, compared to all other subjects, for the 46 subjects who had test-retest data available. We calculated pairwise Euclidean distances between a given subject's retest data and each of the 1189 subjects' original data (including the subject's own original data) and then ranked the distances in ascending order to see if the subject's retest data was closest to his/her own original data.

MZ twin: Similar to the above, we took one person (target) out of the 298 MZ twins and calculated pairwise Euclidean distances between this subject and each of the remaining 1188 subjects, and then ranked the distances in ascending order to see if the corresponding MZ twin was closest to the target.
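A minimal sketch of this distance-ranking procedure (the function name and array layout are our assumptions, not code from the paper):

```python
import numpy as np
from scipy.spatial.distance import cdist

def identification_rank(probe, gallery, true_index):
    """Rank of the true match when `probe` (length-37 vector) is compared against
    every row of `gallery` (n x 37) by Euclidean distance; rank 1 = correct match."""
    d = cdist(probe[None, :], gallery).ravel()
    order = np.argsort(d)                      # ascending: closest first
    return int(np.where(order == true_index)[0][0]) + 1

# Same-individual identification: probe = a subject's retest vector, gallery = all
# 1189 subjects' original data, true_index = that subject's own row.
# MZ-twin identification: probe = one twin, gallery = the remaining 1188 subjects,
# true_index = the row of the co-twin.
```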
Standard calculation of heritability
In the behavioral genetics literature, a standard way to derive heritability is based on twin correlations, calculated using Falconer's formula (Falconer et al., 1996):

$$h^2 = 2(R_{mz} - R_{dz}) \qquad (1)$$

where $h^2$ is the overall heritability, $R_{mz}$ the correlation for a phenotypic trait between monozygotic twins, and $R_{dz}$ the correlation for a phenotypic trait between dizygotic twins.
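As a worked illustration (our own sketch; Pearson correlations across twin pairs stand in for the twin correlations, since the paper does not specify, e.g., double-entered intraclass correlations), Falconer's estimate for a single measure could be computed as:

```python
import numpy as np

def falconer_h2(mz_pairs, dz_pairs):
    """Falconer heritability h^2 = 2 * (R_mz - R_dz) for one measure.
    mz_pairs / dz_pairs: arrays of shape (n_pairs, 2) holding the two twins' scores."""
    r_mz = np.corrcoef(mz_pairs[:, 0], mz_pairs[:, 1])[0, 1]
    r_dz = np.corrcoef(dz_pairs[:, 0], dz_pairs[:, 1])[0, 1]
    return 2.0 * (r_mz - r_dz)
```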
Machine learning approach
We took as input data the absolute feature-wise difference between each twin pair, with each pair described by a vector of 37 pre-processed behavioral variables as described above, giving us data for 149 MZ pairs and 90 DZ pairs which we then tried to classify. To resolve the unbalanced classes, we randomly sampled the DZ class with replacement to match the number of MZ cases.
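A minimal sketch of how the pair-difference features and the class balancing could be constructed (variable names and array shapes are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def pair_differences(z_scores, pair_ids):
    """Absolute feature-wise difference for each twin pair.
    z_scores: (n_subjects, 37) standardized scores; pair_ids: list of (i, j) row indices."""
    return np.array([np.abs(z_scores[i] - z_scores[j]) for i, j in pair_ids])

def balanced_sample(X_mz, X_dz):
    """Oversample the DZ class with replacement so both classes match the MZ count."""
    idx = rng.integers(0, len(X_dz), size=len(X_mz))
    X = np.vstack([X_mz, X_dz[idx]])
    y = np.concatenate([np.ones(len(X_mz)), np.zeros(len(X_mz))])  # 1 = MZ pair, 0 = DZ pair
    return X, y
```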
We used three widely used models: a Ridge classifier, a simple univariate model, and a Random Forest model, which is a nonlinear decision-tree-based model that ensures accurate feature weights even when features are correlated. For the univariate model, the dependent variable was the class and the independent variable was each of the 37 features in turn; we used this simple model because it most clearly tests the maximal contribution of each feature in isolation.

We fitted both Ridge models (the alpha parameter for the regularization term was determined by cross-validation to be alpha = 100 when using the 37 features, alpha = 10 when using the set of 9 factor scores calculated using linear regression, and alpha = 100 when using both sets of 18 factor scores) and Random Forest models (maximum tree depth was set to 5, with 100 trees in the forest, to prevent overfitting). Each model was estimated 1000 times; for each iteration, data was sampled as described above and then randomly split into 70% training data and 30% testing data. For Ridge classification, the testing accuracy and the coefficients for each of the 37 features were recorded. For Random Forest, the model returns feature importances that reflect the mean decrease in impurity (averaged across all decision trees in the random forest) (Leo et al., 1984). A feature with a higher importance score is thus better at decreasing node impurity (a metric of the number of mis-labeled data points at the current node of a decision tree), i.e., it is more informative than other features. We evaluated the performance of the Random Forest models using both testing accuracy and ROC curve analysis.
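Continuing the sketch above (and reusing balanced_sample from it), the resampling-and-fitting loop could look like the following; the hyperparameter values come from the description in the text, but the use of scikit-learn and all variable names are our assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# X_mz (149 x 37) and X_dz (90 x 37) are the pair-difference matrices built above.
ridge_coefs, rf_importances, ridge_acc, rf_acc, rf_auc = [], [], [], [], []
for it in range(1000):
    X, y = balanced_sample(X_mz, X_dz)                     # oversampled, balanced classes
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=it)

    ridge = RidgeClassifier(alpha=100).fit(X_tr, y_tr)     # alpha chosen by cross-validation
    ridge_acc.append(ridge.score(X_te, y_te))
    ridge_coefs.append(ridge.coef_.ravel())

    rf = RandomForestClassifier(n_estimators=100, max_depth=5,
                                random_state=it).fit(X_tr, y_tr)
    rf_acc.append(rf.score(X_te, y_te))
    rf_importances.append(rf.feature_importances_)         # mean decrease in impurity
    rf_auc.append(roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

mean_ridge_w = np.mean(ridge_coefs, axis=0)    # per-feature Ridge weights
mean_rf_imp = np.mean(rf_importances, axis=0)  # per-feature Random Forest importances
```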
Factor analysis
Given the strong inter-correlations between the 37 behavioral variables (Figure 1A), and considering that a single individual variable/task will yield an imprecise measure of the underlying psychological construct, we performed an exploratory factor analysis using SPSS with principal axis factoring as the extraction method, and kept nine factors that had eigenvalues >1, which together explained about 60% of the variance. Factors were rotated using Promax rotation, since there was no evidence that the factors were orthogonal. We also calculated the factor scores using both the regression and Bartlett methods.
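The authors ran this analysis in SPSS. As a rough, hedged equivalent in Python, the third-party factor_analyzer package exposes principal-axis extraction and Promax rotation (the parameter names below are that package's; the Bartlett-method scores used in the paper would need a separate computation):

```python
from factor_analyzer import FactorAnalyzer

# Exploratory factor analysis on the (n_subjects x 37) matrix of z-scored measures.
fa = FactorAnalyzer(n_factors=9, method="principal", rotation="promax")
fa.fit(z.values)                              # z: standardized 37-measure DataFrame from above

eigenvalues, _ = fa.get_eigenvalues()         # check how many factors have eigenvalue > 1
loadings = fa.loadings_                       # 37 x 9 rotated loading matrix
scores_regression = fa.transform(z.values)    # regression-method factor scores (n x 9)
```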
Statistical testing
The statistical significance of our identification tests was evaluated with permutation testing. Over 1000 iterations, subject identity was randomly shuffled across the 1189 subjects in the original dataset, and the same identification procedures described above (both same-individual identification and identical-twin identification) were performed to derive the empirical distribution of chance-level identification accuracy.
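One way to realize this shuffling for the same-individual test (our own sketch; the paper gives no implementation) is to permute the identity labels of the original-visit rows on every iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

def chance_identification(retest, retest_ids, original, n_perm=1000):
    """Empirical chance distribution of same-individual identification accuracy.
    retest: (46, 37) retest vectors; retest_ids: row index of each retest subject in
    `original`; original: (1189, 37) first-visit vectors. Identity labels of the
    original rows are shuffled before each matching pass."""
    accs = []
    for _ in range(n_perm):
        labels = rng.permutation(len(original))           # shuffled identities
        hits = 0
        for probe, true_id in zip(retest, retest_ids):
            d = np.linalg.norm(original - probe, axis=1)
            hits += int(labels[np.argmin(d)] == true_id)  # nearest row carries the right label?
        accs.append(hits / len(retest))
    return np.array(accs)
```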
To assess the statistical significance of our classification performance, we constructed the 95% confidence interval from the empirical testing-accuracy distribution (resulting from the 1,000 bootstraps that we performed) for each classification problem. A bootstrap p-value was also computed as the fraction of instances with a testing accuracy equal to or lower than 50% (the expected chance accuracy for random guessing with equal probability in a balanced binary classification) out of the total number of bootstraps.
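A compact sketch of these two summary statistics (our naming):

```python
import numpy as np

def bootstrap_summary(test_accuracies, chance=0.5):
    """95% CI and bootstrap p-value from the empirical testing-accuracy distribution
    (one accuracy per bootstrap iteration, e.g. the 1000 Ridge or Random Forest fits)."""
    acc = np.asarray(test_accuracies)
    ci_low, ci_high = np.percentile(acc, [2.5, 97.5])
    p_value = np.mean(acc <= chance)   # fraction of iterations at or below chance level
    return (ci_low, ci_high), p_value
```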
Permutation testing was also used to test for a significant difference in average heritability between the questionnaire domain and the behavioral task domain. The null hypothesis was that the task and questionnaire domains came from the same distribution. Under the null hypothesis, the number of all possible permutations (selecting 15 out of 37 measures as task scores) was approximately $9.4 \times 10^{9}$, which we approximated using Monte Carlo sampling of 100,000 permutations. For each permutation, we randomly assigned 15 values to the task domain and the rest to the questionnaire domain and then calculated the absolute difference between the two heritability means as our test statistic. Statistical significance was quantified as the probability (under the null hypothesis) of observing a value of the test statistic more extreme than what was actually observed. We performed the same analysis for four sets of heritability estimates (heritability calculated using Falconer's formula, univariate model weights, Ridge weights, and feature importances for the Random Forest model, each consisting of 37 values). For heritability calculated using Falconer's formula, we set any negative value to zero.
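A sketch of this Monte Carlo permutation test (our implementation; the boolean task mask and variable names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def domain_permutation_test(h2, is_task, n_perm=100_000):
    """Monte Carlo permutation test for a difference in mean heritability between
    task-based and questionnaire-based measures.
    h2: length-37 vector of heritability estimates (negative Falconer values set to 0);
    is_task: boolean mask marking the 15 task-based measures."""
    h2 = np.asarray(h2, dtype=float)
    is_task = np.asarray(is_task, dtype=bool)
    observed = abs(h2[is_task].mean() - h2[~is_task].mean())
    n_task = int(is_task.sum())
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(h2))
        null[i] = abs(h2[perm[:n_task]].mean() - h2[perm[n_task:]].mean())
    p = np.mean(null >= observed)   # probability of a statistic at least as extreme
    return observed, p
```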
Results
Same individual and monozygotic twin identification based on psychological profiles
Given the rich behavioral measures, we first attempted to re-identify the same individual using all 37 measures. Of the 46 subjects with retest data, we were able to re-identify 26, yielding an accuracy of 56.5%, with a median distance rank of 1.0 and a mean distance rank of 12.1 among 1189 people. We performed permutation testing to assess the statistical significance of our identification accuracy. Across 1,000 iterations, the highest success rate achieved was 2/46 (roughly 4.3%), and the p-value associated with obtaining at least 26 correct identifications was <0.0001.
We carried out the same analysis for MZ twin identification: compared to other siblings and genetically unrelated people, MZ twins should be most similar to one another (Bouchard Jr and Propping, 1993; Falconer et al., 1996; Plomin et al., 1997). Of the 298 MZ subjects, we identified the exact corresponding MZ twin for 21 of them, yielding an accuracy of 7.0%, with a median distance rank of 47.5 among 1188 people. Assessing statistical significance with 1000 permutations, the highest success rate achieved was 3/298 (roughly 1.0%), and the p-value associated with obtaining at least 21 correct identifications was <0.0001. Thus, our ability to identify somebody's identical twin based on the behavioral data was considerably worse than our ability to re-identify the same individual (7% accuracy vs. 56.5%), even though it was still highly statistically significant.
The ability to re-identify a given individual from test-retest data essentially sets an upper bound on our ability to identify an MZ twin, and presumably reflects the specific limitations of this particular dataset, including factors such as the number of features (37, compared to an ideally much larger set) and the reliability of the features (test-retest reliability in Figure 1B). We next investigated the heritability of each measure and the fundamental assumptions in twin studies.
The standard method of calculating heritability
In twin studies, the most common approach to calculating heritability is to compare the correlations between MZ and DZ twins (see Introduction). In this framework, we calculated heritability using Falconer's formula (Figure 2A). As can be seen from the figure, the heritability calculated in this manner had a very large range across the different tasks, and actually yielded a negative value for two of them (the MZ correlation was smaller than the DZ correlation). This demonstrates some of the flaws of using Falconer's formula on this dataset. One possible explanation for this theoretically invalid result could be that the measures have poor test-retest reliability. Yet, for the two tasks in question, the short Penn line orientation test had a test-retest reliability of 0.76 and the life satisfaction questionnaire had a test-retest reliability of 0.89. Another limiting factor could be the sample size used to calculate the twin correlations (on the order of 100 here). There exist more complex modeling approaches to estimate heritability (Martin and Eaves, 1977; Winkler et al., 2015), but fundamentally those methods rely on the same assumptions. Given these patent limitations of the standard approach, which are well known in the literature (Beckwith and Morris, 2008; Charney, 2017; Joseph, 2002; Kamin and Goldberger, 2002; Mayhew and Meyre, 2017; Schönemann, 1997), we took an alternative approach to estimating heritability: making use of machine learning models that are more data-driven and less model-based.
A machine learning alternative for estimating heritability for the 37 measures
The traditional approach derives heritability from the differences between MZ and DZ twins. If we assume that any differences between the two types of twin pairs indeed arise solely from genetics, then a classifier trained to distinguish MZ twin pairs from DZ twin pairs should assign greater weights to the features that have higher heritability, as they are more informative for discriminating the two classes. This allows us to test, at least qualitatively, how reasonable the heritability estimates that we derived above using standard methods were.
The first approach we used was Ridge classification, a variant of a simple multivariate model with a regularization term that forces the weights to be more stable and robust to correlated features (Freckleton, 2011; Gopakumar et al., 2016), which was the case for the measures we selected, as illustrated in Figure 1A. The mean coefficients for each feature are plotted in Figure 2C; the model had a mean testing accuracy of 68.7% (95% confidence interval for the testing accuracy: [58.9%, 77.8%]; the bootstrap p-value under the null hypothesis that testing accuracy is not higher than 50% was <0.0001). In addition to Ridge classification, we also fitted the simplest univariate model for each of the 37 measures, an OLS regression model with a single feature; the coefficients are shown in Figure 2B. This univariate regression reflects the maximal contribution from each feature in isolation, allowing a clearer quantification of each individual feature's heritability than the Ridge or Random Forest models, which incorporate multicollinearity between features. The two sets of coefficients (univariate and Ridge) had a Spearman's rank-order correlation of 0.82 across the 37 features.
Another popular approach is the Random Forest classifier, a nonlinear model composed of many decision trees. For each decision tree inside the forest, the method draws a randomly sampled training set and considers only a random sample of features for splitting at each node. The structure of the model helps with the problem of highly correlated features and allows more stable and accurate estimation of feature weights (importances). The mean feature importances are plotted in Figure 2D; the model had a mean predictive accuracy of 79.4% (95% confidence interval: [71.1%, 87.8%]; p<0.0001), and the mean area under the ROC curve was 0.88 (with a standard deviation of 0.04).
To compare all these different results, we quantified the correlations between all four sets of values: classic heritability as calculated from Falconer's formula, Ridge classifier coefficients, univariate model coefficients and Random Forest feature importances. We found good agreement across the different approaches, with Spearman's rank correlations ranging from 0.42 to 0.82 (Figure 2E), demonstrating the validity of our novel machine-learning approach for estimating heritability qualitatively. Considering that we had correlated features in the dataset (Figure 1A), the results also partially confirmed the capability of both Ridge and Random Forest at handling feature correlations, as they both agreed well with the univariate coefficients (correlations of 0.82 and 0.7, respectively). Results that corrected for test-retest reliability were similar to the uncorrected ones presented here (Figure S1).
We next asked a more general question: are the heritability estimates or feature weights, on average, significantly different for the behavioral task domain compared to the self-report questionnaire domain? Under the null hypothesis that the average heritability for the task and questionnaire domains is not different, we constructed the distribution of the absolute difference in average heritability between the task and questionnaire domains (Figure 3), and calculated the p-values for the four sets of heritability estimates (see the Methods section for details). For all cases except Ridge (for which the p-value was 0.021, uncorrected for testing our hypothesis with the four sets of heritability estimates), we found no strong evidence to reject the null hypothesis. When taking test-retest reliability into consideration by simple disattenuation (dividing by test-retest reliability), again only the Ridge coefficients yielded a small p-value, 0.008 (Figure S2). However, it may not be valid simply to divide by test-retest reliability, since measures with very poor reliability could yield artificially inflated heritability. As noted above, a single task or questionnaire is often limited in reflecting the meaningful psychological variable of which it is a measure, and we therefore next conducted a factor analysis to derive latent factors across our 37 measures.
Estimating heritability for the factors
We extracted nine factors from all 37 measures that together accounted for 59.7% of the total variance (Table S2). The interpretations and explained variances of the factors were: factor 1, positive social ability (22.2%); factor 2, negative affect (11.0%); factor 3, general intelligence (5.1%); factor 4, self-regulation (4.7%); factor 5, attention and processing speed (4.0%); factor 6, agreeableness (3.6%); factor 7, self-efficacy (3.2%); factor 8, language and communication (3.2%); and factor 9, competitiveness (2.8%).
We also computed factor scores using both the regression and Bartlett methods, for reliability (since factor scores are indeterminate). These two methods produced two sets of very similar factor scores for the same nine factors (see the correlation structure between the 18 factor scores in Figure S3). We used these two sets of factor scores simultaneously as features in the Ridge classifier and the Random Forest model to further assess the ability of each model to handle highly correlated features (a more challenging task than handling the 37 variables, which were less inter-correlated in comparison). A model that is robust to correlation among features should be able to assign similar weights or importances to features that are highly correlated with each other.
We repeated the previous analyses using both sets of factor scores, so that each subject was represented by a vector of 18 factor scores, to derive standard heritability, Ridge coefficients, univariate coefficients and Random Forest feature importances for the nine factors (Figure S4). For heritability using Falconer's formula and for the univariate coefficients (Figure S4A, B), each factor score was treated independently, so these estimates were not susceptible to the influence of correlation among factors. For the Ridge classifier, the coefficients for the two sets of factor scores (Figure S4C) had a Pearson's correlation of 0.79. For the Random Forest analysis, the correlation between the two sets of feature importances (Figure S4D) was 0.61. These results therefore further confirmed that Ridge and Random Forest were able to assign similar weights to highly correlated features and that their estimation of heritability was reliable.
344
We
repeated
the
analysis
for
the
Ridge
classifier
and
Random
Forest
using
only
the
one
set
of
345
factor
scores
der
ived
from
regression
methods
(Figure
4
C
,
D
).
When
using
the
nine
regression
346
factor
scores
alone,
The
Ridge
classifier
had
a
mean
accuracy
of
64.2%
(95%
CI:
347
[55.3%,73.3%];
bootstrap
p
-
value
=
0.006)
while
the
Random
Forest
classifier
had
a
mean
348
testing
accu
racy
of
77.9%
(95%
CI:
[67.8%,86.7%];
bootstrap
p
-
value
<0.0001)
and
mean
area
349
under
the
ROC
curve
of
0.86
(with
a
standard
deviation
of
0.04).
The
reduction
of
model
350
performance
compared
to
using
all
37
measures
was
minimal,
indicating
that
the
latent
fac
tors
351
captured
the
information
relevant
to
estimating
heritability.
For
the
set
of
factor
scores
derived
352
by
regression,
when
trained
alone
versus
together
with
the
other
set
of
factor
scores
computed
by
353
the
Bartlett
method,
the
Spearman’s
rank
correlation
o
f
Ridge
coefficients
was
0.73.
For
the
354
Random
Forest
classifier,
the
feature
importances
were
correlated
at
0.93.
These
results
355
demonstrated
that
the
feature
weights
that
Ridge
and
Random
Forest
learned
for
the
nine
factors
356
(calculated
using
Regression
met
hod)
were
robust
and
consistent.
357
Recall that for the 37 measures, standard heritability and the feature importances from the three models agreed relatively well, with correlations from 0.42 to 0.67 (Figure 2E). However, for the nine factors, the classical heritability estimates from Falconer's formula (Figure 4A) had lower correlations with the three other sets of model estimates, from 0.15 to 0.38 (Figure 4E). One specific difference, for example, was the estimation for factor 3, which reflects general intelligence. All three models assigned high importance to this factor, while the traditional heritability calculation assigned a rather low value of 22.9%. In the literature, the estimated heritability of intelligence is quite high, often above 50% and sometimes reported to be as high as 80% (Bouchard, 2004; Panizzon et al., 2014; Plomin and Deary, 2015). The machine-learning models are thus likely to have produced a more accurate estimation of heritability from this dataset than the standard formula was able to.
Discussion
Summary of results
In this study, we analyzed a comprehensive set of 37 behavioral scores in the Human Connectome Project. When representing each subject using this set of behavioral data, we were able to achieve a behavioral fingerprinting accuracy of 56.5% for individuals and, in the case of identifying identical twins, an accuracy of 7.0% (both significantly above chance). We further computed heritability for those 37 scores under two general schemes: the classical correlation-based method using Falconer's formula, and three machine-learning-based methods (univariate linear model, Ridge classifier and Random Forest model), and found relatively high correlations between the two schemes (Figure 2E). Given the inter-correlations among the 37 scores, an exploratory factor analysis was conducted to extract nine latent factors, whose heritability we assessed similarly. In this case, the correlations between the classical method and the machine-learning-based ones were lower (Figure 4E).
Individual and MZ twin identification
Our behavioral fingerprinting scheme was inspired by the success of connectome fingerprinting using HCP data (Finn et al., 2015). Our accuracy of 56.5% was relatively high considering the limiting factors that we faced: a small number of features compared to connectome fingerprinting (which had 268 nodes and 35,778 edges) and measurement error from some measures with relatively low test-retest reliability. Our identification of MZ twins faced the same limitations, but we observed a drop in performance to an accuracy of 7.0%. This accuracy drop alone would seem to put a limit on the strength of the heritability of our measures.
One possible explanation is that the unique environment actually accounts for a substantial portion of the variance for those measures, overwhelming the contributions of common environment and genes. According to a study that used maximum likelihood modeling, unique environment does account for the majority of the variance for many of the measures in the HCP, including some of the ones we selected (Winkler et al., 2015). This may also partly explain the modest accuracy of the Ridge classification between MZ twins and DZ twins, since a stronger contribution of unique environment implies a weaker contribution of genetics and common environment to the overall phenotypic variance, thus diminishing group differences between MZ twin pairs and DZ twin pairs.
Comparison of the standard correlation-based method versus machine-learning based methods of estimating heritability
The standard analysis calculates heritability based on the difference between the MZ and DZ correlations for a phenotypic trait. One immediate shortcoming of this approach is that it can sometimes yield negative heritability, in cases where the MZ correlation is actually smaller than the DZ correlation. In our case, we found that two measures with good test-retest reliability had negative heritability using Falconer's formula. Possible reasons for negative heritability include small sample size and/or a lack of explicit knowledge of the common environment. However, it should be mentioned that a negative estimate of heritability is not rare using such methods, and although most researchers attribute such invalid results to noise, they could in fact be evidence against the assumptions behind the calculations (Schönemann, 1997; Steinsaltz et al., 2018).
We therefore developed an alternative approach to estimating heritability: training machine learning models to distinguish MZ twin pairs from DZ twin pairs. If the ACE model holds, then measures/features that have high heritability should be assigned larger weights, since they are more informative for the classification. We found good rank correlations between the standard heritability and the other three sets of model coefficients for the 37 behavioral variables (Figure 2E). When applied to the nine latent factors, however, the agreement between the standard heritability and the other three sets of model coefficients was substantially lower (Figure 4E). Nonetheless, the three machine learning models had good agreement with one another, as shown by relatively high rank correlations (all above 0.6) (Figure 4E). As mentioned above, the standard heritability estimate for the general intelligence factor deviated greatly both from the other three models and from the literature. Such disagreement raises concerns about the validity of the assumptions made by the ACE model and the use of traditional methods for calculating heritability, leading us to recommend the use of machine learning methods to estimate heritability empirically.
Limitations and future directions
To the best of our knowledge, this is the first application of machine learning models to estimate the heritability of behavioral measures using the HCP data. Below we evaluate each model in turn and make recommendations for future use.
For the univariate linear model, a conceptually simple model, each measure was evaluated independently for its maximal contribution to the classification. For both the raw measures and the latent factors, the univariate model coefficients agreed best with the standard heritability calculations. It should be noted, though, that given the shortcomings of the standard calculations discussed above, good agreement with them does not necessarily imply agreement with the true set of heritability values.
The second model we used was a Ridge classifier, a commonly used linear model for dealing with correlated features (Dormann et al., 2013; Freckleton, 2011; Gopakumar et al., 2016). A recent paper (using single-nucleotide polymorphism data) concludes that Ridge classification improves predictive accuracy substantially compared to standard repeated univariate regression for a large enough sample size (de Vlaming and Groenen, 2015). In our case, we also wanted to derive accurate coefficients, as estimates of heritability. As a regularized regression, Ridge proved effective at handling feature correlation, illustrated by its good agreement with the univariate coefficients (Figure 2E, Figure 4E) and its ability to assign similar weights to the two sets of factor scores (Figure S4C).
The Random Forest model was also robust with respect to correlations among features (e.g., Figure S4D: for two sets of almost identical factor scores for the same nine factors, the two sets of feature importances had a Pearson's correlation of 0.61), and it achieved the highest accuracy for the classification between MZ twin pairs and DZ twin pairs. Given the nonlinear nature of the