Article
https://doi.org/10.1038/s41467-024-48548-y

Heterogeneity in strategy use during arbitration between experiential and observational learning

Caroline J. Charpentier 1,2, Qianying Wu 1, Seokyoung Min 1, Weilun Ding 1, Jeffrey Cockburn 1 & John P. O'Doherty 1
To navigate our complex social world, it is crucial to deploy multiple learning strategies, such as learning from directly experiencing action outcomes or from observing other people's behavior. Despite the prevalence of experiential and observational learning in humans and other social animals, it remains unclear how people favor one strategy over the other depending on the environment, and how individuals vary in their strategy use. Here, we describe an arbitration mechanism in which the prediction errors associated with each learning strategy influence their weight over behavior. We designed an online behavioral task to test our computational model, and found that while a substantial proportion of participants relied on the proposed arbitration mechanism, there was some meaningful heterogeneity in how people solved this task. Four other groups were identified: those who used a fixed mixture between the two strategies, those who relied on a single strategy, and non-learners with irrelevant strategies. Furthermore, groups were found to differ on key behavioral signatures, and on transdiagnostic symptom dimensions, in particular autism traits and anxiety. Together, these results demonstrate how large heterogeneous datasets and computational methods can be leveraged to better characterize individual differences.
As humans, we learn about the world around us by seeking and integrating information from multiple sources. On the one hand, we heavily rely on our own past experience to predict the future. Experiential learning (EL) is such that actions that were rewarded in the past tend to be repeated, while actions that were punished in the past tend to be avoided. EL can be relied on to solve many reinforcement learning problems, from learning simple associations between stimulus, action and reward (model-free learning) to complex cognitive maps (model-based learning) and exploitation/exploration trade-offs [1–4]. On the other hand, as a social species with sophisticated social skills that allow us to make collective decisions and function in society, humans can learn from observing others [5–7]. Such observational learning (OL) is thought to confer the evolutionarily advantageous ability to assess the consequences of actions available in the environment without having to directly experience the potentially negative outcomes of those actions. OL is prevalent across many domains, from basic sensory-motor learning [8–10] to complex strategic decision-making [11,12], from aversive [13,14] to reward learning [15–18], and can even extend to learning from non-human agents [19] or from replayed actions [18].
Received: 14 April 2023
Accepted: 6 May 2024

1 Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, USA. 2 Department of Psychology & Brain and Behavior Institute, University of Maryland, College Park, MD, USA. e-mail: ccharpen@umd.edu

Nature Communications | (2024) 15:4436

Depending on the uncertainty of the environment, a given strategy may become more reliable to deploy at different points in time [2], consistent with a "mixture of experts" framework in which different expert systems take the lead in guiding behavior when their predictions are most reliable [20]. Evidence for such uncertainty- or reliability-based arbitration between learning strategies, as well as its neural correlates, has been provided within each domain. In EL, people dynamically arbitrate between model-free and model-based
learning [21,22]. In OL, recent evidence suggests a similar arbitration mechanism between imitation – the tendency to repeat other people's choices – and emulation – the tendency to infer their goals [18], as well as between cooperative and competitive learning during strategic interactions [23]. Yet, whether and how people may engage dynamic arbitration processes across domains – that is, between EL and OL – remains unclear. How people integrate experiential and social information during learning has been the focus of a lot of research in the past two decades. Multiple studies have shown that not only are decisions influenced by both sources of information [24,25], but experiential and social learning signals co-exist in the brain and can be integrated to predict decisions [15,26–33]. However, these studies have not directly assessed the possibility of a dynamic arbitration mechanism between the two learning domains. In other words, it remains unclear whether the weight attributed to each strategy before making a decision varies depending on the environment.
Here, we designed and optimized a novel task that probes both EL and OL and manipulates uncertainty in each strategy's predictions to promote dynamic arbitration. We collected data in two independent online studies, to test the predictions that when outcomes are more predictable, EL should be favored, and when inferences from the observed agent's actions are more reliable, OL should be favored. A second goal of this study was to characterize heterogeneity in the strategies participants deploy during learning. Given the recent explosion (partly driven by COVID-19) of online data collection, it has become clear that, despite attention checks, performance in online studies tends to be noisier than in the lab, most likely because of the uncontrolled environment, lack of direct interaction between participant and experimenter, and larger possible distractions [34,35]. However, online studies allow for the collection of large-scale datasets in shorter timeframes, often exhibiting good replicability of in-lab findings [36], thus providing increased power for a more thorough characterization of individual differences and of their relevance to psychopathology. Therefore, in addition to probing the dynamic arbitration framework described above, we also investigated the possibility that not all participants relied on this computational model to solve the task [37]. Though such heterogeneity is likely to exist in any study sample (online and in-person studies, clinical and general populations, etc.), it is not usually well characterized in existing studies, given that sample sizes are too small or that most computational modelling approaches tend to select a "winning" model and apply it to all participants. Here, we predicted that different groups of people might rely on different strategies and set out to characterize this heterogeneity. Specifically, we tested for the possibility that in addition to dynamic arbitration, some individuals might combine the two strategies in a less flexible way, such as by relying on an unchanging allocation between the two strategies (without dynamically arbitrating), or that some might instead predominantly rely on a single strategy (either EL-only or OL-only) to solve the task, while others might use irrelevant heuristics, such as preferring a given action (left versus right) or a given stimulus throughout. We also tested whether groups that are solely defined based on model-fitting would differ from each other in meaningful ways in their behavior on the task and in transdiagnostic symptom dimensions.
Recent literature in computational psychiatry has shown that anxiety is associated with difficulties in adapting to volatility and changes in uncertainty [38–40], increased exploration to reduce uncertainty [41], and faster learning from negative outcomes [42,43]. Social anxiety has also been found to be associated with excessive deliberation [44] and with suboptimal learning [45]. Finally, autism has been linked to deficits in behavioral adaptation during social inference [46], specifically suboptimal flexibility and lower mentalizing sophistication [47], overestimation of the volatility of the sensory environment [48], reduced implicit causal inference about sensory signals [49], and enhanced observational learning in the aversive domain [50]. Therefore, we hypothesized that individual differences in subclinical traits related to anxiety, autism and social anxiety are likely to be sensitive to the computational heterogeneity in strategy use during EL, OL, and the arbitration between them.
In this work, we show that there is substantial heterogeneity in
how participants solve this task, and that individuals can be reliably
characterized by the computational model that best explains their
behavior. We additionally validate this heterogeneity by demonstrat-
ing marked differences in key behavioral markers across groups, as
well as differences in subclinical transdiagnostic traits related to aut-
ism and anxiety.
Results
Behavioral evidence for learning and mixture of strategies
Two groups of participants (Study 1: N = 126, Study 2: N = 493, see Methods for details) performed a novel task online designed to separately quantify experiential and observational learning tendencies during behavior (Fig. 1). In the task (160 trials), participants learn which of two tokens (orange or blue) is more likely to yield a reward, which can be achieved by observing another player choose between two boxes (identified by unique fractals superimposed on each box) to obtain a token, or through direct experience of the outcome associated with the chosen token (Fig. 1A). Importantly, participants were instructed that the other player knew which token was more valuable at any point in time and was instead learning which of the two boxes was more likely to yield the valuable token. By observing the other player's choices, one can thus infer which token color they were targeting as having the highest value. To promote continuous learning, as well as push the balance between EL and OL and test our proposed uncertainty-based arbitration mechanism, we manipulated the token reward probability (including reversals as well as periods of low vs high uncertainty) and the box-to-token transition probability (also alternating between periods of low and high uncertainty), depicted by blue and orange lines, respectively, in Fig. 1B. We also manipulated the variance in the reward magnitude so that in some blocks, when a token was rewarded, the variance in magnitude was high, and in other blocks the variance was low (see Methods for details). While this did not directly affect the reliability of EL predictions – that is, the ability to predict the occurrence of a reward remained the same – we hypothesized that high variance may constitute a form of (task-irrelevant) uncertainty and tested whether it played a role in the arbitration process, whereby EL may be weighed less in the high variance condition.

We first examined mean behavioral accuracy (probability of choosing the more valuable token, calculated across all trials). Accuracy was 0.582 (± 0.087 SD) in Study 1 and 0.604 (± 0.083 SD) in Study 2, significantly above the chance level of 0.5 (Study 1: t(125) = 10.67, P < 0.001, d = 0.94, 95% CI [0.067, 0.097]; Study 2: t(492) = 26.08, P < 0.001, d = 1.25, 95% CI [0.097, 0.111]). Behavioral evidence for learning behavior was then obtained by calculating trial-by-trial accuracy for the first 8 trials after a reversal in token values. There was a clear increase in accuracy throughout those 8 trials, from 0.528 directly after a reversal to 0.60 in Study 1 and from 0.544 to 0.619 in Study 2. This increase, modelled as a linear main effect of trial in a mixed-effects linear model predicting accuracy (lme4 package in R, including a random intercept, followed by Type III ANOVA), was statistically significant (Study 1: F(1,875) = 33.03, P < 0.001, ηp² = 0.036, 95% CI [0.0067, 0.014], Fig. 2A; Study 2: F(1,3423) = 96.14, P < 0.001, ηp² = 0.027, 95% CI [0.007, 0.011], Fig. 2B).
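The post-reversal learning curve described above can be sketched as follows; this is a minimal illustration with hypothetical variable names, and it omits the mixed-effects model (lme4 in R) used for the statistical test:

```python
import numpy as np

def post_reversal_accuracy(correct, reversal_trials, window=8):
    """Mean accuracy for each of the first `window` trials after each reversal.

    correct: 1-D array of 0/1 flags (chose the more valuable token).
    reversal_trials: indices of the first trial after each reversal.
    Returns an array of length `window`, averaged across reversals.
    """
    curves = []
    for r in reversal_trials:
        seg = correct[r:r + window]
        if len(seg) == window:  # drop reversals too close to the end
            curves.append(seg)
    return np.mean(curves, axis=0)

# toy example: accuracy climbing after two reversals
correct = np.array([0, 0, 1, 1, 1, 1, 1, 1,
                    0, 1, 0, 1, 1, 1, 1, 1])
curve = post_reversal_accuracy(correct, reversal_trials=[0, 8])
```

Averaging the resulting per-participant curves then gives the group-level learning curve shown in Fig. 2A, B.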
We then classified whether participants' choice on each trial was consistent with experiential learning and with observational learning (see Fig. 1C, D for an illustration). Out of the trials where the two strategies predicted different choices according to this classification, we then calculated the proportion of choices consistent with OL (vs EL) as an index of preference for one or the other strategy. Mean OL choice propensity was 0.515 (± 0.095) in Study 1 (Fig. 2C) and 0.493 (± 0.107) in Study 2 (Fig. 2D), not significantly different from 0.5 (both d < 0.16, P > 0.08, Study 1: BF10 = 0.436, Study 2: BF10 = 0.163). This implies that both OL and EL strategies were relied on approximately equally across participants and across studies, although there was also substantial individual variability in the degree of engagement of these two strategies, with some participants exhibiting a clear preference for one strategy or the other.
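The OL choice propensity index can be sketched as below; the 0/1 token coding and the array names are our assumptions, not the paper's implementation:

```python
import numpy as np

def ol_choice_propensity(choices, el_pred, ol_pred):
    """Proportion of choices consistent with OL, computed only over the
    trials where the EL- and OL-consistent choices disagree.

    choices: participant's token choice per trial (0/1 coding, assumed).
    el_pred / ol_pred: the choice each strategy predicts on that trial.
    """
    choices, el_pred, ol_pred = map(np.asarray, (choices, el_pred, ol_pred))
    disagree = el_pred != ol_pred
    if not disagree.any():
        return np.nan  # index undefined if the strategies never disagree
    return np.mean(choices[disagree] == ol_pred[disagree])

# 4 disagreement trials; the participant follows OL on 3 of them
choices = [1, 0, 1, 1, 0, 0]
el_pred = [0, 0, 1, 0, 1, 0]
ol_pred = [1, 0, 1, 1, 0, 1]
prop = ol_choice_propensity(choices, el_pred, ol_pred)
```

A value above 0.5 indicates a preference for OL, below 0.5 a preference for EL.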
To more formally assess whether participants used a mixture of the two strategies during the task, we ran a mixed-effects generalized linear model (ME-GLM) predicting choice on each trial from the outcome of the past trial (signature of EL) and from the partner's past choice (signature of OL). We found that both effects were significant, both in Study 1 (EL fixed effect: estimate = 0.357 ± 0.051 (SE), t(19862) = 6.94, P < 0.001; OL fixed effect: estimate = 0.216 ± 0.026 (SE), t(19862) = 8.21, P < 0.001; Fig. 2E) and in Study 2 (EL fixed effect: estimate = 0.519 ± 0.025 (SE), t(77960) = 20.74, P < 0.001; OL fixed effect: estimate = 0.241 ± 0.012 (SE), t(77960) = 19.60, P < 0.001; Fig. 2F), indicative of hybrid behavior between OL and EL (see Table S1A for all statistics).
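The logic of this ME-GLM can be illustrated with a plain, fixed-effects-only logistic regression on synthetic data; the paper's actual model also includes per-participant random effects, which this sketch omits, and the ±1 predictor coding is an assumption:

```python
import numpy as np

def logit_fit(X, y, lr=0.1, n_iter=5000):
    """Plain logistic regression fit by gradient ascent on the
    log-likelihood (fixed effects only, no random effects)."""
    X = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w  # [intercept, b_EL, b_OL]

# synthetic trials: choice driven by both the past outcome (EL signature)
# and the partner's past choice (OL signature), coded +/-1
rng = np.random.default_rng(0)
past_outcome = rng.choice([-1, 1], 2000)
partner_choice = rng.choice([-1, 1], 2000)
logits = 0.5 * past_outcome + 0.3 * partner_choice
choice = (rng.random(2000) < 1 / (1 + np.exp(-logits))).astype(float)
w = logit_fit(np.column_stack([past_outcome, partner_choice]), choice)
```

Recovering positive coefficients on both predictors is the sketch's analogue of the hybrid EL/OL signature reported above.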
Fig. 1 | Observational learning (OL) & Experiential learning (EL) task design. A On each trial, participants first observe another agent choose between two boxes represented by fractal images, then observe which token was obtained by the agent from the chosen box. Then participants choose for themselves between the two tokens and receive an outcome (from 0 – no reward – to 99 points) associated with the chosen token. From the 'observe' part of the trial, participants can learn from observation which token the other agent is trying to get. From the 'play' part of the trial, participants can learn from directly experiencing the outcomes associated with each token. B Example time course of probabilities, condition and block changes, and reversals. The task contained 8 blocks of 20 trials each. Each block started with a new pair of boxes (fractals), which had a transition probability towards their corresponding token of either 0.8 or 0.6, depicted by the orange line. Within each block there was one reversal in the valuable token, depicted by the magenta triangles, with the blue line representing the reward probability associated with the orange token (= 1 – P(reward | blue token)). While the lines represent EL and OL uncertainty conditions, for behavioral analyses we defined key uncertainty trials as follows: EL uncertainty (blue points) was deemed low on trials where the past outcome-action-outcome sequence was consistent, and high otherwise. OL uncertainty (orange points) was deemed low if the past two box-token transitions were consistent, and high otherwise. Finally, reward magnitude was considered low if the past outcome magnitude was equal to or below 25 points, and high otherwise. C, D Illustration of the trial definitions used to classify trials as consistent with EL (C) and consistent with OL (D).

Uncertainty-driven behavioral changes in strategy

We next examined whether participants flexibly switched between OL and EL depending on the variations in uncertainty. First, we classified trials as low versus high OL uncertainty trials and low versus high EL uncertainty trials depending on the recent trial history (Fig. 1B, see Methods for details). Those trials broadly overlapped with the low vs
high uncertainty conditions that were defined by design (larger proportion of low OL uncertainty trials in low compared to high OL uncertainty conditions: Study 1: t(125) = 30.3, Study 2: t(492) = 61.1; larger proportion of low EL uncertainty trials in low compared to high EL uncertainty conditions: Study 1: t(125) = 52.2, Study 2: t(492) = 91.5; all Ps < 0.001), but were defined to capture trial-by-trial variations in uncertainty. We hypothesized those variations would be more representative of how dynamic changes in uncertainty were experienced by participants, given that actual changes in uncertainty were not cued, which would lead to a lag in information integration when considering the blocked conditions. Indeed, we found that uncertainty trials were stronger predictors of choice throughout the task than uncertainty conditions (Table S2). We then calculated the breakdown of OL choice propensity as defined above (illustrated in Fig. 1C, D, data shown in Fig. 2C, D) across these uncertainty trial types and tested their significance in a linear mixed-effects model predicting OL (vs EL) choice propensity for each participant from OL uncertainty trial type, EL uncertainty trial type, and their interaction. We found that the main effect of both factors was significant, both in Study 1 (effect of OL uncertainty trial type: F(1,833) = 32.64, P < 0.001, ηp² = 0.038, 95% CI [0.014, 0.107]; effect of EL uncertainty trial type: F(1,833) = 39.31, P < 0.001, ηp² = 0.045, 95% CI [−0.186, −0.093]; Fig. 3A) and in Study 2 (effect of OL uncertainty trial type: F(1,3234) = 149.88, P < 0.001, ηp² = 0.044, 95% CI [0.056, 0.105]; effect of EL uncertainty trial type: F(1,3234) = 268.27, P < 0.001, ηp² = 0.077, 95% CI [−0.196, −0.148]; Fig. 3B). Moreover, there was also a significant interaction between EL and OL uncertainty trial types (Study 1: F(1,833) = 4.31, P = 0.038, ηp² = 0.005, 95% CI [0.0038, 0.135]; Study 2: F(1,3234) = 9.896, P = 0.0017, ηp² = 0.003, 95% CI [0.021, 0.090]), such that the effect of OL uncertainty was stronger when EL uncertainty was low. For comparison, the same analysis conducted on uncertainty conditions instead of trials is shown in Fig. S1.
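Two of the trial-level classifications described in the Fig. 1B caption can be sketched as follows. The 'common'/'rare' transition coding and the treatment of the first trials are our assumptions; the EL consistency rule, which depends on the exact outcome-action-outcome sequence defined in the Methods, is omitted here:

```python
def ol_uncertainty(transitions, t):
    """'low' if the two box-token transitions preceding trial t were
    consistent (both common or both rare), 'high' otherwise.
    `transitions` is a list of 'common'/'rare' labels (assumed coding)."""
    if t < 2:
        return 'high'  # assumption: undefined early trials treated as high
    return 'low' if transitions[t - 1] == transitions[t - 2] else 'high'

def magnitude_level(outcomes, t):
    """'low' if the past outcome magnitude was <= 25 points, else 'high'."""
    return 'low' if outcomes[t - 1] <= 25 else 'high'
```

These per-trial labels are what the mixed-effects analyses above use in place of the blocked uncertainty conditions.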
Fig. 2 | Behavioral signatures of learning and strategy use. A, B Behavioral evidence for learning behavior for Study 1 (A) and Study 2 (B), calculated as the mean accuracy (choice of correct token) for each of the first 8 trials following a reversal in token values, then averaged across participants. Error bars represent SEM. C, D The proportion of choices consistent with observational learning (OL) versus experiential learning (EL) was calculated out of the trials where OL and EL made different predictions (according to the definitions depicted in Fig. 1C, D), for Study 1 (C) and Study 2 (D). Each dot depicts an individual participant. E, F Main effects of past outcome (EL effect, blue) and of past partner's action (OL effect, orange) on current participant's choice were quantified in a mixed-effects generalized linear model (ME-GLM), for Study 1 (E) and Study 2 (F). Bars represent the fixed effect coefficient estimates; error bars represent the standard error associated with those estimates; stars represent the significance of the fixed effects obtained from the ME-GLM (two-sided, all P < 0.001); and each dot is an individual participant (random effect); Study 1: N = 126 independent participants (A, C, E); Study 2: N = 493 independent participants (B, D, F).

Note that by design, and as explained above, we manipulated the variance in reward magnitude, with the prediction that high variance in reward magnitude may reduce the tendency to rely on EL, and therefore indirectly promote OL. However, in Study 1 we found that reward magnitude variance had no effect on OL vs EL choice propensity (t(125) = 0.73, P = 0.46, d = 0.065, BF10 = 0.129), which was also found to be the case in Study 2 (t(492) = 1.68, P = 0.095, d = 0.076, BF10 = 0.203). Instead, whether the magnitude itself was high or low
(defined as higher or lower than the mean expected reward of 25 points, see yellow dots on Fig. 1B), had a strong effect, with a lower propensity to rely on OL (and therefore a higher propensity to rely on EL) when reward magnitude was high compared to low (Study 1: t(125) = 6.08, P < 0.001, d = 0.54; Study 2: t(492) = 15.63, P < 0.001, d = 0.70). Therefore, we used magnitude, rather than variance, as a condition to examine arbitration, but because those predictions were not part of our initial uncertainty-driven arbitration hypothesis, we opted to focus the analyses on the effects of OL and EL uncertainty trial types only in the main text, and we present the additional findings related to magnitude as a supplementary analysis in Fig. S2.
We then ran two separate ME-GLMs (one for each manipulation) to specifically quantify the effect of EL and OL uncertainty trial types on each strategy separately. In the previous analyses, we found effects of both manipulations on OL vs EL choice propensity in the expected direction; however, looking only at this behavioral metric, we cannot disentangle whether, for example, OL uncertainty impacts behavior by increasing OL, decreasing EL, or both. With ME-GLMs quantifying both OL and EL effects we can address this. Each ME-GLM included four predictors of choice on each trial (both as fixed and random effects): past outcome for low and high uncertainty trials, and past partner action for low and high uncertainty trials (see Table S1B, C for statistics). The resulting random effects were then compared in a 2-by-2 ANOVA, revealing significant interactions between the strategy (OL vs EL effect) and the manipulation of interest, both in Study 1 (strategy × OL uncertainty: F(1,375) = 31.06, P < 0.001, ηp² = 0.076, 95% CI [0.284, 0.594]; strategy × EL uncertainty: F(1,375) = 99.87, P < 0.001, ηp² = 0.21, 95% CI [−0.574, −0.385]; Fig. 3C) and in Study 2 (strategy × OL uncertainty: F(1,1467) = 176.00, P < 0.001, ηp² = 0.107, 95% CI [0.409, 0.551]; strategy × EL uncertainty: F(1,1467) = 752.21, P < 0.001, ηp² = 0.339, 95% CI [−0.720, −0.624]; Fig. 3D). Crucially, the interactions were driven by a stronger effect of uncertainty trial type on the relevant strategy. In high OL uncertainty trials, the tendency to rely on OL was reduced more strongly than the tendency to rely on EL (Fig. 3C, D left). And interestingly, high EL uncertainty trials impacted the reliance on both strategies in opposite directions; that is, not only were they associated with a reduction in EL but also with an increase in OL (Fig. 3C, D right).
Fig. 3 | Behavioral signature of uncertainty-driven arbitration between experiential (EL) and observational learning (OL). A, B The proportion of OL choices was computed as in Fig. 2C, D, but separately for each of 4 trial types defined by OL uncertainty (low or high) and EL uncertainty (low or high), for Study 1 (A) and Study 2 (B). See Methods for details about how uncertainty trial types were defined, and Fig. 1B for an illustration. Each dot is an individual participant; error bars represent SEM. C, D Separate mixed-effects generalized linear models (ME-GLM) were run to quantify the effect of each uncertainty manipulation on EL (blue) and OL (orange) separately. Both fixed and random effects of past partner's action and past outcome, for high and low uncertainty trials, were included in the ME-GLM, allowing us to quantify each effect for low and high OL uncertainty trials (left), and low and high EL uncertainty trials (right), for Study 1 (C) and Study 2 (D). Data represent the fixed effect coefficient estimates for each uncertainty trial type; error bars represent the standard error associated with those estimates; and each dot is an individual participant (random effect); Study 1: N = 126 independent participants (A, C); Study 2: N = 493 independent participants (B, D). See Table S1 for statistics.
Computational modelling: heterogeneity in strategy use
To assess whether participants' behavior was better explained by a single unitary model, or whether different individuals deploy different strategies, we fit a set of five parsimonious models to each participant's data. Specifically, the models included both single-strategy models (EL only, OL only), a mixture model that combines the two strategies using a fixed weight, a dynamic arbitration model in which the weight varies dynamically depending on the reliability of each strategy, and finally, a baseline model that captures irrelevant, non-learning strategies (see Methods for details and equations). We first tested whether the models were uniquely identifiable by calculating a confusion matrix (Fig. S3A). This analysis showed that all five models can be perfectly separated from each other, such that data generated by any given model is best explained by that model (exceedance probability of 1 relative to all the other models). Parameter recovery analyses were also performed for each model, consistently showing high correlations between actual and recovered parameters (Fig. S3B–F).
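The general idea of reliability-based arbitration (weighting each expert strategy by how small its recent prediction errors have been) can be sketched as below. This is an illustrative toy, not the paper's equations, which are given in the Methods; the reliability update, sigmoid mapping, and parameter values are all our assumptions:

```python
import numpy as np

def arbitration_weight(pe_el, pe_ol, beta=5.0, decay=0.9):
    """Illustrative reliability-based arbitration: each strategy's
    reliability is a recency-weighted average of 1 - |prediction error|,
    and the OL weight is a sigmoid of the reliability difference."""
    rel_el = rel_ol = 0.5  # start both strategies at equal reliability
    weights = []
    for e_el, e_ol in zip(pe_el, pe_ol):
        rel_el = decay * rel_el + (1 - decay) * (1 - abs(e_el))
        rel_ol = decay * rel_ol + (1 - decay) * (1 - abs(e_ol))
        w_ol = 1.0 / (1.0 + np.exp(-beta * (rel_ol - rel_el)))
        weights.append(w_ol)
    return np.array(weights)

# when OL prediction errors shrink while EL errors grow, weight shifts to OL
w = arbitration_weight(pe_el=[0.1, 0.5, 0.9, 0.9],
                       pe_ol=[0.9, 0.5, 0.1, 0.1])
```

The combined choice probability would then be w_OL * P_OL + (1 − w_OL) * P_EL, with w_OL fixed in the mixture model and varying trial by trial in the dynamic arbitration model.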
Model fitting to data was performed using hierarchical Bayesian inference in Matlab's cbm toolbox [51], both for each model separately to ensure reliable parameter estimates and including all five models as a set for Bayesian model comparison. Model frequencies from the latter analysis, as well as AIC and out-of-sample model predictive accuracy averaged across participants (see Methods for details), are reported in Table 1. Overall, those findings suggest that there was no clear and consistent winner. In both studies, the AIC values suggest a marginal advantage for the fixed mixture model, while out-of-sample accuracy slightly favored the dynamic arbitration model. Additionally, the model frequency values suggest somewhat of an even split, with no model exhibiting a frequency higher than 33%, with Study 1 showing the largest frequency for the observational learning model (31.1%) and Study 2 for the fixed mixture model (32.4%). Therefore, we reasoned that not every participant's data may be best explained by a single model across the group as a whole, and that instead, the data may be better analyzed by taking into account the best-fitting model for each participant. To do that, we relied on the individual model frequency values (model responsibility values provided as an output of cbm hierarchical Bayesian inference) to classify participants into five groups based on each participant's highest responsibility value. Group sizes are provided in Table 1, consistent with our hypothesized heterogeneity in strategy use.
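The out-of-sample accuracy reported in Table 1 uses a leave-one-block-out scheme: fit on 7 of the 8 blocks, score choice prediction on the held-out block, and iterate over all blocks. A generic sketch, with `fit` and `predict` as placeholders for a model's actual fitting and prediction routines:

```python
import numpy as np

def leave_one_block_out_accuracy(blocks, fit, predict):
    """Hold out each block in turn, fit on the rest, and average the
    per-block predictive accuracy returned by `predict`."""
    accs = []
    for i in range(len(blocks)):
        train = [b for j, b in enumerate(blocks) if j != i]
        params = fit(train)
        accs.append(predict(params, blocks[i]))
    return float(np.mean(accs))

# toy check with a dummy model that always predicts the majority choice
def fit(train):
    return 1 if np.concatenate(train).mean() >= 0.5 else 0

def predict(params, block):
    return float(np.mean(np.asarray(block) == params))

blocks = [np.array([1, 1, 1, 0])] * 8
acc = leave_one_block_out_accuracy(blocks, fit, predict)
```

In the paper the same scheme is applied with each of the five computational models in place of the dummy model.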
Posterior predictive checks
Posterior predictive checks were performed on the models using participants' best-fitting parameters. We first demonstrated the clear dissociation between the EL and OL models, showing that each model generates choices consistent with its predictions, and that our behavioral signature of interest was recovered by each model as expected depending on participants' preferred strategy (Fig. S4). Through more in-depth simulations, we then proceeded to generate data from each model using participants' best-fitting parameters, and ran the mixed-effects GLMs shown in Fig. 2E, F (signature of hybrid EL/OL behavior) and in Fig. 3C, D (effect of EL and OL uncertainty trial types) on the model-generated data. First, examining the effect of past outcome (EL effect) and of past partner's action (OL effect) on choice (Fig. 4A, B), we found that as expected, the EL effect was well recovered by the EL model and both arbitration models, while the OL effect was well recovered by the OL model and both arbitration models. The baseline model was not able to recover any EL or OL learning effect. Correlations between the data and the model predictions across individuals confirmed this result (Fig. 4C, D), with the EL model accurately predicting the EL but not OL effect, the OL model accurately predicting the OL but not EL effect, and the dynamic arbitration model accurately predicting both effects. Second, we predicted that the uncertainty effects, i.e. the extent to which each strategy's use varies with EL and OL trial uncertainty, should be appropriately recovered by the dynamic arbitration model, since this is the only model that explicitly modulates strategy weights based on uncertainty. And indeed, we found that the interactions between strategy use and uncertainty in data generated by the dynamic arbitration model (Fig. 4E–H right) matched those observed in the data (Fig. 4E–H left), with the model showing a clear effect of OL trial uncertainty on the OL effect (Fig. 4E, G) and of EL trial uncertainty on the EL effect (Fig. 4F, H). Correlations between the data and model predictions across individuals also showed strong recovery for the effect of uncertainty on the corresponding strategy (change in EL effect for low vs high EL uncertainty trials – data vs model predictions: Study 1: R(126) = 0.795, P < 0.001, Study 2: R(493) = 0.870, P < 0.001; change in OL effect for low vs high OL uncertainty trials – data vs model predictions: Study 1: R(126) = 0.867, P < 0.001, Study 2: R(493) = 0.886, P < 0.001; Fig. S5A–D). Interestingly, we also found that when running that same posterior predictive check analysis with the condition definition of OL and EL uncertainty (instead of the trial definition), the predictions of the dynamic arbitration model were not as strongly correlated with the data (change in EL effect for low vs high EL uncertainty condition – data vs model predictions: Study 1: R(126) = 0.588, P < 0.001, Study 2: R(493) = 0.712, P < 0.001; change in OL effect for low vs high OL uncertainty condition – data vs model predictions: Study 1: R(126) = 0.633, P < 0.001, Study 2: R(493) = 0.593, P < 0.001; Fig. S5E–H). This further validates the uncertainty trial definitions shown in Fig. 1B. Finally, we also found that dynamic arbitration weight values extracted for each participant from the dynamic arbitration model varied as predicted according to these trial definitions (Fig. S6).
Table 1 | Summary of model fits

| Model | Npar | Study 1 AIC | Study 1 OOS acc | Study 1 Frequency | Study 1 Nbest (%tot) | Study 2 AIC | Study 2 OOS acc | Study 2 Frequency | Study 2 Nbest (%tot) |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | 4 | 215.7 | 0.521 | 0.205 | 25 (19.8) | 219.8 | 0.511 | 0.159 | 83 (16.8) |
| Experiential learning | 3 | 207.7 | 0.539 | 0.147 | 21 (16.7) | 204.9 | 0.552 | 0.060 | 24 (4.9) |
| Observational learning | 2 | 197.2 | 0.569 | 0.311 | 40 (31.8) | 194.8 | 0.575 | 0.190 | 95 (19.3) |
| Fixed mixture | 6 | 191.0 | 0.593 | 0.115 | 14 (11.1) | 186.0 | 0.607 | 0.324 | 160 (32.5) |
| Dynamic arbitration | 6 | 191.2 | 0.595 | 0.222 | 26 (20.6) | 187.1 | 0.608 | 0.267 | 131 (26.6) |

Each of the five models (Npar = number of parameters) was first fitted to participants' data using Matlab's cbm toolbox. Using individual model-fitting, we computed the mean AIC as well as mean out-of-sample accuracy (OOS acc) across participants. OOS accuracy was calculated for each individual by fitting the model on 7 task blocks and using the best-fitting parameters to calculate the likelihood of predicting the participant's choices in the remaining block (then iterating across all 8 blocks). We then used cbm's hierarchical Bayesian inference fitting across all five models to compute model frequency. Selecting the best-fitting model for each individual participant (highest model responsibility), we then calculated the number and proportion of participants for whom each model explains their data best (Nbest column).

Group differences in learning, mixture of strategies and arbitration

To assess the behavioral relevance of this classification of participants into groups according to each individual's best-fitting model, and to further characterize the underlying heterogeneity, we calculated the
behavioral metrics shown in Figs.
2
,
3
, and compared them across
groups, separately for each study. We also performed posterior pre-
dictive checks on the data split by actual groups, or by randomly
assigning participants into groups, to ensure our behavioral differ-
ences between groups were appropriately recovered by the models. In
all statistical analyses, we additionally controlled for individual differ-
ences in gender, age, education level and cognitive ability scores (from
the ICAR) to ensure those factors co
uld not explain any differences
between groups (see Table S4 for all mixed-effect models equations
and statistics). We note that there were some group differences in
some of these variables (see Fig. S7 for details), hence the necessity to
ensure our results were robust to controlling for them. Note also that
the sample size in those analyses was slightly reduced, given missing
data (
N
= 125 in Study 1 because of one participant missing ICAR score;
and
N
= 489 in Study 2 because of four participants missing educa-
tion level).
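The leave-one-block-out procedure described in the Table 1 caption can be sketched as a generic cross-validation loop. The actual fitting used Matlab's cbm toolbox; here `fit_model` and `predict_proba` are placeholder callables standing in for any of the five models.

```python
import numpy as np

def oos_accuracy(blocks, fit_model, predict_proba):
    """Leave-one-block-out predictive accuracy: fit on all blocks but one,
    evaluate the likelihood of the held-out block's choices, rotate.

    blocks        : list of per-block datasets
    fit_model     : callable(list_of_blocks) -> fitted parameters
    predict_proba : callable(params, block) -> P(observed choice) per trial
    """
    accs = []
    for k in range(len(blocks)):
        train = blocks[:k] + blocks[k + 1:]   # the other 7 blocks
        params = fit_model(train)
        p = predict_proba(params, blocks[k])  # held-out block
        accs.append(np.mean(p))               # mean likelihood of held-out choices
    return float(np.mean(accs))
```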
First, we found that calculating the five models' out-of-sample accuracy separately for each group confirmed that each group was best fit by its respective model (Fig. 5A, B). Then, to compare the learning curves (Fig. 5C, D, Table S4A), we ran a linear mixed-effects model predicting accuracy from the interaction between trial since last reversal (varying from 1 to 8) and group, controlling for the covariates of no interest described above and with a random intercept. We found a significant interaction between trial and group (Study 1: F(4,875) = 13.49, P < 0.001, ηp² = 0.058; Study 2: F(4,3423) = 18.81, P < 0.001, ηp² = 0.022), suggesting learning differences between groups, such that people in the baseline group essentially show no learning, while those in the dynamic arbitration group show the steepest learning curve (95% CI for the effect of trial in the dynamic arbitration group compared to the baseline group: Study 1: [0.021, 0.042]; Study 2: [0.006, 0.017]). Learning curves generated from the four learning models with completely hypothetical parameters (i.e. not the best-fitting parameters) confirmed that learning occurs in those models, and that the dynamic arbitration model produced the fastest learning (Fig. S8A). Examining learning curves generated from each group's best-fitting model (using individual participants' best-fitting parameters within that group) showed an almost perfect match to the data (solid vs dashed lines in Fig. 5C, D), which was not observed when group membership was randomly shuffled (Fig. S8B, C).

Fig. 4 | Posterior predictive checks of strategy-specific effects. The mixed-effects generalized linear model (ME-GLM) predicting choice from past outcome (experiential learning (EL) effect, blue) and past partner's action (observational learning (OL) effect, orange) was run on choice data generated with each of the 5 models, using participants' best-fit parameters. A, B Plotted are boxplots of the resulting individual random effects for Study 1 (A) and for Study 2 (B). Random effects obtained on the actual data are shown on the left-most box plot for comparison. Horizontal lines represent the median, boxes represent the inter-quartile range, whiskers range from the minimum to maximum value excluding outliers, which are shown as individual dots. C, D Individual random effects obtained from participants' data (x-axis) and from model-generated data (y-axis) are shown for Study 1 (C) and for Study 2 (D), together with the best-fit linear regression line and correlation coefficient R, for the EL model (left), OL model (middle) and dynamic arbitration model (right). E–H To ensure the dynamic arbitration model can reproduce the effect of uncertainty on behavior, we ran the ME-GLM on participants' choices and on choices generated by the dynamic arbitration model separately for low and high OL uncertainty (Study 1: E, Study 2: G) and for low and high EL uncertainty (Study 1: F, Study 2: H). Data depicts the fixed-effect coefficient estimates for low and high uncertainty (solid lines: data, dashed lines: model predictions); error bars are the standard error of the ME-GLM coefficients; dots are individual random effects from the data. Study 1: N = 126 independent participants (A, C, E, F); Study 2: N = 493 independent participants (B, D, G, H).
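The learning-curve computation (mean accuracy as a function of trials since the last reversal, per group) can be sketched with a simple groupby. The column names here are illustrative placeholders, not the authors' actual variable names.

```python
import pandas as pd

def learning_curves(df):
    """Group-level learning curves from trial-level data.

    Expected columns: participant, group, trial_since_reversal (1-8),
    correct (0/1). Returns mean and SEM accuracy per group and trial position.
    """
    # first average within each participant, so every participant
    # contributes one curve regardless of trial counts
    per_participant = (df.groupby(["group", "participant", "trial_since_reversal"])
                         ["correct"].mean().reset_index())
    # then average the per-participant curves within each group
    return (per_participant
            .groupby(["group", "trial_since_reversal"])["correct"]
            .agg(["mean", "sem"]))
```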
To compare the GLM betas measuring EL and OL contributions to behavior (Fig. 6, Table S4B), we ran a linear mixed-effects model predicting the mixed-effect GLM coefficient values from the interaction between effect type (past outcome versus past partner action) and group, also controlling for covariates and with a random intercept. We found a significant interaction between effect type and group (Study 1: F(4,125) = 25.55, P < 0.001, ηp² = 0.450; Study 2: F(4,489) = 73.73, P < 0.001, ηp² = 0.376). The interaction was mostly explained by different drivers of behavior in the ExpLearn and ObsLearn groups. Specifically, as expected, people in the ExpLearn group relied more strongly on past outcome (EL effect) than past partner action (OL effect) to guide behavior (paired two-tailed t-test – Study 1: t(20) = 3.16, P = 0.005, d = 0.706, 95% CI [0.084, 0.410]; Study 2: t(23) = 4.27, P < 0.001, d = 0.89, 95% CI [0.179, 0.514]), while behavior in the ObsLearn group was more strongly driven by past partner action than past outcome (Study 1: t(39) = 3.60, P < 0.001, d = 0.577, 95% CI [0.052, 0.184]; Study 2: t(94) = 3.07, P = 0.003, d = 0.316, 95% CI [0.020, 0.092]). We also found a main effect of group (Study 1: F(4,125) = 40.74, P < 0.001, ηp² = 0.566; Study 2: F(4,489) = 106.4, P < 0.001, ηp² = 0.769), driven as expected by overall weakest EL and OL effects in the baseline group, consistent with no learning, but also by overall strongest EL and OL effects in the dynamic arbitration group (95% CI of DynArb vs Baseline group difference: Study 1: [0.889, 1.196], Study 2: [0.750, 0.900]). The latter can be explained by higher overall accuracy in the dynamic arbitration group, combined with positive correlations between accuracy and strength of both EL and OL effects (Accuracy & EL effect: Study 1: R(126) = 0.662, P < 0.001, Study 2: R(493) = 0.723, P < 0.001; Accuracy & OL effect: Study 1: R(126) = 0.895, P < 0.001, Study 2: R(493) = 0.896, P < 0.001). Additionally, posterior predictive checks confirmed this pattern of GLM effects between groups, whereby GLM effects from model-generated data matched the data well when split by actual groups (darker colored bars in Fig. 6), but not when split by randomly shuffled groups (grey bars in Fig. 6).
Fig. 5 | Model out-of-sample predictive accuracy and learning curves by group. A, B We computed out-of-sample accuracy for each participant across blocks (leaving one block out) and for each of the five models, in Study 1 (A) and Study 2 (B). The top row of the heatmap shows the average predictive accuracy for each model, while the bottom five rows show the breakdown for each group. C, D Mean learning curves (similar to Fig. 2A, B) were computed from participants' data separately for each group (solid lines, see Table S4A for statistics), and from model-generated data using each group's best-fitting model (dashed lines), in Study 1 (C) and Study 2 (D). The shaded area represents standard errors across participants within each group.
Finally, we compared the behavioral signatures of arbitration by highlighting group differences that dissociate participants expressing a fixed mixture versus a dynamic arbitration between strategies. Based on the breakdown of OL choice propensity per trial type (shown on Fig. S2A, B), we computed a behavioral index of arbitration as the difference in OL choice propensity between trials where OL is expected to be most preferred (trials with low OL uncertainty, high EL uncertainty, low reward magnitude) and trials where EL is expected to be most preferred (trials with high OL uncertainty, low EL uncertainty, high reward magnitude). This difference is depicted with the green arrow on Fig. S2A, B. We then calculated this index separately for each group and found a significant effect of group on arbitration index in a linear model controlling for all covariates (Study 1: F(4,112) = 17.13, P < 0.001, ηp² = 0.380, Fig. 7A; Study 2: F(4,465) = 80.55, P < 0.001, ηp² = 0.409, Fig. 7B). Specifically, arbitration was found to be maximal in the dynamic arbitration group and significantly larger than in the fixed mixture group (Welch two-sample t-test assuming unequal variance; Study 1: t(17.23) = 4.74, P < 0.001, d = 1.70, 95% CI [0.252, 0.656]; Study 2: t(277.44) = 7.15, P < 0.001, d = 0.836, 95% CI [0.172, 0.303]), suggesting a behavioral dissociation between the two arbitration groups, whereby dynamic arbitration is associated with a more extreme variation in strategies according to the conditions of the environment.
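The behavioral arbitration index described above can be computed directly from trial-level data, as in this sketch (field names are illustrative placeholders):

```python
import numpy as np

def arbitration_index(trials):
    """Difference in OL-consistent choice propensity between trials where OL
    should be most preferred (low OL uncertainty, high EL uncertainty, low
    reward magnitude) and trials where EL should be most preferred (the
    reverse pattern).

    trials: list of dicts with keys 'ol_unc', 'el_unc', 'mag' ('low'/'high')
            and 'ol_choice' (1 if the choice followed the partner, else 0).
    """
    ol_pref = [t["ol_choice"] for t in trials
               if t["ol_unc"] == "low" and t["el_unc"] == "high"
               and t["mag"] == "low"]
    el_pref = [t["ol_choice"] for t in trials
               if t["ol_unc"] == "high" and t["el_unc"] == "low"
               and t["mag"] == "high"]
    # positive values mean OL use rises exactly when OL should be preferred
    return float(np.mean(ol_pref) - np.mean(el_pref))
```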
To further examine the effect of each uncertainty trial type on each strategy separately, we also analyzed how the random effects of past outcome (EL) and past partner action (OL), estimated separately for each uncertainty trial type (and shown on Fig. 3C, D), differed between groups. For each trial type (OL uncertainty, EL uncertainty), we ran a linear mixed model predicting the random effect from an interaction between uncertainty trial type (high, low), strategy (EL, OL) and group, controlling for covariates and including a random intercept (Table S4C, D). We found a significant 3-way interaction for each manipulation, for both Study 1 (Fig. 7C: OL uncertainty * strategy * group, F(4,375) = 20.22, P < 0.001, ηp² = 0.177; Fig. 7D: EL uncertainty * strategy * group, F(4,375) = 19.90, P < 0.001, ηp² = 0.175) and Study 2 (Fig. 7E: OL uncertainty * strategy * group, F(4,1467) = 33.50, P < 0.001, ηp² = 0.084; Fig. 7F: EL uncertainty * strategy * group, F(4,1467) = 97.38, P < 0.001, ηp² = 0.210). This showed that the influence of uncertainty trials on the signature of EL vs OL further varied across groups. More specifically, OL uncertainty trials primarily influenced the effect of past partner action (OL signature) and did so more strongly for individuals in the ObsLearn and arbitration groups. In contrast, EL uncertainty trials primarily influenced the effect of past outcome (EL signature) and did so more strongly for individuals in the dynamic arbitration group, compared to all other groups. This last result is particularly noteworthy as it suggests that it is mostly EL arbitration (i.e. arbitration driven by EL uncertainty) that differentiates between the fixed and dynamic arbitration groups, rather than OL arbitration (i.e. arbitration driven by OL uncertainty), which seems to be exhibited in both arbitration groups.

Fig. 6 | Group differences in single strategy use and associated model predictions. The main effects of past outcome (EL effect, A, C) and the effect of past partner's action (OL effect, B, D) on choice (previously shown in Fig. 2E, F) are now calculated separately for each group for Study 1 (A, B; N_Baseline = 25, N_ExpLearn = 21, N_ObsLearn = 40, N_FixArb = 14, N_DynArb = 26) and Study 2 (C, D; N_Baseline = 83, N_ExpLearn = 24, N_ObsLearn = 95, N_FixArb = 160, N_DynArb = 131). Light colored bars represent the mean mixed-effects generalized linear model (ME-GLM) coefficients from participants' data (see Table S4B for statistics), whereby each dot is an individual participant (random effect). Dark colored bars represent the ME-GLM coefficients from data generated by each of the 5 models using participants' best-fitting parameters, then showing the mean effect for each group using that group's best-fitting model. Grey bars similarly represent the mean ME-GLM coefficients from model-generated data, but after assigning participants into the 5 groups at random then using each group's corresponding model. Error bars represent standard errors of the ME-GLM coefficients.
Fig. 7 | More extreme signatures of arbitration in the dynamic arbitration group. A, B An index of arbitration was calculated as the difference in the propensity to choose according to observational (OL) vs experiential (EL) learning between trials where OL should be most favored and trials where EL should be most favored – see green arrows on Fig. S2A, B for an illustration. This arbitration index was then calculated and averaged separately for each group, in Study 1 (A; N_Baseline = 25, N_ExpLearn = 21, N_ObsLearn = 40, N_FixArb = 14, N_DynArb = 26) and Study 2 (B; N_Baseline = 83, N_ExpLearn = 24, N_ObsLearn = 95, N_FixArb = 160, N_DynArb = 131). Significance was assessed through a linear regression predicting arbitration index from group and controlling for covariates, followed by a two-sided t-test to specifically compare the dynamic arbitration and fixed mixture groups (A: t(17.23) = 4.74, 95% CI [0.252, 0.656], P < 0.001; B: t(277.44) = 7.15, 95% CI [0.172, 0.303], P < 0.001). Each dot is an individual participant. C–F The random effects obtained from the analyses presented in Fig. 3C, D were averaged separately for each group, for Study 1 (C, D) and Study 2 (E, F), thus showing the effect of OL uncertainty trial type (C, E) and EL uncertainty trial type (D, F) on the GLM effects of past partner action (OL effect; top) and past outcome (EL effect; bottom). See Table S4C, D for GLM statistics. For each manipulation, the difference between high and low uncertainty trials is also depicted, allowing for a direct comparison of each arbitration signature between groups. Error bars represent SEM for each group. Two-sided t-tests were run to specifically test whether the dynamic arbitration group (magenta lines) exhibited a more extreme signature of arbitration than the fixed arbitration group (blue lines), with significant differences observed in the effect of OL uncertainty on OL in Study 1 (t(32.38) = 2.81, 95% CI [0.103, 0.644], P = 0.008, C), the effect of EL uncertainty on both strategies in Study 1 (OL: t(37.02) = −5.39, 95% CI [−0.323, −0.147], P < 0.001, D top; EL: t(36.29) = 4.86, 95% CI [0.196, 0.476], P < 0.001, D bottom), and in Study 2 (OL: t(261.11) = −11.08, 95% CI [−0.458, −0.273], P < 0.001, F top; EL: t(288.18) = 4.67, 95% CI [0.103, 0.253], P < 0.001, F bottom).

Relevance of groups for psychopathology
Having established that the groups defined based on model-fitting displayed the expected differences in behavioral signatures of interest (learning, reliance on OL vs EL, and arbitration), we set out to explore whether the five groups also differed in meaningful ways on a range of transdiagnostic symptom dimensions. Given our hypothesized link between strategy use and symptom dimensions relevant to anxiety, social anxiety, and autism, we collected four questionnaires (State-Trait Anxiety Inventory, Beck Depression Inventory, Liebowitz Social Anxiety Scale, and Social Responsiveness Scale; see Methods for details). To extract underlying symptom dimensions and reduce collinearity between summary scores on those scales, we first ran a factor analysis on the individual item scores from the questionnaires, pooled across the two studies to ensure sufficient power to run the factor analysis (N = 568). We first determined the optimal number of factors by running the factor analysis from 1 to 20 factors and selected the number of factors among this set that provided the lowest BIC (Fig. S9A). We found that 8 factors provided the best-fitting solution (BIC = −59393, BIC difference from other numbers of factors > 14, Tucker-Lewis index = 0.745, Root Mean Square Error of Approximation = 0.042, fit = 0.967). The factor loadings suggest the following transdiagnostic symptom interpretation associated with each factor (Fig. S9B). Factor 1 reflected depressive symptoms, with highest loadings on all BDI items, as well as some of the STAI-Trait items such as failure, unhappiness, and dissatisfaction. Factor 2 reflected heightened social anxiety, loading on a majority of items from the LSAS. Factor 3 reflected autism-like traits, loading on a majority of items from the SRS. Factor 4 reflected state anxiety symptoms, loading mostly on STAI-State items. Factor 5 reflected poor social responsiveness, loading specifically on positively scored SRS items. Factor 6 loaded on items from both SRS and LSAS that reflect social group avoidance. Factor 7 reflected trait anxiety, loading mostly on STAI-Trait items that relate to disturbing or obsessive thoughts. Finally, Factor 8 reflected traits associated with performance anxiety, loading most strongly on LSAS items such as acting, speaking up, reporting to a group, etc. Factors were allowed to correlate given the oblimin rotation, but correlations between factors remained low enough to ensure unique variance attributed to each factor, ranging from R(568) = 0.08 to R(568) = 0.56 (Fig. S9C). These correlations were overall much lower than the correlations between the questionnaire scores, ranging from R(568) = 0.45 (between STAI-State and LSAS) to R(568) = 0.82 (between STAI-Trait and BDI), thus justifying the factor analysis approach to better identify separate symptom dimensions.
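The BIC-based selection of the number of factors (Fig. S9A) can be approximated with a simple loop. Note the assumptions: the original analysis used an oblimin-rotated factor analysis, whereas this sketch uses sklearn's unrotated maximum-likelihood `FactorAnalysis`, and the BIC parameter count is a simplification; it illustrates the selection logic rather than reproducing the reported solution.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def pick_n_factors(X, k_max=20):
    """Fit factor models with 1..k_max factors and return the count with the
    lowest BIC, plus the full BIC dictionary.

    BIC = -2 * log-likelihood + n_params * log(n), with an approximate
    parameter count of d*k loadings + d uniquenesses.
    """
    n, d = X.shape
    bics = {}
    for k in range(1, min(k_max, d) + 1):
        fa = FactorAnalysis(n_components=k, random_state=0).fit(X)
        ll = fa.score(X) * n           # score() is mean log-likelihood per sample
        n_params = d * k + d           # loadings + uniquenesses (approximation)
        bics[k] = -2.0 * ll + n_params * np.log(n)
    return min(bics, key=bics.get), bics
```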
To assess whether the five groups, defined based on their learning strategy on the task, differed on those 8 symptom dimensions, we ran a linear mixed-effects model predicting the factor scores (each factor representing a symptom dimension) from an interaction between symptom dimension and group, including a random intercept, and controlling for gender, age, education, ICAR score, as well as study group (given that we pooled data from both studies). We found a significant interaction (F(28,3948) = 2.38, P < 0.001, ηp² = 0.017, Fig. 8A–E, Table S5A), suggesting that the groups differed in their symptom dimensions. For comparison purposes, results from the same analyses with the 5 questionnaire summary scores, instead of the 8 separable symptom dimensions, are shown in Table S5B.
Post-hoc tests using R's emmeans function to compute marginal means highlighted the following drivers of the interaction. First, we examined differences in symptom dimensions within each group, using Tukey method p-value adjustment for comparing a family of 8 estimates (the number of symptom dimensions), revealing significant effects in two of the groups. In the baseline group (Fig. 8A), individuals were characterized by high autistic traits, poor social responsiveness, and low trait anxiety (autistic traits vs trait anxiety: estimate = 0.527, t(3983) = 4.47, P = 0.002, d = 0.648, 95% CI [0.364, 0.933]; social responsiveness vs trait anxiety: estimate = 0.411, t(3983) = 3.49, P = 0.012, d = 0.506, 95% CI [0.221, 0.790]). In the dynamic arbitration group (Fig. 8E), individuals were characterized by the opposite pattern, that is, low autistic traits, good social responsiveness, but high trait anxiety (autistic traits vs trait anxiety: estimate = −0.340, t(3983) = −3.51, P = 0.011, d = −0.418, 95% CI [−0.652, −0.185]; social responsiveness vs trait anxiety: estimate = −0.300, t(3983) = −3.10, P = 0.041, d = −0.369, 95% CI [−0.603, −0.135]). Second, we ran the complementary analysis, examining differences between groups for each symptom dimension, using Tukey adjustment for the 5 groups. We found differences in autistic traits between the baseline and fixed arbitration groups (t(3070) = 3.37, P = 0.007, d = 0.509, 95% CI [0.213, 0.805], Fig. 8A vs D), and between the observational learning and the fixed arbitration groups (t(3095) = −2.73, P = 0.049, d = 0.381, 95% CI [0.108, 0.655], Fig. 8C vs D). Groups also differed on trait anxiety, specifically between the baseline and dynamic arbitration groups (t(2847) = 4.05, P < 0.001, d = 0.642, 95% CI [0.331, 0.953], Fig. 8A vs E).
Overall, this suggests that the five model-based groups can be differentially characterized along two symptom dimensions: autistic traits and trait anxiety (Fig. 8F). We note that the two symptom dimensions were positively correlated across participants (R(568) = 0.16, P < 0.001, Fig. 8G), such that on average across the entire sample, individuals with high autistic traits also tend to score high on trait anxiety. Yet, we find that groups, especially the baseline and dynamic arbitration groups, differ significantly on these dimensions, suggesting that our model-based classification can help separate symptom dimensions that tend to coexist in the population.
Discussion
Our aim in this study was two-fold: first, to test a computational account of reliability-driven arbitration between two domains, namely experiential learning (EL) and observational learning (OL); and second, to characterize the heterogeneity in strategy use, both in key signatures of behavior and in transdiagnostic symptom dimensions relevant to affective and social function.
To address the first aim, we designed a task in which the reliability of EL and OL were manipulated by means of changes in uncertainty conditions, resulting in key trials that could be clearly classified as high EL reliability trials, low OL reliability trials, and vice versa. Behavioral findings indicated that people clearly modulated their behavior in an expected way according to the reliability of each strategy, favoring EL when EL reliability was high and OL reliability was low, and favoring OL when OL reliability was high and EL reliability was low. Computational modelling confirmed this finding, showing that those participants who were best fit by our proposed dynamic arbitration model also exhibited the greatest reliability-driven modulation of behavior. Reliability in our model was defined using absolute prediction errors associated with each strategy as an index of uncertainty (or unreliability). This arbitration signal is consistent with the algorithmic and neural implementation of mixture-of-experts models in the literature [20], though future work is needed to further explore whether other implementations of reliability could perform better. In particular, this could be achieved through more optimized task designs that fully allow distinguishing between different reliability computations, which was outside the scope of the current study. Our analyses do however provide insights into how this dynamic arbitration mechanism differs from a fixed mixture model, which was originally proposed in early investigations of model-based/model-free arbitration during EL [52,53]. Not only did model recovery analyses show that the two arbitration schemes can be clearly differentiated, but behavioral signatures associated with each model pointed towards a more 'extreme' signature of uncertainty-driven arbitration between EL and OL. In sum, a learner using a fixed mixture model will still be sensitive to trial-by-trial changes in uncertainty, since those variations will be captured in the value difference, and hence in the choice probabilities; however, using the proposed dynamic arbitration mechanism helps push this sensitivity to the extreme, leading to improved performance. Consistent with cross-domain arbitration, and with previous literature showing that humans do integrate social and experiential information when learning and making decisions [15,26–32,54], our findings also suggest that the fixed and dynamic arbitration groups (a substantial proportion of our sample) performed this task by integrating the predictions of both EL and OL. Only one of these studies demonstrated the possibility of a dynamic, volatility-driven arbitration between individual and social learning [54]. Although the individual learning used in that study was similar to our EL model in the current study (outcome of a binary lottery), the social learning component was quite different (learning from advice, rather than learning from observing another person's choices). Our findings thus further extend the concept of arbitration, via a reliability-weighted mixture of experts [20], to apply