Can the Brain Strategically Go on Automatic Pilot?
The Effect of If–Then Planning on Behavioral Flexibility
Tim van Timmeren¹,², John P. O’Doherty³, Nadza Dzinalija⁴, and Sanne de Wit¹
Abstract
People often have good intentions but fail to adhere to them. Implementation intentions, a form of strategic planning, can help people to close this intention–behavior gap. Their effectiveness has been proposed to depend on the mental formation of a stimulus–response association between a trigger and target behavior, thereby creating an “instant habit.” If implementation intentions do indeed lead to reliance on habitual control, then this may come at the cost of reduced behavioral flexibility. Furthermore, we would expect a shift from recruitment of corticostriatal brain regions implicated in goal-directed control toward habit regions. To test these ideas, we performed an fMRI study in which participants received instrumental training supported by either implementation or goal intentions, followed by an outcome revaluation to test reliance on habitual versus goal-directed control. We found that implementation intentions led to increased efficiency early in training, as reflected by higher accuracy, faster RTs, and decreased anterior caudate engagement. However, implementation intentions did not reduce behavioral flexibility when goals changed during the test phase, nor did they affect the underlying corticostriatal pathways. In addition, this study showed that “slips of action” toward devalued outcomes are associated with reduced activity in brain regions implicated in goal-directed control (ventromedial prefrontal cortex and lateral orbitofrontal cortex) and increased activity of the fronto-parietal salience network (including the insula, dorsal anterior cingulate cortex, and SMA). In conclusion, our behavioral and neuroimaging findings suggest that strategic if–then planning does not lead to a shift from goal-directed toward habitual control.
INTRODUCTION
At the start of the new year, many people reflect on their future plans and form resolutions. However, they often fail to put their good intentions into practice (Sheeran & Webb, 2016). Strategic “if–then” plans, also known as implementation intentions, are an effective way to support the translation of intentions into actions. For example, instead of formulating an abstract plan such as “I want to lose weight,” an implementation intention links the intended action to a specific cue or situation, for example, “If I get home, I will eat an apple,” thereby enhancing the probability of success. Indeed, many studies have shown that implementation intentions support behavior change better than goal intentions that merely specify the intended action or outcome (Gollwitzer & Sheeran, 2006). In addition to increasing attention to the relevant cue, the effectiveness of if–then planning has been proposed to rely on creating a strong associative link between the stimulus (S) in the if-part (“home”) and the response (R) in the then-part (“eat an apple”), in a manner akin to habits acquired through behavioral repetition (Dickinson, 1985; Thorndike, 1911). These mentally formed S–R associations may allow for automatic action initiation (Gollwitzer, 2014), a process often referred to as strategic automaticity or “instant habits” (Gollwitzer, 1993, 1999, 2014).
The notion that merely using a verbal action plan could be sufficient to form a habit is fascinating, because a central assumption in theories of habit formation is that this process critically depends on behavioral repetition. Support for the idea that implementation intentions accelerate habit formation comes from research showing that they increase (self-reported) automaticity (Orbell & Verplanken, 2010; Parks-Stamm, Gollwitzer, & Oettingen, 2007; Brandstätter, Lengfelder, & Gollwitzer, 2001). Implementation intentions thus lead to benefits in terms of efficient goal attainment (Gollwitzer, 2014; Gollwitzer & Sheeran, 2006). However, habits developed through behavioral repetition also come at a cost, namely, decreased behavioral flexibility (Dickinson, 1985). The question therefore arises whether the use of implementation intentions also leads to decreased flexibility when goals change. This can be investigated using the outcome-devaluation test, an experimental paradigm originally used in rats (Adams & Dickinson, 1981) and later translated to humans (de Wit, Corlett, Aitken, Dickinson, & Fletcher, 2009; de Wit, Niry, Wariyar, Aitken, & Dickinson, 2007; Valentin, Dickinson, & O’Doherty, 2007). In this task, participants first learn to make a response to obtain a reward. Subsequently, the outcome associated with that response is devalued, and the ability to flexibly adapt responding to this change in outcome value is measured during an extinction test. Sensitivity to outcome devaluation suggests that behavior is based on knowledge and evaluation of its consequences, and is therefore under goal-directed control. If implementation intentions lead to “instant habits,” then we would predict reduced sensitivity to outcome devaluation, reflecting a shift from goal-directed toward more rigid, habitual control (de Wit et al., 2018; Balleine & O’Doherty, 2010).

¹University of Amsterdam, The Netherlands, ²Utrecht University, The Netherlands, ³California Institute of Technology, Pasadena, ⁴Amsterdam UMC, Location VUmc, The Netherlands
© 2023 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Journal of Cognitive Neuroscience 35:6, pp. 957–975
https://doi.org/10.1162/jocn_a_01990
Downloaded from http://direct.mit.edu/jocn/article-pdf/35/6/957/2082881/jocn_a_01990.pdf by California Institute of Technology (Caltech) user on 25 July 2023
We have previously tested this hypothesis (van Timmeren & de Wit, 2022) using a computerized symmetrical outcome-revaluation task (SORT; Watson, Gladwin, Verhoeven, & de Wit, 2022). Participants learn to make a response (go) to certain ice cream vans to collect valuable ice creams (and points) or to withhold a response (no-go) to other ice cream vans delivering nonvaluable ice creams (and a reduction of points). To investigate the effect of if–then planning, we instructed participants to use verbal implementation intentions for half of the stimuli and goal intentions for the other half. In the subsequent test phase, some outcome values changed (i.e., outcome revaluation). Whereas participants should continue to respond according to the learned S–R mappings on value-congruent trials (i.e., still-valuable and still-not-valuable), they should flexibly adjust their behavior on value-incongruent trials (i.e., devalued and upvalued). The results of this previous study suggest that the use of implementation (compared with goal) intentions facilitates instrumental learning but also impairs performance when some of the signaled outcome values change during the test phase (van Timmeren & de Wit, 2022). This detrimental effect of if–then planning was observed across value-congruent and value-incongruent trials, suggesting that it was not mediated by strengthened S–R associations (as this would have specifically impacted the value-incongruent trials). Instead, this result may have been driven by reduced goal-directed control. Investigating the neural processes underlying implementation intentions may offer a window on the underlying (goal-directed vs. habitual) processes.
To this end, in the present study, we used fMRI to investigate the neural correlates of if–then planning of instrumental responses on the SORT. We capitalized on current insights regarding the neural basis of goal-directed and habitual control to investigate the notion that if–then planning gives rise to “instant habits.” Decades of animal research have provided detailed insights into the neurobiology of goal-directed and habitual actions, demonstrating that they are causally supported by anatomically distinct but interacting corticostriatal systems (Balleine, 2019; Balleine & O’Doherty, 2010; Yin, Knowlton, & Balleine, 2004). These findings are mirrored by (correlational) neuroimaging evidence in humans, albeit less consistently. Specifically, previous fMRI studies have found that goal-directed control is supported by the ventromedial prefrontal cortex (vmPFC) and caudate, whereas outcome-insensitive habitual actions depend on the premotor cortex and posterior putamen/dorsal striatum (Watson, van Wingen, & de Wit, 2018; Delorme et al., 2016; Morris, Quail, Griffiths, Green, & Balleine, 2015; de Wit et al., 2012; Tricomi, Balleine, & O’Doherty, 2009; Valentin et al., 2007).
The present study is the first fMRI investigation with the SORT, and we therefore start by specifying our predictions regarding the general pattern of neural activity, independent of intentions. First, we expected that over the course of training (i.e., habit acquisition), activity would increase in regions associated with habitual control, whereas the involvement of regions implicated in goal-directed control would decrease (Zwosta, Ruge, Goschke, & Wolfensteller, 2018; Liljeholm, Dunne, & O’Doherty, 2015; Tricomi et al., 2009). Second, we expected neural activity in these regions during training to be predictive of revaluation insensitivity in the test phase (Watson et al., 2018; Zwosta et al., 2018; Liljeholm et al., 2015; de Wit et al., 2009). Third, in line with previous work (Watson et al., 2018; Valentin et al., 2007), we hypothesized that, in the test phase, we would find higher activity in areas implicated in goal-directed action, cognitive control, and response conflict when participants flexibly updated their responses, and equal (if anything, reduced) activity in habit-related regions. Finally, we expected that “slips of action” would be associated with higher activity in habit regions and reduced activity in goal-directed regions (Watson et al., 2018).
Our central aim was to investigate the neural basis of implementation intentions and their effect on behavioral flexibility. To this end, we measured neural activity related to the effect of implementation intentions on the acquisition and flexible adjustment of instrumental actions on the SORT. We hypothesized that the use of implementation intentions (compared with goal intentions) during training would lead to increased habit acquisition, as reflected by higher accuracy, increased automaticity (measured with the Self-Report Behavioral Automaticity Index; Gardner, Abraham, Lally, & de Bruijn, 2012), and increased brain activity in habit regions with equal (or, if anything, reduced) activity in goal-directed regions. Moreover, we expected if–then planning to lead to increased reliance on previously formed S–R associations in the subsequent test phase, as indicated by inflexible, habitual responding on value-incongruent compared with value-congruent trials and by higher activity of habit regions during the test phase. Finally, we expected that overcoming mentally rehearsed S–R associations (as part of an if–then plan) would require more goal-directed control and correspondingly engage related neural regions.
METHODS
All operationalizations, exclusion criteria, and main hypotheses and analyses were preregistered on the Open Science Framework (https://osf.io/yrpxa).
Participants
Participants were recruited through the participant portal of the University of Amsterdam Web site, flyers, and word of mouth. We used the following inclusion criteria: age 16–35 years, no previous participation in a study using this same task, and no contraindications for MRI. Data collection took place between July and November 2020. Note that this was during the first year of the COVID-19 outbreak; however, no strict lockdowns were implemented during this period in The Netherlands. The study was approved by the Psychology ethics committee of the University of Amsterdam and performed in accordance with its guidelines. All participants gave informed consent and received either course credit or financial compensation (€15/hr) for their time (total ∼2 hr). An additional €20 voucher was given to the participant with the highest score to motivate participants to perform well on the task.
Forty-seven participants were enrolled, conforming to our preregistered sampling plan. Our sample size was based on a previous pilot study, which found a significant effect of implementation intentions in 35 participants using the same task and manipulation. Moreover, a power analysis with G*Power (Version 3.1.9.3) showed that our target sample size of n = 40 should be sufficient to detect a small behavioral effect (f = 0.12) with an α level of .05 and power of .8. Six participants were excluded from all analyses: One participant quit halfway through participation, and five participants were excluded based on performance exclusion criteria (see Results for details). The remaining 41 participants (22 women, 19 men) had a mean age of 23.2 (SD = 4.1) years. All participants had normal or corrected-to-normal vision, and all were right-handed except one who was ambidextrous. All participants were free of neurological or psychiatric disorders and had completed or were enrolled in higher professional education at the time of participation, the vast majority being university students. Two participants were native Germans who spoke Dutch fluently; all others were native Dutch speakers.
Stimuli and Materials
Procedure
Participants performed a computerized instrumental learning task called the SORT (Figure 1; Watson, Gladwin, et al., 2022), programmed in Presentation (Version 18.1). Participants played a hungry skateboarder whose objective was to collect ice creams, earning points and satisfying their hunger, by pressing a response button. They were informed that the best-performing participant at the end of the study would receive a €20 voucher. Four pictures of ice creams were used: a Cornetto, a Magnum, a Rocket ice lolly, and a soft serve ice cream. The task consisted of three phases. First, participants completed an instrumental training phase without strategic planning outside the scanner, after which they were moved to the MRI scanner and performed an instrumental training phase with strategic planning, followed by a test phase (see Figure 1). The symmetrical nature of the task stems from the inclusion of both valuable and nonvaluable outcomes, which allows comparisons in the test phase (when outcome values change) between the value-congruent and value-incongruent conditions to be made with the same response type (see Watson, Gladwin, et al., 2022, for a more elaborate discussion of the advantages of this task). The total experiment took ∼2 hr, of which 1 hr was spent in the scanner.
The task used here is almost identical to that of a previous study in which we tested the same hypothesis behaviorally (van Timmeren & de Wit, 2022), apart from the following changes. To minimize head movements, we used a static version of the task instead of having ice cream trucks moving across the screen. We added one block of practice with strategic planning before participants were moved to the scanner, so that they could read the intentions out loud once and ask questions. Moreover, we adapted the task to promote stimulus–outcome (S-O) learning across intention conditions, to rule out that any effect of implementation intentions on behavioral flexibility would be mediated by reduced contingency knowledge, as was the case in the original behavioral study (van Timmeren & de Wit, 2022). To this end, we changed the way in which the blocks were composed in the first part of training (i.e., without intentions): Instead of alternating between two fixed sets of four ice cream vans, each block now contained four (out of eight) pseudorandomly selected stimuli (see Instrumental Training section for details). Compared with the fixed block-sets, this forces participants to attend to all outcomes on the value screen and to evaluate for which stimuli they should (not) make a response.
Instrumental Training
At the start of the task, participants were instructed that their goal was to collect valuable ice creams (which earn points and alleviate hunger) and avoid collecting nonvaluable ice creams (which lose points and cause stomach pain) by (not) responding to ice cream vans. There were four different ice creams, and before each block of instrumental training, participants were shown which two ice creams were valuable (in green) and which two ice creams were not valuable (in red; Figure 1A). The position of the valuable and nonvaluable ice creams (left/right) was counterbalanced across participants. Each ice cream was associated with two of the eight vans (Figure 1B): one van always predicting this ice cream as being valuable and the other as being nonvaluable. Each block contained only half of the vans: two associated with a valuable ice cream and two with a nonvaluable ice cream. Participants were told to find out by trial and error which ice cream truck delivered which ice cream, and that the S-O contingencies would remain the same throughout the whole task. Participants first practiced with different discriminative stimuli (scooters) and outcomes (pizzas) for two blocks to familiarize themselves with this procedure. As mentioned previously, the composition of the blocks (i.e., which four of the eight vans were presented during a block) was now pseudorandomized. The conditions described above allow for six unique combinations of four vans, each of which was presented twice (in randomized order) over the 12 blocks of this first part of training. The contingencies between ice creams and vans, and which of the ice creams was valuable/nonvaluable, were randomized across participants.

Figure 1. Overview of the study and experimental design. Participants were told they were playing a hungry skateboarder and their goal was to collect some ice creams and not others to earn points. (A) Participants first received instrumental training. Each block started with a value-screen (represented by the black rectangle), followed by a block of 16 training trials (see B). Each block contained four vans (pseudorandomly selected). Training then continued with participants additionally using implementation intentions (trained with Van-Set A) or goal intentions (trained with Van-Set B; see C), with intention instructions (see B) being presented before each instrumental learning block. Finally, participants completed six test blocks in which all eight vans (Van-Sets A and B) appeared intermixed, and consequently the associated outcome values of some vans changed compared with training (see C, comparing the “Train” vs. “Test” columns). (B) Train trial: When a van was presented, participants had to decide whether to make a response within 500 msec, after which the ice cream appeared (irrespective of a response) on top of the van for 500 msec. Test trial: identical to train trials, but now (i) a banner appeared on top of the van instead of the ice cream to prevent feedback about the outcome (i.e., nominal extinction) and (ii) the response window was reduced to 450 msec. Value screen: The outcome-value screen indicates which ice creams should (in green) and should not (in red) be collected. Intention instructions: Vans were trained with either implementation intentions, indicating for which ice cream van they should or should not make a response, or goal intentions, indicating for which ice cream they should (not) make a response. (C) An overview of stimulus–outcome contingencies (example set) and associated values across different phases of the task. The contingencies between each ice cream and van remained consistent throughout the whole task, but the value of each ice cream (and hence the associated response) was stable only during training. During the critical test phase, the associated outcome values changed (were incongruent) relative to the training value for half of the stimuli (indicated by arrows). This results in four conditions: still-valuable trials (valuable, congruent), upvalued trials (valuable, incongruent), still-not-valuable trials (nonvaluable, congruent), and devalued trials (nonvaluable, incongruent). For example, the first van always delivered a Rocket, which was valuable throughout training but no longer valuable during test (i.e., devalued). Shown here is an example of the contingencies in one of six test blocks; across the test phase, the correct response for each stimulus was equally often congruent and incongruent. Deval = devalued; still val = still-valuable; upval = upvalued; still not = still-not-valuable trials.
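The block composition described in Instrumental Training can be made concrete with a short sketch. Under our reading of the design (each value screen names two valuable and two nonvaluable ice creams, and each ice cream appears with the van that signals its current value), choosing which 2 of the 4 ice creams are valuable yields exactly the six four-van combinations. The van labels, the `VANS` dictionary, and `block_compositions` are hypothetical illustrations, not the study's Presentation code.

```python
from itertools import combinations

# Ice creams named in the task description; van labels are hypothetical.
ICE_CREAMS = ["Cornetto", "Magnum", "Rocket", "Soft serve"]

# Each ice cream is delivered by two vans: one that signals it as valuable
# (go-trained) and one that signals it as nonvaluable (no-go-trained).
VANS = {ic: {"go": f"{ic} go-van", "no-go": f"{ic} no-go-van"} for ic in ICE_CREAMS}


def block_compositions():
    """List every possible training block: 2 valuable + 2 nonvaluable ice creams."""
    blocks = []
    for valuable in combinations(ICE_CREAMS, 2):
        nonvaluable = [ic for ic in ICE_CREAMS if ic not in valuable]
        vans = [VANS[ic]["go"] for ic in valuable]
        vans += [VANS[ic]["no-go"] for ic in nonvaluable]
        blocks.append({"valuable": list(valuable), "vans": vans})
    return blocks


blocks = block_compositions()
print(len(blocks))      # 6 unique four-van combinations
print(2 * len(blocks))  # 12 blocks when each combination is shown twice
```

Each of the eight vans appears in three of the six combinations, so under this reading all vans are sampled equally often across the 12 blocks.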
Each stimulus was shown four times per block, yielding 16 trials. Trial order was randomized per eight trials, with each van presented twice in the first half and twice in the second half of a block. Each trial started with a jittered 1- to 5-sec intertrial interval. Participants were instructed to respond as quickly as possible and before the deliverer disappeared (after 500 msec). Irrespective of the response, the associated outcome was then presented for 500 msec. Thus, participants did not receive direct feedback about the accuracy of their responses; this balanced the feedback provided for valuable and nonvaluable outcomes and was intended to promote goal-directed (R-O) learning and S-O knowledge. Each block ended with a 3-sec feedback screen that displayed accuracy and late responses in that block and the total number of points collected (Figure 1D).
Instrumental Training with Intentions
The next phase of training took place in the MRI scanner. Participants were told that instead of seeing which ice creams were valuable or nonvaluable, each block would now start with sentences that would help them perform well. These sentences came in two different forms (Figure 1D). Goal intentions indicated for each ice cream whether they should make a response (R-O), formulated as “If I see [picture of an ice cream], then I WILL press.” Implementation intentions indicated for each ice cream van whether they should make a response or not (S–R), formulated as “If I see [picture of an ice cream van] then I WILL (NOT) press.” Each intention was presented for 2500 msec, twice per intention block (in randomized order). Half of the stimuli were trained using goal intentions and the other half using implementation intentions. Each block of verbal intentions was directly followed by a block of instrumental training (identical to the previous phase) with the corresponding stimuli. Blocks now alternated between two sets of vans, one van-set being trained with implementation intentions (S1–S4, “Van-Set A”) and one with goal intentions (S5–S8, “Van-Set B”). Whether training started with an implementation or goal intention block was counterbalanced across participants. At the end of regular instrumental training and before being moved to the scanner, participants practiced each verbal intention without instrumental training for one block, followed by two blocks (one for each intention type) with instrumental training. During these first few practice blocks outside the scanner, participants were asked to read the intentions out loud. During the subsequent 24 blocks of training with intentions in the scanner, participants were instructed to subvocalize the intentions instead of reading them out loud, to minimize head motion. Participants entered the scanner in a head-first supine position and viewed the task stimuli on a screen via a mirror attached to the head coil. A button box allowed them to collect ice creams by responding with their right index finger.
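The alternation of intention blocks in the scanner can be sketched as a simple schedule. This assumes strict A/B alternation over the 24 scanner blocks (12 per intention type) and eight intention screens per block (four intentions, each shown twice); `intention_schedule` is a hypothetical helper, and counterbalancing is reduced to a single `start_with` argument.

```python
def intention_schedule(start_with="implementation", n_blocks=24):
    """Alternate intention + training blocks between the two van sets.

    Assumes strict alternation; which type comes first is the
    counterbalanced factor.
    """
    types = ["implementation", "goal"]
    if start_with == "goal":
        types.reverse()
    van_sets = {"implementation": "A (S1-S4)", "goal": "B (S5-S8)"}
    return [
        {
            "block": i + 1,
            "intention": types[i % 2],
            "van_set": van_sets[types[i % 2]],
            "intention_screens": 8,  # assumed: 4 intentions x 2 presentations
            "screen_ms": 2500,
        }
        for i in range(n_blocks)
    ]


for b in intention_schedule("goal")[:2]:
    print(b["block"], b["intention"], b["van_set"])
```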
At the end of training with intentions, participants completed a questionnaire on subjective automaticity (Self-Report Behavioral Automaticity Index [SRBAI]) and were tested on their S-O knowledge (details below; Figure 1E). We had planned to additionally obtain a (pre-intention) baseline measure with these questionnaires, but because of a programming error, they were presented after the practice blocks with intentions, making them unusable as a baseline measure.
Test Phase
Participants completed six test blocks. The test phase was similar to the first training phase (without intentions), but with some important differences. First, as intention blocks were no longer presented, value screens were again shown at the start of each block, for 4 sec. Second, participants were told that the ice cream deliverers placed a banner on top of their van, blocking the view of the ice cream they delivered (i.e., nominal extinction). Because each van still delivered the same ice cream as during training, they should base their choices on what they had learned before. Third, the feedback screens presented at the end of each block no longer included information on the accuracy of their responses, but only the percentage of responses, nonresponses, and late responses. We did this to prevent outcome-based learning during the test phase. We explicitly instructed participants that each block contained an equal number of valuable and nonvaluable outcomes, so they knew they should aim for a 50%/50% distribution. Fourth, we shortened the response window to 450 msec to force rapid responding, which has been shown to boost the expression of habitual slips (Hardwick, Forrence, Krakauer, & Haith, 2019). However, because many participants responded just after the 450-msec time limit, we decided to include responses up to 600 msec in both the behavioral and fMRI analyses, to increase the number of included trials in the fMRI analyses. This change did not significantly impact the pattern of behavioral results, which was unsurprising as the test phase was conducted in extinction, meaning that no performance feedback was provided during this period. Finally and crucially, participants were informed that the final phase would be more challenging because all eight ice cream vans would appear intermixed during each block. The crucial consequence of each block containing all eight stimuli is that half of the vans would now deliver an ice cream with a value incongruent with its value during training. Some ice cream vans for which participants had been trained to always make a go response now delivered a (devalued) ice cream that should not be collected. Conversely, other ice cream vans had carried nonvaluable outcomes during training, but their signaled outcome was upvalued and therefore required a go response. On other (value-congruent) test trials, the signaled outcome remained the same (i.e., still-valuable and still-not-valuable trials).
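The test-phase scoring rule (a 450-msec instructed deadline, but responses up to 600 msec retained) can be expressed as a small classifier. Treating responses slower than 600 msec as late and excluding them, and counting any retained response on a no-go trial as an error, is our reading of the text; `score_test_trial` and `accuracy` are hypothetical names.

```python
from typing import Optional

INSTRUCTED_DEADLINE_MS = 450  # response window shown to participants at test
INCLUSION_LIMIT_MS = 600      # responses up to this latency were analysed


def score_test_trial(go_required: bool, rt_ms: Optional[float]) -> str:
    """Return 'correct', 'error', or 'late' (excluded) for one test trial.

    rt_ms is None when no response was made. Responses between 450 and
    600 msec are scored like any other response.
    """
    if rt_ms is None:
        return "error" if go_required else "correct"
    if rt_ms > INCLUSION_LIMIT_MS:
        return "late"
    return "correct" if go_required else "error"


def accuracy(trials):
    """Percentage correct among non-late trials; trials are (go_required, rt_ms)."""
    scores = [score_test_trial(go, rt) for go, rt in trials]
    valid = [s for s in scores if s != "late"]
    return 100.0 * valid.count("correct") / len(valid) if valid else float("nan")


# A 530-msec response misses the instructed deadline but is still counted.
print(accuracy([(True, 420), (True, 530), (True, 700), (False, None), (False, 380)]))
# 75.0
```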
Consider, for example, the Rocket ice cream in Figure 1C. In this example, the Rocket is always delivered by the van with a purple star and by the van with a pink circle. During training blocks with the purple-star van, the Rocket is valuable and therefore requires a go response. In contrast, during training blocks with the pink-circle van, the Rocket is not valuable, and participants should refrain from pressing the space bar (i.e., no-go response). Subsequently, all (ice cream van) stimuli are presented during the test blocks, and in the example illustrated in Figure 1C, the value screen indicates that the Rocket is currently not valuable. This means that the van with the purple star signals a devalued outcome (i.e., this is value-incongruent with training and requires a different response), and the van with the pink circle signals a still-not-valuable outcome (i.e., value-congruent; the learned response remains correct).
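The four test conditions follow from crossing each van's training value with the current value of its outcome. The sketch below encodes that mapping and reproduces the Rocket example from Figure 1C. The final lines check, assuming each of the six possible value screens is used once across the six test blocks, that a given van is congruent in exactly half of the test blocks, in line with the caption's statement that each stimulus was equally often congruent and incongruent. `CONDITIONS` and `classify_test_trial` are hypothetical names.

```python
from itertools import combinations

# (training value of the van's outcome, current value at test) -> condition
CONDITIONS = {
    ("valuable", "valuable"): "still-valuable",            # congruent, go
    ("nonvaluable", "valuable"): "upvalued",               # incongruent, go
    ("nonvaluable", "nonvaluable"): "still-not-valuable",  # congruent, no-go
    ("valuable", "nonvaluable"): "devalued",               # incongruent, no-go
}


def classify_test_trial(trained_value, test_value):
    return CONDITIONS[(trained_value, test_value)]


# Rocket example from Figure 1C: both vans deliver the Rocket.
trained = {"purple star": "valuable", "pink circle": "nonvaluable"}
rocket_now = "nonvaluable"  # value screen: Rocket currently not valuable
for van, value in trained.items():
    print(van, "->", classify_test_trial(value, rocket_now))
# purple star -> devalued
# pink circle -> still-not-valuable

# Assumption: each of the six possible value screens is used once at test.
ICE_CREAMS = ["Cornetto", "Magnum", "Rocket", "Soft serve"]
screens = list(combinations(ICE_CREAMS, 2))            # valuable pairs
congruent = sum("Rocket" in pair for pair in screens)  # purple-star van
print(congruent, len(screens) - congruent)  # 3 3
```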
SRBAI
The SRBAI (Gardner et al., 2012) is a 4-item scale that captures self-reported habitual behavior patterns, which we adapted to assess automaticity for (not) responding to the ice cream vans. Participants were presented with each ice cream van and asked to indicate the associated response (press or not press) and the degree to which (not) making a response was something they did “automatically,” “without having to consciously remember,” “without thinking,” and “before I realize I am doing it.” Each item was scored on a scale ranging from 1 (strongly disagree) to 100 (strongly agree). The SRBAI scale has previously been shown to have good reliability and validity (Gardner et al., 2012). Before the four SRBAI items appeared, participants were asked to indicate which response was associated with that stimulus (“making a response”/“not making a response”) to test S–R knowledge. Cronbach’s alpha was calculated separately for each of the four conditions (2 intentions × 2 values), using the eight test items (four SRBAI questions for the two stimuli per condition). The results indicate high internal reliability, with alpha ranging from .91 to .95. The final score was calculated separately for each intention by taking the mean across the four items (range: 1–100), with higher scores reflecting more automatic behavior.
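For reference, the standard Cronbach's alpha formula and the SRBAI scoring rule can be written as below. The participants × items layout (here, eight items per condition), the use of sample variances, and the function names are assumptions for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a participants x items matrix (equal-length rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using sample variances throughout.
    """
    k = len(rows[0])
    items = list(zip(*rows))                       # columns = items
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])   # variance of summed scores
    return k / (k - 1) * (1 - item_var / total_var)


def srbai_score(item_ratings):
    """Final SRBAI score: mean of the four item ratings (each 1-100)."""
    return mean(item_ratings)


print(srbai_score([80, 70, 90, 60]))  # 75
```

Alpha approaches 1 when participants who rate one item high also rate the other items high, which is the pattern behind the reported .91 to .95 values.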
Test of Stimulus–Outcome Knowledge
Participants’ knowledge of the S-O contingencies was tested by asking them, for each ice cream van, which ice cream it delivered. After selecting one of the four ice creams, participants were asked to indicate how confident they were about their decision (0–100). Composite scores reflecting S-O knowledge were calculated for each intention, separately for go- and no-go-trained stimuli, by multiplying the percentage of correct S-O contingencies (0%/50%/100%) by the mean confidence (expressed as a percentage).
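A sketch of the composite S-O knowledge score for one intention × go/no-go cell (two stimuli, hence 0%, 50%, or 100% correct). Whether mean confidence was averaged over all stimuli in the cell or only over correct answers is not stated; averaging over all stimuli is assumed here, and `so_composite` is a hypothetical name.

```python
def so_composite(correct_flags, confidences):
    """Composite S-O knowledge for one intention x go/no-go cell.

    correct_flags: True/False per stimulus (two stimuli -> 0%, 50%, or 100%)
    confidences: confidence ratings (0-100) per stimulus
    Assumption: mean confidence is taken over all stimuli in the cell.
    """
    pct_correct = 100.0 * sum(correct_flags) / len(correct_flags)
    mean_conf = sum(confidences) / len(confidences)
    return pct_correct / 100.0 * mean_conf  # result on a 0-100 scale


print(so_composite([True, False], [90, 70]))  # 40.0
```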
Preregistered Behavioral Data Analysis
Behavioral data analyses were performed using IBM SPSS
Statistics 25 for Mac for frequentist statistics and JASP
Version 0.16.3 (JASP Team, 2018) for Bayesian statistics.
For data analysis purposes, the training data were collapsed across blocks of three, referred to as block-sets. Accuracy is reflected by the percentage of trials on which a correct response was made, calculated as the number of correct responses divided by the total number of trials. In line with the fMRI analyses, trials on which a late response was made were not included in the analyses (of both accuracy and RTs). To assess whether learning took place over the first part of training without intentions, accuracy was analyzed using a 2 × 4 repeated-measures ANOVA with within-subject factors Value (valuable or nonvaluable) and Block-set (1–4). The second part of training was analyzed using a 2 × 2 × 4 repeated-measures ANOVA, with Intention Type (implementation or goal intention) as an additional factor. RTs for correct responses (and thus only for valuable go trials) were analyzed with similar ANOVAs. For the test phase, data were analyzed using a 2 × 2 × 2 repeated-measures ANOVA with three factors: Intention Type (implementation or goal intention), Test Value (valuable or nonvaluable during test), and Congruency (congruent or incongruent with value during training). Thus, for each intention type, there are four conditions: still-valuable trials (valuable, congruent), upvalued trials (valuable, incongruent), still-not-valuable trials (nonvaluable, congruent), and devalued trials (nonvaluable, incongruent). Again, RTs (including all responses up to 600 msec) were analyzed using similar ANOVAs, but now also analyzing responses on no-go trials (i.e., responses on still-not-valuable and devalued trials). Note that eight participants were excluded from the no-go analyses because they performed perfectly on still-not-valuable trials and thus did not make any responses.
Subjective automaticity (SRBAI scores) for responding to stimuli trained with implementation and goal intentions at the end of training was compared using a paired t test.
Finally, the relationship between automaticity and the “revaluation insensitivity” index was tested separately for both intention types using correlational analyses. A revaluation insensitivity index was calculated for each intention type by taking the difference in accuracy between congruent and incongruent test trials, separately for go-trained (still-valuable minus devalued) and no-go-trained stimuli (still-not-valuable minus upvalued), with higher revaluation insensitivity scores indicating more habitual performance. Kendall’s tau was used because the four revaluation insensitivity indices and the SRBAI scores were not normally distributed.
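A minimal sketch of the index and of a rank correlation is given below. The dictionary keys are hypothetical. The text does not specify which Kendall variant was used; the sketch implements tau-b (the tie-corrected variant), which is an assumption.

```python
import math


def revaluation_insensitivity(acc):
    """Go- and no-go-trained indices from test accuracies (%) per condition.

    acc: dict with hypothetical keys 'still-valuable', 'devalued',
         'still-not-valuable', 'upvalued'. Higher values = more habitual.
    """
    go_index = acc["still-valuable"] - acc["devalued"]
    nogo_index = acc["still-not-valuable"] - acc["upvalued"]
    return go_index, nogo_index


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected) computed from all pairs, O(n^2)."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1          # pair tied on x (also on y if dy == 0)
            if dy == 0:
                ties_y += 1          # pair tied on y
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) / 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b undefined when one variable is constant")
    return (concordant - discordant) / denom


print(revaluation_insensitivity(
    {"still-valuable": 95, "devalued": 60, "still-not-valuable": 90, "upvalued": 80}))
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4]))
```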
In the case of violations of sphericity, we report Greenhouse–Geisser corrected degrees of freedom and p values. In addition to 95% confidence intervals, partial eta squared (ηp²) for the ANOVAs and Cohen’s d for paired t tests are reported as estimates of effect sizes.
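Both effect sizes can be recovered from the reported test statistics, which is a useful consistency check when reading the Results. The sketch below uses the standard identities ηp² = F·df1 / (F·df1 + df2) and, for paired designs, d_z = t / √n. These identities reproduce several values reported later in this section (e.g., F(1, 40) = 65.08 gives .62; t(40) = −2.03 with n = 41 gives −0.32), but the text does not state that the software computed d in exactly this way.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs.

    Greenhouse-Geisser corrected dfs can be plugged in directly: the
    correction factor scales both dfs equally and cancels out.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_dz(t_value, n):
    """Cohen's d for paired data (d_z): t divided by sqrt(n)."""
    return t_value / math.sqrt(n)


print(round(partial_eta_squared(65.08, 1, 40), 2))
print(round(partial_eta_squared(16.74, 2.46, 98.20), 2))
print(round(cohens_dz(-2.03, 41), 2))
```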
962
Journal of Cognitive Neuroscience
Volume 35, Number 6
Downloaded from http://direct.mit.edu/jocn/article-pdf/35/6/957/2082881/jocn_a_01990.pdf by California Institute of Technology (Caltech) user on 25 July 2023

We additionally conducted corresponding Bayesian analyses. For null results (p > .05), as preregistered, we report the Bayes factor BF01, which quantifies the relative evidence in favor of the null hypothesis (H0) over the alternative hypothesis (H1). For ANOVAs, we report BFexcl, which quantifies the extent to which the data support exclusion of the factor of interest from the model (i.e., the change from prior to posterior exclusion odds, across matched models). Finally, although we interpret significant findings on the basis of p < .05, we also report BFs for comprehensiveness and transparency (i.e., BF10, or BFincl for ANOVAs, which quantify evidence in favor of the alternative hypothesis over H0 and are identical to 1/BF01 and 1/BFexcl, respectively). BFs were interpreted according to Table 1 in Wetzels and colleagues (2011), with BFs between one and three reflecting anecdotal support, BFs larger than three reflecting substantial support, and BFs larger than 10 reflecting strong support. In all Bayesian analyses, JASP’s default priors were used (Cauchy scale = 0.707 for t tests; r = 0.5 for fixed effects and r = 1 for random effects in ANOVAs).
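The conversion between BF10 and BF01 and the verbal labels used in this paper can be expressed as a small helper. It encodes only the three categories listed above (anecdotal, substantial, strong); the function name is hypothetical.

```python
def interpret_bf10(bf10):
    """Return (favoured hypothesis, BF in its favour, verbal label).

    Uses BF01 = 1 / BF10 and the thresholds described in the text:
    1-3 anecdotal, > 3 substantial, > 10 strong.
    """
    if bf10 <= 0:
        raise ValueError("Bayes factors must be positive")
    favoured = "H1" if bf10 >= 1 else "H0"
    bf = bf10 if bf10 >= 1 else 1.0 / bf10   # BF01 = 1 / BF10
    if bf > 10:
        label = "strong"
    elif bf > 3:
        label = "substantial"
    else:
        label = "anecdotal"
    return favoured, round(bf, 2), label


print(interpret_bf10(22.76))      # a BF10 reported in the Results
print(interpret_bf10(1 / 5.87))   # a BF01 of 5.87 expressed as BF10
```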
MRI Data Acquisition
All MRI scans were acquired on a 3-Tesla full-body Achieva dStream MRI scanner (Philips Medical Systems) equipped with a 32-channel head coil. After participants entered the scanner, a low-resolution survey scan was acquired to determine the location of the field of view.
fMRI scans were acquired at an angle of ∼30° from the anterior–posterior commissure line to maximize signal sensitivity in orbital regions (Deichmann, Gottfried, Hutton, & Turner, 2003), using a T2*-weighted single-shot gradient echo imaging sequence with the following parameters: repetition time = 2000 msec; echo time = 28 msec; flip angle = 76.1°; voxel size = 3 mm³ with 0.3-mm slice gap; matrix size = 80 × 78; number of slices = 36; field of view = 240 × 118.5 × 240 mm. Training with intentions was split into two runs of 598 scans each, whereas 415 scans were acquired for the test phase. The first six volumes of each run were discarded to allow T1 saturation to reach equilibrium.
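The acquisition parameters above imply the following run durations and retained volume counts (TR = 2 s; six volumes discarded per run). The helper is illustrative only.

```python
def run_summary(n_scans, tr_s=2.0, n_discarded=6):
    """Return (run duration in seconds, number of volumes kept for analysis)."""
    return n_scans * tr_s, n_scans - n_discarded


runs = {"training run 1": 598, "training run 2": 598, "test run": 415}
for name, n_scans in runs.items():
    duration_s, kept = run_summary(n_scans)
    print(f"{name}: {duration_s:.0f} s ({duration_s / 60:.1f} min), {kept} volumes kept")
```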
A high-resolution T1-weighted structural image was acquired before the final run (while participants completed the post-training SRBAI and S-O test) using an MPRAGE sequence with the following parameters: voxel size = 1 mm³; field of view = 240 × 220 × 188 mm; repetition time = 8.2 msec; echo time = 3.7 msec; 220 slices; flip angle = 8°.
fMRI Data Analysis
Image Preprocessing
MRI data were first converted to Brain Imaging Data Structure (BIDS) format using in-house scripts. An initial check of data quality was done by visually inspecting the image-quality metrics derived from MRIQC v0.15.0 (Esteban et al., 2017). Data were preprocessed using fMRIPrep v20.1.1 (Esteban et al., 2019; RRID:SCR_016216), which is based on Nipype 1.5.0 (Gorgolewski et al., 2011; RRID:SCR_002502), with the default processing steps. These included brain extraction, segmentation, and surface reconstruction of the structural T1 image; spatial normalization of both the structural and functional data to MNI space; and head motion estimation, coregistration, susceptibility distortion correction, and resampling to 2 mm³ of the functional data. No slice-timing correction was performed. A comprehensive description of the preprocessing pipeline is available at https://osf.io/72bsh.
Table 1. Imaging Results of the Training Phase (Exploratory)

| Contrast | Region | MNI Coordinates (x, y, z) | Cluster Size (Voxels) | z Score at Peak Level | Correction |
|---|---|---|---|---|---|
| Increase over training blocks (go) | Caudate nucleus head | 22, 6, 30 | 443 | 4.37 | Cluster |
| | Amygdalo-hippocampal junction | −10, −4, −14 | 348 | 5.17 | Peak |
| | Angular gyrus | 20, −52, 38 | 214 | 4.92 | Peak |
| | Posterior putamen | 26, −20, 4 | 34 | 3.96 | SVC (Tricomi ROI) |
| Decrease over training blocks (go) | Anterior caudate L | −24, 10, 2 | 912 | 6.53 | Cluster |
| | Anterior caudate R | 24, 10, −4 | 537 | 6.39 | Cluster |
| | Primary motor/SMA | 8, −24, 60 | 860 | 5.44 | Cluster |
| | Hippocampus/putamen | 43, 14, −8 | 657 | 4.83 | Cluster |
| | Temporal cortex L | −46, −46, −4 | 591 | 5.69 | Cluster |
| Goal > implementation intentions, block-set 1 (go) | Anterior caudate | 13, 18, −4 | 40 | 3.69 | SVC (striatum) |

SVC = small volume correction; L = left; R = right.

fMRI Statistical Analyses
The preprocessed functional data were further analyzed using Statistical Parametric Mapping software (SPM12, Wellcome Trust Centre for Neuroimaging). The data were spatially smoothed using a Gaussian kernel with a FWHM of 8 mm, and all functional data were high-pass filtered (with a 128-sec cutoff) to remove slow signal drifts.
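For reference, an 8-mm FWHM Gaussian kernel corresponds to a standard deviation of FWHM / (2√(2 ln 2)) ≈ 3.40 mm, or about 1.70 voxels at the 2-mm resampled resolution mentioned above. The snippet only performs this unit conversion; it does not reproduce SPM's smoothing routine.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(8.0)
sigma_vox = sigma_mm / 2.0   # functional data resampled to 2-mm isotropic voxels
print(f"sigma = {sigma_mm:.2f} mm = {sigma_vox:.2f} voxels")
```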
First-Level Analysis
For the first-level analysis of the fMRI data, a general linear model was constructed for each participant, concatenated over all three runs from the training and test phases. For the data on training with intentions, trial onsets of valuable and nonvaluable stimuli for implementation and goal intentions were modeled using stick functions, making four conditions. To look at the effect of time on training, these were modeled as separate regressors per three blocks, making four training block-sets. Only correct trials (i.e., where an accurate response or nonresponse was made) were included. Blocks of verbal rehearsal of implementation and goal intentions were additionally modeled as blocks of 28 sec (total duration of eight 3.5-sec trials). For the test phase, stick functions modeled the trial onsets of still-valuable and still-not-valuable (“value-congruent”; the outcome value is congruent with the training phase) and devalued and upvalued (“value-incongruent”; the outcome value is not congruent with the training phase) stimuli, separately for stimuli trained with implementation and goal intentions, making eight regressors. To investigate BOLD activity during habitual commission and omission errors (habitual “slips” in the case of incongruent trials), separate regressors were included for incorrect trials in all conditions. The following regressors of no interest were included separately for each run: one regressor for errors (only for training, as test-phase errors, or “slips”, were modeled as regressors of interest) and late trials, as well as regressors for keypresses, feedback displays, value screens (only for the test phase), and six realignment parameters capturing rotation and translation to correct for residual participant motion. Three session constants were included in the model. All onsets were then convolved with the canonical hemodynamic response function, and an autoregressive AR(1) model was used to correct for serial correlations. The general linear model was fitted to the fMRI data to generate parameter estimates for each participant.
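The construction of an event-related regressor (unit sticks at trial onsets convolved with a hemodynamic response function) can be sketched in pure Python. This is a simplified illustration, not SPM's implementation: it assumes a double-gamma HRF with SPM-like default shapes (response gamma shape 6, undershoot gamma shape 16, ratio 1:6, unit scale), a 0.1-s time grid, and sampling at the start of each scan. It omits SPM details such as microtime reference slices, derivatives, filtering, and AR(1) whitening.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale (1 s); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(dt=0.1, length_s=32.0):
    """Double-gamma HRF sampled every dt seconds, normalized to unit sum.

    Assumed shapes: response gamma 6, undershoot gamma 16, ratio 1:6.
    """
    n = int(round(length_s / dt)) + 1
    hrf = [gamma_pdf(i * dt, 6) - gamma_pdf(i * dt, 16) / 6 for i in range(n)]
    total = sum(hrf)
    return [h / total for h in hrf]


def stick_regressor(onsets_s, n_scans, tr=2.0, dt=0.1):
    """Convolve unit sticks at trial onsets with the HRF, sampled once per scan."""
    n_fine = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_fine
    for onset in onsets_s:
        idx = int(round(onset / dt))
        if 0 <= idx < n_fine:       # onsets outside the run are ignored
            sticks[idx] += 1.0
    hrf = double_gamma_hrf(dt)
    convolved = [0.0] * n_fine
    for i, amplitude in enumerate(sticks):
        if amplitude == 0.0:
            continue
        for k, h in enumerate(hrf):
            if i + k >= n_fine:
                break
            convolved[i + k] += amplitude * h
    step = int(round(tr / dt))
    # Sample the fine-grained regressor at the start of each scan.
    return [convolved[j * step] for j in range(n_scans)]


regressor = stick_regressor([0.0, 20.0], n_scans=30)
print([round(v, 4) for v in regressor[:8]])
```

With a 2-s TR, a single onset at 0 s produces a regressor that peaks at the fourth scan (6 s after onset), consistent with the roughly 5-s peak of this HRF shape.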
Regressor-specific first-level contrast images were created for the training and test regressors modeling the different conditions of interest, to construct the planned second-level full factorial models. These contrasts of parameter estimates were then entered into between-subjects ANOVAs to generate group-level random-effects statistics. To test for a difference in learning between intention types, contrasts of parameter estimates of the instrumental training phase were entered into a 2 × 4 × 2 (Value × Block-set × Intention) factorial ANOVA. Following estimation of the second-level model, t tests were specified by adding linear weights to each instrumental training block-set, modeling increases over training as [−1.5 −0.5 0.5 1.5] and decreases as [1.5 0.5 −0.5 −1.5].
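The linear-trend weights above are simply the block-set indices minus their mean, so they sum to zero and give no weight to a constant offset. A short sketch (hypothetical helper names) that generates them and applies them to a set of illustrative parameter estimates:

```python
def linear_trend_weights(n_levels):
    """Zero-sum linear weights: level index minus the mean index."""
    mean_idx = (n_levels - 1) / 2.0
    return [i - mean_idx for i in range(n_levels)]


def apply_contrast(weights, betas):
    """Weighted sum of parameter estimates (one per block-set)."""
    return sum(w * b for w, b in zip(weights, betas))


increase = linear_trend_weights(4)     # [-1.5, -0.5, 0.5, 1.5]
decrease = [-w for w in increase]      # [1.5, 0.5, -0.5, -1.5]
print(increase, decrease)
print(apply_contrast(increase, [1.0, 1.0, 1.0, 1.0]))   # flat profile -> 0
print(apply_contrast(increase, [0.0, 1.0, 2.0, 3.0]))   # rising profile -> positive
```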
In addition, first-level contrast images were created. To assess the effect of planning during training, contrasts were created comparing training with implementation versus goal intentions (across all blocks, separately for go and no-go trials). To examine markers of goal-directed control during test, we compared correct congruent trials with correct incongruent trials (i.e., [still-valuable go > upvalued go] and [still-not-valuable no-go > devalued no-go]). We also investigated situations where participants fail to adapt to the new outcome value and continue to respond according to the learned S–R association by comparing incorrect incongruent trials (i.e., “slips of action”) with correct incongruent trials. Again, separate contrasts were created for test-go and test-no-go trials (i.e., [devalued go > upvalued go] and [upvalued no-go > devalued no-go]). Finally, we also created a similar contrast comparing incorrect incongruent trials (slips) with correct congruent trials (i.e., [devalued go > still-valuable go] and [upvalued no-go > still-not-valuable no-go]). More information about the rationale behind these contrasts is provided in the Results section. To assess the effect of planning strategy on test performance, the same test-phase contrasts were constructed but looking for an interaction with intention type (e.g., [still-valuable go > upvalued go] × [implementation > goal intention]). Parameter estimates generated from these first-level analyses were entered into a random-effects group analysis, and linear contrasts were used to identify significant effects at the group level.
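An interaction contrast such as [still-valuable go > upvalued go] × [implementation > goal intention] can be built as the Kronecker product of the two difference vectors. The column order below (intention as the outer factor) is a hypothetical choice for illustration; in a real design matrix the weights must follow the actual regressor order.

```python
def kron(a, b):
    """Kronecker product of two 1-D weight vectors (outer loop over a)."""
    return [x * y for x in a for y in b]


# Hypothetical column order: intention is the outer factor, condition the inner one.
labels = [f"{i} / {c}" for i in ("implementation", "goal")
          for c in ("still-valuable go", "upvalued go")]
condition_effect = [1, -1]   # still-valuable go > upvalued go
intention_effect = [1, -1]   # implementation > goal intention
interaction = kron(intention_effect, condition_effect)

for label, weight in zip(labels, interaction):
    print(f"{weight:+d}  {label}")
```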
Higher level whole-brain statistical maps were corrected for FWE at the cluster level (pFWE-cluster < .05) with a cluster-defining voxel threshold of p = .001 uncorrected. When activations did not reach statistical significance at the cluster level, we also checked the peak-voxel level with a threshold of p < .05 corrected (pFWE-peak < .05). In such cases, we clearly indicate this in the text, and we report the peak-voxel level results so as to be as comprehensive as possible. Finally, in an exploratory analysis, we aimed to test for effects in specific regions of the striatum, given prior published findings on the role of these structures in goal-directed and habitual responding (Watson et al., 2018; de Wit et al., 2012; Tricomi et al., 2009; Tanaka, Balleine, & O’Doherty, 2008; Valentin et al., 2007). In particular, we defined an anatomical ROI to examine effects in the caudate nucleus, a region previously implicated in goal-directed processes, as well as a functional ROI based on the results from Tricomi et al. (2009) that implicated the posterior putamen in habit-related processing.
In addition, we identified several ROIs in our preregistration: for habitual control, goal-directed control, response conflict, and implementation intentions. Three separate masks were created based on these ROIs to apply small volume correction (SVC). Apart from a striatal ROI (encompassing the bilateral caudate, putamen, and NAcc from the AAL atlas; Tzourio-Mazoyer et al., 2002), applying SVC with the three preregistered ROIs did not alter the pattern of results. This may be because of the large number of voxels included in the ROIs (especially the goal-directed mask), which reduced the sensitivity of the SVC. Therefore, we report the whole-brain results for the confirmatory analyses. Whole-brain t-maps (without thresholding) of the main fMRI contrasts are available at https://neurovault.org/collections/13191/.
RESULTS
All analyses reported in this section were preregistered at the start of this study, unless indicated otherwise in the text. We generally followed the preregistered analysis plan, but in some cases the results prompted us to explore the data further. We should also point out that we preregistered these hypotheses before finishing data analysis of our related behavioral study (van Timmeren & de Wit, 2022). Hence, we preregistered the same behavioral hypotheses for this study, although the original behavioral study only partially supported our initial predictions, a point we will come back to in the Discussion. We therefore occasionally deviate from the preregistration to keep our analyses in line with the analyses and findings of the behavioral study; such deviations are always clearly indicated.
The total final sample used for the analyses consisted of 41 participants, after the following exclusions. On the basis of the preregistered exclusion criteria, no participants were excluded on the training criterion (< 80% accuracy in the last block-set of training), whereas three were excluded because they made < 25% responses on upvalued trials trained with goal intentions in the test phase. The goal of this criterion was to ensure that participants understood the test-phase instructions and updated their performance accordingly, without excluding participants on the basis of the manipulation of interest (i.e., implementation intentions). We additionally excluded two participants (post hoc) because of a very low overall response rate during the test phase. Although these participants made (just) > 25% responses on upvalued trials, we deviated from the preregistration because they were outliers on the overall response rate and responded on less than one out of three trials during the test, despite receiving explicit instructions to aim for a response rate of ∼50% and receiving feedback about this at the end of each block. Hence, they did not follow the test-phase instructions, and their performance is not reliable. Note that this criterion is independent of actual task performance (accuracy) and that the inclusion or exclusion of these two participants does not change the general pattern of behavioral or fMRI results.
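The exclusion rules can be summarized as a simple per-participant check. The first two cutoffs come from the preregistered criteria described above. The third cutoff (an overall test response rate below one in three trials) is an assumed formalization of the post hoc criterion, which the text describes only qualitatively. Argument names are hypothetical.

```python
def exclusion_reasons(last_set_accuracy, upvalued_goal_response_rate,
                      overall_test_response_rate):
    """List the criteria a participant fails (all values in percent)."""
    reasons = []
    if last_set_accuracy < 80:                 # preregistered training criterion
        reasons.append("training accuracy")
    if upvalued_goal_response_rate < 25:       # preregistered test criterion
        reasons.append("upvalued goal-intention responses")
    if overall_test_response_rate < 100 / 3:   # post hoc; assumed cutoff
        reasons.append("overall test response rate")
    return reasons


print(exclusion_reasons(95, 20, 40))
print(exclusion_reasons(95, 30, 30))
print(exclusion_reasons(95, 50, 50))
```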
Behavioral Results
Training Phase without Intentions
As expected, participants learned to make correct responses over the first part of training (Figure 2A), as revealed by a significant main effect of Block-set on accuracy, F(2.46, 98.20) = 16.74, p < .001, ηp² = .30, BFincl = 2.81 × 10⁵, and a marginally significant effect of Block-set on RT, F(2.45, 98.07) = 2.75, p = .058, ηp² = .06, BFincl = 0.81. There was no significant difference in learning to make go versus no-go responses (main effect of Value: F(1, 40) = 2.00, p = .17, ηp² = .05, BFexcl = 1.60; Block-set × Value interaction: F(1.70, 68.16) = .25, p = .57, ηp² = .01, BFexcl = 22.15).
Instrumental Training with Goal versus Implementation Intentions
Following the first 12 blocks of instrumental training without planning, intentions were introduced during a practice block (still outside the scanner). Although we did not preregister an analysis of these data, for completeness and in line with our previous behavioral study with this paradigm investigating the same question (van Timmeren & de Wit, 2022), we conducted a paired t test comparing the final block of training without intentions to the practice block. This analysis revealed that participants benefitted from if–then planning on the valuable go trials, as reflected by higher accuracy (M = 96.1, SD = 12.4) relative to the preceding (pre)training block-set (baseline: M = 91.8, SD = 9.1; Z(40) = 2.57, p = .01, d = 0.59, 95% CI [.22, .81], BF10 = 1.34), whereas RTs were not affected, t(40) = −.01, p = .99, d = −0.001. In contrast, the use of goal intentions negatively affected both accuracy (M = 87.6, SD = 14.7; Z(40) = −1.86, p = .065, d = −0.40, 95% CI [−.69, −.01], BF10 = 1.36) and RTs, t(40) = −2.03, p = .049, d = −0.32, BF10 = 1.08, on go trials compared with (pre)training. For no-go trials, no significant effects of implementation intentions, Z(40) = 1.03, p = .31, BF01 = 5.12, or goal intentions, Z(40) = .10, p = .93, BF01 = 5.68, were seen.
Subsequently, when instrumental training was resumed during the scanning session, the 2 × 2 × 4 repeated-measures ANOVA indicated that the advantage of if–then planning was initially still apparent on valuable go trials (Figure 2A). In addition to a strong main effect of Value, driven by participants performing better overall on valuable compared with nonvaluable trials, F(1, 84.47) = 10.93, p = .002, ηp² = .22, BFincl = 18.08, we found the expected preregistered three-way interaction between Intention, Value, and Block-set, F(3, 103.14) = 6.45, p < .001, ηp² = .14, BFincl = 857.7. Separate analyses of valuable and nonvaluable trials revealed a significant Intention × Block-set interaction for valuable, F(3, 81.78) = 6.21, p = .003, ηp² = .13, BFincl = 74.01, but not for nonvaluable trials, F(3, 120) = 1.88, p = .14, ηp² = .05, BFexcl = 2.63. The significant effect on the valuable go trials was driven by higher accuracy with implementation compared with goal intentions during the first block-set, Z(40) = 3.34, p < .001, d = 0.85, 95% CI [.64, .94], BF10 = 22.76. At the end of training (Block-set 4), there was no longer a significant effect of Intention Type on accuracy, Z(1, 40) = −.34, p = .80, ηp² = −1.43, BF01 = 5.87.
The analysis of RTs (Figure 2A) revealed a main effect of Intention Type, F(1, 40) = 12.08, p = .001, ηp² = .23, BFincl = 11.12, with faster responses during blocks trained with implementation intentions (median = 365 msec, SD = 17) compared with goal intentions (median = 374 msec, SD = 20), but no significant effect of Block-set, F(2.4, 98.6) = 2.31, p = .09, ηp² = .05, BFexcl = 3.41, nor an interaction (p = .20, ηp² = .04, BFexcl = 3.67).
Symmetrical Outcome-Revaluation Test
As expected, learned S–R associations had a clear impact on performance during the test phase (Figure 2B), as revealed by a main effect of Congruence, F(1, 40) = 65.08, p < .001, ηp² = .62, BFincl = 1.39 × 10⁷. Because Test Value showed significant interactions with both Congruence, F(1, 40) = 10.73, p = .002, ηp² = .21, BFincl = 8.91, and Intention Type, F(1, 40) = 5.94, p = .02, ηp² = .13, BFincl = 1.27, separate follow-up comparisons were conducted for go (associated with still-valuable and upvalued outcomes) and no-go (associated with still-not-valuable and devalued outcomes) trials. Main effects of Congruence were seen for both the go, F(1, 40) = 16.82, p < .001, ηp² = .30, BFincl = 76.40, and no-go, F(1, 40) = 56.46, p < .001, ηp² = .59, BFincl = 2.31 × 10⁶, stimuli. As can be seen in Figure 2B, the congruency effect was larger for no-go trials, mainly because participants struggled more on devalued trials, where they had to suppress responding to discriminative stimuli that previously signaled a valuable outcome. Importantly, we were interested in the effect of implementation intentions on test performance. First, an analysis of the go test trials suggested that overall performance was worse when trained with implementation compared with goal intentions, F(1, 40) = 5.48, p = .02, ηp² = .12, although Bayesian statistics showed that this evidence was inconclusive (BFincl = 1.46). Importantly, in contrast to our preregistered hypothesis, there was no evidence for reduced flexibility as a consequence of if–then planning: The expected interaction of congruence with intention type failed to reach significance, F(1, 40) = 1.52, p = .23, ηp² = .04, BFexcl = 1.86. Given the direct relevance of the comparison between intentions for our research question, we followed these analyses up with separate (exploratory) paired t tests for still-valuable and upvalued trials, which also allowed us to report Bayesian evidence against a difference. Findings indicate that implementation intentions (relative to goal intentions) only had a significant negative effect on (congruent) still-valuable trials, Z(40) = −2.55, p = .01, d = −0.56, BF10 = 3.68, but not on (incongruent) upvalued trials, t(40) = −.75, p = .46, BF01 = 4.54. Finally, for the no-go stimuli (still-not-valuable and devalued), neither a main effect, F(1, 40) = .42, p = .52, BFexcl = 4.37, nor an interaction, F(1, 40) = .06, p = .81, BFexcl = 4.25, of intention type was observed.

Figure 2. Behavioral results. (A) Over the course of training, participants learned to successfully respond to stimuli associated with valuable outcomes (go) and to withhold a response to stimuli associated with nonvaluable outcomes (no-go), as reflected by increasing accuracy rates. After six blocks of regular training, some stimuli continued to be trained using implementation intentions (blue), whereas others were trained with goal intentions (blue). Following one block of practice (black dotted line), participants were moved to the scanner and resumed training with intentions. Accuracy was significantly higher initially when using implementation intentions, but toward the end of training, performance was almost perfect for both implementation and goal intentions. Across training with intentions, participants were faster during blocks trained with implementation versus goal intentions. (B) During the test phase, for some stimuli, the associated outcome changed in value (and thus the required response) compared with training (upvalued and devalued; see Figure 1C), and participants had to flexibly update their responses accordingly. For other stimuli, the associated value and response remained congruent with training (still-valuable and still-not-valuable). Participants responded less accurately on incongruent compared with congruent trials, reflecting inflexibility as a consequence of S–R contingencies learned during training. However, training with implementation intentions did not lead to reduced flexibility. Similarly, there was no significant effect of training with implementation intentions on RT. (Shaded) error bars represent standard error of the mean. II = implementation intentions; GI = goal intentions.
We also analyzed RTs during the test phase. A Value × Congruence interaction, F(1, 32) = 49.47, p < .001, ηp² = .61, BFincl = 2.91 × 10⁵, prompted separate analyses for trials trained with go responses (still-valuable and devalued) and for trials trained with no-go responses (still-not-valuable and upvalued). Interestingly, there was a main effect of congruence for go-trained stimuli, with significantly faster RTs on devalued trials (M = 418 msec, SE = 8.8) relative to still-valuable trials (M = 443 msec, SE = 6.8; F(1, 40) = 12.56, p = .001, ηp² = .24, BFincl = 23.40), in line with the idea that habitual slips of action are triggered quickly and efficiently before one has the chance to suppress them. Because late responses were excluded from this analysis (as in the accuracy analysis), we ran an additional analysis including RTs for late responses to make sure that this effect was not driven by a higher number of (excluded) late responses on devalued trials. This analysis showed an even stronger main effect of congruence than the original analysis without late responses, F(1, 40) = 14.84, p < .001, ηp² = .27, BFincl = 36.88. No other significant effects on RTs were found (all p > .22, BFexcl > 1.74).
Self-reported Automaticity and S-O Knowledge
Self-reported automaticity was high overall (median = 80.4%, SD = 16.7) but did not differ between intentions, t(40) = −.98, p = .34, BF01 = 3.80, nor did subjective automaticity correlate with revaluation insensitivity for implementation (rτ = −.09, p = .57, BF01 = 4.39) or goal intentions (rτ = .22, p = .17, BF01 = 2.03).
Following van Timmeren and de Wit (2022), we also explored differences in S-O knowledge between intention types and its relationship with overall test accuracy. S-O knowledge was high (median = 89.8%, SD = 22.1) and, contrary to our previous study, no longer differed significantly between intention types, F(1, 40) = 2.07, p = .16, ηp² = .05, BF01 = 2.6, or between values, F(1, 40) = 3.42, p = .07, ηp² = .08, BF01 = 2.4, and there was no interaction, F(1, 40) = .91, p = .35, ηp² = .02, BF01 = 5.88. This suggests that the adaptation we made to the task (i.e., using a pseudorandom selection of stimuli instead of alternating between two block-sets in the first part of training; see Methods section) had the desired effect. S-O knowledge did correlate positively with test accuracy (across all four conditions) for both implementation intentions (rτ = .30, p = .008, 95% CI [.08, .52], BF10 = 7.91) and goal intentions (rτ = .39, p < .001, 95% CI [.21, .57], BF10 = 99.22).
Conclusions: Behavioral Results
We provide evidence for habit learning, as indicated by the general effect of previously learned S–R mappings on the ability to flexibly adapt responding when the cue signals a revalued outcome (i.e., on incongruent trials). Importantly, although if–then planning seemed to increase efficiency relative to goal intentions, as reflected in superior acquisition, this was not at the expense of flexibility when outcome values changed in the test phase.
Neuroimaging Results
Instrumental Training: Across Intentions (Exploratory)
First, we explored general learning effects across intention types, because this was the first time the SORT was used in the MRI scanner. These analyses showed that over the course of go training (i.e., on valuable trials), activity increased linearly in the head of the caudate nucleus, extending into the ACC (at p < .05 FWE corrected; pFWE-cluster < .05). Activation in the left amygdalo-hippocampal junction and the angular gyrus did not reach our cluster-level correction threshold but did survive voxel-level correction at p < .05 (pFWE-peak < .05; Table 1). In this same contrast, we also observed a cluster in the posterior putamen, which survived small volume correction for the posterior putamen ROI (i.e., pFWE < .05 with SVC, defined as a 10-mm sphere centered on the peak of the cluster that showed a significant increase over training in the study of Tricomi et al. [2009]; x = 33, y = −24, z = 0). On the other hand, activity decreased over training in the bilateral anterior caudate (a more ventral part of the striatum), primary motor cortex (extending to the mid-posterior cingulate), hippocampus extending into the putamen, and left temporal cortex (all pFWE-cluster < .05). In contrast, on no-go trials, no voxels showed a significant linear change over training blocks.
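The posterior putamen ROI is a 10-mm sphere around the Tricomi et al. (2009) peak (x = 33, y = −24, z = 0). As a quick geometric check, the posterior putamen peak listed in Table 1 (26, −20, 4) lies 9 mm from that center and thus inside the sphere. The sketch checks only peak coordinates in millimeters; the actual SVC is computed over all voxels in the spherical mask, which this snippet does not reproduce.

```python
import math

TRICOMI_PEAK_MM = (33, -24, 0)   # sphere center given in the text (MNI, mm)


def within_sphere(point_mm, center_mm=TRICOMI_PEAK_MM, radius_mm=10.0):
    """True if a coordinate lies within radius_mm of the sphere center."""
    return math.dist(point_mm, center_mm) <= radius_mm


putamen_peak = (26, -20, 4)      # posterior putamen peak from Table 1
print(math.dist(putamen_peak, TRICOMI_PEAK_MM))
print(within_sphere(putamen_peak))
```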
Instrumental Training: Comparing Goal and Implementation Intentions
We then examined whether strategic planning affected instrumental training. The contrast comparing the average BOLD signal of trials trained with implementation intentions and goal intentions did not reveal any significant activations, on either go or no-go trials. We also tested for differences in learning between intentions over the course of training by adding linear weights to block-sets to compare increased activity over block-sets during implementation intentions with decreased activity during goal intentions, and vice versa. However, neither test of this interaction revealed significant differences.
Because implementation intentions showed the most pronounced behavioral effect early in training, we conducted an exploratory analysis of the first training block-set only. This analysis revealed significantly decreased activation in the anterior caudate (pFWE < .05 with SVC, z = 3.69) on trials trained with implementation intentions compared with goal intentions (Figure 3A and Table 1). For visual purposes, the extracted average BOLD signal from the anterior caudate cluster is shown separately for each block-set and intention in Figure 3B. As can be seen here, activity was indeed lower on implementation intention trials during the first block-set only and subsequently decreased for both intentions. A whole-brain analysis also showed decreased activity for implementation relative to goal intentions at an uncorrected threshold (p < .001) in the right lateral orbitofrontal cortex (OFC; pFWE-cluster = .061, z score = 4.25; MNI x = 26, y = 50, z = 14) and the left insula (pFWE-cluster = .28, z score = 3.76; MNI x = −42, y = 20, z = 2). However, because these results did not survive FWE correction, we refrain from interpreting them further. To rule out that these findings were driven by RTs, which were significantly shorter for implementation compared with goal intentions, we performed an additional analysis controlling for trial-by-trial RT by including a parametric regressor (one for each of the two training runs) with RTs for each trial. This had no significant impact on the results, and we could qualitatively replicate all reported findings.
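A trial-by-trial RT modulator of this kind is typically entered as a mean-centered amplitude on the trial onsets, computed per run. The text does not describe how the RTs were scaled, so the mean-centering below is an assumption (a common choice for parametric modulators) and the helper name is hypothetical.

```python
def centered_rt_modulator(rts_ms):
    """Mean-center one run's RTs so the modulator is orthogonal to a constant."""
    if not rts_ms:
        raise ValueError("need at least one RT")
    mean_rt = sum(rts_ms) / len(rts_ms)
    return [rt - mean_rt for rt in rts_ms]


# One modulator per training run (assumption: centering within each run).
run1 = centered_rt_modulator([300, 400, 500])
run2 = centered_rt_modulator([350, 370, 390, 410])
print(run1, run2)
```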
Neural Predictors of Test Performance
To determine whether brain activity during instrumen-
tal training with implementation intentions was pre-
dictive of test performance, we tested whether the
average BOLD signal during training covaried with
the revaluation insensitivity score. This preregistered test
did not reveal significant neural predictors of test performance. For completeness, we also ran this analysis in an exploratory manner, separately for goal intentions and across intentions, but this similarly did not reveal any significant results.
Symmetrical Outcome-Revaluation Test: Markers of Goal-directed versus Habitual Performance
In the test phase, changes in outcome value create conflict between goal-directed control and learned S–R associations. Specifically, to perform the correct response on incongruent trials (i.e., upvalued go and devalued no-go), participants have to exert goal-directed control and override the learned S–R mapping. Conversely, on congruent trials (still-valuable go and still-not-valuable no-go), participants can rely on the learned S–R associations. The advantage of the symmetrical outcome-revaluation test (compared with the original slips-of-action test) is that we can compare congruent and incongruent trials with each other without the comparison being confounded by test outcome value (and therefore by the required response, i.e., go or no-go). Therefore, to examine markers of goal-directed control, we first compared upvalued go with still-valuable go responses and found that this was associated with increased right insula activity (p_FWE-cluster < .05, z = 4.16; Table 2). No significant activations were seen in the contrast between devalued no-go and still-not-valuable no-go trials.
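The trial taxonomy of the symmetrical test can be made explicit in code. This Python sketch assumes, as the text implies, that the trained response was go for stimuli signaling a valuable outcome and no-go otherwise; `label_trial`, `TrialLabel`, and the label "congruent-trial error" are names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Test-phase condition named by (valuable during training, valuable at test).
CONDITIONS = {
    (True, True): "still-valuable",
    (True, False): "devalued",
    (False, True): "upvalued",
    (False, False): "still-not-valuable",
}


@dataclass(frozen=True)
class TrialLabel:
    condition: str
    required: str      # response required at test: "go" or "no-go"
    congruent: bool    # trained S-R response equals the required test response
    correct: bool

    @property
    def error_type(self) -> Optional[str]:
        if self.correct:
            return None
        if self.condition == "devalued":
            return "slip of action"        # responded toward a devalued outcome
        if self.condition == "upvalued":
            return "inhibition slip"       # withheld a response for an upvalued outcome
        return "congruent-trial error"     # label chosen for this sketch only


def label_trial(valuable_in_training: bool, valuable_at_test: bool, responded: bool) -> TrialLabel:
    """Classify one test trial, assuming the trained response was go iff the outcome was valuable."""
    return TrialLabel(
        condition=CONDITIONS[(valuable_in_training, valuable_at_test)],
        required="go" if valuable_at_test else "no-go",
        congruent=valuable_in_training == valuable_at_test,
        correct=responded == valuable_at_test,
    )
```

For example, `label_trial(True, False, True)` describes a response toward a devalued outcome: an incongruent, incorrect trial whose `error_type` is "slip of action".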
To identify regions where participants fail to adapt and continue to respond according to the learned S–R association, we contrasted incorrect incongruent trials (devalued go and upvalued no-go) with correct incongruent trials (upvalued go and devalued no-go, respectively), as the latter arguably require the most goal-directed control to override the learned S–R mapping. The contrast comparing devalued go responses (i.e., slips of action) with upvalued go responses is shown in Figure 4A and revealed increased activity in a fronto-parietal network, including the left anterior insula extending to the inferior lateral prefrontal cortex, SMA, dorsal anterior cingulate cortex, bilateral inferior parietal lobule, and supramarginal gyrus (all p_FWE-cluster < .05; Table 2). Conversely, lower activity during slips of action compared with upvalued go responses was seen in the left anterior cingulate cortex extending into the caudate nucleus, left lateral OFC, bilateral superior parietal lobe, and several occipital/primary visual areas (all p_FWE-cluster < .05; Table 2). In addition, activation in the premotor/primary motor cortex did not survive cluster-level correction but did reach peak-voxel-level significance (p_FWE-voxel < .05).
Figure 3. Lower activity in the right anterior caudate early in training for implementation compared with goal intentions. (A) Voxels that showed significantly lower activation during the first block-set of training with implementation compared with goal intentions on go trials (at p_FWE < .05, small volume-corrected). The activity patterns shown are thresholded at p < .001 uncorrected. (B) Parameter estimates extracted from this anterior caudate cluster (peak at x = 13, y = 18, z = −4) over block-sets. Error bars represent 95% confidence intervals. a.u. = arbitrary units.
968
Journal of Cognitive Neuroscience
Volume 35, Number 6
Downloaded from http://direct.mit.edu/jocn/article-pdf/35/6/957/2082881/jocn_a_01990.pdf by California Institute of Technology (Caltech) user on 25 July 2023
Although the previous contrast between devalued slips and correct upvalued go responses maximizes the difference between habitual and goal-directed control, the two conditions differ in their original training outcome value (as well as in their test value). To mitigate this, we proceeded to compare devalued slips with still-valuable go responses, which differ only in their test outcome value. Thus, this contrast compares trials on which participants correctly continued responding according to the learned S–R association with trials on which they failed to override this association. Although we have used the same approach previously (in the study of Watson et al., 2018, the “slips versus respond valuable” contrast), this contrast was not preregistered and should thus be considered exploratory. Similar to the comparison with upvalued go responses, this comparison of slips with still-valuable go responses revealed increased anterior insula activity (bilaterally) during slips, but decreased activity in vMPFC (extending to NAcc), primary motor cortex, paracentral lobule, a large occipital cluster, and large bilateral parietal clusters including the angular gyrus and inferior parietal lobule (all p_FWE-cluster < .05; Figure 4A).
Table 2. Imaging Results of the Test Phase
Contrast / Region | MNI coordinates of maximum (x, y, z) | Cluster size (voxels) | z score (peak) | Correction
Upvalued go > still-valuable go
  Insula R | 38, 24, −2 | 468 | 4.16 | Cluster
Devalued slips > still-valuable go
  Anterior insula L | −40, 26, 2 | 611 | 5.46 | Cluster
  Anterior insula R | 42, 26, −10 | 621 | 4.49 | Cluster
Still-valuable go > devalued slips
  vMPFC | 22, 42, −4 | 388 | 4.64 | Cluster
  Caudate | 8, 28, 2 | – | – | –
  NAcc | 4, 20, 4 | – | – | –
  Primary motor cortex | −26, 12, 60 | 252 | 5.10 | Peak
  Paracentral lobule | −10, −30, 66 | 336 | 4.56 | Cluster
  Angular gyrus L | −30, −52, 52 | 1653 | 5.51 | Cluster
  IPL L | – | – | – | –
  Angular gyrus R | 38, −50, 58 | 2510 | 5.33 | Cluster
  IPL R | – | – | – | –
  Occipital cortex | −36, −74, 8 | 767 | 5.25 | Cluster
Devalued slips > upvalued go
  Anterior insula L | −36, 26, −8 | 707 | 4.29 | Cluster
  SMA | 8, 8, 64 | 378 | 5.47 | Cluster
  dACC | 8, 18, 34 | 431 | 4.18 | Cluster
  Inferior parietal lobule L | −56, −42, 34 | 269 | 4.60 | Peak
  Inferior parietal lobule R | 56, −4, 44 | 331 | 4.44 | Cluster
  Supramarginal gyrus | −36, 26, −8 | 707 | 4.29 | Cluster
Upvalued go > devalued slips
  ACC, caudate nucleus | −20, 22, 18 | 327 | 4.17 | Cluster
  Premotor/PMC | −26, 0, 42 | 529 | 4.80 | Cluster
  Lateral OFC | −32, 62, 0 | 317 | 4.19 | Cluster
  Superior parietal lobe L | −28, −76, 36 | 1482 | 4.12 | Cluster
  Superior parietal lobe R | 30, −62, 38 | 4099 | 5.04 | Cluster
  Occipital/visual cortex | −30, −96, 16 | 1307 | 6.43 | Cluster
L = left; R = right; NAcc = nucleus accumbens; IPL = inferior parietal lobule; (d)ACC = (dorsal) anterior cingulate cortex; PMC = primary motor cortex; OFC = orbitofrontal cortex.
As preregistered, we also compared upvalued no-go responses (“inhibition slips”) with correct devalued (no-go) trials, but this did not reveal any significant activation patterns. Moreover, we were not able to conduct the contrast between upvalued and still-valuable no-go trials because of the low number of omission errors on still-valuable trials.
Our results thus identify the anterior insula as a common region associated with slips toward devalued outcomes, as activity in this region was higher during slips than during go responses toward upvalued and still-valuable outcomes. However, both contrasts are confounded by expected value (the outcome value during the test phase), as they both compare stimuli signaling a nonvaluable outcome (devalued) with stimuli signaling a valuable outcome (upvalued or still-valuable). To control for this, we ran additional exploratory analyses comparing activity during devalued slips with correct no-go responses on devalued and still-not-valuable trials.
Although these contrasts are difficult to interpret by themselves (they are confounded by whether or not a button was pressed), examining the overlap between all four contrasts overcomes these confounds: the two comparisons with go responses are matched for response but differ in test value, whereas the two comparisons with no-go responses are matched for test value but differ in response, so voxels common to all four are not explained by either confound alone. The overlap could therefore reveal a process common to the expression of habits. To this end, we used ImCalc to create binary images of all four contrasts thresholded at t(41) = 3.1 (equivalent to p < .001 uncorrected) and multiplied them. The result of this inclusive masking analysis, which is akin to a conjunction analysis, shows that the bilateral anterior insula was commonly activated across all four contrasts (Figure 4B).
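The inclusive-masking step can be sketched in NumPy on statistic maps already loaded from the four contrast images (loading, for example with nibabel, is omitted). The function name `inclusive_mask` and the strict `t > 3.1` cutoff are assumptions of this sketch; it reproduces the logic of multiplying binary images, not SPM's ImCalc itself, and the four toy "maps" are made-up values.

```python
import numpy as np


def inclusive_mask(t_maps, threshold=3.1):
    """Overlap of several thresholded statistic maps (hypothetical helper).

    Each map is binarized (1 where t > threshold, else 0; the strict '>' is an
    assumption) and the binary images are multiplied voxelwise, so a voxel
    survives only if it passes the threshold in every map (a logical AND).
    """
    binary = [(np.asarray(t, dtype=float) > threshold).astype(np.uint8) for t in t_maps]
    overlap = np.ones_like(binary[0])
    for b in binary:
        overlap = overlap * b
    return overlap.astype(bool)


# Four toy 1-D "t-maps" standing in for the four slip contrasts.
maps = [
    [3.5, 2.0, 4.0, 3.2],
    [3.2, 3.3, 1.0, 3.15],
    [4.0, 5.0, 3.5, 3.11],
    [3.11, 3.2, 3.3, 3.9],
]
mask = inclusive_mask(maps)
```

Because a product of 0/1 images is 1 only where every factor is 1, the result equals thresholding the voxelwise minimum statistic, `np.min(np.stack(maps), axis=0) > 3.1`.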
Symmetrical Outcome-Revaluation Test: Comparing Goal and Implementation Intentions
None of the planned contrasts comparing test-phase trials trained with implementation intentions with those trained with goal intentions revealed significant activation patterns.
DISCUSSION
The aim of the present study was to investigate whether the brain can strategically go on automatic pilot. We investigated this by measuring the impact of strategic planning (i.e., implementation intentions vs. goal intentions) on the acquisition of instrumental actions as well as on subsequent flexible behavioral adjustment. When strategic planning was first introduced during the instrumental learning phase of our paradigm, implementation intentions improved performance relative to goal intentions. Furthermore, in line with the idea that their beneficial effect was mediated by accelerated S–R learning, an exploratory analysis revealed that implementation intentions were associated with reduced activity in the anterior caudate, a brain area previously implicated in goal-directed control (Watson et al., 2018; Liljeholm, Tricomi, O’Doherty, & Balleine, 2011). These effects of strategic planning on performance and neural activity were apparent only early in training, with participants reaching high levels of accuracy (and reduced activity in the anterior caudate) by the end of the learning phase, independent of intention type. Our central question, however, was whether implementation intentions would actually impede performance when flexible behavioral adjustment was required during the subsequent outcome-revaluation test. Importantly, we found no evidence for a detrimental effect of strategic planning on the ability to adapt behavior to changing outcome values, nor any effect on the underlying neural activity patterns. We conclude that strategic planning of S–R mappings may allow people to go on automatic pilot to
Figure 4. (A) Neural correlates of slips of action in the test phase, as revealed by increased (red–yellow) and decreased (dark–light blue) activity during devalued slips compared with upvalued go responses. Clusters that survived whole-brain FWE correction include increased activity in a fronto-parietal network, including the left anterior insula extending to the inferior lateral pFC, SMA, dorsal anterior cingulate cortex, bilateral inferior parietal lobule, and supramarginal gyrus. Conversely, lower activity was seen in the left anterior cingulate cortex extending into the caudate nucleus, premotor/primary motor cortex, left lateral OFC, bilateral superior parietal lobe, and several occipital/primary visual areas. Results are shown here at p < .001 (uncorrected) for visualization purposes, overlaid on the mean T1 image of all participants. (B) The bilateral anterior insula was found to be commonly activated during devalued slips (x = ±40, y = 26, z = 2). Shown here in yellow are the voxels that overlap between all four contrasts comparing devalued slips with correct (non-)responses during still-valuable, still-not-valuable, devalued, and upvalued trials (thresholded at p < .001 uncorrected).