Title: Encoding of predictive associations in human prefrontal and medial temporal neurons during Pavlovian conditioning

Authors: Tomas G. Aquino (1,4,∗), Hristos Courellis (2), Adam N. Mamelak (4), Ueli Rutishauser (1,4,†), John P. O’Doherty (1,3,†)

Affiliations:
1 Computation and Neural Systems, Division of Biology and Biological Engineering, California Institute of Technology, 91125, USA
2 Biological Engineering, Division of Biology and Biological Engineering, California Institute of Technology, 91125, USA
3 Division of Humanities and Social Sciences, California Institute of Technology, 91125, USA
4 Department of Neurosurgery, Cedars-Sinai Medical Center, 90048, USA
† Joint senior authors
∗ To whom correspondence should be addressed; E-mail: taquino@caltech.edu.

Classification: Biological Sciences, Neuroscience; Social Science, Psychological and Cognitive Sciences

Keywords: Pavlovian conditioning, ventromedial prefrontal cortex, amygdala, medial frontal cortex, model-based learning, human single neurons.
bioRxiv preprint doi: https://doi.org/10.1101/2023.02.10.528055; this version posted February 13, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Abstract: Pavlovian conditioning is thought to involve the formation of learned associations between stimuli and values, and between stimuli and specific features of outcomes. Here we leveraged human single neuron recordings from the ventromedial prefrontal cortex, dorsomedial frontal cortex, hippocampus and amygdala while patients performed a sequential Pavlovian conditioning task containing both stimulus-value and stimulus-stimulus associations. Neurons in the ventromedial prefrontal cortex encoded predictive value along with the amygdala, but also encoded predictions about the identity of stimuli that would subsequently be presented, suggesting a role for neurons in this region in encoding predictive information beyond value. Unsigned error signals were found in dorsomedial prefrontal areas and hippocampus, potentially supporting learning of non-value related outcome features. Our findings implicate distinct human prefrontal and medial temporal neuronal populations in mediating predictive associations which could partially support model-based mechanisms during Pavlovian conditioning.
Significance statement: Pavlovian conditioning is a fundamental form of learning, allowing organisms to associate stimuli and outcomes. Recent Pavlovian work suggests that phenomena such as devaluation sensitivity and sensory preconditioning can be explained by a model-based learning framework. How human neurons perform model-based learning during Pavlovian conditioning is still an open question. We recorded single neurons from epilepsy patients during a two-step Pavlovian conditioning task and found that ventromedial prefrontal neurons encoded expected rewards along with amygdala neurons, but also predicted the identity of upcoming stimuli as required for model-based cognition. Additionally, medial frontal neurons were found to encode error signals that could be used for stimulus-outcome learning. This is the first study mapping model-based computations during Pavlovian conditioning in human neurons.
Main Text:

1 Introduction

In Pavlovian conditioning, an animal learns to make predictions about future affectively significant events such as an appetitive outcome so as to facilitate prospective behavioral adaptations in the form of conditioned responses [1, 2]. Elucidating the nature of the associations that underpin the acquisition of Pavlovian conditioned responses, and of the computational mechanism by which such associations are learned, has long been an active area of research at both behavioral and neural levels [2–6].
Perhaps the most influential theory of Pavlovian conditioning is the Rescorla-Wagner (RW) rule and variants thereof. This class of models assumes that conditioning occurs via a prediction error (PE) that reports on discrepancies between expectations associated with the conditioned stimulus (CS) and the unconditioned stimulus (US) or experienced outcome [7]. This PE is used to update associative weights, driving the degree of elicitation of a conditioned response (CR) by the CS. Such a model can account for a wide variety of conditioning phenomena observed in behavioral experiments such as blocking, over-expectation and conditioned inhibition [8–10]. Compelling empirical support for the role of RW-type PEs in appetitive Pavlovian learning has emerged from the finding that the phasic activity of dopamine neurons in the midbrain resembles a reward-specific prediction error, manifesting many of the specific predictions of the RW model [11–13], while also being causally relevant for driving learning of appetitive Pavlovian conditioned responses [14–16].
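The RW-type update described above can be illustrated with a minimal sketch. It assumes a single CS with one scalar associative weight; the function name, the learning rate `alpha`, and the example outcome sequence are hypothetical choices for illustration and are not taken from this study.

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.0):
    """Track the associative weight V of a single CS across trials.

    outcomes: sequence of US magnitudes (e.g. 1.0 = reward, 0.0 = none).
    alpha:    learning rate (illustrative value, an assumption).
    Returns (weights_before_each_trial, prediction_errors).
    """
    v = v0
    weights, errors = [], []
    for r in outcomes:
        pe = r - v           # prediction error: outcome minus expectation
        weights.append(v)
        errors.append(pe)
        v += alpha * pe      # the PE drives the associative weight update
    return weights, errors


# Example: a CS that is always followed by reward.
weights, errors = rescorla_wagner([1.0] * 50)
```

In this example the prediction error shrinks on every trial as the weight approaches the reward magnitude, which is the basic acquisition pattern the RW rule predicts.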
However, the RW model has long been known to fail to account for numerous phenomena observed in Pavlovian conditioning, such as sensory preconditioning [17, 18] and resistance of inhibitory stimuli to extinction [19]. Moreover, many Pavlovian conditioned responses are known to be devaluation or revaluation sensitive: changes in the value of an unconditioned stimulus can produce immediate changes in the conditioned responses elicited by the corresponding CS [20], which is inconsistent with the trial-and-error updating of conditioned responses learned via reward PEs. To underpin such behavioral phenomena, other forms of associations have been suggested to be formed in the brain, including stimulus-stimulus associations [1, 21]. These types of stimulus-stimulus associations can be viewed as one of the constituent parts of "model-based" Pavlovian learning, in which features about an outcome such as predictions about its identity are used to drive on-line updating about value, analogous to the model-based algorithms thought to be important in instrumental conditioning [22–25].
Consistent with this richer associative framework, accumulating evidence suggests that the brain encodes associative predictions about stimuli beyond just value. In particular, activity in orbitofrontal cortex (OFC) neurons in rodents has been found to signal changes in stimulus features such as stimulus identity [26]. fMRI studies in humans have also found evidence for encoding of predictive outcome identity representations during reward learning in OFC and anterior dorsal striatum [24, 25, 27, 28].
Learning about stimulus features other than value is theoretically proposed to be mediated by a kind of prediction error signal known as the state prediction error or identity prediction error [23, 24, 28–31]. This type of error signal is distinct from reward prediction error, and encodes discrepancies between an agent’s expectation about which stimulus will occur and which stimulus actually occurs, irrespective of the amount of reward or value involved. Such error signals have been found to be present in many brain areas in human fMRI studies, including a frontoparietal network encompassing dorsomedial prefrontal cortex, as well as in dopaminergic regions of the midbrain, the OFC and striatum [23, 30, 32–35]. In rodents, dopamine neurons themselves have been suggested to be sensitive to violations in predicted identity beyond value, and are even found to be necessary for the acquisition of learned predictions about stimulus identity [29, 36].
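One common way to formalize a state prediction error in the model-based learning literature is sketched below. The notation is assumed here for illustration: $T(s, s')$ is the estimated probability that stimulus $s$ is followed by stimulus $s'$, and $\eta$ is a learning rate. The exact model used in this study is the one described in Materials and Methods, which may differ.

```latex
\begin{align}
\delta_{\mathrm{SPE}} &= 1 - T(s, s') \\
T(s, s') &\leftarrow T(s, s') + \eta\,\delta_{\mathrm{SPE}} \\
T(s, s'') &\leftarrow (1 - \eta)\,T(s, s'') \quad \text{for all } s'' \neq s'
\end{align}
```

Under this formulation the error is large whenever an unlikely stimulus occurs, regardless of how much reward follows, and the update keeps each row of $T$ summing to one.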
The goal of the present study was to investigate neural representations of multiple forms of Pavlovian associative learning at the cellular level in humans. A key limitation of previous studies on the neural mechanisms of Pavlovian conditioning in humans has been their reliance on BOLD fMRI. While this technique can provide a bird’s eye view of the neural representations present in different brain regions, its limited spatial and temporal resolution precludes insight into the underlying neuronal representations and computations taking place during Pavlovian learning. Probing activity at the level of single neurons is of fundamental importance both for understanding how Pavlovian conditioning is implemented in the human brain, and for gaining insights at the cellular level that can be compared to the large body of single-neuron work done in animal model systems.
To accomplish this goal, we conducted single neuron recordings in patients undergoing intracranial monitoring as part of the neurosurgical treatment for refractory epilepsy. Patients performed a Pavlovian learning task in which pairs of visual stimuli (fractals) were presented in a sequence that was associated with either the subsequent delivery of a reward outcome (a picture of a food item that could subsequently be consumed) or a non-reward (no outcome). Across blocks, the particular stimuli associated with each outcome were changed to dissociate predictive coding related to the identity of a stimulus from coding of the predicted reward-value of the outcome associated with that stimulus. Pairs of stimuli were presented in a sequence to test whether neurons encoded predictions about the identity of the stimulus expected to occur next in that sequence. This design allowed for differentiating between neurons encoding stimulus-stimulus relationships and stimulus-reward associations. In addition, this paradigm afforded the opportunity to assess whether there exist neurons that encode prediction error signals beyond the canonical reward prediction error, which could potentially play a role in model-based learning.
We obtained simultaneous recordings from neurons in the ventromedial prefrontal cortex (including neurons on the medial orbital surface) as well as the amygdala, two brain areas that have long been suggested to play a role in value-based learning and Pavlovian learning in particular [25, 37–46]. We also obtained recordings from the hippocampus, a brain region that has been previously implicated in model-based inference [28, 47, 48], as well as from the anterior cingulate cortex and pre-supplementary motor cortex (preSMA) in dorsomedial frontal cortex, a region that has been previously found to prominently encode error signals [49–53].
We hypothesized that neurons in ventromedial prefrontal cortex and amygdala might play a role in encoding predictive information about stimulus identity, consistent with a role for these structures in stimulus-stimulus learning, and in model-based inference more generally [18, 24, 25, 28]. In addition to testing for neurons correlating with stimulus identity, we also tested for neurons encoding stimulus-value associations. Furthermore, we tested whether neural correlates of state prediction errors that could underpin learning of stimulus-stimulus associations in a model-based learning mechanism for Pavlovian conditioning could be found at the single neuron level in any of the recorded brain structures. We then sought to determine whether such neuronal signals could be clearly dissociated from reward prediction errors or outcome signals. Finally, the fact that we could record from multiple structures simultaneously provided us with the opportunity to gain insight into the network-level interactions between neurons in these structures as a function of learning.
2 Results

2.1 Behavioral evidence of Pavlovian conditioning

We recorded 165 amygdala (AMY), 119 hippocampus (HIP), 86 ventromedial prefrontal cortex (vmPFC), 137 preSMA, and 103 dorsal anterior cingulate cortex (dACC) single neurons (610 total) in 13 sessions from 12 patients implanted with hybrid macro/micro electrodes for epilepsy monitoring (Fig. 1C). Patients performed a sequential Pavlovian conditioning task (Fig. 1A) with two conditioned stimuli in the form of fractal images: distal (CSd), followed by proximal (CSp). Conditioned stimuli were then followed by an outcome, which could be rewarding or neutral [25, 54]. Outcomes were delivered in the form of videos, either of a hand depositing a piece of candy in a bag, or of an empty hand approaching a bag. Patients were told that every display of the rewarding video contributed partially to the amount of real candy they would be given after the end of the session. Patients were asked to pay attention to CS identities as they would be predictive of rewards. To verify attention, subjects were asked to indicate by button press on which side of the screen the CSp stimulus was shown, but were informed that their accuracy or speed of doing so did not influence the outcome of the trial in any way. In each of the four blocks, one CSd/CSp pair was more likely to be associated with the reward, and was therefore defined as CSd+/CSp+ (see Materials and Methods for task details), according to a common/rare transition structure (Fig. 1D). Conversely, the other CSd/CSp pair was more likely to precede the neutral outcome, and is referred to as CSdn/CSpn.
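The common/rare transition structure can be made concrete with a short sketch. The trial-type frequencies are those shown in Fig. 1D; everything else is an assumption for illustration, including the function names, the block length, and the choice to sample trials independently (the actual task may have used fixed trial counts per block).

```python
import random

# Trial-type frequencies as shown in Fig. 1D: (CSd, CSp) -> probability.
TRIAL_TYPES = {
    ("CSd+", "CSp+"): 0.375,  # common transition
    ("CSdn", "CSpn"): 0.375,  # common transition
    ("CSd+", "CSpn"): 0.125,  # rare transition
    ("CSdn", "CSp+"): 0.125,  # rare transition
}


def sample_block(n_trials, seed=0):
    """Draw a sequence of (CSd, CSp) pairs for one block (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = list(TRIAL_TYPES)
    probs = [TRIAL_TYPES[p] for p in pairs]
    return rng.choices(pairs, weights=probs, k=n_trials)


def p_common(csd):
    """Probability of the common CSp given a CSd, implied by the frequencies."""
    total = sum(p for (d, _), p in TRIAL_TYPES.items() if d == csd)
    common = TRIAL_TYPES[(csd, "CSp+" if csd == "CSd+" else "CSpn")]
    return common / total


block = sample_block(40)            # 40 trials is an arbitrary example length
common_prob = p_common("CSd+")
```

With these frequencies, each distal stimulus is followed by its common proximal stimulus on three out of four trials on average.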
To characterize value learning and stimulus predictions in this task, we fit a normative model-based transition matrix model (see Materials and Methods for model details) to infer expected values (EV), state prediction errors (SPE) and transition probabilities on a trial-by-trial basis, for each session (Fig. 1E). With these transition probabilities, we could also estimate which CSp was most likely to follow a CSd in each trial, which we refer to as CSp presumed identity. To infer whether Pavlovian conditioning occurred across patients, we correlated the obtained model covariates with behavioral metrics indicative of conditioning: stimulus ratings and pupil dilation.
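To illustrate how trial-by-trial covariates of this kind can be derived, here is a minimal sketch of a transition-matrix learner. It is not the model fitted in this study (see Materials and Methods for that); the update rules, the single learning rate `eta`, the uniform initialization, and all names are assumptions made for illustration.

```python
def run_transition_model(trials, eta=0.2):
    """Trial-by-trial estimates from a simple transition-matrix learner.

    trials: list of (csd, csp, reward) tuples, with reward in {0, 1}.
    eta:    learning rate (illustrative value, an assumption).
    """
    csds = sorted({t[0] for t in trials})
    csps = sorted({t[1] for t in trials})
    # Uniform initial transition estimates T[csd][csp].
    T = {d: {p: 1.0 / len(csps) for p in csps} for d in csds}
    # Running reward estimate for each proximal stimulus.
    V = {p: 0.0 for p in csps}
    out = []
    for csd, csp, reward in trials:
        presumed = max(csps, key=lambda p: T[csd][p])  # most likely next CSp
        ev_csd = sum(T[csd][p] * V[p] for p in csps)   # expected value at CSd
        spe = 1.0 - T[csd][csp]                        # state prediction error
        out.append({"presumed_csp": presumed, "ev_csd": ev_csd,
                    "ev_csp": V[csp], "spe": spe})
        for p in csps:                                 # update the CSd row of T
            if p == csp:
                T[csd][p] += eta * spe
            else:
                T[csd][p] *= 1.0 - eta
        V[csp] += eta * (reward - V[csp])              # update reward estimate
    return out


# Example: ten common A->X (rewarded) and B->Y (neutral) trials, then a rare A->Y.
trials = [("A", "X", 1), ("B", "Y", 0)] * 10 + [("A", "Y", 0)]
estimates = run_transition_model(trials)
```

On the final, rare trial the learner still presumes X after A, so the state prediction error is high even though no reward-related quantity is involved in computing it.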
Subjective preference ratings for all fractal images were obtained at the beginning of the task. Additionally, between blocks, we asked patients to re-rate the fractals that were included in the previous block to obtain measures of changes in subjective preference as a function of patients’ experience in the task (Fig. 1B). When grouping CSd and CSp together, we observed that stimuli used as CS+ were rated significantly higher than stimuli used as CSn (p = 0.02, one-sided t-test, Fig. 1F). We also tested whether ratings of distal and proximal stimuli changed by different amounts by contrasting absolute rating changes for (CSd+, CSdn) versus (CSp+, CSpn), and found no significant difference for distal vs. proximal stimuli (p = 0.16, two-tailed t-test). Finally, we found that initial ratings for stimuli which would be used as CSd+ and CSdn did not significantly differ (p = 0.91, t-test), further indicating that stimulus rating changes were a consequence of experiencing the task.
We performed eye tracking simultaneously with single-neuron recordings to further assess the success of conditioning. Pupil diameter was analyzed in two distinct time windows: during CSd presentation and CSp presentation (see Materials and Methods for details on pupil analysis). We obtained the average pupil diameter change within these periods, relative to a baseline, and tested whether it correlated with model covariates using a linear mixed effects model, with session number as a random effect. Specifically, the model for pupil diameter during CSd presentation included the EV of the CSd, while the model for pupil diameter during CSp presentation included CSp EV, SPE, and an interaction term between CSp EV and SPE. We found no significant effect of CSd EV in the first model (p = 0.30). In the second model (Fig. 1E), there was no significant effect of CSp EV (p = 0.07) or SPE (p = 0.06), but there was a significant interaction between CSp EV and SPE (p = 0.02). This result indicates that pupil diameter correlated with a combination of computational factors inferred from the model-based framework. A similar interaction was previously observed in a Pavlovian conditioning paradigm performed in a neurotypical population [54].
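The two pupil models described above can be written out as follows. This is a sketch under stated assumptions: $i$ indexes trials and $j$ indexes sessions, the session random effect is taken to be a random intercept ($u_j$, $v_j$), and the coefficient symbols are notation introduced here rather than in the original analysis.

```latex
\begin{align}
\Delta\mathrm{pupil}^{\mathrm{CSd}}_{ij} &= \beta_0 + \beta_1\,\mathrm{EV}^{\mathrm{CSd}}_{ij} + u_j + \varepsilon_{ij} \\
\Delta\mathrm{pupil}^{\mathrm{CSp}}_{ij} &= \gamma_0 + \gamma_1\,\mathrm{EV}^{\mathrm{CSp}}_{ij} + \gamma_2\,\mathrm{SPE}_{ij}
  + \gamma_3\,\bigl(\mathrm{EV}^{\mathrm{CSp}}_{ij} \times \mathrm{SPE}_{ij}\bigr) + v_j + \varepsilon'_{ij}
\end{align}
```

In this notation, the reported interaction corresponds to a test of $\gamma_3 \neq 0$, while $\beta_1$, $\gamma_1$ and $\gamma_2$ are the main effects that did not reach significance.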
Overall, the changes in subjective stimulus ratings and the pupil sensitivity to an interaction of EV and SPE together provide evidence of Pavlovian conditioning across our subject sample.
[Figure 1, panels a–f. (a) Trial timeline: fixation, distal CS, fixation, proximal CS, outcome, with durations of 1 s and 3 s. (b) Rating scale from "Extremely unpleasant" to "Extremely pleasant". (c) Electrode locations for vmPFC, HIP, AMY, dACC and preSMA, plotted as MNI y coordinate (mm) versus MNI z coordinate (mm). (d) Trial types (CSd, CSp, outcome) and their frequencies: CSd+/CSp+ 37.5%, CSdn/CSpn 37.5%, CSd+/CSpn 12.5%, CSdn/CSp+ 12.5%. (e) Pupil diameter at CSp (% pupil diameter change) by trial type; EV x SPE interaction: p = 0.03. (f) Mean rating change (change in z-scored ratings) for CS+ versus CSn.]
Figure 1: Pavlovian conditioning task and behavior. (a) Trial structure. After a fixation period, patients saw
a sequence of two conditioned stimuli, distal and proximal, with a 1s fixation period in between them. Then,
outcomes were presented: for positive outcomes, a video of a hand depositing a piece of candy in a bag;
for neutral outcomes, an empty hand approaching a bag. (b) Rating screen. Before the task and between
blocks, patients used a sliding bar to rate fractal stimuli according to their subjective preference, from extremely
unpleasant to extremely pleasant. (c) Electrode positioning. Each dot indicates the location of a microwire
bundle in amygdala (magenta), hippocampus (yellow), preSMA (red), dACC (blue), or vmPFC (green). (d)
Trial types. Stimuli transitioned from distal to proximal according to a common/rare probabilistic structure.
The same 2 fractals were used as CSp throughout the entire task, while new fractals were picked as CSd in
every block. (e) Pupil diameter change at CSp. We compared pupil diameter during CSp presentation with
a baseline period, for each trial type. Error bars represent SEM. (f) Changes in stimulus ratings. After each
block, patients rated stimuli for their subjective preference. We compared how ratings changed for each fractal
compared to its previous value, depending on whether they were a positive or neutral CS in that block.
2.2 Predictive identity coding in vmPFC

We next investigated whether firing rates in individual neurons correlated with task variables and the estimated computational model-derived covariates using a Poisson GLM analysis (see Materials and Methods for details). We obtained spike counts for each neuron in the time windows that were relevant for each regressed variable (e.g. counting spikes during outcome presentation for regressing outcomes). After obtaining the number of neurons significant for a given criterion, we tested whether the number of significant neurons in each brain region was more than expected by chance with a binomial test (Bonferroni corrected for the number of tested brain areas).
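The region-level test described above can be sketched as follows. This is a minimal illustration, assuming a one-sided binomial test against a chance rate equal to the per-neuron significance threshold (taken here as 0.05) and a Bonferroni factor of five for the five recorded areas; the function names and the example counts are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided binomial test p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_test(n_significant, n_neurons, chance=0.05, n_regions=5):
    """Bonferroni-corrected p-value for the count of selective neurons.

    chance:    assumed false-positive rate per neuron (an assumption).
    n_regions: number of brain areas tested (five areas were recorded).
    """
    p_raw = binomial_sf(n_significant, n_neurons, chance)
    return min(1.0, p_raw * n_regions)


# Hypothetical example: 12 selective neurons out of 100 recorded in one area.
p_corrected = region_test(12, 100)
```

Multiplying the raw p-value by the number of areas (capped at 1) is equivalent to comparing the raw p-value against a threshold divided by the number of areas.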
We first regressed the CSp presumed identity (i.e. the most likely proximal stimulus identity) with the firing rate of neurons at the time of CSd presentation (Fig. 2A) to test whether the most likely identity of the next presented stimulus was already encoded by neurons at distal time. 11.5% of vmPFC neurons encoded CSp presumed identity (p < 0.05, binomial test), indicating that vmPFC neurons represent predicted stimulus identity in a stimulus-stimulus association context. On the other hand, we found that no brain areas significantly encoded the actual identity of the proximal stimulus at the time of proximal stimulus presentation (p > 0.05 for all regions, binomial test).
For vmPFC neurons whose response signaled predictive identity coding, we summarized their normalized firing rates over time, separated by trials containing their preferred versus non-preferred identity (Fig. 2B). For each time point, we tested whether preferred versus non-preferred firing rates were different, and found that they first differed 0.73 s after CSd onset (p < 0.05, t-test, uncorrected). An example neuron performing this type of encoding is shown in Fig. 2C. We also regressed the EV of the presumed CSp at distal time and the actual CSp identity at proximal time with firing rate but did not find a significant neuron count in any region (p > 0.05 for all regions, binomial test).
We next turned to population level analysis to investigate whether the joint activity patterns from all neurons recorded simultaneously in a given brain area were predictive of the variables of interest. All population level decoding analysis throughout this paper is performed session-by-session, only utilizing neurons that were recorded simultaneously. We performed cross-validated population decoding analysis with a linear SVM, obtaining significance levels with a bootstrapped null distribution (see Materials and Methods for details). We found that CSp presumed identity could be significantly decoded at distal time in vmPFC (p < 0.01, permutation test), consistent with the single neuron selection result discussed above (Fig. 2D). These results further establish vmPFC neural activity as a substrate for predictive coding during learning of stimulus-stimulus associations in Pavlovian conditioning.
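The logic of testing cross-validated decoding accuracy against a null distribution can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier stands in for the linear SVM used in the paper, and the null is built by shuffling labels; the fold scheme, the number of permutations, the synthetic data, and all names are assumptions for illustration only.

```python
import random


def centroid(rows):
    """Mean spike-count vector across trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two population vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cv_accuracy(X, y, n_folds=5):
    """k-fold cross-validated accuracy of a nearest-centroid decoder."""
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train = [(X[i], y[i]) for i in range(len(X)) if i not in test_idx]
        cents = {lab: centroid([x for x, l in train if l == lab])
                 for lab in sorted({l for _, l in train})}
        for i in test_idx:
            pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
            correct += pred == y[i]
    return correct / len(X)


def permutation_p(X, y, n_perm=200, seed=0):
    """p-value of the observed accuracy against a label-shuffled null."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        exceed += cv_accuracy(X, labels) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


# Synthetic example: 40 trials, 4 "neurons", two well-separated classes.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(3.0 * lab, 1.0) for _ in range(4)] for lab in y]
acc, p_value = permutation_p(X, y)
```

Adding one to both the numerator and the denominator keeps the estimated p-value strictly above zero, so its smallest possible value is set by the number of permutations.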
2.3 Medial frontal cortex and hippocampal neurons encode Pavlovian unsigned error signals

At the population level, unsigned prediction errors (uPEs, correlating with the state prediction error regressor generated by the model-based learning agent) were decodable at the time of outcome in both dACC and preSMA (both p < 0.01, permutation test, Fig. 3A). Additionally, in the encoding analysis of individual neurons, a significant number of neurons coding for uPEs at outcome was present in dACC (12.5%, p < 0.01, binomial test) and preSMA (12.3%, p < 0.001, binomial test), but also in hippocampus (10.2%, p < 0.05, binomial test, Fig. 3B). To determine whether neurons responding to these uPE signals might be better explained by a signed PE signal, we also included a regressor corresponding to the value of the outcome minus the expected value at outcome time, i.e. a signed PE signal. When including this regressor in the same neural analysis from which the uPEs were found, we found no significant evidence for neural encoding of signed PEs in either ACC or preSMA (or elsewhere) (Fig. 3C). Qualitatively similar results were obtained when the signed PEs were generated via a standard temporal difference model (not shown). We summarized the normalized firing rates for uPE coding neurons in dACC, preSMA and hippocampus (Fig. 3E) and determined that the first times at which preferred vs. non-preferred uPE activity differed in each region were 0.71 s, 0.43 s and 0.45 s, respectively (p < 0.05, t-test). Additionally, at the time of the outcome, we also found a significant proportion of neurons correlated with the outcome itself (reward vs. neutral) in dACC, even when correcting for prediction errors