of 38
Research Articles | Behavioral/Cognitive
Encoding of predictive associations in human prefrontal
and medial temporal neurons during Pavlovian
appetitive conditioning
https://doi.org/10.1523/JNEUROSCI.1628-23.2024
Received: 28 August 2023
Revised: 29 January 2024
Accepted: 19 February 2024
Copyright © 2024 the authors
This Early Release article has been peer reviewed and accepted, but has not been through
the composition and copyediting processes.The final version may differ slightly in style or
formatting and will contain links to any extended data.
Alerts:
Sign up at www.jneurosci.org/alerts to receive customized email alerts when the fully
formatted version of this article is published.
Encoding of predictive associations in human prefrontal and
medial temporal neurons during Pavlovian appetitive
conditioning
Pavlovian associations in human frontal neurons
Tomas G. Aquino
1
, Hristos Courellis
2
, Adam N. Mamelak
3
,
Ueli Rutishauser
3
,
4
, John P. O’Doherty
4
,
5
1
Biomedical Engineering, Columbia University, 10027, USA
2
Biological Engineering, Division of Biology and Biological Engineering,
California Institute of Technology, 91125, USA
3
Department of Neurosurgery, Cedars-Sinai Medical Center, 90048, USA
4
Computation and Neural Systems, Division of Biology and Biological Engineering,
California Institute of Technology, 91125, USA
5
Division of Humanities and Social Sciences,
California Institute of Technology, 91125, USA
Joint senior authors
To whom correspondence should be addressed; E-mail: tg2863@columbia.edu.
Manuscript structure
Number of pages: 37; Number of figures: 7; Number of tables: 4; Number of words in abstract: 136;
Number of words in introduction: 635; Number of words in discussion: 1433
Acknowledgements
We would like to thank the members of the O’Doherty and Rutishauser labs for discussions and
feedback. We thank all subjects and their families for their participation, as well as nurses and medical
staff for their work. This work was supported by National Institutes of Health Grants R01DA040011
and R01MH111425 (to J.P.O.), R01MH110831 and U01NS117839 (to U.R.), and P50MH094258 (to
J.P.O. and U.R.). The funders had no role in study design, data collection and analysis, decision to
publish or preparation of the manuscript.
Conflicts of interest
The authors state no conflicts of interest.
1
JNeurosci Accepted Manuscript
Abstract
Pavlovian conditioning is thought to involve the formation of learned associations between stimuli and
values, and between stimuli and specific features of outcomes. Here we leveraged human single neuron
recordings in ventromedial prefrontal, dorsomedial frontal, hippocampus and amygdala while patients
of both sexes performed an appetitive Pavlovian conditioning task probing both stimulus-value and
stimulus-stimulus associations. Ventromedial prefrontal cortex encoded predictive value along with
the amygdala, but also encoded predictions about the identity of stimuli that would subsequently
be presented, suggesting a role for neurons in this region in encoding predictive information beyond
value. Unsigned error signals were found in dorsomedial frontal areas and hippocampus, potentially
supporting learning of non-value related outcome features. Our findings implicate distinct human
prefrontal and medial temporal neuronal populations in mediating predictive associations which could
partially support model-based mechanisms during Pavlovian conditioning.
Significance statement
In this study, epilepsy patients implanted intracranially with depth microelectrodes performed a
Pavlovian conditioning task. We measured single neuron activity in ventromedial prefrontal cor-
tex (vmPFC), amygdala, and hippocampus, and found a representation of predictive identity coding
and Pavlovian expected reward values. Additionally, unsigned error signals were found in dorsomedial
frontal areas as well as hippocampus. This study thus provides a rare glimpse into the role of frontal
human neurons in predictive associative learning of values and stimulus identities.
Introduction
In Pavlovian conditioning, animals learn to predict outcomes and adapt their behavior in the form of
conditioned responses (Pavlov, 1927; Rescorla, 1988). Understanding the associations that underpin
the acquisition of Pavlovian conditioned responses has been an active area of research, neurally and
behaviorally (Rescorla, 1988; Schultz, 2006; Dayan and Berridge, 2014; Fanselow and Wassum, 2016;
O’Doherty et al., 2017).
An influential theory of Pavlovian conditioning is the Rescorla-Wagner (RW) rule and variants
2
JNeurosci Accepted Manuscript
thereof. These models assume that conditioning occurs via a prediction error signal (PE), indicating
discrepancies between expectations associated with the conditioned stimulus (CS), and the uncon-
ditioned stimulus (US) (Rescorla and Wagner, 1972). These models can account for a variety of
phenomena observed in conditioning experiments, such as blocking, over-expectation and conditioned
inhibition (Kamin, 1967; Rescorla, 1969; Lattal and Nakajima, 1998). Compelling empirical support
for the role of RW-type PEs in appetitive Pavlovian learning has emerged from the finding that the
phasic activity of dopamine neurons in the midbrain resembles a reward-specific prediction error,
manifesting many of the specific predictions of the RW model (Schultz et al., 1997; Tobler et al., 2003,
2006).
However, the RW model has long been known to fail to account for numerous phenomena ob-
served in Pavlovian conditioning, such as sensory preconditioning (Rescorla, 1980; Jones et al., 2012),
and revaluation sensitivity (Robinson and Berridge, 2013). To explain such behavioral phenomena,
other forms of associations have been suggested, including stimulus-stimulus associations (Pavlov,
1927; Rescorla, 1982). These associations act as "model-based" Pavlovian learning, in which features
about an outcome, such as predictions about its identity, are used to update values, analogous to
the model-based algorithms utilized in instrumental conditioning (Daw et al., 2005; Gläscher et al.,
2010; Howard et al., 2015; Pauli et al., 2019). Broadly speaking, model-based learning is a class of
reinforcement learning algorithms in which a probabilistic transition structure between environmen-
tal states underpins decisions (Sutton and Barto, 2018), which is hypothesized to support Pavlovian
conditioning (Dayan and Berridge, 2014).
Learning about stimulus features other than value is theoretically proposed to be mediated by a
kind of prediction error signal known as the state prediction error or identity prediction error (Gläscher
et al., 2010; Howard et al., 2015; Boorman et al., 2016; Takahashi et al., 2017; Howard and Kahnt,
2018; Boorman et al., 2021). This type of error encodes discrepancies between an agent’s expectation
about which stimulus will occur and which stimulus actually occurs, irrespective of the amount of
reward involved. Such error signals have been found to be present in many brain areas in human
fMRI studies, including a frontoparietal network encompassing dorsomedial prefrontal cortex, as well
as in dopaminergic regions of the midbrain, the OFC and striatum (Gläscher et al., 2010; Rushworth
et al., 2011; Lee et al., 2014; O’Doherty et al., 2015; Howard and Kahnt, 2018; Witkowski et al., 2022).
3
JNeurosci Accepted Manuscript
The goal of the present study was to investigate neural representations of Pavlovian associative
learning at the cellular level in humans, leveraging high-resolution single neuron recordings. A funda-
mental gap in previous studies is understanding how model-based learning utilizes stimulus-stimulus
associations in the context of Pavlovian learning, beyond instrumental conditioning. To accomplish
this goal, we asked subjects to perform a sequential appetitive Pavlovian learning task in which visual
stimuli and outcomes were paired probabilistically. The sequential task design enabled the identifica-
tion of model-based processes such as the encoding of stimulus-stimulus associations.
We hypothesized that neurons in ventromedial prefrontal cortex and amygdala might play a role in
encoding predictive information about stimulus identity, consistent with a role for these structures in
stimulus-stimulus learning, and in model-based inference more generally (Jones et al., 2012; Howard
et al., 2015; Boorman et al., 2016; Pauli et al., 2019). Furthermore, we tested whether neural correlates
of prediction errors that could underpin learning of stimulus-stimulus associations could be found at
the single neuron level.
Materials and Methods
Electrophysiology and recording
We used Behnke-Fried hybrid depth electrodes (AdTech Medical), positioned exclusively according to
clinical criteria. Broadband extracellular bipolar recordings were performed with a sampling rate of
32 kHz and a standard bandpass of 0.1-9000Hz (ATLAS system, Neuralynx Inc.).
We performed electrode localization with pre- and post- operative MRI scans. We utilized the
same pipeline detailed elsewhere (Kyzar et al., 2024), briefly described here. Pre- and post- operative
scans were aligned with Freesurfer (Reuter et al., 2010) then a forward mapping from scans to the
CIT168 template brain map (Tyszka and Pauli, 2016) was computed with the ANTs suite (Avants
et al., 2008), resulting in a projection of post-operative scans onto standardized MNI152-registered
space. Electrode locations were then visualized and mapped in this space with Freesurfer.
The data set reported here was obtained bilaterally from hippocampus (HIP), amygdala (AMY),
ventromedial prefrontal cortex (vmPFC), dorsal anterior cingulate cortex (dACC), and pre-supplementary
motor area (preSMA) with one macroelectrode on each side. Each macroelectrode contained eight 40
4
JNeurosci Accepted Manuscript
μ
m diameter microwires. To correct for noise and motion artifacts, recordings were locally referenced,
with one of the eight microwires serving as a reference in every bundle, by subtracting its activity
from all the other wires in the same bundle. No units were recorded on the reference wire.
Ventromedial prefrontal cortex (including neurons on the medial orbital surface), as well as the
amygdala, are two brain areas that have long been suggested to play a role in value-based learning
and Pavlovian learning in particular (Schoenbaum et al., 1998; Gottfried et al., 2003; Holland and
Gallagher, 2004; Hampton et al., 2006; Ostlund and Balleine, 2007; Kennerley and Wallis, 2009;
Salzman and Fusi, 2010; Wang et al., 2017; Rudebeck et al., 2017; Pauli et al., 2019; Paton et al., 2006).
On the other hand, the hippocampus is a brain region that has been previously implicated in model-
based inference (Wimmer and Shohamy, 2012; Boorman et al., 2016; Miller et al., 2017). Additionally,
the anterior cingulate cortex and pre-supplementary motor cortex (preSMA) in dorsomedial frontal
cortex, are regions that have been previously found to prominently encode error signals (Bonini et al.,
2014; Fu et al., 2019; Sajad et al., 2019; Fu et al., 2022; Aquino et al., 2023).
Patients
Twelve patients (eight females) were implanted with depth electrodes for seizure monitoring prior to
potential surgery for treatment of drug resistant epilepsy. One of the patients performed the task
twice, totalling 13 recorded sessions. Human research experimental protocols were approved by the
Institutional Review Boards of the California Institute of Technology and the Cedars-Sinai Medical
Center. Patients provided informed consent and were reimbursed with a fifty dollar gift card at the end
of their participation period. Electrode location was determined based on preoperative structural MRI
and postoperative MRI and/or CT scans obtained for each patient as described previously (Minxha
et al., 2020). Patient information (sex and age) is displayed in Table 2.
Experimental Design
Patients performed a sequential Pavlovian conditioning task (Pauli et al., 2015) in which two fractals
were presented in sequence, acting as distal and proximal conditioned stimuli (CSd, CSp, respectively).
Following the pair of conditioned stimuli (CSd and CSp), a video outcome was presented, either in the
form of a hand depositing the patient’s candy of choice into a paper bag, for a positive outcome, or in
5
JNeurosci Accepted Manuscript
the form of an empty hand approaching a paper bag, for a neutral outcome. Patients were informed
that each time they received a positive outcome contributed to a grand total of actual candy pieces
they would receive at the end of the experiment. Every patient experienced exactly 48 win trials
and 48 no-win trials. Prior to the task, patients visually inspected and chose one among five possible
candy options (Reese’s Pieces, Hershey’s Kisses, York Peppermint, Werther’s Caramel, or Hi-Chew)
to be delivered, according to their preference. After the end of the session, they received the closest
possible equivalent to 200 calories in their candy of choice, to reward them for the 48 win trials they
experienced.
Trial structure is detailed in Fig. 1A. After a 1s fixation period, a distal CS was presented for 3s,
followed by a 1s fixation period. Then, a proximal CS was displayed for 3s, followed by an outcome
presentation video edited to a length of 3s. Intertrial intervals were jittered between 0.5s-1s. The
distal CS was always presented along the vertical axis, either above or below the center of the screen,
at a random position. Similarly, the proximal CS was always presented in a random position along
the horizontal axis, inside one of two possible squares positioned to the left or the right of the screen.
There was no relationship between CSd identity and display location and display positioning was
randomly chosen for each trial. As an attention check, patients were asked to perform a button press
whenever they saw a CSp inside one of the squares, reporting which of the two squares it was. To
ensure this remained a purely Pavlovian paradigm, patients were instructed that these button presses
did not affect the outcome of the trial in any way. Button presses were collected with a CEDRUS
button box.
Each session contained a total of 96 trials, split into 4 blocks of 24 trials, with a two minute break
between blocks. Within each block, there were 4 possible CS, 2 distal (A,B) and 2 proximal (X,Y).
The identity of the distal stimuli was re-selected in every block, but the two proximal stimuli were kept
the same throughout the entire session, though their valences were reversed between blocks. There
were four trial types in total (Fig. 1C): two common transitions (A-X-Reward and B-Y-No reward)
and two rare transitions (A-Y-No reward and B-X-Reward), meaning that the CSp was always fully
predictive of the outcome, while the CSd was only partially predictive. Trial frequencies were
37
.
5%
for each of the common types and
12
.
5%
for each of the rare types. Every block always had exactly
9 trials of each common type and 3 trials of each rare type. Trial type order was selected randomly,
6
JNeurosci Accepted Manuscript
except for the first 4 trials of each block, which we enforced to be of the common type, selected
randomly.
The 10 fractals used in each session (2 CSd fractals for each block and 2 CSp fractals for the
whole session) were chosen specifically for each patient out of a set of 24 possible fractals using the
following procedure: before the beginning of the task, patients rated all 24 fractals for their subjective
preference using a sliding bar from extremely unpleasant to extremely pleasant. The 10 stimuli which
elicited the most neutral responses were selected for the experiment and assigned randomly to their
CSd or CSp roles. All the 24 fractals had the same mean luminance. We enforced this by enforcing
equal average RGB values across pixels for all fractals, along the (0,0,0) to (255,255,255) spectrum.
At the end of every block, patients rated the 4 fractals they experienced for how pleasant they were.
We z-scored ratings for each patient and used aggregate stimulus rating changes across patients as a
metric of Pavlovian conditioning.
Eye tracking
We tracked patients’ pupil diameter during the task as a candidate conditioning metric (Pauli et al.,
2015, 2019; Pool et al., 2019). We used an EyeLink 1000 system (SR Research Ltd.) attached to the
bottom of the task screen as described in (Minxha et al., 2017; Wang et al., 2019). We calibrated
the system using EyeLink’s five point calibration before the session and between blocks (average
calibration error =
0
.
40
±
0
.
17
). Patients were instructed to fixate during fixation periods but were
free to gaze anywhere in the screen otherwise. Pupil data was preprocessed to remove blinks and
outlier points further than 5 s.d. from the mean diameter. We utilized the proprietary automated
blink detection method provided by EyeLink 1000 (SR Research Ltd.), which finds time points with
missing pupil data and tags the beginning and end of blink events. We interpolated missing values
removed in this way with the closest previous value and then filtered data with a 50ms moving average
window. Pupil diameters were normalized in every trial relative to the initial 1s fixation period, using
the average diameter in that period as a baseline. Statistical analyses were then performed using
average diameter changes in the 0.5s-3s time windows after CSd and CSp presentations.
7
JNeurosci Accepted Manuscript
Computational model of learning
We used a normative model-based model to obtain estimates of how patients encoded transition prob-
abilities between task states, stimulus expected values, and unsigned prediction errors. We adapted a
model used for a sequential instrumental task (Gläscher et al., 2010), to estimate a matrix
T
(
s,s
)
for
the transition probabilities from start states
s
to end states
s
. Given distal states (A,B), proximal
states (X,Y), and binary outcome states (reward,
value
= 1
, R; no reward,
value
= 0
, N), we defined
the transition matrix T with transition probabilities
t
ss
as:
T
=
0 0
t
AX
t
AY
0
0
0 0
t
BX
t
BY
0
0
0 0 0
0
t
XR
t
XN
0 0 0
0
t
Y R
t
Y N
0 0 0
0
0
0
0 0 0
0
0
0
(1)
The
t
values were initialized to 0.5 and the remaining values in the matrix were chosen to be 0,
reflecting the constraints of the task’s transition structure. At each step, after transitioning from some
state
s
to an end state
s
, a continuous regressor for unsigned prediction error (uPE) is computed
according to the following equation:
δ
uP E
= 1
T
(
s,s
)
(2)
The transition matrix
T
is then updated using a learning rate
η
, chosen normatively to be 0.22,
adopting a value from a previously validated model-free model (Pauli et al., 2015).
T
new
(
s,s
) =
T
(
s,s
) +
ηδ
uP E
(3)
All the other values in T for states not transitioned into were also updated with
T
new
(
s,s
′′
) =
T
(
s,s
′′
)(1
η
)
to ensure each the sum of each row of T stays equal to 1, as it should reflect total
probability.
Finally, the expected value (EV) of each state could be computed assuming the values of arriving
into the reward and no reward states were 1 and 0, respectively:
8
JNeurosci Accepted Manuscript
EV
A
=
t
AX
t
XR
+
t
AY
t
Y R
(4)
EV
B
=
t
BX
t
XR
+
t
BY
t
Y R
(5)
EV
X
=
t
XR
(6)
EV
Y
=
t
Y R
(7)
Since we were interested in tracking neural responses to stimulus identity associations, we used the
inferred transition matrix T to determine the identity of the most likely proximal stimulus to follow
distal stimulus presentation. We refer to this identity as
CSp presumed identity
. To illustrate the
regressors extracted from the model (EV, uPE), we plotted their time course over all trials in one
example session in Fig. 2.
We chose a normative model-based learning mechanism for analyzing the data because this model
allowed us to estimate the presumed identity of subsequent stimuli in the associative chain, thereby
allowing us to probe stimulus-stimulus mappings in the task. It is important to note that a model-free
temporal difference (TD) model (Sutton, 1988) could be used instead to model this data, and that
such a model generates highly similar time-series for value signals as we used in the behavioral and
neural analyses (though it would not allow us to make predictions about stimulus identities). Using
such a TD-like model, a signed reward prediction error can be estimated, and the absolute value
(unsigned) of this error signal also strongly correlates with uPE signals used in the analysis. While
we implemented such a TD model alongside the MB model and used it to analyze the data, we do not
report on the results of this analysis because it produces highly similar results about value encoding
as reported using the existing MB model, and because this TD model does not allow us to test for
stimulus identity representations.
9
JNeurosci Accepted Manuscript
Neural data pre-processing
We performed spike detection and sorting with the semiautomatic template-matching algorithm OSort,
utilizing the energy detection method, with energy signal thresholds from 4.5 to 5 standard deviation
units. This process is described elsewhere (Rutishauser et al., 2006), and we reproduce some key
components here for brevity. After OSort provided automated spike detection and putative cluster
formation, we manually determined which clusters were valid putative neural units, by considering
the following criteria: absence of contamination by 60Hz power noise in waveform power spectra;
waveform stability in width and amplitude; absence of refractory period violation (less than
3%
of interspike intervals under 3ms). Clusters found by OSort could be manually merged in case of
waveform similarity, as previously described (Rutishauser et al., 2006).
Across all 13 sessions, we obtained 86 vmPFC, 165 amygdala, 119 hippocampus, 137 preSMA and
103 dACC putative single units (610 total). We refer to these isolated putative single units as “neuron”
and “cell” interchangeably.
Single neuron encoding analysis
To quantify how single neuron activity correlated with several variables of interest, we performed a
negative binomial generalized linear model (GLM) analysis. As a dependent variable, we measured
spike counts in three time windows: CSd window, from 0.25s to 3s after CSd presentation; CSp
window, from 0.25s to 3s after CSp presentation; outcome window, from 0.25s to 2.25s after outcome
presentation. Table 1 includes all dependent variables and the time windows in which they were
tested. We defined reward prediction errors (RPEs) as the difference between proximal EV at the
time of outcome and the outcome valence itself.
Dependent variable
Time window
CSp presumed identity, CSd EV
CSd
CSp identity, CSp EV, uPE, button press
CSp
Outcome, uPE, RPE
Outcome
Table 1: Dependent variables and respective time windows for negative binomial GLM and population decoding
analyses.
10
JNeurosci Accepted Manuscript
Population decoding analysis
We performed population decoding analyses by training a linear support vector machine (SVM) with
MATLAB’s function
fitcsvm
for binary classes, or
fitecoc
for more classes. In each session, we defined
a population activity matrix X of dimensions
(
nTrials,nNeurons
)
by counting spikes in each trial
within the same time windows from the encoding analysis. We then defined the decoded variable
y
as
the same dependent variables from Table 1. To reduce the decoding problem to a classification task, we
binned the continuous regressors (EV, uPE) into 3 tertiles before training the SVM with MATLAB’s
multiclass function
fitcecoc
. Cross-validation was performed by training on 2 trial blocks and testing
on the 2 remaining trial blocks of each session. Since there were 4 blocks in total, we repeated the
procedure for every possible combination of train/test blocks, resulting in 6 cross-validation folds. Test
accuracies were averaged across folds and reported separately for each session. To obtain test accuracy
significance levels, we repeated this entire procedure 500 times while shuffling the decoded variable
y
and compare the null mean test accuracy obtained in this manner with the true test accuracy, for
each brain region. Finally, we corrected significance thresholds for the number of tested brain areas.
For this analysis, we excluded one session, which contained only 1 block of trials. For comparisons
between GLM encoding analyses and SVM decoding analyses, we utilized the SVM beta values in
binary classes or the average absolute beta values across multiple binary learners for multiple class
decoding.
Cross-correlation analysis
To measure how neural activity across brain areas of interest was correlated, we performed a spike-
spike cross-correlation analysis, with shuffle correction (Brody, 1999). First, we divided all trials in
EV or uPE tertiles, excluding the middle tertile to obtain low EV or uPE and high EV or uPE trial
groups. Then, for each trial group, we computed cross-correlations for each neuron pair containing
neurons A and B recorded from the same session, but different brain areas (e.g. each neuron pair
contained one vmPFC neuron vs. one amygdala neuron).
For two spike trains
S
r
A
,
S
r
B
(binned at 50ms with 5ms steps, constrained to CSp presentation time
window), recorded from neurons A and B on trial
r
, we define the cross-correlogram of each trial as:
11
JNeurosci Accepted Manuscript
C
r
(
τ
) =
X
t
=
−∞
S
r
A
(
t
)
S
r
B
(
t
+
τ
) :=
S
r
A
S
r
B
(8)
Defining the
<>
operator as averaging across trials, we defined the shuffle-corrected cross-correlogram
as:
V
:=
<
(
S
r
A
< S
r
A
>
)
(
S
r
B
< S
r
B
>
)
>
(9)
V
=
< S
r
A
S
r
B
>
< S
r
A
>
< S
r
B
>
(10)
The shuffle-correction procedure corrects for time-locked co-variation that might be caused to both
neurons concurrently by stimulus presentation. Additionally, it ensures that the expected value of
V
is 0 if
S
A
and
S
B
are independent.
Finally, the reported shuffle-corrected correlation results were obtained by averaging the
V
vectors
across all neuron pairs. To summarize the correlation results, we obtained the integrals from the
positive and negative time lag regions and performed an ANOVA with EV or uPE level (low vs. high)
and time lag sign (which region leads) as factors.
Statistical Analysis
Binomial test
After labeling each neuron as coding for a variable of interest, a way to determine whether neuron
counts in each brain area were larger than expected by chance were binomial tests on the number
of significant neurons, relative to the size of the tested population of each brain area. Concretely,
assuming a false positive rate of
5%
, and a Bernoulli process to generate significant neurons at this
rate, the number of neurons S falsely classified as significant within a population of size N is given by
a binomial distribution:
S
binomial
(
N,
0
.
05)
. Accordingly, we derived a binomial p-value for the
probability of obtaining an observed sensitive neuron count
K
larger than expected by chance, using
the cumulative binomial distribution:
p
= 1
binomialCDF
(
K,N,
0
.
05)
12
JNeurosci Accepted Manuscript
Jaccard index test
After performing negative binomial GLM encoding analyses, we tested whether the sub-populations of
neurons which were sensitive to two variables of interest had significant overlap. For this, we computed
the Jaccard index (Jaccard, 1912) of overlap between neurons sensitive to each of the variables
X
and
Y
, where
N
X
and
N
Y
indicate the number of neurons sensitive to the variables X and Y, respectively,
and
N
X,Y
indicates the number of neurons concurrently sensitive to both variables:
J
=
N
X,Y
N
X
+
N
Y
N
X,Y
(11)
To compute p-values for each comparison between two variables, we bootstrapped a null distribu-
tion of Jaccard indexes using 1000 reshuffles, considering X and Y are independent variables with a
false positive rate of
p
= 0
.
05
.
Results
Behavioral evidence of Pavlovian conditioning
We recorded 87 vmPFC, 166 AMY, 127 HIP, 112 dACC, and 138 preSMA single neurons (630 total)
in 13 sessions from 12 patients implanted with hybrid macro/micro electrodes for epilepsy monitoring
(Fig. 1C). Neuron counts per brain area for each patient are displayed in Table 3. Patients performed
a sequential Pavlovian conditioning task (Fig. 1A) with two conditioned stimuli in the form of fractal
images: distal (CSd), followed by proximal (CSp). Here, proximal and distal denote the temporal
sequence of stimuli, without any spatial connotation. Conditioned stimuli were then followed by an
outcome, which could be rewarding or neutral (Pauli et al., 2015, 2019). Outcomes were delivered
in the form of videos, either of a hand depositing a piece of candy in a bag, or of an empty hand
approaching a bag. Patients were told that every display of the rewarding video contributed partially
to the amount of real candy they would be given after the end of the session. Patients were asked
to pay attention to CS identities as they would be predictive of rewards. Subjects were asked to
indicate by button which side of the screen the CSp stimulus was shown to verify attention, but were
informed that their accuracy or speed of doing so did not influence the outcome of the trial in any
way. In each of the four blocks, a CSd/CSp pair would be more likely associated with the reward,
13
JNeurosci Accepted Manuscript
and were therefore defined as CSd+/CSp+ (see Materials and Methods for task details), according to
a common/rare transition structure (Fig. 1D). Conversely, the other CSd/CSp pair was more likely
to precede the neutral outcome, and are referred to as CSdn/CSpn.
To characterize value learning and stimulus predictions in this task, we fit a normative model-based
transition matrix model (see Materials and Methods for model details) to infer expected values (EV),
unsigned prediction errors (uPE) and transition probabilities on a trial by trial basis, for each session
(Fig. 1E). With these transition probabilities, we could also estimate which CSp was most likely to
follow a CSd in each trial, which we refer to as
CSp presumed identity
. To do so, we utilized the
inferred transition probability matrix to determine which state was most likely to come next in the
trial, given the current state. In each block, six states were possible: two distal stimuli, two proximal
stimuli, and two possible outcomes (reward or neutral). Transitions from distal to proximal stimuli
followed a common-rare transition probability structure of
75%
chance for common transitions, versus
25%
for rare transitions. Once the trial arrived into a proximal stimulus state, transitions to outcome
states were fully deterministic. To infer whether Pavlovian conditioning occurred across patients, we
correlated the obtained model covariates with behavioral metrics indicative of conditioning: stimulus
ratings and pupil dilation.
Subjective preference ratings for all fractal images were obtained in the beginning of the task. Ad-
ditionally, between blocks, we asked patients to re-rate the fractals that were included in the previous
block to obtain measures of changes in subjective preference as a function of patients’ experience in
the task (Fig. 1B). When grouping CSd and CSp together, we observed that stimuli used as CS+ were
rated significantly higher than stimuli used as CSn (p = 0.02, one sided t-test, Fig. 1F). We also tested
whether distal and proximal stimuli had their ratings change by a different amount by contrasting
absolute rating changes for (CSd+,CSdn) versus (CSp+,CSpn), and found no significant difference
for distal vs. proximal stimuli (p=0.16, two-tailed t-test). Finally, we found that initial ratings for
stimuli which would be used as CSd+ and CSdn did not significantly differ (
p
= 0
.
91
, t-test), further
indicating stimulus rating changes were a consequence of experiencing the task. We did not find an
effect of conditioning over response times for the button press during proximal stimulus presentation,
after testing for a correlation between RTs and proximal EVs (
p
= 0
.
96
, linear regression), and between
RTs and SPEs at proximal stimulus presentation (
p
= 0
.
36
, linear regression).
14
JNeurosci Accepted Manuscript
We performed eye tracking simultaneously with single-neuron recordings to further assess the suc-
cess of conditioning (see methods). Pupil diameter was analyzed in two distinct time windows: during
CSd presentation and CSp presentation (see Materials and Methods for details on pupil analysis). We
obtained the average pupil diameter change within these periods, relative to a baseline, and tested
whether they correlated with model covariates with a linear mixed effects model, with session number
as a random effect. Specifically, the model for pupil diameter during CSd presentation included the
EV of the CSd, while the model for pupil diameter during CSp presentation included CSp EV, uPE,
and an interaction term between CSp EV and uPE. We found no effect for CSd EV in the first model
(p = 0.30). In the second model (Fig. 1E), there was no significant effect of CSp EV (p = 0.07)
nor uPE (p = 0.06), but there was a significant interaction between CSp EV and uPE (p = 0.02).
This result indicates that pupil diameter correlated with a combination of computational factors in-
ferred from the model-based framework. A similar interaction was previously observed in a Pavlovian
conditioning paradigm performed in a neurotypical population (Pauli et al., 2015).
Overall, the aggregate behavioral evidence from changes in subjective stimulus ratings and pupil
sensitivity to an interaction of EV and uPE indicates collective evidence of Pavlovian conditioning
across our subject sample.
Predictive identity coding in vmPFC
We next investigated whether firing rates in individual neurons correlated with task variables and
the estimated computational model-derived covariates using a negative binomial GLM analysis (see
Materials and Methods for details). We obtained spike counts for each neuron in the time windows
that were relevant for each regressed variable (e.g. counting spikes during outcome presentation for
regressing outcomes). After obtaining the number of neurons significant for a given criterion, we
tested whether the number of significant neurons in each brain region was more than expected by
chance with a binomial test (Bonferroni corrected for the number of tested brain areas).
We first regressed the CSp presumed identity (i.e. the most likely proximal stimulus identity)
with the firing rate of neurons at the time of CSd presentation (Fig. 3A) to test whether the most
likely identity of the next presented stimulus was already encoded by neurons at distal time.
11
.
6%
of vmPFC neurons encoded CSp presumed identity (
p <
0
.
05
, binomial test), indicating that vmPFC
15
JNeurosci Accepted Manuscript
neurons represent predicted stimulus identity in a stimulus-stimulus association context. On the other
hand, we found that no brain areas significantly encoded the actual identity of the proximal stimulus
at the time of proximal stimulus presentation (
p >
0
.
05
for all regions, binomial test).
To determine whether the activity of CSp presumed identity neurons in vmPFC also correlated
with the identity of the distal stimulus, we pre-selected neurons that were classified as sensitive in
the previous analysis and tested whether activity in this neuron cohort correlated with distal stimulus
identity by performing an ANOVA with 8 categories (for the 2 possible CSd identities in each of the 4
blocks), utilizing z-scored spike counts within the CSd presentation period as the analyzed variable. In
vmPFC, we found that only 1/10 CSp predictive neurons also correlated with CSd identity, indicating
a predominantly distinct code between these two variables.
For vmPFC neurons whose response signaled predictive identity coding, we summarized their
normalized firing rates over time, separated by trials containing their preferred versus non-preferred
identity (Fig. 3B). For each time point, we tested whether preferred versus non-preferred firing rates
were different, and found that they first differed 0.73s after CSd onset (
p <
0
.
05
, t-test, uncorrected).
An example neuron performing this type of encoding is shown in Fig. 3C. We also regressed the EV
of the presumed CSp at distal time and the actual CSp identity at proximal time with firing rate but
did not find a significant neuron count in any region (
p >
0
.
05
for all regions, binomial test).
We next turned to population level analysis to investigate if the joint activity patterns from all
neurons recorded simultaneously in a given brain area were predictive of the variables of interest. All
population level decoding analysis throughout this paper is performed session-by-session, only utiliz-
ing neurons that were recorded simultaneously. We performed cross-validated population decoding
analysis with a linear SVM, obtaining significance levels with a bootstrapped null distribution (see
Materials and Methods for details). We found that CSp presumed identity could be significantly
decoded at distal time in vmPFC (
p <
0
.
01
, permutation test), in consonance with the single neuron
selection result discussed above (Fig. 3D). Decoding results per stimulus for each patient are dis-
played in Table 4. We also found that the linear SVM weights for vmPFC neurons correlate (Pearson
correlation test,
ρ
= 0
.
79
,
p <
0
.
001
) with the negative binomial GLM weights from the encoding
analysis, indicating a correspondence between these two modalities of analysis (Fig. 3E). These re-
sults further establish vmPFC neural activity as a substrate for predictive coding during learning of
16
JNeurosci Accepted Manuscript