Supplementary Material for Encoding of predictions and learning
1
signals during Pavlovian conditioning by human neurons
2
Authors:
Tomas G. Aquino
1
,
4
,
∗
, Hristos Courellis
2
, Adam N. Mamelak
4
,
Ueli Rutishauser
1
,
4
,
†
, John P. O’Doherty
1
,
3
,
†
Affiliations:
1
Computation and Neural Systems, Division of Biology and Biological Engineering,
California Institute of Technology, 91125, USA
2
Biological Engineering, Division of Biology and Biological Engineering,
California Institute of Technology, 91125, USA
3
Division of Humanities and Social Sciences, California Institute of Technology, 91125, USA
4
Department of Neurosurgery, Cedars-Sinai Medical Center, 90048, USA
†
Joint senior authors
∗
To whom correspondence should be addressed; E-mail: taquino@caltech.edu.
3
February 13, 2023
4
1
1 Materials and Methods
5
1.1 Electrophysiology and recording
6
We used Behnke-Fried hybrid depth electrodes (AdTech Medical), positioned exclusively according to clinical
7
criteria. Broadband extracellular recordings were performed with a sampling rate of 32 kHz and a bandpass of 0.1-
8
9000Hz (ATLAS system, Neuralynx Inc.). The data set reported here was obtained bilaterally from hippocampus
9
(HIP), amygdala (AMY), ventromedial prefrontal cortex (vmPFC), dorsal anterior cingulate cortex (dACC), and
10
pre-supplementary motor area (preSMA) with one macroelectrode on each side. Each macroelectrode contained
11
eight 40
μ
m diameter microwires. Recordings were locally referenced, with one of the eight microwires serving as
12
a reference in every bundle.
13
1.2 Patients
14
Twelve patients (eight females) were implanted with depth electrodes for seizure monitoring prior to potential
15
surgery for treatment of drug resistant epilepsy. One of the patients performed the task twice, totalling 13 recorded
16
sessions. Human research experimental protocols were approved by the Institutional Review Boards of the Cal-
17
ifornia Institute of Technology and the Cedars-Sinai Medical Center. Electrode location was determined based
18
on preoperative structural MRI and postoperative MRI and/or CT scans obtained for each patient as described
19
previously Minxha et al. (2020). Patient information (sex and age) is displayed in Supplementary Table 1.
20
1.3 Pavlovian conditioning task
21
Patients performed a sequential Pavlovian conditioning task Pauli et al. (2015) in which two fractals were presented
22
in sequence, acting as distal and proximal conditioned stimuli (CSd, CSp, respectively). Following the CS pair, a
23
video outcome was presented, either in the form of a hand depositing the patient’s candy of choice into a paper
24
bag, for a positive outcome, or in the form of an empty hand approaching a paper bag, for a neutral outcome.
25
Patients were informed that each time they received a positive outcome contributed to a grand total of actual
26
candy pieces they would receive at the end of the experiment. Every patient experienced exactly 48 win trials and
27
48 no-win trials. Prior to the task, patients visually inspected and chose one among five possible candy options
28
(Reese’s Pieces, Hershey’s Kisses, York Peppermint, Werther’s Caramel, or Hi-Chew) to be delivered, according to
29
their preference. After the end of the session, they received the closest possible equivalent to 200 calories in their
30
candy of choice, to reward them for the 48 win trials they experienced.
31
Trial structure is detailed in Fig. 1A. After a 1s fixation period, a distal CS was presented for 3s, followed by
32
a 1s fixation period. Then, a proximal CS was displayed for 3s, followed by an outcome presentation video edited
33
to a length of 3s. Intertrial intervals were jittered between 0.5s-1s. The distal CS was always presented along the
34
vertical axis, either above or below the center of the screen, at a random position. Similarly, the proximal CS was
35
2
always presented in a random position along the horizontal axis, inside one of two possible squares positioned to
36
the left or the right of the screen. As an attention check, patients were asked to perform a button press whenever
37
they saw a CSp inside one of the squares, reporting which of the two squares it was. To ensure this remained a
38
purely Pavlovian paradigm, patients were instructed that these button presses did not affect the outcome of the
39
trial in any way.
40
Each session contained a total of 96 trials, split into 4 blocks of 24 trials, with a two minute break between
41
blocks. Within each block, there were 4 possible CS, 2 distal (A,B) and 2 proximal (X,Y). The identity of the
42
distal stimuli was re-selected in every block, but the two proximal stimuli were kept the same throughout the entire
43
session, though their valences were reversed between blocks. There were four trial types in total (Fig. 1C): two
44
common transitions (A-X-Reward and B-Y-No reward) and two rare transitions (A-Y-No reward and B-X-Reward),
45
meaning that the CSp was always fully predictive of the outcome, while the CSd was only partially predictive. Trial
46
frequencies were
37
.
5%
for each of the common types and
12
.
5%
for each of the rare types. Every block always had
47
exactly 9 trials of each common type and 3 trials of each rare type. Trial type order was selected randomly, except
48
for the first 4 trials of each block, which we enforced to be of the common type, selected randomly.
49
The 10 fractals used in each session (2 CSd fractals for each block and 2 CSp fractals for the whole session)
50
were chosen specifically for each patient out of a set of 24 possible fractals using the following procedure: before the
51
beginning of the task, patients rated all 24 fractals for their subjective preference using a sliding bar from extremely
52
unpleasant to extremely pleasant. The 10 stimuli which elicited the most neutral responses were selected for the
53
experiment and assigned randomly to their CSd or CSp roles. All the 24 fractals had the same mean luminance. We
54
enforced this by enforcing equal average RGB values across pixels for all fractals, along the (0,0,0) to (255,255,255)
55
spectrum. At the end of every block, patients rated the 4 fractals they experienced for how pleasant they were.
56
We z-scored ratings for each patient and used aggregate stimulus rating changes across patients as a metric of
57
Pavlovian conditioning.
58
1.4 Eye tracking
59
We tracked patients’ pupil diameter during the task as a candidate conditioning metric Pauli et al. (2015, 2019),
60
Pool et al. (2019). We used a EyeLink 1000 system (SR Research Ltd.) attached to the bottom of the task
61
screen as described in Minxha et al. (2017), Wang et al. (2019). We calibrated the system using EyeLink’s five
62
point calibration before the session and between blocks (average calibration error =
0
.
40
◦
±
0
.
17
). Patients were
63
instructed to fixate during fixation periods but were free to gaze anywhere in the screen otherwise. Pupil data was
64
preprocessed to remove blinks and outlier points further than 5 s.d. from the mean diameter. We interpolated
65
missing values removed in this way with the closest previous value and then filtered data with a 50ms moving
66
average window. Pupil diameters were normalized in every trial relative to the initial 1s fixation period, using the
67
average diameter in that period as a baseline. Statistical analyses were then performed using average diameter
68
changes in the 0.5s-3s time windows after CSd and CSp presentations.
69
3
1.5 Computational model of learning
70
We used a normative model-based model to obtain estimates of how patients encoded transition probabilities
71
between task states, stimulus expected values, and state prediction errors. We adapted a model used for a sequential
72
instrumental task Gläscher et al. (2010), to estimate a matrix
T
(
s,s
′
)
for the transition probabilities from start
73
states
s
to end states
s
′
. Given distal states (A,B), proximal states (X,Y), and outcome states (reward, R; no
74
reward, N), we defined the transition matrix T with transition probabilities
t
ss
′
as:
75
T
=
0 0
t
AX
t
AY
0
0
0 0
t
BX
t
BY
0
0
0 0 0
0
t
XR
t
XN
0 0 0
0
t
Y R
t
Y N
0 0 0
0
0
0
0 0 0
0
0
0
(1)
The
t
values were initialized to 0.5 and the remaining values in the matrix were chosen to be 0, reflecting the
76
constraints of the task’s transition structure. At each step, after transitioning from some state
s
to an end state
77
s
′
, a state prediction error is computed according to the following equation:
78
δ
SP E
= 1
−
T
(
s,s
′
)
(2)
The transition matrix
T
is then updated using a learning rate
η
, chosen normatively to be 0.22, adopting a
79
value from a previously validated model-free model Pauli et al. (2015).
80
T
new
(
s,s
′
) =
T
(
s,s
′
) +
ηδ
SP E
(3)
All the other values in T for states not transitioned into were also updated with
T
new
(
s,s
′′
) =
T
(
s,s
′′
)(1
−
η
)
81
to ensure each the sum of each row of T stays equal to 1, as it should reflect total probability.
82
Finally, the value of each state could be computed assuming the values of arriving into the reward and no reward
83
states were 1 and 0, respectively:
84
EV
A
=
t
AX
t
XR
+
t
AY
t
Y R
(4)
EV
B
=
t
BX
t
XR
+
t
BY
t
Y R
(5)
EV
X
=
t
XR
(6)
4
EV
Y
=
t
Y R
(7)
Since we were interested in tracking neural responses to stimulus identity associations, we used the inferred tran-
85
sition matrix T to determine the identity of the most likely proximal stimulus to follow distal stimulus presentation.
86
We refer to this identity as
CSp presumed identity
.
87
We chose a normative model-based learning mechanism for analyzing the data because this model allowed us
88
to estimate the presumed identity of subsequent stimuli in the associative chain, thereby allowing us to probe
89
stimulus-stimulus mappings in the task. It is important to note that a model-free temporal difference (TD) model
90
(cite Sutton) could be used instead to model this data, and that such a model generates highly similar time-series
91
for value signals as we used in the behavioral and neural analyses (though it would not allow us to make predictions
92
about stimulus identities). Using such a TD-like model, a signed reward prediction error can be estimated, and the
93
absolute value (unsigned) of this error signal also strongly correlates with SPE signals used in the analysis. While
94
we implemented such a TD model alongside the MB model and used it to analyze the data, we do not report on
95
the results of this analysisbecause it produces highly similar results about value encoding as reported using thr
96
existing MB model, and because this TD model does not allow us to test for stimulus identity representations.
97
1.6 Neural data pre-processing
98
We performed spike detection and sorting with the semiautomatic template-matching algorithm OSort Rutishauser
99
et al. (2006). Across all 13 sessions, we obtained 86 vmPFC, 165 amygdala, 119 hippocampus, 137 preSMA and
100
103 dACC putative single units (610 total). We refer to these isolated putative single units as “neuron” and “cell”
101
interchangeably.
102
1.7 Single neuron encoding analysis
103
To quantify how single neuron activity correlated with several variables of interest, we performed a Poisson general-
104
ized linear model (GLM) analysis. As a dependent variable, we measured spike counts in three time windows: CSd
105
window, from 0.25s to 3s after CSd presentation; CSp window, from 0.25s to 3s after CSp presentation; outcome
106
window, from 0.25s to 2.25s after outcome presentation. Table 1 includes all dependent variables and the time
107
windows in which they were tested. We defined RPEs as the difference between proximal EV at the time of outcome
108
and the outcome valence itself.
109
Dependent variable
Time window
CSp presumed identity, CSd EV
CSd
CSp identity, CSp EV, SPE
CSp
Outcome, SPE, RPE
Outcome
Table 1: Dependent variables and respective time windows for Poisson GLM and population decoding analyses.
5
1.8 Jaccard index test
110
After performing Poisson GLM encoding analyses, we tested whether the sub-populations of neurons which were
111
sensitive to two variables of interest had significant overlap. For this, we computed the Jaccard index Jaccard
112
(1912) of overlap between neurons sensitive to each of the variables
X
and
Y
, where
N
X
and
N
Y
indicate the
113
number of neurons sensitive to the variables X and Y, respectively, and
N
X,Y
indicates the number of neurons
114
concurrently sensitive to both variables:
115
J
=
N
X,Y
N
X
+
N
Y
−
N
X,Y
(8)
To compute p-values for each comparison between two variables, we bootstrapped a null distribution of Jaccard
116
indexes using 1000 reshuffles, considering X and Y are independent variables with a false positive rate of
p
= 0
.
05
.
117
1.9 Population decoding analysis
118
We performed population decoding analyses by training a linear support vector machine (SVM) with MATLAB’s
119
function
fitcsvm
. In each session, we defined a population activity matrix X of dimensions
(
nTrials,nNeurons
)
by
120
counting spikes in each trial within the same time windows from the encoding analysis. We then defined the decoded
121
variable
y
as the same dependent variables from Table 1. To reduce the decoding problem to a classification task,
122
we binned the continuous regressors (EV, SPE) into 3 tertiles before training the SVM with MATLAB’s multiclass
123
function
fitcecoc
. Cross-validation was performed by training on 2 trial blocks and testing on the 2 remaining trial
124
blocks of each session. Since there were 4 blocks in total, we repeated the procedure for every possible combination
125
of train/test blocks, resulting in 6 cross-validation folds. Test accuracies were averaged across folds and reported
126
separately for each session. To obtain test accuracy significance levels, we repeated this entire procedure 500 times
127
while shuffling the decoded variable
y
and compare the null mean test accuracy obtained in this manner with the
128
true test accuracy, for each brain region. Finally, we corrected significance thresholds for the number of tested
129
brain areas. For this analysis, we excluded one session, which contained only 1 block of trials.
130
1.10 Cross-correlation analysis
131
To measure how neural activity across brain areas of interest was correlated, we performed a spike-spike cross-
132
correlation analysis, with shuffle correction Brody (1999). First, we divided all trials in EV tertiles, excluding the
133
middle tertile to obtain low EV and high EV trial groups. Then, for each trial group, we computed cross-correlations
134
for each neuron pair containing neurons A and B recorded from the same session, but different brain areas (e.g.
135
each neuron pair contained one vmPFC neuron vs. one amygdala neuron).
136
For two spike trains
S
r
A
,
S
r
B
(binned at 50ms with 5ms steps, constrained to CSp presentation time window),
137
recorded from neurons A and B on trial
r
, we define the cross-correlogram of each trial as:
138
6
C
r
(
τ
) =
∞
X
t
=
−∞
S
r
A
(
t
)
S
r
B
(
t
+
τ
) :=
S
r
A
⊗
S
r
B
(9)
Defining the
<>
operator as averaging across trials, we defined the shuffle-corrected cross-correlogram as:
139
V
:=
<
(
S
r
A
−
< S
r
A
>
)
⊗
(
S
r
B
−
< S
r
B
>
)
>
(10)
V
=
< S
r
A
⊗
S
r
B
>
−
< S
r
A
>
⊗
< S
r
B
>
(11)
The shuffle-correction procedure corrects for time-locked co-variation that might be caused to both neurons
140
concurrently by stimulus presentation. Additionally, it ensures that the expected value of
V
is 0 if
S
A
and
S
B
are
141
independent.
142
Finally, the reported shuffle-corrected correlation results were obtained by averaging the
V
vectors across all
143
neuron pairs. To summarize the correlation results, we obtained the integrals from the positive and negative time
144
lag regions and performed an ANOVA with EV level (low vs. high) and time lag sign (which region leads) as
145
factors.
146
2 Pavlovian outcome coding
147
At the time of the outcome, we found a significant proportion of neurons correlated with outcome (reward vs.
148
neutral) in dACC (
10
.
7%
,
p <
0
.
05
, binomial test, Supplementary Fig. 2A). An example dACC neuron which
149
coded outcomes is shown in Supplementary Fig. 2B. Additionally, for outcome encoding dACC neurons, we tested
150
when normalized firing rates differed for preferred versus non-preferred outcomes, and found that they first differed
151
1.08s after outcome onset (
p <
0
.
05
, t-test, uncorrected). Out of all significant outcome dACC neurons,
10
/
12
had
152
negative t-scores, indicating they fired less in trials with positive outcomes. This is more than expected by chance
153
assuming an equal probability of positive and negative t-scores (
p
= 0
.
003
, binomial test). At the population
154
level, we found significant decoding of outcomes in dACC (
p <
0
.
001
, permutation test,Supplementary Fig. 2C).
155
However, above and beyond the single neuron encoding results, outcome was also decodable from the hippocampus
156
(
p <
0
.
05
, permutation test) and preSMA (
p <
0
.
05
, permutation test), highlighting a widespread population-
157
level representation of outcome valence. One possible interpretation is that weakly coding neurons provided a
158
distributed code for discriminating between positive and neutral outcomes when considered jointly (Stefanini et al.
159
2020). These results are consistent with previous findings on tracking feedback in medial frontal cortex neurons
160
(Fu et al. 2019, Aquino et al. 2021).
161
Supplementary Figures
162
7
vmPFC
AMY
HIP
dACC
preSMA
0.2
0.3
0.4
0.5
0.6
Decoding accuracy
Decoding uPE (at proximal time)
n.s.
n.s.
*
n.s.
n.s.
vmPFC
AMY
HIP
dACC
preSMA
-2
0
2
4
6
t-score
uPE coding (at proximal time)
3.45%
6.63%
7.09%
1.79%
4.38%
a
b
n.s.
n.s.
n.s.
n.s.
n.s.
Figure 1: Unsigned prediction error encoding at proximal time. (a) Decoding accuracy for uPE during proximal stimulus
presentation. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped
null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. (b)
T-scores for every neuron in each brain area, for a GLM predicting spike counts during proximal stimulus presentation with
uPE as a regressor. Red dots indicate significant neurons and stars indicate significance across the entire region, corrected
across areas.
0
1
2
3
Time from outcome (s)
0
50
100
Trial nr. (reordered)
0
1
2
3
Time from outcome (s)
0
5
10
15
20
Firing rate (Hz)
No reward
Reward
dACC outcome neuron
signicant
n.s.
preferred
non-preferred
vmPFC
AMY
HIP
dACC
preSMA
-5
0
5
t-score
Outcome coding (at outcome)
5.75%
4.82%
6.3%
10.7%
*
7.25%
0
0.5
1
1.5
2
2.5
Time from outcome (s)
0.2
0.4
0.6
0.8
Normalized ring rate
dACC outcome neurons
a
c
d
vmPFC
AMY
HIP
dACC
preSMA
0.4
0.6
0.8
Decoding accuracy
Decoding outcome (at outcome time)
n.s.
n.s.
*
***
*
b
Figure 2: Outcome encoding in dACC. (a) T-scores for every neuron in each brain area, for a GLM predicting spike counts
during outcome presentation with outcome as a regressor. Red dots indicate significant neurons and stars indicate significance
across the entire region, corrected across areas. (b) dACC neuron whose activity during correlates with outcome (blue: no
reward; black: reward). Top: raster plot; Bottom: PSTH. (c) Decoding accuracy for outcome during outcome presentation.
Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution,
corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. (d) Time course of
normalized firing rates in dACC neurons which coded outcomes. Trials are separated between preferred (black) versus non-
preferred (blue) outcomes, and red dots indicate time points with a significant difference between trajectories.
8