media-1.pdf

Supplementary Material for Encoding of predictions and learning

1

signals during Pavlovian conditioning by human neurons

2

Authors:

Tomas G. Aquino

1

,

4

,

∗

, Hristos Courellis

2

, Adam N. Mamelak

4

Ueli Rutishauser

1

,

4

,

†

, John P. O’Doherty

1

,

3

,

†

Affiliations:

Computation and Neural Systems, Division of Biology and Biological Engineering,

California Institute of Technology, 91125, USA

Biological Engineering, Division of Biology and Biological Engineering,

California Institute of Technology, 91125, USA

Division of Humanities and Social Sciences, California Institute of Technology, 91125, USA

Department of Neurosurgery, Cedars-Sinai Medical Center, 90048, USA

†

Joint senior authors

∗

To whom correspondence should be addressed; E-mail: taquino@caltech.edu.

3

February 13, 2023

4

1 Materials and Methods

5

1.1 Electrophysiology and recording

6

We used Behnke-Fried hybrid depth electrodes (AdTech Medical), positioned exclusively according to clinical

7

criteria. Broadband extracellular recordings were performed with a sampling rate of 32 kHz and a bandpass of 0.1-

8

9000Hz (ATLAS system, Neuralynx Inc.). The data set reported here was obtained bilaterally from hippocampus

9

(HIP), amygdala (AMY), ventromedial prefrontal cortex (vmPFC), dorsal anterior cingulate cortex (dACC), and

10

pre-supplementary motor area (preSMA) with one macroelectrode on each side. Each macroelectrode contained

11

eight 40

m diameter microwires. Recordings were locally referenced, with one of the eight microwires serving as

12

a reference in every bundle.

13

1.2 Patients

14

Twelve patients (eight females) were implanted with depth electrodes for seizure monitoring prior to potential

15

surgery for treatment of drug resistant epilepsy. One of the patients performed the task twice, totalling 13 recorded

16

sessions. Human research experimental protocols were approved by the Institutional Review Boards of the Cal-

17

ifornia Institute of Technology and the Cedars-Sinai Medical Center. Electrode location was determined based

18

on preoperative structural MRI and postoperative MRI and/or CT scans obtained for each patient as described

19

previously Minxha et al. (2020). Patient information (sex and age) is displayed in Supplementary Table 1.

20

1.3 Pavlovian conditioning task

21

Patients performed a sequential Pavlovian conditioning task Pauli et al. (2015) in which two fractals were presented

22

in sequence, acting as distal and proximal conditioned stimuli (CSd, CSp, respectively). Following the CS pair, a

23

video outcome was presented, either in the form of a hand depositing the patient’s candy of choice into a paper

24

bag, for a positive outcome, or in the form of an empty hand approaching a paper bag, for a neutral outcome.

25

Patients were informed that each time they received a positive outcome contributed to a grand total of actual

26

candy pieces they would receive at the end of the experiment. Every patient experienced exactly 48 win trials and

27

48 no-win trials. Prior to the task, patients visually inspected and chose one among five possible candy options

28

(Reese’s Pieces, Hershey’s Kisses, York Peppermint, Werther’s Caramel, or Hi-Chew) to be delivered, according to

29

their preference. After the end of the session, they received the closest possible equivalent to 200 calories in their

30

candy of choice, to reward them for the 48 win trials they experienced.

31

Trial structure is detailed in Fig. 1A. After a 1s fixation period, a distal CS was presented for 3s, followed by

32

a 1s fixation period. Then, a proximal CS was displayed for 3s, followed by an outcome presentation video edited

33

to a length of 3s. Intertrial intervals were jittered between 0.5s-1s. The distal CS was always presented along the

34

vertical axis, either above or below the center of the screen, at a random position. Similarly, the proximal CS was

35

always presented in a random position along the horizontal axis, inside one of two possible squares positioned to

36

the left or the right of the screen. As an attention check, patients were asked to perform a button press whenever

37

they saw a CSp inside one of the squares, reporting which of the two squares it was. To ensure this remained a

38

purely Pavlovian paradigm, patients were instructed that these button presses did not affect the outcome of the

39

trial in any way.

40

Each session contained a total of 96 trials, split into 4 blocks of 24 trials, with a two minute break between

41

blocks. Within each block, there were 4 possible CS, 2 distal (A,B) and 2 proximal (X,Y). The identity of the

42

distal stimuli was re-selected in every block, but the two proximal stimuli were kept the same throughout the entire

43

session, though their valences were reversed between blocks. There were four trial types in total (Fig. 1C): two

44

common transitions (A-X-Reward and B-Y-No reward) and two rare transitions (A-Y-No reward and B-X-Reward),

45

meaning that the CSp was always fully predictive of the outcome, while the CSd was only partially predictive. Trial

46

frequencies were

for each of the common types and

for each of the rare types. Every block always had

47

exactly 9 trials of each common type and 3 trials of each rare type. Trial type order was selected randomly, except

48

for the first 4 trials of each block, which we enforced to be of the common type, selected randomly.

49

The 10 fractals used in each session (2 CSd fractals for each block and 2 CSp fractals for the whole session)

50

were chosen specifically for each patient out of a set of 24 possible fractals using the following procedure: before the

51

beginning of the task, patients rated all 24 fractals for their subjective preference using a sliding bar from extremely

52

unpleasant to extremely pleasant. The 10 stimuli which elicited the most neutral responses were selected for the

53

experiment and assigned randomly to their CSd or CSp roles. All the 24 fractals had the same mean luminance. We

54

enforced this by enforcing equal average RGB values across pixels for all fractals, along the (0,0,0) to (255,255,255)

55

spectrum. At the end of every block, patients rated the 4 fractals they experienced for how pleasant they were.

56

We z-scored ratings for each patient and used aggregate stimulus rating changes across patients as a metric of

57

Pavlovian conditioning.

58

1.4 Eye tracking

59

We tracked patients’ pupil diameter during the task as a candidate conditioning metric Pauli et al. (2015, 2019),

60

Pool et al. (2019). We used a EyeLink 1000 system (SR Research Ltd.) attached to the bottom of the task

61

screen as described in Minxha et al. (2017), Wang et al. (2019). We calibrated the system using EyeLink’s five

62

point calibration before the session and between blocks (average calibration error =

◦

). Patients were

63

instructed to fixate during fixation periods but were free to gaze anywhere in the screen otherwise. Pupil data was

64

preprocessed to remove blinks and outlier points further than 5 s.d. from the mean diameter. We interpolated

65

missing values removed in this way with the closest previous value and then filtered data with a 50ms moving

66

average window. Pupil diameters were normalized in every trial relative to the initial 1s fixation period, using the

67

average diameter in that period as a baseline. Statistical analyses were then performed using average diameter

68

changes in the 0.5s-3s time windows after CSd and CSp presentations.

69

1.5 Computational model of learning

70

We used a normative model-based model to obtain estimates of how patients encoded transition probabilities

71

between task states, stimulus expected values, and state prediction errors. We adapted a model used for a sequential

72

instrumental task Gläscher et al. (2010), to estimate a matrix

(

s,s

′

)

for the transition probabilities from start

73

states

to end states

′

. Given distal states (A,B), proximal states (X,Y), and outcome states (reward, R; no

74

reward, N), we defined the transition matrix T with transition probabilities

′

as:

75







0 0

0 0 0

Y R

Y N

0 0 0







(1)

The

values were initialized to 0.5 and the remaining values in the matrix were chosen to be 0, reflecting the

76

constraints of the task’s transition structure. At each step, after transitioning from some state

to an end state

77

′

, a state prediction error is computed according to the following equation:

78

SP E

= 1

−

(

s,s

′

)

(2)

The transition matrix

is then updated using a learning rate

, chosen normatively to be 0.22, adopting a

79

value from a previously validated model-free model Pauli et al. (2015).

80

new

(

s,s

′

) =

(

s,s

′

) +

ηδ

SP E

(3)

All the other values in T for states not transitioned into were also updated with

new

(

s,s

′′

) =

(

s,s

′′

)(1

−

)

81

to ensure each the sum of each row of T stays equal to 1, as it should reflect total probability.

82

Finally, the value of each state could be computed assuming the values of arriving into the reward and no reward

83

states were 1 and 0, respectively:

84

Y R

(4)

Y R

(5)

(6)

Y R

(7)

Since we were interested in tracking neural responses to stimulus identity associations, we used the inferred tran-

85

sition matrix T to determine the identity of the most likely proximal stimulus to follow distal stimulus presentation.

86

We refer to this identity as

CSp presumed identity

87

We chose a normative model-based learning mechanism for analyzing the data because this model allowed us

88

to estimate the presumed identity of subsequent stimuli in the associative chain, thereby allowing us to probe

89

stimulus-stimulus mappings in the task. It is important to note that a model-free temporal difference (TD) model

90

(cite Sutton) could be used instead to model this data, and that such a model generates highly similar time-series

91

for value signals as we used in the behavioral and neural analyses (though it would not allow us to make predictions

92

about stimulus identities). Using such a TD-like model, a signed reward prediction error can be estimated, and the

93

absolute value (unsigned) of this error signal also strongly correlates with SPE signals used in the analysis. While

94

we implemented such a TD model alongside the MB model and used it to analyze the data, we do not report on

95

the results of this analysisbecause it produces highly similar results about value encoding as reported using thr

96

existing MB model, and because this TD model does not allow us to test for stimulus identity representations.

97

1.6 Neural data pre-processing

98

We performed spike detection and sorting with the semiautomatic template-matching algorithm OSort Rutishauser

99

et al. (2006). Across all 13 sessions, we obtained 86 vmPFC, 165 amygdala, 119 hippocampus, 137 preSMA and

100

103 dACC putative single units (610 total). We refer to these isolated putative single units as “neuron” and “cell”

101

interchangeably.

102

1.7 Single neuron encoding analysis

103

To quantify how single neuron activity correlated with several variables of interest, we performed a Poisson general-

104

ized linear model (GLM) analysis. As a dependent variable, we measured spike counts in three time windows: CSd

105

window, from 0.25s to 3s after CSd presentation; CSp window, from 0.25s to 3s after CSp presentation; outcome

106

window, from 0.25s to 2.25s after outcome presentation. Table 1 includes all dependent variables and the time

107

windows in which they were tested. We defined RPEs as the difference between proximal EV at the time of outcome

108

and the outcome valence itself.

109

Dependent variable

Time window

CSp presumed identity, CSd EV

CSd

CSp identity, CSp EV, SPE

CSp

Outcome, SPE, RPE

Outcome

Table 1: Dependent variables and respective time windows for Poisson GLM and population decoding analyses.

1.8 Jaccard index test

110

After performing Poisson GLM encoding analyses, we tested whether the sub-populations of neurons which were

111

sensitive to two variables of interest had significant overlap. For this, we computed the Jaccard index Jaccard

112

(1912) of overlap between neurons sensitive to each of the variables

and

, where

and

indicate the

113

number of neurons sensitive to the variables X and Y, respectively, and

X,Y

indicates the number of neurons

114

concurrently sensitive to both variables:

115

X,Y

−

X,Y

(8)

To compute p-values for each comparison between two variables, we bootstrapped a null distribution of Jaccard

116

indexes using 1000 reshuffles, considering X and Y are independent variables with a false positive rate of

= 0

117

1.9 Population decoding analysis

118

We performed population decoding analyses by training a linear support vector machine (SVM) with MATLAB’s

119

function

fitcsvm

. In each session, we defined a population activity matrix X of dimensions

(

nTrials,nNeurons

)

120

counting spikes in each trial within the same time windows from the encoding analysis. We then defined the decoded

121

variable

as the same dependent variables from Table 1. To reduce the decoding problem to a classification task,

122

we binned the continuous regressors (EV, SPE) into 3 tertiles before training the SVM with MATLAB’s multiclass

123

function

fitcecoc

. Cross-validation was performed by training on 2 trial blocks and testing on the 2 remaining trial

124

blocks of each session. Since there were 4 blocks in total, we repeated the procedure for every possible combination

125

of train/test blocks, resulting in 6 cross-validation folds. Test accuracies were averaged across folds and reported

126

separately for each session. To obtain test accuracy significance levels, we repeated this entire procedure 500 times

127

while shuffling the decoded variable

and compare the null mean test accuracy obtained in this manner with the

128

true test accuracy, for each brain region. Finally, we corrected significance thresholds for the number of tested

129

brain areas. For this analysis, we excluded one session, which contained only 1 block of trials.

130

1.10 Cross-correlation analysis

131

To measure how neural activity across brain areas of interest was correlated, we performed a spike-spike cross-

132

correlation analysis, with shuffle correction Brody (1999). First, we divided all trials in EV tertiles, excluding the

133

middle tertile to obtain low EV and high EV trial groups. Then, for each trial group, we computed cross-correlations

134

for each neuron pair containing neurons A and B recorded from the same session, but different brain areas (e.g.

135

each neuron pair contained one vmPFC neuron vs. one amygdala neuron).

136

For two spike trains

(binned at 50ms with 5ms steps, constrained to CSp presentation time window),

137

recorded from neurons A and B on trial

, we define the cross-correlogram of each trial as:

138

(

) =

∞

−∞

(

)

(

) :=

⊗

(9)

Defining the

operator as averaging across trials, we defined the shuffle-corrected cross-correlogram as:

139

(

−

< S

)

⊗

(

−

< S

)

(10)

< S

⊗

−

< S

⊗

< S

(11)

The shuffle-correction procedure corrects for time-locked co-variation that might be caused to both neurons

140

concurrently by stimulus presentation. Additionally, it ensures that the expected value of

is 0 if

and

are

141

independent.

142

Finally, the reported shuffle-corrected correlation results were obtained by averaging the

vectors across all

143

neuron pairs. To summarize the correlation results, we obtained the integrals from the positive and negative time

144

lag regions and performed an ANOVA with EV level (low vs. high) and time lag sign (which region leads) as

145

factors.

146

2 Pavlovian outcome coding

147

At the time of the outcome, we found a significant proportion of neurons correlated with outcome (reward vs.

148

neutral) in dACC (

p <

, binomial test, Supplementary Fig. 2A). An example dACC neuron which

149

coded outcomes is shown in Supplementary Fig. 2B. Additionally, for outcome encoding dACC neurons, we tested

150

when normalized firing rates differed for preferred versus non-preferred outcomes, and found that they first differed

151

1.08s after outcome onset (

p <

, t-test, uncorrected). Out of all significant outcome dACC neurons,

had

152

negative t-scores, indicating they fired less in trials with positive outcomes. This is more than expected by chance

153

assuming an equal probability of positive and negative t-scores (

= 0

003

, binomial test). At the population

154

level, we found significant decoding of outcomes in dACC (

p <

001

, permutation test,Supplementary Fig. 2C).

155

However, above and beyond the single neuron encoding results, outcome was also decodable from the hippocampus

156

(

p <

, permutation test) and preSMA (

p <

, permutation test), highlighting a widespread population-

157

level representation of outcome valence. One possible interpretation is that weakly coding neurons provided a

158

distributed code for discriminating between positive and neutral outcomes when considered jointly (Stefanini et al.

159

2020). These results are consistent with previous findings on tracking feedback in medial frontal cortex neurons

160

(Fu et al. 2019, Aquino et al. 2021).

161

Supplementary Figures

162

vmPFC

AMY

HIP

dACC

preSMA

0.2

0.3

0.4

0.5

0.6

Decoding accuracy

Decoding uPE (at proximal time)

n.s.

vmPFC

AMY

HIP

dACC

preSMA

-2

t-score

uPE coding (at proximal time)

3.45%

6.63%

7.09%

1.79%

4.38%

n.s.

Figure 1: Unsigned prediction error encoding at proximal time. (a) Decoding accuracy for uPE during proximal stimulus

presentation. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped

null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. (b)

T-scores for every neuron in each brain area, for a GLM predicting spike counts during proximal stimulus presentation with

uPE as a regressor. Red dots indicate significant neurons and stars indicate significance across the entire region, corrected

across areas.

Time from outcome (s)

100

Trial nr. (reordered)

Time from outcome (s)

Firing rate (Hz)

No reward

Reward

dACC outcome neuron

signicant

n.s.

preferred

non-preferred

vmPFC

AMY

HIP

dACC

preSMA

-5

t-score

Outcome coding (at outcome)

5.75%

4.82%

6.3%

10.7%

7.25%

0.5

1.5

2.5

Time from outcome (s)

0.2

0.4

0.6

0.8

Normalized ring rate

dACC outcome neurons

vmPFC

AMY

HIP

dACC

preSMA

0.4

0.6

0.8

Decoding accuracy

Decoding outcome (at outcome time)

n.s.

***

Figure 2: Outcome encoding in dACC. (a) T-scores for every neuron in each brain area, for a GLM predicting spike counts

during outcome presentation with outcome as a regressor. Red dots indicate significant neurons and stars indicate significance

across the entire region, corrected across areas. (b) dACC neuron whose activity during correlates with outcome (blue: no

reward; black: reward). Top: raster plot; Bottom: PSTH. (c) Decoding accuracy for outcome during outcome presentation.

Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution,

corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. (d) Time course of

normalized firing rates in dACC neurons which coded outcomes. Trials are separated between preferred (black) versus non-

preferred (blue) outcomes, and red dots indicate time points with a significant difference between trajectories.