A Caltech Library Service

Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T

Colas, Jaron T. and Dundon, Neil M. and Gerraty, Raphael T. and Saragosa-Harris, Natalie M. and Szymula, Karol P. and Tanwisuth, Koranis and Tyszka, J. Michael and van Geen, Camilla and Ju, Harang and Toga, Arthur W. and Gold, Joshua I. and Bassett, Dani S. and Hartley, Catherine A. and Shohamy, Daphna and Grafton, Scott T. and O'Doherty, John P. (2022) Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Human Brain Mapping. ISSN 1065-9471. doi:10.1002/hbm.25988. (In Press)

PDF - In Press Version
Creative Commons Attribution.

PDF (Tables S1-S25; Figures S1-S7) - Supplemental Material
Creative Commons Attribution.


The model-free algorithms of “reinforcement learning” (RL) have gained clout across disciplines, but so too have model-based alternatives. The present study emphasizes other dimensions of this model space in consideration of associative or discriminative generalization across states and actions. This “generalized reinforcement learning” (GRL) model, a frugal extension of RL, parsimoniously retains the single reward-prediction error (RPE), but the scope of learning goes beyond the experienced state and action. Instead, the generalized RPE is efficiently relayed for bidirectional counterfactual updating of value estimates for other representations. Aided by structural information but as an implicit rather than explicit cognitive map, GRL provided the most precise account of human behavior and individual differences in a reversal-learning task with hierarchical structure that encouraged inverse generalization across both states and actions. Reflecting inference that could be true, false (i.e., overgeneralization), or absent (i.e., undergeneralization), state generalization distinguished those who learned well more so than action generalization. With high-resolution high-field fMRI targeting the dopaminergic midbrain, the GRL model's RPE signals (alongside value and decision signals) were localized within not only the striatum but also the substantia nigra and the ventral tegmental area, including specific effects of generalization that also extend to the hippocampus. Factoring in generalization as a multidimensional process in value-based learning, these findings shed light on complexities that, while challenging classic RL, can still be resolved within the bounds of its core computations.
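The update described in the abstract can be illustrated with a minimal sketch. This is not the authors' model specification, which is not reproduced in this record; it only assumes that a single RPE is computed for the experienced state-action pair and then relayed, scaled by generalization weights, to the value estimates of other states and actions. The function name `grl_update`, the parameters `alpha`, `g_state`, and `g_action`, and the choice to weight the doubly counterfactual pair by the product of the two weights are all hypothetical. Positive weights stand in for associative generalization, negative weights for discriminative (inverse) generalization, and zero weights recover a standard model-free update.

```python
def grl_update(q, state, action, reward,
               alpha=0.3, g_state=-0.5, g_action=-0.5):
    """Return (new_q, delta) after one trial.

    q        : list of lists, q[s][a] is the value estimate (hypothetical layout)
    alpha    : learning rate (assumed parameter)
    g_state  : generalization weight across states (assumed parameter)
    g_action : generalization weight across actions (assumed parameter)
    """
    # Single reward-prediction error for the experienced state and action.
    delta = reward - q[state][action]

    new_q = []
    for s, row in enumerate(q):
        # Weight 1 for the experienced state, g_state for every other state.
        w_s = 1.0 if s == state else g_state
        new_row = []
        for a, value in enumerate(row):
            # Weight 1 for the chosen action, g_action for every other action.
            w_a = 1.0 if a == action else g_action
            # The same RPE updates every estimate, scaled by both weights
            # (the product rule for the other-state, other-action pair is
            # an assumption made for this sketch).
            new_row.append(value + alpha * delta * w_s * w_a)
        new_q.append(new_row)
    return new_q, delta


# Two states, two actions, all values starting at zero.
q0 = [[0.0, 0.0], [0.0, 0.0]]
q1, rpe = grl_update(q0, state=0, action=0, reward=1.0)
print(rpe)
print(q1)
```

With negative weights, a reward for one action in one state lowers the estimates for the unchosen action and for the same action in the other state, which is one way to picture the inverse generalization that the task structure encouraged.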

Item Type: Article
Author | ORCID
Colas, Jaron T. | 0000-0003-1872-7614
Dundon, Neil M. | 0000-0001-6246-1775
Gerraty, Raphael T. | 0000-0001-9782-1005
Saragosa-Harris, Natalie M. | 0000-0002-4493-6113
Szymula, Karol P. | 0000-0003-1822-0688
Tanwisuth, Koranis | 0000-0003-3563-6781
Tyszka, J. Michael | 0000-0001-9342-9014
van Geen, Camilla | 0000-0002-4948-5550
Ju, Harang | 0000-0003-1904-1753
Toga, Arthur W. | 0000-0001-7902-3755
Gold, Joshua I. | 0000-0002-6018-0483
Bassett, Dani S. | 0000-0002-6183-4493
Hartley, Catherine A. | 0000-0003-0177-7295
Shohamy, Daphna | 0000-0003-4239-4960
Grafton, Scott T. | 0000-0003-4015-3151
O'Doherty, John P. | 0000-0003-0016-3531
Additional Information: © 2022 The Authors. Human Brain Mapping published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. Received: 19 January 2022. Revised: 20 May 2022. Accepted: 10 June 2022. Scott T. Grafton and John P. O'Doherty are co-senior authors. This study originated at a workshop on “Learning in Networks” supported by the National Institute for Mathematical and Biological Synthesis. STG was supported by the Institute for Collaborative Biotechnologies under Cooperative Agreement W911NF-19-2-0026 and grant W911NF-16-1-0474 from the Army Research Office. JPOD was supported by National Institute on Drug Abuse grant R01 DA040011 and the National Institute of Mental Health's Caltech Conte Center for Social Decision Making (P50 MH094258). JMT was supported by National Institute of Mental Health grant P50 MH094258. AWT was supported by National Institute of Biomedical Imaging and Bioengineering grant P41 EB015922. JIG was supported by National Institute of Mental Health grant R01 MH115557. DSB was supported by Army Research Office grants W911NF-16-1-0474 and W911NF-18-1-0244. CAH was supported by the Klingenstein-Simons Neuroscience Fellowship. DATA AVAILABILITY STATEMENT. Data are available at
Group: Tianqiao and Chrissy Chen Institute for Neuroscience
Funding Agency | Grant Number
Army Research Office (ARO) | W911NF-19-2-0026
Army Research Office (ARO) | W911NF-16-1-0474
NIH | R01 DA040011
NIH | P50 MH094258
NIH | P41 EB015922
NIH | R01 MH115557
Army Research Office (ARO) | W911NF-18-1-0244
Klingenstein-Simons Neuroscience Fellowship | UNSPECIFIED
Subject Keywords: cognitive map; counterfactual learning; dopaminergic midbrain; generalization; hippocampus; individual differences; model-free and model-based; multifield fMRI; reinforcement learning; striatum
Record Number: CaltechAUTHORS:20220726-997438000
Persistent URL:
Official Citation: Colas, J. T., Dundon, N. M., Gerraty, R. T., Saragosa-Harris, N. M., Szymula, K. P., Tanwisuth, K., Tyszka, J. M., van Geen, C., Ju, H., Toga, A. W., Gold, J. I., Bassett, D. S., Hartley, C. A., Shohamy, D., Grafton, S. T., & O'Doherty, J. P. (2022). Reinforcement learning with associative or discriminative generalization across states and actions: fMRI at 3 T and 7 T. Human Brain Mapping, 1–41.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115853
Deposited By: George Porter
Deposited On: 27 Jul 2022 21:54
Last Modified: 12 Aug 2022 15:34
