CaltechAUTHORS
  A Caltech Library Service

Confirmation bias optimizes reward learning

Tarantola, Tor and Folke, Tomas and Boldt, Annika and Pérez, Omar D. and De Martino, Benedetto (2021) Confirmation bias optimizes reward learning. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20210301-125315461

PDF (March 11, 2021) - Submitted Version
Creative Commons Attribution No Derivatives
852 kB

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210301-125315461

Abstract

Confirmation bias - the tendency to overweight information that matches prior beliefs or choices - has been shown to manifest even in simple reinforcement learning. In line with recent work, we find that participants learned significantly more from choice-confirming outcomes in a reward-learning task. What is less clear is whether asymmetric learning rates somehow benefit the learner. Here, we combine data from human participants and artificial agents to examine how confirmation-biased learning might improve performance by counteracting decisional and environmental noise. We evaluate one potential mechanism for such noise reduction: visual attention - a demonstrated driver of both value-based choice and predictive learning. Surprisingly, visual attention showed the opposite pattern to confirmation bias, as participants were most likely to fixate on "missed opportunities", slightly dampening the effects of the confirmation bias we observed. Several million simulated experiments with artificial agents showed this bias to be a reward-maximizing strategy compared to several alternatives, but only if disconfirming feedback is not completely ignored - a condition that visual attention may help to enforce.
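The asymmetric learning rates described above can be illustrated with a minimal Q-learning-style sketch. This is not the authors' implementation; the function name and learning-rate values are illustrative. The key features match the abstract: a larger learning rate is applied when an outcome confirms the choice than when it disconfirms it, and the disconfirming rate is kept above zero so that disconfirming feedback is dampened rather than ignored.

```python
def biased_update(q, action, reward, alpha_confirm=0.3, alpha_disconfirm=0.1):
    """Confirmation-biased value update for the chosen action.

    A positive prediction error (outcome better than expected) confirms
    the choice and is weighted by the larger rate alpha_confirm; a
    negative error disconfirms it and is weighted by the smaller, but
    nonzero, rate alpha_disconfirm.
    """
    error = reward - q[action]                      # reward prediction error
    alpha = alpha_confirm if error >= 0 else alpha_disconfirm
    q[action] += alpha * error
    return q


# Example: the same-magnitude error moves the value estimate more
# when it confirms the choice than when it disconfirms it.
q = {"A": 0.5, "B": 0.5}
biased_update(q, "A", reward=1.0)   # confirming: q["A"] -> 0.65
biased_update(q, "A", reward=0.0)   # disconfirming: smaller adjustment
```

An unbiased learner would use a single learning rate for both cases; setting alpha_disconfirm to zero would correspond to completely ignoring disconfirming feedback, the condition the abstract identifies as harmful.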


Item Type:Report or Paper (Discussion Paper)
Related URLs:
https://doi.org/10.1101/2021.02.27.433214 (DOI) - Discussion Paper
ORCID:
Tarantola, Tor: 0000-0002-1383-8799
Folke, Tomas: 0000-0001-6768-8426
Boldt, Annika: 0000-0002-6913-5099
Pérez, Omar D.: 0000-0002-4168-5435
De Martino, Benedetto: 0000-0002-3555-2732
Additional Information:The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. Version 1 - March 1, 2021; Version 2 - March 1, 2021; Version 3 - March 2, 2021; Version 4 - March 11, 2021. The authors have declared no competing interest.
Record Number:CaltechAUTHORS:20210301-125315461
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:20210301-125315461
Official Citation:Confirmation bias optimizes reward learning. Tor Oreste Tarantola, Tomas Folke, Annika Boldt, Omar David Perez, Benedetto De Martino. bioRxiv 2021.02.27.433214; doi: https://doi.org/10.1101/2021.02.27.433214
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:108250
Collection:CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On:01 Mar 2021 21:33
Last Modified:12 Mar 2021 18:37
