Poster

Rewards and uncertainty jointly drive the attention dynamics in reinforcement learning

MPS-Authors

Dayan, P
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Citation

Stojic, H., Orquin, J., Dayan, P., Dolan, R., & Speekenbrink, M. (2019). Rewards and uncertainty jointly drive the attention dynamics in reinforcement learning. Poster presented at Ninth International Symposium on Biology of Decision Making (SBDM 2019), Oxford, UK.


Cite as: https://hdl.handle.net/21.11116/0000-0004-DA26-C
Abstract
Aim: The nature of attention, and how it interacts with learning and choice processes in the
context of reinforcement learning, is still unclear. Probabilistic accounts of associative
learning, as well as approximately optimal solutions of the exploration-exploitation dilemma,
suggest that both learned value and uncertainty about those values (i.e. reducible or
estimation uncertainty) are important for learning and choice. This implies that both factors
should jointly guide attention. Our main goal was to test this prediction. Our secondary goal
was to examine whether the relation between attention and reinforcement learning is
bidirectional, that is, whether attention also influences or biases what we learn and how we choose.
This direction of influence has been tested to some extent; however, the role of estimation
uncertainty in it has not previously been addressed.
Method: Participants (N=36) completed two games in which they repeatedly chose between
six options. Each game was a multi-armed bandit task where rewards for each option were
drawn from Gaussian distributions, differing in both their means and variances. The
participants' goal was to maximize the cumulative sum of rewards in each game. To do this,
they needed to explore the options in the choice set in order to learn which option had the
highest average reward, and subsequently exploit this knowledge. We monitored participants'
attention using eye tracking while they performed the tasks, operationalizing attention as the
proportion of time spent fixating on each of the options before making a choice.
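The task and the attention measure can be sketched as follows. The abstract does not report the actual reward means and variances, the eye-tracking data format, or how trials without fixations were handled, so the `make_bandit` and `fixation_proportions` helpers, the `(option, duration)` fixation records, the uniform fallback, and all numeric values below are hypothetical illustrations, not the study's materials.

```python
import numpy as np


def make_bandit(means, sds, rng):
    """Return a pull(option) function for a Gaussian multi-armed bandit.

    Hypothetical helper: each option pays a draw from Normal(mean, sd),
    with options differing in both mean and variance.
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)

    def pull(option):
        return rng.normal(means[option], sds[option])

    return pull


def fixation_proportions(fixations, n_options):
    """Share of pre-choice fixation time spent on each option.

    `fixations` is an assumed format: (option_index, duration_ms) pairs
    recorded before the choice on a single trial.
    """
    totals = np.zeros(n_options)
    for option, duration in fixations:
        totals[option] += duration
    total = totals.sum()
    if total == 0:
        # Assumption: with no fixations, treat attention as uniform.
        return np.full(n_options, 1.0 / n_options)
    return totals / total


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Illustrative six-option game; means and SDs are made up.
    pull = make_bandit(means=[-10, -5, 0, 5, 10, 20],
                       sds=[4, 4, 8, 8, 12, 12], rng=rng)
    print(pull(5))
    print(fixation_proportions([(0, 300), (2, 100), (0, 100)], n_options=6))
```

Here 400 of 500 ms fall on option 0 and 100 ms on option 2, so the attention vector for that trial is 0.8 on option 0, 0.2 on option 2, and 0 elsewhere.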
Results: We used computational modeling to address both questions. To
address our main question, we modeled attention with a combination of a Bayesian (Kalman
filter) learning component and two types of choice rules: one that relies only on learned value (softmax) and one that additionally uses estimation uncertainty to assign an "exploration
bonus" to the options (upper confidence bound rule). Model evidence showed that Kalman
filter learning with the exploration bonus described overt attention best, providing evidence
that trial-by-trial learned values and estimation uncertainty jointly guide visual attention. For
our secondary question, we applied the same models to choices, this time allowing measured
attention to affect the choice process by increasing the probability of choosing attended
options and decreasing it for unattended options. Attention was also allowed to modulate the
magnitude of updates in the learning process. Again, we found that Kalman filter learning with
the exploration bonus was the best model, showing that estimation uncertainty plays an
independent role in determining choice, over and above its effect on attention.
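The model components named above can be sketched in their standard bandit-literature forms. The abstract gives no equations, so this is a sketch under stated assumptions, not the authors' specification: a stationary Kalman filter with assumed prior mean/variance and reward-noise variance, a softmax rule over posterior means, and a softmax-smoothed UCB rule that adds an exploration bonus `beta * sqrt(v)`. For the attention analysis, the output of either rule can be read as predicted fixation shares. The `attention_biased` term (shifting choice logits toward options with above-uniform fixation share) and the `attention_gain` scaling of the Kalman gain are hypothetical stand-ins for the two ways attention was allowed to act on choice and on learning.

```python
import numpy as np


class KalmanBandit:
    """Kalman filter learner for a stationary Gaussian bandit (assumed form)."""

    def __init__(self, n_options, prior_mean=0.0, prior_var=1000.0, noise_var=16.0):
        self.m = np.full(n_options, prior_mean, dtype=float)  # learned values
        self.v = np.full(n_options, prior_var, dtype=float)   # estimation uncertainty
        self.noise_var = noise_var                            # assumed reward noise

    def update(self, choice, reward, attention_gain=1.0):
        # Kalman gain: fraction of the prediction error absorbed into the mean.
        k = self.v[choice] / (self.v[choice] + self.noise_var)
        # Hypothetical attention modulation of update size, kept within [0, 1].
        k = float(np.clip(k * attention_gain, 0.0, 1.0))
        self.m[choice] += k * (reward - self.m[choice])
        self.v[choice] *= 1.0 - k


def softmax(utilities, inv_temp):
    z = inv_temp * np.asarray(utilities, dtype=float)
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def softmax_rule(m, v, inv_temp):
    # Value only; v is accepted but unused so both rules share a signature.
    return softmax(m, inv_temp)


def ucb_rule(m, v, inv_temp, beta):
    # Value plus an exploration bonus proportional to the posterior SD.
    return softmax(np.asarray(m) + beta * np.sqrt(v), inv_temp)


def attention_biased(p_model, attention, gamma):
    # Hypothetical: raise the logits of options fixated more than 1/K of the
    # time and lower those fixated less; gamma = 0 recovers p_model.
    k = len(p_model)
    logits = np.log(p_model) + gamma * (np.asarray(attention, dtype=float) - 1.0 / k)
    return softmax(logits, 1.0)


if __name__ == "__main__":
    learner = KalmanBandit(n_options=6)
    learner.update(choice=2, reward=10.0)
    print(softmax_rule(learner.m, learner.v, inv_temp=0.5))
    print(ucb_rule(learner.m, learner.v, inv_temp=0.5, beta=1.0))
```

With these illustrative settings, a single reward of 10 on option 2 moves its mean to about 9.84 and shrinks its variance to about 15.7. The softmax rule then favours option 2, whereas the UCB rule favours the five untried options, whose bonus (about 31.6) exceeds option 2's mean plus bonus (about 13.8). This is the kind of uncertainty-driven divergence that lets the two rules be distinguished by model evidence.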
Conclusions: In summary, the interaction between attention, learning, and decision making
extends further than previously found. Our results provide support for probabilistic associative
learning accounts that ground attention in efficient computations rather than constraints, and
establish a relation with approximately optimal resolutions of the exploration-exploitation trade-off.