
Released

Conference Paper

Compositional generalization in multi-armed bandits

MPS-Authors
/persons/resource/persons263619

Saanum, T
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

/persons/resource/persons139782

Schulz, E
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

Fulltext (public)
There are no public fulltexts stored in PuRe
Supplementary Material (public)
There is no public supplementary material available
Citation

Saanum, T., Schulz, E., & Speekenbrink, M. (2021). Compositional generalization in multi-armed bandits. In 43rd Annual Conference of the Cognitive Science Society (CogSci 2021): Workshop 2 Using Games to Understand Intelligence (pp. 1320-1326). Red Hook, NY, USA: Curran.


Cite as: https://hdl.handle.net/21.11116/0000-0007-FF3C-8
Abstract
To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, we found evidence that participants are able to learn representations of different reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make compositional generalizations about rewards in complex contexts. This allowed participants to accumulate more rewards earlier, and to explore less whenever such knowledge transfer was possible. We also provide a computational model which is able to generalize and compose knowledge for complex reward structures. This model describes participant behaviour in the compositional generalization task better than various other models of decision-making and transfer learning.
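
The abstract does not give implementation details of the computational model. One common way to realize a model that "composes knowledge" about reward structures in a bandit is a Gaussian process whose kernel is a sum of simpler kernels (e.g., a linear trend plus a smooth nonlinear component), paired with an upper-confidence-bound (UCB) choice rule. The sketch below is a minimal illustration under that assumption only; the kernel forms, parameters, and helper functions (linear_kernel, rbf_kernel, compositional_kernel, gp_posterior, ucb_choice) are hypothetical and not taken from the paper.

# Illustrative sketch (assumption): compositional GP-UCB bandit agent.
# The reward function is modelled as a GP with an additive (compositional)
# kernel; arms are chosen by maximizing posterior mean + exploration bonus.
import numpy as np

def linear_kernel(x1, x2):
    # Captures a simple linear reward structure.
    return np.outer(x1, x2)

def rbf_kernel(x1, x2, lengthscale=0.2):
    # Captures a smooth nonlinear reward structure.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def compositional_kernel(x1, x2):
    # Composition by addition: linear component + smooth nonlinear component.
    return linear_kernel(x1, x2) + rbf_kernel(x1, x2)

def gp_posterior(x_obs, y_obs, x_query, kernel, noise=0.1):
    # Standard GP regression posterior mean and standard deviation.
    K = kernel(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))
    K_s = kernel(x_obs, x_query)
    K_ss = kernel(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    cov = K_ss - K_s.T @ K_inv @ K_s
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def ucb_choice(mu, sigma, beta=2.0):
    # Upper-confidence-bound action selection.
    return int(np.argmax(mu + beta * sigma))

# Toy run: the latent reward function is itself compositional (linear + sinusoid).
rng = np.random.default_rng(0)
arms = np.linspace(0, 1, 30)                  # 30 arms on a 1-D feature space
true_reward = 2 * arms + np.sin(10 * arms)
chosen, rewards = [], []
for t in range(25):
    if chosen:
        mu, sigma = gp_posterior(arms[chosen], np.array(rewards), arms,
                                 compositional_kernel)
    else:
        mu, sigma = np.zeros_like(arms), np.ones_like(arms)
    a = ucb_choice(mu, sigma)
    chosen.append(a)
    rewards.append(true_reward[a] + rng.normal(scale=0.1))
print("best arm:", arms[int(np.argmax(true_reward))], "last choice:", arms[chosen[-1]])

In this sketch, transfer of simpler structures could be mimicked by fitting the linear and RBF components separately on simpler contexts and reusing them additively in the complex context; this reuse is an assumption for illustration, not the authors' procedure.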