



Meeting Abstract

Compositional generalization in multi-armed bandits


Schulz, E
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;


Schulz, E. (2021). Compositional generalization in multi-armed bandits. In Psychologie und Gehirn (PuG 2021) (p. 99).

Cite as: https://hdl.handle.net/21.11116/0000-0008-9243-7
To what extent do human reward learning and decision-making rely on the ability to represent and generate richly structured relationships between options? We provide evidence that structure learning and the principle of compositionality play crucial roles in human reinforcement learning. In a new multi-armed bandit paradigm, termed the compositionally structured multi-armed bandit task, we found evidence that participants were able to learn representations of different latent reward structures and combine them to make correct generalizations about options in novel contexts. Moreover, we found substantial evidence that participants transferred knowledge of simpler reward structures to make informed, compositional generalizations about rewards in complex contexts. We also present a computational model that generalizes and composes knowledge of complex reward structures using a grammar over structures, and we show how such compositional inductive biases can be learned by meta-reinforcement learning agents.
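The abstract does not specify the reward functions or the grammar. As a minimal illustration of what composing latent reward structures can mean, the sketch below assumes two hypothetical base structures over arm positions (a linear trend and a periodic pattern), additive composition, and a least-squares choice among candidate structures. The function names, parameters, and selection rule are assumptions for illustration only, not the authors' task or model.

```python
import math
from typing import Callable, Dict, List, Tuple

RewardFn = Callable[[int], float]

# Hypothetical base structures over arm positions x = 0, 1, ..., n_arms - 1.
def linear(slope: float, intercept: float) -> RewardFn:
    """Mean reward changes linearly with arm position."""
    return lambda x: intercept + slope * x

def periodic(amplitude: float, period: int) -> RewardFn:
    """Mean reward oscillates with arm position, peaking every `period` arms."""
    return lambda x: amplitude * math.cos(2 * math.pi * x / period)

def compose(*parts: RewardFn) -> RewardFn:
    """Additive composition (an assumption): an arm's reward is the sum of its parts."""
    return lambda x: sum(f(x) for f in parts)

def reward_vector(f: RewardFn, n_arms: int) -> List[float]:
    """Evaluate a structure on every arm, rounded to hide floating-point noise."""
    return [round(f(x), 6) for x in range(n_arms)]

def build_grammar(bases: Dict[str, RewardFn]) -> Dict[str, RewardFn]:
    """Toy 'grammar' (hypothetical): each base structure alone plus every pairwise sum."""
    grammar = dict(bases)
    names = list(bases)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            grammar[f"{a}+{b}"] = compose(bases[a], bases[b])
    return grammar

def best_structure(grammar: Dict[str, RewardFn],
                   observations: List[Tuple[int, float]]) -> str:
    """Return the candidate with the lowest squared error on (arm, reward) pairs."""
    def sse(f: RewardFn) -> float:
        return sum((f(x) - r) ** 2 for x, r in observations)
    return min(grammar, key=lambda name: sse(grammar[name]))

N_ARMS = 8
# Structures assumed to have been learned in two separate, simpler contexts.
lin = linear(slope=2.0, intercept=1.0)
per = periodic(amplitude=3.0, period=4)

# Zero-shot prediction for the composite context: no arm has been sampled there yet.
predicted = reward_vector(compose(lin, per), N_ARMS)
best_arm = max(range(N_ARMS), key=lambda x: predicted[x])

# Two observed arms in the composite context are enough to favor the summed structure.
grammar = build_grammar({"linear": lin, "periodic": per})
chosen = best_structure(grammar, [(0, 4.0), (2, 2.0)])

print(predicted)         # [4.0, 3.0, 2.0, 7.0, 12.0, 11.0, 10.0, 15.0]
print(best_arm, chosen)  # 7 linear+periodic
```

Additivity is what makes the zero-shot prediction possible in this toy setting: once each component is known, every arm in the composite context is determined without further sampling.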