Conference Paper

Learning and decisions in contextual multi-armed bandit tasks


Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015). Learning and decisions in contextual multi-armed bandit tasks. In D. Noelle, R. Dale, A. Warlaumont, J. Yoshimi, T. Matlock, C. Jennings, et al. (Eds.), 37th Annual Meeting of the Cognitive Science Society (CogSci 2015): Mind, Technology and Society (pp. 2122-2127). Austin, TX, USA: Cognitive Science Society.

Cite as: https://hdl.handle.net/21.11116/0000-0006-B43A-E
Contextual Multi-Armed Bandit (CMAB) tasks are a novel framework to assess decision making in uncertain environments. In a CMAB task, participants are presented with multiple options (arms) which are characterized by a number of features (context) related to the reward associated with the arms. By choosing arms repeatedly and observing the reward, participants can learn about the relation between context and reward and improve their decision strategy. We present two studies on how people behave in CMAB tasks. Within a stationary environment, we find that participants are best described by Thompson Sampling-based Gaussian Process models. In a dynamic CMAB task we again find that participants are best described by probability matching of Gaussian Process expectations. Our findings imply that behavior previously referred to as "irrational" can actually be seen as a well-adapted strategy based on powerful inference algorithms.
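The Thompson Sampling strategy the abstract refers to can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses per-arm Bayesian linear regression (equivalent to a Gaussian Process with a linear kernel) in a hypothetical stationary CMAB with made-up dimensions and noise level. On each trial, one weight vector is drawn from each arm's posterior and the arm with the highest sampled prediction is chosen — the probability-matching behavior described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stationary CMAB: 4 arms, rewards depend linearly on a
# 3-feature context via unknown per-arm weights (illustrative setup).
n_arms, n_features, n_trials = 4, 3, 500
true_w = rng.normal(size=(n_arms, n_features))
sigma2 = 0.25  # assumed reward-noise variance

# Per-arm posterior in precision form: mean = A^-1 b, cov = sigma2 * A^-1.
A = [np.eye(n_features) for _ in range(n_arms)]  # precision matrices
b = [np.zeros(n_features) for _ in range(n_arms)]

rewards = []
for t in range(n_trials):
    x = rng.normal(size=n_features)  # context observed this trial

    # Thompson Sampling: draw one weight vector per arm from its posterior,
    # then act greedily with respect to the sampled predictions.
    sampled = []
    for a in range(n_arms):
        mean = np.linalg.solve(A[a], b[a])
        cov = sigma2 * np.linalg.inv(A[a])
        w = rng.multivariate_normal(mean, cov)
        sampled.append(x @ w)
    arm = int(np.argmax(sampled))

    # Observe a noisy reward and update only the chosen arm's posterior.
    r = x @ true_w[arm] + rng.normal(scale=np.sqrt(sigma2))
    A[arm] += np.outer(x, x)
    b[arm] += x * r
    rewards.append(float(r))
```

Because arms are sampled in proportion to their posterior probability of being best, exploration falls away automatically as the posteriors concentrate, which is why such a learner earns more per trial late in the task than early on.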