Putting Bandits Into Context: How Function Learning Supports Decision Making

Schulz, E; Konstantinidis, E; Speekenbrink, M

doi:10.1037/xlm0000463



Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2018). Putting Bandits Into Context: How Function Learning Supports Decision Making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(6), 927-943. doi:10.1037/xlm0000463.

Item is Released


Basic


Item Permalink: https://hdl.handle.net/21.11116/0000-0006-B3FB-5
Version Permalink: https://hdl.handle.net/21.11116/0000-0006-B3FD-3

Genre: Journal Article


Locators


Locator:
https://psycnet.apa.org/fulltext/2017-50771-001.pdf (Publisher version) Open Access status unknown

Description:
-

OA-Status:

Creators


Creators:
Schulz, E¹, Author
Konstantinidis, E, Author
Speekenbrink, M, Author

Affiliations:
¹ External Organizations, ou_persistent22

Content


Free keywords: -

Abstract: The authors introduce the contextual multi-armed bandit task as a framework to investigate learning and decision making in uncertain environments. In this novel paradigm, participants repeatedly choose between multiple options in order to maximize their rewards. The options are described by a number of contextual features which are predictive of the rewards through initially unknown functions. From their experience with choosing options and observing the consequences of their decisions, participants can learn about the functional relation between contexts and rewards and improve their decision strategy over time. In three experiments, the authors explore participants' behavior in such learning environments. They predict participants' behavior by context-blind (mean-tracking, Kalman filter) and contextual (Gaussian process and linear regression) learning approaches combined with different choice strategies. Participants are mostly able to learn about the context-reward functions and their behavior is best described by a Gaussian process learning strategy which generalizes previous experience to similar instances. In a relatively simple task with binary features, they seem to combine this learning with a probability of improvement decision strategy which focuses on alternatives that are expected to lead to an improvement upon a current favorite option. In a task with continuous features that are linearly related to the rewards, participants seem to more explicitly balance exploration and exploitation. Finally, in a difficult learning environment where the relation between features and rewards is nonlinear, some participants are again well-described by a Gaussian process learning strategy, whereas others revert to context-blind strategies.
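The learning and choice components named in the abstract can be illustrated with a minimal sketch: one Gaussian process per option maps context features to expected reward, and a probability of improvement rule scores each option. This is not the authors' implementation. The kernel (squared exponential), the hyperparameters, the function names, and the choice of threshold (the highest reward observed so far, standing in for the "current favorite option") are all assumptions made for illustration.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_seen, y_seen, x_new, noise_var=0.1, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    if len(x_seen) == 0:
        # No observations yet: fall back to the prior (mean 0, variance 1).
        return np.zeros(len(x_new)), np.ones(len(x_new))
    k_ss = rbf_kernel(x_seen, x_seen, length_scale) + noise_var * np.eye(len(x_seen))
    k_sn = rbf_kernel(x_seen, x_new, length_scale)
    solved = np.linalg.solve(k_ss, k_sn)        # K^{-1} k_*
    mean = solved.T @ y_seen
    var = 1.0 - np.sum(k_sn * solved, axis=0)   # prior variance of this kernel is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))


def probability_of_improvement(mean, std, threshold):
    """P(reward > threshold) under a normal posterior, via the normal CDF."""
    z = (mean - threshold) / std
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])


def choose_arm(histories, context, threshold):
    """Score every option at the current context and pick the highest score.

    histories: one (contexts, rewards) pair per option (hypothetical layout).
    threshold: assumed here to be the highest reward observed so far.
    """
    context = np.atleast_2d(context)
    scores = []
    for x_seen, y_seen in histories:
        mean, std = gp_posterior(x_seen, y_seen, context)
        scores.append(probability_of_improvement(mean, std, threshold)[0])
    return int(np.argmax(scores)), scores


# Toy example: option 0 paid well near the current context, option 1 did not.
histories = [
    (np.array([[0.0], [1.0]]), np.array([1.0, 1.0])),
    (np.array([[0.0], [1.0]]), np.array([-1.0, -1.0])),
]
arm, scores = choose_arm(histories, context=[0.5], threshold=1.0)
```

Because the Gaussian process shares information across similar contexts, an option can be scored at a context it has never been tried in, which is the generalization property the abstract attributes to this learning strategy. A context-blind learner such as mean tracking would ignore the `context` argument entirely.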

Details


Language(s):

Dates: Date issued: 2018-06

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1037/xlm0000463

Degree: -


Source 1


Title: Journal of Experimental Psychology: Learning, Memory, and Cognition

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Washington, D.C. : American Psychological Association (PsycARTICLES)

Pages: -
Volume / Issue: 44 (6)
Sequence Number: -
Start / End Page: 927 - 943
Identifier: ISSN: 0278-7393
CoNE: https://pure.mpg.de/cone/journals/resource/954927606766