Finding structure in multi-armed bandits

Schulz, E; Franklin, NT; Gershman, SJ

doi:10.1016/j.cogpsych.2019.101261

Schulz, E., Franklin, N., & Gershman, S. (2020). Finding structure in multi-armed bandits. Cognitive Psychology, 119: 101261, pp. 1-35. doi:10.1016/j.cogpsych.2019.101261.

Item is released

Basic data

Item permalink: https://hdl.handle.net/21.11116/0000-0005-D582-7
Version permalink: https://hdl.handle.net/21.11116/0000-0005-D583-6

Genre: Journal article

External references

External reference:
https://www.sciencedirect.com/science/article/abs/pii/S0010028519302518 (publisher version)
Open access status: unknown

Description: -

OA status:

Creators

Creators:
Schulz, E¹, Author
Franklin, NT, Author
Gershman, SJ, Author

Affiliations:
¹ External Organizations, ou_persistent22

Content

Keywords: -

Abstract: How do humans search for rewards? This question is commonly studied using multi-armed bandit tasks, which require participants to trade off exploration and exploitation. Standard multi-armed bandits assume that each option has an independent reward distribution. However, learning about options independently is unrealistic, since in the real world options often share an underlying structure. We study a class of structured bandit tasks, which we use to probe how generalization guides exploration. In a structured multi-armed bandit, options have a correlation structure dictated by a latent function. We focus on bandits in which rewards are linear functions of an option’s spatial position. Across 5 experiments, we find evidence that participants utilize functional structure to guide their exploration, and also exhibit a learning-to-learn effect across rounds, becoming progressively faster at identifying the latent function. Our experiments rule out several heuristic explanations and show that the same findings obtain with non-linear functions. Comparing several models of learning and decision making, we find that the best model of human behavior in our tasks combines three computational mechanisms: (1) function learning, (2) clustering of reward distributions across rounds, and (3) uncertainty-guided exploration. Our results suggest that human reinforcement learning can utilize latent structure in sophisticated ways to improve efficiency.
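
Illustrative sketch (not part of the record, and not the authors' model): the abstract describes bandits whose rewards are linear in an option's spatial position, and names function learning and uncertainty-guided exploration as mechanisms. The Python sketch below pairs a two-parameter Bayesian linear regression with an upper-confidence-bound choice rule to show how those two ideas can interact. This is a simpler technique swapped in for illustration only, and it omits the clustering across rounds entirely. The function names, the number of arms, the prior and noise precisions (alpha, beta), and the exploration weight kappa are all assumptions made here.

```python
import math
import random


def posterior(xs, rs, alpha=1.0, beta=1.0):
    """Posterior over (intercept, slope) for r = a + b*x + noise.

    Assumed prior: weights ~ N(0, I/alpha); assumed noise precision: beta.
    Returns the posterior mean and the 2x2 posterior covariance.
    """
    # Posterior precision A = alpha*I + beta * sum(phi phi^T), with phi = (1, x).
    a00 = alpha + beta * len(xs)
    a01 = beta * sum(xs)
    a11 = alpha + beta * sum(x * x for x in xs)
    det = a00 * a11 - a01 * a01
    # Invert the symmetric 2x2 precision matrix by hand.
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    b0 = beta * sum(rs)
    b1 = beta * sum(x * r for x, r in zip(xs, rs))
    mean = (s00 * b0 + s01 * b1, s01 * b0 + s11 * b1)
    return mean, ((s00, s01), (s01, s11))


def ucb_choice(positions, xs, rs, kappa=2.0):
    """Pick the position with the highest mean + kappa * std of the fitted line."""
    mean, ((s00, s01), (_, s11)) = posterior(xs, rs)

    def score(x):
        mu = mean[0] + mean[1] * x
        var = s00 + 2.0 * s01 * x + s11 * x * x  # phi^T S phi
        return mu + kappa * math.sqrt(max(var, 0.0))

    return max(positions, key=score)


def run_round(slope, intercept, n_arms=8, n_trials=10, noise_sd=0.1, seed=0):
    """Play one round on a hypothetical linear bandit; return chosen arm indices."""
    rng = random.Random(seed)
    positions = [i / (n_arms - 1) for i in range(n_arms)]  # arms scaled to [0, 1]
    xs, rs, choices = [], [], []
    for _ in range(n_trials):
        x = ucb_choice(positions, xs, rs)
        r = intercept + slope * x + rng.gauss(0.0, noise_sd)
        xs.append(x)
        rs.append(r)
        choices.append(positions.index(x))
    return choices


if __name__ == "__main__":
    print(run_round(slope=2.0, intercept=0.0))
    print(run_round(slope=-2.0, intercept=0.0))
```

Because the learner fits one line across all positions, a single observation at one end of the row changes its estimate for every other arm. That shared estimate is the sense in which generalization can guide exploration here. A learner that treats arms independently would have to sample each one separately.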

Details

Language(s):

Date: Published: 2020-06

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Type of review: -

Identifiers: DOI: 10.1016/j.cogpsych.2019.101261

Degree type: -

Source 1

Title: Cognitive Psychology

Source genre: Journal

Creators:

Affiliations:

Place, publisher, edition: Academic Press

Pages: -
Volume / Issue: 119
Article number: 101261
Start / end page: 1 - 35
Identifier: ISSN: 0010-0285
CoNE: https://pure.mpg.de/cone/journals/resource/954922645010
