An inductive bias for slowly changing features in human reinforcement learning

Hedrich, NL; Schulz, E; Hall-McMaster, S; Schuck, NW

doi:10.1101/2024.01.24.576910

Hedrich, N., Schulz, E., Hall-McMaster, S., & Schuck, N. (submitted). An inductive bias for slowly changing features in human reinforcement learning.

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000E-4A5B-A
Version Permalink: https://hdl.handle.net/21.11116/0000-000E-4A5C-9

Genre: Preprint

Locators

Locator:
https://www.biorxiv.org/content/10.1101/2024.01.24.576910v1.full.pdf (Any fulltext) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators

Creators:
Hedrich, NL, Author
Schulz, E¹, Author
Hall-McMaster, S, Author
Schuck, NW, Author

Affiliations:
¹ Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society (ou_3189356)

Content

Free keywords: -

Abstract: Identifying goal-relevant features in novel environments is a central challenge for efficient behaviour. We asked whether humans address this challenge by relying on prior knowledge about common properties of reward-predicting features. One such property is the rate of change of features, given that behaviourally relevant processes tend to change on a slower timescale than noise. Hence, we asked whether humans are biased to learn more when task-relevant features are slow rather than fast. To test this idea, 100 human participants were asked to learn the rewards of two-dimensional bandits when either a slowly or quickly changing feature of the bandit predicted reward. Participants accrued more reward and achieved better generalisation to unseen feature values when a bandit's relevant feature changed slowly, and its irrelevant feature quickly, as compared to the opposite. Participants were also more likely to incorrectly base their choices on the irrelevant feature when it changed slowly versus quickly. These effects were stronger when participants experienced the feature speed before learning about rewards. Modelling this behaviour with a set of four function-approximation Kalman filter models that embodied alternative hypotheses about how feature speed could affect learning revealed that participants had a higher learning rate for the slow feature, and adjusted their learning to both the relevance and the speed of feature changes. The larger the improvement in participants' performance for slow compared to fast bandits, the more strongly they adjusted their learning rates. These results provide evidence that human reinforcement learning favours slower features, suggesting a bias in how humans approach reward learning.
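The abstract names function-approximation Kalman filter models but does not give their equations. As a minimal sketch, assuming that reward is a linear function of the bandit's two feature values and that the corresponding weights drift as a Gaussian random walk, such a learner can be written as below. The class name, its parameters (`prior_var`, `process_var`, `obs_var`) and the simulated task are hypothetical illustrations, not the authors' four models. In this sketch the per-feature Kalman gain plays the role of a feature-specific learning rate.

```python
import numpy as np


class FeatureKalmanFilter:
    """Hypothetical function-approximation Kalman filter for a two-feature bandit.

    Assumes reward = x @ w + Gaussian noise, where x holds the bandit's feature
    values and the weights w drift as a Gaussian random walk between trials.
    """

    def __init__(self, prior_var, process_var, obs_var=1.0):
        # One prior variance and one process (drift) variance per feature.
        self.mean = np.zeros(len(prior_var))
        self.cov = np.diag(np.asarray(prior_var, dtype=float))
        self.process_cov = np.diag(np.asarray(process_var, dtype=float))
        self.obs_var = float(obs_var)

    def predict(self, x):
        """Expected reward for feature vector x under the current weight estimate."""
        return float(np.asarray(x, dtype=float) @ self.mean)

    def update(self, x, reward):
        """Apply one Kalman update and return the per-feature gain vector."""
        x = np.asarray(x, dtype=float)
        # Weights are assumed to drift, so uncertainty grows before each observation.
        self.cov = self.cov + self.process_cov
        # Kalman gain: how far each weight moves per unit of prediction error.
        gain = self.cov @ x / (x @ self.cov @ x + self.obs_var)
        error = reward - x @ self.mean
        self.mean = self.mean + gain * error
        # Posterior covariance: (I - gain x^T) P, written as an outer product.
        self.cov = self.cov - np.outer(gain, x @ self.cov)
        return gain


# Simulated task (hypothetical): feature 0 predicts reward, feature 1 is irrelevant.
rng = np.random.default_rng(0)
kf = FeatureKalmanFilter(prior_var=[1.0, 1.0], process_var=[0.01, 0.01])
for _ in range(200):
    x = rng.uniform(-1.0, 1.0, size=2)
    reward = 2.0 * x[0] + rng.normal(scale=0.5)
    kf.update(x, reward)
print(kf.mean)  # weight on feature 0 should approach 2, weight on feature 1 should stay near 0
```

Giving one feature a larger `process_var` (or `prior_var`) yields a larger gain for that feature on each trial, i.e. a higher learning rate for it. Whether the authors' models express the slow-feature bias through this particular parameter is not stated in the abstract.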

Details

Language(s):

Dates: Submitted: 2024-01

Publication Status: Submitted

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1101/2024.01.24.576910

Degree: -
