Abstract:
Acquiring goal-directed behaviors requires us to learn which features of our environment are task-relevant and which can be ignored. Machine learning research has suggested that meaningful information in the input data is often represented by features that change slowly over time, while fast variations may represent noise. Focusing on slowly changing features of the environment during learning could therefore be a useful bias for humans when selecting task-relevant features, even when the underlying task structure is unknown. To test this idea, we investigated whether humans are better at learning the reward predictiveness of slowly vs. quickly changing features of two-dimensional bandits. We found that subjects accrued more reward during learning and achieved higher accuracy on subsequent test trials when a bandit's relevant feature changed slowly and its irrelevant feature quickly, as compared to the opposite. Model fitting with a set of function approximation models that had either a single fixed learning rate or feature-speed-dependent learning rates showed that participants with a stronger effect adapted their learning rates to the features' temporal coherence. These results provide evidence that human reinforcement learning is sensitive to the timescales over which features change, akin to the ‘temporal coherence prior’ in the machine learning literature.
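
For illustration only, the following is a minimal sketch (not the authors' actual model) of the kind of function approximation learner the abstract alludes to: a linear, delta-rule learner for a two-feature bandit in which the learning rate applied to each feature depends on how quickly that feature changes. The feature dynamics, reward structure, and the learning-rate values `alpha_slow` and `alpha_fast` are assumptions made for this sketch.

```python
import numpy as np

# Sketch under stated assumptions: a linear function-approximation learner
# for a two-feature bandit with feature-speed-dependent learning rates.
rng = np.random.default_rng(0)

n_trials = 200
alpha_slow, alpha_fast = 0.3, 0.05   # hypothetical speed-dependent learning rates
w = np.zeros(2)                      # weights for [slow feature, fast feature]

slow_val = rng.uniform(-1, 1)
for t in range(n_trials):
    # Slow feature drifts gradually; fast feature is resampled every trial.
    slow_val += 0.05 * rng.normal()
    fast_val = rng.uniform(-1, 1)
    x = np.array([slow_val, fast_val])

    # In this toy example, only the slow feature predicts reward.
    reward = slow_val + 0.1 * rng.normal()

    # Delta-rule update with a separate learning rate per feature.
    prediction_error = reward - w @ x
    alphas = np.array([alpha_slow, alpha_fast])
    w += alphas * prediction_error * x

print("learned weights:", w)
```

In this sketch, assigning a larger learning rate to the slowly changing feature lets the learner pick up its reward predictiveness faster, which is one way to instantiate the speed-dependent learning the abstract describes.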