The Human Brain Computes Two Different Prediction Errors

Glascher, J; Daw, N; Dayan, P; O’Doherty, J

doi:10.3389/conf.neuro.06.2009.03.270


Glascher, J., Daw, N., Dayan, P., & O’Doherty, J. (2009). The Human Brain Computes Two Different Prediction Errors. Poster presented at Computational and Systems Neuroscience Meeting (COSYNE 2009), Salt Lake City, UT, USA. doi:10.3389/conf.neuro.06.2009.03.270.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0005-0E80-B

Version Permalink: https://hdl.handle.net/21.11116/0000-0005-0E81-A

Genre: Poster

Locators

Locator:
https://www.frontiersin.org/10.3389/conf.neuro.06.2009.03.270/event_abstract (Publisher version) Open Access status unknown

Description:
-

OA-Status:

Creators

Creators:
Glascher, J, Author
Daw, N, Author
Dayan, P¹, Author
O’Doherty, J, Author

Affiliations:
¹ External Organizations, ou_persistent22

Content

Free keywords: -

Abstract: Reinforcement learning (RL) provides a framework involving two distinct approaches to reward-based decision making: model-free RL assesses candidate actions by directly learning their expected long-term reward consequences using a reward prediction error (RPE), whereas model-based RL uses experience with the sequential occurrence of situations (‘states’) to build a model of the state transition and outcome structure of the environment and then searches forward in it to evaluate actions. This latter, model-based approach requires a state prediction error (SPE), which trains predictions about the transitions between different states in the world rather than about summed future rewards.
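
The distinction between the two error signals can be illustrated with a minimal sketch. The abstract does not give the update equations, so the forms below are assumptions: a standard SARSA temporal-difference error for the RPE, and a delta rule on transition probabilities for the SPE. The learning rates `alpha` and `eta`, the discount `gamma`, and the toy state and action counts are all hypothetical.

```python
import numpy as np

def sarsa_rpe_update(Q, s, a, r, s_next, a_next, alpha=0.2, gamma=1.0):
    """Model-free step: compute the RPE and update Q[s, a] in place."""
    rpe = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * rpe
    return rpe

def spe_update(T, s, a, s_next, eta=0.2):
    """Model-based step: compute the SPE and update T[s, a, :] in place.

    The SPE is the 'surprise' at arriving in s_next: 1 minus the
    currently estimated transition probability.
    """
    spe = 1.0 - T[s, a, s_next]
    T[s, a] *= (1.0 - eta)    # shrink every successor's probability
    T[s, a, s_next] += eta    # net effect on s_next: T += eta * spe
    return spe

# Toy problem (hypothetical sizes): 3 states, 2 actions.
Q = np.zeros((3, 2))          # action values start at zero
T = np.full((3, 2, 3), 1 / 3) # transition estimates start uniform

# One unrewarded transition: state 0, action 1, arriving in state 2.
rpe = sarsa_rpe_update(Q, s=0, a=1, r=0.0, s_next=2, a_next=0)
spe = spe_update(T, s=0, a=1, s_next=2)
```

In this sketch an unrewarded transition with zero-initialised values yields an RPE of zero, while the SPE is non-zero and the updated transition row still sums to one.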

Eighteen human subjects performed a probabilistic Markov decision task while being scanned with functional magnetic resonance imaging. The task required subjects to make two sequential choices, the first leading them probabilistically into an intermediary state, and the second into one of three outcome states associated with different reward magnitudes. In an attempt to dissociate model-based from model-free RL completely, we exposed our subjects in the first scanning session only to transitions in the state space, in the complete absence of rewards or free choices. This permits a pure assessment of the model-building aspects of model-based RL, because model-free RL cannot learn about future expected rewards in their absence, and the RPE is therefore nil. Prior to the second, free-choice session, subjects were exposed to the rewards that would be available at the outcome states. Our subjects demonstrated the essential model-based RL competence of combining the information about the structure of the state space with the reward information, by making more optimal choices at the beginning of the free-choice session than would have been expected by chance (p<0.05, sign test, one-tailed).
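
For readers unfamiliar with the test named above, a one-tailed sign test reduces to a binomial tail probability under a chance rate of 0.5. The sketch below shows the arithmetic only. The count `n_better` is a made-up illustrative value, not data from the study, and treating all 18 subjects as non-tied is likewise an assumption.

```python
from math import comb

def sign_test_one_tailed(n_better, n_total):
    """Return P(X >= n_better) for X ~ Binomial(n_total, 0.5).

    n_total should count only non-tied subjects; ties are dropped
    before a sign test is applied.
    """
    tail = sum(comb(n_total, k) for k in range(n_better, n_total + 1))
    return tail / 2 ** n_total

n_subjects = 18   # sample size reported in the abstract
n_better = 13     # HYPOTHETICAL count of subjects choosing above chance
p_value = sign_test_one_tailed(n_better, n_subjects)
```

Because the computation uses exact integer binomial coefficients, it needs no statistics library.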

In order to assess the neural signatures of these two error signals, we formalized the computational approaches as trial-by-trial mathematical models and determined free parameters by fitting the models to behavioral choices. An RPE and an SPE were derived from a model-free SARSA learner and a model-based FORWARD (model) learner, respectively. Choices from the FORWARD learner were computed using dynamic programming. A combined model was derived by weighting the choice preferences of SARSA and FORWARD; the relative influence of the latter was found to decrease over trials. The trial-by-trial reward and state error signals derived from the two model components were included in the analysis of the imaging data in order to seek their correlations with neural signals.
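
The dynamic-programming evaluation and the weighted combination described above might look roughly like the following sketch. It is not the authors' implementation. The exponential decay of the FORWARD weight, the softmax choice rule, the parameters `w0`, `decay` and `beta`, and the transition probabilities and rewards of the toy two-step task are all assumptions made for illustration.

```python
import numpy as np

def forward_q_values(T, R, gamma=1.0, n_sweeps=10):
    """Value iteration on a transition model.

    Q(s, a) = sum over s' of T[s, a, s'] * (R[s'] + gamma * max_a' Q(s', a')).
    Outcome states have all-zero rows in T, so their values stay at zero.
    """
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_sweeps):
        V = Q.max(axis=1)            # best achievable value per state
        Q = T @ (R + gamma * V)      # expectation over successor states
    return Q

def hybrid_choice_probs(Q_fwd, Q_sarsa, s, trial, w0=1.0, decay=0.1, beta=1.0):
    """Softmax over a weighted mix of FORWARD and SARSA action values."""
    w = w0 * np.exp(-decay * trial)  # ASSUMED: FORWARD weight decays exponentially
    q = w * Q_fwd[s] + (1.0 - w) * Q_sarsa[s]
    z = beta * (q - q.max())         # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# HYPOTHETICAL task: state 0 = start, 1-2 = intermediate, 3-5 = outcomes.
T = np.zeros((6, 2, 6))
T[0, 0, [1, 2]] = [0.7, 0.3]
T[0, 1, [1, 2]] = [0.3, 0.7]
T[1, 0, [3, 4]] = [0.7, 0.3]
T[1, 1, [4, 5]] = [0.7, 0.3]
T[2, 0, [3, 5]] = [0.7, 0.3]
T[2, 1, [4, 5]] = [0.3, 0.7]
R = np.array([0.0, 0.0, 0.0, 0.0, 10.0, 25.0])  # reward on entering each state

Q_fwd = forward_q_values(T, R)
Q_sarsa = np.zeros((6, 2))           # an untrained model-free learner
p_early = hybrid_choice_probs(Q_fwd, Q_sarsa, s=0, trial=0)
p_late = hybrid_choice_probs(Q_fwd, Q_sarsa, s=0, trial=100)
```

With an untrained SARSA component, early choices in this sketch follow the FORWARD values, and the preference flattens toward chance as the FORWARD weight decays. In a fitted model the SARSA values would have been learned by then and would take over.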

We found evidence of a neural state prediction error in addition to the previously well-characterized RPE. The SPE was present bilaterally in intraparietal sulcus (IPS) and lateral prefrontal cortex (latPFC), and was clearly dissociable from the RPE located predominantly in ventral striatum (all regions p<0.05, whole-brain correction). Importantly, the left latPFC and right IPS also correlated with the SPE during the non-rewarded first session, underlining their importance in pure state space learning. These findings provide evidence for the existence of two unique forms of learning signals in humans, which may form the basis of distinct computational strategies for guiding behavior.

Details

Language(s):

Dates: Published Online: 2009-01

Publication Status: Published online

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.3389/conf.neuro.06.2009.03.270

Degree: -

Event

Title: Computational and Systems Neuroscience Meeting (COSYNE 2009)

Place of Event: Salt Lake City, UT, USA

Start-/End Date: 2009-02-26 - 2009-03-03

Source 1

Title: Frontiers in Systems Neuroscience

Abbreviation: Front Syst Neurosci

Source Genre: Journal

Creator(s):

Affiliations:

Publ. Info: Lausanne, Switzerland : Frontiers Research Foundation

Pages: -

Volume / Issue: 2009 (Conference Abstracts: Computational and Systems Neuroscience)

Sequence Number: I-12

Start / End Page: 52 - 53

Identifier: ISSN: 1662-5137
CoNE: https://pure.mpg.de/cone/journals/resource/1662-5137