The Human Brain Computes Two Different Prediction Errors


Gläscher, J., Daw, N., Dayan, P., & O’Doherty, J. (2009). The Human Brain Computes Two Different Prediction Errors. Poster presented at Computational and Systems Neuroscience Meeting (COSYNE 2009), Salt Lake City, UT, USA. doi:10.3389/conf.neuro.06.2009.03.270.

Cite as: https://hdl.handle.net/21.11116/0000-0005-0E80-B
Reinforcement learning (RL) provides a framework involving two distinct approaches to reward-based decision making: model-free RL assesses candidate actions by directly learning their expected long-term reward consequences using a reward prediction error (RPE), whereas model-based RL uses experience with the sequential occurrence of situations (‘states’) to build a model of the state transition and outcome structure of the environment and then searches forward in it to evaluate actions. This latter, model-based approach requires a state prediction error (SPE), which trains predictions about the transitions between different states in the world rather than about summed future rewards.
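The two error signals can be contrasted in their simplest update-rule form. The following is an illustrative sketch, not the poster's actual model: all function names, the tabular representations, and the learning-rate values are assumptions, and the SPE is written in its common form as the surprise (1 minus the predicted probability) of the observed transition.

```python
def rpe_update(q, state, action, reward, next_value, alpha=0.1, gamma=0.9):
    """Model-free temporal-difference update: the reward prediction error
    (RPE) trains an action-value estimate q[(state, action)]."""
    rpe = reward + gamma * next_value - q[(state, action)]
    q[(state, action)] += alpha * rpe
    return rpe

def spe_update(T, state, action, next_state, eta=0.1):
    """Model-based update: the state prediction error (SPE) trains the
    transition model T[state][action][next_state]. Here SPE = 1 - T(s'|s,a),
    the surprise at observing the transition that actually occurred."""
    spe = 1.0 - T[state][action][next_state]
    # Shift probability mass toward the observed successor state.
    for s_prime in T[state][action]:
        target = 1.0 if s_prime == next_state else 0.0
        T[state][action][s_prime] += eta * (target - T[state][action][s_prime])
    return spe
```

The key conceptual difference is visible in the two error terms: the RPE compares predicted and obtained reward, while the SPE compares predicted and obtained state, independent of any reward.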

Eighteen human subjects performed a probabilistic Markov decision task while being scanned with functional magnetic resonance imaging. The task required subjects to make two sequential choices, the first leading them probabilistically into an intermediary state, and the second into one of three outcome states carrying different reward magnitudes. In an attempt to dissociate model-based from model-free RL completely, we exposed our subjects in the first scanning session only to transitions in the state space, in the complete absence of rewards or free choices. This permits a pure assessment of the model-building aspect of model-based RL, because model-free RL cannot learn about expected future rewards in their absence, and the RPE is therefore nil. Prior to the second, free-choice session, subjects were exposed to the rewards that would be available at the outcome states. Our subjects demonstrated the essential model-based RL competence of combining information about the structure of the state space with the reward information, making choices at the beginning of the free-choice session that were more optimal than expected by chance (p<0.05, sign test, one-tailed).
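A toy version of this two-stage task structure can make the design concrete. This is a hedged illustration only: the specific transition probabilities, state names, and reward magnitudes below are invented, not those used in the experiment.

```python
import random

# Illustrative two-stage Markov decision task: a first choice leads
# probabilistically to an intermediary state, a second choice leads to one
# of three outcome states with different reward magnitudes. All numbers
# here are made up for demonstration.
TRANSITIONS = {
    "start":  {"left":  [("interA", 0.7), ("interB", 0.3)],
               "right": [("interA", 0.3), ("interB", 0.7)]},
    "interA": {"left":  [("out1", 0.7), ("out2", 0.3)],
               "right": [("out2", 0.3), ("out3", 0.7)]},
    "interB": {"left":  [("out1", 0.3), ("out2", 0.7)],
               "right": [("out2", 0.7), ("out3", 0.3)]},
}
REWARDS = {"out1": 0.0, "out2": 10.0, "out3": 25.0}

def step(state, action, rng=random):
    """Sample the successor state from the task's transition probabilities."""
    successors = [s for s, _ in TRANSITIONS[state][action]]
    weights = [p for _, p in TRANSITIONS[state][action]]
    return rng.choices(successors, weights=weights)[0]
```

During a session consisting only of such `step` transitions with all rewards withheld, a learner can still acquire `TRANSITIONS` (driven by the SPE) even though the RPE remains zero throughout.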

In order to assess the neural signatures of these two error signals, we formalized the computational approaches as trial-by-trial mathematical models and determined free parameters by fitting the models to behavioral choices. An RPE and an SPE were derived from a model-free SARSA learner and a model-based FORWARD learner, respectively. Choices from the FORWARD learner were computed using dynamic programming. A combined model was derived by weighting the choice preferences of SARSA and FORWARD; the relative influence of the latter was found to decrease over trials. The trial-by-trial reward and state error signals derived from the two model components were included in the analysis of the imaging data in order to seek their correlations with neural signals.
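The hybrid choice rule described above can be sketched as a softmax over a weighted mixture of the two learners' action values, with the model-based weight declining across trials. The exponential decay form, the parameter names, and all default values below are illustrative assumptions; the poster does not specify the functional form of the weight.

```python
import math

def choice_probs(q_mf, q_mb, trial, beta=3.0, w0=0.8, decay=0.02):
    """Softmax choice probabilities over a weighted combination of
    model-free (q_mf, e.g. SARSA) and model-based (q_mb, e.g. FORWARD)
    action values. The model-based weight w decays over trials,
    mirroring the finding that FORWARD's influence decreased."""
    w = w0 * math.exp(-decay * trial)   # model-based weight, falls with trial
    combined = [w * mb + (1.0 - w) * mf for mf, mb in zip(q_mf, q_mb)]
    exps = [math.exp(beta * v) for v in combined]
    z = sum(exps)
    return [e / z for e in exps]
```

With this form, an action favored only by the model-based learner is chosen with high probability early on but drifts back toward indifference as the model-free component comes to dominate.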

We found evidence of a neural state prediction error in addition to the previously well-characterized RPE. The SPE was present bilaterally in the intraparietal sulcus (IPS) and lateral prefrontal cortex (latPFC), and was clearly dissociable from the RPE, located predominantly in the ventral striatum (all regions p<0.05, whole-brain correction). Importantly, the left latPFC and right IPS also correlated with the SPE during the non-rewarded first session, underlining their importance in pure state-space learning. These findings provide evidence for the existence of two distinct forms of learning signal in humans, which may form the basis of distinct computational strategies for guiding behavior.