The role of prefrontal cortex and basal ganglia in model-based and model-free reinforcement learning


Miranda, B., Malalasekera, N., & Dayan, P. (2013). The role of prefrontal cortex and basal ganglia in model-based and model-free reinforcement learning. Poster presented at 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2013), Princeton, NJ, USA.

Cite as: https://hdl.handle.net/21.11116/0000-0004-DAF7-0
Animals can learn to influence their environment either by exploiting stimulus-response associations that have been productive in the past, or by predicting the likely worth of actions in the future based on their causal relationships with outcomes. These strategies, respectively model-free (MF) and model-based (MB), are supported by structures including midbrain dopaminergic neurons, the striatum, and prefrontal cortex (PFC), but it is not clear how these structures interact to realize the two types of reinforcement learning (RL).
We trained rhesus monkeys to perform a two-stage Markov decision task that induces a combination of MB
and MF behavior. The task starts with a choice between two options. Each of these is more often associated
with one of two second-stage states with probabilities that are fixed throughout the experiment. A second
two-option choice is required in order to obtain one of three different levels of reward. These second-stage
outcomes change independently, according to a random walk, and thus induce exploration.
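The task structure described above can be sketched in code. This is a hypothetical illustration only: the common-transition probability, the coding of the three reward levels, and the random-walk step are assumptions for the sketch, since the abstract does not report the actual task parameters.

```python
import random

COMMON_P = 0.7             # assumed probability of the more frequent transition
REWARD_LEVELS = [0, 1, 2]  # three reward levels (assumed coding)

class TwoStageTask:
    """Illustrative two-stage Markov decision task."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # Reward level currently attached to each second-stage option
        # (2 states x 2 options), drifting independently across trials.
        self.reward_idx = [[1, 1], [1, 1]]

    def transition(self, first_choice):
        # Each first-stage option leads more often to one of the two
        # second-stage states; these probabilities are fixed throughout.
        if self.rng.random() < COMMON_P:
            return first_choice        # common transition
        return 1 - first_choice        # rare transition

    def outcome(self, state, second_choice):
        # Second-stage choice yields one of three reward levels.
        return REWARD_LEVELS[self.reward_idx[state][second_choice]]

    def drift(self):
        # Independent random walks over reward levels induce exploration.
        for s in range(2):
            for c in range(2):
                step = self.rng.choice([-1, 0, 1])
                self.reward_idx[s][c] = min(2, max(0, self.reward_idx[s][c] + step))
```

A trial consists of a first-stage choice, a probabilistic transition, a second-stage choice and reward, followed by a drift step before the next trial.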
A descriptive analysis of our behavioral data shows that the immediate reward history (of MF and MB
importance) and the interaction between reward history and the structure of the task (of MB importance) both
significantly influenced stage one choices. On the other hand, only the immediate reward history seemed
to influence reaction time. When we performed a trial-by-trial computational analysis on our data using
different RL algorithms, we found that in the model that best fit the data, choices were made according to a
weighted combination of MF-RL and MB-RL action values (with a weight for MB-RL of 84.3 ± 3.2%).
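The weighted-combination rule in the best-fitting model can be written as a short sketch. The MB weight below matches the reported 84.3% figure; the softmax inverse temperature is an illustrative assumption, not a parameter reported in the abstract.

```python
import math

def hybrid_choice_probs(q_mf, q_mb, w=0.843, beta=3.0):
    """Softmax choice probabilities over Q_net = w * Q_MB + (1 - w) * Q_MF.

    w    -- weight on model-based values (0.843 per the reported fit)
    beta -- inverse temperature (assumed value for illustration)
    """
    q_net = [w * mb + (1 - w) * mf for mf, mb in zip(q_mf, q_mb)]
    exps = [math.exp(beta * q) for q in q_net]
    z = sum(exps)
    return [e / z for e in exps]
```

With a weight this close to 1, first-stage choices are dominated by the model-based values while still retaining a model-free contribution.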
Our behavioral findings support a more integrated view of MF and MB learning strategies. They also illuminate the way that the vigor of responding relates to the average rate of reward delivery. Neurophysiological
recordings are currently being performed in subregions of PFC and the striatum during task performance.