



Journal Article

A model of hippocampally dependent navigation, using the temporal difference learning rule


Foster, D., Morris, R., & Dayan, P. (2000). A model of hippocampally dependent navigation, using the temporal difference learning rule. Hippocampus, 10(1), 1-16. doi:10.1002/(SICI)1098-1063(2000)10:1<1:AID-HIPO1>3.0.CO;2-1.

Cite as: https://hdl.handle.net/21.11116/0000-0002-D4EF-2
This paper presents a model of how hippocampal place cells might be used for spatial navigation in two watermaze tasks: the standard reference memory task and a delayed matching-to-place task. In the reference memory task, the escape platform occupies a single location and rats gradually learn relatively direct paths to the goal over the course of days, in each of which they perform a fixed number of trials. In the delayed matching-to-place task, the escape platform occupies a novel location on each day, and rats gradually acquire one-trial learning, i.e., direct paths on the second trial of each day. The model uses a local, incremental, and statistically efficient connectionist algorithm called temporal difference learning in two distinct components. The first is a reinforcement-based "actor-critic" network that is a general model of classical and instrumental conditioning. In this case, it is applied to navigation, using place cells to provide information about state. By itself, the actor-critic can learn the reference memory task, but this learning is inflexible to changes to the platform location. We argue that one-trial learning in the delayed matching-to-place task demands a goal-independent representation of space. This is provided by the second component of the model: a network that uses temporal difference learning and self-motion information to acquire consistent spatial coordinates in the environment. Each component of the model is necessary at a different stage of the task; the actor-critic provides a way of transferring control to the component that performs best. The model successfully captures gradual acquisition in both tasks, and, in particular, the ultimate development of one-trial learning in the delayed matching-to-place task. Place cells report a form of stable, allocentric information that is well-suited to the various kinds of learning in the model.
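As a rough illustration of the first component described above, the following is a minimal sketch of one temporal difference update in an actor-critic that reads its state from place cells. It is not the authors' implementation: the Gaussian place fields, the 5x5 grid of field centres, the eight heading actions, the softmax action selection, and every constant and function name (`place_activity`, `td_step`, and so on) are assumptions made for illustration. The second component (learning spatial coordinates from self-motion) is not covered.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
N_ACTIONS = 8      # hypothetical: eight compass headings
SIGMA = 0.3        # hypothetical place-field width
GAMMA = 0.95       # discount factor
LR_CRITIC = 0.1    # critic learning rate
LR_ACTOR = 0.1     # actor learning rate

# Hypothetical place-field centres on a 5x5 grid covering [-1, 1] x [-1, 1].
CENTRES = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(5) for j in range(5)]
N_CELLS = len(CENTRES)


def place_activity(pos):
    """Gaussian place-cell activity vector for a 2-D position."""
    x, y = pos
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in CENTRES]


def value(w, f):
    """Critic: linear value estimate from place-cell activity."""
    return sum(wi * fi for wi, fi in zip(w, f))


def action_probs(z, f):
    """Actor: softmax over linear action preferences."""
    prefs = [sum(zi * fi for zi, fi in zip(row, f)) for row in z]
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_step(w, z, pos, action, reward, next_pos, terminal):
    """Apply one TD update to critic weights w and actor weights z in place.

    Returns the TD error: reward + GAMMA * V(next) - V(current).
    """
    f = place_activity(pos)
    v_next = 0.0 if terminal else value(w, place_activity(next_pos))
    delta = reward + GAMMA * v_next - value(w, f)
    for i, fi in enumerate(f):
        w[i] += LR_CRITIC * delta * fi          # critic update
        z[action][i] += LR_ACTOR * delta * fi   # reinforce the action taken
    return delta


# Usage: a single rewarded step that ends the trial at the platform.
w = [0.0] * N_CELLS
z = [[0.0] * N_CELLS for _ in range(N_ACTIONS)]
delta = td_step(w, z, (0.0, 0.0), 2, 1.0, (0.1, 0.0), True)
```

In this sketch a positive TD error raises both the value estimate near the rewarded position and the preference for the action taken there. Because the learned values and preferences are tied to one platform location, moving the platform requires relearning, which is the inflexibility the abstract attributes to the actor-critic on its own.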