
Item Details


Released

Journal Article

Prospective and retrospective temporal difference learning

MPS-Authors
No MPG-Authors are available for this publication.
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)
There is no public fulltext available.
Supplementary Material (public)
There is no public supplementary material available
Citation

Dayan, P. (2009). Prospective and retrospective temporal difference learning. Network: Computation in Neural Systems, 20(1), 32-46. doi:10.1080/09548980902759086.


Cite as: https://hdl.handle.net/21.11116/0000-0002-CB86-2
Abstract
A striking recent finding is that monkeys behave maladaptively in a class of tasks in which they know that reward is going to be systematically delayed. This may be explained by a malign Pavlovian influence arising from states with low predicted values. However, by very carefully analyzing behavioral data from such tasks, La Camera and Richmond (2008) observed the additional important characteristic that subjects perform differently on states in the task that are at equal distances from the future reward, depending on what has happened in the recent past. The authors pointed out that this violates the definition of state value in the standard reinforcement learning models that are ubiquitous as accounts of operant and classical conditioned behavior; they suggested and analyzed an alternative temporal difference (TD) model in which past and future are melded. Here, we show that, in fact, a standard TD model can actually exhibit the same behavior, and that this avoids deleterious consequences for choice. At the heart of the model is the average reward per step, which acts as a baseline for measuring immediate rewards. Relatively subtle changes to this baseline occasioned by the past can markedly influence predictions and thus behavior.
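The mechanism the abstract highlights is average-reward temporal difference learning, in which the running estimate of the average reward per step serves as a baseline subtracted from each immediate reward, so experience in the recent past can shift that baseline and thereby alter predictions for states equally distant from reward. The snippet below is a minimal sketch of tabular average-reward TD(0) under that assumption; the function name, parameters, and data format are illustrative and not taken from the paper or its model.

```python
import numpy as np

def average_reward_td(transitions, n_states, alpha=0.1, eta=0.01):
    """Tabular average-reward TD(0) sketch (illustrative, not the paper's model).

    `transitions` is an iterable of (state, reward, next_state) tuples,
    with states encoded as integers in [0, n_states).
    """
    V = np.zeros(n_states)   # state values measured relative to the average reward
    rho = 0.0                # running estimate of the average reward per step

    for s, r, s_next in transitions:
        # The TD error uses rho as a baseline for the immediate reward, so
        # recent history that nudges rho also nudges the value predictions.
        delta = r - rho + V[s_next] - V[s]
        V[s] += alpha * delta
        rho += eta * delta   # slowly track the average reward per step
    return V, rho
```

In this kind of formulation, states at the same distance from a future reward can carry different values purely because the baseline rho reflects what has happened recently, which is the qualitative behavior the abstract describes.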