Models and Methods for Reinforcement Learning

Dayan, P; Nakahara, H

doi:10.1002/9781119170174.epcn513

A new version of this item is available:
https://pure.mpg.de/pubman/item/item_3260644_4

Dayan, P., & Nakahara, H. (2018). Models and Methods for Reinforcement Learning. In J. T. Wixted & E.-J. Wagenmakers (Eds.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience (4th ed., pp. 1-40). Hoboken, NJ, USA: Wiley. doi:10.1002/9781119170174.epcn513.

Item is released

Basic information

Item permalink: https://hdl.handle.net/21.11116/0000-0007-4A48-6
Version permalink: https://hdl.handle.net/21.11116/0000-0007-4A4A-4

Genre: Book chapter

Creators

Creators:
Dayan, P^{1, 2}, Author
Nakahara, H, Author

Affiliations:
1 Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Keywords: -

Abstract: The temporal difference (TD) learning framework is a major paradigm for understanding value-based decision making and related neural activities (e.g., dopamine activity). The representation of time in neural processes modeled by a TD framework, however, is poorly understood. To address this issue, we propose a TD formulation that separates the time of the operator (neural valuation processes), which we refer to as internal time, from the time of the observer (experiment), which we refer to as conventional time. We provide the formulation and theoretical characteristics of this TD model based on internal time, called internal-time TD, and explore the possible consequences of using this model in neural value-based decision making. Because the two times are separated, internal-time TD computations, such as the TD error, are expressed differently depending on both the time frame and the time unit. We examine this operator-observer problem in relation to the time representation used in previous TD models. An internal-time TD value function exhibits the co-appearance of exponential and hyperbolic discounting at different delays in intertemporal choice tasks. We further examine the effects of internal-time noise on TD error, the dynamic construction of internal time, and the modulation of internal time in light of the internal-time hypothesis of serotonin function. We also relate the internal-time TD formulation to research on interval timing and subjective time.
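
As a rough illustration of the quantities the abstract names, the sketch below computes a standard TD(0) error in conventional time and evaluates the textbook exponential and hyperbolic discount curves. It is not the internal-time formulation described above; the function names and the parameter values `gamma` and `k` are assumptions chosen only for illustration.

```python
def td_error(reward, value_now, value_next, gamma=0.9):
    """Standard TD(0) error: delta = r + gamma * V(s') - V(s).

    gamma is an assumed per-step discount factor in conventional time.
    """
    return reward + gamma * value_next - value_now


def exponential_discount(delay, gamma=0.9):
    """Exponential discounting: weight = gamma ** delay."""
    return gamma ** delay


def hyperbolic_discount(delay, k=0.5):
    """Hyperbolic discounting: weight = 1 / (1 + k * delay).

    k is an assumed discount-rate parameter.
    """
    return 1.0 / (1.0 + k * delay)


if __name__ == "__main__":
    # TD error for a rewarded transition into a state valued at zero.
    print("TD error:", td_error(reward=1.0, value_now=0.5, value_next=0.0))
    # Tabulate both discount curves at a few delays for comparison.
    for delay in (0, 1, 5, 20):
        print(delay, exponential_discount(delay), hyperbolic_discount(delay))
```

Both curves equal 1 at zero delay and decrease with delay, but with different shapes; the abstract's claim is that a single internal-time TD value function can show both forms at different delays, which this conventional-time sketch does not attempt to reproduce.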

Details

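Language:

Dates: Published online: 2018-03; Published: 2018

Publication status: Published

Pages: -

Publishing info: -

Table of contents: -

Peer review: -

Identifiers (DOI, ISBN, etc.): DOI: 10.1002/9781119170174.epcn513

Degree: -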

Source 1

Title: Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience

Source genre: Book

Creators:
Wixted, JT, Editor
Wagenmakers, E-J, Editor

Affiliations:
-

Publisher, place: Hoboken, NJ, USA: Wiley, 4th ed.

Pages: -
Volume / issue: 5: Methodology
Sequence number: -
Start / end page: 1 - 40
Identifiers (ISBN, ISSN, DOI, etc.): ISBN: 978-1-119-17016-7
