  Models and Methods for Reinforcement Learning

Dayan, P., & Nakahara, H. (2018). Models and Methods for Reinforcement Learning. In J. T. Wixted & E.-J. Wagenmakers (Eds.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience (4th ed., Vol. 5: Methodology, pp. 1-40). Hoboken, NJ, USA: Wiley. doi:10.1002/9781119170174.epcn513.


Basic information

Item permalink: https://hdl.handle.net/21.11116/0000-0007-4A48-6
Version permalink: https://hdl.handle.net/21.11116/0000-0007-4A4A-4
Genre: Book chapter

Related URLs

Description: -
OA-Status: Not specified

Creators

Creators:
Dayan, P. (1, 2), Author
Nakahara, H., Author
Affiliations:
1 Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Keywords: -
Abstract: The temporal difference (TD) learning framework is a major paradigm for understanding value-based decision making and related neural activities (e.g., dopamine activity). The representation of time in neural processes modeled by a TD framework, however, is poorly understood. To address this issue, we propose a TD formulation that separates the time of the operator (neural valuation processes), which we refer to as internal time, from the time of the observer (experiment), which we refer to as conventional time. We provide the formulation and theoretical characteristics of this TD model based on internal time, called internal-time TD, and explore the possible consequences of the use of this model in neural value-based decision making. Due to the separation of the two times, internal-time TD computations, such as TD error, are expressed differently, depending on both the time frame and time unit. We examine this operator-observer problem in relation to the time representation used in previous TD models. An internal-time TD value function exhibits the co-appearance of exponential and hyperbolic discounting at different delays in intertemporal choice tasks. We further examine the effects of internal time noise on TD error, the dynamic construction of internal time, and the modulation of internal time with the internal time hypothesis of serotonin function. We also relate the internal-time TD formulation to research on interval timing and subjective time.

Details

Language: -
Dates: 2018-03, 2018
Publication status: Published
Pages: -
Publishing info: -
Table of contents: -
Review type: -
Identifiers (DOI, ISBN, etc.): DOI: 10.1002/9781119170174.epcn513
Degree: -

Source 1

Title: Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience
Source genre: Book
Creators:
Wixted, J. T., Editor
Wagenmakers, E.-J., Editor
Affiliations:
-
Publisher, place: Wiley, Hoboken, NJ, USA (4th ed.)
Pages: -
Volume / Issue: 5: Methodology
Sequence number: -
Start / End page: 1 - 40
Identifiers (ISBN, ISSN, DOI, etc.): ISBN: 978-1-119-17016-7