Models and Methods for Reinforcement Learning

Dayan, P; Nakahara, H

doi:10.1002/9781119170174.epcn513


Dayan, P., & Nakahara, H. (2018). Models and Methods for Reinforcement Learning. In J. Wixted & E.-J. Wagenmakers (Eds.), Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience (4th ed., pp. 1-40). Hoboken, NJ, USA: Wiley. doi:10.1002/9781119170174.epcn513.

Item is Released


Basic


Item Permalink: https://hdl.handle.net/21.11116/0000-0007-4A48-6

Version Permalink: https://hdl.handle.net/21.11116/0000-0007-4A4A-4

Genre: Book Chapter


Locators


Locator:
https://onlinelibrary.wiley.com/doi/10.1002/9781119170174.epcn513 (Publisher version) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators


Creators:
Dayan, P^{1, 2}, Author
Nakahara, H, Author

Affiliations:
1 Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content


Free keywords: -

Abstract: The temporal difference (TD) learning framework is a major paradigm for understanding value-based decision making and related neural activities (e.g., dopamine activity). The representation of time in neural processes modeled by a TD framework, however, is poorly understood. To address this issue, we propose a TD formulation that separates the time of the operator (neural valuation processes), which we refer to as internal time, from the time of the observer (experiment), which we refer to as conventional time. We provide the formulation and theoretical characteristics of this TD model based on internal time, called internal-time TD, and explore the possible consequences of the use of this model in neural value-based decision making. Due to the separation of the two times, internal-time TD computations, such as TD error, are expressed differently, depending on both the time frame and time unit. We examine this operator-observer problem in relation to the time representation used in previous TD models. An internal-time TD value function exhibits the co-appearance of exponential and hyperbolic discounting at different delays in intertemporal choice tasks. We further examine the effects of internal-time noise on TD error, the dynamic construction of internal time, and the modulation of internal time with the internal time hypothesis of serotonin function. We also relate the internal-time TD formulation to research on interval timing and subjective time.

Details


Language(s):

Dates: Published Online: 2018-03; Date issued: 2018

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.1002/9781119170174.epcn513

Degree: -

Source 1


Title: Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience

Source Genre: Book

Creator(s):
Wixted, JT, Editor
Wagenmakers, E-J, Editor

Affiliations:
-

Publ. Info: Hoboken, NJ, USA: Wiley, 4th ed.

Pages: -

Volume / Issue: 5: Methodology

Sequence Number: -

Start / End Page: 1 - 40

Identifier: ISBN: 978-1-119-17016-7