A Local Temporal Difference Code for Distributional Reinforcement Learning

Tano, P; Dayan, P; Pouget, A

Tano, P., Dayan, P., & Pouget, A. (2021). A Local Temporal Difference Code for Distributional Reinforcement Learning. Poster presented at Computational and Systems Neuroscience Meeting (COSYNE 2021).

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0007-F694-C
Version Permalink: https://hdl.handle.net/21.11116/0000-000A-F737-2

Genre: Poster

Locators

Locator:
http://www.cosyne.org/cosyne21/Cosyne2021_program_book.pdf (Abstract) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators

Creators:
Tano, P, Author
Dayan, P¹, Author
Pouget, A, Author

Affiliations:
¹ Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Free keywords: -

Abstract: Recent theoretical and experimental results suggest that the dopamine system implements distributional temporal difference backups, allowing it to learn the entire distribution of the long-run values of states rather than just their expected values. However, the distributional codes explored so far depend on a complex imputation step that crucially relies on spatial non-locality: to compute reward prediction errors, units must know not only their own state but also the states of the other units. It is far from clear how such steps could be implemented in realistic neural circuits. Here, we introduce the Laplace code: a local temporal difference code for distributional reinforcement learning that is representationally powerful and computationally straightforward. The code decomposes value distributions and prediction errors across three separate dimensions: reward magnitude (related to distributional quantiles), temporal discounting (related to the Laplace transform of future rewards) and time horizon (related to eligibility traces). Besides lending itself to a local learning rule, the decomposition recovers the temporal evolution of the immediate reward distribution, indicating all possible rewards at all future times. This increases representational capacity and allows for temporally flexible computations that immediately adjust to changing horizons or discount factors.
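To make the reward-magnitude and discount dimensions concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each unit is indexed by a reward threshold theta and a discount gamma and learns V(s) = E[sum_tau gamma^tau H(r_{t+tau} - theta)], where H is the Heaviside step. Under that assumption the prediction error H(r - theta) + gamma V(s') - V(s) uses only the unit's own parameters and value table, so the rule is local. Because each unit's value is a discounted sum over future steps, a set of discounts samples a discrete Laplace-type transform of P(r_{t+tau} > theta), and inverting across gamma recovers that probability at each future step tau. The toy three-state chain, the functions `train_laplace_code` and `recover_exceedance`, and the exact Vandermonde solve are all hypothetical illustrations. The horizon dimension is omitted.

```python
import numpy as np

# Hypothetical toy task (not from the poster): states visited in order
# 0 -> 1 -> 2 -> terminal. Leaving state 0 pays 0, leaving state 1 pays 1
# with probability 0.5 (else 0), leaving state 2 pays 2.
N_STATES = 3


def sample_reward(state, rng):
    if state == 0:
        return 0.0
    if state == 1:
        return float(rng.random() < 0.5)
    return 2.0


def train_laplace_code(thetas, gammas, n_episodes=20_000, seed=0):
    """Tabular TD learning for a grid of units indexed by (theta, gamma).

    Assumed unit definition: V(s) = E[sum_tau gamma**tau * H(r_{t+tau} - theta)].
    Each unit's prediction error uses only its own theta, gamma and value table.
    """
    rng = np.random.default_rng(seed)
    thetas = np.asarray(thetas, dtype=float)[:, None]  # shape (n_theta, 1)
    gammas = np.asarray(gammas, dtype=float)[None, :]  # shape (1, n_gamma)
    # Last state index is the terminal state, whose value stays 0.
    V = np.zeros((thetas.shape[0], gammas.shape[1], N_STATES + 1))
    visits = np.zeros(N_STATES)
    for _ in range(n_episodes):
        for s in range(N_STATES):
            r = sample_reward(s, rng)
            s_next = s + 1
            # Thresholded reward plus this unit's own discounted bootstrap.
            target = (r > thetas).astype(float) + gammas * V[:, :, s_next]
            delta = target - V[:, :, s]  # one prediction error per unit
            visits[s] += 1
            V[:, :, s] += delta / visits[s]  # 1/n step size (running average)
    return V[:, :, :N_STATES]


def recover_exceedance(values, gammas):
    """Invert the discount dimension for one threshold and one state.

    values[k] = sum_tau gammas[k]**tau * p[tau], with p[tau] = P(r_{t+tau} > theta).
    With as many discounts as remaining steps this is a square Vandermonde system.
    """
    gammas = np.asarray(gammas, dtype=float)
    A = np.vander(gammas, N=len(gammas), increasing=True)  # A[k, tau] = gamma_k**tau
    return np.linalg.solve(A, values)


if __name__ == "__main__":
    gammas = [0.3, 0.6, 0.9]
    thetas = [0.5, 1.5]
    V = train_laplace_code(thetas, gammas)
    for i, theta in enumerate(thetas):
        p = recover_exceedance(V[i, :, 0], gammas)
        print(f"theta={theta}: P(r_t+tau > theta) for tau=0,1,2 ->", np.round(p, 2))
```

From state 0 in this toy task, the recovered exceedance probabilities should come out close to [0, 0.5, 1] for theta = 0.5 and close to [0, 0, 1] for theta = 1.5, which is the per-step reward distribution the abstract refers to. The exact solve is only reasonable here because there are three discounts and three steps. Longer horizons would likely need a regularized inverse, since Vandermonde systems become ill-conditioned quickly. Once p[tau] is available, a value for a discount or horizon that was never trained can be assembled as a weighted sum over tau. This is one reading of the claim about temporally flexible computation.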

Details

Language(s):

Dates: Published Online: 2021-03

Publication Status: Published online

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: -

Degree: -

Event

Title: Computational and Systems Neuroscience Meeting (COSYNE 2021)

Place of Event: -

Start-/End Date: 2021-03-23 - 2021-03-26

Source 1

Title: Computational and Systems Neuroscience Meeting (COSYNE 2021)

Source Genre: Proceedings

Creator(s):

Affiliations:

Publ. Info: -

Pages: - Volume / Issue: - Sequence Number: 2-088 Start / End Page: 143 - 144 Identifier: -