  A local temporal difference code for distributional reinforcement learning

Tano, P., Dayan, P., & Pouget, A. (in press). A local temporal difference code for distributional reinforcement learning. In Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020).


Basic data

Genre: Conference paper

External references

Description: -
OA status:

Creators

Creators:
Tano, P., Author
Dayan, P. (1, 2), Author
Pouget, A., Author
Affiliations:
1 Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468
2 Max Planck Institute for Biological Cybernetics, Max Planck Society, Spemannstrasse 38, 72076 Tübingen, DE, ou_1497794

Content

Keywords: -
Abstract: Recent theoretical and experimental results suggest that the dopamine system implements distributional temporal difference backups, allowing learning of the entire distributions of the long-run values of states rather than just their expected values. However, the distributional codes explored so far rely on a complex imputation step which crucially depends on spatial non-locality: in order to compute reward prediction errors, units must know not only their own state but also the states of the other units. It is far from clear how these steps could be implemented in realistic neural circuits. Here, we propose a local temporal difference code for distributional reinforcement learning that is representationally powerful and computationally straightforward. The code decomposes value distributions and prediction errors across three completely separated dimensions: reward magnitude (related to distributional quantiles), time horizon (related to eligibility traces) and temporal discounting (related to the Laplace transform of future immediate rewards). Besides lending itself to a local learning rule, the decomposition can be exploited by model-based computations, for instance allowing immediate adjustments to changing horizons or discount factors. Finally, we show that our code can be computed linearly from an ensemble of successor representations with multiple temporal discounts which, according to a recent proposal, might be implemented in the hippocampus.
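The locality idea in the abstract can be illustrated with a generic quantile-regression-style TD update in which each unit, indexed by a quantile level and a discount factor, updates using only its own estimate. This is a minimal sketch under assumed names and a toy random-walk environment, not the paper's actual model or code; the grid of quantile levels and discounts merely echoes the "reward magnitude" and "temporal discounting" axes described above.

```python
import numpy as np

# Illustrative sketch only: a quantile-style local TD rule with a grid of
# discount factors. All names and the toy environment are assumptions.
rng = np.random.default_rng(0)

n_states = 5
taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])   # quantile levels ("reward magnitude" axis)
gammas = np.array([0.5, 0.8, 0.95])            # discount factors ("temporal discounting" axis)

# One value estimate per (state, quantile level, discount): V[s, i, j]
V = np.zeros((n_states, len(taus), len(gammas)))
alpha = 0.05

def local_td_update(V, s, r, s_next, taus, gammas, alpha):
    """Each (tau, gamma) unit updates from its own estimate alone:
    no cross-unit imputation step is required."""
    for i, tau in enumerate(taus):
        for j, g in enumerate(gammas):
            delta = r + g * V[s_next, i, j] - V[s, i, j]
            # Asymmetric (quantile-regression-style) step:
            # move up by alpha*tau if delta >= 0, down by alpha*(1-tau) otherwise.
            V[s, i, j] += alpha * (tau - (delta < 0))
    return V

# Toy rollout: uniform random walk with state-dependent stochastic rewards.
s = 0
for _ in range(20000):
    s_next = rng.integers(n_states)
    r = rng.normal(loc=s_next, scale=1.0)
    V = local_td_update(V, s, r, s_next, taus, gammas, alpha)
    s = s_next
```

After training, the estimates are (approximately) ordered along the quantile axis: units with higher tau settle on larger values, giving a crude picture of the return distribution at each discount.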

Details

Language(s):
Date: 2020-10
Publication status: Accepted
Pages: -
Place, publisher, edition: -
Table of contents: -
Type of review: -
Identifiers: -
Type of degree: -

Event

Title: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Venue: -
Start/end date: 2020-12-06 - 2020-12-12

Decision

Project information

Source 1

Title: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Source genre: Conference proceedings
Creators:
Affiliations:
Place, publisher, edition: -
Pages: -
Volume / issue: -
Article number: -
Start / end page: -
Identifier: -