Automatic Task Decomposition using Compositional Reinforcement Learning

Tano, P; Dayan, P; Pouget, A

Tano, P., Dayan, P., & Pouget, A. (2022). Automatic Task Decomposition using Compositional Reinforcement Learning. Poster presented at Computational and Systems Neuroscience Meeting (COSYNE 2022), Lisboa, Portugal.

Item is Released

Basic data

Item permalink: https://hdl.handle.net/21.11116/0000-000A-0340-A
Version permalink: https://hdl.handle.net/21.11116/0000-000C-9149-E

Genre: Poster

Files

External references

External reference:
https://static1.squarespace.com/static/6102ca347474c263c40150cd/t/62325b5f6dbf95289c4472e3/1647467367870/Cosyne2022_program_book.pdf (Abstract)
Open access status unknown

Description:
-

OA status:
Not specified

Creators

Creators:
Tano, P, Author
Dayan, P¹, Author
Pouget, A, Author

Affiliations:
¹ Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Keywords: -

Abstract: Decomposing complex tasks into their simpler components is often the only way for animals to make any meaningful progress at all. We show that reusing the traditional reward prediction error machinery at multiple hierarchical levels allows complex tasks to be automatically decomposed in a compositional manner, leading to fast and flexible reinforcement learning. In this compositional reinforcement learning (CRL) framework, the agent computes a set of predictions for each state in the form of hierarchically organized general value functions (GVFs). Level 0 GVFs predict whether continuing straight along cardinal directions in the state space will lead to a rewarded location, while a level P GVF predicts whether the same simple straight-ahead policy leads to any location with a high value in any of the level P-1 GVFs. Learning involves two steps: (1) learning the mapping from states to GVFs and (2) learning the policy from the GVFs. These steps are fast in environments with natural cardinal directions and strong compositional structure. Learning the mapping from states to the GVFs with TD learning is fast because it involves simple policies which have low entropy in their outcomes and are able to efficiently explore the state space, while learning the mapping from GVFs to policy is greatly simplified by the compositional structure of the GVFs and the simple mapping from the cardinal directions to available actions. In rapidly changing environments, as is typical for the real world, CRL leads to remarkably fast learning. For instance, CRL vastly outperforms traditional approaches in a maze task in which the maze changes frequently, or when learning to reach, with a robotic arm, for an object whose location varies over trials. This work provides a biologically plausible framework to study task decomposition in animals confronted with rapidly changing environments.
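
Illustrative sketch (not the authors' implementation): the hierarchy of "go straight" predictions described in the abstract could look roughly like the following on a small grid. Everything here is an assumption made for illustration: the function name learn_gvfs, the grid layout, the learning rate alpha, the threshold thresh for a "high" value, the undiscounted TD(0) update, and the choice to learn the levels one after another rather than concurrently.

```python
import numpy as np

# Cardinal directions on a grid: up, down, left, right (illustrative choice).
DIRS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def learn_gvfs(walls, reward, levels=2, sweeps=50, alpha=0.5, thresh=0.5):
    """TD(0)-style learning of hierarchical 'go straight' predictions.

    gvf[p, d, r, c] estimates whether moving straight in direction d from
    cell (r, c) reaches a target cell: a rewarded cell for p == 0, or a cell
    with a high level p-1 prediction (in any direction) for p > 0.
    """
    H, W = walls.shape
    gvf = np.zeros((levels, len(DIRS), H, W))
    for p in range(levels):
        if p == 0:
            target = reward.astype(float)
        else:
            # Assumption: "high value" means above a fixed threshold.
            target = (gvf[p - 1].max(axis=0) > thresh).astype(float)
        for _ in range(sweeps):
            for d, (dr, dc) in enumerate(DIRS):
                for r in range(H):
                    for c in range(W):
                        if walls[r, c]:
                            continue
                        nr, nc = r + dr, c + dc
                        blocked = not (0 <= nr < H and 0 <= nc < W) or walls[nr, nc]
                        if blocked:
                            td_target = 0.0          # straight path ends here
                        elif target[nr, nc]:
                            td_target = 1.0          # next cell is a target
                        else:
                            td_target = gvf[p, d, nr, nc]  # bootstrap
                        gvf[p, d, r, c] += alpha * (td_target - gvf[p, d, r, c])
    return gvf

# Hypothetical 5x5 open grid with a single rewarded cell in the top-right corner.
walls = np.zeros((5, 5), dtype=bool)
reward = np.zeros((5, 5), dtype=bool)
reward[0, 4] = True
gvf = learn_gvfs(walls, reward)
```

In this toy setting, the bottom-left cell has no straight path to the reward, so its level 0 predictions stay at zero, while its level 1 predictions for "up" and "right" become high because those straight paths end next to cells from which a single straight move reaches the reward. A policy could then be read off by following the direction with the highest prediction at the lowest level that is informative; that second step is not shown here.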

Details

Language(s):

Date: Published online: 2022-03

Publication status: Published online

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: -

Identifiers: -

Degree type: -

Event

Title: Computational and Systems Neuroscience Meeting (COSYNE 2022)

Venue: Lisboa, Portugal

Start/end date: 2022-03-17 - 2022-03-20

Source 1

Title: Computational and Systems Neuroscience Meeting (COSYNE 2022)

Source genre: Proceedings

Creators:

Affiliations:

Place, publisher, edition: -

Pages: -
Volume / issue: -
Article number: 2-105
Start / end page: 169
Identifier: -
