  Reinforcement Learning with Simple Sequence Priors

Saanum, T., Éltetö, N., Dayan, P., Binz, M., & Schulz, E. (2024). Reinforcement Learning with Simple Sequence Priors. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (pp. 61985-62005). Red Hook, NY, USA: Curran.

Basic data

Genre: Conference paper

External references

External reference: https://openreview.net/pdf?id=qxF8Pge6vM (any full text)
Description: -
OA status: Not specified

Creators

Creators:
Saanum, T.¹, Author
Éltetö, N.², Author
Dayan, P.², Author
Binz, M.¹, Author
Schulz, E.¹, Author
Affiliations:
¹ Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3189356
² Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Keywords: -
Abstract: In reinforcement learning (RL), simplicity is typically quantified on an action-by-action basis -- but this timescale ignores temporal regularities, like repetitions, often present in sequential strategies. We therefore propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible. We explore two possible sources of simple action sequences: sequences that can be learned by autoregressive models, and sequences that are compressible with off-the-shelf data compression algorithms. Distilling these preferences into sequence priors, we derive a novel information-theoretic objective that incentivizes agents to learn policies that maximize rewards while conforming to these priors. We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.
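
A loose sketch of the objective described in the abstract: maximize task reward minus a penalty for incompressible action sequences. The snippet below is an illustration under stated assumptions, not the authors' implementation; it uses off-the-shelf bz2 compression (one of the two prior sources named in the abstract) as the complexity measure, and the names sequence_complexity, regularized_return, the discretization scheme, and the trade-off weight beta are all hypothetical.

import bz2

import numpy as np

def sequence_complexity(actions, n_bins=16):
    """Approximate the description length (in bits) of an action sequence
    by discretizing it and measuring its bz2-compressed size."""
    a = np.asarray(actions, dtype=np.float64)
    # Assumes actions are bounded in [-1, 1], as in DeepMind Control Suite tasks.
    bins = np.clip(((a + 1.0) / 2.0 * (n_bins - 1)).round(), 0, n_bins - 1)
    return 8 * len(bz2.compress(bins.astype(np.uint8).tobytes()))

def regularized_return(rewards, actions, beta=0.01):
    """Trade task reward off against sequence complexity:
    sum_t r_t - beta * C(a_1..a_T), favoring compressible behavior."""
    return float(np.sum(rewards)) - beta * sequence_complexity(actions)

# A repetitive action sequence compresses well, so it pays a smaller
# penalty than a random sequence earning the same raw reward.
repetitive = np.tile([0.5, -0.5], 50)
random_seq = np.random.uniform(-1, 1, size=100)
print(regularized_return(np.ones(100), repetitive))
print(regularized_return(np.ones(100), random_seq))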

Details

Language(s): -
Date: 2024-05
Publication status: Published
Pages: -
Place, publisher, edition: -
Table of contents: -
Type of review: -
Identifiers: -
Degree type: -

Event

Title: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Venue: New Orleans, LA, USA
Start/end date: 2023-12-10 - 2023-12-16


Source 1

Title: Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Source genre: Conference proceedings
Creators:
Oh, A., Editor
Naumann, T., Editor
Globerson, A., Editor
Saenko, K., Editor
Hardt, M., Editor
Levine, S., Editor
Affiliations: -
Place, publisher, edition: Red Hook, NY, USA: Curran
Pages: -
Volume / issue: -
Article number: 2710
Start / end page: 61985 - 62005
Identifier: -