  Reinforcement Learning with Simple Sequence Priors

Saanum, T., Éltetö, N., Dayan, P., Binz, M., & Schulz, E. (2024). Reinforcement Learning with Simple Sequence Priors. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (pp. 61985-62005). Red Hook, NY, USA: Curran.

Basic data

Genre: Conference paper

External references

External reference: https://openreview.net/pdf?id=qxF8Pge6vM (any full text)
Description: -
OA status: Not specified

Creators

Creators:
Saanum, T.¹, Author
Éltetö, N.², Author
Dayan, P.², Author
Binz, M.¹, Author
Schulz, E.¹, Author
Affiliations:
¹ Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3189356
² Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Keywords: -
Abstract: In reinforcement learning (RL), simplicity is typically quantified on an action-by-action basis -- but this timescale ignores temporal regularities, like repetitions, often present in sequential strategies. We therefore propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible. We explore two possible sources of simple action sequences: sequences that can be learned by autoregressive models, and sequences that are compressible with off-the-shelf data compression algorithms. Distilling these preferences into sequence priors, we derive a novel information-theoretic objective that incentivizes agents to learn policies that maximize rewards while conforming to these priors. We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.
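
A loose sketch of the objective described in the abstract: maximize task reward minus a penalty for incompressible action sequences. The snippet below is an illustration under stated assumptions, not the authors' implementation; it uses off-the-shelf bz2 compression (one of the two prior sources named in the abstract) as the complexity measure, and the names sequence_complexity, regularized_return, the discretization scheme, and the trade-off weight beta are all hypothetical.

import bz2

import numpy as np

def sequence_complexity(actions, n_bins=16):
    """Approximate the description length (in bits) of an action sequence
    by discretizing it and measuring its bz2-compressed size."""
    a = np.asarray(actions, dtype=np.float64)
    # Assumes actions are bounded in [-1, 1], as in DeepMind Control Suite tasks.
    bins = np.clip(((a + 1.0) / 2.0 * (n_bins - 1)).round(), 0, n_bins - 1)
    return 8 * len(bz2.compress(bins.astype(np.uint8).tobytes()))

def regularized_return(rewards, actions, beta=0.01):
    """Trade task reward off against sequence complexity:
    sum_t r_t - beta * C(a_1..a_T), favoring compressible behavior."""
    return float(np.sum(rewards)) - beta * sequence_complexity(actions)

# A repetitive action sequence compresses well, so it pays a smaller
# penalty than a random sequence earning the same raw reward.
repetitive = np.tile([0.5, -0.5], 50)
random_seq = np.random.uniform(-1, 1, size=100)
print(regularized_return(np.ones(100), repetitive))
print(regularized_return(np.ones(100), random_seq))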

Details

Language(s): -
Date: 2024-05
Publication status: Published
Pages: -
Place, publisher, edition: -
Table of contents: -
Type of review: -
Identifiers: -
Degree type: -

Event

Title: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Venue: New Orleans, LA, USA
Start/end date: 2023-12-10 - 2023-12-16


Source 1

Title: Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Source genre: Conference proceedings
Creators:
Oh, A., Editor
Naumann, T., Editor
Globerson, A., Editor
Saenko, K., Editor
Hardt, M., Editor
Levine, S., Editor
Affiliations: -
Place, publisher, edition: Red Hook, NY, USA: Curran
Pages: -
Volume / issue: -
Article number: 2710
Start / end page: 61985 - 62005
Identifier: -