Reinforcement Learning with Simple Sequence Priors

Saanum, T; Éltetö, N; Dayan, P; Binz, M; Schulz, E

Saanum, T., Éltetö, N., Dayan, P., Binz, M., & Schulz, E. (2024). Reinforcement Learning with Simple Sequence Priors. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (pp. 61985-62005). Red Hook, NY, USA: Curran.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000D-3A4B-F
Version Permalink: https://hdl.handle.net/21.11116/0000-000F-7307-8

Genre: Conference Paper

Locators

Locator:
https://openreview.net/pdf?id=qxF8Pge6vM (Any fulltext) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators

Creators:
Saanum, T¹, Author
Éltetö, N², Author
Dayan, P², Author
Binz, M¹, Author
Schulz, E¹, Author

Affiliations:
¹ Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3189356
² Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Free keywords: -

Abstract: In reinforcement learning (RL), simplicity is typically quantified on an action-by-action basis, but this timescale ignores temporal regularities, like repetitions, often present in sequential strategies. We therefore propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible. We explore two possible sources of simple action sequences: sequences that can be learned by autoregressive models, and sequences that are compressible with off-the-shelf data compression algorithms. Distilling these preferences into sequence priors, we derive a novel information-theoretic objective that incentivizes agents to learn policies that maximize rewards while conforming to these priors. We show that the resulting RL algorithm learns faster and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.
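The abstract's second source of simple action sequences, compressibility under an off-the-shelf compressor, can be illustrated with a small Python sketch. This is not the authors' implementation: the use of zlib, the 256-bin quantization, the assumption that actions lie in [-1, 1], and the helper names `quantize_actions`, `compression_cost_bits` and `penalized_return` are all hypothetical. The compressed length in bits serves as a rough stand-in for the negative log-probability a compression-based prior would assign, and `penalized_return` is a simplified reward-minus-cost trade-off, not the information-theoretic objective derived in the paper.

```python
import random
import zlib
from typing import Sequence

Actions = Sequence[Sequence[float]]  # one inner sequence per time step


def quantize_actions(actions: Actions, n_bins: int = 256) -> bytes:
    """Map each action dimension to one byte (assumes actions in [-1, 1])."""
    if not 2 <= n_bins <= 256:
        raise ValueError("n_bins must be between 2 and 256 to fit in one byte")
    out = bytearray()
    for step in actions:
        for a in step:
            a = min(max(a, -1.0), 1.0)  # clip to the assumed action bounds
            out.append(round((a + 1.0) / 2.0 * (n_bins - 1)))
    return bytes(out)


def compression_cost_bits(actions: Actions) -> int:
    """Length in bits of the zlib-compressed, quantized action sequence.

    Repetitive (more compressible) sequences get shorter codes, i.e. a
    higher probability under a compression-based prior.
    """
    return 8 * len(zlib.compress(quantize_actions(actions), 9))


def penalized_return(rewards: Sequence[float], actions: Actions,
                     alpha: float = 0.01) -> float:
    """Hypothetical simplification: episode reward minus alpha * code length."""
    return sum(rewards) - alpha * compression_cost_bits(actions)


if __name__ == "__main__":
    rng = random.Random(0)
    # 200 steps of a periodic 2-D pattern vs. 200 steps of uniform noise.
    periodic = [[0.5, -0.5], [-0.5, 0.5]] * 100
    noisy = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(200)]
    print("periodic:", compression_cost_bits(periodic), "bits")
    print("noisy:   ", compression_cost_bits(noisy), "bits")
```

Run as a script, the periodic sequence should compress to far fewer bits than the noisy one and therefore incur a smaller penalty; the exact bit counts depend on the zlib build. A learned autoregressive model, the abstract's other source of simple sequences, would replace the compressor's code length with the model's negative log-likelihood of the action sequence.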

Details

Language(s):

Dates: Date issued: 2024-05

Publication Status: Issued

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: -

Degree: -

Event

Title: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

Place of Event: New Orleans, LA, USA

Start-/End Date: 2023-12-10 - 2023-12-16

Source 1

Title: Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

Source Genre: Proceedings

Creator(s):
Oh, A, Editor
Naumann, T, Editor
Globerson, A, Editor
Saenko, K, Editor
Hardt, M, Editor
Levine, S, Editor

Affiliations:
-

Publ. Info: Red Hook, NY, USA : Curran

Pages: -
Volume / Issue: -
Sequence Number: 2710
Start / End Page: 61985 - 62005
Identifier: -