
Item Details


Released

Conference Paper

Reinforcement Learning with Simple Sequence Priors

MPS-Authors
Saanum, T
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Éltető, N
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Dayan, P
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Binz, M
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Schulz, E
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://openreview.net/pdf?id=qxF8Pge6vM
(Full text (general))

Fulltext (public)
There is no public fulltext available
Supplementary Material (public)
There is no public supplementary material available
Citation

Saanum, T., Éltető, N., Dayan, P., Binz, M., & Schulz, E. (2024). Reinforcement Learning with Simple Sequence Priors. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems 36: 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (pp. 61985-62005). Red Hook, NY, USA: Curran.


Cite as: https://hdl.handle.net/21.11116/0000-000D-3A4B-F
Abstract
In reinforcement learning (RL), simplicity is typically quantified on an action-by-action basis -- but this timescale ignores temporal regularities, like repetitions, often present in sequential strategies. We therefore propose an RL algorithm that learns to solve tasks with sequences of actions that are compressible. We explore two possible sources of simple action sequences: Sequences that can be learned by autoregressive models, and sequences that are compressible with off-the-shelf data compression algorithms. Distilling these preferences into sequence priors, we derive a novel information-theoretic objective that incentivizes agents to learn policies that maximize rewards while conforming to these priors. We show that the resulting RL algorithm leads to faster learning, and attains higher returns than state-of-the-art model-free approaches in a series of continuous control tasks from the DeepMind Control Suite. These priors also produce a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.
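The abstract describes the objective only at a high level. As a rough illustration, one common way such a sequence-prior regularizer can enter the return computation is to augment each step's reward with alpha * (log p(a_t | a_<t) - log pi(a_t | s_t)), so that actions predictable under the autoregressive sequence prior are rewarded and policy complexity is penalized. The sketch below (in PyTorch) is a minimal hypothetical example under that assumption; the function name and the values of alpha and gamma are illustrative, not taken from the paper.

    import torch

    def regularized_returns(rewards, pi_log_probs, prior_log_probs,
                            alpha=0.1, gamma=0.99):
        # Hypothetical helper (not from the paper): rewards[t] is the
        # environment reward, pi_log_probs[t] = log pi(a_t | s_t) under the
        # current policy, and prior_log_probs[t] = log p(a_t | a_<t) under
        # an autoregressive sequence prior; all are 1-D tensors of length T.
        #
        # Augment each reward with the log-ratio between prior and policy,
        # so the agent is paid for choosing action sequences the prior can
        # predict, i.e., compressible sequences.
        aug = rewards + alpha * (prior_log_probs - pi_log_probs)

        # Standard discounted return, computed backwards over the episode.
        returns = torch.zeros_like(aug)
        running = torch.zeros_like(aug[-1])
        for t in reversed(range(aug.shape[0])):
            running = aug[t] + gamma * running
            returns[t] = running
        return returns

    # Example usage with random placeholder data for a 5-step episode.
    T = 5
    print(regularized_returns(torch.randn(T), torch.randn(T), torch.randn(T)))

Setting the prior to a uniform distribution would recover an entropy-style bonus as in maximum-entropy RL; the sequence prior instead makes the bonus depend on the action history, which is what favors temporally regular, compressible strategies.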