Modeling Human Exploration Through Resource-Rational Reinforcement Learning

Binz, M; Schulz, E

Released

Conference Paper

MPS-Authors

Binz, M
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Schulz, E
Research Group Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://papers.nips.cc/paper_files/paper/2022/file/cde542f47c67907e170a1e1a7b32f6ad-Paper-Conference.pdf
(Publisher version)

Citation

Binz, M., & Schulz, E. (2023). Modeling Human Exploration Through Resource-Rational Reinforcement Learning. In S. Koyejo & S. Mohamed (Eds.), Advances in Neural Information Processing Systems 35: 36th Conference on Neural Information Processing Systems (NeurIPS 2022) (pp. 31755-31768). Red Hook, NY, USA: Curran.

Cite as: https://hdl.handle.net/21.11116/0000-000D-ADFC-5

Abstract

Equipping artificial agents with useful exploration mechanisms remains a challenge to this day. Humans, on the other hand, seem to manage the trade-off between exploration and exploitation effortlessly. In the present article, we put forward the hypothesis that they accomplish this by making optimal use of limited computational resources. We study this hypothesis by meta-learning reinforcement learning algorithms that sacrifice performance for a shorter description length (defined as the number of bits required to implement the given algorithm). The emerging class of models captures human exploration behavior better than previously considered approaches, such as Boltzmann exploration, upper confidence bound algorithms, and Thompson sampling. We additionally demonstrate that changing the description length in our class of models produces the intended effects: reducing description length captures the behavior of brain-lesioned patients while increasing it mirrors cognitive development during adolescence.
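The abstract compares the resource-rational models against three standard exploration strategies. As a rough illustration of how those baselines differ, the sketch below runs each one on a two-armed Bernoulli bandit. It is not the authors' resource-rational model. All names and settings (`run_bandit`, the softmax temperature, the UCB constant `c`, Beta(1, 1) priors for Thompson sampling, the arm reward probabilities) are hypothetical choices made only for this example.

```python
import math
import random


def boltzmann_choice(means, temperature, rng):
    """Boltzmann (softmax) exploration: pick arm i with prob proportional to exp(mean_i / T)."""
    m = max(means)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in means]
    return rng.choices(range(len(means)), weights=weights)[0]


def ucb_choice(means, counts, t, c):
    """UCB1-style choice: mean + c * sqrt(ln t / n_i); every arm is tried once first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_choice(successes, failures, rng):
    """Thompson sampling for Bernoulli arms with (assumed) Beta(1, 1) priors."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(strategy, probs, trials, seed=0, temperature=0.1, c=1.0):
    """Hypothetical helper: simulate one agent; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    successes = [0] * k
    total = 0
    for t in range(1, trials + 1):
        # Empirical mean reward per arm (0.0 for arms not yet tried).
        means = [s / n if n else 0.0 for s, n in zip(successes, counts)]
        if strategy == "boltzmann":
            arm = boltzmann_choice(means, temperature, rng)
        elif strategy == "ucb":
            arm = ucb_choice(means, counts, t, c)
        elif strategy == "thompson":
            failures = [n - s for n, s in zip(counts, successes)]
            arm = thompson_choice(successes, failures, rng)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        reward = 1 if rng.random() < probs[arm] else 0  # Bernoulli reward
        counts[arm] += 1
        successes[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    for name in ("boltzmann", "ucb", "thompson"):
        total, counts = run_bandit(name, probs=[0.3, 0.7], trials=500)
        print(name, total, counts)
```

Boltzmann exploration randomizes in proportion to estimated value, UCB adds a deterministic uncertainty bonus that shrinks as an arm is sampled, and Thompson sampling explores by drawing from its posterior over each arm's reward rate. The temperature and `c` control how much the first two explore.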