Detecting and Deterring Manipulation in a Cognitive Hierarchy

Alon, N; Schulz, L; Barnby, JM; Rosenschein, JS; Dayan, P

doi:10.48550/arXiv.2405.01870


Released

Preprint

MPS-Authors

Alon, N
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Schulz, L
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;

Dayan, P
Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://arxiv.org/pdf/2405.01870
(Any fulltext)

Citation

Alon, N., Schulz, L., Barnby, J. M., Rosenschein, J. S., & Dayan, P. (submitted). Detecting and Deterring Manipulation in a Cognitive Hierarchy.

Cite as: https://hdl.handle.net/21.11116/0000-000F-4280-5

Abstract

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, ℵ-IPOMDP, which augments the Bayesian inference of model-based RL agents with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive game and a zero-sum game. Our results show the ℵ mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.
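The general idea described in the abstract, Bayesian inference over opponent models combined with an anomaly detector that triggers an out-of-belief policy, can be illustrated with a toy sketch. This is not the paper's ℵ-IPOMDP algorithm: the abstract does not specify the detector, the policy, or the games in this detail. The opponent types (`generous`, `cautious`), the actions `C` and `D`, the surprise threshold, and the window length are all hypothetical choices made here for illustration.

```python
import math

# Hypothetical opponent models: each maps an action to its probability.
# Neither model expects frequent "D" (defect) actions.
MODELS = {
    "generous": {"C": 0.9, "D": 0.1},
    "cautious": {"C": 0.7, "D": 0.3},
}


def update_belief(belief, models, action):
    """One Bayesian step: posterior is proportional to prior * P(action | model).

    Returns the new belief and the predictive probability of the action
    under the previous belief (the normalising constant).
    """
    unnormalised = {m: belief[m] * models[m][action] for m in belief}
    evidence = sum(unnormalised.values())
    if evidence == 0.0:
        # No model can explain the action; leave the belief unchanged.
        return dict(belief), 0.0
    return {m: p / evidence for m, p in unnormalised.items()}, evidence


def run_agent(observed_actions, models=MODELS, threshold=1.0, window=3):
    """Track surprise and switch policy when behaviour is off-model.

    Surprise is -log of the predictive probability of each observed action.
    If the mean surprise over the last `window` steps exceeds `threshold`
    (both assumed values), the agent stops trusting its models and returns
    "out_of_belief" together with the step at which it switched.
    """
    belief = {m: 1.0 / len(models) for m in models}  # uniform prior
    surprises = []
    for t, action in enumerate(observed_actions):
        belief, evidence = update_belief(belief, models, action)
        surprises.append(-math.log(max(evidence, 1e-12)))
        recent = surprises[-window:]
        if len(recent) == window and sum(recent) / window > threshold:
            return "out_of_belief", t
    return "in_belief", None


if __name__ == "__main__":
    print(run_agent(["C", "C", "C", "C"]))  # behaviour the models explain
    print(run_agent(["D", "D", "D"]))       # behaviour no model explains well
```

In this sketch the agent never needs to identify how it is being manipulated. It only notices that its best available models keep predicting the observed behaviour poorly, which loosely mirrors the abstract's point that agents can realize they are being deceived even if they cannot understand how. What the out-of-belief policy actually does (for example, a punishing action that serves as a credible threat) is left abstract here.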