Detecting and Deterring Manipulation in a Cognitive Hierarchy

Alon, N; Schulz, L; Barnby, JM; Rosenschein, JS; Dayan, P

doi:10.48550/arXiv.2405.01870

Alon, N., Schulz, L., Barnby, J., Rosenschein, J., & Dayan, P. (submitted). Detecting and Deterring Manipulation in a Cognitive Hierarchy.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-000F-4280-5
Version Permalink: https://hdl.handle.net/21.11116/0000-000F-4281-4

Genre: Preprint

Locators

Locator:
https://arxiv.org/pdf/2405.01870 (Any fulltext) Open Access status unknown

Description:
-

OA-Status:
Not specified

Creators

Creators:
Alon, N¹, Author
Schulz, L¹, Author
Barnby, JM, Author
Rosenschein, JS, Author
Dayan, P¹, Author

Affiliations:
¹ Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Free keywords: -

Abstract: Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, ℵ-IPOMDP, augmenting model-based RL agents' Bayesian inference with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and zero-sum game. Our results show the ℵ mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.

Details

Language(s):

Dates: Submitted: 2024-05

Publication Status: Submitted

Pages: -

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: DOI: 10.48550/arXiv.2405.01870

Degree: -
