  Detecting and Deterring Manipulation in a Cognitive Hierarchy

Alon, N., Schulz, L., Barnby, J., Rosenschein, J., & Dayan, P. (submitted). Detecting and Deterring Manipulation in a Cognitive Hierarchy.

Locators

Locator:
https://arxiv.org/pdf/2405.01870 (Any fulltext)
Description:
-
OA-Status:
Not specified

Creators

 Creators:
Alon, N [1], Author
Schulz, L [1], Author
Barnby, JM, Author
Rosenschein, JS, Author
Dayan, P [1], Author
Affiliations:
[1] Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_3017468

Content

Free keywords: -
 Abstract: Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, ℵ-IPOMDP, augmenting model-based RL agents' Bayesian inference with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results show the ℵ mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.

Details

Language(s):
 Dates: 2024-05
 Publication Status: Submitted
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: DOI: 10.48550/arXiv.2405.01870
 Degree: -
