Research Paper

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

MPG Authors

Singla, Adish (Group A. Singla, Max Planck Institute for Software Systems, Max Planck Society)

Full Texts (freely accessible)

arXiv:2102.08492.pdf (Preprint), 343 KB

Citation

Rakhsha, A., Zhang, X., Zhu, X., & Singla, A. (2021). Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments. Retrieved from https://arxiv.org/abs/2102.08492.


Citation link: https://hdl.handle.net/21.11116/0000-0009-F810-D
Abstract
We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards so as to mislead a sequence of RL agents with unknown algorithms into learning a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimal assumptions about the adversary's prior knowledge: it has no initial knowledge of the environment or the learner, and it does not observe the learner's internal mechanism beyond the actions the learner performs. We design a novel black-box attack, U2, that provably achieves performance close to that of the state-of-the-art white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
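To make the threat model concrete, the following Python sketch shows what a black-box reward-poisoning interface could look like. It is a minimal illustration under assumed interfaces (a gym-style environment with reset/step, a target_policy callable, and a per-step perturbation budget), not the paper's U2 attack, whose actual construction is given in the preprint; the class name, the budget parameter, and the naive boost/penalize rule are all illustrative assumptions.

    class RewardPoisoningAttacker:
        """Illustrative black-box reward-poisoning wrapper. This is NOT the
        paper's U2 construction; it only sketches the threat model: the
        attacker sits between an unknown environment and the learner,
        observes states, actions, and true rewards, and emits perturbed
        rewards (all interface names here are assumptions)."""

        def __init__(self, env, target_policy, budget=1.0):
            self.env = env                      # environment, unknown a priori
            self.target_policy = target_policy  # nefarious policy the attack promotes
            self.budget = budget                # max per-step perturbation (assumed)
            self.state = None

        def reset(self):
            self.state = self.env.reset()
            return self.state

        def step(self, action):
            # The attacker never inspects the learner's internals; it only
            # sees the performed action and the environment's response.
            next_state, true_reward, done, info = self.env.step(action)
            # Naive poisoning rule, purely for illustration: reward actions
            # that agree with the target policy, penalize deviations.
            agrees = action == self.target_policy(self.state)
            delta = self.budget if agrees else -self.budget
            self.state = next_state
            return next_state, true_reward + delta, done, info

The learner would interact with this wrapper exactly as it would with the real environment, which is why such an attack requires no knowledge of the learner's algorithm.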