Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

Rakhsha, Amin; Zhang, Xuezhou; Zhu, Xiaojin; Singla, Adish

Rakhsha, A., Zhang, X., Zhu, X., & Singla, A. (2021). Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments. Retrieved from https://arxiv.org/abs/2102.08492.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/21.11116/0000-0009-F810-D
Version Permalink: https://hdl.handle.net/21.11116/0000-000E-7F44-8

Genre: Paper

Files

arXiv:2102.08492.pdf (Preprint), 343KB

File Permalink:
https://hdl.handle.net/21.11116/0000-0009-F812-B

Name:
arXiv:2102.08492.pdf

Description:
File downloaded from arXiv at 2022-02-14 10:32

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Creators

Creators:
Rakhsha, Amin¹, Author
Zhang, Xuezhou¹, Author
Zhu, Xiaojin¹, Author
Singla, Adish², Author

Affiliations:
¹ External Organizations, ou_persistent22
² Group A. Singla, Max Planck Institute for Software Systems, Max Planck Society, ou_2541698

Content

Free keywords: Computer Science, Learning, cs.LG; Computer Science, Artificial Intelligence, cs.AI; Computer Science, Cryptography and Security, cs.CR

Abstract: We study black-box reward poisoning attacks against reinforcement learning
(RL), in which an adversary aims to manipulate the rewards to mislead a
sequence of RL agents with unknown algorithms to learn a nefarious policy in an
environment unknown to the adversary a priori. That is, our attack makes
minimum assumptions on the prior knowledge of the adversary: it has no initial
knowledge of the environment or the learner, and neither does it observe the
learner's internal mechanism except for its performed actions. We design a
novel black-box attack, U2, that can provably achieve a near-matching
performance to the state-of-the-art white-box attack, demonstrating the
feasibility of reward poisoning even in the most challenging black-box setting.

Details

Language(s): eng - English

Dates: Created: 2021-02-16; Published Online: 2021

Publication Status: Published online

Pages: 22 p.

Publishing info: -

Table of Contents: -

Rev. Type: -

Identifiers: arXiv: 2102.08492
URI: https://arxiv.org/abs/2102.08492
BibTex Citekey: Raksha2021

Degree: -
