Defense Against Reward Poisoning Attacks in Reinforcement Learning

Banihashem, Kiarash; Singla, Adish; Radanovic, Goran

Banihashem, K., Singla, A., & Radanovic, G. (2021). Defense Against Reward Poisoning Attacks in Reinforcement Learning. Retrieved from https://arxiv.org/abs/2102.05776.

Item is Public

Basic information

Item permalink: https://hdl.handle.net/21.11116/0000-0009-F7F4-D
Version permalink: https://hdl.handle.net/21.11116/0000-000E-7F49-3

Genre: Report

Files

arXiv:2102.05776.pdf (Preprint), 968KB

File permalink:
https://hdl.handle.net/21.11116/0000-0009-F7F6-B

File name:
arXiv:2102.05776.pdf

Description:
File downloaded from arXiv at 2022-02-14 10:18

OA-Status:

Visibility:
Public

MIME type / Checksum:
application/pdf / [MD5]

Technical metadata:

Copyright date:
-

Copyright information:
-

CC license:
http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Creators

Creators:
Banihashem, Kiarash¹, Author
Singla, Adish¹, Author
Radanovic, Goran², Author

Affiliations:
¹ Group A. Singla, Max Planck Institute for Software Systems, Max Planck Society, ou_2541698
² Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society, ou_2105291

Content

Keywords: Computer Science, Learning, cs.LG, Computer Science, Artificial Intelligence, cs.AI

Abstract: We study defense strategies against reward poisoning attacks in reinforcement
learning. As a threat model, we consider attacks that minimally alter rewards
to make the attacker's target policy uniquely optimal under the poisoned
rewards, with the optimality gap specified by an attack parameter. Our goal is
to design agents that are robust against such attacks in terms of the
worst-case utility w.r.t. the true, unpoisoned, rewards while computing their
policies under the poisoned rewards. We propose an optimization framework for
deriving optimal defense policies, both when the attack parameter is known and
unknown. Moreover, we show that defense policies that are solutions to the
proposed optimization problems have provable performance guarantees. In
particular, we provide the following bounds with respect to the true,
unpoisoned, rewards: a) lower bounds on the expected return of the defense
policies, and b) upper bounds on how suboptimal these defense policies are
compared to the attacker's target policy. We conclude the paper by illustrating
the intuitions behind our formal results, and showing that the derived bounds
are non-trivial.
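To make the threat model in the abstract concrete, the sketch below assumes the simplest possible setting: a single-state MDP (a multi-armed bandit) with an ℓ∞ attack cost. Both choices are illustrative assumptions, not the paper's formulation, which covers general MDPs. The hypothetical helpers `min_linf_poisoning` and `is_epsilon_robust_target` compute the smallest reward perturbation that makes the attacker's target arm optimal by a margin `eps` (playing the role of the attack parameter) and check that condition.

```python
def is_epsilon_robust_target(rewards, target, eps, tol=1e-9):
    """Return True if arm `target` beats every other arm by at least `eps`.

    `tol` absorbs floating-point rounding in the comparison.
    """
    return all(
        rewards[target] >= r + eps - tol
        for a, r in enumerate(rewards)
        if a != target
    )


def min_linf_poisoning(rewards, target, eps):
    """Smallest l-infinity change to `rewards` that makes `target` eps-optimal.

    Illustrative single-state (bandit) special case; returns (poisoned, cost).
    """
    # Raising the target by delta and lowering a rival by delta closes a gap
    # of 2 * delta, so the cost is half of the largest margin shortfall.
    shortfall = max(
        (r + eps - rewards[target] for a, r in enumerate(rewards) if a != target),
        default=0.0,
    )
    delta = max(0.0, shortfall / 2)
    new_target = rewards[target] + delta
    # Rivals already at least eps below the new target are left untouched;
    # the others are lowered by at most delta.
    poisoned = [
        new_target if a == target else min(r, new_target - eps)
        for a, r in enumerate(rewards)
    ]
    return poisoned, delta


true_rewards = [1.0, 0.5, 0.2]
poisoned, cost = min_linf_poisoning(true_rewards, target=2, eps=0.1)
print(poisoned, cost)
```

In this toy case the attack raises the worst arm and lowers the best one until the target leads by `eps`. An agent that only sees `poisoned` would pick arm 2, which is exactly the behavior the defense policies studied in the paper aim to guard against.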

Details

Language: eng - English

Dates: Created: 2021-02-10; Revised: 2021-06-20; Published online: 2021

Publication status: Published online

Pages: 36 p.

Publishing info: -

Table of contents: -

Peer review: -

Identifiers (DOI, ISBN, etc.): arXiv: 2102.05776
URI: https://arxiv.org/abs/2102.05776
BibTeX citation key: Banihashem2021

Degree: -
