  Defense Against Reward Poisoning Attacks in Reinforcement Learning

Banihashem, K., Singla, A., & Radanovic, G. (2021). Defense Against Reward Poisoning Attacks in Reinforcement Learning. Retrieved from https://arxiv.org/abs/2102.05776.


Files

arXiv:2102.05776.pdf (Preprint), 968 KB
Name: arXiv:2102.05776.pdf
Description: File downloaded from arXiv at 2022-02-14 10:18
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -


Creators

Creators:
Banihashem, Kiarash (1), Author
Singla, Adish (1), Author
Radanovic, Goran (2), Author
Affiliations:
(1) Group A. Singla, Max Planck Institute for Software Systems, Max Planck Society, ou_2541698
(2) Group K. Gummadi, Max Planck Institute for Software Systems, Max Planck Society, ou_2105291

Content

Free keywords: Computer Science, Learning, cs.LG; Computer Science, Artificial Intelligence, cs.AI
 Abstract: We study defense strategies against reward poisoning attacks in reinforcement
learning. As a threat model, we consider attacks that minimally alter rewards
to make the attacker's target policy uniquely optimal under the poisoned
rewards, with the optimality gap specified by an attack parameter. Our goal is
to design agents that are robust against such attacks in terms of the
worst-case utility w.r.t. the true, unpoisoned, rewards while computing their
policies under the poisoned rewards. We propose an optimization framework for
deriving optimal defense policies, both when the attack parameter is known and
unknown. Moreover, we show that defense policies that are solutions to the
proposed optimization problems have provable performance guarantees. In
particular, we provide the following bounds with respect to the true,
unpoisoned, rewards: a) lower bounds on the expected return of the defense
policies, and b) upper bounds on how suboptimal these defense policies are
compared to the attacker's target policy. We conclude the paper by illustrating
the intuitions behind our formal results, and showing that the derived bounds
are non-trivial.
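
To illustrate the threat model and defense objective described in the abstract, the following is a rough sketch only; the notation is assumed and the paper's exact formulation may differ. The attack can be read as a minimal reward perturbation that makes the target policy optimal by a specified margin, and the defense as a max-min problem over reward functions consistent with the observed (poisoned) rewards:

Attack (sketch): the attacker replaces the true reward R with
\widehat{R} \in \arg\min_{R'} \|R' - R\| \quad \text{s.t.} \quad \rho^{\pi^{\dagger}}(R') \ge \rho^{\pi}(R') + \epsilon \quad \forall \pi \ne \pi^{\dagger},
where \pi^{\dagger} is the attacker's target policy and \epsilon is the attack parameter (optimality gap).

Defense (sketch): the agent, observing only \widehat{R}, selects
\pi_{\mathrm{def}} \in \arg\max_{\pi} \; \min_{R' \in \mathcal{R}(\widehat{R}, \epsilon)} \rho^{\pi}(R'),
where \rho^{\pi}(R') denotes the expected return of policy \pi under reward function R', and \mathcal{R}(\widehat{R}, \epsilon) is the set of candidate true rewards from which an attack with parameter \epsilon could have produced \widehat{R}. The paper's bounds then concern \rho^{\pi_{\mathrm{def}}}(R) relative to \rho^{\pi^{\dagger}}(R) under the true reward R.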

Details

Language(s): eng - English
 Dates: 2021-02-10, 2021-06-20, 2021
 Publication Status: Published online
 Pages: 36 p.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 2102.05776
URI: https://arxiv.org/abs/2102.05776
BibTeX Citekey: Banihashem2021
 Degree: -
