
Released

Paper

RAID: Randomized Adversarial-Input Detection for Neural Networks

MPS-Authors

Eniser, Hassan Ferit
Group M. Christakis, Max Planck Institute for Software Systems, Max Planck Society

Christakis, Maria
Group M. Christakis, Max Planck Institute for Software Systems, Max Planck Society

Fulltext (public)

arXiv:2002.02776.pdf
(Preprint), 9KB

Citation

Eniser, H. F., Christakis, M., & Wüstholz, V. (2021). RAID: Randomized Adversarial-Input Detection for Neural Networks. Retrieved from https://arxiv.org/abs/2002.02776.


Cite as: https://hdl.handle.net/21.11116/0000-0009-6F56-B
Abstract
In recent years, neural networks have become the default choice for image classification and many other learning tasks, even though they are vulnerable to so-called adversarial attacks. To increase their robustness against these attacks, numerous detection mechanisms have emerged that aim to automatically determine whether an input is adversarial. However, state-of-the-art detection mechanisms either rely on being tuned for each type of attack, or they do not generalize across different attack types. To alleviate these issues, we propose RAID, a novel technique for adversarial-image detection that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs. Our technique is both more reliable and more effective than the state of the art when evaluated against six popular attacks. Moreover, a straightforward extension of RAID increases its robustness against detection-aware adversaries without affecting its effectiveness.
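
The core idea described in the abstract, extracting neuron activation values and training a secondary classifier to separate benign from adversarial inputs, can be sketched roughly as follows. This is a minimal illustration assuming a Keras model and a scikit-learn logistic-regression detector; the layer choice, function names, and detector type are placeholders, not the paper's actual implementation (which, as the title indicates, also involves randomization not shown here).

```python
# Sketch: train a secondary (binary) classifier on hidden-layer activation
# values to distinguish benign from adversarial inputs.
# Assumes a trained Keras model and scikit-learn; all names are illustrative.

import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression


def activation_features(model: tf.keras.Model, layer_name: str,
                        x: np.ndarray) -> np.ndarray:
    """Return flattened activations of one hidden layer for a batch of inputs."""
    probe = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer(layer_name).output)
    acts = probe.predict(x, verbose=0)
    return acts.reshape(len(x), -1)


def train_detector(model, layer_name, x_benign, x_adversarial):
    """Fit a secondary classifier on activations of benign vs. adversarial inputs."""
    feats = np.concatenate([
        activation_features(model, layer_name, x_benign),
        activation_features(model, layer_name, x_adversarial),
    ])
    labels = np.concatenate([np.zeros(len(x_benign)),   # 0 = benign
                             np.ones(len(x_adversarial))])  # 1 = adversarial
    return LogisticRegression(max_iter=1000).fit(feats, labels)


def is_adversarial(detector, model, layer_name, x):
    """Flag inputs whose activation pattern the detector classifies as adversarial."""
    return detector.predict(activation_features(model, layer_name, x)) == 1
```

In this sketch, the detector would be trained offline on benign samples and adversarial samples generated by known attacks, and then applied at inference time to flag suspicious inputs before the primary classifier's prediction is trusted.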