
Released

Paper

RAID: Randomized Adversarial-Input Detection for Neural Networks

MPS-Authors

Eniser, Hassan Ferit
Group M. Christakis, Max Planck Institute for Software Systems, Max Planck Society

Christakis, Maria
Group M. Christakis, Max Planck Institute for Software Systems, Max Planck Society

Fulltext (public)

arXiv:2002.02776.pdf
(Preprint), 9KB

Citation

Eniser, H. F., Christakis, M., & Wüstholz, V. (2021). RAID: Randomized Adversarial-Input Detection for Neural Networks. Retrieved from https://arxiv.org/abs/2002.02776.


Cite as: https://hdl.handle.net/21.11116/0000-0009-6F56-B
Abstract
In recent years, neural networks have become the default choice for image classification and many other learning tasks, even though they are vulnerable to so-called adversarial attacks. To increase their robustness against these attacks, numerous detection mechanisms have emerged that aim to automatically determine whether an input is adversarial. However, state-of-the-art detection mechanisms either rely on being tuned for each type of attack, or they do not generalize across different attack types. To alleviate these issues, we propose RAID, a novel technique for adversarial-image detection that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs. Our technique is both more reliable and more effective than the state of the art when evaluated against six popular attacks. Moreover, a straightforward extension of RAID increases its robustness against detection-aware adversaries without affecting its effectiveness.
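
The core idea described in the abstract, extracting neuron activation values and training a secondary classifier to separate benign from adversarial inputs, can be sketched roughly as follows. This is a minimal illustration assuming a Keras model and a scikit-learn logistic-regression detector; the layer choice, function names, and detector type are placeholders, not the paper's actual implementation (which, as the title indicates, also involves randomization not shown here).

```python
# Sketch: train a secondary (binary) classifier on hidden-layer activation
# values to distinguish benign from adversarial inputs.
# Assumes a trained Keras model and scikit-learn; all names are illustrative.

import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression


def activation_features(model: tf.keras.Model, layer_name: str,
                        x: np.ndarray) -> np.ndarray:
    """Return flattened activations of one hidden layer for a batch of inputs."""
    probe = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer(layer_name).output)
    acts = probe.predict(x, verbose=0)
    return acts.reshape(len(x), -1)


def train_detector(model, layer_name, x_benign, x_adversarial):
    """Fit a secondary classifier on activations of benign vs. adversarial inputs."""
    feats = np.concatenate([
        activation_features(model, layer_name, x_benign),
        activation_features(model, layer_name, x_adversarial),
    ])
    labels = np.concatenate([np.zeros(len(x_benign)),   # 0 = benign
                             np.ones(len(x_adversarial))])  # 1 = adversarial
    return LogisticRegression(max_iter=1000).fit(feats, labels)


def is_adversarial(detector, model, layer_name, x):
    """Flag inputs whose activation pattern the detector classifies as adversarial."""
    return detector.predict(activation_features(model, layer_name, x)) == 1
```

In this sketch, the detector would be trained offline on benign samples and adversarial samples generated by known attacks, and then applied at inference time to flag suspicious inputs before the primary classifier's prediction is trusted.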