  Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training

Stutz, D., Hein, M., & Schiele, B. (2019). Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training. Retrieved from http://arxiv.org/abs/1910.06259.

Files

arXiv:1910.06259.pdf (Preprint), 2MB
Name: arXiv:1910.06259.pdf
Description: File downloaded from arXiv at 2019-12-09 13:21
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Creators:
Stutz, David (1), Author
Hein, Matthias (2), Author
Schiele, Bernt (1), Author
Affiliations:
(1) Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society, ou_1116547
(2) External Organizations, ou_persistent22

Content

Free keywords: Computer Science, Learning, cs.LG; Computer Science, Cryptography and Security, cs.CR; Computer Science, Computer Vision and Pattern Recognition, cs.CV; Statistics, Machine Learning, stat.ML
Abstract: Adversarial training is the standard way to train models that are robust against adversarial examples. However, especially on complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), whose key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding, i.e., detecting adversarial examples based on their confidence. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. For evaluation, we extend the commonly used robust test error to our detection setting, present an adaptive attack with backtracking, and allow the attacker to select, per test example, the worst-case adversarial example from multiple black- and white-box attacks. We present experimental results using $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks on MNIST, SVHN, and CIFAR10.
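To illustrate the confidence-thresholding step described in the abstract, the sketch below rejects test inputs whose maximum softmax confidence falls below a threshold; the function name, the threshold value tau=0.9, and the NumPy-based interface are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def confidence_thresholded_predict(probs, tau=0.9, reject_label=-1):
        # `probs`: (N, C) array of per-class probabilities (softmax outputs).
        # Inputs whose maximum confidence falls below `tau` are flagged as
        # detected/rejected instead of receiving a class label.
        confidence = probs.max(axis=1)            # max softmax probability per input
        labels = probs.argmax(axis=1)             # predicted class per input
        labels[confidence < tau] = reject_label   # low confidence -> reject (detect)
        return labels

    # Toy usage: a confident prediction is accepted, a near-uniform one is rejected.
    probs = np.array([[0.97, 0.01, 0.02],
                      [0.40, 0.35, 0.25]])
    print(confidence_thresholded_predict(probs, tau=0.9))   # -> [ 0 -1]

Since CCAT trains the confidence on adversarial examples to decay toward uniform, a threshold of this kind is intended to separate them from confidently classified clean inputs.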

Details

Language(s): eng - English
Dates: 2019-10-14 / 2019-11-25 / 2019
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 1910.06259
URI: http://arxiv.org/abs/1910.06259
BibTex Citekey: Stutz_arXiv1910.06259
 Degree: -
