  Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training

Stutz, D., Hein, M., & Schiele, B. (2019). Confidence-Calibrated Adversarial Training and Detection: More Robust Models Generalizing Beyond the Attack Used During Training. Retrieved from http://arxiv.org/abs/1910.06259.

Files

arXiv:1910.06259.pdf (Preprint), 2MB
Name: arXiv:1910.06259.pdf
Description: File downloaded from arXiv at 2019-12-09 13:21
OA-Status: -
Visibility: Public
MIME-Type / Checksum: application/pdf / [MD5]
Technical Metadata:
Copyright Date: -
Copyright Info: -

Creators

Creators:
Stutz, David (1), Author
Hein, Matthias (2), Author
Schiele, Bernt (1), Author
Affiliations:
(1) Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society, ou_1116547
(2) External Organizations, ou_persistent22

Content

Free keywords: Computer Science, Learning, cs.LG; Computer Science, Cryptography and Security, cs.CR; Computer Science, Computer Vision and Pattern Recognition, cs.CV; Statistics, Machine Learning, stat.ML
Abstract: Adversarial training is the standard way to train models that are robust against adversarial examples. However, especially on complex datasets, adversarial training incurs a significant loss in accuracy and is known to generalize poorly to stronger attacks, e.g., larger perturbations or other threat models. In this paper, we introduce confidence-calibrated adversarial training (CCAT), whose key idea is to enforce that the confidence on adversarial examples decays with their distance to the attacked examples. We show that CCAT better preserves the accuracy of normal training, while robustness against adversarial examples is achieved via confidence thresholding, i.e., detecting adversarial examples based on their confidence. Most importantly, in strong contrast to adversarial training, the robustness of CCAT generalizes to larger perturbations and other threat models not encountered during training. For evaluation, we extend the commonly used robust test error to our detection setting, present an adaptive attack with backtracking, and allow the attacker to select, per test example, the worst-case adversarial example from multiple black- and white-box attacks. We present experimental results using $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks on MNIST, SVHN, and CIFAR10.
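To illustrate the confidence-thresholding step described in the abstract, the sketch below rejects test inputs whose maximum softmax confidence falls below a threshold; the function name, the threshold value tau=0.9, and the NumPy-based interface are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def confidence_thresholded_predict(probs, tau=0.9, reject_label=-1):
        # `probs`: (N, C) array of per-class probabilities (softmax outputs).
        # Inputs whose maximum confidence falls below `tau` are flagged as
        # detected/rejected instead of receiving a class label.
        confidence = probs.max(axis=1)            # max softmax probability per input
        labels = probs.argmax(axis=1)             # predicted class per input
        labels[confidence < tau] = reject_label   # low confidence -> reject (detect)
        return labels

    # Toy usage: a confident prediction is accepted, a near-uniform one is rejected.
    probs = np.array([[0.97, 0.01, 0.02],
                      [0.40, 0.35, 0.25]])
    print(confidence_thresholded_predict(probs, tau=0.9))   # -> [ 0 -1]

Since CCAT trains the confidence on adversarial examples to decay toward uniform, a threshold of this kind is intended to separate them from confidently classified clean inputs.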

Details

Language(s): eng - English
Dates: 2019-10-14 / 2019-11-25 / 2019
 Publication Status: Published online
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 1910.06259
URI: http://arxiv.org/abs/1910.06259
BibTex Citekey: Stutz_arXiv1910.06259
 Degree: -
