Measuring the Robustness of Neural Networks via Minimal Adversarial Examples
Abstract
Neural networks are highly sensitive to adversarial examples, which cause large output deviations with only small input perturbations. However, little is known quantitatively about the distribution and prevalence of such adversarial examples. To address this issue, we propose a rigorous search method that provably finds the smallest possible adversarial example. The key benefit of our method is that it gives precise quantitative insight into the distribution of adversarial examples and guarantees their absence whenever none is found. The primary idea is to consider the nonlinearity exhibited by the network in a small region of the input space and to search exhaustively for adversarial examples in that region. We show that the frequency of adversarial examples and the robustness of neural networks are up to twice as large as reported in previous works that use empirical adversarial attacks. In addition, we provide an approach to approximating the nonlinear behavior of neural networks that makes our search method computationally feasible.
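For reference, the notion of a smallest (minimal) adversarial example described in the abstract can be stated as a norm-minimization problem. The sketch below is a generic formalization, not notation taken from the paper itself; the symbols $f$, $x$, $\delta$, and $p$ are assumptions chosen for illustration.

```latex
% Generic sketch of the minimal adversarial example problem.
% f      : classifier mapping an input to a vector of class scores
% x      : a correctly classified input
% \delta : the perturbation being searched for
% p      : norm used to measure perturbation size (e.g., p = 2 or p = \infty)
\delta^{\star} = \arg\min_{\delta} \, \|\delta\|_{p}
\quad \text{s.t.} \quad
\arg\max_{i} f_{i}(x + \delta) \neq \arg\max_{i} f_{i}(x)
```

Under this reading, $\|\delta^{\star}\|_{p}$ is the distance from $x$ to the nearest adversarial example, and an exhaustive search that finds no feasible $\delta$ within a given region certifies that the region contains no adversarial example.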
Attached Files
Submitted - measuring-robustness-neural__6_.pdf (297.5 kB, md5:3486c3ef2d5cb30cca7897d42d617f7a)
Additional details
- Eprint ID: 83561
- Resolver ID: CaltechAUTHORS:20171128-230807299
- Created: 2017-11-30 (from EPrint's datestamp field)
- Updated: 2019-10-03 (from EPrint's last_modified field)