Enhancing Adversarial Example Transferability With an Intermediate Level Attack

Publication: Journal contribution › Conference article › Research › Peer-reviewed

Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples using intermediate feature maps.
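To make the mechanism concrete, below is a minimal PyTorch sketch of the projection-based variant of ILA: starting from a baseline adversarial example, it fine-tunes the input so that its perturbation at a chosen intermediate layer grows along the direction established by the baseline attack. The function name ila_attack, the hook-based feature extraction, and the hyperparameter defaults are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def ila_attack(model, layer, x, x_adv_init, eps=8/255, step=1/255, iters=10):
    """Illustrative ILA sketch (projection-loss variant): refine an existing
    adversarial example by pushing its intermediate-layer perturbation
    further along the direction set by the initial attack."""
    feats = {}
    # Capture the chosen layer's output on every forward pass.
    hook = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))

    with torch.no_grad():
        model(x)
        f_clean = feats["out"].detach()
        model(x_adv_init)
        # Guide direction: feature-space perturbation of the baseline attack.
        f_guide = (feats["out"] - f_clean).detach().flatten(1)

    x_adv = x_adv_init.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        model(x_adv)
        delta = (feats["out"] - f_clean).flatten(1)
        # Projection loss: maximize alignment with the guide direction.
        loss = (delta * f_guide).sum()
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # stay in the eps-ball
            x_adv = x_adv.clamp(0, 1)                 # stay a valid image
    hook.remove()
    return x_adv.detach()
```

In practice, the baseline x_adv_init would come from an attack such as I-FGSM run on the same source model, and the layer is chosen empirically, for example a mid-network block of a ResNet, without any knowledge of the target models.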

Original language: English
Journal: Proceedings of the IEEE International Conference on Computer Vision
Pages (from-to): 4732-4741
Number of pages: 10
ISSN: 1550-5499
DOI
Status: Published - Oct 2019
Event: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, South Korea
Duration: 27 Oct 2019 - 2 Nov 2019

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country: South Korea
City: Seoul
Period: 27/10/2019 - 02/11/2019
Sponsor: Computer Vision Foundation, IEEE

Bibliographic note

Publisher Copyright:
© 2019 IEEE.
