Defending Against Attacks Tailored to Transfer Learning Via Feature Distancing

SSRN Electronic Journal (2021)

Abstract
Transfer learning is a preferred way to train a deep neural network on a small dataset by leveraging a pre-trained teacher model. However, it also opens the door to new attacks that generate adversarial examples using the pre-trained teacher model. In this paper, we propose a novel method called feature distancing to defend against adversarial attacks tailored to transfer learning. The method trains a student model whose feature representation is distinct from the teacher model's. We generate mimic-attack adversarial examples with the teacher model and use them to train the student model: a triplet loss pulls the mimic-attack examples close to their source images and pushes them far from their target images in the feature space of the student model. The proposed method is evaluated on three transfer learning tasks with diverse attack configurations and is the only method that achieves both high robust accuracy and high test accuracy on every task we evaluate.
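The triplet objective described in the abstract can be sketched concretely. The snippet below is a minimal PyTorch illustration, not the authors' implementation; `student.features`, `x_mimic`, `x_source`, `x_target`, and the weighting factor `alpha` are all assumed names for the student's feature extractor, a mimic-attack example generated with the teacher model, its source image, its target image, and the loss weight.

```python
import torch.nn as nn

# Minimal sketch of a feature-distancing objective (assumed, not the
# authors' code). The student's features of a mimic-attack example are
# pulled toward its source image and pushed away from its target image
# with a standard triplet loss, alongside the usual task loss.
triplet = nn.TripletMarginLoss(margin=1.0)
task_loss = nn.CrossEntropyLoss()

def feature_distancing_loss(student, x_mimic, x_source, x_target, y, alpha=1.0):
    """Task loss + triplet loss; `student.features` is a hypothetical
    method returning the feature-space embedding of a batch."""
    anchor = student.features(x_mimic)     # mimic-attack example (anchor)
    positive = student.features(x_source)  # source image (pull close)
    negative = student.features(x_target)  # target image (push away)
    logits = student(x_source)             # normal classification path
    return task_loss(logits, y) + alpha * triplet(anchor, positive, negative)
```

Because the anchor is the adversarial example itself, minimizing this loss discourages the student's feature space from aligning with the teacher's feature geometry that the mimic attack exploits.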
Keywords
68T45, 68T05