Transfer2Attack: Text Adversarial Attack with Cross-Domain Interpretability

arXiv (2020)

Abstract
Training robust deep learning models for downstream tasks is a critical challenge. Research has shown that common downstream models can be easily fooled by adversarial inputs that resemble the training data but are slightly perturbed in a way imperceptible to humans. Understanding how natural language models behave under such attacks is crucial to defending them. In the black-box attack setting, where no access to model parameters is available, the attacker can only query the targeted model's outputs to craft a successful attack. Current state-of-the-art black-box attacks are costly in both computational complexity and the number of queries needed to craft successful adversarial examples. In real-world scenarios the number of queries is critical: fewer queries are desired to avoid raising suspicion toward the attacking agent. In this paper, we propose Transfer2Attack, a black-box adversarial attack on text classification that employs cross-domain interpretability to reduce target-model queries during the attack. We show that our framework matches or outperforms the attack rates of state-of-the-art models, at a lower query cost and with higher efficiency.
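For intuition about why the query count matters in this setting, the sketch below shows a generic query-counting black-box attack: a word-substitution loop that queries the target model's output label and stops once the label flips or an assumed query budget is exhausted. This is not the Transfer2Attack method; the classifier, synonym table, and budget are hypothetical stand-ins chosen so the example runs on its own.

from typing import Callable, Dict, List

def toy_classifier(text: str) -> int:
    """Stand-in black box: label 1 if the text contains a positive word, else 0."""
    positive = {"good", "great", "excellent", "enjoyable"}
    return 1 if any(w in positive for w in text.lower().split()) else 0

# Hypothetical synonym table; a real attack would use embeddings or a thesaurus.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["decent", "fine"],
    "great": ["notable", "big"],
}

def blackbox_attack(
    text: str,
    predict: Callable[[str], int],
    synonyms: Dict[str, List[str]],
    query_budget: int = 50,
):
    """Try one-word synonym swaps until the predicted label flips or the
    query budget is exhausted. Returns (adversarial_text, queries_used)."""
    words = text.split()
    queries = 1  # one query to record the original label
    original_label = predict(text)
    for i, w in enumerate(words):
        for candidate in synonyms.get(w.lower(), []):
            if queries >= query_budget:
                return " ".join(words), queries  # budget spent, attack failed
            trial = words[:i] + [candidate] + words[i + 1:]
            queries += 1  # every trial costs one query to the black box
            if predict(" ".join(trial)) != original_label:
                return " ".join(trial), queries  # label flipped: success
    return " ".join(words), queries

adv, used = blackbox_attack("the movie was good", toy_classifier, SYNONYMS)
print(adv, used)  # e.g. "the movie was decent" after 2 queries

Every candidate substitution costs one query, which is exactly the budget that an interpretability-guided method such as Transfer2Attack aims to shrink by ranking which words to try first.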
Keywords
text adversarial attack (Transfer2Attack), interpretability, cross-domain