Contextualized Perturbation for Textual Adversarial Attack

NAACL-HLT, pp. 5053–5069 (2021)

Cited by: 30

Abstract

Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques of generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a contextualized adversarial example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure.

Code: https://github.com/cookielee77/CLARE

Introduction
  • Adversarial example generation for natural language processing (NLP) tasks aims to perturb input text to trigger errors in machine learning models, while keeping the output close to the original.
  • Generating adversarial examples for NLP tasks can be challenging, in part due to the discrete nature of natural language text.
  • Adversarial example generation centers around a victim model f, which the authors assume is a text classifier; a minimal sketch of this attack setup follows below.
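The setting above can be pictured as a query-and-check loop around the victim classifier. The sketch below is a minimal, hypothetical illustration (the names victim_probs, propose_perturbations, and attack are not from the paper): candidates are scored by how much they lower the victim's probability of the gold label, and the search stops once the prediction flips.

```python
# A minimal, hypothetical sketch of a greedy black-box attack loop:
# `victim_probs` returns the victim classifier's label distribution and
# `propose_perturbations` stands in for the perturbation generator.
from typing import Callable, List, Optional

import numpy as np


def attack(text: str,
           gold_label: int,
           victim_probs: Callable[[str], np.ndarray],
           propose_perturbations: Callable[[str], List[str]],
           max_steps: int = 50) -> Optional[str]:
    """Greedily perturb `text` until the victim stops predicting `gold_label`."""
    current = text
    for _ in range(max_steps):
        candidates = propose_perturbations(current)
        if not candidates:
            break
        # Keep the candidate that lowers the gold label's probability the most.
        gold_scores = [victim_probs(c)[gold_label] for c in candidates]
        current = candidates[int(np.argmin(gold_scores))]
        if int(np.argmax(victim_probs(current))) != gold_label:
            return current  # prediction flipped: adversarial example found
    return None  # attack failed within the step budget
```

CLARE's actual search additionally constrains candidates by textual similarity and masked language model probability (see the ablation in Table 4).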
Highlights
  • Adversarial example generation for natural language processing (NLP) tasks aims to perturb input text to trigger errors in machine learning models, while keeping the output close to the original
  • We evaluate CLARE on text classification, natural language inference, and sentence paraphrase tasks by attacking fine-tuned BERT models (Devlin et al., 2019)
  • Our analysis further suggests that CLARE can be used to improve the robustness of downstream models and to improve their accuracy when the available training data is limited
  • As we will show in the experiments (§3.3), contextualized infilling produces more fluent and grammatical outputs compared to the context-agnostic counterparts, especially when using masked language models trained on large-scale data
  • In preliminary experiments, we found that it is more difficult to use other models to attack a victim model trained with the adversarial examples generated by CLARE than to use CLARE itself
  • We have presented CLARE, a contextualized adversarial example generation model for text
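To make the contextualized-infilling idea in the highlights concrete, the following sketch uses an off-the-shelf Hugging Face fill-mask pipeline to propose in-context candidates for the three perturbation types (Replace, Insert, Merge). It illustrates the general mask-and-infill recipe, not the authors' implementation; the example sentence and the distilroberta-base checkpoint are arbitrary assumptions.

```python
# A minimal sketch of contextualized mask-and-infill with an off-the-shelf
# masked language model (Hugging Face `fill-mask` pipeline). Illustrative
# only; not the authors' implementation.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="distilroberta-base")
MASK = unmasker.tokenizer.mask_token  # "<mask>" for RoBERTa-style models

words = "the food was absolutely delicious".split()
i = 3  # position to perturb

# Replace: mask the i-th word and let the MLM propose in-context substitutes.
replace_ctx = " ".join(words[:i] + [MASK] + words[i + 1:])
# Insert: add a mask between the i-th and (i+1)-th words.
insert_ctx = " ".join(words[: i + 1] + [MASK] + words[i + 1:])
# Merge: collapse the bigram (i, i+1) into a single mask.
merge_ctx = " ".join(words[:i] + [MASK] + words[i + 2:])

for name, ctx in [("replace", replace_ctx), ("insert", insert_ctx), ("merge", merge_ctx)]:
    top = unmasker(ctx, top_k=5)
    print(name, [(c["token_str"].strip(), round(c["score"], 3)) for c in top])
```

Because the masked language model conditions on the surrounding words, the proposed tokens tend to fit their context, which is what drives the fluency and grammaticality gains reported in §3.3.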
Methods
  • The authors evaluate CLARE on text classification, natural language inference, and sentence paraphrase tasks.
  • The Merge perturbation can only merge noun phrases, extracted with the NLTK toolkit; the authors find that this helps produce more grammatical outputs (see the extraction sketch below)
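Below is a minimal sketch of the noun-phrase extraction step, using NLTK's POS tagger and a regular-expression chunker. The chunk grammar is a simple illustrative pattern and not necessarily the one used by the authors; multi-word noun phrases are returned as candidate spans for the Merge perturbation.

```python
# A minimal sketch of extracting noun-phrase spans with NLTK.
# The chunk grammar is illustrative; resource names may differ slightly
# across NLTK versions.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

GRAMMAR = "NP: {<DT>?<JJ>*<NN.*>+}"  # optional determiner, adjectives, nouns
chunker = nltk.RegexpParser(GRAMMAR)


def noun_phrase_spans(text: str):
    """Return (start, end) token indices of multi-word noun phrases in `text`."""
    tokens = nltk.word_tokenize(text)
    tree = chunker.parse(nltk.pos_tag(tokens))
    spans, idx = [], 0
    for node in tree:
        length = len(node) if isinstance(node, nltk.Tree) else 1
        if isinstance(node, nltk.Tree) and node.label() == "NP" and length > 1:
            spans.append((idx, idx + length))  # candidate span for Merge
        idx += length
    return spans


print(noun_phrase_spans("The stock market rallied after the earnings report ."))
```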
Results
  • With a very limited set of synonym candidates from WordNet, PWWS fails to attack a BERT model on most inputs.
  • Using word embeddings to find synonyms, TextFooler achieves a higher success rate, but tends to produce less grammatical and less natural outputs.
  • Equipped with a language model, TextFooler+LM does better in terms of perplexity, yet this brings little grammaticality improvement and comes at a cost in attack success rate.
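Perplexity (PPL) in these comparisons is a language-model fluency score: lower values mean the text looks more natural to the scoring model. A minimal sketch of computing such a score with a small GPT-2 model is shown below; it is a generic recipe rather than the paper's exact evaluation script.

```python
# A minimal sketch of the perplexity (PPL) fluency metric, scored with GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    # The model's loss is the mean token-level negative log-likelihood;
    # exponentiating it gives the sentence perplexity.
    loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))


print(perplexity("The movie was surprisingly good."))      # fluent, lower PPL
print(perplexity("The movie good was surprisingly the."))  # disfluent, higher PPL
```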
Conclusion
  • A key technique of CLARE is the local mask-then-infill perturbation, which comes with several advantages.
  • Generating adversarial examples with masked language models is explored by a concurrent work (Li et al., 2020)
  • Their method is similar to a CLARE model except that it only uses the Replace action.
  • As shown in the ablation study (§4.1), using all three actions helps CLARE achieve better attack performance. The authors have presented CLARE, a contextualized adversarial example generation model for text.
  • The authors release the code and models at https://github.com/cookielee77/CLARE
Tables
  • Table1: Some statistics of datasets. The last column indicates the victim model’s accuracy on the original test set without adversarial attack
  • Table2: Adversarial example generation performance in attack success rate (A-rate), modification rate (Mod), perplexity (PPL), number of increased grammar errors (GErr), and textual similarity (Sim). The perplexity of the original inputs is indicated in parentheses for each dataset. Bold font indicates the best performance for each metric
  • Table3: Human evaluation performance in percentage on the AG News dataset. ± indicates confidence intervals with a 95% confidence level
  • Table4: Ablation study results. “w/o sim > ” ablates the textual similarity constraint when constructing the candidate sets, while “w/o pMLM > k” ablates the masked language model probability constraint
  • Table5: Results of CLARE implemented with different masked language models (MLM)
  • Table6: Adversarial training results on AG news test set. “Acc” indicates accuracy
  • Table7: Top: Top-3 POS tags (or POS tag bigrams) and their percentages for each perturbation type. (a, b): insert a token between a and b. a-b: merge a and b into a token. Bottom: An AG news sample, where CLARE perturbs token “cybersecurity.” Neither PWWS nor TextFooler is able to attack this token since it is out of their vocabularies
  • Table8: Adversarial examples produced by different models. The gold label of the original is shown below the (bolded) dataset name. Replace, Insert and Merge are highlighted in italic red, bold blue and sans serif yellow, respectively. (Best viewed in color)
  • Table10: Adversarial example generation performance in attack success rate (A-rate), modification rate (Mod), perplexity (PPL), number of increased grammar errors (GErr), and text similarity (Sim). The perplexity of the original inputs is indicated in parentheses for each dataset. Bold indicates the best performance on each metric. Compared with all baselines, CLARE achieves the best performance on attack success rate, perplexity, grammaticality, and similarity, consistent with the observation in §3.3
  • Table11: Speed experiment with different attack models
Additional results
  • The trend is similar for fluency & grammaticality (42% vs. 9%)
  • As shown in Table 6, when the full training data is available, adversarial training slightly decreases the test accuracy by 0.2% and 0.4% respectively
  • As shown in Table 6, in 3 out of the 4 cases, adversarial training helps to decrease the attack success rate by more than 10.2%, and to increase the number of modifications needed by more than 0.7
Study subjects and analysis
datasets: 4
Table 1 summarizes some statistics of the datasets. In addition to the above four datasets, we experiment with the DBpedia ontology dataset (Zhang et al., 2015), the Stanford Sentiment Treebank (Socher et al., 2013), the Microsoft Research Paraphrase Corpus (Dolan and Brockett, 2005), and Quora Question Pairs from the GLUE benchmark. The results on these datasets are summarized in Appendix A.2

datasets: 4
Table 2 summarizes the results. Although PWWS achieves the best modification rate on 3 out of the 4 datasets, it underperforms CLARE in terms of other metrics. With a very limited set of synonym candidates from WordNet, PWWS fails to attack a BERT model on most inputs

cases: 4
A higher success rate and fewer modifications indicate a victim classifier is more vulnerable to adversarial attacks. As shown in Table 6, in 3 out of the 4 cases, adversarial training helps to decrease the attack success rate by more than 10.2%, and to increase the number of modifications needed by more than 0.7. The only exception is the TextCNN model trained with 10% data
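The adversarial training recipe referenced above can be summarized as: attack the training set with the generator, keep the gold labels, and retrain on the augmented data. The sketch below is a schematic outline with hypothetical helper functions (train_classifier, attack), not the authors' training code.

```python
# A minimal sketch of adversarial training: augment the training data with
# adversarial examples (keeping their gold labels) and retrain the victim.
# `train_classifier` and `attack` are hypothetical stand-ins.
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]  # (text, gold label)


def adversarial_training(
    train_data: List[Example],
    train_classifier: Callable[[List[Example]], object],
    attack: Callable[[object, str, int], Optional[str]],
):
    """Train a baseline victim, attack its training set, retrain on the union."""
    victim = train_classifier(train_data)        # baseline victim model
    augmented = list(train_data)
    for text, label in train_data:
        adv = attack(victim, text, label)        # e.g. run CLARE against `victim`
        if adv is not None:
            augmented.append((adv, label))       # adversarial example keeps the gold label
    return train_classifier(augmented)           # retrained, more robust model
```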

Reference
  • Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. In Proc. of EMNLP.
  • Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder. arXiv preprint arXiv:1803.11175.
  • Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust neural machine translation with doubly adversarial inputs. In Proc. of ACL.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL.
  • William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing.
  • Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proc. of ACL.
  • Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box generation of adversarial text sequences to evade deep learning classifiers. In IEEE Security and Privacy Workshops (SPW).
  • Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer. 2019. Mask-Predict: Parallel decoding of conditional masked language models. In Proc. of EMNLP.
  • Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proc. of ICLR.
  • Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, and Richard Socher. 2018. Non-autoregressive neural machine translation. In Proc. of ICLR.
  • Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial example generation with syntactically controlled paraphrase networks. In Proc. of NAACL.
  • Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proc. of EMNLP.
  • Robin Jia, Aditi Raghunathan, Kerem Goksel, and Percy Liang. 2019. Certified robustness to adversarial word substitutions. In Proc. of EMNLP.
  • Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits. 2020. Is BERT really robust? Natural language attack on text classification and entailment. In Proc. of AAAI.
  • Erik Jones, Robin Jia, Aditi Raghunathan, and Percy Liang. 2020. Robust encodings: A framework for combating adversarial typos. In Proc. of ACL.
  • Katharina Kann, Sascha Rothe, and Katja Filippova. 2018. Sentence-level fluency evaluation: References help, but can be spared! In Proc. of CoNLL, pages 313–323.
  • Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proc. of EMNLP.
  • Keita Kurita, Paul Michel, and Graham Neubig. 2020. Weight poisoning attacks on pre-trained models. arXiv preprint arXiv:2004.06660.
  • Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. Deterministic non-autoregressive neural sequence modeling by iterative refinement. In Proc. of EMNLP.
  • Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang. 2018. TextBugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271.
  • Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016. Visualizing and understanding neural models in NLP. In Proc. of NAACL, pages 681–691.
  • Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, and Xipeng Qiu. 2020. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984.
  • Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi. 2019. Deep text classification can be fooled. In Proc. of IJCAI.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, and Eduard Hovy. 2019. FlowSeq: Non-autoregressive conditional sequence generation with generative flow. In Proc. of EMNLP.
  • George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.
  • John X. Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, and Yanjun Qi. 2020. Reevaluating adversarial examples in natural language. arXiv preprint arXiv:2004.14174.
  • Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. In Proc. of NAACL.
  • Daniel Naber. 2003. A rule-based style and grammar checker. Diploma thesis, Bielefeld University.
  • Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proc. of NeurIPS.
  • Danish Pruthi, Bhuwan Dhingra, and Zachary C. Lipton. 2019. Combating adversarial misspellings with robust word recognition. In Proc. of ACL.
  • Jipeng Qiang, Yun Li, Yi Zhu, and Yunhao Yuan. 2020. A simple BERT-based approach for lexical simplification. In Proc. of AAAI.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proc. of EMNLP.
  • Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che. 2019. Generating natural language adversarial examples through probability weighted word saliency. In Proc. of ACL.
  • Yi Ren, Jinglin Liu, Xu Tan, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. 2020. A study of non-autoregressive model for sequence generation. arXiv preprint arXiv:2004.10454.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically equivalent adversarial rules for debugging NLP models. In Proc. of ACL.
  • Suranjana Samanta and Sameep Mehta. 2017. Towards crafting text adversarial samples. arXiv preprint arXiv:1707.02812.
  • Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. of EMNLP.
  • Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, and Zhihong Deng. 2019. Fast structured decoding for sequence models. In Proc. of NeurIPS.
  • Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. 2018. Robustness may be at odds with accuracy. In Proc. of ICLR.
  • Prashanth Vijayaraghavan and Deb Roy. 2019. Generating black-box adversarial examples for text classifiers using a deep reinforced model. arXiv preprint arXiv:1909.07873.
  • Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. 2019. Universal adversarial triggers for attacking and analyzing NLP. In Proc. of EMNLP.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2019a. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proc. of ICLR.
  • Boxin Wang, Hengzhi Pei, Han Liu, and Bo Li. 2019b. AdvCodec: Towards a unified framework for adversarial text generation. arXiv preprint arXiv:1912.10375.
  • Xiaosen Wang, Hao Jin, and Kun He. 2019c. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723.
  • Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proc. of NAACL.
  • Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  • Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han, and Songlin Hu. 2019a. Conditional BERT contextual augmentation. In Proc. of ICCS.
  • Xing Wu, Tao Zhang, Liangjun Zang, Jizhong Han, and Songlin Hu. 2019b. "Mask and infill": Applying masked language model to sentiment transfer. In Proc. of IJCAI.
  • Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, and Maosong Sun. 2020. Word-level textual adversarial attacking as combinatorial optimization. In Proc. of ACL.
  • Huangzhao Zhang, Hao Zhou, Ning Miao, and Lei Li. 2019. Generating fluent adversarial examples for natural languages. In Proc. of ACL.
  • Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, and Chenliang Li. 2020a. Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Transactions on Intelligent Systems and Technology (TIST), 11(3):1–41.
  • Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Proc. of NeurIPS.
  • Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, and Bill Dolan. 2020b. POINTER: Constrained text generation via insertion-based generative pre-training. arXiv preprint arXiv:2005.00558.
  • Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. Generating natural adversarial examples. In Proc. of ICLR.
  • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, and Ming Zhou. 2019a. BERT-based lexical substitution. In Proc. of ACL.
  • Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, and Wei Wang. 2019b. Learning to discriminate perturbations for blocking adversarial attacks in text classification. In Proc. of EMNLP.
  • Wei Zou, Shujian Huang, Jun Xie, Xinyu Dai, and Jiajun Chen. 2020. A reinforced generation of adversarial samples for neural machine translation. In Proc. of ACL.
  • Model Implementation. All pretrained models and victim models based on RoBERTa and BERT-base are implemented with Hugging Face Transformers (Wolf et al., 2019) on top of PyTorch (Paszke et al., 2019). The distilled RoBERTa, RoBERTa-base, and uncased BERT-base models have 82M, 125M, and 110M parameters, respectively. We use distilled RoBERTa as the main backbone for fast inference. PWWS and TextFooler are built with the open-source implementations provided by their authors. In the implementation of TextFooler+LM, we use a small GPT-2 language model (Radford et al., 2019) to further select the candidate tokens that rank in the top 20% by perplexity within the candidate token set. In the adversarial training experiments (§4.2), the small TextCNN victim model (Kim, 2014) uses an embedding size of 128 and 100 filters for each of the window sizes 3, 4, and 5, with dropout 0.5, resulting in about 7M parameters.
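For reference, a minimal PyTorch sketch of a TextCNN classifier with the hyperparameters stated above (embedding size 128, 100 filters per window size 3/4/5, dropout 0.5) is given below. The vocabulary size and number of classes are illustrative assumptions, so the parameter count only roughly matches the reported 7M.

```python
# A minimal PyTorch sketch of a TextCNN victim classifier (Kim, 2014).
# Vocabulary size and number of classes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextCNN(nn.Module):
    def __init__(self, vocab_size: int = 50_000, num_classes: int = 4,
                 embed_dim: int = 128, num_filters: int = 100,
                 window_sizes=(3, 4, 5), dropout: float = 0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=w) for w in window_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(window_sizes), num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, embed_dim, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)
        # Max-pool each feature map over time, then concatenate.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))


model = TextCNN()
print(sum(p.numel() for p in model.parameters()))  # roughly 6.6M under these assumptions
```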
  • Evaluation Metric. The similarity function sim builds on the Universal Sentence Encoder (USE; Cer et al., 2018) to measure local similarity between the original input and its adversary at the perturbation position, using a window size of 15. All baselines are equipped with this sim when constructing the candidate vocabulary. The evaluation metric Sim uses USE to compute a global similarity between the two texts. These procedures largely follow Jin et al. (2020). We mostly rely on human evaluation (§3.3) to establish CLARE's advantage over TextFooler in preserving textual similarity.
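A minimal sketch of scoring textual similarity with the Universal Sentence Encoder, as the global Sim metric does, is shown below. The TF-Hub module URL points to the publicly released USE model and is an assumption about the exact checkpoint; the local sim variant would additionally restrict both texts to a 15-token window around the perturbation.

```python
# A minimal sketch of USE-based textual similarity (cosine similarity of
# sentence embeddings). The TF-Hub checkpoint is an assumption, not
# necessarily the version used by the authors.
import numpy as np
import tensorflow_hub as hub

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")


def use_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between USE embeddings of the two texts."""
    emb = use([text_a, text_b]).numpy()
    a, b = emb[0], emb[1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


original = "the service at this restaurant was excellent"
adversary = "the service at this diner was excellent"
print(use_similarity(original, adversary))
```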
  • Data Processing. When processing the data, we keep all punctuation in the texts for both victim model training and attacking. Since the GLUE benchmark (Wang et al., 2019a) does not provide labels for its test sets, we instead use the dev sets as the test sets for the included datasets (MNLI, QNLI, QQP, MRPC, SST-2) in the evaluation.